taps.apps.mapreduce
MapreduceApp
MapreduceApp(
    data_dir: Path,
    map_tasks: int | None = None,
    generate: bool = False,
    generated_files: int = 10,
    generated_words: int = 10000,
)
Mapreduce application.
Parameters:
- data_dir (Path) – Text file directory. Either contains existing text files (including in subdirectories) or will be used to store the randomly generated files.
- map_tasks (int | None, default: None) – Number of map tasks. If None, one map task is generated per text file. Otherwise, files are evenly distributed across the map tasks.
- generate (bool, default: False) – Generate random text files for the application.
- generated_files (int, default: 10) – Number of text files to generate.
- generated_words (int, default: 10000) – Number of words per text file to generate.
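
The effect of map_tasks on how files are grouped can be illustrated with a minimal sketch. The helper name split_evenly and the round-robin strategy are assumptions made for illustration; the source only states that files are evenly distributed, not how.

```python
from __future__ import annotations

from pathlib import Path


def split_evenly(files: list[Path], map_tasks: int | None = None) -> list[list[Path]]:
    """Group files into batches, one batch per map task (illustrative only)."""
    if map_tasks is None:
        # One map task per text file.
        return [[f] for f in files]
    if map_tasks < 1:
        raise ValueError("map_tasks must be a positive integer")
    # Never create more batches than there are files.
    n = min(map_tasks, len(files))
    # Round-robin slicing (an assumed strategy): batch i takes
    # files i, i + n, i + 2n, ... so batch sizes differ by at most one.
    return [files[i::n] for i in range(n)]
```

With five files and map_tasks=2, this sketch produces one batch of three files and one batch of two.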
 
Source code in taps/apps/mapreduce.py
close()
run()
Run the application.
Parameters:
map_task()
Count words in files.
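
As a rough illustration of what counting words in files can look like, the following sketch tallies whitespace-separated words across a batch of files. The function name count_words, the whitespace tokenization, and the UTF-8 encoding are assumptions; the actual signature and behavior of map_task() are not shown in this page.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def count_words(paths: list[Path]) -> Counter[str]:
    """Count whitespace-separated words across several files (illustrative)."""
    counts: Counter[str] = Counter()
    for path in paths:
        # Assumes the files are UTF-8 text; words are split on whitespace.
        text = path.read_text(encoding="utf-8")
        counts.update(text.split())
    return counts
```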
reduce_task()
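
This page gives no description for reduce_task(). In a typical word-count MapReduce, the reduce step merges the per-task counts into a single table; the sketch below shows that general idea only. The helper names merge_counts and merge_all are hypothetical and are not the library's API.

```python
from collections import Counter
from functools import reduce


def merge_counts(first: Counter, second: Counter) -> Counter:
    """Combine two word-count tables by summing counts per word."""
    return first + second


def merge_all(tables: list) -> Counter:
    """Fold any number of word-count tables into one."""
    # Start from an empty Counter so an empty input yields an empty result.
    return reduce(merge_counts, tables, Counter())
```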
generate_word()
generate_text()
Generate a paragraph with the specified number of words.
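
One way to generate such text is sketched below. The function names random_word and random_paragraph, the lowercase ASCII alphabet, and the explicit rng argument are all assumptions for illustration; the default length bounds of 2 and 10 mirror the defaults documented for generate_files().

```python
import random
import string


def random_word(rng: random.Random, min_length: int = 2, max_length: int = 10) -> str:
    """Build one word of random lowercase letters (illustrative)."""
    # randint is inclusive on both ends.
    length = rng.randint(min_length, max_length)
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def random_paragraph(
    rng: random.Random,
    word_count: int,
    min_length: int = 2,
    max_length: int = 10,
) -> str:
    """Join word_count random words with single spaces."""
    return " ".join(
        random_word(rng, min_length, max_length) for _ in range(word_count)
    )
```

Passing a seeded random.Random instance keeps the generated text reproducible between runs.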
generate_files()
generate_files(
    directory: Path,
    file_count: int,
    words_per_file: int,
    *,
    min_word_length: int = 2,
    max_word_length: int = 10
) -> list[Path]
Generate text files with random text.
Parameters:
- directory (Path) – Directory to write the files to.
- file_count (int) – Number of files to generate.
- words_per_file (int) – Number of words per file.
- min_word_length (int, default: 2) – Minimum character length of randomly generated words.
- max_word_length (int, default: 10) – Maximum character length of randomly generated words.
 
Returns:
Raises:
- ValueError – if directory is not empty.
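
The documented contract (write file_count files of words_per_file random words each, return their paths, and raise ValueError for a non-empty directory) can be sketched with a stand-in function. Everything here is an assumption for illustration rather than the library's code: the name sketch_generate_files, the `<index>.txt` file names, creating the directory when it is missing, and the lowercase ASCII alphabet.

```python
from __future__ import annotations

import random
import string
from pathlib import Path


def sketch_generate_files(
    directory: Path,
    file_count: int,
    words_per_file: int,
    *,
    min_word_length: int = 2,
    max_word_length: int = 10,
) -> list[Path]:
    """Stand-in with the same parameters as generate_files() (illustrative)."""
    # Assumption: create the directory if it does not exist yet.
    directory.mkdir(parents=True, exist_ok=True)
    # Documented behavior: refuse to write into a non-empty directory.
    if any(directory.iterdir()):
        raise ValueError(f"Directory {directory} is not empty.")

    paths = []
    for index in range(file_count):
        words = (
            "".join(
                random.choices(
                    string.ascii_lowercase,
                    k=random.randint(min_word_length, max_word_length),
                )
            )
            for _ in range(words_per_file)
        )
        # Hypothetical naming scheme: 0.txt, 1.txt, ...
        path = directory / f"{index}.txt"
        path.write_text(" ".join(words), encoding="utf-8")
        paths.append(path)
    return paths
```

Calling the stand-in a second time on the same directory raises ValueError, because the first call left files behind.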