taps.apps.mapreduce
MapreduceApp
MapreduceApp(
data_dir: Path,
map_tasks: int | None = None,
generate: bool = False,
generated_files: int = 10,
generated_words: int = 10000,
)
Mapreduce application.
Parameters:
- data_dir (Path) – Text file directory. Either contains existing text files (including in subdirectories) or will be used to store the randomly generated files.
- map_tasks (int | None, default: None) – Number of map tasks. If None, one map task is generated per text file. Otherwise, files are evenly distributed across the map tasks.
- generate (bool, default: False) – Generate random text files for the application.
- generated_files (int, default: 10) – Number of text files to generate.
- generated_words (int, default: 10000) – Number of words per text file to generate.
Source code in taps/apps/mapreduce.py
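The map_tasks behaviour described above can be illustrated with a minimal sketch. The helper name partition_files and the round-robin strategy are assumptions made for illustration; the page does not say how the application actually distributes files, only that the distribution is even.

```python
from __future__ import annotations

from pathlib import Path


def partition_files(
    files: list[Path],
    map_tasks: int | None = None,
) -> list[list[Path]]:
    """Split files into batches, one batch per map task (hypothetical helper)."""
    if map_tasks is None:
        # Default case: one map task per text file.
        return [[f] for f in files]
    if map_tasks < 1:
        raise ValueError('map_tasks must be a positive integer.')
    # Round-robin assignment (an assumption) keeps batch sizes within
    # one file of each other.
    batches = [files[i::map_tasks] for i in range(map_tasks)]
    # Drop empty batches when there are fewer files than map tasks.
    return [batch for batch in batches if batch]


files = [Path(f'{i}.txt') for i in range(5)]
print(partition_files(files))     # five single-file batches
print(partition_files(files, 2))  # one batch of three files, one of two
```

Round-robin slicing is only one way to balance the batches; contiguous chunks would satisfy the same description.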
close
run
Run the application.
map_task
Count words in files.
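As a rough illustration of the word-count pattern, the sketch below counts words per batch of files and then merges the per-batch counts. The function names count_words and merge_counts are hypothetical, and the merge step is an assumption about what a reduce task does here, since this page does not describe it. Words are taken to be whitespace-separated tokens.

```python
from __future__ import annotations

import tempfile
from collections import Counter
from pathlib import Path


def count_words(paths: list[Path]) -> Counter[str]:
    """Map step: count whitespace-separated words across a batch of files."""
    counts: Counter[str] = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                counts.update(line.split())
    return counts


def merge_counts(*counts: Counter[str]) -> Counter[str]:
    """Reduce step (assumed): sum the per-batch word counts."""
    total: Counter[str] = Counter()
    for batch_counts in counts:
        total.update(batch_counts)
    return total


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / 'a.txt'
    b = Path(tmp) / 'b.txt'
    a.write_text('red blue red\n')
    b.write_text('blue green\n')
    # Each file is mapped separately, then the results are merged.
    total = merge_counts(count_words([a]), count_words([b]))
    print(total.most_common())
```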
reduce_task
generate_word
generate_text
Generate a paragraph with the specified number of words.
generate_files
generate_files(
directory: Path,
file_count: int,
words_per_file: int,
*,
min_word_length: int = 2,
max_word_length: int = 10
) -> list[Path]
Generate text files with random text.
Parameters:
- directory (Path) – Directory to write the files to.
- file_count (int) – Number of files to generate.
- words_per_file (int) – Number of words per file.
- min_word_length (int, default: 2) – Minimum character length of randomly generated words.
- max_word_length (int, default: 10) – Maximum character length of randomly generated words.
Returns:
- list[Path] – Paths of the generated files.
Raises:
- ValueError – if directory is not empty.
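The documented contract (random words within a length range, a fixed word count per file, and a ValueError for a non-empty directory) can be sketched as follows. This is not the library's implementation: the names write_random_files and random_word, the seed parameter, the lowercase ASCII alphabet, the space separator, the creation of a missing directory, and the `{i}.txt` file naming are all assumptions made for illustration.

```python
from __future__ import annotations

import random
import string
import tempfile
from pathlib import Path


def random_word(rng: random.Random, min_length: int, max_length: int) -> str:
    """Return a random lowercase word (hypothetical helper)."""
    length = rng.randint(min_length, max_length)  # inclusive on both ends
    return ''.join(rng.choices(string.ascii_lowercase, k=length))


def write_random_files(
    directory: Path,
    file_count: int,
    words_per_file: int,
    *,
    min_word_length: int = 2,
    max_word_length: int = 10,
    seed: int | None = None,  # assumption: not part of the documented signature
) -> list[Path]:
    """Write file_count files of random words and return their paths."""
    directory.mkdir(parents=True, exist_ok=True)
    if any(directory.iterdir()):
        raise ValueError(f'{directory} is not empty.')
    rng = random.Random(seed)
    paths = []
    for i in range(file_count):
        words = (
            random_word(rng, min_word_length, max_word_length)
            for _ in range(words_per_file)
        )
        path = directory / f'{i}.txt'  # assumed naming scheme
        path.write_text(' '.join(words))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    paths = write_random_files(Path(tmp), file_count=3, words_per_file=20, seed=0)
    word_counts = [len(p.read_text().split()) for p in paths]
    print(word_counts)  # [20, 20, 20]
```

Checking for an empty directory before writing keeps generated files from mixing with existing text files that a later word count would also pick up.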