taps.apps.mapreduce
MapreduceApp
MapreduceApp(
data_dir: Path,
map_tasks: int | None = None,
generate: bool = False,
generated_files: int = 10,
generated_words: int = 10000,
)
Mapreduce application.
Parameters:
- data_dir (Path) – Text file directory. Either contains existing text files (including in subdirectories) or will be used to store the randomly generated files.
- map_tasks (int | None, default: None) – Number of map tasks. If None, one map task is created per text file. Otherwise, files are evenly distributed across the map tasks.
- generate (bool, default: False) – If True, generate random text files in data_dir for the application.
- generated_files (int, default: 10) – Number of text files to generate.
- generated_words (int, default: 10000) – Number of words per text file to generate.
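The map_tasks behaviour can be illustrated with a minimal sketch. It assumes a round-robin split, which is only one way to distribute files "evenly"; the real MapreduceApp may split them differently. The helpers find_text_files and partition_files are hypothetical and not part of taps.

```python
from __future__ import annotations

from pathlib import Path


def find_text_files(data_dir: Path) -> list[Path]:
    """Hypothetical helper: every file under data_dir, including subdirectories."""
    # Sorted so the assignment of files to map tasks is deterministic.
    return sorted(p for p in data_dir.rglob("*") if p.is_file())


def partition_files(files: list[Path], map_tasks: int | None) -> list[list[Path]]:
    """Hypothetical helper: group files into one list per map task."""
    if map_tasks is None:
        # One map task per text file.
        return [[path] for path in files]
    if map_tasks < 1:
        raise ValueError("map_tasks must be a positive integer")
    # Round-robin assignment (an assumption): task i gets files i, i + n, i + 2n, ...
    # so group sizes differ by at most one.
    return [files[i::map_tasks] for i in range(map_tasks)]
```

With five files and map_tasks=2, the first task receives files 0, 2 and 4 and the second receives files 1 and 3. If map_tasks exceeds the number of files, some groups are empty in this sketch.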
close
run
Run the application.
Parameters:
map_task
Count words in files.
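A minimal sketch of the word counting a map task performs. It assumes words are whitespace-delimited and counted case-sensitively with no punctuation stripping; the page does not say how taps tokenizes text. count_words_in_text and map_count_words are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def count_words_in_text(text: str) -> Counter[str]:
    """Hypothetical helper: count whitespace-separated words in one string."""
    # str.split() with no argument splits on any run of whitespace.
    return Counter(text.split())


def map_count_words(files: list[Path]) -> Counter[str]:
    """Hypothetical map step: word counts over all files assigned to one task."""
    counts: Counter[str] = Counter()
    for path in files:
        # Counter.update adds to existing counts instead of overwriting them.
        counts.update(count_words_in_text(path.read_text()))
    return counts
```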
reduce_task
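The page gives no description for reduce_task. Assuming it merges the per-task word counts produced by the map tasks (the usual reduce step of a word-count MapReduce), it could be sketched as follows; reduce_counts is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter


def reduce_counts(*partial_counts: Counter[str]) -> Counter[str]:
    """Hypothetical reduce step: sum word counts from several map tasks."""
    total: Counter[str] = Counter()
    for counts in partial_counts:
        # Counter.update adds counts; dict.update would replace them.
        total.update(counts)
    return total


if __name__ == "__main__":
    merged = reduce_counts(Counter({"a": 1, "b": 2}), Counter({"a": 3}))
    print(merged)  # Counter({'a': 4, 'b': 2})
    print(merged.most_common(1))  # [('a', 4)]
```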
generate_word
generate_text
Generate a paragraph with the specified number of words.
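generate_word and generate_text are listed without signatures here. The sketch below shows one plausible way to build a paragraph of random words; it assumes lowercase ASCII letters and borrows the 2 to 10 character length bounds from the generate_files defaults. random_word, random_paragraph and the rng parameter are hypothetical.

```python
from __future__ import annotations

import random
import string


def random_word(
    min_length: int = 2,
    max_length: int = 10,
    rng: random.Random | None = None,
) -> str:
    """Hypothetical sketch: one random lowercase ASCII word."""
    if rng is None:
        rng = random.Random()
    # randint is inclusive on both ends.
    length = rng.randint(min_length, max_length)
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def random_paragraph(
    word_count: int,
    min_length: int = 2,
    max_length: int = 10,
    rng: random.Random | None = None,
) -> str:
    """Hypothetical sketch: word_count random words joined by single spaces."""
    if rng is None:
        rng = random.Random()
    return " ".join(
        random_word(min_length, max_length, rng) for _ in range(word_count)
    )
```

Passing a seeded random.Random makes the generated text reproducible, which is convenient for benchmarks.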
generate_files
generate_files(
directory: Path,
file_count: int,
words_per_file: int,
*,
min_word_length: int = 2,
max_word_length: int = 10
) -> list[Path]
Generate text files with random text.
Parameters:
- directory (Path) – Directory to write the files to.
- file_count (int) – Number of files to generate.
- words_per_file (int) – Number of words per file.
- min_word_length (int, default: 2) – Minimum character length of randomly generated words.
- max_word_length (int, default: 10) – Maximum character length of randomly generated words.
Returns:
- list[Path] – Paths of the generated text files.
Raises:
- ValueError – If directory is not empty.
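A minimal sketch of the documented generate_files contract: write file_count files of words_per_file random words each, return their paths, and raise ValueError when directory is not empty. It is not the taps implementation; the name write_random_files, the "<index>.txt" file naming, creating a missing directory, and the seed parameter are all assumptions.

```python
from __future__ import annotations

import random
import string
import tempfile
from pathlib import Path


def write_random_files(
    directory: Path,
    file_count: int,
    words_per_file: int,
    *,
    min_word_length: int = 2,
    max_word_length: int = 10,
    seed: int | None = None,  # hypothetical: not in the documented signature
) -> list[Path]:
    """Hypothetical stand-in for generate_files."""
    # Assumption: create the directory if it does not exist yet.
    directory.mkdir(parents=True, exist_ok=True)
    # Documented contract: refuse to write into a non-empty directory.
    if any(directory.iterdir()):
        raise ValueError(f"{directory} is not empty")

    rng = random.Random(seed)
    paths: list[Path] = []
    for index in range(file_count):
        words = (
            "".join(
                rng.choices(
                    string.ascii_lowercase,
                    k=rng.randint(min_word_length, max_word_length),
                )
            )
            for _ in range(words_per_file)
        )
        path = directory / f"{index}.txt"  # assumed naming scheme
        path.write_text(" ".join(words))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Usage example in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_random_files(Path(tmp), file_count=3, words_per_file=5, seed=0):
            print(path.name, path.read_text())
```

Checking for an empty directory up front keeps a second run from silently mixing newly generated files with old ones.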