Parallel Classification Module
- parallel_classification.process_csv_in_chunks(csv_file_path, chunk_size=10000, output_file_path='classified_periods.csv')[source]
Processes a CSV file in chunks, classifying periods for each chunk.
Reads the specified CSV file in chunks of the given size, applies classify_periods to each chunk, further refines each detected period with classify_period, and compiles the results into a single DataFrame, which is then saved to a new CSV file. A sketch of this flow appears after the example below.
- Parameters:
csv_file_path (str) – The path to the CSV file to be processed.
chunk_size (int, optional) – The number of rows per chunk to read from the CSV. Defaults to 10000.
output_file_path (str, optional) – Path where the fully processed and classified CSV file will be saved. Defaults to 'classified_periods.csv'.
- Returns:
This function does not return a value. It saves the processed and classified data directly to a CSV file specified by output_file_path.
- Return type:
None
Example
Below is an example of how to use the process_csv_in_chunks function:
>>> csv_file_path = 'path/to/your/large_csv_file.csv'
>>> output_file_path = 'path/to/save/classified_periods.csv'
>>> process_csv_in_chunks(csv_file_path, chunk_size=10000, output_file_path=output_file_path)
Processed and classified data saved to path/to/save/classified_periods.csv
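For orientation, the following is a minimal sketch of how the chunked flow described above could be structured. It assumes classify_periods accepts a DataFrame chunk and returns a DataFrame of periods, that classify_period accepts a single row, and that the result is stored in a hypothetical 'classification' column; none of these signatures are documented here, so treat them as placeholders rather than the module's actual API.

```python
import pandas as pd

# Assumed imports: both helpers live in the same module per the description above,
# but their exact signatures are assumptions made for this sketch.
from parallel_classification import classify_periods, classify_period


def process_csv_in_chunks_sketch(csv_file_path, chunk_size=10000,
                                 output_file_path='classified_periods.csv'):
    """Read a large CSV in chunks, classify each chunk, and save one combined CSV."""
    classified_chunks = []

    # pandas returns an iterator of DataFrames when chunksize is given,
    # so memory use stays bounded by chunk_size regardless of file size.
    for chunk in pd.read_csv(csv_file_path, chunksize=chunk_size):
        # First pass: identify periods within the chunk (assumed return: DataFrame).
        periods = classify_periods(chunk)
        # Second pass: refine each period row-by-row (assumed per-row signature).
        periods['classification'] = periods.apply(classify_period, axis=1)
        classified_chunks.append(periods)

    # Combine all processed chunks and write a single output CSV.
    result = pd.concat(classified_chunks, ignore_index=True)
    result.to_csv(output_file_path, index=False)
    print(f"Processed and classified data saved to {output_file_path}")
```

The key design point is that only one chunk is ever held for classification at a time; only the already-classified results accumulate before the final concatenation and write.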