Batch Processor Module ====================== .. module:: QhX.batch_processor The ``batch_processor`` module is part of the QhX package, designed for processing large datasets in parallel batches. It uses ``DataManager`` for data loading and preprocessing, and the ``ParallelSolver`` for executing data processing tasks in parallel. Overview -------- The module allows for the processing of data in specified batch sizes using a predetermined number of parallel workers. It aims to enhance processing efficiency when dealing with large datasets. Functions --------- .. autofunction:: QhX.batch_processor.process_batches This function is responsible for orchestrating the batch processing workflow. It involves loading and optionally grouping the dataset, splitting it into batches, and processing each batch in parallel. Usage ----- The module can be used as a standalone script or imported into other scripts or modules within the QhX package. When executed as a script, it requires the batch size as a mandatory argument, with optional arguments for the number of workers and the starting index for processing. .. code-block:: bash python -m QhX.batch_processor 100 25 0 This command processes the dataset in batches of 100 using 25 parallel workers, starting from the first record. Installation and Requirements ----------------------------- Ensure that the QhX package is properly installed and configured in your environment. The ``batch_processor`` module depends on other components of the QhX package, such as ``DataManager`` and ``ParallelSolver``. See Also -------- - :doc:`data_manager` - :doc:`parallelization_solver`