Running Experiments in Parallel with MPI
Parallelizing experiments can significantly reduce the execution time, especially for computationally expensive functions. With sweepexp, you can easily parallelize your experiments using MPI (Message Passing Interface). To achieve this, you’ll need the mpi4py module and an MPI implementation such as OpenMPI or MPICH.
Using MPI
In mpi mode, experiments are distributed across multiple processes. The main process (rank 0) distributes tasks and collects results, while the worker processes (all other ranks) execute the experiments. The following example shows how an experiment can be executed in parallel with MPI:
import time
from sweepexp import sweepexp
def my_slow_function(param: float) -> dict:
    time.sleep(2)
    return {"result": param ** 2}
sweep = sweepexp(
    func=my_slow_function,
    parameters={"param": [1, 2, 3]},
    mode="mpi",
)
# We want to measure the total duration of the experiments
start_time = time.time()
# Run the experiments in parallel with MPI
sweep.run()
# Calculate the total duration
total_duration = time.time() - start_time
print(f"Total duration: {total_duration:.2f} seconds")
# Print the results
has_results = "result" in sweep.data
if has_results:
    print(sweep.data.result.values)
else:
    print("No results found.")
To execute the script in parallel, use the mpiexec (or mpirun, srun, etc.) command followed by the number of processes and the script name. For example, to use 4 processes:
mpiexec -l -n 4 python mpi_example.py
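When you run the script, you’ll see output similar to this. The bracketed prefix on each line is the rank of the process that printed it, and the order of the lines can vary between runs: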
[0] Total duration: 2.11 seconds
[0] [1. 4. 9.]
[2] Total duration: 2.18 seconds
[2] No results found.
[3] Total duration: 2.21 seconds
[3] No results found.
[1] Total duration: 2.21 seconds
[1] No results found.
Explanation of the Output
Main Process Results: The main process ([0]) displays the total execution time and the results of all experiments. Only the main process has access to the aggregated results.
Worker Processes: Worker processes ([1], [2], [3]) handle the execution of experiments and return results to the main process. Since the results are only collected and stored by the main process, the worker processes do not have access to the results and print “No results found.”
Execution Time: The total duration (approximately 2 seconds) matches the time needed for a single call to the slow function. With four processes there are three workers, so all three experiments run at the same time; run one after another, they would take about 6 seconds.
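The timing argument above can be checked with a little arithmetic. The sketch below is only a rough estimate under stated assumptions: every experiment takes the same time, rank 0 only coordinates (so `n_processes - 1` workers run experiments), and startup and communication overhead are ignored. The helper `estimate_wall_time` is a hypothetical name for illustration and is not part of sweepexp.

```python
import math


def estimate_wall_time(
    n_experiments: int, n_processes: int, seconds_per_experiment: float
) -> float:
    """Rough wall-clock estimate for equally long experiments.

    Assumption: rank 0 only distributes work, so n_processes - 1 workers
    execute experiments. Overhead is ignored.
    """
    n_workers = n_processes - 1
    if n_workers < 1:
        raise ValueError("this estimate needs at least one worker process")
    # Workers proceed in "rounds"; the last round may be only partly filled.
    rounds = math.ceil(n_experiments / n_workers)
    return rounds * seconds_per_experiment


print(estimate_wall_time(3, 4, 2.0))   # 2.0 (the example above: 3 workers, 1 round)
print(estimate_wall_time(3, 2, 2.0))   # 6.0 (a single worker runs all three in turn)
print(estimate_wall_time(10, 4, 2.0))  # 8.0 (4 rounds on 3 workers)
```

The measured durations in the example output are slightly above 2 seconds, which is consistent with this estimate plus some overhead.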
Dynamic Task Assignment
If there are more experiments than available worker processes, tasks are assigned dynamically: as soon as a worker process finishes a task, it is given the next available experiment. Hence, the number of processes does not need to match the number of experiments.
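To illustrate the idea of dynamic assignment, here is a small pure-Python simulation (no MPI required). It is a sketch of the general "next free worker takes the next task" pattern, not sweepexp's actual implementation; the function `simulate_dynamic_assignment` and the task durations are hypothetical and chosen only for illustration.

```python
import heapq


def simulate_dynamic_assignment(durations, n_workers):
    """Simulate handing each task to the worker that becomes free first.

    Returns (schedule, makespan), where schedule maps a worker id to the
    list of task indices it ran, and makespan is the total elapsed time.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # Min-heap of (time_when_free, worker_id); all workers are idle at t = 0.
    free_at = [(0.0, worker) for worker in range(n_workers)]
    heapq.heapify(free_at)
    schedule = {worker: [] for worker in range(n_workers)}
    for task, duration in enumerate(durations):
        # The earliest-free worker picks up the next pending task.
        t_free, worker = heapq.heappop(free_at)
        schedule[worker].append(task)
        heapq.heappush(free_at, (t_free + duration, worker))
    makespan = max(t for t, _ in free_at)
    return schedule, makespan


# Five tasks of unequal length on two workers.
schedule, makespan = simulate_dynamic_assignment([3, 1, 1, 1, 2], n_workers=2)
print(schedule)  # {0: [0, 4], 1: [1, 2, 3]}
print(makespan)  # 5.0
```

In this run, worker 1 completes three short tasks while worker 0 is busy with the long one, so neither worker sits idle for long even though the tasks are unevenly sized.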