(howto-mpi)= # Run simulations in parallel with MPI **MPI parallelism** distributes the mesh and linear system across multiple processes to accelerate large simulations. Every {py:class}`~EasyFEA.Simulations._Simu` in the {py:mod}`EasyFEA.Simulations` namespace (except {py:class}`~EasyFEA.Simulations.DIC`) supports it — no changes to your script are required. It is built on [`mpi4py`](https://mpi4py.readthedocs.io) and [`petsc4py`](https://petsc4py.readthedocs.io): when running with more than one rank, EasyFEA partitions the mesh at the element level with [gmsh](https://gmsh.info/), assembles the distributed linear system, and solves it with [PETSc](https://petsc.org). ```{note} `petsc4py` is **required** for parallel execution. EasyFEA raises an assertion error at simulation construction time if it is not available. ``` --- ## Install petsc4py The simplest method uses [conda-forge](https://conda-forge.org), which distributes pre-built binaries for Linux, macOS, and Windows: ```bash conda install -c conda-forge petsc petsc4py mpi4py ``` This installs a complete, self-consistent stack — an MPI implementation (OpenMPI or MPICH depending on the platform), PETSc, and Python bindings for both — with no compiler or system library required. Verify the installation: ```bash python -c "from petsc4py import PETSc; print(PETSc.Sys.getVersion())" python -c "from mpi4py import MPI; print(MPI.Get_library_version())" ``` For HPC clusters where MPI is provided by the system (e.g. via `module load`), PETSc and `petsc4py` must be compiled against that MPI to ensure binary compatibility. Refer to the [PETSc installation guide](https://petsc.org/release/install/) and the [petsc4py documentation](https://petsc4py.readthedocs.io/en/stable/install.html) for instructions. --- ## Run a script in parallel Any EasyFEA script runs in parallel without modification. Use `mpirun` (or `mpiexec`) with the desired number of processes: ```bash mpirun -n 4 python my_simulation.py ``` ```{tip} Start with a small number of ranks (2–4) to verify correctness before scaling up. Partitioning overhead dominates on coarse meshes. ``` ```{note} Each rank prints timing and progress output independently. To avoid cluttered terminal output, set `verbosity=False` at simulation construction time (the default) and use `simu.Results_Set_Iteration_Summary(...)` to control progress reporting. ``` --- ## How parallelism works in EasyFEA EasyFEA uses **element-level domain decomposition**: each rank owns a disjoint subset of elements, plus a layer of **ghost elements** at partition boundaries. Ghost elements are copies of elements owned by a neighbouring rank; they are needed so that each rank can assemble the full stiffness contribution of every shared node without inter-rank communication during assembly. The node coordinate array is **not** distributed — all ranks hold the full node array. The parallel execution proceeds as follows for each solve: 1. Each rank assembles only its owned (and ghost) elements into their global sparse matrices (K, C, M) and load vector (F). 2. PETSc solves the distributed system $\Arm \, \xrm = \brm$ using a Krylov method (default: CG with GAMG preconditioner). 3. An `Allreduce` over the disjoint owned DOF sets of each rank reconstructs the full solution vector on **every** rank. Consequently, all ranks hold the same, complete DOF result after each solve. The partition data for each element group ({py:class}`~EasyFEA.FEM._GroupElem`) can be inspected via {py:meth}`~EasyFEA.FEM._GroupElem._Get_partitioned_data`. ### Iteration saving {py:meth}`~EasyFEA.Simulations._Simu.Save_Iter` stores only the primary unknowns per iteration (e.g. displacement for elastic simulations, displacement and damage for phase-field). Derived quantities such as stress and strain are not stored — they are recomputed on demand by `Result()`. Because all ranks hold the same DOF vector after the solve, only rank 0 writes the result file. Iteration results are kept in **memory** by default. To write them to **disk** and prevent memory issues, provide a folder at construction time or set it afterward: ```python # at construction time simu = Simulations.Elastic(mesh, mat, folder="path/to/folder") # or after construction simu.folder = "path/to/folder" ``` Iteration results will then be saved in `path/to/folder/Results` as `results*.pickle` files. ### Saving simulation {py:meth}`~EasyFEA.Simulations._Simu.Save` writes one pickle file per rank (`path/to/folder/simulation_rank{N}.pickle`) since each rank holds a different mesh partition. {py:func}`~EasyFEA.Simulations.Load_Simu` reloads the appropriate file per rank transparently. ### Phase-field convergence For phase-field simulations, energy-based convergence criteria (`convOption=1` or `convOption=2`) reduce partial energy contributions across all ranks via `Allreduce` before evaluating the stopping criterion. The convergence flag is therefore consistent across ranks and produces results identical to a serial run. --- ## Tune the linear solver The default solver (`cg` + `gamg`) is a good general-purpose choice for both serial and parallel execution. For large meshes or specific problem types, it can be tuned via the advanced method {py:meth}`~EasyFEA.Simulations._Simu._Solver_Set_PETSc4Py_Options`. ```{note} {py:meth}`~EasyFEA.Simulations._Simu._Solver_Set_PETSc4Py_Options` is an advanced API (single-underscore prefix). Use it when the default solver is too slow or fails to converge. ``` --- ## Post-process results ### Plotting and in-memory post-processing Each rank holds only its local mesh partition. To post-process or visualize results on the complete global mesh, call {py:meth}`~EasyFEA.Simulations._Simu._Gather` after the solve loop: ```python simu._Gather() # assembles the full mesh on rank 0; no-op on other ranks if MPI_RANK == 0: Display.Plot_Result(simu, "uy") ``` Explicit calls to {py:meth}`~EasyFEA.Simulations._Simu._Gather` are only necessary for in-memory post-processing within the script, and to avoid spawning multiple figures or deadlocking non-root ranks. ### Export to ParaView {py:func}`~EasyFEA.Utilities.Paraview.Save_simu` supports fully parallel export. Each rank writes its own `.vtu` piece; rank 0 additionally writes the `.pvtu` parallel descriptor and the `.pvd` timeline. ParaView reads the `.pvd` and assembles all pieces automatically. ```python from EasyFEA import Paraview # Call directly after the solve loop — do NOT call _Gather() first. Paraview.Save_simu(simu, folder, N=200) ``` ```{warning} Do **not** call {py:meth}`~EasyFEA.Simulations._Simu._Gather` before {py:func}`~EasyFEA.Utilities.Paraview.Save_simu` in MPI mode. After `_Gather`, rank 0 holds the full global mesh while other ranks still hold their partitions. {py:func}`~EasyFEA.Utilities.Paraview.Save_simu` would then write the full mesh once (rank 0) and each partition again (other ranks), producing duplicate elements and corrupted fields in ParaView. {py:func}`~EasyFEA.Utilities.Paraview.Save_simu` raises an exception if `simu.isGathered` is `True` to prevent this. ``` --- ## Known limitations - **Serial-only solvers.** The `scipy` and `pypardiso` solvers are not available in MPI mode. `petsc4py` is the only supported solver when `MPI_SIZE > 1`. - **Beam simulations.** {py:class}`~EasyFEA.Simulations.Beam` simulations are not yet supported in MPI mode. The {py:class}`~EasyFEA.Models.Beam.Isotropic` material requires a 2D cross-section mesh to compute section properties (area, second moments of area $I_y$, $I_z$) by integrating over all section elements. Partitioning the section across ranks would yield incorrect properties. - **DIC analyses.** {py:class}`~EasyFEA.Simulations.DIC` analyses are not yet supported in MPI mode. - **Topology optimisation.** The mesh-independence sensitivity filter (Sigmund, 1998) builds a full $N_e \times N_e$ element-distance matrix from all element centroids at once. With domain decomposition, each rank holds only a subset of centroids, making the global filter construction impossible without an explicit gather step. - **Interactive visualization.** PyVista interactive windows and `plt.show()` calls could be guarded by `if MPI_RANK == 0:` after calling {py:meth}`~EasyFEA.Simulations._Simu._Gather`, to avoid spawning multiple figures or deadlocking non-root ranks.