Run simulations in parallel with MPI#

MPI parallelism distributes the mesh and linear system across multiple processes to accelerate large simulations. Every _Simu in the EasyFEA.Simulations namespace (except DIC) supports it — no changes to your script are required. It is built on mpi4py and petsc4py: when running with more than one rank, EasyFEA partitions the mesh at the element level with gmsh, assembles the distributed linear system, and solves it with PETSc.

Note

petsc4py is required for parallel execution. EasyFEA raises an assertion error at simulation construction time if it is not available.


Install petsc4py#

The simplest method uses conda-forge, which distributes pre-built binaries for Linux, macOS, and Windows:

conda install -c conda-forge petsc petsc4py mpi4py

This installs a complete, self-consistent stack — an MPI implementation (OpenMPI or MPICH depending on the platform), PETSc, and Python bindings for both — with no compiler or system library required.

Verify the installation:

python -c "from petsc4py import PETSc; print(PETSc.Sys.getVersion())"
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"

For HPC clusters where MPI is provided by the system (e.g. via module load), PETSc and petsc4py must be compiled against that MPI to ensure binary compatibility. Refer to the PETSc installation guide and the petsc4py documentation for instructions.


Run a script in parallel#

Any EasyFEA script runs in parallel without modification. Use mpirun (or mpiexec) with the desired number of processes:

mpirun -n 4 python my_simulation.py

Tip

Start with a small number of ranks (2–4) to verify correctness before scaling up. Partitioning overhead dominates on coarse meshes.

Note

Each rank prints timing and progress output independently. To avoid cluttered terminal output, set verbosity=False at simulation construction time (the default) and use simu.Results_Set_Iteration_Summary(...) to control progress reporting.


How parallelism works in EasyFEA#

EasyFEA uses element-level domain decomposition: each rank owns a disjoint subset of elements, plus a layer of ghost elements at partition boundaries. Ghost elements are copies of elements owned by a neighbouring rank; they are needed so that each rank can assemble the full stiffness contribution of every shared node without inter-rank communication during assembly. The node coordinate array is not distributed — all ranks hold the full node array.

The parallel execution proceeds as follows for each solve:

  1. Each rank assembles only its owned (and ghost) elements into their global sparse matrices (K, C, M) and load vector (F).

  2. PETSc solves the distributed system \(\Arm \, \xrm = \brm\) using a Krylov method (default: CG with GAMG preconditioner).

  3. An Allreduce over the disjoint owned DOF sets of each rank reconstructs the full solution vector on every rank. Consequently, all ranks hold the same, complete DOF result after each solve.

The partition data for each element group (_GroupElem) can be inspected via _Get_partitioned_data().

Iteration saving#

Save_Iter() stores only the primary unknowns per iteration (e.g. displacement for elastic simulations, displacement and damage for phase-field). Derived quantities such as stress and strain are not stored — they are recomputed on demand by Result(). Because all ranks hold the same DOF vector after the solve, only rank 0 writes the result file.

Iteration results are kept in memory by default. To write them to disk and prevent memory issues, provide a folder at construction time or set it afterward:

# at construction time
simu = Simulations.Elastic(mesh, mat, folder="path/to/folder")

# or after construction
simu.folder = "path/to/folder"

Iteration results will then be saved in path/to/folder/Results as results*.pickle files.

Saving simulation#

Save() writes one pickle file per rank (path/to/folder/simulation_rank{N}.pickle) since each rank holds a different mesh partition. Load_Simu() reloads the appropriate file per rank transparently.

Phase-field convergence#

For phase-field simulations, energy-based convergence criteria (convOption=1 or convOption=2) reduce partial energy contributions across all ranks via Allreduce before evaluating the stopping criterion. The convergence flag is therefore consistent across ranks and produces results identical to a serial run.


Tune the linear solver#

The default solver (cg + gamg) is a good general-purpose choice for both serial and parallel execution. For large meshes or specific problem types, it can be tuned via the advanced method _Solver_Set_PETSc4Py_Options().

Note

_Solver_Set_PETSc4Py_Options() is an advanced API (single-underscore prefix). Use it when the default solver is too slow or fails to converge.


Post-process results#

Plotting and in-memory post-processing#

Each rank holds only its local mesh partition. To post-process or visualize results on the complete global mesh, call _Gather() after the solve loop:

simu._Gather()  # assembles the full mesh on rank 0; no-op on other ranks

if MPI_RANK == 0:
    Display.Plot_Result(simu, "uy")

Explicit calls to _Gather() are only necessary for in-memory post-processing within the script, and to avoid spawning multiple figures or deadlocking non-root ranks.

Export to ParaView#

Save_simu() supports fully parallel export. Each rank writes its own .vtu piece; rank 0 additionally writes the .pvtu parallel descriptor and the .pvd timeline. ParaView reads the .pvd and assembles all pieces automatically.

from EasyFEA import Paraview

# Call directly after the solve loop — do NOT call _Gather() first.
Paraview.Save_simu(simu, folder, N=200)

Warning

Do not call _Gather() before Save_simu() in MPI mode. After _Gather, rank 0 holds the full global mesh while other ranks still hold their partitions. Save_simu() would then write the full mesh once (rank 0) and each partition again (other ranks), producing duplicate elements and corrupted fields in ParaView. Save_simu() raises an exception if simu.isGathered is True to prevent this.


Known limitations#

  • Serial-only solvers. The scipy and pypardiso solvers are not available in MPI mode. petsc4py is the only supported solver when MPI_SIZE > 1.

  • Beam simulations. Beam simulations are not yet supported in MPI mode. The Isotropic material requires a 2D cross-section mesh to compute section properties (area, second moments of area \(I_y\), \(I_z\)) by integrating over all section elements. Partitioning the section across ranks would yield incorrect properties.

  • DIC analyses. DIC analyses are not yet supported in MPI mode.

  • Topology optimisation. The mesh-independence sensitivity filter (Sigmund, 1998) builds a full \(N_e \times N_e\) element-distance matrix from all element centroids at once. With domain decomposition, each rank holds only a subset of centroids, making the global filter construction impossible without an explicit gather step.

  • Interactive visualization. PyVista interactive windows and plt.show() calls could be guarded by if MPI_RANK == 0: after calling _Gather(), to avoid spawning multiple figures or deadlocking non-root ranks.