Run simulations in parallel with MPI#
MPI parallelism distributes the mesh and linear system across multiple processes to accelerate large simulations.
Every _Simu in the EasyFEA.Simulations
namespace (except DIC) supports it — no changes to your script are required.
It is built on mpi4py and petsc4py: when running with more than one rank, EasyFEA partitions the mesh at the element level with gmsh, assembles the distributed linear system, and solves it with PETSc.
Note
petsc4py is required for parallel execution. EasyFEA raises an assertion
error at simulation construction time if it is not available.
Install petsc4py#
The simplest method uses conda-forge, which distributes pre-built binaries for Linux, macOS, and Windows:
conda install -c conda-forge petsc petsc4py mpi4py
This installs a complete, self-consistent stack — an MPI implementation (OpenMPI or MPICH depending on the platform), PETSc, and Python bindings for both — with no compiler or system library required.
Verify the installation:
python -c "from petsc4py import PETSc; print(PETSc.Sys.getVersion())"
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
For HPC clusters where MPI is provided by the system (e.g. via module load),
PETSc and petsc4py must be compiled against that MPI to ensure binary
compatibility. Refer to the
PETSc installation guide and the
petsc4py documentation
for instructions.
Run a script in parallel#
Any EasyFEA script runs in parallel without modification. Use mpirun (or
mpiexec) with the desired number of processes:
mpirun -n 4 python my_simulation.py
Tip
Start with a small number of ranks (2–4) to verify correctness before scaling up. Partitioning overhead dominates on coarse meshes.
Note
Each rank prints timing and progress output independently. To avoid cluttered
terminal output, set verbosity=False at simulation construction time (the
default) and use simu.Results_Set_Iteration_Summary(...) to control progress
reporting.
How parallelism works in EasyFEA#
EasyFEA uses element-level domain decomposition: each rank owns a disjoint subset of elements, plus a layer of ghost elements at partition boundaries. Ghost elements are copies of elements owned by a neighbouring rank; they are needed so that each rank can assemble the full stiffness contribution of every shared node without inter-rank communication during assembly. The node coordinate array is not distributed — all ranks hold the full node array.
The parallel execution proceeds as follows for each solve:
Each rank assembles only its owned (and ghost) elements into their global sparse matrices (K, C, M) and load vector (F).
PETSc solves the distributed system \(\Arm \, \xrm = \brm\) using a Krylov method (default: CG with GAMG preconditioner).
An
Allreduceover the disjoint owned DOF sets of each rank reconstructs the full solution vector on every rank. Consequently, all ranks hold the same, complete DOF result after each solve.
The partition data for each element group (_GroupElem) can be inspected via
_Get_partitioned_data().
Iteration saving#
Save_Iter() stores only the primary
unknowns per iteration (e.g. displacement for elastic simulations, displacement
and damage for phase-field). Derived quantities such as stress and strain are
not stored — they are recomputed on demand by Result(). Because all ranks
hold the same DOF vector after the solve, only rank 0 writes the result file.
Iteration results are kept in memory by default. To write them to disk and prevent memory issues, provide a folder at construction time or set it afterward:
# at construction time
simu = Simulations.Elastic(mesh, mat, folder="path/to/folder")
# or after construction
simu.folder = "path/to/folder"
Iteration results will then be saved in path/to/folder/Results as results*.pickle files.
Saving simulation#
Save() writes one pickle file per rank (path/to/folder/simulation_rank{N}.pickle) since each rank holds a different mesh partition.
Load_Simu() reloads the appropriate file per rank transparently.
Phase-field convergence#
For phase-field simulations, energy-based convergence criteria
(convOption=1 or convOption=2) reduce partial energy contributions across
all ranks via Allreduce before evaluating the stopping criterion. The
convergence flag is therefore consistent across ranks and produces results
identical to a serial run.
Tune the linear solver#
The default solver (cg + gamg) is a good general-purpose choice for both
serial and parallel execution. For large meshes or specific problem types, it
can be tuned via the advanced method
_Solver_Set_PETSc4Py_Options().
Note
_Solver_Set_PETSc4Py_Options() is an
advanced API (single-underscore prefix). Use it when the default solver is too
slow or fails to converge.
Post-process results#
Plotting and in-memory post-processing#
Each rank holds only its local mesh partition.
To post-process or visualize results on the complete global mesh, call _Gather() after the solve loop:
simu._Gather() # assembles the full mesh on rank 0; no-op on other ranks
if MPI_RANK == 0:
Display.Plot_Result(simu, "uy")
Explicit calls to _Gather() are only necessary for in-memory post-processing within the script, and to avoid spawning multiple figures or deadlocking non-root ranks.
Export to ParaView#
Save_simu() supports fully parallel
export. Each rank writes its own .vtu piece; rank 0 additionally writes the
.pvtu parallel descriptor and the .pvd timeline. ParaView reads the .pvd
and assembles all pieces automatically.
from EasyFEA import Paraview
# Call directly after the solve loop — do NOT call _Gather() first.
Paraview.Save_simu(simu, folder, N=200)
Warning
Do not call _Gather() before Save_simu() in MPI mode.
After _Gather, rank 0 holds the full global mesh while other ranks still hold their partitions.
Save_simu() would then write the full mesh once (rank 0) and each partition again (other ranks), producing duplicate elements and corrupted fields in ParaView.
Save_simu() raises an exception if simu.isGathered is True to prevent this.
Known limitations#
Serial-only solvers. The
scipyandpypardisosolvers are not available in MPI mode.petsc4pyis the only supported solver whenMPI_SIZE > 1.Beam simulations.
Beamsimulations are not yet supported in MPI mode. TheIsotropicmaterial requires a 2D cross-section mesh to compute section properties (area, second moments of area \(I_y\), \(I_z\)) by integrating over all section elements. Partitioning the section across ranks would yield incorrect properties.DIC analyses.
DICanalyses are not yet supported in MPI mode.Topology optimisation. The mesh-independence sensitivity filter (Sigmund, 1998) builds a full \(N_e \times N_e\) element-distance matrix from all element centroids at once. With domain decomposition, each rank holds only a subset of centroids, making the global filter construction impossible without an explicit gather step.
Interactive visualization. PyVista interactive windows and
plt.show()calls could be guarded byif MPI_RANK == 0:after calling_Gather(), to avoid spawning multiple figures or deadlocking non-root ranks.