Run simulations in parallel with MPI#
MPI parallelism distributes the mesh and linear system across multiple
processes to accelerate large simulations. Every
_Simu in the EasyFEA.Simulations
namespace supports it — no changes to your script are required. It is built on
mpi4py and
petsc4py: when running with more than one
rank, EasyFEA partitions the mesh at the element level, assembles the
distributed linear system, and solves it with PETSc.
Note
petsc4py is required for parallel execution. EasyFEA raises an assertion
error at simulation construction time if it is not available.
Install petsc4py#
The simplest method uses conda-forge, which distributes pre-built binaries for Linux, macOS, and Windows:
conda install -c conda-forge petsc petsc4py mpi4py
This installs a complete, self-consistent stack — an MPI implementation (OpenMPI or MPICH depending on the platform), PETSc, and Python bindings for both — with no compiler or system library required.
Verify the installation:
python -c "from petsc4py import PETSc; print(PETSc.Sys.getVersion())"
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
For HPC clusters where MPI is provided by the system (e.g. via module load),
PETSc and petsc4py must be compiled against that MPI to ensure binary
compatibility. Refer to the
PETSc installation guide and the
petsc4py documentation
for instructions.
Run a script in parallel#
Any EasyFEA script runs in parallel without modification. Use mpirun (or
mpiexec) with the desired number of processes:
mpirun -n 4 python my_simulation.py
Tip
Start with a small number of ranks (2–4) to verify correctness before scaling up. Partitioning overhead dominates on coarse meshes.
Note
Each rank prints timing and progress output independently. To avoid cluttered
terminal output, set verbosity=False at simulation construction time (the
default) and use simu.Results_Set_Iteration_Summary(...) to control progress
reporting.
How parallelism works in EasyFEA#
EasyFEA uses element-level domain decomposition: each rank owns a disjoint subset of elements, plus a layer of ghost elements at partition boundaries. Ghost elements are copies of elements owned by a neighbouring rank; they are needed so that each rank can assemble the full stiffness contribution of every shared node without inter-rank communication during assembly. The node coordinate array is not distributed — all ranks hold the full node array.
The parallel execution proceeds as follows for each solve:
Each rank assembles only its owned (and ghost) elements into local matrices (K, C, M) and load vector (F).
PETSc solves the distributed system \(\Arm \, \xrm = \brm\) using a Krylov method (default: CG with GAMG preconditioner).
An
Allreduceover the disjoint owned DOF sets of each rank reconstructs the full solution vector on every rank. Consequently, all ranks hold the same, complete DOF result after each solve.
The partition data for each element group can be inspected via
_Get_partitioned_data().
Iteration saving#
Save_Iter() stores only the primary
unknowns per iteration (e.g. displacement for elastic simulations, displacement
and damage for phase-field). Derived quantities such as stress and strain are
not stored — they are recomputed on demand by Result(). Because all ranks
hold the same DOF vector after the solve, only rank 0 writes the result file.
Iteration results are kept in memory by default. To write them to disk, provide a folder at construction time or set it afterwards:
# at construction time
simu = Simulations.Elastic(mesh, mat, folder="path/to/results")
# or after construction
simu.folder = "path/to/results"
Phase-field convergence#
For phase-field simulations, energy-based convergence criteria
(convOption=1 or convOption=2) reduce partial energy contributions across
all ranks via Allreduce before evaluating the stopping criterion. The
convergence flag is therefore consistent across ranks and produces results
identical to a serial run.
Tune the linear solver#
The default solver (cg + gamg) is a good general-purpose choice for both
serial and parallel execution. For large meshes or specific problem types, it
can be tuned via the advanced method
_Solver_Set_PETSc4Py_Options().
Note
_Solver_Set_PETSc4Py_Options() is an
advanced API (single-underscore prefix). Use it when the default solver is too
slow or fails to converge.
Symmetric positive-definite systems (most FEM problems):
# Direct solve — best for small to medium meshes; MPI-compatible via MUMPS
simu._Solver_Set_PETSc4Py_Options("preonly", "lu", "mumps")
# Iterative — scales to large meshes
simu._Solver_Set_PETSc4Py_Options("cg", "hypre") # BoomerAMG, best scalability
simu._Solver_Set_PETSc4Py_Options("cg", "gamg") # PETSc built-in AMG (default)
Non-symmetric systems:
simu._Solver_Set_PETSc4Py_Options("gmres", "bjacobi") # block Jacobi, MPI-compatible
simu._Solver_Set_PETSc4Py_Options("bcgs", "bjacobi") # BiCGSTAB, cheaper per iteration
The optional problemType argument restricts the configuration to a specific
sub-problem (e.g. elastic or damage in a phase-field simulation):
from EasyFEA.Models import ModelType
simu._Solver_Set_PETSc4Py_Options("preonly", "lu", "mumps", problemType=ModelType.elastic)
simu._Solver_Set_PETSc4Py_Options("gmres", "bjacobi", problemType=ModelType.damage)
Post-process results#
Plotting and in-memory post-processing#
Each rank holds only its local mesh partition. To post-process or visualize
results on the complete global mesh, call
_Gather() after the solve loop:
simu._Gather() # assembles the full mesh on rank 0; no-op on other ranks
if MPI_RANK == 0:
Display.Plot_Result(simu, "uy")
_Gather() is called automatically by
Save(), which writes one pickle file per
rank (simulation_rank{N}.pickle) since each rank holds a different mesh
partition. Load_Simu reloads the appropriate file per rank transparently.
Explicit calls to _Gather are only necessary for in-memory post-processing
within the script.
Export to ParaView#
Save_simu() supports fully parallel
export. Each rank writes its own .vtu piece; rank 0 additionally writes the
.pvtu parallel descriptor and the .pvd timeline. ParaView reads the .pvd
and assembles all pieces automatically.
from EasyFEA import Paraview
# Call directly after the solve loop — do NOT call _Gather() first.
Paraview.Save_simu(simu, folder, N=200)
Warning
Do not call _Gather() before
Paraview.Save_simu in MPI mode. After _Gather, rank 0 holds the full
global mesh while other ranks still hold their partitions. Save_simu would
then write the full mesh once (rank 0) and each partition again (other ranks),
producing duplicate elements and corrupted fields in ParaView. Save_simu
raises an exception if simu.isGathered is True to prevent this.
Known limitations#
Serial-only solvers. The
scipyandpypardisosolvers are not available in MPI mode.petsc4pyis the only supported solver whenMPI_SIZE > 1.Beam simulations.
Beamsimulations are not yet supported in MPI mode. TheIsotropicmaterial requires a 2D cross-section mesh to compute section properties (area, second moments of area \(I_y\), \(I_z\)) by integrating over all section elements. Partitioning the section across ranks would yield incorrect properties.DIC analyses.
DICanalyses are not yet supported in MPI mode.Topology optimisation. The mesh-independence sensitivity filter (Sigmund, 1998) builds a full \(N_e \times N_e\) element-distance matrix from all element centroids at once. With domain decomposition, each rank holds only a subset of centroids, making the global filter construction impossible without an explicit gather step.
Interactive visualization. PyVista interactive windows and
plt.show()calls must be guarded byif MPI_RANK == 0:after calling_Gather(), to avoid spawning multiple figures or deadlocking non-root ranks.