(howto-mpi)=
# Run simulations in parallel with MPI

**MPI parallelism** distributes the mesh and linear system across multiple processes to accelerate large simulations.
Every {py:class}`~EasyFEA.Simulations._Simu` in the {py:mod}`EasyFEA.Simulations`
namespace (except {py:class}`~EasyFEA.Simulations.DIC`) supports it — no changes to your script are required.
It is built on [`mpi4py`](https://mpi4py.readthedocs.io) and [`petsc4py`](https://petsc4py.readthedocs.io): when running with more than one rank, EasyFEA partitions the mesh at the element level with [gmsh](https://gmsh.info/), assembles the distributed linear system, and solves it with [PETSc](https://petsc.org).

```{note}
`petsc4py` is **required** for parallel execution. EasyFEA raises an assertion
error at simulation construction time if it is not available.
```

---

## Install petsc4py

The simplest method uses [conda-forge](https://conda-forge.org), which
distributes pre-built binaries for Linux, macOS, and Windows:

```bash
conda install -c conda-forge petsc petsc4py mpi4py
```

This installs a complete, self-consistent stack — an MPI implementation
(OpenMPI or MPICH depending on the platform), PETSc, and Python bindings for
both — with no compiler or system library required.

Verify the installation:

```bash
python -c "from petsc4py import PETSc; print(PETSc.Sys.getVersion())"
python -c "from mpi4py import MPI; print(MPI.Get_library_version())"
```

For HPC clusters where MPI is provided by the system (e.g. via `module load`),
PETSc and `petsc4py` must be compiled against that MPI to ensure binary
compatibility. Refer to the
[PETSc installation guide](https://petsc.org/release/install/) and the
[petsc4py documentation](https://petsc4py.readthedocs.io/en/stable/install.html)
for instructions.

---

## Run a script in parallel

Any EasyFEA script runs in parallel without modification. Use `mpirun` (or
`mpiexec`) with the desired number of processes:

```bash
mpirun -n 4 python my_simulation.py
```

```{tip}
Start with a small number of ranks (2–4) to verify correctness before scaling
up. Partitioning overhead dominates on coarse meshes.
```

```{note}
Each rank prints timing and progress output independently. To avoid cluttered
terminal output, set `verbosity=False` at simulation construction time (the
default) and use `simu.Results_Set_Iteration_Summary(...)` to control progress
reporting.
```

---

## How parallelism works in EasyFEA

EasyFEA uses **element-level domain decomposition**: each rank owns a disjoint
subset of elements, plus a layer of **ghost elements** at partition boundaries.
Ghost elements are copies of elements owned by a neighbouring rank; they are
needed so that each rank can assemble the full stiffness contribution of every
shared node without inter-rank communication during assembly. The node
coordinate array is **not** distributed — all ranks hold the full node array.

The parallel execution proceeds as follows for each solve:

1. Each rank assembles only its owned (and ghost) elements into their global sparse
   matrices (K, C, M) and load vector (F).
2. PETSc solves the distributed system $\Arm \, \xrm = \brm$ using a Krylov
   method (default: CG with GAMG preconditioner).
3. An `Allreduce` over the disjoint owned DOF sets of each rank reconstructs
   the full solution vector on **every** rank. Consequently, all ranks hold the
   same, complete DOF result after each solve.

The partition data for each element group ({py:class}`~EasyFEA.FEM._GroupElem`) can be inspected via
{py:meth}`~EasyFEA.FEM._GroupElem._Get_partitioned_data`.

### Iteration saving

{py:meth}`~EasyFEA.Simulations._Simu.Save_Iter` stores only the primary
unknowns per iteration (e.g. displacement for elastic simulations, displacement
and damage for phase-field). Derived quantities such as stress and strain are
not stored — they are recomputed on demand by `Result()`. Because all ranks
hold the same DOF vector after the solve, only rank 0 writes the result file.

Iteration results are kept in **memory** by default. To write them to **disk** and prevent memory issues, provide a folder at construction time or set it afterward:

```python
# at construction time
simu = Simulations.Elastic(mesh, mat, folder="path/to/folder")

# or after construction
simu.folder = "path/to/folder"
```

Iteration results will then be saved in `path/to/folder/Results` as `results*.pickle` files.

### Saving simulation

{py:meth}`~EasyFEA.Simulations._Simu.Save` writes one pickle file per rank (`path/to/folder/simulation_rank{N}.pickle`) since each rank holds a different mesh partition.
{py:func}`~EasyFEA.Simulations.Load_Simu` reloads the appropriate file per rank transparently.

### Phase-field convergence

For phase-field simulations, energy-based convergence criteria
(`convOption=1` or `convOption=2`) reduce partial energy contributions across
all ranks via `Allreduce` before evaluating the stopping criterion. The
convergence flag is therefore consistent across ranks and produces results
identical to a serial run.

---

## Tune the linear solver

The default solver (`cg` + `gamg`) is a good general-purpose choice for both
serial and parallel execution. For large meshes or specific problem types, it
can be tuned via the advanced method
{py:meth}`~EasyFEA.Simulations._Simu._Solver_Set_PETSc4Py_Options`.

```{note}
{py:meth}`~EasyFEA.Simulations._Simu._Solver_Set_PETSc4Py_Options` is an
advanced API (single-underscore prefix). Use it when the default solver is too
slow or fails to converge.
```

---

## Post-process results

### Plotting and in-memory post-processing

Each rank holds only its local mesh partition.
To post-process or visualize results on the complete global mesh, call {py:meth}`~EasyFEA.Simulations._Simu._Gather` after the solve loop:

```python
simu._Gather()  # assembles the full mesh on rank 0; no-op on other ranks

if MPI_RANK == 0:
    Display.Plot_Result(simu, "uy")
```

Explicit calls to {py:meth}`~EasyFEA.Simulations._Simu._Gather` are only necessary for in-memory post-processing within the script, and to avoid spawning multiple figures or deadlocking non-root ranks.

### Export to ParaView

{py:func}`~EasyFEA.Utilities.Paraview.Save_simu` supports fully parallel
export. Each rank writes its own `.vtu` piece; rank 0 additionally writes the
`.pvtu` parallel descriptor and the `.pvd` timeline. ParaView reads the `.pvd`
and assembles all pieces automatically.

```python
from EasyFEA import Paraview

# Call directly after the solve loop — do NOT call _Gather() first.
Paraview.Save_simu(simu, folder, N=200)
```

```{warning}
Do **not** call {py:meth}`~EasyFEA.Simulations._Simu._Gather` before {py:func}`~EasyFEA.Utilities.Paraview.Save_simu` in MPI mode.
After `_Gather`, rank 0 holds the full global mesh while other ranks still hold their partitions.
{py:func}`~EasyFEA.Utilities.Paraview.Save_simu` would then write the full mesh once (rank 0) and each partition again (other ranks), producing duplicate elements and corrupted fields in ParaView.
{py:func}`~EasyFEA.Utilities.Paraview.Save_simu` raises an exception if `simu.isGathered` is `True` to prevent this.
```

---

## Known limitations

- **Serial-only solvers.** The `scipy` and `pypardiso` solvers are not
  available in MPI mode. `petsc4py` is the only supported solver when
  `MPI_SIZE > 1`.

- **Beam simulations.** {py:class}`~EasyFEA.Simulations.Beam` simulations are
  not yet supported in MPI mode. The {py:class}`~EasyFEA.Models.Beam.Isotropic`
  material requires a 2D cross-section mesh to compute section properties
  (area, second moments of area $I_y$, $I_z$) by integrating over all section
  elements. Partitioning the section across ranks would yield incorrect
  properties.

- **DIC analyses.** {py:class}`~EasyFEA.Simulations.DIC` analyses are not yet
  supported in MPI mode.

- **Topology optimisation.** The mesh-independence sensitivity filter
  (Sigmund, 1998) builds a full $N_e \times N_e$ element-distance matrix from
  all element centroids at once. With domain decomposition, each rank holds
  only a subset of centroids, making the global filter construction impossible
  without an explicit gather step.

- **Interactive visualization.** PyVista interactive windows and `plt.show()`
  calls could be guarded by `if MPI_RANK == 0:` after calling {py:meth}`~EasyFEA.Simulations._Simu._Gather`, to avoid spawning multiple figures or deadlocking non-root ranks.