
1. Definitions

1.1. Matrices

Matrix

A matrix is a linear transformation between finite-dimensional vector spaces.

Assembling a matrix

Assembling a matrix means defining its action as entries stored in a sparse or dense format. For example, in the finite element context, the storage format is sparse to take advantage of the many zero entries.

Symmetric matrix

\(A = A^T\)

Positive definite (resp. positive semi-definite) matrix

All eigenvalues are

  1. \(>0\) (resp. \(\geq 0\)), or

  2. \(x^T A x > 0\ \forall x \neq 0\) (resp. \(x^T A x \geq 0\ \forall x\))

Negative definite (resp. negative semi-definite) matrix

All eigenvalues are

  1. \(<0\) (resp. \(\leq 0\)), or

  2. \(x^T A x < 0\ \forall x \neq 0\) (resp. \(x^T A x \leq 0\ \forall x\))

Indefinite matrix

There exist

  1. both positive and negative eigenvalues (e.g. Stokes, Helmholtz), or

  2. \(x, y\) such that \(x^T A x > 0 > y^T A y\)
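
For instance, \(A = \begin{pmatrix} 1 & 0\\ 0 & -1\end{pmatrix}\) is indefinite: its eigenvalues are \(1\) and \(-1\), and \(e_1^T A e_1 = 1 > 0 > -1 = e_2^T A e_2\).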

1.2. Preconditioners

1.2.1. Definition

Let \(A\) be a \(\mathbb{R}^{n\times n}\) matrix and let \(x\) and \(b\) be \(\mathbb{R}^n\) vectors; we wish to solve \(A x = b\).

Definition

A preconditioner \(\mathcal{P}\) is a method for constructing a matrix (just a linear function, not assembled!) \(P^{-1} = \mathcal{P}(A,A_p)\) using a matrix \(A\) and extra information \(A_p\), such that the spectrum of \(P^{-1}A\) (left preconditioning) or \(A P^{-1}\) (right preconditioning) is well-behaved. Applying the preconditioner improves the conditioning of the original linear system.

Left preconditioning: We solve for \( (P^{-1} A) x = P^{-1} b \) and we build the Krylov space \(\{ P^{-1} b, (P^{-1}A) P^{-1} b, (P^{-1}A)^2 P^{-1} b, \dots\}\)

Right preconditioning: We solve for \( (A P^{-1}) P x = b \) and we build the Krylov space \(\{ b, (A P^{-1}) b, (A P^{-1})^2 b, \dotsc \}\)

Note that the product \(P^{-1}A\) or \(A P^{-1}\) is never assembled.
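
The following is a minimal illustrative sketch (plain C++, not the Feel++/PETSc API; all names are illustrative) showing how the left-preconditioned operator is realised as the composition of two actions, so that \(P^{-1}A\) is applied without ever being assembled.

#include <functional>
#include <vector>

// Operators are represented by their action on a vector, never by an
// assembled matrix.
using Vec   = std::vector<double>;
using Apply = std::function<Vec( Vec const& )>;

// Left preconditioning: y = P^{-1}( A x ), realised as two successive
// operator applications; the product P^{-1} A itself is never formed.
Apply leftPreconditioned( Apply applyA, Apply applyPinv )
{
    return [=]( Vec const& x ) { return applyPinv( applyA( x ) ); };
}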

1.2.2. Properties

Let us now describe some properties of preconditioners:

  • \(P^{-1}\) is dense, \(P\) is often not available and is not needed

  • \(A\) is rarely used by \(\mathcal{P}\), but \(A_p = A\) is common

  • \(A_p\) is often a sparse matrix, the preconditioning matrix

Here are some numerical methods to solve the system \(A x = b\):

  • Matrix-based: Jacobi, Gauss-Seidel, SOR, ILU(k), LU

  • Parallel: Block-Jacobi, Schwarz, Multigrid, FETI-DP, BDDC

  • Indefinite: Schur-complement, Domain Decomposition, Multigrid

1.3. Preconditioner strategies

Relaxation

Split into lower, diagonal, upper parts: \(A = L + D + U\).

Jacobi

Cheapest preconditioner: \(P^{-1}=D^{-1}\).

# sequential
pc-type=jacobi
# parallel
pc-type=block_jacobi

Successive over-relaxation (SOR)

\[\left(L + \frac{1}{\omega} D\right) x_{n+1} = \left[\left(\frac{1}{\omega}-1\right)D - U\right] x_n + b \\ P^{-1} = \text{$k$ iterations starting with $x_0=0$}\]
  • Implemented as a sweep.

  • \(\omega = 1\) corresponds to Gauss-Seidel.

  • Very effective at removing high-frequency components of residual.

# sequential
pc-type=sor
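
As a complement to the relaxation formula above, here is a minimal self-contained sketch (plain C++, not the Feel++/PETSc implementation) of one forward SOR sweep; applying \(k\) such sweeps starting from \(x_0 = 0\) realises the preconditioner action, and \(\omega = 1\) recovers Gauss-Seidel.

#include <cstddef>
#include <vector>

// One forward SOR sweep on a dense n x n matrix A (row-major), updating x in
// place; a real implementation would exploit the sparsity of A.
void sorSweep( std::vector<double> const& A, std::vector<double> const& b,
               std::vector<double>& x, std::size_t n, double omega )
{
    for ( std::size_t i = 0; i < n; ++i )
    {
        double sigma = 0.0;
        for ( std::size_t j = 0; j < n; ++j )
            if ( j != i )
                sigma += A[i*n+j] * x[j]; // x[j] is already updated for j < i
        x[i] = ( 1.0 - omega ) * x[i] + omega * ( b[i] - sigma ) / A[i*n+i];
    }
}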

1.3.1. Factorization

Two phases

  • symbolic factorization: find where fill occurs, only uses sparsity pattern.

  • numeric factorization: compute factors.

    LU decomposition
    • Exact factorization; the resulting exact solve can be used directly as a preconditioner.

    • Expensive, for \(m\times m\) sparse matrix with bandwidth \(b\), traditionally requires \(\mathcal{O}(mb^2)\) time and \(\mathcal{O}(mb)\) space.

      • Bandwidth scales as \(m^{\frac{d-1}{d}}\) in \(d\)-dimensions.

        • Optimal in 2D: \(\mathcal{O}(m \cdot \log m)\) space, \(\mathcal{O}(m^{3/2})\) time.

        • Optimal in 3D: \(\mathcal{O}(m^{4/3})\) space, \(\mathcal{O}(m^2)\) time.

    • Symbolic factorization is problematic in parallel.

Incomplete LU
  • Allow a limited number of levels of fill: ILU(\(k\)).

  • Only allow fill for entries that exceed threshold: ILUT.

  • Usually poor scaling in parallel.

  • No guarantees.

1.3.2. 1-level Domain decomposition

Domain size \(L\), subdomain size \(H\), element size \(h\)

  • Overlapping/Schwarz

    • Solve Dirichlet problems on overlapping subdomains.

    • No overlap: \(\textit{its} \in \mathcal{O}\big( \frac{L}{\sqrt{Hh}} \big)\).

    • Overlap \(\delta\): \(\textit{its} \in \mathcal{O}\big( \frac L {\sqrt{H\delta}} \big)\).

pc-type=gasm # has a coarse grid preconditioner
pc-type=asm
  • Neumann-Neumann

    • Solve Neumann problems on non-overlapping subdomains.

    • \(\textit{its} \in \mathcal{O}\big( \frac{L}{H}(1+\log\frac H h) \big)\).

    • Tricky null space issues (floating subdomains).

    • Need subdomain matrices, not globally assembled matrix.

Note: multilevel variants knock off the leading \(\frac L H\) factor; both overlapping and non-overlapping methods achieve this bound.

  • BDDC and FETI-DP

    • Neumann problems on subdomains with coarse grid correction.

    • \(\textit{its} \in \mathcal{O}\big(1 + \log\frac H h \big)\).

1.3.3. Multigrid

Hierarchy: Interpolation and restriction operators \( \Pi^\uparrow : X_{\text{coarse}} \to X_{\text{fine}} \qquad \Pi^\downarrow : X_{\text{fine}} \to X_{\text{coarse}} \)

  • Geometric: define problem on multiple levels, use grid to compute hierarchy.

  • Algebraic: define problem only on finest level, use matrix structure to build hierarchy.

Galerkin approximation

Assemble this matrix: \(A_{\text{coarse}} = \Pi^\downarrow A_{\text{fine}} \Pi^\uparrow\)
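
As an illustration of the Galerkin approximation, here is a minimal sketch using the Eigen backend's sparse matrices (names are illustrative; a multigrid package assembles this product for you):

#include <Eigen/Sparse>

using SpMat = Eigen::SparseMatrix<double>;

// Galerkin coarse operator: A_coarse = Pi_down * A_fine * Pi_up,
// computed as a sparse triple matrix product.
SpMat galerkinCoarse( SpMat const& Pi_down, SpMat const& A_fine, SpMat const& Pi_up )
{
    return SpMat( Pi_down * A_fine * Pi_up );
}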

Application of the multigrid preconditioner (\( V \)-cycle); see the sketch after this list

  • Apply pre-smoother on fine level (any preconditioner).

  • Restrict residual to coarse level with \(\Pi^\downarrow\).

  • Solve on coarse level \(A_{\text{coarse}} x = r\).

  • Interpolate result back to fine level with \(\Pi^\uparrow\).

  • Apply post-smoother on fine level (any preconditioner).
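
A minimal recursive sketch of this \(V\)-cycle (plain C++, not the Feel++/PETSc implementation; all names and types are illustrative) is given below.

#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;

// One level of the hierarchy: every operator is given by its action.
struct Level
{
    std::function<Vec( Vec const& )>        applyA;      // operator on this level
    std::function<void( Vec&, Vec const& )> smooth;      // pre/post-smoother for A x = b
    std::function<Vec( Vec const& )>        restrict_;   // Pi-down (fine -> coarse)
    std::function<Vec( Vec const& )>        interpolate; // Pi-up   (coarse -> fine)
    std::function<Vec( Vec const& )>        coarseSolve; // direct solve, coarsest level only
};

// One V-cycle for the correction equation A e = r; level 0 is the coarsest.
Vec vCycle( std::vector<Level> const& levels, std::size_t l, Vec const& r )
{
    if ( l == 0 )
        return levels[0].coarseSolve( r );            // solve on coarsest level

    Vec e( r.size(), 0.0 );
    levels[l].smooth( e, r );                         // pre-smoothing
    Vec Ae = levels[l].applyA( e );
    Vec res( r.size() );
    for ( std::size_t i = 0; i < r.size(); ++i )      // residual r - A e
        res[i] = r[i] - Ae[i];
    Vec ec = vCycle( levels, l - 1, levels[l].restrict_( res ) );
    Vec ef = levels[l].interpolate( ec );             // interpolate correction
    for ( std::size_t i = 0; i < e.size(); ++i )
        e[i] += ef[i];
    levels[l].smooth( e, r );                         // post-smoothing
    return e;
}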

Multigrid convergence properties
  • Textbook: \(P^{-1}A\) is spectrally equivalent to identity

    • Constant number of iterations to converge up to discretization error.

  • Most theory applies to SPD systems

    • variable coefficients (e.g. discontinuous): low energy interpolants.

    • mesh- and/or physics-induced anisotropy: semi-coarsening/line smoothers.

    • complex geometry: difficult to have meaningful coarse levels.

  • Deeper algorithmic difficulties

    • nonsymmetric (e.g. advection, shallow water, Euler).

    • indefinite (e.g. incompressible flow, Helmholtz).

  • Performance considerations

    • Aggressive coarsening is critical in parallel.

    • Most theory uses SOR smoothers, ILU often more robust.

    • Coarsest level usually solved semi-redundantly with direct solver.

  • Multilevel Schwarz is essentially the same with different language

    • assume strong smoothers, emphasize aggressive coarsening.

2. Principles

Feel++ abstracts the PETSc library and provides a subset (sufficient in most cases) of the PETSc features. It interfaces with the following PETSc components: Mat, Vec, KSP, PC, SNES.

  • Vec Vector handling library

  • Mat Matrix handling library

  • KSP Krylov SubSpace library implements various iterative solvers

  • PC Preconditioner library implements various preconditioning strategies

  • SNES Nonlinear solver library implements various nonlinear solve strategies

All linear algebra is encapsulated within backends, selected using the command line option --backend=<backend> or the config file option backend=<backend>, which provide an interface to several libraries:

Library   Format   Backend
PETSc     sparse   petsc
Eigen     sparse   eigen
Eigen     dense    eigen_dense

The default backend is petsc.
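
For example, the backend can be selected either on the command line or in the .cfg file; the value shown here is the default:

# command line option
--backend=petsc
# equivalent .cfg file option
backend=petsc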

3. Some generic examples

The configuration files (.cfg) allow for a wide range of options to solve a linear or non-linear system.

We now consider the following example [import:"marker1"](../../codes/mylaplacian.cpp)

To execute this example

# sequential
./feelpp_tut_laplacian
# parallel on 4 cores
mpirun -np 4 ./feelpp_tut_laplacian

As described in section

3.1. Direct solver

Cholesky and LU factorisations are available. However, the parallel implementation depends on the availability of MUMPS. The configuration is very simple.

# no iterative solver
ksp-type=preonly
#
pc-type=cholesky

Using the PETSc backend allows choosing between different packages to compute the factorization.

Table 1. Factorization packages

Package    Description                                             Parallel
petsc      PETSc own implementation                                yes
mumps      MUltifrontal Massively Parallel sparse direct Solver    yes
umfpack    Unsymmetric MultiFrontal package                        no
pastix     Parallel Sparse matriX package                          yes

To choose between these factorization packages:

# choose mumps
pc-factor-mat-solver-package=mumps
# choose umfpack (sequential)
pc-factor-mat-solver-package=umfpack

In order to perform a Cholesky-type factorisation, the underlying matrix must be declared SPD.

// matrix
auto A = backend->newMatrix(_test=...,_trial=...,_properties=SPD);
// bilinear form
auto a = form2( _test=..., _trial=..., _properties=SPD );

3.2. Using iterative solvers

3.2.1. Using CG and ICC(3)

with a relative tolerance of 1e-12:

ksp-rtol=1.e-12
ksp-type=cg
pc-type=icc
pc-factor-levels=3

3.2.2. Using GMRES and ILU(3)

with a relative tolerance of 1e-12 and a restart of 300:

ksp-rtol=1.e-12
ksp-type=gmres
ksp-gmres-restart=300
pc-type=ilu
pc-factor-levels=3

3.2.3. Using GMRES and Jacobi

With a relative tolerance of 1e-12 and a restart of 100:

ksp-rtol=1.e-12
ksp-type=gmres
ksp-gmres-restart=100
pc-type=jacobi

3.3. Monitoring linear, non-linear and eigenvalue problem solver residuals

# linear
ksp-monitor=1
# non-linear
snes-monitor=1
# eigen value problem
eps-monitor=1

4. Solving the Laplace problem

We start with the quickstart Laplacian example. Recall that, given a domain \(\Omega\), we wish to find \(u\) such that

\[-\nabla \cdot (k \nabla u) = f \mbox{ in } \Omega \subset \mathbb{R}^{2},\\ u = g \mbox{ on } \partial \Omega\]

4.1. Monitoring KSP solvers

feelpp_qs_laplacian --ksp-monitor=true

4.2. Viewing KSP solvers

shell> mpirun -np 2 feelpp_qs_laplacian --ksp-monitor=1  --ksp-view=1
  0 KSP Residual norm 8.953261456448e-01
  1 KSP Residual norm 7.204431786960e-16
KSP Object: 2 MPI processes
  type: gmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
     Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1000
  tolerances:  relative=1e-13, absolute=1e-50, divergence=100000
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 2 MPI processes
  type: shell
    Shell:
  linear system matrix = precond matrix:
  Matrix Object:   2 MPI processes
    type: mpiaij
    rows=525, cols=525
    total: nonzeros=5727, allocated nonzeros=5727
    total number of mallocs used during MatSetValues calls =0
      not using I-node (on process 0) routines

5. Solvers and preconditioners

You can now change the Krylov subspace solver using the --ksp-type option and the preconditioner using the --pc-type option.

For example,

  • to solve using the conjugate gradient solver, cg, and the default preconditioner, use the following

./feelpp_qs_laplacian --ksp-type=cg --ksp-view=1 --ksp-monitor=1
  • to solve using the algebraic multigrid preconditioner, gamg, with cg as the solver, use the following

./feelpp_qs_laplacian --ksp-type=cg --ksp-view=1 --ksp-monitor=1 --pc-type=gamg

6. Block factorisation

6.1. Stokes

We now turn to the quickstart Stokes example. Recall that, given a domain \(\Omega\), we wish to find \((\mathbf{u},p)\) such that

\[ -\Delta \mathbf{u} + \nabla p = \mathbf{ f} \mbox{ in } \Omega,\\ \nabla \cdot \mathbf{u} = 0 \mbox{ in } \Omega,\\ \mathbf{u} = \mathbf{g} \mbox{ on } \partial \Omega\]

This problem is indefinite. Possible solution strategies are

  • Uzawa,

  • penalty (a technique from optimisation),

  • augmented Lagrangian approach (Glowinski, Le Tallec).

Note that the inf-sup condition must be satisfied; in particular, for a multigrid strategy, the smoother needs to preserve it.

6.1.1. General approach for saddle point problems

The Krylov subspace solvers for indefinite problems are MINRES and GMRES. As to preconditioning, we look first at the saddle point matrix \(M\) and its block factorization \(M = LDL^T\); indeed we have:

\[M = \begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} = \begin{pmatrix} I & 0\\ B^T A^{-1} & I \end{pmatrix} \begin{pmatrix} A & 0\\ 0 & - B^T A^{-1} B \end{pmatrix} \begin{pmatrix} I & A^{-1} B\\ 0 & I \end{pmatrix}\]

  • Elman, Silvester and Wathen propose 3 preconditioners:

\[P_1 = \begin{pmatrix} \tilde{A}^{-1} & B\\ B^T & 0 \end{pmatrix}, \quad P_2 = \begin{pmatrix} \tilde{A}^{-1} & 0\\ 0 & \tilde{S} \end{pmatrix},\quad P_3 = \begin{pmatrix} \tilde{A}^{-1} & B\\ 0 & \tilde{S} \end{pmatrix}\]

where \(\tilde{S} \approx S^{-1}\), with \(S = B^T A^{-1} B\) the (pressure) Schur complement, and \(\tilde{A}^{-1} \approx A^{-1}\).
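
To see where the Schur complement comes from, eliminate the velocity block from the saddle point system (a standard derivation, written here with a generic right-hand side \((\mathbf{f}, g)\)):

\[\begin{pmatrix} A & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} \mathbf{u} \\ p \end{pmatrix} = \begin{pmatrix} \mathbf{f} \\ g \end{pmatrix} \quad\Longrightarrow\quad \mathbf{u} = A^{-1}(\mathbf{f} - B p), \qquad - B^T A^{-1} B\, p = g - B^T A^{-1}\mathbf{f},\]

so a good approximation \(\tilde{S}\) of the action of \(S^{-1}\) is the key ingredient of these preconditioners.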