Modernizing Feel++ Finite Element Library with Kokkidio, C++20, Eigen, and Kokkos
1. Overview
This document details the specifications for modernizing the Feel++ finite element library. The new design leverages modern C++20 features (concepts, constexpr functions, structured bindings), uses Eigen for linear algebra, and employs Kokkos (with Kokkidio) for performance portability on both CPU and GPU. The library will support high‑order finite element computations, automatic differentiation, and a flexible mesh data structure for heterogeneous convex elements.
2. Goals
-
Develop a versatile, high‑order finite element library that runs efficiently on both CPUs and GPUs.
-
Hide low‑level hardware details (CPU vs. GPU) behind a unified, high‑level API.
-
Support both compile‑time and runtime polynomial orders.
-
Use modern C++20 features (concepts, constexpr, structured bindings) and boost.mp11 for meta‑programming.
-
Integrate Kokkos for portability and Eigen for high‑performance linear algebra.
-
Leverage Kokkidio to bridge Eigen and Kokkos, so that familiar Eigen expressions can be used within Kokkos parallel dispatch.
-
Support automatic differentiation of polynomial bases, including recursive differentiation (e.g., gradients, Hessians, etc.).
-
Provide a mesh data structure that can store heterogeneous convex elements (Simplex, Hypercube, Prism, Pyramid, etc.) with MPI for distributed computing.
3. Components
3.1. 1. Polynomial Basis, Differentiation, and Automatic Differentiation
-
Implement polynomial basis classes (e.g., Dubiner) for L2‑orthonormal polynomial bases on simplices.
-
Support evaluation, gradient, and recursive differentiation via precomputed differentiation matrices.
-
For high‑order derivatives, utilize Eigen::Tensor for multi‑dimensional derivative data.
-
Design the API so that the gradient (or any derivative) of a polynomial set is itself a polynomial set.
-
Example:
#include <Kokkos_Core.hpp>
#include "dubiner.hpp"
#include <iostream>
int main() {
Kokkos::initialize();
{
// Create a Dubiner basis of order 3 for scalar fields.
Feel::Dubiner<3, double> dubiner(3);
double x = 0.2, y = -0.3;
double val = dubiner.evaluate(2, 1, x, y);
std::cout << "Dubiner basis value: " << val << "\n";
// Compute the gradient (first derivative)
auto grad = dubiner.gradient();
// Further differentiation is possible recursively:
auto hessian = dubiner.differentiate(2);
}
Kokkos::finalize();
return 0;
}
3.2. 2. Geometric Reference Elements and Mesh Data Structures
-
Redesign Reference classes (e.g., ReferenceSimplex) using Eigen for small fixed-size data (vertices, barycenters) and Kokkos (or Kokkidio’s ViewMap) for GPU-compatible storage when needed.
-
Support various element shapes: Simplex, Hypercube, Prism, Pyramid (in 3D), Simplex and Hypercube (in 2D).
-
Develop a Mesh<ConvexTypes> class to store heterogeneous convex elements in a Structure‑of‑Arrays (SoA) layout for cache‑ and GPU‑efficiency.
-
Create a specialized storage class deriving from std::tuple (or wrapping it) that provides a clear interface (e.g. getElements<Simplex3D>(), addElement<T>()).
-
Integrate MPI for distributed mesh partitioning.
-
Leverage Kokkidio to enable mixed use of Eigen (for expressive algebra) and Kokkos (for parallel execution).
-
Example:
#include <Kokkos_Core.hpp>
#include "mesh.hpp"
#include "simplex.hpp"
#include "hypercube.hpp"
#include <iostream>
int main(int argc, char** argv) {
Kokkos::initialize(argc, argv);
{
// Create a mesh with 10,000 vertices and 5,000 elements.
Mesh<Kokkos::DefaultExecutionSpace, Feel::Simplex<3>, Feel::Hypercube<3>> mesh(10000, 5000);
// Add a simplex element to the mesh.
Feel::Simplex<3> tetra;
mesh.addElement(tetra);
// Distribute mesh elements among MPI ranks (if running in parallel)
mesh.distributeMesh();
// Compute element volumes in parallel.
mesh.computeVolume();
}
Kokkos::finalize();
return 0;
}
3.3. 3. Polynomial Policy Classes and Meta‑Programming
-
Rewrite the policies for Scalar, Vectorial, Tensor2, Tensor2Symm, and Tensor3 using C++20, boost.mp11, and Eigen.
-
Use constexpr member functions to access properties such as rank, dimension, and number of components. For example:
static constexpr uint16_type getRank() noexcept { return rank; } static constexpr uint16_type getDim() noexcept { return nDim; } static constexpr uint16_type getComponents() noexcept { return nComponents; }
-
Implement meta‑functions (RankUp, RankDown, etc.) with boost.mp11.
-
Provide a Normalized policy to choose between normalized and unnormalized polynomial sets.
-
Use an Eigen‑based Storage policy (StorageEigen) instead of ublas.
-
Example:
#include "feelpoly_policy.hpp"
#include <iostream>
int main() {
using Scalar3D = Feel::Scalar<3>;
using Vectorial3D = Feel::Vectorial<3>;
std::cout << "Scalar3D rank: " << Scalar3D::getRank()
<< ", components: " << Scalar3D::getComponents() << "\n";
std::cout << "Vectorial3D rank: " << Vectorial3D::getRank()
<< ", components: " << Vectorial3D::getComponents() << "\n";
return 0;
}
3.4. 4. Differentiation and Polynomial Shapes
-
Introduce differentiation matrices to compute the derivative of polynomial coefficient vectors.
-
For high‑order derivatives, use Eigen::Tensor to represent and compute multi‑dimensional derivative data.
-
Define a new set of concepts and policies for polynomial shape types (e.g., scalar, vectorial, matrix‑valued, multi‑component).
-
Use constexpr member functions to query shape properties.
4. Design Guidelines
-
Use C++20 features (concepts, constexpr, structured bindings) for type safety and compile‑time property access.
-
Replace boost.mpl with boost.mp11 for meta‑programming.
-
Abstract CPU vs. GPU execution by introducing an ExecutionSpace template parameter. Use if‑constexpr to choose between implementations.
-
Integrate Kokkidio to provide interoperability between Eigen and Kokkos:
-
Use Kokkidio’s ViewMap and DualViewMap to wrap Eigen objects.
-
Use Kokkidio’s ParallelRange and parallel dispatch functions for GPU‑compatible loops.
-
-
Separate modules into:
-
Polynomial basis and differentiation.
-
Geometric reference elements and mesh structures.
-
Polynomial policy and storage classes.
-
-
Optimize data layouts using a Structure‑of‑Arrays (SoA) design for both CPU vectorization and GPU memory coalescing.
-
Develop extensive unit tests and performance benchmarks.
-
Document each module with Doxygen-style comments and usage examples.
5. Timeline and Deliverables
-
Design Document (1–2 weeks):
-
Finalize interface definitions, module interactions, and hardware abstraction using Kokkidio.
-
Provide architectural diagrams and API specifications.
-
-
Module 1 – Polynomial Basis & Differentiation (3–4 weeks):
-
Implement the Dubiner basis class with recursive differentiation.
-
Integrate differentiation matrices and Eigen::Tensor for high‑order derivatives.
-
Deliver unit tests and benchmarks.
-
-
Module 2 – Geometric Reference Elements & Mesh (4–6 weeks):
-
Implement Reference elements (Simplex, Hypercube, Prism, Pyramid) using Eigen for small data.
-
Develop a Mesh<ConvexTypes> class with a tuple-based storage class and SoA layout.
-
Integrate MPI domain decomposition and Kokkidio for CPU/GPU interoperability.
-
Deliver integration tests and scalability benchmarks.
-
-
Module 3 – Polynomial Policy Classes (3–4 weeks):
-
Rewrite polynomial policies (Scalar, Vectorial, Tensor2, Tensor2Symm, Tensor3) using C++20 and boost.mp11.
-
Provide constexpr getters and meta‑functions (RankUp, RankDown, etc.).
-
Deliver unit tests for each policy.
-
-
Integration and Final Testing (2–3 weeks):
-
Integrate all modules and run full‑scale tests on CPU, GPU, and distributed environments.
-
Finalize documentation and perform code reviews.
-
6. Additional Considerations
-
Kokkidio Integration: Use Kokkidio to seamlessly combine Eigen’s expressive syntax with Kokkos’s execution portability.
-
GPU-Specific Optimizations: Use Eigen for small fixed‑size matrices (e.g., reference element vertices) and Kokkos::View for large arrays.
-
Future Extensions: Consider adding symbolic differentiation and adaptive mesh refinement capabilities.
7. Example Usage
#include <Kokkos_Core.hpp>
#include "dubiner.hpp"
#include "mesh.hpp"
#include <iostream>
int main(int argc, char** argv) {
Kokkos::initialize(argc, argv);
{
// Create a Dubiner basis (order 3) for scalar fields.
Feel::Dubiner<3, double> dubiner(3);
double x = 0.2, y = -0.3;
double val = dubiner.evaluate(2, 1, x, y);
std::cout << "Dubiner basis value: " << val << "\n";
// Compute the gradient of the basis.
auto grad = dubiner.gradient();
// Create a heterogeneous mesh with Simplex and Hypercube elements.
Mesh<Kokkos::DefaultExecutionSpace, Feel::Simplex<3>, Feel::Hypercube<3>> mesh(10000, 5000);
Feel::Simplex<3> simplex;
mesh.addElement(simplex);
mesh.distributeMesh(); // MPI distribution if applicable
mesh.computeVolume();
}
Kokkos::finalize();
return 0;
}
8. Summary
This specification outlines the work required to modernize the Feel++ library by:
-
Developing a high‑order finite element library that runs on CPUs and GPUs.
-
Using modern C++20 features (concepts, constexpr, boost.mp11) and libraries (Eigen, Kokkos, Kokkidio) to create a unified, portable API.
-
Supporting automatic differentiation of polynomial bases with recursive derivative computations.
-
Providing a flexible, high‑performance mesh data structure for heterogeneous convex elements with MPI-based distributed computing.