Basic aspects

1. What is the purpose of the data collection/generation and its relation to the objectives of Feel++?

The primary purpose of data collection and generation in the Feel++ project is to support scientific computing, mathematical modeling, and numerical simulation across multiple domains. The objectives include:

Benchmarking and Validation: Collecting reference datasets to validate numerical methods and algorithms implemented in Feel++
Performance Analysis: Generating performance metrics to optimize high-performance computing implementations
Scientific Applications: Supporting multidisciplinary research in biomedical engineering, material science, fluid dynamics, and thermal modeling
Open Science: Providing reproducible computational results and datasets for the scientific community
Industrial Applications: Facilitating technology transfer through validated simulation tools and datasets

The data collection serves the core mission of Feel++ to democratize advanced mathematical modeling and simulation tools while maintaining scientific rigor and reproducibility.

2. What types and formats of data does Feel++ generate/collect?

Feel++ generates and collects diverse types of data across multiple categories:

Simulation Data

Mesh files (MSH and other Gmsh formats)
Solution fields (HDF5, VTK, EnSight formats)
Time series data (CSV, JSON)
Convergence and error analysis data

Benchmarking Data

Performance metrics (execution time, memory usage, scalability)
Accuracy assessments (error norms, convergence rates)
Comparative analysis results
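
As an illustration of the accuracy assessments listed above, the following minimal sketch computes observed convergence rates from error norms measured on successively refined meshes, using the standard estimate log(e_coarse / e_fine) / log(h_coarse / h_fine). The function name `observed_rates` and the synthetic input are assumptions made for this example; they are not part of the Feel++ API.

```python
import math


def observed_rates(mesh_sizes, errors):
    """Return the observed convergence rate between consecutive refinement levels.

    mesh_sizes: characteristic mesh sizes h, ordered from coarse to fine.
    errors: error norms measured at the corresponding mesh sizes.
    """
    if len(mesh_sizes) != len(errors):
        raise ValueError("mesh_sizes and errors must have the same length")
    rates = []
    for i in range(len(errors) - 1):
        error_ratio = errors[i] / errors[i + 1]
        size_ratio = mesh_sizes[i] / mesh_sizes[i + 1]
        rates.append(math.log(error_ratio) / math.log(size_ratio))
    return rates


# Synthetic data (hypothetical): the error behaves like h**2,
# so every observed rate should be close to 2.
h = [0.1, 0.05, 0.025]
err = [x ** 2 for x in h]
rates = observed_rates(h, err)
```

With real benchmark data, the error norms would typically be read from the convergence data files rather than generated synthetically.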

Application-Specific Data

Medical imaging data (DICOM, NIfTI) for biomedical applications
Material property databases
Experimental validation datasets
Sensor measurements and monitoring data

Documentation and Metadata

Configuration files (JSON, CFG)
Parameter studies and sensitivity analysis
Computational geometry and CAD files

The types and formats of data depend on the purpose of the data. Some datasets are provided for benchmarking and others for verification and validation. See Feel++ Data Types to understand what can be provided.
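
To make the categories above concrete, here is a minimal sketch that sorts file names into those categories by extension. The extension table reflects commonly used suffixes for the listed formats; both the table and the `classify` helper are assumptions made for illustration, not a Feel++ convention.

```python
# Hypothetical mapping from common file extensions to the categories above.
# JSON appears under both time series and configuration files, so it is
# labeled as ambiguous here.
CATEGORY_BY_EXTENSION = {
    ".msh": "mesh",
    ".h5": "solution field",
    ".hdf5": "solution field",
    ".vtk": "solution field",
    ".case": "solution field",  # EnSight case file
    ".csv": "time series",
    ".json": "time series or configuration",
    ".cfg": "configuration",
    ".dcm": "medical imaging",  # DICOM
    ".nii": "medical imaging",  # NIfTI
    ".nii.gz": "medical imaging",  # compressed NIfTI
}


def classify(filename):
    """Return the data category for a file name, or 'unknown'."""
    name = filename.lower()
    # Test longer extensions first so that '.nii.gz' is matched as a whole.
    for ext in sorted(CATEGORY_BY_EXTENSION, key=len, reverse=True):
        if name.endswith(ext):
            return CATEGORY_BY_EXTENSION[ext]
    return "unknown"


examples = ["torus.msh", "patient01.nii.gz", "run.cfg", "notes.txt"]
categories = [classify(name) for name in examples]
```

A table like this could serve as a starting point for tagging datasets with metadata before they are deposited in a repository.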

3. Will Feel++ re-use any existing data and how?

In general, the Feel++ library will not use existing data, except for basic examples used in testing. Feel++ applications, on the other hand, will reuse data from various sources, mainly third parties. Such data will generally be obtained directly from the third parties interested in the application outcomes. Some research groups will also reuse their own data in the pilots.

In some cases, part of the input data (e.g., material properties) is available in public repositories or databases, while in other cases the data will be kept private.

Since the Cemosis e-Infrastructure provides a data repository, partners collaborating on applications will aim to use this tool as much as possible. We expect that this will not always be possible: some simulations with some Feel++ applications might be run on the third party’s premises because access to confidential data is restricted.

4. What is the origin of the data?

In the case of the Feel++ library, most of the data is generated by the Feel++ consortium (through testing and the gathering of metrics for validation purposes), with the exception of existing input examples.

Communication data is also generated by the consortium, although it results from questionnaires answered by third parties (stakeholders in the Modelling, Simulation and Optimization domain).

Feel++ applications, on the other hand, sometimes use third-party data as input for the pilot applications. The origin of the data will therefore vary from one application to another.

At this stage, the following external data sources have been identified:

Vivabrain: ICUBE Laboratory (handles the MRI)
Eye2Brain: Eugene and Marilyn Glick Eye Institute in Indianapolis
HiFiMagnet: LNCMI, national laboratory for high magnetic fields
Hemotum++: INSERM
PO: PlasticOmnium automotive

5. What is the expected size of the data?

The data size varies significantly depending on the application domain and computational complexity:

Small-scale datasets (KB to MB)

Configuration files and parameter sets
1D and 2D simulation results
Performance metrics and logs
Documentation and metadata

Medium-scale datasets (MB to GB)

3D simulation results for standard problems
Time-dependent simulations with moderate resolution
Benchmark and validation datasets
Medical imaging data for typical studies

Large-scale datasets (GB to TB)

High-resolution 3D simulations
Long-time integration studies
Uncertainty quantification campaigns
Large-scale parallel performance studies
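
The three tiers above can be sketched as a simple classification by size in bytes. The exact cut-offs used here (1 MB and 1 GB, decimal units) are assumptions chosen for illustration, since the tiers are only given as approximate ranges; the `size_tier` helper is hypothetical.

```python
MB = 10 ** 6  # decimal megabyte
GB = 10 ** 9  # decimal gigabyte


def size_tier(num_bytes):
    """Classify a dataset size into one of the three tiers.

    Assumed thresholds: below 1 MB is small-scale, below 1 GB is
    medium-scale, anything larger is large-scale.
    """
    if num_bytes < 0:
        raise ValueError("size must be non-negative")
    if num_bytes < MB:
        return "small"
    if num_bytes < GB:
        return "medium"
    return "large"


# Example sizes (hypothetical): a configuration file, a 3D result, a large campaign.
tiers = [size_tier(4096), size_tier(500 * MB), size_tier(2 * 10 ** 12)]
```

Such a classification can help decide where a dataset is stored, for example whether it fits in a general-purpose repository or needs dedicated storage.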

Application-specific estimates

Eye2Brain: 10 MB to 500 MB per patient dataset
HiFiMagnet: 100 MB to 2 GB per magnetic field simulation
HemoTumPP: 50 MB to 1 GB per hemodynamic simulation
Industrial applications: Highly variable (1 GB to 10 TB)

The size of the data will also depend on the kind of data and on other aspects, such as the specific application and the tools involved.

The main variation is found across the Feel++ applications, since each of them also uses different formats:

Eye2Brain: a few MB to hundreds of MB
HiFiMagnet: a few MB to hundreds of MB

6. To whom might it be useful ('data utility')?

Given the two main categories of data we will deal with, we consider that the data could be useful to different stakeholders:

Data related to validation: Other researchers in the same field (HPC, Cloud, e-Infrastructures, MADFs) could be interested in running their own experiments and comparing solutions, as well as industry willing to participate in e-Infrastructures provisioning resources and software.
Data used and generated in the Feel++ applications: Any researcher, industry or even policy maker interested in simulation results, depending on the domain of each Feel++ application. We expect other stakeholders to be interested in the data generated in the project. However, special care must be taken regarding privacy and confidentiality when sharing data whose input was provided by third parties.