Fair Data

According to FORCE11 [1], FAIR data stands for Findable, Accessible, Interoperable, and Re-usable. It is a set of principles promoted to facilitate data sharing among scientists. We discuss below the meaning of each aspect and how we suggest addressing it.

1. Making data findable, including provisions for metadata

1.1. Are the data produced and/or used in the project discoverable with metadata, identifiable and locatable by means of a standard identification mechanism (e.g. persistent and unique identifiers such as Digital Object Identifiers)?

Yes, Feel++ implements a comprehensive data discovery and identification system:

Persistent Identifiers
  • Digital Object Identifiers (DOIs) for published datasets through Zenodo integration

  • Git repository tags and commit hashes for version control

  • Unique simulation identifiers based on parameter fingerprints

Metadata Standards
  • Dublin Core metadata for general resource description

  • DataCite schema for research datasets

  • Custom Feel++ metadata schemas for simulation-specific information

Discovery Mechanisms
  • Integration with the Cemosis data repository and catalog

  • HAL (Hyper Articles en Ligne) repository for French academic publications

  • GitHub/GitLab repository metadata for source code and examples

  • Web-based data portals with search capabilities
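The parameter-fingerprint identifiers mentioned above can be derived from a canonical serialization of the simulation parameters, so that identical parameter sets always map to the same identifier. The following is a minimal sketch of the idea; the function name and the 16-character truncation are illustrative choices, not part of an official Feel++ API:

```python
import hashlib
import json

def simulation_id(params: dict) -> str:
    """Derive a reproducible identifier from a set of simulation parameters.

    The dictionary is serialized canonically (sorted keys, fixed separators)
    so that the same parameters always yield the same fingerprint.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Two runs with identical parameters share one identifier.
params = {"model": "heat", "dim": 2, "h": 0.1, "order": 2}
print(simulation_id(params))
```

Because the serialization is canonical, the identifier is stable across machines and runs, which is what makes it usable as a locatable key in a data catalog.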

1.2. What naming conventions does Feel++ follow?

Feel++ follows structured and consistent naming conventions:

File Naming
  • Descriptive names with domain, problem type, and resolution: heat_2d_square_h0.1.json

  • Timestamped outputs: result_YYYYMMDD_HHMMSS.h5

  • Version suffixes for iterative development: mesh_v1.2.msh

Directory Structure
  • Hierarchical organization: /domain/application/case/results/

  • Standardized subdirectories: /input/, /output/, /postprocessing/

Dataset Naming
  • Project prefix + description + version: feelpp_heat_benchmarks_v2.1

  • Application-specific conventions: eye2brain_patient_001_mri_t1

Variable and Field Naming
  • Physical quantities with units: temperature_celsius, velocity_ms

  • Standardized mathematical symbols: u (displacement), p (pressure), T (temperature)
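The file-naming conventions above can be enforced programmatically rather than by hand. The sketch below builds names following the `heat_2d_square_h0.1.json` and `result_YYYYMMDD_HHMMSS.h5` patterns; the helper names are illustrative, not part of the Feel++ library:

```python
from datetime import datetime

def input_name(domain: str, dim: int, case: str, h: float) -> str:
    """Build a descriptive input-file name: domain, dimension, case, mesh size."""
    return f"{domain}_{dim}d_{case}_h{h}.json"

def timestamped_result(prefix: str = "result", ext: str = "h5") -> str:
    """Build a timestamped output name following result_YYYYMMDD_HHMMSS.<ext>."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{prefix}_{stamp}.{ext}"

print(input_name("heat", 2, "square", 0.1))  # heat_2d_square_h0.1.json
```

Generating names from a single helper keeps the convention consistent across contributors and makes batch runs self-documenting.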

1.3. Will search keywords be provided that optimize possibilities for re-use?

Yes, comprehensive keyword strategies are implemented:

Domain-Specific Keywords
  • Mathematical: "finite elements", "spectral methods", "Galerkin", "multiphysics"

  • Application domains: "biomedical", "thermal", "fluid dynamics", "electromagnetics"

  • Computational: "HPC", "parallel computing", "C++", "Python"

Methodological Keywords
  • "verification", "validation", "benchmarking", "uncertainty quantification"

  • "mesh generation", "adaptive refinement", "error estimation"

Interdisciplinary Keywords
  • Medical applications: "MRI", "hemodynamics", "brain modeling"

  • Engineering: "heat transfer", "structural mechanics", "optimization"

Technical Keywords
  • File formats, software versions, computational platforms

  • Performance metrics, scaling properties

1.4. Do we provide clear version numbers?

Yes, Feel++ implements comprehensive versioning:

Software Versioning
  • Semantic versioning (MAJOR.MINOR.PATCH) for Feel++ library releases

  • Git tags and releases for all software components

  • Docker image versioning for reproducible environments

Data Versioning
  • Version numbers for datasets: v1.0, v1.1, v2.0

  • Timestamped snapshots for evolving datasets

  • Checksum-based integrity verification

Documentation Versioning
  • Synchronized documentation with software releases

  • Change logs and migration guides between versions

  • API versioning for programmatic access

Reproducibility Support
  • Complete environment specifications (software stack, dependencies)

  • Provenance tracking for generated data

  • Version-locked configurations for long-term reproducibility
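The checksum-based integrity verification mentioned under data versioning amounts to recording a cryptographic digest alongside each dataset and re-computing it before reuse. A minimal sketch with the standard library (the helper names are illustrative):

```python
import hashlib
from pathlib import Path

def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, streaming in 1 MiB chunks
    so that large simulation outputs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, expected: str) -> bool:
    """Check a file against its recorded checksum before re-using the dataset."""
    return sha256sum(path) == expected
```

Storing the expected digests in version-controlled metadata lets any downstream user confirm that a downloaded dataset matches the published snapshot.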

1.5. What metadata will be created? In case metadata standards do not exist in your discipline, please outline what type of metadata will be created and how.

Feel++ creates comprehensive metadata following established and custom standards:

Standard Metadata Schemas
  • Dublin Core for basic resource description

  • DataCite for research data citation

  • FAIR Data Point for FAIR compliance

  • CodeMeta for software metadata

Custom Feel++ Metadata
  • Simulation parameters and boundary conditions

  • Computational environment specifications (compiler, MPI, libraries)

  • Mathematical formulation details (equations, discretization)

  • Performance characteristics (runtime, memory, scaling)

Technical Metadata
  • File format specifications and structure

  • Data quality indicators and validation results

  • Provenance information (workflow, dependencies)

  • Access and usage statistics

Application-Specific Metadata
  • Medical data: Patient anonymization, imaging protocols

  • Engineering data: Material properties, experimental conditions

  • Benchmark data: Problem specifications, reference solutions

Metadata is stored in JSON-LD format for machine readability and integrated into the Cemosis data management infrastructure.
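As an illustration of the JSON-LD storage format, a dataset record might combine Dublin Core terms with project-specific fields. The record below is a sketch only: the `feelpp:` namespace URL and field names are assumptions for illustration, not an official Feel++ schema.

```python
import json

# Illustrative JSON-LD record: Dublin Core terms plus hypothetical
# Feel++-specific keys (the "feelpp:" namespace is an assumption).
record = {
    "@context": {
        "dc": "http://purl.org/dc/terms/",
        "feelpp": "https://docs.feelpp.org/schema/",  # hypothetical namespace
    },
    "@type": "Dataset",
    "dc:title": "Heat transfer benchmark, 2D square domain",
    "dc:creator": "Cemosis, University of Strasbourg",
    "dc:license": "https://creativecommons.org/licenses/by/4.0/",
    "feelpp:discretization": "P2 Lagrange finite elements",
    "feelpp:meshSize": 0.1,
}

print(json.dumps(record, indent=2))
```

Because JSON-LD attaches a vocabulary to each key via `@context`, the same record remains both human-readable and machine-interpretable by generic linked-data tooling.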

2. Making data openly accessible

2.1. Which data produced and/or used in the project will be made openly available as the default?

By default, the following data will be made openly available:

Freely Available Data
  • Benchmark datasets and validation cases

  • Educational examples and tutorials

  • Software documentation and user guides

  • Performance analysis and scaling studies

  • Non-sensitive simulation results

Open Source Components
  • Feel++ library source code (LGPL license)

  • Example configurations and test cases

  • Mesh generation tools and geometries

  • Post-processing scripts and visualizations

Restricted Access Data
  • Medical data (anonymized, with patient consent protocols)

  • Industrial proprietary datasets (partner agreements required)

  • Preliminary research results (embargo periods)

  • Third-party licensed data

The default principle is open access unless legal, ethical, or contractual restrictions apply. All restrictions are clearly documented with justification, distinguishing legal and contractual reasons from voluntary ones. In multi-beneficiary projects, specific beneficiaries may keep their data closed provided the consortium agreement contains the relevant provisions, in line with the reasons for opting out.

2.2. How will the data be made accessible?

Data accessibility is ensured through multiple channels:

Primary Repositories
  • Cemosis Data Repository: Institutional repository at University of Strasbourg

  • Zenodo: For DOI assignment and long-term preservation

  • HAL (Hyper Articles en Ligne): French national repository

  • GitHub/GitLab: For source code and example datasets

Access Methods
  • Direct download via web interfaces

  • API access for programmatic retrieval

  • Version control systems (Git) for code and configuration

  • Streaming access for large datasets

Distribution Formats
  • Container images (Docker, Singularity) for complete environments

  • Package managers for software components

  • Cloud storage integration (academic and commercial platforms)

2.3. What methods or software tools are needed to access the data?

Accessing Feel++ data requires different tools depending on data type:

Basic Access
  • Web browsers for metadata and documentation

  • Standard file managers for simple datasets

  • Text editors for configuration files

Scientific Computing Tools
  • Feel++ library for native data formats

  • ParaView/VisIt for visualization (VTK, Ensight formats)

  • HDF5 tools for large numerical datasets

  • Python scientific stack (NumPy, SciPy, Matplotlib)

Specialized Software
  • GMSH for mesh files

  • Medical imaging software (3D Slicer, ITK-SNAP) for biomedical data

  • CAD software for geometric models

  • Version control tools (Git) for code repositories

Development Environment
  • C++ compiler and development tools

  • Python interpreter with scientific libraries

  • Container platforms (Docker, Singularity)

  • High-performance computing environments

2.4. Is documentation about the software needed to access the data included?

Yes, comprehensive documentation is provided:

User Documentation
  • Installation guides for all platforms (Linux, macOS, Windows)

  • Quick start tutorials and examples

  • Complete API reference and user manual

  • Video tutorials and webinars

Technical Documentation
  • File format specifications

  • Data structure descriptions

  • Workflow documentation with step-by-step instructions

  • Troubleshooting guides and FAQ

Developer Documentation
  • Source code documentation (Doxygen)

  • Contribution guidelines

  • Build system instructions

  • Testing procedures and validation protocols

Accessibility Features
  • Multiple documentation formats (HTML, PDF, EPUB)

  • Multilingual support (English, French)

  • Searchable online documentation

  • Community forums and support channels

2.5. Is it possible to include the relevant software (e.g. in open source code)?

2.6. Where will the data and associated metadata, documentation and code be deposited?

Data deposition follows a multi-tier strategy utilizing certified repositories:

Primary Repositories
  • Zenodo (zenodo.org): Long-term preservation with DOI assignment for published datasets

  • HAL (hal.archives-ouvertes.fr): French national repository for academic publications and data

  • Cemosis Repository: Institutional repository for ongoing projects and internal use

  • GitHub/GitLab: Source code, examples, and version-controlled datasets

Specialized Repositories
  • Software Heritage: Permanent archival of all source code

  • Medical imaging repositories: Anonymized biomedical datasets (with ethics approval)

  • Domain-specific repositories: Field-appropriate archives for specialized datasets

Repository Selection Criteria
  • Certification status (ISO 16363, OAIS compliance)

  • Long-term sustainability and funding

  • Community adoption in relevant domains

  • Technical capabilities (APIs, search, visualization)

Integration Strategy
  • Automated deposition workflows from development environments

  • Metadata synchronization across repositories

  • Cross-repository linking and citation

  • Backup strategies across multiple platforms

Preference is given to certified repositories that support open access wherever possible.

2.7. Have you explored appropriate arrangements with the identified repository?

If there are restrictions on use, how will access be provided?

2.8. Is there a need for a data access committee?

2.9. Are there well described conditions for access (i.e. a machine readable license)? How will the identity of the person accessing the data be ascertained?

3. Making data interoperable

3.1. Are the data produced in the project interoperable, that is allowing data exchange and re-use between researchers, institutions, organisations, countries, etc. (i.e. adhering to standards for formats, as much as possible compliant with available (open) software applications, and in particular facilitating re-combinations with different datasets from different origins)?

Yes, Feel++ prioritizes interoperability through standardized formats and protocols:

Standard File Formats
  • HDF5 for large numerical datasets (self-describing, cross-platform)

  • VTK/VTU for visualization and post-processing

  • JSON for configuration and metadata

  • NetCDF for climate and environmental data

Open Standards Compliance
  • MPI for parallel computing interoperability

  • CGNS for computational fluid dynamics

  • DICOM for medical imaging data

  • OpenFOAM mesh compatibility

Software Interoperability
  • ParaView/VisIt visualization compatibility

  • Python scientific ecosystem integration

  • R statistical computing interfaces

  • MATLAB/Octave data exchange

Cross-Platform Support
  • Linux, macOS, Windows compatibility

  • Container-based distribution (Docker, Singularity)

  • Cloud platform integration (AWS, Azure, Google Cloud)

  • High-performance computing cluster support

3.2. What data and metadata vocabularies, standards or methodologies will you follow to make your data interoperable?

3.3. Will you be using standard vocabularies for all data types present in your data set, to allow inter-disciplinary interoperability?

3.4. In case it is unavoidable that you use uncommon or generate project specific ontologies or vocabularies, will you provide mappings to more commonly used ontologies?

4. Making data re-usable

4.1. How will the data be licensed to permit the widest re-use possible?

Feel++ employs a comprehensive licensing strategy to maximize reuse while protecting appropriate interests:

Open Source Licenses
  • LGPL v3+ for Feel++ core library (allows commercial use with attribution)

  • MIT/BSD licenses for utilities and examples (maximally permissive)

  • Creative Commons licenses for documentation and educational materials

  • CC0 (public domain) for benchmark datasets and reference solutions

Data-Specific Licensing
  • CC BY 4.0: Attribution required, commercial use allowed

  • CC BY-SA 4.0: Share-alike for derivative datasets

  • CC BY-NC 4.0: Non-commercial use only (for sensitive applications)

  • Custom licenses: For industry partnerships with specific requirements

Medical and Sensitive Data
  • Controlled access with data use agreements

  • Anonymization protocols preserving scientific utility

  • Time-limited access for research purposes

  • Compliance with GDPR and medical ethics requirements

Implementation
  • Clear license statements in all data packages

  • Machine-readable license metadata

  • Legal review for complex licensing scenarios

  • Community education on licensing implications
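Machine-readable license metadata can be expressed with SPDX license identifiers, which tools can parse unambiguously. The sketch below is illustrative: the package fields and the `codeLicense` key are assumptions for this example, not a fixed Feel++ schema, though `CC-BY-4.0` and `LGPL-3.0-or-later` are genuine SPDX identifiers.

```python
import json

# Minimal machine-readable license statement using SPDX identifiers
# (https://spdx.org/licenses/); the field names are illustrative.
package_metadata = {
    "name": "feelpp_heat_benchmarks",
    "version": "2.1",
    "license": "CC-BY-4.0",              # dataset license, SPDX identifier
    "codeLicense": "LGPL-3.0-or-later",  # accompanying scripts (hypothetical key)
}

print(json.dumps(package_metadata, indent=2))
```

Using SPDX identifiers rather than free-text license names lets repository harvesters and compliance tooling evaluate reuse conditions automatically.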

4.2. When will the data be made available for re-use? If an embargo is sought to give time to publish or seek patents, specify why and how long this will apply, bearing in mind that research data should be made available as soon as possible.

4.3. Are the data produced and/or used in the project useable by third parties, in particular after the end of the project? If the re-use of some data is restricted, explain why.

4.4. How long is it intended that the data remains re-usable?

4.5. Are data quality assurance processes described?

5. Bibliography

[1] FORCE11, Guiding Principles for Findable, Accessible, Interoperable and Re-usable Data, Version B1.0. www.force11.org/fairprinciples