SDM on an xLM Platform Delivers the FAIR Principles of Findability, Accessibility, Interoperability, and Reusability for Scientific and Engineering Simulation Data-sets

The Advanced Scientific Computing Research (ASCR) program in the US Department of Energy (DOE) Office of Science organized a workshop on the management and storage of scientific data in January 2022. The purpose of the workshop was to prepare a research program to address this issue. Ninety-two participants submitted position papers describing the challenges of managing the scientific simulation data-sets generated to support research into nuclear fusion power plants.

“The parallel file system is the data management system” for most scientific HPC systems worldwide, and scientific workflows are managed separately in files rather than in a database. A workflow can only be understood by opening its file in a workflow management application. It is therefore difficult to track data provenance or to run ML or AI applications, which need access to the complete dataset, including both the physical data and the multi-step process record.

Today, the separation of the datasets for scientific workflows into the physical data on the file system and the process record in a workflow file has several disadvantages:

• It creates a data management overhead for the scientist, who has to manually record the process and the data separately, in two different places, and also record how they are linked.

• It limits the portability of scientific workflows, data and results.

• Since the whole dataset is not accessible to software through a single API, or to scientists in one place, it is difficult to run surrogate modelling, AI or ML applications, which could provide invaluable insights into collections of datasets.

• It is difficult to integrate scientific simulations, such as plasma behaviour, with engineering simulations, such as electromagnetic finite element analysis of the magnetic field to contain the plasma, to deliver joined-up scientific and engineering simulation of a complete reactor system.
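
To make the contrast with that separation concrete, the following is a minimal sketch of a single record that holds both the pointers to the physical data and the process steps that link them, so that provenance can be queried in one place. It is an illustration only, not the SDM data-model: the class names (DataObject, ProcessStep, SimulationRecord), the field names, the tool names and the file paths are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """Pointer to physical data on the file system (hypothetical schema)."""
    object_id: str
    path: str


@dataclass(frozen=True)
class ProcessStep:
    """One step of the process record, linked to its input and output data."""
    step_id: str
    tool: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]


@dataclass
class SimulationRecord:
    """Holds data pointers and process steps together in one place."""
    data: dict[str, DataObject] = field(default_factory=dict)
    steps: list[ProcessStep] = field(default_factory=list)

    def add_data(self, obj: DataObject) -> None:
        self.data[obj.object_id] = obj

    def add_step(self, step: ProcessStep) -> None:
        # Refuse steps that refer to data the record does not know about,
        # so the link between process and data cannot silently break.
        missing = [i for i in step.inputs + step.outputs if i not in self.data]
        if missing:
            raise KeyError(f"unregistered data objects: {missing}")
        self.steps.append(step)

    def provenance(self, object_id: str) -> list[ProcessStep]:
        """Return the steps that contributed to object_id, upstream first."""
        producers = {out: s for s in self.steps for out in s.outputs}
        ordered: list[ProcessStep] = []
        seen: set[str] = set()

        def visit(oid: str) -> None:
            step = producers.get(oid)
            if step is None or step.step_id in seen:
                return
            seen.add(step.step_id)
            for upstream in step.inputs:
                visit(upstream)
            ordered.append(step)

        visit(object_id)
        return ordered


# Hypothetical two-step chain: a plasma simulation feeding an
# electromagnetic finite element analysis.
record = SimulationRecord()
record.add_data(DataObject("mesh", "/scratch/run1/mesh.h5"))
record.add_data(DataObject("plasma", "/scratch/run1/plasma.h5"))
record.add_data(DataObject("field", "/scratch/run1/field.h5"))
record.add_step(ProcessStep("s1", "plasma_solver", ("mesh",), ("plasma",)))
record.add_step(ProcessStep("s2", "em_fea", ("plasma",), ("field",)))

print([s.step_id for s in record.provenance("field")])  # ['s1', 's2']
```

Because the steps and the data pointers live in the same record, a provenance query is a simple traversal and needs no workflow management application to open a separate file. A production system would presumably keep such a record in a database behind an API rather than in memory.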

The workshop participants concluded that the lack of a FAIR(1) data management capability for scientific simulation data delays and inhibits scientific discovery.

The DOE launched a new milestone-based fusion development program in September 2022. Experience in engineering programs suggests that the lack of an information management platform for the scientific and engineering simulations necessary to design a fusion reactor system could delay such a program by 30%.

This paper reviews the work done in the UK in 2020 to develop a next-generation information system supporting both scientific and engineering simulation for the development of a fusion reactor. It summarises the requirements, and the data-sets to be managed, that were expressed in the 2022 DOE workshop. It describes how the core SDM data-model, already proven for the management of large-scale engineering simulations, can be used to deliver FAIR simulation data management. It also describes further data-management capabilities which go beyond FAIR but are nevertheless essential to the successful management of simulation data-sets.

References:
1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18

Document Details

Reference: NWC23-5460-presentation
Author: Norris. M
Language: English
Type: Presentation
Date: 16th May 2023
Organisation: theSDMconsultancy
Region: Global
