The Advanced Scientific Computing Research (ASCR) program in the US Department of Energy (DOE) Office of Science organized a workshop on the management and storage of scientific data in January 2022. Ninety-two participants submitted position papers describing the challenges of managing the scientific simulation data-sets generated to support research into nuclear fusion power plants. The purpose of the workshop was to prepare a research program to address these data management and storage challenges.
“The parallel file system is the data management system” for most scientific HPC systems worldwide, and scientific workflows are managed separately in files rather than in a database. A workflow can only be understood by opening its file in a workflow management application. It is therefore difficult to track data provenance, or to run ML or AI applications that need access to the complete dataset, including both the physical data and the multi-step process record.
Today, the separation of the datasets for scientific workflows into the physical data on the file system and the process record in a workflow file has several disadvantages:
• It creates a data management overhead for the scientist who has to manually record the process and the data separately, in two different places, and record how they are linked.
• It limits the portability of scientific workflows, data and results.
• Because the whole dataset is not accessible to software through a single API, or to scientists in one place, it is difficult to run surrogate modelling, AI or ML applications that could provide invaluable insights across collections of datasets.
• It is difficult to integrate scientific simulations, such as plasma behaviour, with engineering simulations, such as electromagnetic finite element analysis of the magnetic field to contain the plasma, to deliver joined-up scientific and engineering simulation of a complete reactor system.
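As a minimal sketch of the alternative, the Python below keeps the process record and the references to the physical data files in one structure, so that provenance can be queried through a single interface. The class names, method names and file names are all hypothetical and chosen for illustration only; this is not the SDM data-model described in the paper.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class ProcessStep:
    """One step of a workflow: which files it read and which it wrote."""
    name: str
    inputs: list[str]
    outputs: list[str]


@dataclass
class SimulationDataset:
    """Hypothetical record holding the process steps alongside data file paths."""
    steps: list[ProcessStep] = field(default_factory=list)

    def add_step(self, name: str, inputs: list[str], outputs: list[str]) -> None:
        self.steps.append(ProcessStep(name, list(inputs), list(outputs)))

    def provenance(self, path: str) -> list[str]:
        """Return the names of the steps that led to `path`, upstream first."""
        producers = {out: step for step in self.steps for out in step.outputs}
        lineage: list[str] = []
        seen: set[str] = set()

        def visit(p: str) -> None:
            step = producers.get(p)
            if step is None or step.name in seen:
                return  # raw input file, or step already recorded
            seen.add(step.name)
            for upstream in step.inputs:
                visit(upstream)
            lineage.append(step.name)

        visit(path)
        return lineage


# Illustrative file names only.
ds = SimulationDataset()
ds.add_step("mesh", ["geometry.step"], ["mesh.h5"])
ds.add_step("solve", ["mesh.h5", "params.yaml"], ["fields.h5"])
ds.add_step("postprocess", ["fields.h5"], ["summary.csv"])
print(ds.provenance("summary.csv"))
```

With the process and the data recorded together, the question "which steps produced this result?" becomes a lookup rather than a manual reconciliation of two separate records.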
The workshop participants concluded that the lack of a FAIR(1) data management capability for scientific simulation data delays and inhibits scientific discovery.
The DOE launched a new milestone-based fusion development program in September 2022. Based on experience in engineering programs, the lack of an information management platform for the scientific and engineering simulations needed to design a fusion reactor system could delay such a program by 30%.
This paper reviews the work done to develop a next-generation information system to support both scientific and engineering simulation for the development of a fusion reactor in the UK in 2020. It summarises the requirements, and the data-sets to be managed, that were expressed in the 2022 DOE workshop. It describes how the core SDM data-model, already proven for the management of large-scale engineering simulations, can be used to deliver FAIR simulation data management. It also describes further data-management capabilities that go beyond FAIR but are nevertheless essential to the successful management of simulation data-sets.
References:
1 Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). https://doi.org/10.1038/sdata.2016.18
Reference | NWC23-5460-presentation |
---|---|
Author | Norris, M. |
Language | English |
Type | Presentation |
Date | 16th May 2023 |
Organisation | theSDMconsultancy |
Region | Global |