
A Methodology for Efficient Generation and Optimization of Simulation-based Training Data for Data-centric AI Applications in Engineering

The fidelity and the predictive capability of simulation models in the vehicle engineering domain have increased significantly in recent years. Besides generally accelerating development processes and enabling deep insights, this allows for a reliable prediction of the system behaviour in a multitude of non-standard conditions and scenarios. This capability also opens up novel opportunities, for example, the use of simulation data, which is predominantly employed in passive safety, for active safety applications. Since these applications call for (near-) real-time capability, a major challenge is to overcome the large computational cost, and thus computation time, of conventional simulation methods (e.g. the finite element method). This challenge could be addressed by using simulation-based training data and artificial intelligence (AI) techniques to predict the quantities of interest. Due to the computational cost, however, the generation of large training datasets is not feasible. Therefore, particular attention is required in the generation and iterative optimization of the simulation-based training dataset. This can be referred to as a data-centric AI approach, which aims to systematically engineer the data needed to efficiently train AI models. It can be used to generate generalized datasets that are not optimized for a specific prediction task but rather reflect the overall system behaviour.

Currently, one-shot sampling methods (e.g. Latin hypercube sampling) are predominantly used for simulation-based training data generation; these do not make use of any prior knowledge regarding the system behaviour. Utilizing this knowledge, adaptive sampling methods are able to determine the “next best set of samples” (batch) in the parameter space and thereby efficiently increase the information density of the dataset. Adaptive sampling methods in combination with FE simulation models are widely used for sensitivity analyses and metamodel-based optimization. These methods make use of global and local metamodel error metrics and can process single and multiple response functions using, for example, variance-based or cross-validation-based approaches.

This study focuses on the implementation of a generalized adaptive sampling pipeline as well as the investigation of simulation-based training data generation for vehicle safety applications, considering the impact of adaptive batch size determination. The first step of the iterative procedure is the selection of the number of samples to be added in the current iteration (batch size) based on a data quality threshold. The response quantities for this batch of samples are obtained using FE simulation. The following stage uses the generated dataset to train a metamodel able to predict the quantities of interest. Based on local, precision-based metrics, a new batch of points is placed in the regions of the design space with the lowest precision in order to increase the information density of the current dataset. In addition, a distance-based secondary criterion is employed to prevent the concentration of samples. To guarantee a considerable gain in information density in every iteration, the batch size is determined adaptively, based on the gradient of local and global evaluation metrics that quantify the performance of the metamodel and serve as an indicator of the current information content of the dataset. The termination of the iterative process is defined by a set of criteria with respect to the maximum number of samples or the saturation of information density.
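The following minimal Python sketch illustrates one way such an iterative loop could look. It is not the implementation used in the study: a Gaussian-process metamodel stands in for the metamodel trained on FE results, the predictive standard deviation is assumed as the local precision metric, a fixed target gain per iteration is assumed for the adaptive batch size, and expensive_simulation is a hypothetical placeholder for the FE solver.

import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import cdist
from sklearn.gaussian_process import GaussianProcessRegressor

def expensive_simulation(x):
    # Hypothetical placeholder for the FE simulation returning a quantity of interest.
    return np.sin(3.0 * x[:, 0]) * np.cos(2.0 * x[:, 1])

def adaptive_sampling(dim=2, n_initial=10, n_max=60, batch_size=5,
                      min_dist=0.05, saturation_tol=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Initial one-shot design (Latin hypercube) seeds the dataset.
    X = qmc.LatinHypercube(d=dim, seed=seed).random(n_initial)
    y = expensive_simulation(X)
    prev_score = np.inf

    while len(X) < n_max:
        # Train the metamodel on the current dataset.
        model = GaussianProcessRegressor(normalize_y=True).fit(X, y)

        # Global metric: mean predictive standard deviation over a candidate set.
        candidates = rng.random((2000, dim))
        _, std = model.predict(candidates, return_std=True)
        score = std.mean()

        # Terminate when the information density saturates.
        if abs(prev_score - score) < saturation_tol:
            break

        if np.isfinite(prev_score):
            # Adaptive batch size (assumed interpretation): scale the batch so
            # that each iteration targets a comparable gain in the global metric.
            rel_gain = max((prev_score - score) / prev_score, 1e-6)
            target_gain = 0.10
            batch_size = int(np.clip(round(batch_size * target_gain / rel_gain), 2, 10))
        prev_score = score

        # Local, precision-based selection with a distance-based secondary
        # criterion that prevents the concentration of samples.
        new_points = []
        for idx in np.argsort(std)[::-1]:  # highest uncertainty (lowest precision) first
            cand = candidates[idx:idx + 1]
            pool = np.vstack([X] + new_points) if new_points else X
            if cdist(cand, pool).min() > min_dist:
                new_points.append(cand)
            if len(new_points) == min(batch_size, n_max - len(X)):
                break
        if not new_points:
            break

        # Evaluate the new batch with the (placeholder) simulation and extend the dataset.
        X_new = np.vstack(new_points)
        X = np.vstack([X, X_new])
        y = np.concatenate([y, expensive_simulation(X_new)])

    return X, y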
The pipeline is tested using a simulation setup in which a Total HUman Model for Safety (THUMS) head impacts a windshield. A comparison of the information density of datasets generated by different adaptive sampling strategies and by conventional one-shot sampling procedures is conducted. In addition, potential improvements to the pipeline’s modules are discussed, including the initial batch size determination, the handling of multiple response functions and the enhancement of data quality metrics.
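Under the same assumptions as above, such a comparison against a conventional one-shot design could be sketched as follows; the cross-validation RMSE of the metamodel is used here as an assumed proxy for the information density of a dataset, since the abstract does not specify the exact metric.

import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.model_selection import cross_val_score

def cv_rmse(X, y):
    # Cross-validation RMSE of the metamodel, used as a proxy for the
    # information density of a dataset (assumed metric, not from the study).
    scores = cross_val_score(GaussianProcessRegressor(normalize_y=True), X, y,
                             scoring="neg_root_mean_squared_error", cv=5)
    return -scores.mean()

# adaptive_sampling and expensive_simulation are taken from the sketch above.
X_adapt, y_adapt = adaptive_sampling(n_max=60)
X_oneshot = qmc.LatinHypercube(d=2, seed=1).random(len(X_adapt))
y_oneshot = expensive_simulation(X_oneshot)

print("adaptive sampling  CV-RMSE:", cv_rmse(X_adapt, y_adapt))
print("one-shot sampling  CV-RMSE:", cv_rmse(X_oneshot, y_oneshot))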

Document Details

Reference: NWC23-0266-extendedabstract
Authors: Ballal, N.; Soot, T.; Dlugosch, M.
Language: English
Type: Extended Abstract
Date: 17th May 2023
Organisations: Fraunhofer EMI
Region: Global
