Motivated by the recent success stories of ChatGPT in many domains, we first pose the question: “Can safety analysis actually make use of LLMs?” To answer it, we conducted a case study of applying ChatGPT in the STPA for an AEB system.
We use the STPA results obtained by human safety experts as our baselines (published in "Comparison of the HAZOP, FMEA, FRAM, and STPA Methods for the Hazard Analysis of Automatic Emergency Brake Systems" and "System-Theoretic Process Analysis (STPA) of Demand-Side Load Management in Smartgrids"). We kept the same application scenarios as in the original papers, namely the AEB system and the DSM system. For the subsequent comparisons, we adopted the same methodology as the original studies.
Control loop structures of three complexity levels for the two baselines, AEB (first row) and DSM (second row) systems.
We first define three levels of abstraction, ranging from coarse to fine-grained, that represent how human experts may interact with ChatGPT: Workflow Level, Semantics Level and Syntax Level. We then frame the following research questions (RQs):
- RQ1 (Collaboration Scheme): How do various collaboration schemes of integrating ChatGPT into STPA affect the effectiveness and usability of STPA?
- RQ2 (Semantic Complexity): To what extent do variations in semantic complexity of individual input questions to ChatGPT affect the correctness and pertinence of STPA results?
- RQ3 (Prompt Guideline): Does the utilisation of syntactic-level prompt guidelines affect the correctness and pertinence of STPA results?
We consider three collaboration schemes for incorporating ChatGPT into the STPA workflow in the case studies.
Three ways of incorporating ChatGPT into the workflow by which human safety experts perform STPA: (a) one-off simplex collaboration, (b) recurring simplex collaboration, (c) recurring duplex collaboration.
The Venn diagram of the sets of UCAs for the AEB system and the DSM system. The colours represent the baseline (green), the one-off simplex collaboration case (yellow), the recurring simplex collaboration case (blue) and the recurring duplex collaboration case (orange), respectively.
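The comparison shown in the Venn diagram comes down to set operations on the UCAs found by each approach. The following is a minimal sketch, not the procedure used in the paper: the function name `venn_regions` and the UCA identifiers are hypothetical placeholders, and it assumes each UCA has already been mapped to a comparable identifier.

```python
def venn_regions(baseline: set[str], scheme: set[str]) -> dict[str, set[str]]:
    """Split two UCA sets into the three regions of a two-set Venn diagram."""
    return {
        "shared": baseline & scheme,         # found by experts and by the scheme
        "baseline_only": baseline - scheme,  # found only by human experts
        "scheme_only": scheme - baseline,    # found only with ChatGPT involved
    }


# Hypothetical placeholder identifiers, not results from the paper.
baseline = {"UCA-1", "UCA-2", "UCA-3"}
one_off_simplex = {"UCA-2", "UCA-3", "UCA-4"}

regions = venn_regions(baseline, one_off_simplex)
for name, members in regions.items():
    print(name, sorted(members))
```

The same function could be applied to each collaboration scheme in turn to obtain the pairwise overlaps with the baseline.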
Box and whisker plots of samples for RQ2 (Left: Number of correct UCAs across 3 groups of samples. Right: Proportion of correct UCAs across 3 groups of samples.)
Box and whisker plots of samples for RQ3 (Left: Number of correct UCAs across 3 groups of samples. Right: Proportion of correct UCAs across 3 groups of samples.)
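The two quantities plotted for RQ2 and RQ3, the number and the proportion of correct UCAs per sample, could be computed along the following lines. This is a sketch under stated assumptions: the `(n_correct, n_total)` tuple layout, the function name `summarise` and the sample values are hypothetical and do not come from the paper's data.

```python
from statistics import median


def summarise(samples: list[tuple[int, int]]) -> dict[str, float]:
    """Summarise one group of samples.

    Each sample is (n_correct, n_total): the number of correct UCAs and the
    total number of UCAs returned in one ChatGPT interaction (assumed layout).
    """
    counts = [correct for correct, _ in samples]
    # Skip interactions that returned no UCAs to avoid dividing by zero.
    proportions = [correct / total for correct, total in samples if total > 0]
    return {
        "median_count": median(counts),
        "median_proportion": median(proportions),
    }


# Hypothetical placeholder values, not results from the paper.
group = [(3, 4), (2, 4), (4, 5)]
summary = summarise(group)
print(summary)
```

Only the median is computed here; a full box and whisker plot would additionally need the quartiles and the extremes of each group.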
Four-quadrant classification of risks with corresponding mitigation strategies.
The RQ2_RQ3_val_pdf folder contains the .pdf files that need to be independently reviewed. They contain all the results of RQ2 and RQ3. A black box in a file marks an interaction in which incorrect UCAs were identified. In this paper, a 'correct' UCA is defined as an answer that is both correct and useful.
- Part of the original ChatGPT responses for each case (.mhtml files) can be downloaded and opened locally in a web browser.
- In this paper, we used the Plus version of ChatGPT (GPT-4) provided by the official source.