YiQi0318/ChatGPT-STPA

Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT

Description

Impressed by the recent success stories of ChatGPT in many domains, we first pose the question: “Can safety analysis actually make use of LLMs?” To answer it, we conducted a case study applying ChatGPT to System-Theoretic Process Analysis (STPA) of an Automatic Emergency Brake (AEB) system.

Baseline Examples

We use the STPA results obtained by human safety experts as our baselines (published in “Comparison of the HAZOP, FMEA, FRAM, and STPA Methods for the Hazard Analysis of Automatic Emergency Brake Systems” and “System-Theoretic Process Analysis (STPA) of Demand-Side Load Management in Smartgrids”). We kept the same application scenarios as in the original papers, namely the AEB and demand-side management (DSM) systems. For subsequent comparisons, we adopted the same methodology as the original studies.

Control Loop Structure

Control loop structures at three complexity levels for the two baselines: the AEB system (first row) and the DSM system (second row).

Research Questions

We first define three levels of abstraction, ranging from coarse to fine-grained, that represent how human experts may interact with ChatGPT: Workflow Level, Semantics Level and Syntax Level. We then frame the following research questions (RQs):

  • RQ1 (Collaboration Scheme): How do various collaboration schemes of integrating ChatGPT into STPA affect the effectiveness and usability of STPA?
  • RQ2 (Semantic Complexity): To what extent do variations in semantic complexity of individual input questions to ChatGPT affect the correctness and pertinence of STPA results?
  • RQ3 (Prompt Guideline): Does the utilisation of syntactic-level prompt guidelines affect the correctness and pertinence of STPA results?

Answer to RQ1

We consider three collaboration schemes for incorporating ChatGPT into the STPA workflow in the case studies.

Three ways of incorporating ChatGPT into the workflow that human safety experts follow when performing STPA: (a) one-off simplex collaboration, (b) recurring simplex collaboration, (c) recurring duplex collaboration.

Venn diagrams of the sets of unsafe control actions (UCAs) for the AEB system and the DSM system. The colours represent the baseline (green), the one-off simplex collaboration case (yellow), the recurring simplex collaboration case (blue) and the recurring duplex collaboration case (orange).
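The Venn-diagram comparison amounts to set operations between the baseline UCAs and the UCAs produced under each collaboration scheme. The sketch below illustrates this idea only; the UCA identifiers and the `compare_to_baseline` helper are hypothetical placeholders, not data or code from the paper.

```python
# Hypothetical UCA identifiers, used only to illustrate the comparison;
# the actual UCA sets are reported in the paper's results.
baseline = {"UCA-1", "UCA-2", "UCA-3", "UCA-4"}
schemes = {
    "one-off simplex": {"UCA-1", "UCA-5"},
    "recurring simplex": {"UCA-1", "UCA-2", "UCA-6"},
    "recurring duplex": {"UCA-1", "UCA-2", "UCA-3", "UCA-7"},
}


def compare_to_baseline(baseline, candidate):
    """Return the three Venn regions for one baseline/candidate pair."""
    return {
        "shared": baseline & candidate,          # found by experts and with ChatGPT
        "baseline_only": baseline - candidate,   # missed when using ChatGPT
        "candidate_only": candidate - baseline,  # new relative to the baseline
    }


regions = {name: compare_to_baseline(baseline, ucas) for name, ucas in schemes.items()}

for name, parts in regions.items():
    print(name, {key: sorted(value) for key, value in parts.items()})
```

UCAs that appear only on the ChatGPT side are not automatically valid; they still need review by a human expert before being counted as correct.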

Answer to RQ2

Box and whisker plots of samples for RQ2 (Left: Number of correct UCAs across 3 groups of samples. Right: Proportion of correct UCAs across 3 groups of samples.)
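The two quantities plotted here, the number and the proportion of correct UCAs per sample, can be computed from reviewed responses roughly as follows. This is a minimal sketch under stated assumptions: the group names, the counts, and the `score` and `box_stats` helpers are hypothetical, and the quartile method is a choice made for this example, not necessarily the one used for the paper's plots.

```python
from statistics import quantiles

# Hypothetical reviewed samples: each tuple is (correct UCAs, total UCAs)
# for one ChatGPT response. These are illustrative numbers, not paper data.
groups = {
    "group 1": [(3, 5), (4, 6), (2, 5), (5, 7)],
    "group 2": [(2, 6), (3, 6), (1, 4), (2, 5)],
    "group 3": [(1, 5), (2, 7), (1, 6), (0, 4)],
}


def score(sample):
    """Return (number correct, proportion correct) for one sample."""
    correct, total = sample
    return correct, (correct / total if total else 0.0)


def box_stats(values):
    """Five-number summary of the kind a box-and-whisker plot displays."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


for name, samples in groups.items():
    counts = [score(s)[0] for s in samples]
    proportions = [score(s)[1] for s in samples]
    print(name, "count:", box_stats(counts), "proportion:", box_stats(proportions))
```

Reporting both quantities matters because a response can list many UCAs yet have a low proportion of correct ones, or the reverse.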

Answer to RQ3

Box and whisker plots of samples for RQ3 (Left: Number of correct UCAs across 3 groups of samples. Right: Proportion of correct UCAs across 3 groups of samples.)

Discussion

Four-quadrant classification of risks and their mitigations.

Independent Review

The RQ2_RQ3_val_pdf folder contains the .pdf files that need to be independently reviewed. They contain all the results for RQ2 and RQ3. A black box in a file marks an interaction in which incorrect UCAs were identified. In this paper, a 'correct' UCA is defined as an answer that is both correct and useful.

Note

  • Some of the original ChatGPT responses for each case (.mhtml) can be downloaded and opened locally in a web browser.
  • In this paper we used the official Plus version of ChatGPT (GPT-4).