This repository contains the implementation and evaluation program for our ICSE'2024 paper "TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts".
Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, which provides significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually. Hence, many approaches for automated traceability have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts at different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have a low textual similarity to a Java class). In this work, we leverage consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery.
- We first extract and filter biterms from all source, intermediate, and target artifacts.
- We then use the consensual biterms from the intermediate artifacts to enrich the texts of both source and target artifacts.
- We finally deduce outer- and inner-transitive links to adjust the text similarities between source and target artifacts.
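To make the idea of an outer-transitive link (S→I→T) concrete, here is a minimal sketch of how a source-target similarity could be adjusted through an intermediate layer. It is an illustration only, not the TRIAD algorithm: the class name, the method name, and the scoring rule (product of the two hops, keeping the larger of the direct and transitive scores) are all assumptions made for this example. See `OuterTransitive.java` for the actual strategy.

```java
// Hypothetical illustration; not the code shipped in this repository.
final class OuterTransitiveSketch {

    /**
     * Adjusts source-target similarities using an intermediate artifact layer.
     *
     * @param st direct similarities, st[s][t]
     * @param si source-to-intermediate similarities, si[s][i]
     * @param it intermediate-to-target similarities, it[i][t]
     * @return a new matrix with the adjusted similarities
     */
    public static double[][] adjust(double[][] st, double[][] si, double[][] it) {
        int nS = st.length;
        int nT = nS == 0 ? 0 : st[0].length;
        int nI = it.length;
        double[][] adjusted = new double[nS][nT];
        for (int s = 0; s < nS; s++) {
            for (int t = 0; t < nT; t++) {
                // Strongest two-hop path S -> I -> T (assumed scoring: product of hops).
                double bestPath = 0.0;
                for (int i = 0; i < nI; i++) {
                    bestPath = Math.max(bestPath, si[s][i] * it[i][t]);
                }
                // Assumed combination rule: never lower the direct similarity.
                adjusted[s][t] = Math.max(st[s][t], bestPath);
            }
        }
        return adjusted;
    }
}
```

With this rule, a requirement that shares few words with a class can still rank highly if both are strongly similar to the same intermediate artifact.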
We conducted a comprehensive empirical evaluation on five systems widely used in the literature. The results show that our approach outperforms four state-of-the-art approaches by more than 15% in AP and more than 10% in MAP on average, and reveal how its performance is affected by different conditions of source, intermediate, and target artifacts.
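For readers unfamiliar with AP and MAP, the following is a minimal sketch based on the standard IR definition of average precision. The class and method names are hypothetical, and the repository's own evaluation code may compute the metrics differently (for example, AP over one global ranking of all candidate links). The sketch assumes each ranked list contains every true link for its query.

```java
import java.util.List;

// Hypothetical illustration of the metrics; not the evaluation code of this repository.
final class RankingMetricsSketch {

    /**
     * Average precision of one ranked list.
     * relevant[k] is true when the candidate link at rank k+1 is a true link.
     */
    public static double averagePrecision(boolean[] relevant) {
        int hits = 0;
        double precisionSum = 0.0;
        for (int k = 0; k < relevant.length; k++) {
            if (relevant[k]) {
                hits++;
                // Precision at the rank of each true link.
                precisionSum += (double) hits / (k + 1);
            }
        }
        return hits == 0 ? 0.0 : precisionSum / hits;
    }

    /** Mean of the per-query average precisions (one ranked list per source artifact). */
    public static double meanAveragePrecision(List<boolean[]> rankedListsPerQuery) {
        if (rankedListsPerQuery.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (boolean[] ranked : rankedListsPerQuery) {
            total += averagePrecision(ranked);
        }
        return total / rankedListsPerQuery.size();
    }
}
```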
- Java version 11
- Dependency management with Maven
- run `main()` in `src/main/java/RunWithBaseline.java`
- set the evaluated project via the `projectEnum` parameter
- set the evaluated IR model via the `irEnum` parameter
- the four baselines are IR-ONLY, TAROT, LIA, and COMET
Running for RQ2: What is the individual impact of biterms, outer-transitive links, and inner-transitive links on performance?
- run `main()` in `src/main/java/RunTRIAD.java`
- set the evaluated project via the `projectEnum` parameter
- set the evaluated IR model via the `irEnum` parameter
├── RunWithBaselines.java <- Run result for RQ1.
│
├── RunTRIAD.java <- Run result for RQ2.
│
├── approach <- TRIAD and four approaches.
│ ├── TRAID.java <- Implementation of TRIAD.
│ ├── TRAID_NoBiterm.java <- Implementation of TRIAD without biterms.
│ ├── TAROT.java <- Implementation of TAROT.
│ ├── COMET.java <- Implementation of COMET.
│ └── LIA.java <- Implementation of LIA.
│
├── experiment <- Contains all information about the experiment.
│ ├── preprocess <- Preprocesses datasets, including text preprocessing and biterm extraction.
│ ├── project <- Information of evaluated projects.
│ ├── transitive <- Two types of transitive strategies.
│ │ ├── OuterTransitive.java <- Only consider outer-transitive links (e.g., S1→I1→T1).
│ │ └── OuterInnerTransitive.java <- Consider outer-inner combined transitive links (e.g., S1→S2→I1→T1 and S1→I1→I2→T1).
│ ├── enum <- Enum types used in this project.
│ └── Result.java <- Result of each approach.
│
├── model <- Three IR models (i.e., VSM, LSI, and JSD).
│ ├── VSM <- VSM model.
│ ├── LSI <- LSI model.
│ └── JSD <- JSD model.
│
├── document <- Model artifacts and links into entity classes.
│
└── util <- Utility classes used in the project.
Overview of the five evaluated systems:
Dataset | Source | Intermediate | Target | S→I | I→T | S→T |
---|---|---|---|---|---|---|
Dronology | Requirement:58 | Design Definitions:144 | Source Code:184 | Req→DD:132 | DD→Src:563 | Req→Src:393 |
WARC | Non-Func. Reqs:21 | Specifications:89 | Func. Reqs:42 | NFR→SRS:58 | SRS→FRS:78 | NFR→FRS:45 |
EasyClinic | Use Case:30 | Interaction Descr.:20 | Code Descr.:47 | UC→ID:132 | ID→CD:563 | UC→CD:393 |
EBT | Requirement:44 | Test Case Descr.:25 | Source Code:50 | Req→TC:51 | TC→Src:93 | Req→Src:98 |
LibEST | Requirement:52 | Test Code:21 | Source Code:14 | Req→Test:352 | Test→Src:108 | Req→Src:204 |
Step1: add an entry for the project in `src/main/java/experiment/enums/ProjectEnum.java`
Step2: create an entity class for the project in `src/main/java/experiment/project`
Step3: create folders for the project in `dataset` and copy the artifacts into them
Step4: preprocess artifacts
Step5: extract biterms
- Extracting biterms from artifacts written in natural language (refer to WARC)
- Extracting biterms from artifacts written in a programming language (refer to Dronology and LibEST)
  - We only provide implementations for extracting biterms from C (i.e., LibEST) and Java (i.e., Dronology) code. If you want to extract biterms from another programming language, you can take the following steps:
    - parse the code files with an available parser to get identifier names (i.e., class name, method name, invoked method name, field name and its type, and parameter name and its type) and comments;
    - extract candidate biterms from identifier names by combining any two split terms sequentially.
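The identifier-based step can be sketched as follows. This is a minimal, hedged example rather than the extractor used in this repository: the class and method names are hypothetical, identifiers are split on camelCase and underscore boundaries with regular expressions, and "combining any two split terms sequentially" is interpreted here as every pair of terms in their original order. If only adjacent terms should be paired, restrict the inner loop to `j == i + 1`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not the biterm extractor shipped in this repository.
final class IdentifierBitermSketch {

    /** Splits an identifier such as "getFlightPlan" or "est_client_init" into lowercase terms. */
    public static List<String> splitIdentifier(String identifier) {
        String spaced = identifier
                .replaceAll("([a-z0-9])([A-Z])", "$1 $2")      // fooBar  -> foo Bar
                .replaceAll("([A-Z]+)([A-Z][a-z])", "$1 $2")   // XMLFile -> XML File
                .replaceAll("[_\\W]+", " ");                    // underscores, punctuation
        List<String> terms = new ArrayList<>();
        for (String term : spaced.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                terms.add(term.toLowerCase());
            }
        }
        return terms;
    }

    /** Builds candidate biterms by pairing terms while preserving their order (assumed reading). */
    public static List<String> candidateBiterms(String identifier) {
        List<String> terms = splitIdentifier(identifier);
        List<String> biterms = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                biterms.add(terms.get(i) + " " + terms.get(j));
            }
        }
        return biterms;
    }
}
```

For example, `candidateBiterms("getFlightPlan")` yields the candidates `get flight`, `get plan`, and `flight plan`. The identifier names themselves would come from whichever parser you choose for the target language.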
Step6: run TRIAD