If you want to use this code, please cite our article describing this solution:

IEEE style

W. Brach, K. Košťál and M. Ries, "Can Large Language Model Detect Plagiarism in Source Code?," 2024 IEEE International Conference on Foundation and Large Language Models (FLLM2024), Dubai, United Arab Emirates, 2024, pp. 1-8.

LLM-plagiarism-check

This project implements source code plagiarism detection with Large Language Models (LLMs), built on the DSPy framework. Given two input code files, the system decides whether plagiarism has occurred and explains its decision.
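The task has a simple shape: two source files go in, a verdict and an explanation come out. The sketch below illustrates that contract in plain Python without any LLM call. The names `PlagiarismResult`, `build_prompt`, and `parse_response`, as well as the `Verdict:` / `Explanation:` reply format, are hypothetical and chosen only for illustration; they are not the DSPy programs defined in models.py.

```python
from dataclasses import dataclass


@dataclass
class PlagiarismResult:
    """Outcome of comparing two source files (hypothetical structure)."""
    plagiarized: bool
    explanation: str


def build_prompt(code_a: str, code_b: str) -> str:
    """Assemble a comparison prompt for two source files (assumed format)."""
    return (
        "Compare the two source files and decide whether one plagiarizes the other.\n"
        "Reply with 'Verdict: yes' or 'Verdict: no', then 'Explanation: <reason>'.\n\n"
        f"--- File A ---\n{code_a}\n\n--- File B ---\n{code_b}\n"
    )


def parse_response(text: str) -> PlagiarismResult:
    """Parse a reply written in the assumed 'Verdict:' / 'Explanation:' format."""
    plagiarized = False
    explanation = ""
    for line in text.splitlines():
        # Split on the first colon only, so explanations may contain colons.
        key, _, value = line.partition(":")
        key = key.strip().lower()
        if key == "verdict":
            plagiarized = value.strip().lower().startswith("y")
        elif key == "explanation":
            explanation = value.strip()
    return PlagiarismResult(plagiarized, explanation)
```

In this sketch the text returned by `build_prompt` would be sent to whichever model is in use, and the model's reply passed to `parse_response`. A DSPy program would normally declare these inputs and outputs as typed fields instead of parsing free text by hand.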

Installation

# Clone the repository
git clone https://github.com/fiit-ba/LLM-plagiarism-check.git
cd LLM-plagiarism-check

# Create a virtual environment
python3 -m venv llm-plagiarism-check

# Activate the virtual environment
source llm-plagiarism-check/bin/activate

# Install the required packages
pip install -r requirements.txt

Usage

The project consists of the following components, each with a specific role in our research workflow:

Jupyter Notebooks

  • check.ipynb: This is where we compile and train our DSPy programs.
  • eval.ipynb: Use this notebook to evaluate the performance of our DSPy programs.
  • jplag.ipynb: Run this to calculate the JPlag benchmark.
  • analysis.ipynb: This notebook contains all our plots and analysis of results.
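Since the detector outputs a yes/no verdict per pair of files, its predictions can be scored like any binary classifier. The README does not state which metrics eval.ipynb reports, so the following is only a minimal sketch assuming the common choice of accuracy, precision, recall, and F1; `binary_metrics` is a hypothetical helper, not a function from this repository.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Score predicted plagiarism verdicts against ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # plagiarism correctly flagged
    fp = sum(1 for t, p in pairs if not t and p)      # original work wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # plagiarism missed
    tn = sum(1 for t, p in pairs if not t and not p)  # original work correctly cleared

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, `binary_metrics([True, True, False, False], [True, False, True, False])` yields 0.5 for all four metrics, because each of the four outcome types occurs exactly once.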

Python Scripts

  • dataloader.py: Provides support for loading our research data.
  • models.py: Contains the model definitions for our DSPy programs.

Data Directories

  • data/IR-Plag-Dataset/: This directory contains our plagiarism dataset, sourced from the IR-Plag-Dataset GitHub repository.
  • data/jplag/: Used for the JPlag benchmark calculations.
  • data/metadata/: Stores metadata for our DSPy programs.
  • data/results/: Where we save our research results.
  • data/train.tsv: Our training dataset for DSPy.
  • programs/: Contains DSPy programs.
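A file such as data/train.tsv can be inspected with the standard library alone. The sketch below assumes the file is tab-separated with a header row; the README does not document its columns, so the code reads whatever column names are present. `load_tsv` is a hypothetical helper and not the interface of dataloader.py.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_tsv(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file with a header row into a list of row dicts."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)


if __name__ == "__main__":
    # Path taken from the directory listing above; run from the repository root.
    rows = load_tsv("data/train.tsv")
    print(f"{len(rows)} rows")
    if rows:
        print("columns:", list(rows[0].keys()))
```

Printing the column names first is a quick way to see how the training pairs and labels are laid out before opening check.ipynb.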

Citation

Contact

William Brach - @williambrach - william.brach@stuba.sk