GPTSniffer

This repository contains the source code implementation of GPTSniffer and the datasets used to replicate the experimental results of our paper:

GPTSniffer: A CodeBERT-based classifier to detect source code written by ChatGPT

Phuong T. Nguyen, Juri Di Rocco, Claudio Di Sipio, Riccardo Rubei, Davide Di Ruscio (1), Massimiliano Di Penta (2)

(1) Università degli Studi dell'Aquila, Italy

(2) Università degli Studi del Sannio, Italy

The paper is published in the Journal of Systems and Software (JSS) and is available as open access; see the URL in the BibTeX entry under "How to cite".

Abstract

Since its launch in November 2022, ChatGPT has attracted attention from many users. In particular, programmers have started to use it to get help with a range of development problems. For example, ChatGPT can provide users with code snippets that fulfill a given development task. However, while it offers a practical solution to programming problems, ChatGPT should be used mainly as a supporting tool rather than as a replacement for the human being. This is particularly crucial in software education, where students are not expected to rely entirely on ChatGPT or similar systems to complete their assignments. This triggers the need to detect source code written by ChatGPT automatically. While tools to identify content generated by AI exist, they need to be adapted to work properly on source code. In this paper, we conceptualize GPTSniffer as a novel approach to the detection of source code written by ChatGPT. GPTSniffer is built on top of CodeBERT, a model pre-trained on large corpora of source code. We collected a dataset consisting of code written by humans and then queried ChatGPT to get snippets generated by the platform. Finally, we present an empirical evaluation on the collected datasets and compare GPTSniffer with two existing baselines. The experimental results show that GPTSniffer can give precise classification for several testing instances. More importantly, it outperforms both baselines with respect to prediction accuracy.

Introduction

ChatGPT is a generative Artificial Intelligence (AI) tool that can produce convincingly human-like answers to queries from users. Since its public release on November 30, 2022, the system has become a phenomenon, garnering attention from both expert and non-expert users worldwide and reaching one million users only five days after launch. ChatGPT rose to fame thanks to its ability to provide human-like answers and to maintain a thread of conversation in a natural way.

One of the areas in which ChatGPT appears to be particularly promising and fascinating is its ability to support developers in a variety of tasks, ranging from writing source code that fulfills a given (natural language) specification to creating a software architecture or design, generating tests, or fixing a bug.

Leveraging ChatGPT to get recommendations for source code solutions is becoming very popular among developers. This does not happen without risks, as it has been shown that ChatGPT could provide vulnerable code, and also there is a wide discussion about possible copyright and licensing infringements related to reusing its recommended code.

When ChatGPT or generic code recommenders are used by students during their learning processes, questions about risks and benefits arise, and this has triggered considerable discussion among educators. On the positive side, code snippets generated by ChatGPT provide students with a practical way to complete their assignments. Also, several educators believe that one of the skills of developers working with state-of-the-practice technology is the ability to retrieve, review, and integrate pieces of software. At the same time, one major risk is that students would not develop some essential skills that can be acquired only through self-learning, e.g., critical thinking and problem-solving. Moreover, handing in code written entirely by ChatGPT, i.e., without any concrete work of one's own, can be considered a form of fraud. Such behavior raises ethical concerns, as students have their work done without actually performing their own research.

Lately, as a precautionary measure, some universities in different countries have even imposed a ban on ChatGPT, prohibiting their students from using the system to generate solutions for homework or to compose essays. In the Software Engineering community, on the one hand, we advocate for the democratized use of AI tools to ease daily programming tasks, thus improving working performance. On the other hand, we believe that it is necessary to recognize when a source code element has been written by AI, essentially for two reasons: (i) from the professional development side, to deal with security and legal problems; and (ii) from the educational side, to cope with cheating and plagiarism.

Recently, GPTZero has been developed as one of the first systems to automatically help users recognize whether a text is written by OpenAI technologies. Interestingly, after several attempts at experimenting with the platform, we realized that it is not good at distinguishing between source code written by humans and by machines. We suppose that the underpinning engine has been trained on natural language text rather than source code. Altogether, this calls for proper tools to identify the real author of a code snippet. This paper proposes GPTSniffer, a machine learning solution to determine whether a piece of source code has been generated by ChatGPT. The classification engine is based on CodeBERT, a pre-trained model built on top of a code search dataset, i.e., CodeSearchNet. While it has been widely used in Software Engineering research to provide code recommendations, CodeBERT has never been applied to the detection of source code written by ChatGPT.
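As a rough illustration of the idea described above, the following is a minimal sketch of how a CodeBERT-based binary classifier could be set up with the Hugging Face `transformers` library. It is not the implementation shipped in this repository: the label names, the label order, and the helper functions `load_classifier`, `logits_to_label`, and `classify` are assumptions made for this example, and the classification head must be fine-tuned on labelled snippets before its output is meaningful.

```python
from typing import Sequence

# Assumed label convention for this sketch; the repository may use another one.
LABELS = ("human", "chatgpt")

# Public CodeBERT checkpoint on the Hugging Face Hub.
CHECKPOINT = "microsoft/codebert-base"


def load_classifier(checkpoint: str = CHECKPOINT):
    """Load the tokenizer and CodeBERT with a fresh two-class head.

    The head is randomly initialised, so the model has to be fine-tuned
    on labelled snippets before its predictions mean anything.
    """
    # Imported lazily so the pure helper below works without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(LABELS)
    )
    return tokenizer, model


def logits_to_label(logits: Sequence[float]) -> str:
    """Map one row of class scores to its label via argmax."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return LABELS[best]


def classify(snippet: str, tokenizer, model) -> str:
    """Classify a single code snippet with an already fine-tuned model."""
    import torch

    # CodeBERT accepts at most 512 tokens, so longer snippets are truncated.
    inputs = tokenizer(
        snippet, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0]
    return logits_to_label(logits.tolist())
```

After fine-tuning, a call such as `classify(code_string, tokenizer, model)` would return one of the two assumed labels. Truncating to 512 tokens is a simple choice for this sketch; long files could instead be split into chunks and classified piecewise.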

Configuration pictures

We used eight experimental configurations. They are not exhaustive, as we cannot consider all possible combinations of artifacts, so we focus only on the most representative and realistic ones. Due to space limits, the paper does not include figures illustrating the code examples for C7 and C8. We report the missing configurations below:

  • Example C7: [figure: C7]

  • Example C8: [figure: C8]

How to cite

If you find our work useful for your research, please cite the paper using the following BibTeX entry:

@article{NGUYEN2024112059,
title = {GPTSniffer: A CodeBERT-based classifier to detect source code written by ChatGPT},
journal = {Journal of Systems and Software},
pages = {112059},
year = {2024},
issn = {0164-1212},
doi = {https://doi.org/10.1016/j.jss.2024.112059},
url = {https://www.sciencedirect.com/science/article/pii/S0164121224001043},
author = {Phuong T. Nguyen and Juri {Di Rocco} and Claudio {Di Sipio} and Riccardo Rubei and Davide {Di Ruscio} and Massimiliano {Di Penta}},
keywords = {ChatGPT, Code classification, CodeBERT, Pre-trained Models},
abstract = {Since its launch in November 2022, ChatGPT has gained popularity among users, especially programmers who use it to solve development issues. However, while offering a practical solution to programming problems, ChatGPT should be used primarily as a supporting tool (e.g., in software education) rather than as a replacement for humans. Thus, detecting automatically generated source code by ChatGPT is necessary, and tools for identifying AI-generated content need to be adapted to work effectively with code. This paper presents GPTSniffer– a novel approach to the detection of source code written by AI–built on top of CodeBERT. We conducted an empirical study to investigate the feasibility of automated identification of AI-generated code, and the factors that influence this ability. The results show that GPTSniffer can accurately classify whether code is human-written or AI-generated, outperforming two baselines, GPTZero and OpenAI Text Classifier. Also, the study shows how similar training data or a classification context with paired snippets helps boost the prediction. We conclude that GPTSniffer can be leveraged in different contexts, e.g., in software engineering education, where teachers use the tool to detect cheating and plagiarism, or in development, where AI-generated code may require peculiar quality assurance activities.}
}

Troubleshooting

If you encounter any difficulties in working with the tool or the datasets, please do not hesitate to contact us at one of the following emails: phuong.nguyen@univaq.it, juri.dirocco@univaq.it. We will try our best to answer you as soon as possible.
