Red Plag

Red Plag is a plagiarism checker that lets users detect source code plagiarism and locate the plagiarized instances within code files, pairwise. The frontend is implemented in Angular, and the backend in Django with the Django REST framework.

It gives a visual measure of plagiarized content using scatter plots and highlights the similar blocks between each pair of files. It also implements language-specific processing to make the checker more robust for Java.

Prerequisites

Before you begin, ensure you have installed the latest version of the following:

Python
Pygments, a Python syntax-highlighting package
Java
@angular/cli
Django
Django REST framework

Installing Red Plag

To install Red Plag, follow these steps:

Clone this GitHub repository to your local machine using:

$ git clone https://github.com/tantheta01/Plagiarism-Detection

Navigate to the PYTORCH directory and start the backend server using:

$ cd Plagiarism-Detection/PYTORCH/
$ python3 manage.py runserver

Navigate to the FrontEnd directory, install the dependencies, and start the frontend development server using:

$ cd ../FrontEnd/sim-check/
# install dependencies
$ npm install
$ ng serve --open

After compilation, the app opens in your browser on the local development server.

Using Red Plag

To use Red Plag, follow these steps:

Click the Let’s get started! button on the introduction page.
You will be routed to the login page. If you already have an account, log in with those credentials. Otherwise, click Don’t have an account? and sign up.
Once logged in, the main page provides options to change your password and to upload the code files.
Accepted file format for upload:
- tar file format
- on extraction, the tar file must contain two directories, code_files and stub_files.
- code_files contains all the code files for pairwise plagiarism detection.
- stub_files contains the stub code file(s) common to all the code files.
Upload the tar file in the specified format.
You will be navigated to the results page, which features the following:
- A scatter plot visualizing the high-dimensional signature vectors of the code files after dimensionality reduction with PCA.
- Highlighted similar blocks between each pair of files.
- A downloadable CSV file, pairwise_similarity.csv, in the format:

File 1	File 2	Similarity
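A minimal sketch of packing an upload in the accepted format, using Python's standard tarfile module. The function name build_submission_tar and the choice of an uncompressed .tar archive are illustrative assumptions, not part of Red Plag. From a shell, running tar -cf submission.tar code_files stub_files inside the same folder should produce an equivalent archive.

```python
import tarfile
from pathlib import Path


def build_submission_tar(root: Path, out_path: Path) -> None:
    """Hypothetical helper: pack root/code_files and root/stub_files into a tar.

    On extraction, the archive yields exactly the two top-level directories
    the upload page expects: code_files/ and stub_files/.
    """
    with tarfile.open(out_path, "w") as tar:  # "w" = uncompressed tar
        for name in ("code_files", "stub_files"):
            directory = root / name
            if not directory.is_dir():
                raise FileNotFoundError(f"missing required directory: {directory}")
            # arcname keeps paths relative, so members are stored as
            # code_files/... and stub_files/... rather than absolute paths.
            tar.add(directory, arcname=name)


if __name__ == "__main__":
    build_submission_tar(Path("."), Path("submission.tar"))
```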

Backend Endpoints

UserLogin: An endpoint for logging in. Returns the user's username and authentication token.
UserCreate: An endpoint for signing up. Returns the username and email of the new user.
PassChangeView: An endpoint for changing the password. Takes the user's old and new passwords (the user is authenticated by token). If the old password matches the user's current password, the password is updated.
FileUpload: An endpoint for uploading the tarball. Takes the tar file and the user's authentication token. Every file is linked to a user object; to keep uploads organized and to allow showing previous results, the file is saved at /media/<username>/<filename>.
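The per-user storage path described for FileUpload can be expressed as a Django upload_to callable, which Django calls with the model instance and the original filename and whose return value is treated as a path relative to MEDIA_ROOT. This is a sketch under assumptions: the name user_upload_path, the model shown in the comment, and a MEDIA_ROOT that corresponds to /media/ are hypothetical, since the README does not show the project's actual model.

```python
from types import SimpleNamespace


def user_upload_path(instance, filename: str) -> str:
    """Hypothetical upload_to callable: store uploads under <username>/<filename>.

    Django calls upload_to(instance, filename) and treats the returned
    path as relative to MEDIA_ROOT.
    """
    return f"{instance.user.username}/{filename}"


# Hypothetical model wiring (requires Django; field names are assumptions):
# class Upload(models.Model):
#     user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
#     file = models.FileField(upload_to=user_upload_path)

if __name__ == "__main__":
    # Stand-in object exposing only the attributes the callable reads.
    fake_upload = SimpleNamespace(user=SimpleNamespace(username="alice"))
    print(user_upload_path(fake_upload, "batch1.tar"))  # alice/batch1.tar
```

Using a callable rather than a fixed string lets the path depend on the uploading user, which is what keeps each user's previous results in a separate folder.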

Contributors

Contact

If you want to contact me, you can reach me at greettanay@gmail.com.

License

This project uses the following license: MIT.

Project Plan Ahead

In the second half of the evaluation, we will mainly work on implementing the core logic, following these steps in order.

We’ll start with text preprocessing: removing blank lines and tokenizing variable names.
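A minimal sketch of this preprocessing step for Java source. The README lists Pygments as a prerequisite, presumably for lexing; to stay dependency-free, this sketch swaps in a simple regular-expression tokenizer instead. Replacing every non-keyword identifier with the placeholder "V" is one common reading of "variable tokenization" and is an assumption here, as are the names preprocess and JAVA_KEYWORDS. Comments and string literals are not stripped.

```python
import re

# Subset of Java keywords kept verbatim; everything else that looks like an
# identifier is treated as a (renameable) variable name. Assumed list.
JAVA_KEYWORDS = {
    "abstract", "boolean", "break", "case", "catch", "char", "class", "continue",
    "default", "do", "double", "else", "extends", "final", "float", "for", "if",
    "implements", "import", "int", "interface", "long", "new", "package",
    "private", "protected", "public", "return", "static", "super", "switch",
    "this", "throw", "throws", "try", "void", "while",
}

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def preprocess(source: str) -> list[str]:
    """Drop blank lines and map each non-keyword identifier to "V"."""
    tokens = []
    for line in source.splitlines():
        if not line.strip():
            continue  # skip blank lines
        for tok in TOKEN_RE.findall(line):
            is_identifier = tok[0].isalpha() or tok[0] == "_"
            tokens.append("V" if is_identifier and tok not in JAVA_KEYWORDS else tok)
    return tokens


if __name__ == "__main__":
    print(preprocess("int x = 1;\n\nint y = 2;"))
```

Because renamed variables collapse to the same token, two snippets that differ only in identifier names produce identical token streams.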

Divide the document into n-grams and hash the n-grams.
Slide a window of fixed length over the hash values and select the minimum hash in each window (there are several subtle variations of this rule; one is stated here). If more than one hash has the minimum value, select the rightmost occurrence.
Save all the selected hashes as the document's signature vector.
Pad the signature vectors to a common length, normalize them, and measure their pairwise cosine similarity.
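The steps above (n-gram hashing, window minimum selection with rightmost tie-breaking, signature, cosine similarity) can be sketched as follows, ending with a writer for the pairwise_similarity.csv layout described earlier. This is a sketch, not the project's implementation: the function names, the defaults k=5 and w=4, and the SHA-1-based hash are assumptions. One swap is made on purpose: instead of zero-padding positional vectors of raw hash values, each signature is stored as fingerprint counts, and two files are compared on the union of their fingerprints with missing ones counted as zero. Comparing raw hash values position by position would not reflect shared content.

```python
import csv
import hashlib
import math
from collections import Counter
from itertools import combinations


def kgram_hashes(tokens: list[str], k: int = 5) -> list[int]:
    """Hash every run of k consecutive tokens with a stable 64-bit hash."""
    hashes = []
    for i in range(len(tokens) - k + 1):
        gram = " ".join(tokens[i:i + k]).encode("utf-8")
        # Built-in hash() is salted per process; hashlib gives reproducible values.
        hashes.append(int.from_bytes(hashlib.sha1(gram).digest()[:8], "big"))
    return hashes


def winnow(hashes: list[int], w: int = 4) -> list[int]:
    """Pick the minimum hash in each window of w hashes, rightmost on ties."""
    if not hashes:
        return []
    w = min(w, len(hashes))  # short documents form a single window
    selected, last_pos = [], -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        low = min(window)
        pos = start + max(i for i, h in enumerate(window) if h == low)
        if pos != last_pos:  # consecutive windows often pick the same position
            selected.append(hashes[pos])
            last_pos = pos
    return selected


def signature(tokens: list[str], k: int = 5, w: int = 4) -> Counter:
    """Signature as counts of the selected fingerprints."""
    return Counter(winnow(kgram_hashes(tokens, k), w))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity over the union of fingerprints (absent = 0)."""
    dot = sum(a[h] * b[h] for h in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def write_pairwise_csv(signatures: dict[str, Counter], path: str) -> None:
    """Write one row per file pair: File 1, File 2, Similarity."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["File 1", "File 2", "Similarity"])
        for a, b in combinations(sorted(signatures), 2):
            writer.writerow([a, b, f"{cosine(signatures[a], signatures[b]):.4f}"])
```

Keeping only one hash per window bounds the signature size while still guaranteeing that any sufficiently long shared token run contributes at least one common fingerprint.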
We are studying t-SNE and PCA for visualizing the results. t-Distributed Stochastic Neighbor Embedding (t-SNE) is an unsupervised, non-linear technique used primarily for data exploration and for visualizing high-dimensional data. In simpler terms, t-SNE gives you an intuition for how the data is arranged in a high-dimensional space.
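For the PCA scatter plot, a minimal NumPy sketch projects one row per file onto the first two principal components using an SVD of the mean-centered data. The input layout (one row per file, for example fingerprint counts aligned on a shared vocabulary) and the name pca_2d are assumptions. For t-SNE, scikit-learn's sklearn.manifold.TSNE accepts the same matrix through fit_transform.

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their first two principal components."""
    centered = vectors - vectors.mean(axis=0)  # PCA requires mean-centered data
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T  # (n_files, 2) coordinates for the scatter plot


if __name__ == "__main__":
    X = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
    print(pca_2d(X))
```

Each output row is a point in the scatter plot; files whose signatures are similar end up close together.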
