- Datasets
- Machine Learning Models
- Flask Deployment with Caching
- Webpage Interface
- Usage
- Results
- Contributing
- License
The project utilizes two separate datasets, each tailored for training a specific machine learning model.
The repository includes two machine learning models:
The first model is built with Python and popular libraries such as scikit-learn. It takes a supervised learning approach: the model learns from labeled examples and makes predictions from 29 features extracted from each URL.
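As a rough illustration of what URL features can look like, the sketch below computes a handful of lexical features with the standard library. The feature names and the `extract_url_features` helper are assumptions made for this example; they are not the 29 features the project actually uses.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Treat a dotted-quad host such as 192.168.0.1 as a raw IP address.
    host_is_ip = bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))
    return {
        "url_length": len(url),
        "num_dots": url.count("."),
        "num_hyphens": url.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip),
        # Subdomain count is only meaningful for named hosts.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
    }


features = extract_url_features("http://192.168.0.1/login-update@secure")
print(features)
```

Each URL becomes a fixed-length numeric record, which is the shape a scikit-learn classifier expects as input.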
The second model also uses Python and machine learning libraries, but it analyzes the text content of URLs. It extracts relevant features from the text and trains a separate classifier to detect phishing websites.
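One plausible first step for a text-based model is splitting a URL into word tokens. The sketch below does this with a regular expression only; the `tokenize_url` helper is hypothetical, and the project itself may rely on NLTK tokenizers or stemmers instead.

```python
import re


def tokenize_url(url: str) -> list[str]:
    """Split a URL into lowercase alphabetic word tokens."""
    # Drop the scheme so "http"/"https" do not dominate the vocabulary.
    without_scheme = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    # Keep runs of letters; digits and punctuation act as separators.
    return [tok.lower() for tok in re.findall(r"[A-Za-z]+", without_scheme)]


tokens = tokenize_url("https://secure-login.example.com/verify/account?id=42")
print(tokens)
# ['secure', 'login', 'example', 'com', 'verify', 'account', 'id']
```

Tokens like these can then be turned into numeric vectors (for example, word counts) before a classifier is trained on them.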
Training uses scikit-learn pipelines that chain custom transformers for preprocessing ahead of each model. Grid search with cross-validation tunes the hyperparameters to optimize model performance.
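A minimal sketch of this training setup is shown below. The `Log1pTransformer` class, the synthetic data, the logistic regression classifier, and the parameter grid are all assumptions chosen for illustration; the repository's actual transformers, estimators, and grids may differ.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class Log1pTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical custom transformer: compress large count features."""

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))


# Synthetic stand-in data: 200 samples, 5 non-negative count features.
rng = np.random.default_rng(0)
X = rng.integers(0, 50, size=(200, 5))
y = (X[:, 0] + X[:, 1] > 50).astype(int)

pipeline = Pipeline([
    ("log", Log1pTransformer()),
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Try each value of C with 5-fold cross-validation, scoring by F1.
search = GridSearchCV(
    pipeline,
    param_grid={"clf__C": [0.1, 1.0, 10.0]},
    cv=5,
    scoring="f1",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Wrapping the preprocessing in the pipeline means each cross-validation fold refits the transformers on its own training split, which avoids leaking information from the held-out fold.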
Each model is evaluated using metrics such as accuracy, precision, recall, and F1-score to assess its effectiveness in distinguishing between phishing and legitimate websites.
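To make the four metrics concrete, the sketch below computes them from raw predictions in plain Python. The labels are made-up toy data (1 = phishing, 0 = legitimate), not results from this project.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy example: 4 phishing URLs and 6 legitimate ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Precision penalizes legitimate sites flagged as phishing, while recall penalizes phishing sites that slip through, so reporting both alongside F1 gives a fuller picture than accuracy alone.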
Both machine learning models are deployed using Flask, a lightweight web framework for Python. The Flask app exposes endpoints to make predictions using the trained models. Additionally, caching to disk is implemented to improve performance by storing results of previous predictions.
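The idea behind caching predictions to disk can be sketched with the standard library alone. The `cached_predict` helper, the JSON file layout, and the `fake_predict` stand-in below are assumptions for illustration; the project may use a caching library or a different storage format.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_predict(url, predict_fn, cache_dir):
    """Return predict_fn(url), reusing a result saved on disk if present."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any input maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    result = predict_fn(url)
    cache_file.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []


def fake_predict(url):
    """Stand-in for a trained model; records each real invocation."""
    calls.append(url)
    return {"url": url, "phishing": "login" in url}


with tempfile.TemporaryDirectory() as tmp:
    first = cached_predict("http://example.com/login", fake_predict, tmp)
    second = cached_predict("http://example.com/login", fake_predict, tmp)

print(first == second, len(calls))
# True 1
```

The second lookup is served from the cache file, so the model runs only once per distinct URL. In a Flask app, a helper like this could be called from inside a prediction endpoint.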
The web interface is built using HTML, CSS, and Bootstrap to provide a user-friendly experience. Users can input a URL and receive predictions on whether it is a phishing website or not.
To use PhishShield, follow these steps:
- Clone the repository:
    git clone --depth=1 https://github.com/praneeth-katuri/PhishShield.git
- Install the required dependencies (Python version 3.12.3):
    pip install -r requirements.txt
- Run the NLTK setup script:
    python setup_nltk.py
- Edit the .env file and enter your reCAPTCHA keys and Flask secret key. To generate a Flask secret key, run the command below in a terminal and copy the output into the .env file:
    python -c 'import secrets; print(secrets.token_hex(16))'
- Start the Flask application by running the following command in your terminal:
    python app.py
- To access the webpage interface, open http://127.0.0.1:5000 in your web browser.
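Before starting the app, it can help to confirm that the .env file from the steps above has every value filled in. The sketch below parses simple KEY=VALUE text and reports empty or missing entries. The variable names in `REQUIRED` are hypothetical; check the repository's .env file for the real ones.

```python
import secrets


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values: dict, required: list) -> list:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


# Hypothetical variable names, used only for this example.
REQUIRED = ["RECAPTCHA_SITE_KEY", "RECAPTCHA_SECRET_KEY", "FLASK_SECRET_KEY"]

# token_hex(16) yields 16 random bytes as 32 hexadecimal characters.
sample_env = f"""
RECAPTCHA_SITE_KEY=your-site-key
RECAPTCHA_SECRET_KEY=
FLASK_SECRET_KEY={secrets.token_hex(16)}
"""

values = parse_env(sample_env)
print(missing_keys(values, REQUIRED))
# ['RECAPTCHA_SECRET_KEY']
```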
The performance of the phishing detection models is evaluated using metrics such as accuracy, precision, recall, and F1-score. The results demonstrate the effectiveness of each model in distinguishing between phishing and legitimate websites.
Contributions to this project are welcome! If you have ideas for improvements or new features, feel free to open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.