🐟 PhishNet

DISCLAIMER: The content provided by PhishNet is for educational and research purposes only. The training data for our GPT-2-derived model has been carefully cleaned to remove private or personally identifiable information (PII) for ethical compliance and privacy. The views and opinions expressed are solely those of the authors and do not reflect those of any associated organizations. No warranty is provided regarding the accuracy or reliability of the information. Use of PhishNet and its outputs is at your own risk, with no liability for any resulting damages. This project does not endorse illegal activities and should be used responsibly.

TL;DR

PhishNet is a research project that uses Reinforced Self-Training (ReST) and a fine-tuned GPT-2 model to create a high-quality synthetic dataset of phishing emails. The model is trained on several email datasets (see Citations); the project aims to explore adversarial AI and expand our understanding of AI safety.

Citations

Radford, A., Wu, J., Child, R., et al. (2019). Language Models are Unsupervised Multitask Learners. Link

@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}

Gulcehre, C., Le Paine, T., Srinivasan, S., et al. (2023). Reinforced Self-Training (ReST) for Language Modeling. arXiv preprint arXiv:2308.08998. Link

@misc{gulcehre2023reinforced,
      title={Reinforced Self-Training (ReST) for Language Modeling}, 
      author={Caglar Gulcehre and Tom Le Paine and Srivatsan Srinivasan and Ksenia Konyushkova and Lotte Weerts and Abhishek Sharma and Aditya Siddhant and Alex Ahern and Miaosen Wang and Chenjie Gu and Wolfgang Macherey and Arnaud Doucet and Orhan Firat and Nando de Freitas},
      year={2023},
      eprint={2308.08998},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

The Enron Email Dataset. Carnegie Mellon University. Link
The Enron Email Dataset. Kaggle. Link
Fraudulent Email Corpus. Kaggle. Link
Spam Mails Database. Kaggle. Link
Phishing Email Detection. Kaggle. Link
Customer Support Ticket Dataset. Kaggle. Link
Spam or Not Spam Dataset. Kaggle. Link

0x5844/PhishNet

Folders and files

Latest commit

History

Repository files navigation

🐟 PhishNet

TL;DR

Citations

Table Of Content

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Languages