
PhishNet is an experimental research project that implements Reinforced Self-Training (ReST), aligned with human-crafted instructions, and uses fine-tuned models to produce a high-quality synthetic dataset of phishing emails.


0x5844/PhishNet


🐟 PhishNet

PhishNet Art

DISCLAIMER: The content provided by PhishNet is for educational and research purposes ONLY. The training data for our GPT-2 derived model has been carefully cleaned to remove any private or personally identifiable information (PII) to ensure ethical compliance and privacy. The views and opinions expressed are solely those of the authors and do not reflect those of any associated organizations. No warranty is provided regarding the accuracy or reliability of the information. Use of PhishNet and its outputs is at your own risk, with no liability for any resulting damages. This project does not endorse illegal activities and should be used responsibly.

TL;DR

PhishNet is a research project that uses Reinforced Self-Training (ReST) and a fine-tuned GPT-2 model to create a high-quality synthetic dataset of phishing emails. The model is trained on a range of email datasets (see Citations). The project aims to explore adversarial AI and expand our understanding of AI safety.

Citations

  • Radford, A., Wu, J., Child, R., et al. (2019). Language Models are Unsupervised Multitask Learners.
@article{radford2019language,
  title={Language Models are Unsupervised Multitask Learners},
  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},
  year={2019}
}
  • Gulcehre, C., Le Paine, T., Srinivasan, S., et al. (2023). Reinforced Self-Training (ReST) for Language Modeling. arXiv preprint arXiv:2308.08998.
@misc{gulcehre2023reinforced,
      title={Reinforced Self-Training (ReST) for Language Modeling}, 
      author={Caglar Gulcehre and Tom Le Paine and Srivatsan Srinivasan and Ksenia Konyushkova and Lotte Weerts and Abhishek Sharma and Aditya Siddhant and Alex Ahern and Miaosen Wang and Chenjie Gu and Wolfgang Macherey and Arnaud Doucet and Orhan Firat and Nando de Freitas},
      year={2023},
      eprint={2308.08998},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
  • The Enron Email Dataset. Carnegie Mellon University.
  • The Enron Email Dataset. Kaggle.
  • Fraudulent Email Corpus. Kaggle.
  • Spam Mails Database. Kaggle.
  • Phishing Email Detection. Kaggle.
  • Customer Support Ticket Dataset. Kaggle.
  • Spam or Not Spam Dataset. Kaggle.
