Anomaly Detection on Enron Dataset

In this project, we build and train models using three machine learning algorithms commonly used for unsupervised anomaly detection, namely:

One-class Support Vector Machine (SVM)
Isolation Forest
Local Outlier Factor (LOF)
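
As a rough illustration of how these three algorithms can be applied side by side, the sketch below fits each one with scikit-learn on a small synthetic feature matrix. It is not the code from the notebook: the helper name `fit_detectors`, the `contamination` value, the other hyperparameters, and the synthetic data are all assumptions made for this example.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def fit_detectors(X, contamination=0.1, random_state=0):
    """Fit the three detectors on X and return {name: labels}.

    Labels follow the scikit-learn convention: +1 for inliers, -1 for anomalies.
    Hypothetical helper; hyperparameters are illustrative, not tuned.
    """
    # Scale features so that distance-based methods (SVM, LOF) are not
    # dominated by columns with large magnitudes.
    X_scaled = StandardScaler().fit_transform(X)

    models = {
        # nu is an upper bound on the fraction of training errors.
        "one_class_svm": OneClassSVM(kernel="rbf", gamma="scale", nu=contamination),
        "isolation_forest": IsolationForest(
            contamination=contamination, random_state=random_state
        ),
        "lof": LocalOutlierFactor(n_neighbors=20, contamination=contamination),
    }
    # All three estimators expose fit_predict for outlier detection.
    return {name: model.fit_predict(X_scaled) for name, model in models.items()}


# Synthetic demo data (made up for this example): 95 points around the
# origin plus 5 points far away from them.
rng = np.random.default_rng(0)
X_demo = np.vstack(
    [rng.normal(0, 1, size=(95, 2)), rng.uniform(6, 8, size=(5, 2))]
)
labels = fit_detectors(X_demo, contamination=0.05)

for name, y in labels.items():
    print(name, "flagged", int((y == -1).sum()), "points")
```

Because none of the three models sees the POI labels during fitting, any known labels can be held back and used only to evaluate how well the flagged points line up with them.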

The dataset is a modified version of the Enron financial and email dataset, which contains information about Enron Corporation, an energy, commodities, and services company that infamously went bankrupt in December 2001 as a result of fraudulent business practices. We obtained it from the Udacity Data Analyst Nanodegree and their GitHub page. The approach to loading and preprocessing the dataset was inspired by Will Koehrsen's Medium article.

The Enron dataset is widely used to develop models that can identify persons of interest (POIs), i.e. individuals who were eventually tried for fraud or criminal activity in the Enron investigation, from the features within the data. The combined email and financial data contains the emails themselves, metadata about the emails such as the number received by and sent from each individual, and financial information including salary and stock options.
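
To make the shape of such data concrete, here is a minimal sketch of turning per-person records into a numeric feature table plus a POI label column. It assumes the layout used by the Udacity version of the dataset (a dictionary keyed by person name, with missing values stored as the string "NaN" and a boolean `poi` field); the modified file in this repository may differ. The helper `records_to_frame` and the two toy records are hypothetical and do not come from the real data.

```python
import pandas as pd


def records_to_frame(records):
    """Convert {person: {feature: value}} into (features, poi).

    Hypothetical helper. Assumes every record has a boolean 'poi' field and
    that missing values are stored as the string 'NaN'.
    """
    df = pd.DataFrame.from_dict(records, orient="index")
    # Separate the label so the detectors never see it during fitting.
    poi = df.pop("poi").astype(bool)
    # Coerce everything else to numbers; 'NaN' strings and any other
    # non-numeric values become real NaN.
    features = df.apply(pd.to_numeric, errors="coerce")
    return features, poi


# Toy records, made up for illustration only. With the real file one would
# first load the dictionary, e.g.:
#     import pickle
#     with open("enron_data.pkl", "rb") as f:
#         records = pickle.load(f)
records = {
    "PERSON A": {"salary": 200000, "total_stock_value": "NaN",
                 "to_messages": 1200, "poi": False},
    "PERSON B": {"salary": "NaN", "total_stock_value": 500000,
                 "to_messages": "NaN", "poi": True},
}

features, poi = records_to_frame(records)
print(features)
print(poi)
```

Missing values are left as NaN here on purpose; whether to impute them, fill them with zero, or drop sparse columns is a preprocessing decision this sketch does not make.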

NOTE:

All references are given as hyperlinks within the paragraphs, both here and in the notebook, rather than in a separate section.
