Anomaly detection is the identification of items, (aggregated) events, or observations that do not conform to the expected patterns in a dataset. Applied to network data, it allows operators to characterize normal network behavior and to identify suspicious activity. Especially in very complex networks where the number of devices changes quickly, unsupervised machine learning approaches are expected to become a key step towards monitoring network traffic and optimizing network performance.
The project’s objective is to implement and evaluate unsupervised machine learning methods for intrusion detection. To this end, the IDS2012 dataset provided by the Canadian Institute for Cybersecurity (http://www.unb.ca/cic/datasets/ids.html) is to be analyzed. This realistic, labeled dataset consists of network traces in pcap format, including full packet payloads, and covers both normal background traffic and attack scenarios (abnormal behavior). The provided labels must not be used for training; they serve only to measure the performance of the unsupervised methods.
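
The separation between unsupervised scoring and label-based evaluation can be illustrated with a minimal sketch. It does not read the IDS2012 pcap files: the flow records are synthetic, the two features (bytes and packets per flow) and the threshold of 6.0 are hypothetical, and a simple median/MAD outlier score stands in for whichever unsupervised method the project ends up evaluating. The point is only that the labels never enter the scoring step.

```python
import random
import statistics


def make_flows(seed=0, n_normal=950, n_attack=50):
    """Generate synthetic (bytes, packets) flow records with held-out labels."""
    rng = random.Random(seed)
    flows, labels = [], []
    for _ in range(n_normal):
        flows.append((rng.gauss(500, 50), rng.gauss(10, 2)))
        labels.append(0)  # 0 = normal background traffic
    for _ in range(n_attack):
        flows.append((rng.gauss(5000, 300), rng.gauss(200, 20)))
        labels.append(1)  # 1 = attack
    return flows, labels


def robust_scores(rows):
    """Score each row by its largest per-feature distance from the median, in MAD units."""
    columns = list(zip(*rows))
    medians = [statistics.median(col) for col in columns]
    # Median absolute deviation per feature; fall back to 1.0 if it is zero.
    mads = [
        statistics.median([abs(v - m) for v in col]) or 1.0
        for col, m in zip(columns, medians)
    ]
    return [
        max(abs(v - m) / d for v, m, d in zip(row, medians, mads))
        for row in rows
    ]


def precision_recall(flagged, labels):
    """Compare flagged records against the ground-truth labels."""
    tp = sum(1 for f, y in zip(flagged, labels) if f and y == 1)
    fp = sum(1 for f, y in zip(flagged, labels) if f and y == 0)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


flows, labels = make_flows()
scores = robust_scores(flows)          # unsupervised: labels are not used here
flagged = [s > 6.0 for s in scores]    # hypothetical threshold, chosen for illustration
precision, recall = precision_recall(flagged, labels)  # labels used only for evaluation
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On real traffic the attack flows will not be this cleanly separated, so the threshold and the feature set would need to be chosen without looking at the labels, for example from the score distribution alone.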