Open-source datasets for anyone interested in network anomaly detection with machine learning, data science, and research

Objective

Our immediate goal is to share real-world datasets and documentation that are instrumental in developing, testing, and comparing anomaly detection algorithms based on machine learning (both supervised and unsupervised).

Our longer-term goal is to systematically extend this collection with more complex datasets and event occurrences that better reflect real-life situations, helping the community move toward greater capability for automation, remediation, and behavior pattern recognition.

Related repositories

The datasets released on this website are also instrumental in reproducing the results published in [ACM SIGCOMM BigDama'18] and demonstrated at [IEEE INFOCOM'18] (see the References section below).

This repository only contains the dataset, whereas related repositories contain

Usage

Each dataset includes the following:

  • .csv Dataset
  • Header Definition File: Provides a definition of each header
  • Case File: Information reflecting the events, time of the events, and device(s) where event triggers are initiated
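As a rough illustration of how the three files fit together, the sketch below labels rows of a dataset .csv using event windows taken from a Case File. Everything specific here is an assumption: the column names (`timestamp`, `device`, `metric_a`), the device names, the sample values, and the `label_rows` helper are hypothetical, and the real column names should be taken from each dataset's Header Definition File.

```python
import csv
import io
from datetime import datetime

# Synthetic stand-in for a dataset .csv. The real column names are given in
# each dataset's Header Definition File and will differ from these.
SAMPLE_CSV = """timestamp,device,metric_a
2018-01-01T00:00:00,leaf1,10
2018-01-01T00:01:00,leaf1,95
2018-01-01T00:02:00,leaf2,11
"""

# Hypothetical event windows transcribed by hand from a Case File:
# (start, end, device where the event was triggered).
EVENTS = [
    (datetime(2018, 1, 1, 0, 1), datetime(2018, 1, 1, 0, 2), "leaf1"),
]


def label_rows(csv_text, events):
    """Return the CSV rows as dicts with an added boolean 'anomaly' field."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        # A row is anomalous if it falls inside any event window
        # (start inclusive, end exclusive) on the affected device.
        row["anomaly"] = any(
            start <= ts < end and row["device"] == device
            for start, end, device in events
        )
        rows.append(row)
    return rows


labeled = label_rows(SAMPLE_CSV, EVENTS)
for row in labeled:
    print(row["timestamp"], row["device"], row["anomaly"])
```

Labels produced this way can serve as ground truth when scoring a supervised or unsupervised detector. The half-open window is a design choice of this sketch, not something the Case Files prescribe.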

Folders & Files

  • /topology_description_docs - Information regarding the topology, all connections, CDP neighbors, and device types

    • telemetry_topology_maps.pdf
      • Slide 1: Logical topology map with links colored based on the number of ECMP links and speed
      • Slide 2: Actual connected topology
      • Slide 3: Device types in position
    • CDP_ground_truth.pdf: Device connections for the network under test
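| #  | Traffic load | No. Anomalies | Duration | Description                        |
|----|--------------|---------------|----------|------------------------------------|
| 0  | 0            | 0             | 1h       | Baseline (no anomalies)            |
| 1  | 500Gbps      | 0             | 1h       | Baseline (no anomalies)            |
| 2  | 1Tbps        | 11            | 1h       | BGP Clear                          |
| 3  | 1Tbps        | 8             | 0.55h    | BGP Clear                          |
| 4  | 1Tbps        | 5             | 0.72h    | Port Flap                          |
| 5  | 1Tbps        | 12            | 2h       | BGP Clear                          |
| 6  | 0            | 12            | 2h       | BGP Clear                          |
| 7  | 0            | 130           | 72h      | (VIRL) BGP Clear                   |
| 8  | 0            | 238           | 262h     | (VIRL) BGP Clear                   |
| 9  | 2.9Tbps      | 5             | 0.75h    | Port Admin Shut                    |
| 10 | 2Tbps        | 5             | 0.55h    | Port Transceiver Pull and Reinsert |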
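When choosing a dataset for evaluation, it can help to compare how densely the anomalies are packed in time. The sketch below transcribes the dataset table into a Python list and computes anomalies per hour for each entry. The `DATASETS` structure and the `anomalies_per_hour` helper are illustrative names, not part of this repository.

```python
# (dataset id, traffic load, anomaly count, duration in hours, description),
# transcribed from the dataset table in this README.
DATASETS = [
    (0, "0", 0, 1.0, "Baseline (no anomalies)"),
    (1, "500Gbps", 0, 1.0, "Baseline (no anomalies)"),
    (2, "1Tbps", 11, 1.0, "BGP Clear"),
    (3, "1Tbps", 8, 0.55, "BGP Clear"),
    (4, "1Tbps", 5, 0.72, "Port Flap"),
    (5, "1Tbps", 12, 2.0, "BGP Clear"),
    (6, "0", 12, 2.0, "BGP Clear"),
    (7, "0", 130, 72.0, "(VIRL) BGP Clear"),
    (8, "0", 238, 262.0, "(VIRL) BGP Clear"),
    (9, "2.9Tbps", 5, 0.75, "Port Admin Shut"),
    (10, "2Tbps", 5, 0.55, "Port Transceiver Pull and Reinsert"),
]


def anomalies_per_hour(datasets):
    """Map each dataset id to its anomaly count divided by its duration."""
    return {ds_id: count / hours for ds_id, _, count, hours, _ in datasets}


rates = anomalies_per_hour(DATASETS)
for ds_id, rate in sorted(rates.items()):
    print(f"dataset {ds_id}: {rate:.2f} anomalies/hour")
```

By this measure the short 1Tbps captures are far denser than the long VIRL runs, which may matter when tuning detection windows or comparing false-positive rates across datasets.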

References

[ACM SIGCOMM BigDama'18] Putina, Andrian and Rossi, Dario and Bifet, Albert and Barth, Steven and Pletcher, Drew and Precup, Cristina and Nivaggioli, Patrice. Telemetry-based stream-learning of BGP anomalies. ACM SIGCOMM Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA’18), Aug. 2018.

[IEEE INFOCOM'18] Putina, Andrian and Rossi, Dario and Bifet, Albert and Barth, Steven and Pletcher, Drew and Precup, Cristina and Nivaggioli, Patrice. Unsupervised real-time detection of BGP anomalies leveraging high-rate and fine-grained telemetry data. IEEE INFOCOM, Demo Session, Apr. 2018.

License

Community Data License Agreement - Permissive 1.0 © Cisco Innovation Edge