These scripts form the Proof of Concept for providing a way to visualise network traffic as a proxy measure for eResearch collaboration between Research and Education institutions.
The D3.js visualisation code is heavily based on Mike Bostock's example chord diagrams for D3.js.
As its input it takes:
- Netflow records stored in nfcapd files
- A mapping from Autonomous System Numbers to Names
As a process it:
- Filters for research-looking flows
- Aggregates traffic by source and destination Autonomous Systems
- Formats the data into a structure easily consumed by D3.js (a sketch of the aggregation step follows the output list below)
As an output it generates:
- Chord diagrams showing the traffic relationships between institutions
- The underlying data in a JSON file.
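A minimal sketch of the aggregation step, using pandas (the column names src_as, dst_as and bytes are assumptions for illustration; the real scripts may use different ones):

    import pandas as pd

    # Hypothetical per-flow CSV exported by nfdump; one row per flow.
    flows = pd.read_csv("data/flows.csv")  # assumed columns: src_as, dst_as, bytes

    # Sum traffic volume for every (source AS, destination AS) pair.
    totals = (flows.groupby(["src_as", "dst_as"])["bytes"]
                   .sum()
                   .reset_index())
    print(totals.head())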
The data for Autonomous System (AS) Numbers and names comes from two CSV files:
- autnums.csv: included here, derived from public BGP data
- private_asnums.csv: not included here; this is where you can put your own private ASN data
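For illustration, the two files could be combined into a single ASN-to-name lookup along these lines (the asn and name column names are assumptions, not the actual schema):

    import pandas as pd
    from pathlib import Path

    # Public mapping derived from BGP data; always present.
    asn_names = pd.read_csv("autnums.csv")  # assumed columns: asn, name

    # The private mapping is optional; overlay it if the file exists,
    # letting private entries win on conflicts.
    private_file = Path("private_asnums.csv")
    if private_file.exists():
        private = pd.read_csv(private_file)
        asn_names = (pd.concat([asn_names, private])
                       .drop_duplicates(subset="asn", keep="last"))

    lookup = dict(zip(asn_names["asn"], asn_names["name"]))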
The system puts some limits around how many institutions are displayed on the chord diagram. Currently that is limited to ~50, with everything else captured in a default "All Other" bucket. Some institutions are also force-included (e.g. the known biggest universities in Australia's Group of Eight). You can edit this in nfdump_aggregation_to_research_traffic.py; the logic is roughly as sketched below.
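This is only a sketch of the idea (the names, numbers and institutions here are illustrative, not the actual values in the script):

    # Example input: total bytes per institution (illustrative numbers).
    totals_by_institution = {"Uni A": 900, "Uni B": 500, "Uni C": 10, "Uni D": 2}

    MAX_INSTITUTIONS = 2       # the real script keeps ~50
    FORCE_INCLUDE = {"Uni D"}  # e.g. known Group of Eight universities

    # Rank institutions by total traffic and keep the biggest ones.
    ranked = sorted(totals_by_institution, key=totals_by_institution.get, reverse=True)
    keep = set(ranked[:MAX_INSTITUTIONS]) | FORCE_INCLUDE

    # Everything outside `keep` collapses into the default bucket.
    display_name = {inst: inst if inst in keep else "All Other"
                    for inst in totals_by_institution}
    print(display_name)  # "Uni C" maps to "All Other"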
Set the source directories for your netflow record files via the environment variable $NETFLOW_BASE_DIR (it defaults to "./netflow" if not set).
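In a shell that is just export NETFLOW_BASE_DIR=/path/to/netflow; inside a Python script, resolving the value with its default might look like:

    import os

    # Fall back to the documented default when the variable is unset.
    NETFLOW_BASE_DIR = os.environ.get("NETFLOW_BASE_DIR", "./netflow")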
Install the dependent libraries and apps:
- nfdump command line tools (available in package format for most Linux distros)
- Python libraries: pandas (pathlib is part of the standard library on Python 3.4+, so it no longer needs to be installed separately)
$ pip install pandas
-
Run the go_all script to process all files in a month
$ ./go_all.sh 201802 my-router-a my-router-b my-router-c
- Or run the go script to process a particular day
$ ./go.sh 20180205 my-router-a my-router-b
Generating list of research related ASNs ...
Generating summary of research like flows ...
Generating Chord Diagram data files of research institution traffic ...
Completed successfully.
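Under the hood, the per-day step amounts to an nfdump query aggregated by AS pair. A rough sketch of the equivalent call (the directory layout is an assumption, and the real scripts also apply the research-flow filtering):

    import subprocess

    # Read one router's nfcapd files for a day and aggregate by AS pair.
    # -R reads the capture files under a directory, -A sets the aggregation
    # keys, and -o csv emits machine-readable output.
    result = subprocess.run(
        ["nfdump", "-R", "netflow/my-router-a/2018/02/05",
         "-A", "srcas,dstas", "-o", "csv"],
        capture_output=True, text=True, check=True)
    print(result.stdout[:500])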
- Load the file 'index.html' locally in your browser. This will also load two files created by the above process: 'institutions.csv' and 'matrix.json'.
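For reference, d3.chord() consumes a square matrix of numbers, so matrix.json holds that matrix and institutions.csv supplies the labels in matching row order. The exact shapes below are a sketch based on that convention rather than the authoritative format:

    import csv
    import json

    institutions = ["Uni A", "Uni B", "All Other"]  # row/column labels, in order
    matrix = [[0, 120, 30],   # traffic from "Uni A" to each column
              [95, 0, 10],
              [20, 15, 0]]

    # matrix.json: the square matrix d3.chord() expects.
    with open("matrix.json", "w") as f:
        json.dump(matrix, f)

    # institutions.csv: one label per matrix row, in the same order.
    with open("institutions.csv", "w", newline="") as f:
        csv.writer(f).writerows([name] for name in institutions)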
- Alternatively, build and run the Docker image to serve this as a standalone website
$ ./publish.sh
- Open the URL http://my-internal-server
If you don't see a diagram when you open index.html, there is a good chance that you ran the processing with some incorrect settings, then reran it with correct settings, and intermediate cached data from the first run is still present. Try cleaning the cached files:
- CSV output from nfdump aggregation is stored in the data directory
rm data/*.csv
- Enriched ASN mapping information is cached in the file asn_data.csv
rm asn_data.csv
- The cache of which days and routers already have data loaded is stored in the DB
rm db/all_traffic.sqlite
If the numbers look wrong, then as above it is probably stale cached data somewhere. Work backwards and check, in order:
- the contents of matrix.json & institutions.csv, which come from
- the contents of the aggregate tables in db/all_traffic.sqlite, which come from
- the contents of the data/*.csv files and asn_data.csv.
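To peek at the intermediate DB without guessing at its schema, something like this works (it simply lists whatever tables are actually present):

    import sqlite3

    con = sqlite3.connect("db/all_traffic.sqlite")
    tables = [row[0] for row in
              con.execute("SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        # Table names come straight from sqlite_master, so the f-string is safe here.
        print(table, con.execute(f"SELECT * FROM {table} LIMIT 3").fetchall())
    con.close()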
If some institutions appear on the diagram with no connecting chords, this is expected. It comes from the threshold filtering, which only draws a link if its value is beyond a certain threshold. Currently that threshold is 100 GB and can be configured in nfdump_aggregration_to_research_traffic.py.
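The effect is equivalent to zeroing out small entries in the matrix before drawing. A sketch of the idea (not the actual code):

    # Links below the threshold are dropped, i.e. zeroed in the matrix.
    THRESHOLD_BYTES = 100 * 10**9  # the current 100 GB default

    matrix = [[7e9, 250e9],        # example byte counts
              [130e9, 40e9]]
    matrix = [[v if v >= THRESHOLD_BYTES else 0 for v in row] for row in matrix]
    print(matrix)  # only the two entries over 100 GB survive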
Possible future improvements:
- Move to a shared SQL DB.
- Move index.html to a React-based app with API queries to a backend that allows more filtering, letting a user choose date ranges or add/remove particular institutions.
- Set up filters per router to collect from only the specific interfaces.