Scripts used to reproduce our work on "Telemetry-based stream-learning of BGP anomalies"

anrputina/OutlierDenStream-BigDama18

  • Configuration file
  • Reorganized dataset
  • Temporal vs Spatial Detection
  • Scripts to run DenStream

Configuration file

  • nodes: Each node has its own dataset in every experiment. You can select all the nodes or any subset of them. Note that there is no dataset for leaf4, as described in the documentation at https://github.com/cisco-ie/telemetry. Default: all the nodes.
	"nodes": [
		"leaf1", "leaf2", "leaf3",
		"leaf5", "leaf6", "leaf7", "leaf8",
		"spine1", "spine2", "spine3", "spine4"
		]
  • featureModel: Selects the subset of features to use. The choices are "ControlPlane", "CompleteFeatures" and "DataPlane". Default: "CompleteFeatures".
	"featureModel":"CompleteFeatures"
  • featureList: List of the ControlPlane features. By default the dataset contains all the available features. If "featureModel" is "ControlPlane", the script uses only the features listed in this field. If it is "DataPlane", the script discards them and uses all the remaining ones. If it is "CompleteFeatures", the script uses the dataset as it is, with all the features. IMPORTANT: do not modify this field unless you fully understand how the features are organized in the datasets, otherwise you may end up using undesired features.
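The three "featureModel" choices can be sketched as a filter over the dataset columns. This is a minimal illustration, not the code used by main.py: the function name select_features and the column names are hypothetical.

```python
def select_features(all_features, control_plane_features, feature_model):
    """Return the feature names kept for a given "featureModel" value."""
    control = set(control_plane_features)
    if feature_model == "CompleteFeatures":
        # Use the dataset as it is, with all the features.
        return list(all_features)
    if feature_model == "ControlPlane":
        # Keep only the features named in "featureList".
        return [name for name in all_features if name in control]
    if feature_model == "DataPlane":
        # Discard the "featureList" features and keep the remaining ones.
        return [name for name in all_features if name not in control]
    raise ValueError(f"unknown featureModel: {feature_model!r}")


# Hypothetical column names, for illustration only.
columns = ["bgp_updates", "bgp_paths", "bytes_in", "bytes_out"]
feature_list = ["bgp_updates", "bgp_paths"]

control_only = select_features(columns, feature_list, "ControlPlane")
data_only = select_features(columns, feature_list, "DataPlane")
```

Because "DataPlane" is defined as the complement of "featureList", a wrong entry in that field silently changes both the ControlPlane and the DataPlane feature sets.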

  • detectionCriterion: You can choose which detection criterion you want to use: "timeDetection" for temporal detection and "spatialDetection" for spatial detection.

	"detectionCriterion":"timeDetection"
  • dataset: Describes the datasets: which ones are available, which ones to use, and where they are stored.
    • availableDataset: List of the available datasets.
    • list: List of the datasets you want to use (it can contain more than one). Use the same names that appear in availableDataset.
    • path: Location of the datasets
	"dataset":{
		"availableDataset": [
			"bgpclear_first" (dataset #2),
			"bgpclear_second" (dataset #3),
			"portflap_first" (dataset #4),
			"bgpclear_apptraffic_2hourRun" (dataset #5),
			"bgpclear_no_traffic_2hourRun" (dataset #6)
		],
		"list":[
			"bgpclear_apptraffic_2hourRun"
		],
		"path": "Data/DatasetByNodes/"
	}
  • multicoreAnalysis: Used to perform the grid optimization. If you switch multicoreAnalysis ON, the script runs the algorithm for every combination of parameters in the configured ranges of lambda and beta and saves the results in a JSON file. IMPORTANT: use this option only after reading the main.py script and understanding what it will produce. This mode uses multiprocessing, so select the number of parallel processes you want to run.
	"multicoreAnalysis":{
		"ON": "NO",
		"lambda": [0.01, 1.01, 0.01],
		"beta": [0.01, 1.01, 0.01]
	}
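To get a feeling for the size of the grid, the sketch below expands the two ranges into the list of (lambda, beta) pairs. It assumes that each triple is [start, stop, step] with the stop value excluded, as in Python's range; this reading is not confirmed here, so check main.py before relying on it. The helper name expand_range is hypothetical.

```python
from itertools import product


def expand_range(spec):
    """Expand an assumed [start, stop, step] triple, stop excluded."""
    start, stop, step = spec
    # Count the steps with integers to avoid accumulating float error.
    count = int(round((stop - start) / step))
    return [round(start + i * step, 10) for i in range(count)]


lambdas = expand_range([0.01, 1.01, 0.01])
betas = expand_range([0.01, 1.01, 0.01])

# Every (lambda, beta) pair that the grid optimization would evaluate.
grid = list(product(lambdas, betas))
print(len(lambdas), len(betas), len(grid))
```

Under this assumption the default ranges give 100 values per parameter, so the grid contains 10000 runs of the algorithm, which is why the analysis is spread over several processes.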
  • denstreamParameters: Parameters passed to DenStream. The dictionary contains the values of lambda, epsilon, beta and mu, as well as Kmax (used in the detection of order K).
	"denstreamParameters": {
		"lambda": 0.15,
		"epsilon": "auto",
		"beta": 0.05,
		"mu": "auto",
		"Kmax": 5,
		"tp": 30
	}
 
  • sampleSkip: Number of samples used as the initial buffer on which epsilon is computed. These samples must be event-free and representative of the normal working conditions of the system.
	"sampleSkip":39
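Before launching a run, it can help to check that the configuration contains every field described above. The following is a minimal sketch of such a check and not the validation performed by main.py: the function name validate_config is hypothetical, the configuration is passed as an already loaded dictionary, and the empty "featureList" is only a placeholder.

```python
REQUIRED_FIELDS = (
    "nodes", "featureModel", "featureList", "detectionCriterion",
    "dataset", "multicoreAnalysis", "denstreamParameters", "sampleSkip",
)
FEATURE_MODELS = ("ControlPlane", "CompleteFeatures", "DataPlane")
DETECTION_CRITERIA = ("timeDetection", "spatialDetection")


def validate_config(config):
    """Return a list of problems found in the configuration (empty if none)."""
    problems = [f'missing field "{name}"'
                for name in REQUIRED_FIELDS if name not in config]
    if config.get("featureModel", FEATURE_MODELS[0]) not in FEATURE_MODELS:
        problems.append("featureModel is not one of the documented choices")
    if config.get("detectionCriterion",
                  DETECTION_CRITERIA[0]) not in DETECTION_CRITERIA:
        problems.append("detectionCriterion is not one of the documented choices")
    dataset = config.get("dataset", {})
    available = set(dataset.get("availableDataset", []))
    for name in dataset.get("list", []):
        # Names in "list" must match the names in "availableDataset".
        if name not in available:
            problems.append(f'dataset "{name}" is not in availableDataset')
    return problems


example = {
    "nodes": ["leaf1", "spine1"],
    "featureModel": "CompleteFeatures",
    "featureList": [],  # placeholder, the real list ships with the repository
    "detectionCriterion": "timeDetection",
    "dataset": {
        "availableDataset": ["bgpclear_first", "bgpclear_apptraffic_2hourRun"],
        "list": ["bgpclear_apptraffic_2hourRun"],
        "path": "Data/DatasetByNodes/",
    },
    "multicoreAnalysis": {"ON": "NO"},
    "denstreamParameters": {"lambda": 0.15, "beta": 0.05, "Kmax": 5},
    "sampleSkip": 39,
}
print(validate_config(example))
```

Returning a list of problems instead of stopping at the first one makes it easier to fix a hand-edited configuration in a single pass.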

Scripts

  • main.py: Runs the DenStream algorithm with the parameters in the configuration file, opens the groundtruth file and computes the performance indicators. The script uses all the fields in the configuration file (it exits with an error if any field is missing) and saves the results in the folder called "Visualization". If you run the script in Spyder, the plots are produced automatically. You can also produce the plots by running "finalPlot.py" in the "Visualization" folder.
  • spatialPerformance.py: If you have chosen "spatialDetection" as the "detectionCriterion", "main.py" writes the results to a JSON file instead of producing the plot. After running "main.py" in "spatialDetection" mode, you also have to run "spatialPerformance.py" to compute the results. The script saves the results in the folder called "Visualization". You can produce the plot by running "finalPlot.py" in the "Visualization" folder.
  • finalPlot.py: produces the plots with the performance indicators for increasing K. The script loads the results generated previously.
  • FeatureSelectionPlot.py: Produces the comparison of the performance indicators for ControlPlane, CompleteFeatures and DataPlane. The script uses the results generated by "main.py" for every value of "featureModel" and for both values of "detectionCriterion". You therefore have to run "main.py" 6 times (2 detectionCriterion values * 3 featureModel values).
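The 6 runs needed by FeatureSelectionPlot.py can be enumerated as in the sketch below, which builds one configuration per combination. It is only an illustration: the function name feature_selection_runs is hypothetical, and writing each variant to the configuration file and launching main.py is left out, because the name of the configuration file is not stated here.

```python
import copy
from itertools import product

DETECTION_CRITERIA = ("timeDetection", "spatialDetection")
FEATURE_MODELS = ("ControlPlane", "CompleteFeatures", "DataPlane")


def feature_selection_runs(base_config):
    """Build one configuration per (detectionCriterion, featureModel) pair."""
    runs = []
    for criterion, model in product(DETECTION_CRITERIA, FEATURE_MODELS):
        # Copy the base configuration so that the variants stay independent.
        config = copy.deepcopy(base_config)
        config["detectionCriterion"] = criterion
        config["featureModel"] = model
        runs.append(config)
    return runs


base = {"detectionCriterion": "timeDetection",
        "featureModel": "CompleteFeatures",
        "sampleSkip": 39}
runs = feature_selection_runs(base)
for run in runs:
    print(run["detectionCriterion"], run["featureModel"])
```

Remember that each run in "spatialDetection" mode must be followed by "spatialPerformance.py" before the results can be plotted.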
