dataset

The data directory contains all the samples, and label.csv provides the label for each sample.

The dataset.zip file is 331MB, expands to 11.2GB, and has an MD5 checksum of 46cc2fe85037006a26a730ff0f8f9d54.
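To confirm that the download is intact, the checksum above can be compared against one computed locally. The following is a minimal sketch using only the Python standard library; the function names `md5_of_file` and `verify_archive` are illustrative and are not part of this repository.

```python
import hashlib

# MD5 checksum of dataset.zip as published in this README.
EXPECTED_MD5 = "46cc2fe85037006a26a730ff0f8f9d54"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 matches the expected digest."""
    return md5_of_file(path) == expected.lower()
```

For example, `verify_archive("dataset.zip")` should return `True` for an uncorrupted download, assuming the archive is in the current directory.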

The original data was collected over two months, April 2017 and May 2017. We ran these samples on a Cuckoo server and collected their runtime logs. We then applied the proposed feature engineering method to these logs to produce the published dataset.

The dataset is summarized as follows:

	Benign	Malicious	Total
April	10160	15609	25769
May	20552	11465	32017
Total	30712	27074	57786

Each sample is stored in NumPy format and can be loaded with numpy.load('./data/201704_0.npy'). The shape of each sample is (LENGTH, 102), where LENGTH is at most 1000 and 102 is the dimension of the feature vector for each API call. Please refer to our paper for more details.
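Because LENGTH varies between samples, batching usually requires bringing every sample to a common shape. The sketch below loads one sample and zero-pads it to (1000, 102). Zero-padding at the end is an assumption made here for illustration and may differ from what model.py does; `pad_sample` and `load_sample` are hypothetical helper names.

```python
import numpy as np

MAX_LENGTH = 1000   # upper bound on LENGTH stated in this README
FEATURE_DIM = 102   # feature dimension of each API call


def pad_sample(sample, max_length=MAX_LENGTH):
    """Zero-pad (or truncate) a (LENGTH, 102) array to (max_length, 102).

    Assumption: padding rows are appended at the end and filled with zeros.
    """
    if sample.ndim != 2 or sample.shape[1] != FEATURE_DIM:
        raise ValueError(f"unexpected sample shape: {sample.shape}")
    padded = np.zeros((max_length, FEATURE_DIM), dtype=sample.dtype)
    length = min(sample.shape[0], max_length)
    padded[:length] = sample[:length]
    return padded


def load_sample(path):
    """Load one .npy sample from disk and pad it to a fixed shape."""
    return pad_sample(np.load(path))
```

For example, `load_sample('./data/201704_0.npy')` would return an array of shape (1000, 102) once the archive has been extracted.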

feature

This is the code for the feature engineering method.

There are two Python scripts. Features.py contains the core code of the feature engineering method, and main.py implements a multi-process function that calls Features.py.

To see how to run the code, please refer to the main function in each Python script.

model

This is the deep learning model for our proposed approach. It is built with Keras, and requriments.txt lists the required packages.

To run the model, first unzip dataset/dataset.zip, then run model.py with Python.
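The extraction step can also be scripted. The sketch below checks that enough disk space is free before unpacking, since the archive expands to roughly 11.2GB. The function name `extract_dataset` is illustrative, and the target directory is an assumption: check model.py for the path where it expects to find the data.

```python
import shutil
import zipfile
from pathlib import Path


def extract_dataset(archive_path, target_dir):
    """Extract a zip archive into target_dir after a free-space check.

    Returns the total uncompressed size in bytes.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        # Sum of uncompressed member sizes recorded in the archive.
        needed = sum(info.file_size for info in archive.infolist())
        free = shutil.disk_usage(target).free
        if needed > free:
            raise OSError(
                f"need {needed} bytes to extract, only {free} bytes free"
            )
        archive.extractall(target)
    return needed
```

For example, `extract_dataset("dataset/dataset.zip", "dataset")` would unpack the archive next to the zip file, under the assumption stated above.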
