The dataset used for this project is the Malimg dataset. This is a preprocessed version of the dataset containing images of the malware files. The Malimg dataset contains images of the following classes:
Some sample images in the dataset
The idea of the paper [1] was to render the binaries of malware files as images and then use the visual patterns in those images to classify the type of malware. Such an approach is also relatively immune to techniques such as code obfuscation.
This was the model architecture used for the task:
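As an illustration of the image-extraction step described above (not of the model itself), here is a minimal sketch of how raw bytes can be laid out as a grayscale image, with each byte becoming one pixel in the range 0 to 255. The function name `bytes_to_grayscale` and the fixed default width of 256 are assumptions made for this example; the Malimg images were already preprocessed, and the original paper's exact width rule is not reproduced here.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list:
    """Lay out raw bytes as rows of `width` pixels (each value in 0-255).

    Hypothetical helper for illustration; the fixed width is an assumption.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = len(data) // width  # bytes of an incomplete last row are dropped
    return [list(data[r * width:(r + 1) * width]) for r in range(height)]


# Tiny demonstration on synthetic bytes rather than a real malware file.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
print(image)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In practice the resulting rows would be saved as an image file or converted to an array before being fed to a classifier.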
I achieved a final accuracy of 0.9610992 (about 96.1%) after training for just 10 epochs.
The confusion matrix of the final model
We can observe that although most malware samples were classified correctly, Autorun.K is always mistaken for Yuner.A. This is probably because the dataset contains very few Autorun.K samples and because both families are closely related worm types.
Moreover, Swizzor.gen!E is often mistaken for Swizzor.gen!I, which can be explained by the fact that they are closely related variants of the same family and thus could have similarities in their code.
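The kind of confusion-matrix reading done above can also be automated. The sketch below counts (true, predicted) label pairs and lists the largest off-diagonal entries, which are the most frequent misclassifications. The helper names `confusion_counts` and `most_confused` are hypothetical, and the labels are made up for the example; they are not the project's actual predictions.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true_label, predicted_label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def most_confused(counts, top=3):
    """Return off-diagonal (true, predicted, count) entries, largest first."""
    errors = [(t, p, n) for (t, p), n in counts.items() if t != p]
    return sorted(errors, key=lambda e: (-e[2], e[0], e[1]))[:top]


# Made-up labels for illustration only; these are not the project's results.
y_true = ["Autorun.K", "Autorun.K", "Yuner.A", "Swizzor.gen!E", "Swizzor.gen!I"]
y_pred = ["Yuner.A", "Yuner.A", "Yuner.A", "Swizzor.gen!I", "Swizzor.gen!I"]

pairs = most_confused(confusion_counts(y_true, y_pred))
print(pairs)
# [('Autorun.K', 'Yuner.A', 2), ('Swizzor.gen!E', 'Swizzor.gen!I', 1)]
```

Sorting by count makes systematic confusions, such as one class being absorbed entirely by another, stand out even when overall accuracy is high.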
Finally, I believe that we could greatly improve the model’s performance by building a larger dataset, in particular with more samples of the under-represented classes.
This project builds upon great work by others:
[1] L. Nataraj, S. Karthikeyan, G. Jacob, and B. S. Manjunath. Malware images: visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (VizSec), 2011.