Network-Based Malware Detection using Natural Language Processing

This project demonstrates a method that uses the ordering of network flows to classify malicious behavior. The approach is lightweight and privacy-preserving, and it is resilient to encrypted packet payloads.
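As a rough illustration of what treating flow ordering like text can mean, the sketch below slides a window over a sequence of flow tokens and counts the resulting n-grams. The token labels (`dns`, `tls`, `http`) and the `ngrams` helper are hypothetical and chosen only for the example. The encoding the project actually uses lives in pcap-to-ngrams.py and is not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical flow tokens for one capture, in the order the flows appeared.
flows = ["dns", "tls", "tls", "http", "tls", "tls"]

bigrams = ngrams(flows, 2)
counts = Counter(bigrams)  # n-gram frequencies, usable as classifier features
```

Only the order of the flows is used in this sketch, never the packet contents, which is why an approach of this kind can still work on encrypted traffic.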

Getting Started

Prerequisites

The project is written in Python 3. Ensure you have the latest version of python3 and pip3 installed. The project relies on tshark to pre-process pcap files and on p7zip to extract zip files.

On Ubuntu, these can be installed using:

sudo apt-get install tshark p7zip

The remaining required packages can be installed using pip3:

pip3 install -r requirements.txt --user
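A quick way to confirm that the external tools are reachable before running anything is to look them up on the PATH. This is a minimal sketch: the `missing_tools` helper is hypothetical, and the executable names `tshark` and `7zr` are assumptions about what the Ubuntu packages install, so adjust them if your system uses different names.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the subset of executable names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


# Assumed executable names; p7zip may install a differently named binary.
required = ["tshark", "7zr"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
```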

Directory Structure

.
+-- ml
|   +-- model.py (file with ml functions)
+-- preprocess
|   +-- process.py
|   +-- process.sh (pcap pre-processing)
|   +-- pcap-to-ngrams.py (pcap conversion to ngrams)
|   +-- f2nlib.py
|   +-- p2flib.py
+-- scripts
|   +-- run.sh (script to run tests on ComputeCanada servers)
|   +-- run_all.sh (automate test running)
+-- requirements.txt
+-- README.md
+-- LICENSE.md

Running Tests

Download the USTC-TFC2016 DeepTraffic dataset.
Generate an ngram file using process.sh.
Run model.py on the ngram file.
Automate tests using custom bash scripts. The ones included in the repository work on ComputeCanada servers.

./process.sh [path-to-dataset] [n]

This creates a file called [n]_test.csv in the dataset folder.

python3 model.py [path-to-test-csv]

This should print the results on the screen.
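The two commands above can be chained for several values of n. The following is a sketch rather than a replacement for the scripts in scripts/. The `build_steps` and `run_all` helpers are hypothetical, and the code assumes that process.sh is run from the preprocess directory and model.py from the ml directory, as the directory structure suggests.

```python
import os
import subprocess


def build_steps(dataset_dir, n):
    """Return (argv, working_directory) pairs for one value of n."""
    dataset_dir = os.path.abspath(dataset_dir)
    # process.sh is documented to write [n]_test.csv into the dataset folder.
    ngram_csv = os.path.join(dataset_dir, f"{n}_test.csv")
    return [
        (["./process.sh", dataset_dir, str(n)], "preprocess"),
        (["python3", "model.py", ngram_csv], "ml"),
    ]


def run_all(dataset_dir, n_values, repo_root="."):
    """Run pre-processing and the model for each n, stopping on the first failure."""
    for n in n_values:
        for argv, subdir in build_steps(dataset_dir, n):
            subprocess.run(argv, cwd=os.path.join(repo_root, subdir), check=True)


# Example call (not executed here): run_all("/path/to/dataset", [1, 2, 3])
```

Passing `check=True` makes a failed pre-processing step raise an error, so the model is never run on a missing or incomplete ngram file.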

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

Acknowledgments

Mitacs Globalink, uOttawa and MHRD, India for funding the project.
Prof. David Knox at uOttawa for project supervision.
ComputeCanada for access to their servers to run tests.
