stumpinator/hashana


Hashana is a collection of functions and classes for storing and quickly retrieving hashes.

The original intent of this project was to convert the National Software Reference Library (NSRL) Reference Data Set (RDS) into a more compact and portable format. The tools used for that task were then generalized slightly so they can be reused in other projects.

The tools provided can convert the entire RDS (hundreds of GB) into approximately 13 GB containing just the unique hashes and their respective file sizes. The output is customizable and can be made even smaller if less information is required. Adding the raw data to a SQLite database and indexing it adds another 20-25 GB, but the entire data set stays at or under 40 GB. The indexes are optional, but they make queries against all 170M+ hashes fast and responsive.
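To illustrate the trade-off described above, here is a minimal sketch using only Python's standard `sqlite3` module. It stores unique hashes as raw bytes alongside a file size, then adds an optional index so lookups avoid a full table scan. The table name `hashes`, the column names, the index name, and the helper functions are all hypothetical and chosen for illustration; they are not Hashana's actual schema or API.

```python
import hashlib
import sqlite3


def build_demo_db(conn, payloads):
    """Store one row per unique (hash, size) pair, then index the hash column."""
    # Hypothetical schema for illustration only; Hashana's real tables may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes (sha256 BLOB NOT NULL, size INTEGER NOT NULL)"
    )
    # Raw 32-byte digests are half the size of their hex text form.
    # Building a set drops duplicate payloads before insertion.
    rows = {(hashlib.sha256(p).digest(), len(p)) for p in payloads}
    conn.executemany("INSERT INTO hashes (sha256, size) VALUES (?, ?)", sorted(rows))
    # The index is optional: it costs extra disk space but speeds up lookups.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_hashes_sha256 ON hashes (sha256)")
    conn.commit()


def lookup_size(conn, hex_digest):
    """Return the stored file size for a hex digest, or None if it is unknown."""
    row = conn.execute(
        "SELECT size FROM hashes WHERE sha256 = ?", (bytes.fromhex(hex_digest),)
    ).fetchone()
    return row[0] if row else None


def uses_index(conn, hex_digest):
    """Report whether SQLite plans to answer the lookup through the index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT size FROM hashes WHERE sha256 = ?",
        (bytes.fromhex(hex_digest),),
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any("idx_hashes_sha256" in row[-1] for row in plan)


conn = sqlite3.connect(":memory:")
build_demo_db(conn, [b"hello", b"world", b"hello"])
known = hashlib.sha256(b"hello").hexdigest()
print(lookup_size(conn, known))  # size in bytes of the b"hello" payload
print(uses_index(conn, known))
```

Dropping the `CREATE INDEX` statement keeps the file smaller, at the cost of every lookup scanning the whole table.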

A simple front end is provided that allows easy querying via JSON. A ZeroMQ front end for microservices and network applications is also available if you have the zmq package installed.

Example to convert the RDS data (may take 1 hour or more depending on hardware):

rds_list = [r"C:\NSRL\RDS_2023.12.1_modern_minimal.db", r"C:\NSRL\RDS_2023.12.1_legacy_minimal.db",
  r"C:\NSRL\RDS_2023.12.1_android_minimal.db", r"C:\NSRL\RDS_2023.12.1_ios_minimal.db"]
HashanaRDSReader.make_hashana_db(rds_list, r"C:\NSRL\hashana_23.12.1.db")
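After a conversion finishes, it can be useful to check what the output file contains. The sketch below assumes the resulting `.db` file is a regular SQLite database, which the description above suggests but does not spell out. It uses only the standard library, and the helper name `describe_db` is hypothetical. The file is opened read-only so that inspecting it cannot modify it.

```python
import sqlite3
import tempfile
from pathlib import Path


def describe_db(db_path):
    """List the (type, name) of every table and index in a SQLite file."""
    # Open via a file: URI with mode=ro so the database cannot be changed.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(
            "SELECT type, name FROM sqlite_master "
            "WHERE type IN ('table', 'index') ORDER BY type, name"
        ).fetchall()
    finally:
        conn.close()


# Demonstrate on a throwaway database rather than a real Hashana file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.db"
    demo_conn = sqlite3.connect(demo_path)
    demo_conn.execute("CREATE TABLE t (x)")
    demo_conn.execute("CREATE INDEX idx_t_x ON t (x)")
    demo_conn.commit()
    demo_conn.close()
    demo_objects = describe_db(demo_path)

print(demo_objects)
```

To inspect a converted data set, pass the output path from the conversion call to `describe_db` instead of the demo file.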

An example zmq server, Linux service, and Dockerfile are in extra/linux.

Links:

NSRL RDS Query
