Repository for the paper "1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis"

island255/TOSEM2022

Dataset Construction and labeling

This repository illustrates how we construct and label the dataset.

Construction

The "construction" folder contains scripts for compiling the binaries. "construction\Dockerfile_source2binary" is a Dockerfile that compiles coreutils v8.29 with clang-10 at optimization levels O0 through O3. From within the "construction" folder, run "docker build -t image_owner/image_name -f Dockerfile_source2binary ." to build an image containing the coreutils source and binaries.

The dataset will be open-sourced after the paper is accepted.

Labeling

The "ground_truth_building" folder contains the code that automatically labels the above dataset. The code is organized as follows:

| dir | file | function |
| --- | --- | --- |
| IDA_pro_scripts | extract_binary_range.py | script to extract binary function boundaries with IDA 7.0 and lower |
| | extract_binary_range_75.py | script to extract binary function boundaries with IDA 7.5 |
| extract_debug_information | extract_debug_dump.py | extracts the line mapping from the .debug_line section of a binary using readelf |
| extract_source_information | use_understand_to_extract_entity.py | uses Understand to extract the source line-to-function mapping |
| mapping | binary2source_mapping.py | combines the line mapping with the binary address-to-function mapping and the source line-to-function mapping to produce a function-level mapping |
| - | binary2source_mapping_using_understand.py | main script that labels all binaries and source projects |
| | summary_for_inline_staticstics.py | summarizes the metrics for all binaries |
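
The function-level mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in binary2source_mapping.py: the function name `build_function_mapping`, the input layouts, and the sample addresses and line numbers are all assumptions made for this example. It joins three inputs (address-to-line from the debug information, function boundaries from IDA, and line ranges of source functions from Understand) and reports which source functions contribute code to each binary function.

```python
from bisect import bisect_right
from collections import defaultdict


def build_function_mapping(addr_to_line, binary_funcs, source_funcs):
    """Map each binary function to the source functions whose lines it contains.

    All input layouts are hypothetical, chosen for this sketch:
      addr_to_line : {address: (source_file, line_number)}
      binary_funcs : [(name, start_address, end_address)], end exclusive
      source_funcs : [(source_file, name, first_line, last_line)], inclusive
    """
    funcs = sorted(binary_funcs, key=lambda f: f[1])
    starts = [f[1] for f in funcs]
    mapping = defaultdict(set)

    for addr, (src_file, line) in addr_to_line.items():
        # Find the binary function whose address range contains addr.
        idx = bisect_right(starts, addr) - 1
        if idx < 0:
            continue
        bin_name, start, end = funcs[idx]
        if not (start <= addr < end):
            continue
        # Find the source function whose line range contains this line.
        for s_file, s_name, first, last in source_funcs:
            if s_file == src_file and first <= line <= last:
                mapping[bin_name].add(s_name)

    return {name: sorted(srcs) for name, srcs in mapping.items()}


# Made-up sample: "add" has been inlined into the binary function "main".
addr_to_line = {0x1000: ("a.c", 10), 0x1004: ("a.c", 3), 0x1010: ("a.c", 20)}
binary_funcs = [("main", 0x1000, 0x1010), ("helper", 0x1010, 0x1020)]
source_funcs = [("a.c", "add", 1, 5), ("a.c", "main", 8, 15), ("a.c", "helper", 18, 25)]

result = build_function_mapping(addr_to_line, binary_funcs, source_funcs)
# Binary functions that map to more than one source function (1-to-n).
one_to_n = sorted(name for name, srcs in result.items() if len(srcs) > 1)
```

In this sample, `result["main"]` lists both `add` and `main`, which is the 1-to-n case that inlining produces, while `helper` maps 1-to-1.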

Before using the above scripts to label a dataset, several paths must be set. "binary2source_mapping_using_understand.py" contains the paths to IDA, the Understand Python interpreter, the Understand tool, the dataset, and the scripts. Running the scripts requires IDA Pro, Understand, readelf, and python3 to be installed. The current version was developed on Linux, but it can also be used on Windows.
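
Since a wrong path tends to fail only partway through a labeling run, it can help to check the prerequisites up front. The following is a minimal sketch under stated assumptions: `find_missing` and the keys of `paths` are hypothetical names for this example and do not come from the repository, and the placeholder paths must be replaced with your own.

```python
import os
import shutil


def find_missing(tools, paths, which=shutil.which, exists=os.path.exists):
    """Return the names of tools not found on PATH and of configured paths that do not exist.

    `which` and `exists` are injectable so the check can be tested without
    touching the real system.
    """
    missing = [tool for tool in tools if which(tool) is None]
    missing += [name for name, path in paths.items() if not exists(path)]
    return missing


# Hypothetical configuration; the keys and placeholder paths are assumptions.
tools = ["readelf", "python3"]
paths = {
    "ida": "/path/to/ida",
    "understand": "/path/to/understand",
    "dataset": "/path/to/dataset",
}

problems = find_missing(tools, paths)
if problems:
    print("Missing prerequisites:", ", ".join(problems))
```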
