Removing Statically Linked Code

Petr Zemek edited this page Apr 17, 2019 · 3 revisions

Introduction

Input binaries often contain statically linked code that can be removed to get better and faster decompilation results. For this reason, the decompiler comes with YARA signature files describing statically linked functions of several compilers for various architectures. However, these are mainly for older compilers used for testing purposes. If you want to remove code from newer compilers or some custom libraries, you have to create your own signature files.

Using and Creating Custom Signature Files

There are several ways to create and use custom signature files.

1. Option --static-code-archive path/to/library.a of retdec-decompiler.sh

This option creates files with YARA signatures from the given library and uses them in the decompilation process. It can be used multiple times, once for each library. Because the signatures are recreated during every decompilation that uses this option, it is best suited for one-shot cases or for creating signature files for later use. The created files have the .yara suffix and can be found in the output directory.

2. Option --static-code-sigfile path/to/signatures.yara of retdec-decompiler.sh

This option uses rules that already exist, created either by a previous decompilation with the --static-code-archive path/to/library.a option or by the retdec-signature-from-library-creator.sh script (see below). Input files can be either text .yara files or compiled .yarac files.

3. Script retdec-signature-from-library-creator.sh

This script creates the signatures during a decompilation with the --static-code-archive option. However, its default settings do not always provide the best results. You may get better signatures by running the script directly before a decompilation. The most interesting options are:

  1. -m|--min-pure size

This option specifies the minimal size of functions that can be removed. Creating signatures only from bigger functions makes detection more accurate, but the decompilation output is much larger because smaller functions are not removed. Including smaller functions removes a lot of statically linked code, but the results may be very inaccurate. Only bytes that are not affected by relocations are counted. The default value is 16 bytes.

  2. -i|--ignore-nops opcode

This option specifies which opcode is used for function alignment in binary files. Trailing bytes with the selected opcode (usually the NOP instruction) are excluded from the minimal-size computation, which avoids creating signatures for very small functions followed by many trailing NOP instructions. Some known values are 0 for the MIPS and PowerPC architectures and 144 (0x90) for the x86 and x64 architectures (MinGW compilers). This option is not used with the default settings.

  3. -l|--logfile

This option turns on logging. Log files list partially processed functions that could not be turned into complete signatures, together with the reason why. You can use this information to adjust the script settings described above or to hand-pick and manually add some of the omitted functions.
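
The interplay of --min-pure and --ignore-nops can be illustrated with a small Python sketch. It is not the script's actual implementation: the function names, the byte-wise treatment of the alignment opcode, and the "greater than or equal" comparison against the minimum are assumptions made for illustration only.

```python
from typing import Iterable, Optional


def pure_size(code: bytes,
              relocated: Iterable[int] = (),
              nop_opcode: Optional[int] = None) -> int:
    """Count the bytes that contribute to the minimal-size check.

    code       -- raw bytes of one function
    relocated  -- offsets of bytes affected by relocations (hypothetical input)
    nop_opcode -- alignment opcode to ignore at the end, or None (the default)
    """
    end = len(code)
    if nop_opcode is not None:
        # Drop trailing alignment bytes, e.g. 144 (0x90) on x86.
        while end > 0 and code[end - 1] == nop_opcode:
            end -= 1
    relocated_offsets = set(relocated)
    # Only bytes that are not affected by relocations are counted.
    return sum(1 for offset in range(end) if offset not in relocated_offsets)


def is_signature_candidate(code: bytes,
                           relocated: Iterable[int] = (),
                           nop_opcode: Optional[int] = None,
                           min_pure: int = 16) -> bool:
    """Assumed rule: keep a function when its pure size reaches min_pure."""
    return pure_size(code, relocated, nop_opcode) >= min_pure


# A one-byte function (x86 `ret`) padded with 15 NOP bytes for alignment.
tiny = bytes([0xC3] + [0x90] * 15)
passes_without_ignore = is_signature_candidate(tiny)
passes_with_ignore = is_signature_candidate(tiny, nop_opcode=0x90)
```

With the default settings the padded one-byte function reaches 16 bytes and would pass the check, whereas ignoring the trailing 0x90 bytes leaves a single byte, so it would be skipped. This is the kind of tiny, padded function that the --ignore-nops option is meant to filter out.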

Compilation of Text .yara Signatures

For faster detection of statically linked code, you can compile text .yara files with the yarac tool. To do this, you need to use exactly the same version of yarac as the one currently used in the decompiler. You can find a compatible yarac executable in the project build directory (currently, it is not installed to the installation directory), or you can use our YARA fork (be sure you are using the retdec branch).
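
As a rough sketch, the compile-then-decompile workflow could be scripted in Python as shown below. The executable locations, the file names, and the assumption that the input binary is passed to retdec-decompiler.sh as the last positional argument are illustrative and are not taken from this page. The yarac invocation follows the tool's usual "rule files first, output file last" form.

```python
import subprocess
from typing import List, Sequence


def yarac_command(yarac: str, text_rules: Sequence[str], compiled_out: str) -> List[str]:
    """Build a yarac call: one or more text .yara files, then the output file."""
    return [yarac, *text_rules, compiled_out]


def decompiler_command(input_file: str, sigfile: str,
                       decompiler: str = "retdec-decompiler.sh") -> List[str]:
    """Build a decompiler call that uses an existing signature file."""
    # Assumption: the input binary is the final positional argument.
    return [decompiler, "--static-code-sigfile", sigfile, input_file]


def run(command: List[str]) -> None:
    """Execute a prepared command and fail loudly on a non-zero exit code."""
    subprocess.run(command, check=True)


# Hypothetical paths; adjust them to your build and input files.
compile_cmd = yarac_command("build/yarac", ["library.yara"], "library.yarac")
decompile_cmd = decompiler_command("input.exe", "library.yarac")
```

Calling run(compile_cmd) and then run(decompile_cmd) would execute both steps. The commands are only built here, so the sketch can be inspected without RetDec installed.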

Microsoft Visual C++

Our archive with signatures, shipped during the installation, already contains signatures for the most common MSVC libraries, with support up to version 15.0 (Visual Studio 2017). You need to create your own rules only for custom libraries or for libraries that are not on the following list.

List of included MSVC libraries: libcmt.lib, libcpmt1.lib, libcpmt.lib, libvcruntime.lib, msvcmrt.lib, msvcprt.lib, msvcrt.lib, msvcurt.lib, vcruntime.lib, libucrt.lib, ucrt.lib (and their debug variants).

Format of .yara Rules

Every file starts with a private rule describing the architecture, endianness, and bit width of the rules that make up the rest of the file. Every rule contains a meta section with the mandatory attributes name and size, and the optional attributes refs and altNames. The size attribute gives the original size of the function in bytes (the pattern can be shorter). The refs attribute contains a list of referenced symbols, and the altNames attribute contains alternative names for the function. The strings section contains exactly one pattern whose size is smaller than or equal to the size stated in the meta section. For a detailed description of the YARA syntax, see the official documentation.

File Example

private rule architecture {
    meta:
        bits = 32
        endianness = "little"
        architecture = "x86"
    condition:
        true
}

rule file_0_0_0 {
    meta:
        name = "_exp"
        size = 60
        refs = "0002 ___use_sse2_mathfcns 0037 __exp_pentium4"
    strings:
        $1 = { 83 3D ?? ?? ?? ?? 00 74 6E 83 EC 08 0F AE 5C 24 04 8B 44 24 04 25 80 7F 00 00 3D 80 1F 00 00
               75 0F D9 3C 24 66 8B 04 24 66 83 E0 7F 66 83 F8 7F 8D 64 24 08 75 41 E9 ?? ?? ?? ?? 90 }
    condition:
        $1
}
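
The format constraints described above can be checked against this example with a short Python sketch. It is a simplified reader written for illustration, not a YARA parser, and all helper names are hypothetical. Reading each entry in refs as a hexadecimal pattern offset followed by a symbol name is an inference from this one example, where both offsets land on wildcard bytes.

```python
import re
from typing import Dict, List, Tuple, Union

RULE_TEXT = '''
rule file_0_0_0 {
    meta:
        name = "_exp"
        size = 60
        refs = "0002 ___use_sse2_mathfcns 0037 __exp_pentium4"
    strings:
        $1 = { 83 3D ?? ?? ?? ?? 00 74 6E 83 EC 08 0F AE 5C 24 04 8B 44 24 04 25 80 7F 00 00 3D 80 1F 00 00
               75 0F D9 3C 24 66 8B 04 24 66 83 E0 7F 66 83 F8 7F 8D 64 24 08 75 41 E9 ?? ?? ?? ?? 90 }
    condition:
        $1
}
'''


def parse_meta(rule_text: str) -> Dict[str, Union[int, str]]:
    """Collect `key = "text"` and `key = number` lines (simplified)."""
    meta: Dict[str, Union[int, str]] = {}
    line_re = re.compile(r'^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\d+))\s*$', re.MULTILINE)
    for key, quoted, number in line_re.findall(rule_text):
        meta[key] = int(number) if number else quoted
    return meta


def pattern_tokens(rule_text: str) -> List[str]:
    """Return the byte tokens of the single hex pattern, wildcards included."""
    body = re.search(r'\$\w+\s*=\s*\{([^}]*)\}', rule_text).group(1)
    return body.split()


def parse_refs(refs: str) -> List[Tuple[int, str]]:
    """Assumed layout: alternating hexadecimal offsets and symbol names."""
    fields = refs.split()
    return [(int(offset, 16), name) for offset, name in zip(fields[0::2], fields[1::2])]


meta = parse_meta(RULE_TEXT)
tokens = pattern_tokens(RULE_TEXT)
refs = parse_refs(meta["refs"])

# The pattern must not be longer than the size stated in the meta section.
pattern_fits = len(tokens) <= meta["size"]
# Under the assumed reading, every referenced offset points at a wildcard byte.
refs_hit_wildcards = all(tokens[offset] == "??" for offset, _ in refs)
```

For the example rule, the pattern has 60 byte tokens, which equals the stated size, and the two references resolve to offsets 2 and 55, each the first of four wildcard bytes.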