The goal of this document is to be a bag of important ideas, decisions, concepts and discoveries in the NameMapper
project. Currently these include:
- Why is NameMapper a separate command line tool?
- Why does NameMapper need to know about the symbol type to generate the obfuscated name proposal?
- Where are the characters to build the obfuscated name taken from?
- How can we ensure that the generated name will be unique and there will be no name collision?
- Why are both random and deterministic name generation options available?

Among the tools that make up the obfuscator, `NameMapper` is the smallest and has the fewest responsibilities. It's a simple input-output command-line tool that takes one JSON file and returns an almost identical JSON file, with just one field added to its schema.
It's built as a separate command-line tool, with all the infrastructure that comes with it, so that the logic determining the renaming mode, which symbols are renamed, and how they are renamed is kept in one place. This makes it much easier to ensure that a particular symbol will not be renamed (it's enough to remove it from `Renames.json`). It also makes de-obfuscation a trivial task (it's enough to provide a `Renames.json` with the values of `originalName` and `obfuscatedName` swapped). The renaming modes can be defined and added to the command-line interface in a dedicated place, which avoids clutter in the command-line interfaces of the other tools.
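The de-obfuscation idea can be illustrated with a minimal Python sketch. Only the `originalName` and `obfuscatedName` fields come from the description above; the top-level `symbols` key, the sample values, and the `reverse_renames` helper are assumptions made for illustration and may not match the real `Renames.json` schema.

```python
import json


def reverse_renames(renames):
    """Return a copy of a Renames-like document with the two name fields swapped."""
    reversed_doc = dict(renames)
    reversed_doc["symbols"] = [
        {
            **symbol,  # keep any other fields untouched
            "originalName": symbol["obfuscatedName"],
            "obfuscatedName": symbol["originalName"],
        }
        for symbol in renames["symbols"]  # "symbols" is an assumed key
    ]
    return reversed_doc


# Hypothetical input; a real Renames.json is likely to carry more fields.
renames = {
    "symbols": [
        {"originalName": "foo", "obfuscatedName": "xK3_q"},
        {"originalName": "Bar", "obfuscatedName": "Zt9w"},
    ]
}

reversed_renames = reverse_renames(renames)
print(json.dumps(reversed_renames, indent=2))
```

Feeding the swapped document to the renaming step would then map every obfuscated name back to its original one, and swapping twice yields the initial document again.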
Different kinds of symbols permit different sets of characters. For example, generating an obfuscated name for an operator requires characters that are not allowed in a function name. That's why `NameMapper` needs to know the symbol type.
The characters used to generate obfuscated names are taken from the Swift language grammar reference. For example, the structure of an identifier explicitly lists which characters may be used. Elsewhere, the structure of a type shows that a type name is an identifier, and therefore all the characters allowed in an identifier can be used in a type name.
A similar analysis is used for all supported constructs, such as function names and operators.
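As a rough illustration of choosing characters by symbol type, here is a Python sketch. The character sets are small ASCII subsets picked for readability, not the full sets from the Swift grammar, and the symbol type names, the default length, and the `propose_name` function are hypothetical.

```python
import random
import string

# Illustrative subsets only; the Swift grammar allows many more characters
# (including Unicode ranges) than are listed here.
IDENTIFIER_HEAD = string.ascii_letters + "_"
IDENTIFIER_TAIL = IDENTIFIER_HEAD + string.digits
OPERATOR_CHARS = "+-*%<>&|^~"

# Hypothetical symbol type names; the tool's real type names may differ.
IDENTIFIER_TYPES = {"type", "function", "variable"}


def propose_name(symbol_type, length=8, rng=random):
    """Build a random name from the characters allowed for the symbol type."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if symbol_type == "operator":
        return "".join(rng.choice(OPERATOR_CHARS) for _ in range(length))
    if symbol_type in IDENTIFIER_TYPES:
        # An identifier may not start with a digit, so the first character
        # is drawn from a narrower set than the rest.
        head = rng.choice(IDENTIFIER_HEAD)
        tail = "".join(rng.choice(IDENTIFIER_TAIL) for _ in range(length - 1))
        return head + tail
    raise ValueError(f"unsupported symbol type: {symbol_type}")


print(propose_name("function"))
print(propose_name("operator"))
```

The sketch ignores reserved words and reserved operators, which a real generator would also have to avoid.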
`NameMapper` caches the names that have already been generated and used for renaming. If a newly generated name collides with a cached one, it tries to generate another name. The process is repeated up to 100 times. If there is still a collision after 100 tries, `NameMapper` logs an error to the output and exits.
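The retry logic described above could look roughly like the following Python sketch. The `unique_name` and `random_identifier` helpers and the `NameCollisionError` exception are hypothetical, and counting the first attempt as one of the 100 tries is an assumption.

```python
import random
import string

MAX_ATTEMPTS = 100  # the limit stated above


class NameCollisionError(Exception):
    """Raised when no unused name was found within the attempt limit."""


def unique_name(generate, used_names, max_attempts=MAX_ATTEMPTS):
    """Call generate() until it yields a name that is not in used_names."""
    for _ in range(max_attempts):
        candidate = generate()
        if candidate not in used_names:
            used_names.add(candidate)  # remember the name for later calls
            return candidate
    raise NameCollisionError(f"no unique name found after {max_attempts} attempts")


def random_identifier(length=6):
    """Stand-in generator; the real tool picks characters by symbol type."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


used_names = set()
first = unique_name(random_identifier, used_names)
second = unique_name(random_identifier, used_names)
print(first, second)
```

A command-line entry point would catch `NameCollisionError`, log the message, and exit with a non-zero status.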
However, there is currently no check ensuring that a generated name differs from an already existing original symbol name. This feature is planned for the future.
Random generation is the default, and it serves the main purpose of obfuscation: to render the code unreadable and harder to understand for an attacker who is looking for symbols in the binary file.
Deterministic name generation was created for testing, as it makes it possible to write the expected obfuscated code for a given original source file. It relies on three rules:
- each symbol gets a known prefix (dependent on its kind) added to its name, instead of having its name replaced with a random one (`NF` for functions, `T` for types, etc.),
- a strictly incremental index number is appended to the prefix, so that if there is more than one symbol of the same kind and the same name in the source code, the obfuscated names differ (for example, `NF1_foo` and `NF2_foo`),
- symbol extraction is done in a deterministic order, from the top of each file to the bottom, with files parsed in alphabetical order (`a-zA-Z`).

These three rules are enough to determine what obfuscated name a particular symbol will get: it's enough to know its kind, its name and its first occurrence.
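The three rules can be sketched in Python as follows. Only the `NF` and `T` prefixes and the `NF1_foo` naming pattern come from the text. The tuple layout, the `deterministic_names` function, and keeping one counter per kind and name pair are assumptions. Plain string sorting stands in for the tool's file ordering, so the exact `a-zA-Z` collation is not modelled.

```python
from collections import defaultdict

# Only these two prefixes are named in the text; other kinds are omitted.
PREFIXES = {"function": "NF", "type": "T"}


def deterministic_names(symbols):
    """Map (file_name, line, kind, name) tuples to obfuscated names.

    Symbols are visited file by file and top to bottom within each file.
    """
    counters = defaultdict(int)  # assumed: one counter per (kind, name) pair
    names = []
    for _file_name, _line, kind, name in sorted(symbols, key=lambda s: (s[0], s[1])):
        counters[(kind, name)] += 1
        names.append(f"{PREFIXES[kind]}{counters[(kind, name)]}_{name}")
    return names


# Hypothetical symbols, deliberately listed out of order.
symbols = [
    ("b.swift", 3, "function", "foo"),
    ("a.swift", 10, "function", "foo"),
    ("a.swift", 1, "type", "Bar"),
]

print(deterministic_names(symbols))
# prints ['T1_Bar', 'NF1_foo', 'NF2_foo']
```

Because the result depends only on each symbol's kind, name, and position, a test can spell out the expected obfuscated source by hand.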