Localize all registers #652

PeterMatula · 2019-09-19T10:50:22Z

The current state:

Our binary to LLVM IR decoding represents registers as global variables.
Our low-level analyses make heavy use of Reaching Definition Analysis (RDA), which stops tracking registers at function starts, i.e. it is not inter-procedural, and therefore none of the analyses using it are inter-procedural either.
LLVM IR analyses are very strict - they do not make simplifying assumptions, and if they cannot prove an optimization correct, they do not perform it. Some of them are inter-procedural, and therefore very complex and expensive.
Some of our high-level analyses are inter-procedural (e.g. -global-to-local, -dead-global-assign), and therefore very complex and expensive.
The backend (llvmir2hll) is also strict and takes inter-procedural relations into account.

All of this has the following consequences:

Many analyses are very complex, expensive, and not even necessarily correct (they are very error-prone).
A lot of clutter in the resulting decompilation.
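
To illustrate the current state, here is a minimal C sketch that models registers as global variables. It is only an analogy for the LLVM IR that RetDec produces, not actual decoder output, and the names `reg_eax`, `reg_ecx`, `callee`, and `caller` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical stand-ins for decoded registers: one global per register. */
uint32_t reg_eax;
uint32_t reg_ecx;

/* Reads reg_ecx without writing it first. An analysis that stops at the
 * function start finds no definition of reg_ecx inside this function. */
void callee(void) {
    reg_eax = reg_ecx + 1;
}

/* Writes reg_ecx and never reads it back. Seen in isolation, the store
 * looks dead; only an inter-procedural view shows that callee() uses it. */
uint32_t caller(uint32_t x) {
    reg_ecx = x;
    callee();
    return reg_eax;
}
```

Because every register is a global, any strict analysis has to assume that each store may be observed by some other function, which is why such stores survive into the output as clutter.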

Proposal:

Transform all registers to local variables at some point (i.e. localize them).
- Do not decode them (binary to LLVM IR) as local variables; that would make the translation less general.
- The cleanest solution would probably be to localize them right after the decoding, so that all analyses (ours and LLVM's) work on the same register representation. This would however require modifications to all of our analyses, so don't do it right away.
- Do the localization after our low-level passes, and before LLVM passes. LLVM does not care about the nature of our registers, and therefore no modifications are needed.
- Reduce the number of our high-level passes - some will become obsolete after localization, others can be moved.
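
The effect of localization can be sketched in C as follows. This is an illustration under assumptions, not the actual pass: `g_eax`, `g_ebx`, `add3_global`, and `add3_local` are hypothetical names, and the real transformation works on LLVM IR rather than on C source.

```c
#include <stdint.h>

/* Before localization: registers are globals (hypothetical names). */
uint32_t g_eax;
uint32_t g_ebx;

/* Every store below is visible outside the function, so a strict
 * analysis has to keep all of them. */
uint32_t add3_global(uint32_t a, uint32_t b, uint32_t c) {
    g_eax = a;
    g_ebx = b;
    g_eax = g_eax + g_ebx;
    g_ebx = c;
    g_eax = g_eax + g_ebx;
    return g_eax;
}

/* After localization: each function gets its own copy of the registers.
 * Nothing escapes, so standard intra-procedural optimizations can reduce
 * the body to a + b + c. */
uint32_t add3_local(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t eax;
    uint32_t ebx;
    eax = a;
    ebx = b;
    eax = eax + ebx;
    ebx = c;
    eax = eax + ebx;
    return eax;
}
```

Both functions return the same value, but only `add3_global` leaves a side effect in the register globals. That side effect is exactly the inter-procedural information localization gives up.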

Pros:

Cleaner and more compact decompiled code.
Less complex analyses.
Less expensive (i.e. faster) analyses.
This will uncover some other RetDec problems -> more issues.

Cons:

Loss of info needed for inter-procedural register analysis - probably not really needed - see Hex-Rays below.
This will uncover some other RetDec problems -> more issues.

Hex-Rays experiments:

Experiments with the Hex-Rays decompiler showed that it probably does a version of this and does not care about the possible loss of inter-procedural relations on registers.
Example:
- Original code:
```
int g1, g2;
void f1() {
   g1 = rand();
   g2 = rand(); // -> ecx = rand();
}
void f2() {
   printf("%d\n", g1);
   printf("%d\n", g2); // -> printf("%d\n", ecx);
}
int main() {
   f1();
   f2();
}
```
- At the ASM level, I changed the instructions to write to ecx instead of g2 in f1(), and to read from ecx instead of g2 in f2().
- Even though an inter-procedural analysis (like the one RetDec currently does) would find out that ecx = rand(); in f1() is used in the subsequently called function f2() and therefore should not be removed, Hex-Rays ignores this and throws the assignment away. It then uses an uninitialized value representing ecx in f2().
- Decompilation of modified binary:
```
int g1;
void f1() {
   g1 = rand();
   // missing (optimized-out) ecx = rand();
}
void f2() {
   int v1; // ecx
   printf("%d\n", g1);
   printf("%d\n", v1);
}
int main() {
   f1();
   f2();
}
```
- This happens in both selective decompilation (function-by-function) and full decompilation (Produce file -> Create C file...).

P.S.
Thanks to the discrete LLVM pass system used in RetDec, the whole localization will be implemented as a single, independent pass. It will be enabled by default, but it will be no problem to disable it on demand if needed/wanted.

PeterMatula added enhancement high-priority C-bin2llvmir labels Sep 19, 2019

PeterMatula added this to the RetDec v4 milestone Sep 19, 2019

PeterMatula self-assigned this Sep 19, 2019

PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Sep 19, 2019

bugs: cosmetic changes in dests due to avast/retdec/issues/652

4bd5195

PeterMatula added a commit to avast/retdec-regression-tests that referenced this issue Sep 20, 2019

tools/isaplugin: test modifications caused by avast/retdec#652

dbeec21

The new results are not worse than before, there are good reasons these look like they do now.

PeterMatula added a commit that referenced this issue Sep 24, 2019

CHANGELOG.md: fix #652, add entry for #652.

74e8094

PeterMatula mentioned this issue Sep 24, 2019

Reg localization #661

Merged

PeterMatula closed this as completed in 5659ad0 Sep 24, 2019

PeterMatula mentioned this issue Sep 24, 2019

Lot of noise in output #389

Open

This issue was closed.