-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
error handling and performance tweaks #1375
Merged
ivg
merged 2 commits into
BinaryAnalysisPlatform:master
from
ivg:error-handling-performance-tweaks
Nov 29, 2021
Merged
error handling and performance tweaks #1375
ivg
merged 2 commits into
BinaryAnalysisPlatform:master
from
ivg:error-handling-performance-tweaks
Nov 29, 2021
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sorry for a little bit dirty pull request featuring several independent changes, but it is really tedious to split it into several PRs. The most important and controversial change is that we now capture conflicts when disassemble. The conflict is reported, the instruction is marked as invalid and the whole chain instructions that led to it, is rejected. The previous behavior was terminating the program with an error message, which to my taste is more correct, but is unsatisfying from the user-perspective. We might return the old behavior after the release, in the development version of bap, as most of the conflicts are programmers' errors. The motivation for capturing conflicts was introducing pattern-based instruction encoding identification, which naturally can have false positives (which we occasionally see in some ARM interworked binaries). The other changes concern how our lifters report and handle errors. Since we can now have multiple lifters each supplying semantics for a subset of the instruction set, we no longer assume that an unhandled instruction is an error (it might be handled by some other lifter). We still report errors on instructions that a lifter claim to lift but failed to lift. And since the log file is now much cleaner, I was able to detect some missing cases, mostly for thumb2 (via arm) instructions, which are now handled. Finally, this PR removes the `is-valid` promise from thumb, which wasn't necessary but was introducing a little bit of slowdown.
ivg
added a commit
to ivg/bap
that referenced
this pull request
Dec 3, 2021
In BinaryAnalysisPlatform#1375 instead of terminating the program on a knowledge base conflict during instruction lifting, we decided to treat such instruction as invalid and retract it and the whole path that led to it from the set of valid instructions. It turned out that the retraction mechanism wasn't quite complete and there were certain cases when an invalid instruction was still reachable, which triggered conflicts downstream, e.g., during the CFG reconstruction. The problem mostly arises in the interworked code, where we have to guess whether an instruction is in A32 or T32 mode using heuristics such as byte patterns, which inevitably leads to conflicts. So the first place where we have to enfore agreement is in the encoding detection. Before this change, the information provided by the knowledge base had precedence over the natural rules of encodings, i.e., that the fall or regular jump can't change the encoding, unless it is the encoding changing jump. In addition, whenever we discover a fall or a jump to an already disassembled instruction we have to check if the encodings agree and discard it if they don't. Finally, there were some missing cases, when the invalid code wasn't retracted. First of all, it was possible when a jump destination was invalid but the jump remained in the code set. And the dual problem, when a basic block entry point was canceled not all incomming destinations were canceled - only the path through which the block was discovered. The last two issues were fixed and they affect even those targets that do not use interworking, e.g., x86. Which is good, as more code is discarded as invalid and gives us better CFG.
ivg
added a commit
to ivg/bap
that referenced
this pull request
Dec 6, 2021
In BinaryAnalysisPlatform#1375 instead of terminating the program on a knowledge base conflict during instruction lifting, we decided to treat such instruction as invalid and retract it and the whole path that led to it from the set of valid instructions. It turned out that the retraction mechanism wasn't quite complete and there were certain cases when an invalid instruction was still reachable, which triggered conflicts downstream, e.g., during the CFG reconstruction. The problem mostly arises in the interworked code, where we have to guess whether an instruction is in A32 or T32 mode using heuristics such as byte patterns, which inevitably leads to conflicts. So the first place where we have to enfore agreement is in the encoding detection. Before this change, the information provided by the knowledge base had precedence over the natural rules of encodings, i.e., that the fall or regular jump can't change the encoding, unless it is the encoding changing jump. In addition, whenever we discover a fall or a jump to an already disassembled instruction we have to check if the encodings agree and discard it if they don't. Finally, there were some missing cases, when the invalid code wasn't retracted. First of all, it was possible when a jump destination was invalid but the jump remained in the code set. And the dual problem, when a basic block entry point was canceled not all incomming destinations were canceled - only the path through which the block was discovered. The last two issues were fixed and they affect even those targets that do not use interworking, e.g., x86. Which is good, as more code is discarded as invalid and gives us better CFG.
ivg
added a commit
to ivg/bap
that referenced
this pull request
Dec 6, 2021
In BinaryAnalysisPlatform#1375 instead of terminating the program on a knowledge base conflict during instruction lifting, we decided to treat such instruction as invalid and retract it and the whole path that led to it from the set of valid instructions. It turned out that the retraction mechanism wasn't quite complete and there were certain cases when an invalid instruction was still reachable, which triggered conflicts downstream, e.g., during the CFG reconstruction. The problem mostly arises in the interworked code, where we have to guess whether an instruction is in A32 or T32 mode using heuristics such as byte patterns, which inevitably leads to conflicts. So the first place where we have to enfore agreement is in the encoding detection. Before this change, the information provided by the knowledge base had precedence over the natural rules of encodings, i.e., that the fall or regular jump can't change the encoding, unless it is the encoding changing jump. In addition, whenever we discover a fall or a jump to an already disassembled instruction we have to check if the encodings agree and discard it if they don't. Finally, there were some missing cases, when the invalid code wasn't retracted. First of all, it was possible when a jump destination was invalid but the jump remained in the code set. And the dual problem, when a basic block entry point was canceled not all incomming destinations were canceled - only the path through which the block was discovered. The last two issues were fixed and they affect even those targets that do not use interworking, e.g., x86. Which is good, as more code is discarded as invalid and gives us better CFG.
ivg
added a commit
that referenced
this pull request
Dec 6, 2021
In #1375 instead of terminating the program on a knowledge base conflict during instruction lifting, we decided to treat such instruction as invalid and retract it and the whole path that led to it from the set of valid instructions. It turned out that the retraction mechanism wasn't quite complete and there were certain cases when an invalid instruction was still reachable, which triggered conflicts downstream, e.g., during the CFG reconstruction. The problem mostly arises in the interworked code, where we have to guess whether an instruction is in A32 or T32 mode using heuristics such as byte patterns, which inevitably leads to conflicts. So the first place where we have to enfore agreement is in the encoding detection. Before this change, the information provided by the knowledge base had precedence over the natural rules of encodings, i.e., that the fall or regular jump can't change the encoding, unless it is the encoding changing jump. In addition, whenever we discover a fall or a jump to an already disassembled instruction we have to check if the encodings agree and discard it if they don't. Finally, there were some missing cases, when the invalid code wasn't retracted. First of all, it was possible when a jump destination was invalid but the jump remained in the code set. And the dual problem, when a basic block entry point was canceled not all incomming destinations were canceled - only the path through which the block was discovered. The last two issues were fixed and they affect even those targets that do not use interworking, e.g., x86. Which is good, as more code is discarded as invalid and gives us better CFG.
ivg
added a commit
to ivg/opam-repository
that referenced
this pull request
Dec 8, 2021
This release brings This release brings Ghidra as the new disassembler and lifting backend, significantly improves our Thumb lifter (especially with respect to interworking), adds forward-chainging rules and context variables to the knowledge base, support for LLVM 12, a pass that flattens IR, and a new framework for pattern matching on bytes that leverages the available patterns and actions from the Ghidra project. It also contains many bug fixes and improvements, most notable performance improvements that make bap from 30 to 50 per cent faster. See below for the full list of changes. Package-wise, we split bap into three parts: `bap-core`, `bap`, and `bap-extra`. The `bap-core` metapackage contains the minimal set of core packages that is necessary to disassemble the binary, the `bap` package extends this set with various analysis, finally, `bap-extra` includes rarely used or hard to install packages, such as the symbolic executor, which is very heavy on installation, and `bap-ghidra`, which is right now in a very experimental stage and is only installable on Ubuntu 18.04, since it requires the libghidra-dev package available from ppa, ``` sudo add-apt-repository ppa:ivg/ghidra -y sudo apt-get install libghidra-dev -y sudo apt-get install libghidra-data -y ``` Changelog ========= Features -------- - BinaryAnalysisPlatform/bap#1325 adds armeb abi - BinaryAnalysisPlatform/bap#1326 adds experimental Ghidra disassembler and lifting backend - BinaryAnalysisPlatform/bap#1332 adds the flatten pass - BinaryAnalysisPlatform/bap#1341 adds context variables to the knowledge base - BinaryAnalysisPlatform/bap#1343 adds register aliases to the Core Theory - BinaryAnalysisPlatform/bap#1358 adds LLVM 12 support - BinaryAnalysisPlatform/bap#1360 extends the knowledge monad interface - BinaryAnalysisPlatform/bap#1363 adds forward-chaining rules and Primus Lisp methods - BinaryAnalysisPlatform/bap#1364 adds a generic byte pattern matcher based on Ghidra - BinaryAnalysisPlatform/bap#1365 adds support for the Thumb IT blocks - BinaryAnalysisPlatform/bap#1369 adds some missing `t2LDR.-i12` instructions to the Thumb lifter Improvements ------------ - BinaryAnalysisPlatform/bap#1336 improves the `main` function discovery heuristics - BinaryAnalysisPlatform/bap#1337 adds more Primus Lisp stubs and fixes some existing - BinaryAnalysisPlatform/bap#1342 uses context variables to store the current theory - BinaryAnalysisPlatform/bap#1344 uses the context variables to store the Primus Lisp state - BinaryAnalysisPlatform/bap#1355 tweaks symbolization and function start identification facilities - BinaryAnalysisPlatform/bap#1353 improves arm-family support - BinaryAnalysisPlatform/bap#1356 stops proposing aliases as potential subroutine names - BinaryAnalysisPlatform/bap#1361 rewrites knowledge and primus monads - BinaryAnalysisPlatform/bap#1370 tweaks Primus Lisp' method resolution to keep super methods - BinaryAnalysisPlatform/bap#1375 error handling and performance tweaks - BinaryAnalysisPlatform/bap#1378 improves reification of calls in the IR theory (part I) - BinaryAnalysisPlatform/bap#1379 improves semantics of some ITT instructions - BinaryAnalysisPlatform/bap#1380 Fixes handling of fallthroughs in IR theory Bug Fixes --------- - BinaryAnalysisPlatform/bap#1328 fixes C.ABI.Args `popn` and `align_even` operators - BinaryAnalysisPlatform/bap#1329 fixes frame layout calculation in the Primus loader - BinaryAnalysisPlatform/bap#1330 fixes the address size computation in the llvm backend - BinaryAnalysisPlatform/bap#1333 fixes and improves label handling in the IR theor - BinaryAnalysisPlatform/bap#1338 fixes core:eff theory - BinaryAnalysisPlatform/bap#1340 fixes the Node.update for graphs with unlabeled nodes - BinaryAnalysisPlatform/bap#1347 fixes a knowledge base race condition in the run plugin - BinaryAnalysisPlatform/bap#1348 fixes endianness in the raw loader - BinaryAnalysisPlatform/bap#1349 short-circuits evaluation of terms in Bap_main.init - BinaryAnalysisPlatform/bap#1350 fixes variable rewriter and some Primus Lisp symbolic functions - BinaryAnalysisPlatform/bap#1351 fixes and improves aarch64 lifter - BinaryAnalysisPlatform/bap#1352 fixes several Primus Lisp stubs - BinaryAnalysisPlatform/bap#1357 fixes some T32 instructions that are accessing to PC - BinaryAnalysisPlatform/bap#1359 fixes handling of let-bound variables in flatten pass - BinaryAnalysisPlatform/bap#1366 fixes a bug in the `cmp` semantics - BinaryAnalysisPlatform/bap#1374 fixes handling modified immediate constants in ARM T32 encoding - BinaryAnalysisPlatform/bap#1376 fixes fresh variable generation - BinaryAnalysisPlatform/bap#1377 fixes the IR theory implementation Tooling ------- - BinaryAnalysisPlatform/bap#1319 fixes the shared folder in deb packages - BinaryAnalysisPlatform/bap#1320 removes sudo from postinst and postrm actions in the deb packages - BinaryAnalysisPlatform/bap#1321 enables push flag in the publish-docker-image action - BinaryAnalysisPlatform/bap#1323 fixes the ppx_bap version in the dev-repo opam file - BinaryAnalysisPlatform/bap#1331 fixes the docker publisher, also enables manual triggering - BinaryAnalysisPlatform/bap#1327 fixes a typo in the ubuntu dockerfiles - BinaryAnalysisPlatform/bap#1345 fixes bapdoc - BinaryAnalysisPlatform/bap#1346 nightly tests are failing due to a bug upstream
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Sorry for a little bit dirty pull request featuring several independent changes, but it is really tedious to split it into several PRs.
The most important and controversial change is that we now capture conflicts when disassemble. The conflict is reported, the instruction is marked as invalid and the whole chain instructions that led to it, is rejected. The previous behavior was terminating the program with an error message, which to my taste is more correct, but is unsatisfying from the user-perspective. We might return the old behavior after the release, in the development version of bap, as most of the conflicts are programmers' errors. The motivation for capturing conflicts was introducing pattern-based instruction encoding identification, which naturally can have false positives (which we occasionally see in some ARM interworked binaries).
The other changes concern how our lifters report and handle errors. Since we can now have multiple lifters each supplying semantics for a subset of the instruction set, we no longer assume that an unhandled instruction is an error (it might be handled by some other lifter). We still report errors on instructions that a lifter claim to lift but failed to lift.
And since the log file is now much cleaner, I was able to detect some missing cases, mostly for thumb2 (via arm) instructions, which are now handled.
Finally, this PR removes the
is-valid
promise from thumb, which wasn't necessary but was introducing a little bit of slowdown.