-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds symbolizer based on radare2 #1112
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome!
The only thing to discuss is do we really need aaa
for symbols? I am not an expert in radare2, but my intuition tells that we can get symbols faster.
Alternatively, we can add command-line options, that will designate the aggressiveness of analysis.
plugins/radare2/radare2_main.ml
Outdated
let accept name addr = Hashtbl.set funcs addr name in | ||
let extract name json = Yojson.Basic.Util.member name json in | ||
try | ||
let symbol_list = Yojson.Basic.Util.to_list (R2.with_command_j "aaa;aflj" file) in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't be aaa
here too much?
@XVilka, what do you think, do we need triple a for symbols only?
Also, I think it would be a good idea to parameterize it by adding command-line options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends. If the elf file provides a symbolic information then its unnecessary. But if it doesn't, or its a raw binary - "aaa" is a way to go.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XVilka, but will aaa
in a raw binary give the names? Symbolizer is used only for resolving names, e.g., to have malloc
instead sub_xxxx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XVilka, so the last questions, what would be the minimal (not in terms of the number of characters, but in terms of the time of analysis) command that will give us names from radare2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say "names" are you referring to list of symbols? if so, the list can be found with the is
command and wouldn't need analysis.
You need to check that it will indeed give you whatever you need :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isj
then, or aa;aflj
if you want basic analysis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isj
then, oraa;aflj
if you want basic analysis.
Maybe a command-line option can be introduced here to make both of them available to user?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe this would be the best solution, lets use isj
as the default value and let the power-users to specify whatever command they would like.
Not sure, why Travis is not calling back, but he has failed with an expected error
|
Can't understand why travis CI keeps failing, it seems like only part of the runs has failed (1 out of 3). |
Probably just a glitch of Travis, restarted the build. We will move away from Travis in the future, as it is quite unstable. |
No, I was wrong on Travis, it is actually the new analysis that just hangs the tests. Either it takes to much time in radare2 or there is a bug in the plugin so it just stalls. You can run the tests on the local machine, just do |
I think it might be because Ubuntu radare2 is a fossil (1.5.2 in Trusty that is used in |
Yes, that seems to be the problem, I've run |
It looks like that either radare2 hangs or just takes a lot of time to perform the specified analyses (more than 10 minutes). Most likely the latter. That is why I want to minimize the number of analyses that we run in radare2 as this will be now the cost that we will pay on every run of bap. |
@ivg it's not a problem of this plugin or radare2 itself - it was fixed loooooooooong time ago. It's just Ubuntu is not very fond of using anything fresh. With |
Together with z3 and r2 we are again very close to the time limit on Travis :( I am restarting jobs until they fit into the limit, the margin is very tight, about 30 seconds. |
Ok, we're still hanging on the tests :( |
I opened an issue in Ubuntu to update radare2 package, let's see if they
will act on this https://bugs.launchpad.net/launchpad/+bug/1882889
…On Thu, Jun 11, 2020, 22:45 Ivan Gotovchits ***@***.***> wrote:
Ok, we're still hanging on the tests
<https://travis-ci.org/github/BinaryAnalysisPlatform/bap/jobs/696719507#L655>
:(
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1112 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABRT7IT4M3YABPQRHDENALRWDUZTANCNFSM4NUBY46Q>
.
|
Strangely, at least the radare2 and r2 part are working for me, I will later take a look into the plugin, maybe there is some issue there,
with
and,
|
Ok, so I can't reproduce the problem on my bionic box so probably it is something with Travis, so we are blocked by #1115 (@gitoleg can you take it from here and make it your main priority, I want it to be merged by the end of the week). The only concern that is left is these |
plugins/radare2/radare2_main.ml
Outdated
List.fold symbol_list ~init:() ~f:(fun () symbol -> | ||
accept (Yojson.Basic.Util.to_string (extract "name" symbol)) (Yojson.Basic.Util.to_int (extract "vaddr" symbol) |> Z.of_int) | ||
accept (Yojson.Basic.Util.to_string (extract "name" symbol) |> strip) (Yojson.Basic.Util.to_int (extract "vaddr" symbol) |> Z.of_int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're limiting our line length to 70 lines and using ocp-indent
to indent the code. Could you please cleanup the code?
plugins/radare2/radare2_main.ml
Outdated
List.fold symbol_list ~init:() ~f:(fun () symbol -> | ||
accept (Yojson.Basic.Util.to_string (extract "name" symbol)) (Yojson.Basic.Util.to_int (extract "vaddr" symbol) |> Z.of_int) | ||
accept (Yojson.Basic.Util.to_string (extract "name" symbol) |> strip) (Yojson.Basic.Util.to_int (extract "vaddr" symbol) |> Z.of_int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, missed on the first review, but Yojson.Basic.Util.to_int
will fail on addresses that do not fit into OCaml integers. We shall not assume that addresses will fit into the host OCaml's int
data type (which could be as small as 31 bit only), neither we shall assume that they will fit into int32 or even int64 (though the latter we do assume in the LLVM backend, but mostly because it is assumed by LLVM itself). I would suggest extracting the string and then use an appropriate Z.of_string*
function to parse it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly this would not be possible, because the vaddr
here is of variant `Int (i : int) (representing a Json value holding an ocaml int) which could only be extracted by Yojson.Basic.Util.to_int, this is rather a radare2-ocaml (or even for yojson) implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a critical flaw on the radare2-ocaml
side, which should be using Yojson.Safe
instead of basic. On our side, we have the following options,
-
submit a PR to radare2-ocaml and switch it to the Safe module. This will break the interface and we will have to bump their version. But without that this bindings are barely usable for the real-world binaries. So we anyway shall do this, irrelevant to this PR.
-
Ignore the json output and switch to the text parsing. The pros is that it will be faster, the cons is that I am not sure how stable is the output format of radare2.
@XVilka can you advise us on this, is option 2 viable at all? Or maybe there are other options, like other output formats that are more reliable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, we also have an option to fix radare2-ocaml without breaking the interfaces (or at least backport the fix to the 1.x versions), we can use [Safe.to_basic], which is safe and translates Intlit x
to String x
. The cost is performance (yet another format translation), but the benefit is the preserved interfaces. The downstream code should still be updated to expect String
s in places where Int
s do not fit the OCaml integers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I a maintainer of ocaml-radare2, I can release a new version tomorrow if you send the PR. :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets wait until it's merged in the main opam-repository: ocaml/opam-repository#16661
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still waiting for the opam upstream, seems like the merged version is still not upgradable now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will land in the repository tonight, I suppose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just checked and radare2.0.0.5
is in the opam. Please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! just updated the plugin to keep up with new radare2-ocaml
@Phosphorus15, here is a git workflow tip for you. When you start any work do not do it in your master branch. Instead, create a topic branch in your fork and work in it. You can submit a draft PR for CI and immediate comments (and otherwise visibility). To summarize, here are the steps that you shall do for bap (and basically any other project)
|
MacOS failures are unrelated:
|
@ivg anything we should do? I think this PR is ready. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@XVilka, yep it is on my work list, will work on it on my tomorrow morning.
plugins/radare2/radare2_main.ml
Outdated
| _ -> None) | ||
| _ -> None in | ||
try | ||
let symbol_list = let open Yojson in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style note. Please, refrain from using the open statement.
- Open statements make it hard to read the code as it is non-clear which binding came from which module.
- Open statements increase the cognition burden as the reader has to remember which modules were open and in which order.
- Open statements make code fragile to non-breaking changes, i.e., adding a new binding to an opened module may hide the existing bindings and, at best, lead to compiler errors (at worse to changed semantics).
Exceptions.
-
Always open modules that were designed to be opened, e.g., Bap.Std, Core_kernel, etc. Such modules define a namespace for a set of libraries and define very few, if any, toplevel values. Usually they define only modules on the toplevel and types.
-
Open modules that define syntax and eDSL, e.g.,
Primus.Syntax
,Option.Monad_infix
orBil
, but limit yourself to one syntax/DSL per unit of cognition, as it is hard to read the piece of code that is written in more than one syntax/language. You can delimit the units of cognition by encapsulating them in modules, sometimes functions.
plugins/radare2/radare2_main.ml
Outdated
Option.(symbol_list >>| List.fold ~init:() ~f:(fun () symbol -> | ||
both | ||
(extract "name" symbol >>= to_string >>| strip) | ||
(extract "vaddr" symbol >>= to_int) | ||
|> value_map ~default:() ~f:(fun (name, addr) -> accept name addr) | ||
) |> value ~default:()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This expression is overly complicated for the amount of work it is doing. A few suggestions on improving it.
List.fold ~init:() ~f:(fun () x -> f x)
is better expressed asList.iter ~f
Option.value (f x) ~default:()
is better expressed asOption.iter ~f
Option.value_map ~default:() ~f
is better expressed asOption.iter ~f
I also suggest that instead of creating to_string
and to_int
parsing functions, which are only used once and do not really provide much abstraction, create two more specialized extract_name
and extract_addr
helpers, so that at the end the code could be written as,
List.iter symbol_list ~f:(fun s -> match extract_name s, extract_addr s with
| Some name, Some addr -> accept name addr
| _ -> debug "skipping %a" Yojson.pp s)
plugins/radare2/radare2_main.ml
Outdated
|
||
|
||
|
||
let main = Stream.observe Project.Info.file @@ provide_radare2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main
here is not a function, but a value of type unit
, therefore you're not calling main on line 90 but is calling it right here. So add a ()
parameter to turn it into a function.
plugins/radare2/radare2_main.ml
Outdated
`S "SEE ALSO"; | ||
`P "$(b,bap-plugin-objdump)(1)" | ||
]; | ||
Config.when_ready (fun {get=_} -> main) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is equivalent to Config.when_ready (fun {get=_} -> ())
, which is probably not what you meant.
plugins/radare2/radare2_main.ml
Outdated
match R2.with_command_j "isj" file with | ||
| `List list -> Some list | ||
| _ -> None in | ||
let strip str = let open String in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your helper function such as strip
, to_string
, et alas, are not really using the enclosed scope (i.e., the do not depend on the file), so it is better to move them away and define them on the toplevel. This will make the provide_radare2
function easier to read and maintain, since now it will do only the work that depends on the file.
plugins/radare2/radare2_main.ml
Outdated
Bitvec.to_bigint (Word.to_bitvec addr) in | ||
Symbolizer.provide agent symbolizer; | ||
provide_roots funcs | ||
with _ -> warning "failed to use radare2";() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style note: limit the scope try/catch
statements to the scope of the expressions that you know can fail and never hush bugs and other programmer errors with try/catch
statements. Also, deal with errors as soon as possible, to not let them propagate to far away from the point of origin.
In BAP and Core we don't write functions that casually throw exceptions and rely on explicit algebraic data types to denote functions that are not partial. We keep exceptions for programmers errors and never hush them, so that if we introduce a bug it is immediately caught by our CI system with a readable backtrace.
In this case the only function that can fail with a runtime error is R2.with_command
so we need to handle it correctly, we can use the ability to match on exceptions of the match
construct, e.g.,
match R2.with_command_j "isj" file with
| `List list -> list
| s -> warning "unexpected output: %a" Yojson.pp s
| exception Invalid_argument msg -> warning "failed to get the symbols: %s" msg
Note, that we removed an unnecessary wrapping of the list into the option type, which will simplify the logic downstream, as well as we let all exceptions other than Invalid_argument
to propagate further so that we do not hash unexpected errors and accumulate problems. We're catching Invalid_argument
as, although it is undocumented, it is the exception that R2
is using to indicate runtime errors. A yet another argument for using algebraic data types instead of exceptions to report erroneous conditions.
Also, f x; ()
is the same as f x
.
Thanks for all the code-style and refactor advises @ivg ! I've done a refactor on all the related codes and used |
Nope, none is needed, we use the default config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it now looks very close to ideal. Only two fixes are required
- Do not use local opens for accessing Json polymorphic constructors;
- Either do not use ppx_pipebang or add it to the build dependencies.
And then we are good to go!
plugins/radare2/radare2_main.ml
Outdated
|
||
let extract_name (json : Yojson.t) = | ||
match json with | ||
| Yojson.(`Assoc list) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here the local open superfluous as polymorphic variants (unlike the regular variants) all are accessible from the toplevel scope, i.e., they do not belong to any module.
plugins/radare2/radare2_main.ml
Outdated
| Yojson.(`Assoc list) -> | ||
(match List.find list ~f:(fun (key, _) -> String.equal key "vaddr") with | ||
| Some (_, v) -> (match v with | ||
| Yojson.(`Int i) -> Z.of_int i |> Some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x |> Some
is not a valid OCaml, unlike Haskell, constructors are not functions in OCaml and the only valid way to apply the constructor Some
to an argument is Some x
.
The reason why the code works for you is that you, accidentally, using the ppx_pipebang
syntax extension that on the source level substitutes x |> f
with f x
.
So, if you find x |> Some
appealing, you can just add ppx_pipebang
to the BuildDepends
field1. You can also use Some (Z.of_int i)
or Z.of_int |> Option.some
1) The OASIS OCamlbuild backend is adding the transitive closure of all dependencies to the built target, which also brings this ppx_pipebang dependency and God knows which other dependencies which might overwrite your code. Internally we do not use OASIS/OCamlbuild but OASIS/Omake which is more strict and includes only the specified dependencies and is 10 to 20 times faster (that's the main reason). So for me this code doesn't compile (well unless you add the ppx_pipebang extension). Basically, it means that we need to add yet another workflow that will check that the code is buildable with omake.
plugins/radare2/radare2_main.ml
Outdated
| Yojson.(`List list) -> Some list | ||
| s -> warning "unexpected radare2 output: %a" Yojson.pp s; None | ||
| exception _ -> warning "failed to get symbols - radare2 command failed"; None in | ||
Option.iter symbol_list ~f:(List.iter ~f:(fun s -> match extract_name s, extract_addr s with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is too long, let's stick to 80 characters max, see also this comment for the detailed guideline.
plugins/radare2/radare2_main.ml
Outdated
|
||
let extract_addr (json : Yojson.t) = | ||
match json with | ||
| Yojson.(`Assoc list) -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, there is no need to use the local open construct, Module.(expr)
, here even if the constructor would be a regular one, e.g., you can write Yojson.Assoc list
and it will work (if we will imagine that Yojson uses regular variants instead of polymorphic). But writing Yojson.(Assoc list)
is a little bit different, if Yojson
will define a function named list
this expression will be Yojson.Assoc Yojson.list
, which will be very different from what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks awesome! Thanks!
* adds radare2 symbolizer * Fixes errors within oasis file * fixes silly mistakes * use 'isj' to dump symbols * chop 'sym.imp' prefix * code style clean-up * refactor for new radare2-ocaml interface * lift symbol extracting to allow partial failure * clean-up refactor & indentation fix * polymorphic variant & pipe operator fix Co-authored-by: Ivan Gotovchits <ivg@ieee.org>
BAP contains a symbolizer based on
objdump
output, this PR introduces a symbol provider based on radare2 engine, using a safer ocaml-radare2 bindings (than theobjdump
one).