Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimizes reconstructor, symtab, and brancher performance #855

Merged
merged 3 commits into from
Aug 14, 2018

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Aug 9, 2018

This PR rectifies reconstructor, symtab and brancher with a respect to performance, without addition of a new behavior or breaking of an existing one.

Changes:

reconstructor

Rewrote it and reduced the number of iterations over nodes/edges of cfg.
The main point is how we build the filtered cfg. Previously, we did like the following: take all nodes, add them ony-by-one to an empty cfg. So it was an iteration over all nodes of cfg. Then iterate over all edges, and add each of them if their's destination block is not root.
Also, we did one more iteration over all nodes in find_calls function.

So what I'm proposing here is just to iterate once over the whole cfg to find all roots. and then build the filtered cfg by iterating over the roots and removing their input edges.

symtab

There was a bit inefficient implementation of add_symbol function, so every addition of a symbol called filter on the whole table, although there were enough info to reduce such calls.
Concerning details. Every time we add symbol to the symtab we also remember it's name and address of an entry block. Also we filter memmap from previously added entries,
that have same name and entry block. But, as we remembered the address of the
entry block and the name of the symbol, we can check, if memmap needed to be
filtered without event touching it!

bracnher

Just a small fix that checks if an instruction has jumps at all before the subsequent call of fold_consts that could be heavy for some of the instructions

This PR rectifies reconstructor, symtab and brancher with a respect to
performance, without adding new behaviour or breaking of existed one.

Rewrote it and reduced a number of iterations over nodes/edges of cfg

There was a bit inefficient implementation of `add_symbol` function,
so every addition of a symbol led to a filter of the whole table,
although there are enough info to reduce such calls.

Just a small fix that check if an instruction has jumps at all before
subsequent call of `fold_consts` that could be heavy for some of
instructions
@gitoleg gitoleg requested a review from ivg August 9, 2018 19:01
else init

let find_calls cfg roots =
let roots = List.fold ~init:Addr.Set.empty ~f:Set.add roots in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is Addr.Set.of_list for this.

let addr = Block.addr blk in
let term = Block.terminator blk in
let init =
if Set.mem roots addr || Seq.is_empty (Cfg.Node.inputs blk cfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a block address is already in the set of roots then there is no need to add to the set of roots.
also, to count the indegree of a block use the specialized function Cfg.Node.degree ~dir:In , thus this function
could be defined as simple as

let call_destinations cfg blk = 
   if not Insn.(is call) (Block.terminator blk)  then []
   else Seq.to_list_rev @@ Seq.filter_map (Cfg.Node.outputs blk cfg) ~f:(fun e -> 
           Option.some_if (Cfg.Edge.label e <> `Fall) (Cfg.Edge.dst e))

let roots_of_blk cfg blk = Addr.Set.of_list @@ List.concat [
   if Cfg.Node.degree ~dir:`In = 0 then Block.addr blk else [];
   call_destinations cfg blk
] 


let find_calls cfg roots =
let roots = List.fold ~init:Addr.Set.empty ~f:Set.add roots in
Graphlib.depth_first_search (module Cfg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need DFS for that as you do not need any specific ordering. Just iterate over all nodes. Also, you don't need roots, to check that root is root and add it to roots. This function is simply:

let find_calls cfg roots = Cfg.nodes cfg |> Seq.fold ~init:roots ~f:(fun roots n -> 
   Set.union roots (roots_of_node n))

~start_tree:(fun n t ->
if Block.equal n entry
then Cfg.Node.insert n t
else return t)) in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the main difference here is that instead of checking each destination for is it a call or not, we just iterate over all calls and ask for incoming edges of a call. That makes sense, and this is indeed faster, provided that majority of calls are reachable from the whole program CFG.

let add_symbol t (name,entry,cfg) : t =
let data = name,entry,cfg in
let t = remove t data in
let t = filter t data in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please justify why we can use filter instead of remove here? It's not quite obvious to me.

}

let filter t ((name,entry,_ ) as fn) =
Copy link
Member

@ivg ivg Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so from the IRL discussion, the remove function is inefficient since it filters the memory mapping even if the symbol is not in the symtab, the fact that could be checked very fast with the Set membership operation.

However, this is the problem of the remove function, that should be fixed on its side. So, please remove the filter function and add the corresponding check to the remove function (in fact a single test for inclusion in addrs would be sufficient.

@ivg ivg changed the title rectifies reconstructor, symtab and brancher optimizes reconstructor, symtab, and brancher performace Aug 10, 2018
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of bad naming choices, but otherwise good to go.
also, you might consider using a hash table instead of set for is_call if it gives a significant boost.

let w = Block.addr (Cfg.Edge.dst e) in
Hashtbl.set starts ~key:w ~data:(name w)));
starts
let callees_of_block cfg roots blk =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bad choice of name, since you're adding the block itself to the resulting set it is no longer a callees_of_block. Suggestion, use entries_of_block since you're effectively collecting the entry nodes.

Also, instead of doing lots of set unions you can pass the set along and use Set.add. It is more efficient.

let update_callees cfg roots callees blk =
Set.union callees (callees_of_block cfg roots blk)

let find_callees cfg roots =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to collect_entries

let callees = find_callees cfg roots in
let is_call e = Set.mem callees (Cfg.Edge.dst e) in
let rec traverse fng node =
let fng = Cfg.Node.insert node fng in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, don't use abbreviations that are not common, suggestions,

graph or g or prog for the whole program graph
cfg for the CFG.

Also, not edg, either edge or e.

Symtab.add_symbol syms (name,entry,cfg))
let callees = find_callees cfg roots in
let is_call e = Set.mem callees (Cfg.Edge.dst e) in
let rec traverse fng node =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not traverse as this function builds a new graph. The better name reachable or add_reachable or just add.

if is_call edg then fng
else
let dst = Cfg.Edge.dst edg in
let visited = Cfg.Node.mem dst fng in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the following is easier to read

if is_call edge then cfg
else 
  let cfg' = Cfg.Edge.insert edge cfg in
  if Cfg.Node.mem (Cfg.Edge.dst edge) cfg then cfg'
  else add cfg' node

@ivg ivg changed the title optimizes reconstructor, symtab, and brancher performace optimizes reconstructor, symtab, and brancher performance Aug 13, 2018
@ivg ivg assigned gitoleg and unassigned ivg Aug 13, 2018
@gitoleg
Copy link
Contributor Author

gitoleg commented Aug 13, 2018

so, what about time - the new reconstructor on my machine with our favorite dropbear works 25 seconds less than the old one.

@gitoleg gitoleg assigned ivg and unassigned gitoleg Aug 13, 2018
@ivg
Copy link
Member

ivg commented Aug 14, 2018

relative timing would be much more interesting. 25 seconds out of 3 hours is much different from 25 seconds out of 30 seconds.

@ivg ivg merged commit 06b516e into BinaryAnalysisPlatform:master Aug 14, 2018
@gitoleg gitoleg deleted the few-speedups branch October 10, 2018 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants