parse method header and sections #39

Open
malwarefrank opened this issue Jan 9, 2022 · 7 comments
@malwarefrank
Owner

Parse the Method data (pointed to by RVA, see mdtable.MethodDefRow), as much as is needed to perform data-agnostic computation over the bytecode (cryptographic and fuzzy hashes, entropy, value distributions, etc).

See ECMA-335 6th Edition, Section II.25.4, "Common Intermediate Language physical layout".
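As a rough illustration of how little parsing the data-agnostic use case needs, here is a minimal sketch that reads the tiny or fat method header described in ECMA-335 II.25.4, slices out the bytecode, and computes a hash and byte entropy over it. The function names (`parse_method_header`, `method_code`, `byte_entropy`) and the dictionary layout are hypothetical and not part of dnfile; the sketch assumes `data` already starts at the file offset that the method's RVA resolves to, and it does not parse extra sections (exception handlers).

```python
import hashlib
import math
import struct
from collections import Counter

TINY_FORMAT = 0x2   # CorILMethod_TinyFormat
FAT_FORMAT = 0x3    # CorILMethod_FatFormat
MORE_SECTS = 0x8    # CorILMethod_MoreSects
INIT_LOCALS = 0x10  # CorILMethod_InitLocals


def parse_method_header(data: bytes) -> dict:
    """Parse a tiny or fat method header at the start of `data` (hypothetical helper)."""
    if not data:
        raise ValueError("empty method data")
    first = data[0]
    if first & 0x3 == TINY_FORMAT:
        # Tiny header: one byte, code size in the upper six bits,
        # no locals, no extra sections, implied max stack of 8.
        return {"format": "tiny", "header_size": 1, "code_size": first >> 2,
                "max_stack": 8, "local_var_sig_tok": 0,
                "more_sects": False, "init_locals": False}
    if first & 0x3 == FAT_FORMAT:
        if len(data) < 12:
            raise ValueError("truncated fat header")
        # 12 flag bits + 4 size bits, then MaxStack, CodeSize, LocalVarSigTok.
        flags_size, max_stack, code_size, tok = struct.unpack_from("<HHII", data, 0)
        flags = flags_size & 0x0FFF
        return {"format": "fat", "header_size": (flags_size >> 12) * 4,
                "code_size": code_size, "max_stack": max_stack,
                "local_var_sig_tok": tok,
                "more_sects": bool(flags & MORE_SECTS),
                "init_locals": bool(flags & INIT_LOCALS)}
    raise ValueError("unknown method header format")


def method_code(data: bytes, header: dict) -> bytes:
    """Return only the CIL bytes that follow the header."""
    start = header["header_size"]
    return data[start:start + header["code_size"]]


def byte_entropy(code: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not code:
        return 0.0
    total = len(code)
    return -sum((n / total) * math.log2(n / total) for n in Counter(code).values())


# Tiny method body: header byte 0x0A (size 2, tiny format), then nop; ret.
raw = b"\x0a\x00\x2a"
hdr = parse_method_header(raw)
code = method_code(raw, hdr)
print(hdr["format"], hashlib.sha256(code).hexdigest(), byte_entropy(code))
```

Hashing and entropy only need the header size and code size, so the instructions themselves never have to be decoded.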

@malwarefrank malwarefrank added this to the 1.0 milestone Jan 9, 2022
@mike-hunhoff
Contributor

@malwarefrank I've been working on a Python library that parses method body sections and CIL instructions using RVAs recovered by dnfile. Is there any interest in adding this level of method body parsing directly to dnfile? You mention parsing the sections but not the CIL instructions.

@malwarefrank
Owner Author

Sorry for the delayed response. I would love to see what you are working on. I am uncertain whether bytecode disassembly should be separate from dnfile or a part of it.

I do not want to succumb to too much scope creep before milestone 1.0, but thinking through some of those details may help inform the API before I lock out breaking changes.

@mike-hunhoff
Contributor

Apologies for the delayed response. I've released the work I've been doing on CIL disassembly here: https://github.com/mandiant/dncil. The library supports parsing method body headers, instructions, and exception handlers. There is an example of using dnfile and dncil together here: https://github.com/mandiant/dncil/blob/main/scripts/print_cil_from_dn_file.py.

One option to consider is dnfile including dncil as a dependency to keep core functionalities isolated and easier to maintain. dnfile could leverage dncil to parse method bodies and we could make disassembly a configurable option. I think dnfile including dncil as a dependency makes the most sense as both projects have progressed.

I'd be happy to contribute code to make dnfile and dncil work together but understand if this is outside the scope of your vision for dnfile.

@malwarefrank
Owner Author

Thanks. I started looking at the dncil code and realized that there have been some helpful changes since dnfile v0.8.0, including a user_strings shortcut. I tagged master and pushed a new version to PyPI.

I will look at the dncil code more and think through how best to integrate it.

@malwarefrank
Owner Author

I like that dncil focuses on parsing the method bodies. I still want to parse the MetadataTable MethodDef rows into a list of objects in dnfile and make them accessible via a shortcut, maybe something like:

import dnfile

pe = dnfile.dnPE("filename.exe")
if pe and pe.net:
    # pe.net.methods is the proposed shortcut
    for method in pe.net.methods:
        ...  # do something with each method

I think I can do that without replicating any of the dncil method body parsing. I should be able to use or cherry-pick from @williballenthin's work on #37 for method signature and param signature parsing for some, if not most, of it. Then I can just import and call dncil for the method body parsing.

@mike-hunhoff
Contributor

This sounds great. Please reach out if you have any questions or issues w/ dncil.

@malwarefrank
Owner Author

Parsing methods is complicated! </rant>

The MethodDef row defines a parameter list, but the associated method signature also defines parameters. I am working on this in the methods branch, but suffice it to say it will be a while longer.
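To make the two sources of parameter information concrete, here is a minimal sketch that reads the parameter count straight from a MethodDefSig blob (ECMA-335 II.23.2.1), which can then be compared with the Param rows reachable through the MethodDef row's ParamList. The helper names are hypothetical and not dnfile API, and the sketch assumes the blob's leading length prefix has already been stripped. It stops after the counts and does not decode the return type or parameter types.

```python
HASTHIS = 0x20   # instance method: an implicit "this" is passed
GENERIC = 0x10   # generic method: a generic parameter count follows


def read_compressed_uint(blob: bytes, offset: int = 0):
    """Decode an ECMA-335 II.23.2 compressed unsigned integer.

    Returns (value, next_offset).
    """
    b0 = blob[offset]
    if b0 & 0x80 == 0x00:                      # 1-byte form: 0xxxxxxx
        return b0, offset + 1
    if b0 & 0xC0 == 0x80:                      # 2-byte form: 10xxxxxx
        return ((b0 & 0x3F) << 8) | blob[offset + 1], offset + 2
    if b0 & 0xE0 == 0xC0:                      # 4-byte form: 110xxxxx
        value = ((b0 & 0x1F) << 24) | (blob[offset + 1] << 16) \
            | (blob[offset + 2] << 8) | blob[offset + 3]
        return value, offset + 4
    raise ValueError("invalid compressed integer")


def signature_param_info(sig: bytes) -> dict:
    """Read calling-convention flags and counts from a MethodDefSig blob."""
    first = sig[0]
    offset = 1
    generic_count = 0
    if first & GENERIC:
        generic_count, offset = read_compressed_uint(sig, offset)
    param_count, offset = read_compressed_uint(sig, offset)
    return {"has_this": bool(first & HASTHIS),
            "generic_param_count": generic_count,
            "param_count": param_count}


# Signature of `static void Main(string[] args)`:
# default calling convention, 1 parameter, returns void, SZARRAY of string.
info = signature_param_info(bytes([0x00, 0x01, 0x01, 0x1D, 0x0E]))
print(info)
```

The signature carries the types and the authoritative count, while Param rows carry names and flags. As far as I understand the spec, Param rows are optional per parameter and a row with sequence 0 describes the return value, so the two counts need not match.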
