Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit tests on byte extraction might have a wrong expected result #1336

Closed
xusheng6 opened this issue Mar 1, 2023 · 5 comments · Fixed by #1339
Closed

Unit tests on byte extraction might have a wrong expected result #1336

xusheng6 opened this issue Mar 1, 2023 · 5 comments · Fixed by #1339

Comments

@xusheng6
Copy link
Contributor

xusheng6 commented Mar 1, 2023

Reading the code at https://github.com/mandiant/capa/blob/master/tests/fixtures.py#L646-L649, I am wondering why the unit test would be expecting to extract those utf16-le strings as bytes sequence rather than strings? Because this comment says we should NOT extract obvious strings as bytes:

https://github.com/mandiant/capa/blob/master/capa/features/extractors/ida/insn.py#L175-L176

Yet, these are clearly "obvious" utf16-le strings:

Screenshot 2023-03-01 at 5 07 39 PM

I feel like:

  1. The unit test itself is problematic, and the expected test result should be False.
  2. We can add a unit test for string extraction, and then the expected result should be True.
  3. The current unit test passes for IDA extractor because there is a bug in the way extract_insn_bytes_features is written, see this issue.
  4. I am not sure why the viv extractor would behave in the same way, there might be a problem with isProbablyString, but I have not checked its doc or implementation.
@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 1, 2023

I am filing this because the binja extractor is failing on these tests, but I think its behavior is correct.

@mr-tz
Copy link
Collaborator

mr-tz commented Mar 1, 2023

Good catch. I did not update the tests as part of the change in #1298.

  1. Yes
  2. Yes
  3. Good find
  4. I'll take a look

If you want, would be great if you can submit the respective fixes (at least for 1-3).

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 1, 2023

Sure, I will submit a PR soon.

However, it will be rejected by the CI until vivisect is fixed as well. I guess you can work on top of my upcoming PR.

@mr-tz
Copy link
Collaborator

mr-tz commented Mar 1, 2023

Thanks! I'm looking into it now.

@xusheng6
Copy link
Contributor Author

xusheng6 commented Mar 1, 2023 via email

mr-tz added a commit that referenced this issue Mar 2, 2023
* Fix wrong expected results on string and bytes tests. Fix #1336

* Fix IDA insn/byte extractor checks wrong address. Fix #1327

* fix vivisect string check and tests

---------

Co-authored-by: Xusheng <xusheng@vector35.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants