Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix byte/string extraction and unit tests #1339

Merged
merged 5 commits into from
Mar 2, 2023
Merged

Fix byte/string extraction and unit tests #1339

merged 5 commits into from
Mar 2, 2023

Conversation

mr-tz
Copy link
Collaborator

@mr-tz mr-tz commented Mar 1, 2023

This PR fixes some unit tests that wrongly treat strings features as bytes features. It also fixes a bug in the IDA extractor that causes the old unit test to pass.

closes #1336 #1327

ref: #1338

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@mr-tz mr-tz mentioned this pull request Mar 1, 2023
@mr-tz mr-tz changed the title Fix byte unit test Fix byte/string extraction and unit tests Mar 1, 2023
@@ -271,7 +271,7 @@ def extract_insn_bytes_features(fh: FunctionHandle, bb, ih: InsnHandle) -> Itera
if capa.features.extractors.helpers.all_zeros(buf):
continue

if f.vw.isProbablyString(v):
if f.vw.isProbablyString(v) or f.vw.detectUnicode(v):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if f.vw.isProbablyString(v) or f.vw.detectUnicode(v):
if f.vw.isProbablyString(v) or f.vw.isProbablyUnicode(v):

https://github.com/vivisect/vivisect/blob/91e8419a861f49779f18316f155311967e696836/vivisect/__init__.py#L1113

Comment on lines 649 to 654
# don't extract byte features for obvious strings
("mimikatz", "function=0x40105D", capa.features.common.Bytes("SCardControl".encode("utf-16le")), False),
("mimikatz", "function=0x40105D", capa.features.common.String("SCardControl"), True),
("mimikatz", "function=0x40105D", capa.features.common.Bytes("SCardTransmit".encode("utf-16le")), False),
("mimikatz", "function=0x40105D", capa.features.common.Bytes("ACR > ".encode("utf-16le")), False),
("mimikatz", "function=0x40105D", capa.features.common.String("ACR > "), True),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

Comment on lines +179 to +181
if vw.isProbablyString(p) or vw.isProbablyUnicode(p):
# don't deref strings that coincidentally are pointers
return
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. ACS... which would be 0x430041 (valid pointer) as seen in mimikatz at 0x493E30

Comment on lines 654 to 655
("mimikatz", "function=0x401517", capa.features.common.Bytes(binascii.unhexlify("CA3B0E000000F8AF47")), True),
("mimikatz", "function=0x404414", capa.features.common.Bytes(binascii.unhexlify("0180000040EA4700")), True),
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must test True features here!

@mr-tz
Copy link
Collaborator Author

mr-tz commented Mar 1, 2023

Tests in IDA succeed.

@mr-tz mr-tz merged commit 52de09a into master Mar 2, 2023
@mr-tz mr-tz deleted the fix_byte_unit_test branch March 2, 2023 09:33
@mr-tz
Copy link
Collaborator Author

mr-tz commented Mar 2, 2023

Thanks a lot @xusheng6!

@xusheng6
Copy link
Contributor

xusheng6 commented Mar 2, 2023

Thanks a lot @xusheng6!

Glad to help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unit tests on byte extraction might have a wrong expected result
3 participants