-
Notifications
You must be signed in to change notification settings - Fork 564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix string length >= 4 and remove bytes/string overlaps #1298
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -172,7 +172,9 @@ def extract_insn_bytes_features(fh: FunctionHandle, bbh: BBHandle, ih: InsnHandl | |
if ref != insn.ea: | ||
extracted_bytes = capa.features.extractors.ida.helpers.read_bytes_at(ref, MAX_BYTES_FEATURE_SIZE) | ||
if extracted_bytes and not capa.features.extractors.helpers.all_zeros(extracted_bytes): | ||
yield Bytes(extracted_bytes), ih.address | ||
if not capa.features.extractors.ida.helpers.find_string_at(insn.ea): | ||
# don't extract byte features for obvious strings | ||
yield Bytes(extracted_bytes), ih.address | ||
Comment on lines
-175
to
+177
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. im not necessarily convinced by this change in logic. for example, consider a file format (like capa) that has a magic header/signature of ASCII text, like is this really a problem? maybe not, but i do want to raise the issue. perhaps we do merge this change, but be prepared to revert if we come up with a real exampe. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That would be an example to leave as is. My argument here is that we extract many superfluous features here for all references strings. If we find an example/use case we can also fine tune the extraction logic. E.g. is it a reference to a null-terminated string that's followed by another string in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i dont think its worthwhile to make any further changes here (or revert) unless we can come up with some specific examples. just explaining a possibility. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. gotcha, happy to revisit this if the need arises |
||
|
||
|
||
def extract_insn_string_features( | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we make this configurable and consistent across backends?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal of this PR was to make this consistent for viv, IDA, and dotnet. Did I miss anything?
Making it configurable, would be a neat enhancement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i meant that we might have a single place where "4" is defined and then passed into the string extractors (consistently). i dont think its important to hit in this PR, but something nice to do sometime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, sounds good!