-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Merge function type and sort enumeration #177
Add Merge function type and sort enumeration #177
Conversation
@@ -938,6 +939,16 @@ | |||
<xsd:group ref="oval-def:ComponentGroup"/> | |||
</xsd:sequence> | |||
</xsd:complexType> | |||
<xsd:complexType name="MergeFunctionType"> | |||
<xsd:annotation> | |||
<xsd:documentation>The merge function takes one or more components and merges them together into a single string, with an optional delimiter. For example if data from one registry multi-size value contains values of "abc" and "def". The merge function will resolve to a local_variable with the value of 'abcdef'. If an optional delimiter of ',' was used, the merge function would resolve to a local_variable with the value of 'abc,'def'</xsd:documentation> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edited slightly:
The merge function takes one or more components and merges them together into a single string value, optionally including a delimiter string. For example, if data from one registry multi-size value contains values of "abc" and "def", the merge function operated on those values would resolve to a string with the value of 'abcdef'. If an optional delimiter of ',' was used, the merge function would resolve to a value of 'abc,'def'.
<xsd:documentation>The SortEnumeration simple type defines basic sorting operations. Currently 'none', 'alphabetical' and 'alphanumerical' are defined..</xsd:documentation> | ||
</xsd:annotation> | ||
<xsd:restriction base="xsd:string"> | ||
<xsd:enumeration value="none"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't call this "none", I would call this "document_order". After all, two interpreters using the same system-characteristics file as input should both generate identical results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't entirely sure what your "document_order" meant, as long as we explain it as the order of the items/values in OVAL system characteristics, I would agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document order means... this is an XML document. The order is the order in which the associated elements appear in the document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my brain, this all happens in memory structures, long before we end up writing it to XML, so that was my disconnect.
<xsd:documentation>No sorting is performed, the values will remain in the order they came in as.</xsd:documentation> | ||
</xsd:annotation> | ||
</xsd:enumeration> | ||
<xsd:enumeration value="alphabetical"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd call this "lexical", which I believe is more standard terminology, and typically stands in contrast with "natural" ordering.
For example, given the strings {"a1, "a2", "a12", "1", "2", "12"}, in ascending order*, lexical sorting yields:
"1", "12", "2", "a1", "a2", "a12"
And natural sorting yields:
"1", "2", "12", "a1", "a12", "a2"
- this reminds me, there is a difference between the sort type and the order (ascending vs. descending)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with that, will update
<xsd:documentation>Sort alphabetical, useful for pure string lists</xsd:documentation> | ||
</xsd:annotation> | ||
</xsd:enumeration> | ||
<xsd:enumeration value="alphanumerical"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this different from "natural"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe you need a SortEnumeration (lexical, natural, numeric) and an OrderEnumeration (ascending/descending)
Where numeric would generate an error if it encounters a string value that cannot be converted to a base-10 number (float).
I wasn't sure about a numeric sort, as accidental string data could cause all kinds of errors, but an alphanumeric (or natural), would handle them both. I'm not 100% certain on the need for alphanumeric vs natural, likely just natural. I can easily add, and content authors beware. Agree on OrderEnumeration, will add. |
As for numerical vs. natural, it's just a thought. You could effectively achieve the same thing using a variable datatype, like you did. 😀 |
So for the OrderEnumeration, I'm pondering the interaction between that and "document" sort? I guess we can just document that the OrderEnumeration has no influence on the 'document' sort, as 'document' sort, isn't sorting at all? Default OrderEnumeration should be set to "ascending". |
Here's some updated sample results, I've added a recursive Registry test, a numeric sort test using variables, and I threw in a WMI57 test to show it works with records. |
OrderEnumeration values would be "ascending" and "descending". I guess in the case of a "document_order" sort value, this would be ignored. ETA: which I guess is what you're proposing, so we agree! Also yes to the default. |
Naturally, a little digression here... I noticed this item in your results:
I see two empty values, which in XML parlance means strings of zero length. I assume this happened because a REG_SZ is terminated by a pair of NULL (0x00) bytes. IIRC, Joval would interpret this as a non-valued key, and represent that by creating a single item value like this:
This is important, because empty strings are still string values that exist. |
Very astute observation in the results, and given the lack of multi-sz content in the wild we may have overlooked this. Below is an export from my registry, so just to confirm you would say that this should be a DNE on the value for ExcludeFromKnownDlls? In looking back in our code revision history, a former developer decided at least 12 years ago that we should intentionally mark this scenario as exists. Not saying it's 'right', but it was intentionally done, if misguided. Shows how frequently multi-sz keys are used in content more than anything I'd guess. Windows Registry Editor Version 5.00; [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager] |
Yes, I would. I believe we even had a discussion about this on the old OVAL mailing list. |
@solind any other thoughts on the PR? I feel like (after a lot of improvements based on your feedback, which I greatly appreciate) that we are pretty much good to go on this? |
@@ -938,6 +939,17 @@ | |||
<xsd:group ref="oval-def:ComponentGroup"/> | |||
</xsd:sequence> | |||
</xsd:complexType> | |||
<xsd:complexType name="MergeFunctionType"> | |||
<xsd:annotation> | |||
<xsd:documentation>The merge function takes one or more components and merges them together into a single string value, optionally including a delimiter string. For example, if data from one registry multi-size value contains values of "abc" and "def", the merge function operated on those values would resolve to a string with the value of 'abcdef'. If an optional delimiter of ',' was used, the merge function would resolve to a value of 'abc,'def'.</xsd:documentation> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
REG_MULTI_SZ doesn't actually mean "multi-size", it means multiple null-terminated strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I swear I know what I'm doing from time to time... probably would have been faster to have you write this spec from the start.
</xsd:enumeration> | ||
<xsd:enumeration value="natural"> | ||
<xsd:annotation> | ||
<xsd:documentation>Sort like PHP's natsort https://www.php.net/manual/en/function.natsort.php</xsd:documentation> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Natural sort order is a pretty established computing concept, with implementations in most programming languages and on most operating systems. So, I think this might be a better external URL to reference: https://en.wikipedia.org/wiki/Natural_sort_order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only comments are documentation-related. LMK if you agree with them.
Fixed documentation per your suggestions. Hoping this is now done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
Adding a merge function which joins together multiple elements into a single variable, with an optional delimiter, and optional sorting. Had to add a new sort enumeration which may need to be discussed.