-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security Solution] Adds diff algorithm and unit tests for multi-line string fields #188022
Conversation
Pinging @elastic/security-detection-rule-management (Team:Detection Rule Management) |
Pinging @elastic/security-detections-response (Team:Detections and Resp) |
Pinging @elastic/security-solution (Team: SecuritySolution) |
merged_version: mockVersions.target_version, | ||
diff_outcome: ThreeWayDiffOutcome.StockValueCanUpdate, | ||
merge_outcome: ThreeWayMergeOutcome.Target, | ||
has_conflict: false, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was our ultimate decision on such cases? Are we going to mark these potential ABC situations as SOLVABLE_CONFLICT
later to let users explicitly accept our merge proposal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is what I'm currently working on: #187770
} | ||
: { | ||
mergeOutcome: ThreeWayMergeOutcome.Merged, | ||
mergedVersion: mergedVersion.result.join(' '), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's a legitimate concern, but it looks like we are substituting all whitespace chars (except newlines) with spaces when we do .join(' ')
. I wonder if it can mess up formatting if someone had tab chars in property values.
I think you can't insert a tab or another weird whitespace char when editing via our web UI, but maybe you can if you edit it an external editor.
I played with it a little and tried to come up with a regexp that won't consume whitespace chars so that we won't have to "restore" them later. This is what I came up with:
stringSeparator: /(?<=\r?\n)|[^\S\r\n]+/g
It does split the string correctly. For example this string
First line.\nSecond line.
gets split as
['First', 'line.\n', 'Second', 'line.']
BUT for some reason merge
still returns a conflict when I try to run it with
baseVersion: "First line.\nSecond line."
currentVersion: "First row.\nSecond line."
targetVersion: "First line.\nThird line."
I expected it merge without conflict and return
"First row.\nThird line."
@dplumlee Any idea why might that be? Am I misunderstanding how it works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair question, I was thinking about it when initially toying with the regex. I think the only true way around it would be to make one that gets split with all the whitespace still in tact (e.g. something like ['First ', 'line.\n', 'Second<space character>', 'line.'])
so we could .join
with no argument and just merge the strings back together with their appending whitespace attached at the hip. None of these fields will be impactful to rule runs themselves so I'm also unsure of it's a big concern but I suppose it has the potential to mess with markdown stuff made via third party (although by default markdown only uses spaces so far as I can tell).
For that particular example, I've been trying to get it to work for a bit - might be the case that the package is just not sophisticated enough to handle it. Or we need better regex as described above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only true way around it would be to make one that gets split with all the whitespace still in tact (e.g. something like ['First ', 'line.\n', 'Second', 'line.']) so we could .join with no argument and just merge the strings back together with their appending whitespace attached at the hip.
Played with it a bit more and think I came up with a cleaner regexp that does exactly this: /(?<=\s)(?=\S)/
Still the returns a computated merged version without a conflict if 3 way merge is possible
test doesn't pass. Because merge
returns a conflict when I expect it to merge cleanly. Not sure why.
{
conflict: true,
result: [
'First ',
'<<<<<<<',
'row.\n', 'Second ',
'=======',
'line.\n', 'Third ',
'>>>>>>>',
'line.'
]
}
Do you have an idea why might that not work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please disregard most of my above comment. The /(?<=\s)(?=\S)/
regexp actually does work and tests do pass with it! I modified the test implementation, that's why I saw the error 🤦♂️. If you use /(?<=\s)(?=\S)/
as stringSeparator
and then do .join('')
it should work fine. And this way it won't cut away non-space chars.
Also I think I found the limitation of the diffing lib. It can't merge without a conflict if changed sections are adjacent.
For example, this merges cleanly:
{
base_version: `Line\nAND\nSquare`,
current_version: `Triangle\nAND\nSquare`,
target_version: `Line\nAND\nCircle`,
}
// result: `Triangle\nAND\nCircle`
While this results in a conflict:
{
base_version: `Line\nSquare`,
current_version: `Triangle\nSquare`,
target_version: `Line\nCircle`,
};
// result: conflict
But there's nothing we can do about it. Please try the new regexp approach and tell me what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @dplumlee and @nikitaindik
I've been doing some additional testing and came up with: stringSeparator: /(\S+|\s+)/g,
plus using .join('')
.
This allowed the algorithm to both:
- keep all whitespace characters
- join adjacent sections
Both of the next two examples pass:
1.
const mockVersions: ThreeVersionsOf<string> = {
base_version: `My description.\f\nThis is a second\u2001 line.\f\nThis is a third line.`,
current_version: `My GREAT description.\f\nThis is a second\u2001 line.\f\nThis is a third line.`,
target_version: `My description.\f\nThis is a second\u2001 line.\f\nThis is a GREAT line.`,
};
const expectedMergedVersion = `My GREAT description.\f\nThis is a second\u2001 line.\f\nThis is a GREAT line.`;
2.
const mockVersions: ThreeVersionsOf<string> = {
base_version: `Line\nSquare`,
current_version: `Triangle\nSquare`,
target_version: `Line\nCircle`,
};
const expectedMergedVersion = `Triangle\nCircle`;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dplumlee! I have reviewed. Left a comment about the regexp. Please take a look when you can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing this Davis.
Left a proposal on how to solve the issue with ABC scenarios merges, check it out.
Also, let's align on the work #187770. This multi-line string fields algo is the first one that should have as a possible responses both SOLVABLE
and NON_SOLVABLE
conflicts, so I'd like to introduce that in my PR as well. So we can merge this one and introduce those changes on top of this.
…#188323) ## Summary Related ticket: #180159 Adds test plan for diff algorithm for arrays of scalar values implemented in #188022 ### For maintainers - [ ] This was checked for breaking API changes and was [labeled appropriately](https://www.elastic.co/guide/en/kibana/master/contributing.html#kibana-release-notes-process) --------- Co-authored-by: Georgii Gorbachev <georgii.gorbachev@elastic.co> Co-authored-by: Juan Pablo Djeredjian <jpdjeredjian@gmail.com>
Summary
Related ticket: #180159
Adds diff algorithm for multi-line string fields and unit tests to cover all use cases. Also adds the
node-diff3
package to utilize in the diffing logic to both determine if conflicts exist and merge the 3 rule versions together to create the returned merged version.Checklist
Delete any items that are not applicable to this PR.
For maintainers