Fixes for new 7.10 rsa2elk datasets #21240
Merged
Some parsers from netwitness wrongly use the &#x92; XML entity as a quote character. This entity translates to Unicode codepoint U+0092 (PRIVATE USE TWO), which is not printable and can cause problems. My understanding is that this is the result of either:
- device logs being encoded in the windows-1252 codepage, or
- log parsers originally written in the windows-1252 codepage.
In this codepage, \x92 represents a quotation mark similar to the ASCII \x27 single quotation mark ('). I believe someone misunderstood XML's &#xNNN; entity as escaping a byte value instead of a Unicode codepoint. As it is unclear whether the original logs contain this special quote or it is the result of writing the parsers in a Windows editor, it's better to replace its usage with empty captures that skip over this quote.
The original pipelines had been generated with some debugging comments in them, which made them much larger than necessary.
adriansr added the bug, review, and needs_backport labels on Sep 23, 2020.
botelastic (bot) added the needs_team label on Sep 23, 2020.
Pinging @elastic/siem (Team:SIEM)
botelastic (bot) removed the needs_team label on Sep 23, 2020.
marc-gr approved these changes on Sep 28, 2020.
v1v added a commit to v1v/beats that referenced this pull request on Sep 29, 2020:
* upstream/master:
  - feat: prepare release pipelines (elastic#21238)
  - Add IP validation to Security module (elastic#21325)
  - Fixes for new 7.10 rsa2elk datasets (elastic#21240)
  - o365input: Restart after fatal error (elastic#21258)
  - Fix panic in cgroups monitoring (elastic#21355)
  - Handle multiple upstreams in ingress-controller (elastic#21215)
  - [CI] Fix runbld when workspace does not exist (elastic#21350)
  - [Filebeat] Fix checkpoint (elastic#21344)
  - [CI] Archive build reasons (elastic#21347)
  - Add dashboard for pubsub metricset in googlecloud module (elastic#21326)
  - [Elastic Agent] Allow embedding of certificate (elastic#21179)
  - Adds a default for failure_cache.min_ttl (elastic#21085)
  - [libbeat] Disk queue implementation (elastic#21176)
adriansr added the v7.10.0 label and removed the needs_backport label on Sep 29, 2020.
adriansr added a commit to adriansr/beats that referenced this pull request on Sep 29, 2020:
* Fix bad unicode character used in juniper/netscreen
  Some parsers from netwitness wrongly use the &#x92; XML entity as a quote character. This entity translates to Unicode codepoint U+0092 (PRIVATE USE TWO), which is not printable and can cause problems. My understanding is that this is the result of either:
  - device logs being encoded in the windows-1252 codepage, or
  - log parsers originally written in the windows-1252 codepage.
  In this codepage, \x92 represents a quotation mark similar to the ASCII \x27 single quotation mark ('). I believe someone misunderstood XML's &#xNNN; entity as escaping a byte value instead of a Unicode codepoint. As it is unclear whether the original logs contain this special quote or it is the result of writing the parsers in a Windows editor, it's better to replace its usage with empty captures that skip over this quote.
* Update pipelines for new 7.10 rsa2elk datasets
  The original pipelines had been generated with some debugging comments in them, which made them much larger than necessary.
(cherry picked from commit 24e972f)
adriansr added a commit that referenced this pull request on Sep 29, 2020.
What does this PR do?
This updates the JavaScript pipelines in the new rsa2elk datasets for 7.10.
Why is it important?
There were two problems with the original pipelines:
juniper/netscreen:
This pipeline used the &#x92; XML entity as a quote character. This entity translates to Unicode codepoint U+0092 (PRIVATE USE TWO), a C1 control character that is not printable and can cause problems.
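My understanding is that this is the result of either:
- device logs being encoded in the windows-1252 codepage, or
- log parsers originally written in the windows-1252 codepage,
and of someone misunderstanding XML's &#xNNN; entity as escaping a byte value instead of a Unicode codepoint.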
In this codepage, \x92 represents a quotation mark similar to the ASCII \x27 single quotation mark ('). The correct codepoint for this character would have been U+2019 (’, RIGHT SINGLE QUOTATION MARK, &#x2019;).
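The byte-versus-codepoint confusion can be checked with a short Python sketch that uses only the standard library (Python is chosen here for brevity; the actual pipelines are JavaScript). It parses the &#x92; character reference with an XML parser, shows that the result is the non-printable control character U+0092, and contrasts it with decoding the single byte 0x92 using Python's cp1252 (windows-1252) codec, which yields U+2019.

```python
import unicodedata
import xml.etree.ElementTree as ET

# An XML character reference always names a Unicode codepoint, never a byte,
# so &#x92; parses to U+0092 rather than to a quote character.
parsed = ET.fromstring("<p>&#x92;</p>").text
assert parsed == "\u0092"

# U+0092 is a C1 control character: category "Cc", not printable.
print(unicodedata.category(parsed))  # Cc
print(parsed.isprintable())          # False

# In the windows-1252 codepage, the *byte* 0x92 is a right single quote.
decoded = b"\x92".decode("cp1252")
print(hex(ord(decoded)), unicodedata.name(decoded))
# 0x2019 RIGHT SINGLE QUOTATION MARK

# The intended character would therefore have been written as &#x2019;.
assert ET.fromstring("<p>&#x2019;</p>").text == decoded
```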
As it is unclear whether the original logs contain this special quote or it is the result of writing the parsers in a Windows editor, it's better to replace its usage with empty captures that skip over the quote.
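To illustrate why empty captures make a pattern tolerant of whichever quote character (if any) appears in the log, here is a minimal Python sketch. Everything in it is hypothetical and simplified: the pattern syntax (%{name} for a named field, %{} for a discarded capture), the compile_pattern and parse helpers, and the sample "Admin user alice logged in" messages. It is not the rsa2elk or NetWitness implementation.

```python
import re

# Hypothetical mini pattern language, for illustration only; it is NOT the
# rsa2elk/NetWitness syntax or implementation:
#   %{name}  captures a run of word characters into the field `name`
#   %{}      "empty" capture: lazily matches anything and discards it
#   other    text is matched literally
PLACEHOLDER = re.compile(r"%\{(\w*)\}")


def compile_pattern(pattern):
    """Translate the mini pattern language into a Python regex."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))  # literal text
        name = m.group(1)
        parts.append(f"(?P<{name}>\\w+)" if name else ".*?")
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))  # trailing literal text
    return re.compile("".join(parts))


def parse(pattern, line):
    """Return the captured fields, or None if the whole line does not match."""
    m = compile_pattern(pattern).fullmatch(line)
    return m.groupdict() if m else None


# Before: the quote is a hard-coded U+0092 literal.
OLD = "Admin user \u0092%{user}\u0092 logged in"
# After: empty captures skip over whatever quote character (if any) is there.
NEW = "Admin user %{}%{user}%{} logged in"

for line in ["Admin user 'alice' logged in",
             "Admin user \u2019alice\u2019 logged in",
             "Admin user alice logged in"]:
    print(parse(OLD, line), parse(NEW, line))
```

Each line prints None for the old pattern, because the U+0092 literal never matches an ordinary quote, and {'user': 'alice'} for the new one. Restricting named fields to word characters keeps the sketch short; a real parser would need a less restrictive rule for field values.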
The original pipelines had been generated with some debugging comments that made them much larger than necessary.
Checklist
- My code follows the style guidelines of this project
- I have commented my code, particularly in hard-to-understand areas
- I have made corresponding changes to the documentation
- I have made corresponding change to the default configuration files
- I have added tests that prove my fix is effective or that my feature works
- I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc
Author's Checklist
It's OK as long as it passes the tests.