Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Fix character set finder bug with unencodable charsets #33234

Merged
merged 1 commit into from
Aug 29, 2018

Conversation

droberts195
Copy link
Contributor

Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.

The fix is to assume that if ICU4J identifies that some bytes
correspond to a character set that cannot be encoded and those
bytes contain zeroes then the data is binary rather than text.

Fixes #33227

Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.

The fix is to assume that if ICU4J identifies that some bytes
correspond to a character set that cannot be encoded and those
bytes contain zeroes then the data is binary rather than text.

Fixes elastic#33227
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core

@droberts195
Copy link
Contributor Author

Marked as >non-issue as the affected code has never been released so this fix doesn't need to be release noted.

@droberts195 droberts195 merged commit 22415fa into elastic:master Aug 29, 2018
@droberts195 droberts195 deleted the fix_cn_encoding_bug branch August 29, 2018 13:56
droberts195 added a commit that referenced this pull request Aug 29, 2018
Some character sets cannot be encoded and this was tripping
up the binary data check in the ML log structure character
set finder.

The fix is to assume that if ICU4J identifies that some bytes
correspond to a character set that cannot be encoded and those
bytes contain zeroes then the data is binary rather than text.

Fixes #33227
dnhatn added a commit that referenced this pull request Aug 29, 2018
* master:
  Painless: Add Bindings (#33042)
  Update version after client credentials backport
  Fix forbidden apis on FIPS (#33202)
  Remote 6.x transport BWC Layer for `_shrink` (#33236)
  Test fix - Graph HLRC tests needed another field adding to randomisation exception list
  HLRC: Add ML Get Records API (#33085)
  [ML] Fix character set finder bug with unencodable charsets (#33234)
  TESTS: Fix overly long lines (#33240)
  Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
  Remove unsupported group_shard_failures parameter (#33208)
  Update BucketUtils#suggestShardSideQueueSize signature (#33210)
  Parse PEM Key files leniantly (#33173)
  INGEST: Add Pipeline Processor (#32473)
  Core: Add java time xcontent serializers (#33120)
  Consider multi release jars when running third party audit (#33206)
  Update MSI documentation (#31950)
  HLRC: create base timed request class (#33216)
  [DOCS] Fixes command page titles
  HLRC: Move ML protocol classes into client ml package (#33203)
  Scroll queries asking for rescore are considered invalid (#32918)
  Painless: Fix Semicolon Regression (#33212)
  ingest: minor - update test to include dissect (#33211)
  Switch remaining LLREST usage to new style Requests (#33171)
  HLREST: add reindex API (#32679)
dnhatn added a commit that referenced this pull request Aug 29, 2018
* 6.x:
  Fix forbidden apis on FIPS (#33202)
  HLRC: Add ML Get Records API (#33085)
  [ML] Fix character set finder bug with unencodable charsets (#33234)
  Tests fix - Graph HLRC client overly long line and syncing core’s copy of GraphExploreResponseTests taken from protocol. Related to #33231
  Test fix - Graph HLRC test was missing field name to be excluded from randomisation logic
  Parse PEM Key files leniantly (#33173)
  Core: Add java time xcontent serializers (#33120)
  Consider multi release jars when running third party audit (#33206)
  Update MSI documentation (#31950)
  [DOCS] Fixes command page titles
  HLRC: Move ML protocol classes into client ml package (#33203)
  Painless: Fix Semicolon Regression (#33212)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] LogStructureFinderManagerTests#testFindCharsetGivenBinary fails reproducibly
4 participants