Generate doc ID in build index job for idempotency #1803

Merged
merged 5 commits into from
Jul 11, 2023

Conversation

dai-chen
Collaborator

@dai-chen dai-chen commented Jun 30, 2023

Description

  1. Generate a new ID column __id__ (named to avoid conflicts with user data) and set it as the value of spark.datasource.flint.write.id_name. The underlying Flint data source writer uses this column as the document ID and deduplicates through the OpenSearch create API. Deduplication is needed when a Spark streaming job retries a failed batch after a restart.
  2. Exclude the old log4j version that causes unit test failures in the IDE.
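
To illustrate why a generated ID makes a retried batch idempotent, here is a minimal, dependency-free sketch in Python. It is not the code in this PR: the hashing scheme (SHA-1 over the serialized row), the `with_doc_id` helper, and the `CreateOnlyIndex` class are all assumptions made for illustration. The in-memory class only mimics create-only semantics, where a write for an ID that already exists is rejected instead of overwriting the document.

```python
import hashlib
import json

ID_COLUMN = "__id__"  # column name taken from the PR description


def with_doc_id(row: dict) -> dict:
    """Return a copy of the row with a deterministic ID derived from its content.

    Hypothetical helper: the real job may derive the ID differently.
    """
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return {**row, ID_COLUMN: hashlib.sha1(payload).hexdigest()}


class CreateOnlyIndex:
    """Toy stand-in for an index that accepts only create operations."""

    def __init__(self):
        self.docs = {}

    def bulk_create(self, rows):
        """Store each row under its ID and return how many were newly created."""
        created = 0
        for row in rows:
            doc = dict(row)
            doc_id = doc.pop(ID_COLUMN)
            if doc_id in self.docs:
                continue  # ID already exists: treat as a conflict and skip
            self.docs[doc_id] = doc
            created += 1
        return created


batch = [
    with_doc_id({"user": "a", "value": 1}),
    with_doc_id({"user": "b", "value": 2}),
]

index = CreateOnlyIndex()
first_attempt = index.bulk_create(batch)  # initial write of the batch
retry_attempt = index.bulk_create(batch)  # same batch replayed after a restart
```

Because the ID depends only on the row content, the replayed batch maps to the same IDs and adds nothing new. One consequence of this particular hashing assumption is that two identical rows collapse into one document; a real implementation may need extra inputs to the hash to tell them apart.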

TODO: figure out how to add IT for fault tolerance.

Issues Resolved

opensearch-project/opensearch-spark#2

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Chen Dai <daichen@amazon.com>
@dai-chen dai-chen added enhancement New feature or request Flint labels Jun 30, 2023
@dai-chen dai-chen self-assigned this Jun 30, 2023
codecov bot commented Jun 30, 2023

Codecov Report

Merging #1803 (d7bff3e) into feature/flint (91b2a06) will not change coverage.
The diff coverage is n/a.

❗ Current head d7bff3e differs from pull request most recent head adeed60. Consider uploading reports for the commit adeed60 to get more accurate results

@@               Coverage Diff                @@
##             feature/flint    #1803   +/-   ##
================================================
  Coverage            97.19%   97.19%           
  Complexity            4107     4107           
================================================
  Files                  371      371           
  Lines                10464    10464           
  Branches               706      706           
================================================
  Hits                 10170    10170           
  Misses                 287      287           
  Partials                 7        7           
Flag Coverage Δ
sql-engine 97.19% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

@dai-chen dai-chen requested a review from penghuo July 5, 2023 16:26
@dai-chen dai-chen merged commit b9eb0ea into opensearch-project:feature/flint Jul 11, 2023
11 of 12 checks passed
@dai-chen dai-chen deleted the add-doc-id branch July 11, 2023 21:31