Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run the RFS Container's DocumentMigration application repeatedly as long as it's successful #1047

Merged

Conversation

gregschohn
Copy link
Collaborator

@gregschohn gregschohn commented Oct 2, 2024

Unsuccessful means anything but a 0 exit code. The application has been cleaned up to return 3 for no work available. It used to return whatever a top-level exception would create. That part wasn't necessary, but it seemed like a good idea to test for that to make sure that after running repeatedly, processes would eventually do THAT instead of returning 0, causing infinite loops in the containers.

Description

This eliminates container startup costs while still cleaning out the java process so that we don't have to worry about runaway processes.

  • Category Enhancement
  • Why these changes are required? For clusters with very small indices, migrations could take a very long time. since every tiny shard would need to create a new container. That should be drastically reduced now (~100x ?).

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-2042

Is this a backport? If so, please add backport PR # and/or commits #

Testing

Manual testing of the entrypoint.sh script

docker % export RFS_COMMAND="exit 1" 
docker % ./entrypoint.sh            
RFS_COMMAND: exit 1
RFS_TARGET_USER: 
RFS_TARGET_PASSWORD: <redacted>
RFS_TARGET_PASSWORD_ARN: 
Running command exit 1
export RFS_COMMAND="sleep 1"
docker % ./entrypoint.sh
RFS_COMMAND: sleep 1
RFS_TARGET_USER: 
RFS_TARGET_PASSWORD: <redacted>
RFS_TARGET_PASSWORD_ARN: 
Running command...
Running command...
^C

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ong it's successful.

Unsuccessful means anything but a 0 exit code.  The application has been cleaned up to return 3 for no work available.  It used to return whatever a top-level exception would create.  That part wasn't necessary, but it seemed like a good idea to test for that to make sure that after running repeatedly, processes would eventually do THAT instead of returning 0, causing infinite loops in the containers.

Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
@gregschohn gregschohn changed the title Run the RFS Container's DocumentMigration application repeatedly as l… Run the RFS Container's DocumentMigration application repeatedly as long as it's successful Oct 2, 2024
Copy link

codecov bot commented Oct 2, 2024

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.55%. Comparing base (647f601) to head (863f034).
Report is 11 commits behind head on main.

Files with missing lines Patch % Lines
...org/opensearch/migrations/RfsMigrateDocuments.java 0.00% 8 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1047      +/-   ##
============================================
- Coverage     80.57%   80.55%   -0.02%     
  Complexity     2736     2736              
============================================
  Files           365      365              
  Lines         13611    13614       +3     
  Branches        941      941              
============================================
  Hits          10967    10967              
- Misses         2067     2070       +3     
  Partials        577      577              
Flag Coverage Δ
gradle-test 78.59% <0.00%> (-0.03%) ⬇️
python-test 90.11% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…ess specification so that all 3rd party imports are in the same alphabetical block.

Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
@AndreKurait AndreKurait marked this pull request as ready for review October 3, 2024 22:18
gregschohn and others added 3 commits October 4, 2024 17:42
…isk space.

Signed-off-by: Greg Schohn <greg.schohn@gmail.com>
…ories

Signed-off-by: Andre Kurait <akurait@amazon.com>
…tLoopOptimization

Use sed for command line parsing in rfs entrypoint for cleanup directories
@AndreKurait AndreKurait requested a review from chelma October 7, 2024 20:38
Copy link
Member

@chelma chelma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. One question but I don't think it will substantively impact the application either way.

fi

if [[ -n "$LUCENE_DIR" ]]; then
echo "Will delete lucene local directory between runs: $LUCENE_DIR"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but - if we pre-delete the S3 directory (lines 52-23), why not pre-delete this one as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed lines 52-53. We'll run document migration once before deleting anything

AndreKurait and others added 2 commits October 7, 2024 16:20
Signed-off-by: Andre Kurait <akurait@amazon.com>
@AndreKurait AndreKurait merged commit 5709add into opensearch-project:main Oct 7, 2024
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants