Experimental Codec Support #13992

Open: wants to merge 1 commit into base: main


sarthakaggarwal97 (Contributor) commented Jun 5, 2024

Description

We currently have no way to mark codecs as experimental and validate them as such in OpenSearch. With this change, we would be able to safely introduce experimental codecs for consumption.
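A minimal sketch of the idea, assuming a hypothetical marker interface `ExperimentalCodec`. The PR's diff calls `isExperimentalCodec(Codec.forName(s))` during validation, but its implementation is not shown in this thread, so the marker-interface approach, the class names, and the `allowExperimental` flag are all assumptions. Lucene's `Codec` is replaced by a small interface so the example compiles on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ExperimentalCodecValidation {

    // Simplified stand-in for org.apache.lucene.codecs.Codec (assumption: only the name matters here).
    interface SimpleCodec {
        String name();
    }

    // Hypothetical marker interface: codecs that implement it are treated as experimental.
    interface ExperimentalCodec {}

    static class NamedCodec implements SimpleCodec {
        private final String name;
        NamedCodec(String name) { this.name = name; }
        @Override public String name() { return name; }
    }

    // A codec that opts in to the hypothetical experimental marker.
    static class SandboxCodec extends NamedCodec implements ExperimentalCodec {
        SandboxCodec(String name) { super(name); }
    }

    // Stand-in for the codecs discovered through SPI, keyed by name.
    static final Map<String, SimpleCodec> AVAILABLE = new LinkedHashMap<>();
    static {
        AVAILABLE.put("default", new NamedCodec("default"));
        AVAILABLE.put("experimental_codec", new SandboxCodec("experimental_codec"));
    }

    static boolean isExperimentalCodec(SimpleCodec codec) {
        return codec instanceof ExperimentalCodec;
    }

    // Accepts a known codec name; rejects experimental codecs unless explicitly allowed.
    static String validateCodec(String codecName, boolean allowExperimental) {
        SimpleCodec codec = AVAILABLE.get(codecName);
        if (codec == null) {
            throw new IllegalArgumentException("unknown codec [" + codecName + "]");
        }
        if (isExperimentalCodec(codec) && !allowExperimental) {
            throw new IllegalArgumentException("codec [" + codecName + "] is experimental and not enabled");
        }
        return codecName;
    }

    public static void main(String[] args) {
        System.out.println(validateCodec("default", false));
        System.out.println(validateCodec("experimental_codec", true));
        try {
            validateCodec("experimental_codec", false);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Note that a check like this depends on an OpenSearch-specific interface, which is one of the objections raised in the review.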

Related Issues

Resolves #13723

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • API changes companion pull request created.
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions bot added the enhancement (Enhancement or improvement to existing feature or request) and Indexing (Indexing, Bulk Indexing and anything related to indexing) labels Jun 5, 2024
Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
sarthakaggarwal97 added the backport 2.x (Backport to 2.x branch) label Jun 5, 2024

github-actions bot (Contributor) commented Jun 5, 2024

❌ Gradle check result for 14ee1c0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions bot (Contributor) commented Jun 5, 2024

✅ Gradle check result for 03f1246: SUCCESS

codecov bot commented Jun 5, 2024

Codecov Report

Attention: Patch coverage is 72.72727% with 3 lines in your changes missing coverage. Please review.

Project coverage is 71.64%. Comparing base (b15cb0c) to head (03f1246).
Report is 854 commits behind head on main.

File with missing lines: ...java/org/opensearch/index/engine/EngineConfig.java (patch coverage 72.72%; 2 lines missing and 1 partial) ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #13992      +/-   ##
============================================
+ Coverage     71.42%   71.64%   +0.22%     
- Complexity    59978    61336    +1358     
============================================
  Files          4985     5065      +80     
  Lines        282275   288206    +5931     
  Branches      40946    41743     +797     
============================================
+ Hits         201603   206476    +4873     
- Misses        63999    64677     +678     
- Partials      16673    17053     +380     


reta (Collaborator) commented Jun 5, 2024

@sarthakaggarwal97 I am not on board with this change. In my opinion:

  • we need a generic mechanism to enable / disable codecs that does not rely on OpenSearch custom interfaces (mentioned in [Feature Request] Support for Experimental Codecs #13723)
  • we should not take an all-or-nothing approach to cutting off experimental codecs; users should be given fine-grained control over which ones they want or don't want

More generally, I think we should not be tied to the notion of an "experimental codec in code", but should instead provide a mechanism to configure inclusions / exclusions. Of course, this is just my opinion. @dblock @andrross @msfroh, any thoughts? Thanks!

andrross (Member) commented Jun 6, 2024

Agree with @reta's comment above. I still don't understand what is preventing us from implementing this capability within custom-codecs behind a setting defined by that plugin?

sarthakaggarwal97 (Contributor, Author) commented Jun 10, 2024

@andrross the issue is that codecs are validated and made available in two places: EngineConfig and CustomCodecService.

For the validation that happens in EngineConfig, the codecs are resolved via NamedSPI, which in turn loads the codecs/services from the resources file. I don't think there is a way to load individual resources/services/codecs selectively based on cluster settings.

We can certainly control whether codecs are made available in CustomCodecService, but I don't think there is a way to make codecs unavailable in EngineConfig once they have been loaded via the resources file.

If we don't make a validation change (as in this PR), we would always allow the experimental codec, because it was loaded through the resources file and is therefore present in NamedSPI. If the same experimental codec is not available in CodecService, the shards will fail.
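The failure mode described above can be modeled in a few lines. This is a simplified, self-contained sketch: the codec names and the helpers `validateSetting`, `buildCodecService`, and `createEngine` are hypothetical stand-ins for the `EngineConfig` setting validation (which, in the diff context shown in this PR, falls back to `Codec.availableCodecs()`), a filtered `CodecService`, and engine creation. It is not OpenSearch code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class CodecLookupMismatch {

    // Hypothetical codec names visible through Lucene's codec SPI (META-INF/services).
    static final Set<String> SPI_CODECS = Set.of("experimental_codec");

    // Setting validation: built-in names pass; otherwise fall back to the SPI lookup,
    // modeled on the Codec.availableCodecs().contains(...) branch.
    static String validateSetting(String codecName) {
        switch (codecName) {
            case "default":
            case "best_compression":
                return codecName;
            default:
                if (SPI_CODECS.contains(codecName)) {
                    return codecName;
                }
                throw new IllegalArgumentException("unknown codec [" + codecName + "]");
        }
    }

    // A CodecService-like registry that leaves out experimental codecs unless enabled
    // (the hypothetical runtime filter discussed in this thread).
    static Map<String, String> buildCodecService(boolean experimentalEnabled) {
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put("default", "default codec");
        codecs.put("best_compression", "best-compression codec");
        if (experimentalEnabled) {
            codecs.put("experimental_codec", "experimental codec");
        }
        return codecs;
    }

    // Engine creation resolves the codec from the service; a missing entry fails the shard.
    static String createEngine(String codecName, Map<String, String> codecService) {
        String codec = codecService.get(codecName);
        if (codec == null) {
            throw new IllegalStateException("codec [" + codecName + "] is not registered in the codec service");
        }
        return codec;
    }

    public static void main(String[] args) {
        String name = validateSetting("experimental_codec"); // passes: the SPI still exposes it
        Map<String, String> service = buildCodecService(false); // experimental codecs filtered out
        try {
            createEngine(name, service);
        } catch (IllegalStateException e) {
            System.out.println("engine creation failed: " + e.getMessage());
        }
    }
}
```

The setting is accepted at validation time, and the error only surfaces once an engine is created for the shard.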

If there is another way that limits the changes to the custom-codecs plugin, we should go for it.

Tagging @reta @backslasht @mgodwan @shwetathareja @dblock @msfroh @ankitkala to add more thoughts.

reta (Collaborator) commented Jun 10, 2024

> For the validation that happens in EngineConfig, the codecs are resolved via NamedSPI, which in turn loads the codecs/services from the resources file. I don't think there is a way to load individual resources/services/codecs selectively based on cluster settings.

So we have 2 issues here:

  1. Codecs SPI (that come from Apache Lucene)
  2. CodecService, which manages codecs on the OpenSearch side

As you rightly mentioned, Apache Lucene has no direct support for filtering codecs (the only way to distinguish stable from non-stable codecs is by convention: Apache Lucene places the latter in its sandbox). Let us enumerate the options that would target any codec a user may use (without relying on any additional OpenSearch-specific interfaces):

  • add a new node-level setting to explicitly enable / disable codecs; this would work with EngineConfig (using a dependent setting) and would cover any codec, whether supplied by Apache Lucene or by the custom-codecs plugin
  • add a new node-level setting to explicitly enable / disable sandbox codecs (we could detect those from the package). This would work for Apache Lucene codecs but not for the custom-codecs plugin, since we didn't follow that convention there. It is also inconvenient because it is an all-or-nothing approach; I think we should give users more control over which codecs to enable / disable
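The first option could look roughly like the sketch below. The class name `CodecFilterSetting` and the setting value format (a comma-separated list of codec names to disable) are hypothetical, and the check is plain Java rather than OpenSearch's `Setting` API. The point is that validation keys off the codec name, so it applies equally to codecs from Apache Lucene and from the custom-codecs plugin, and individual codecs can be toggled.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class CodecFilterSetting {

    // Parsed value of a hypothetical node-level setting listing disabled codec names.
    private final Set<String> disabledCodecs;

    CodecFilterSetting(String disabledSettingValue) {
        this.disabledCodecs = Arrays.stream(disabledSettingValue.split(","))
            .map(String::trim)
            .filter(name -> !name.isEmpty())
            .collect(Collectors.toUnmodifiableSet());
    }

    // Dependent-setting style check: the index-level codec name must exist and must not be
    // disabled at the node level, regardless of where the codec comes from.
    String validate(String codecName, Set<String> availableCodecs) {
        if (!availableCodecs.contains(codecName)) {
            throw new IllegalArgumentException("unknown codec [" + codecName + "]");
        }
        if (disabledCodecs.contains(codecName)) {
            throw new IllegalArgumentException("codec [" + codecName + "] is disabled on this node");
        }
        return codecName;
    }

    public static void main(String[] args) {
        Set<String> available = Set.of("default", "best_compression", "experimental_codec");
        CodecFilterSetting filter = new CodecFilterSetting("experimental_codec");
        System.out.println(filter.validate("best_compression", available));
        try {
            filter.validate("experimental_codec", available);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The second option would instead classify codecs by package (for example, checking whether `codec.getClass().getPackageName()` starts with `org.apache.lucene.sandbox`), which yields the all-or-nothing behavior described above.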

We could narrow the scope and try to contain the change to the custom-codecs plugin only, but the split between "codecs can come from anywhere" and "settings are validated in core" makes that difficult to implement, especially given the need to allow individual codecs to be enabled / disabled.

andrross (Member) commented

@reta What is the behavior of the system if we just implement runtime checks based on the value of some "experimental" configuration setting that dynamically chooses the codecs to add in the CodecService here? https://github.com/opensearch-project/custom-codecs/blob/main/src/main/java/org/opensearch/index/codec/customcodecs/CustomCodecService.java#L47

reta (Collaborator) commented Jun 10, 2024

> @reta What is the behavior of the system if we just implement runtime checks based on the value of some "experimental" configuration setting that dynamically chooses the codecs to add in the CodecService here?

@andrross that would not "hide" them from Apache Lucene: https://github.com/opensearch-project/custom-codecs/blob/main/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec

andrross (Member) commented

>> @reta What is the behavior of the system if we just implement runtime checks based on the value of some "experimental" configuration setting that dynamically chooses the codecs to add in the CodecService here?

> @andrross that would not "hide" them from Apache Lucene: https://github.com/opensearch-project/custom-codecs/blob/main/src/main/resources/META-INF/services/org.apache.lucene.codecs.Codec

Right, but would they be usable to an OpenSearch user if CodecService prevents access?

reta (Collaborator) commented Jun 10, 2024

> Right, but would they be usable to an OpenSearch user if CodecService prevents access?

Yes and no: the validation check will pass for all codec-related settings (those use the Codec SPI), but it will fail later, when the engine is created, if the codec has been filtered out of CodecService. So the codecs will be usable in some places but not in others.

sarthakaggarwal97 (Contributor, Author) commented Jun 11, 2024

> Yes and no: the validation check will pass for all codec-related settings (those use the Codec SPI), but it will fail later, when the engine is created, if the codec has been filtered out of CodecService. So the codecs will be usable in some places but not in others.

Yeah, exactly. The shard will fail in this scenario.

@reta @andrross what should be the next course of action on this?

reta (Collaborator) commented Jun 11, 2024

> @reta @andrross what should be the next course of action on this?

@sarthakaggarwal97 I think we have to agree on 3 things in principle:

  • do we want enable / disable controls at the level of individual codecs? (I think we should)
  • do we want enable / disable controls for classes of codecs (sandbox / experimental / ...)? (I think we should not)
  • do we want to limit enable / disable controls to the custom-codecs plugin only? (I think we should)

The decisions here would shape the solution.

sarthakaggarwal97 (Contributor, Author) commented

@reta I'm aligned with your suggestion. If we are able to restrict the experimental codec changes to custom-codecs, then we should go for it. Granular control over experimental codecs would be ideal as well.

reta (Collaborator) commented Jun 12, 2024

> @reta I'm aligned with your suggestion. If we are able to restrict the experimental codec changes to custom-codecs, then we should go for it. Granular control over experimental codecs would be ideal as well.

Thanks @sarthakaggarwal97, let us close it and move the discussion to opensearch-project/custom-codecs#148

opensearch-trigger-bot (Contributor) commented

This PR is stalled because it has been open for 30 days with no activity.

@@ -140,15 +140,19 @@ public Supplier<RetentionLeases> retentionLeasesSupplier() {
             return s;
         default:
             if (Codec.availableCodecs().contains(s)) {
-                return s;
+                if (!isExperimentalCodec(Codec.forName(s))) {
+                    return s;
Contributor

nit: Can we just rename s to something better?

opensearch-trigger-bot (Contributor) commented

This PR is stalled because it has been open for 30 days with no activity.

opensearch-trigger-bot added the stalled (Issues that have stalled) label Sep 15, 2024
Labels
backport 2.x (Backport to 2.x branch); enhancement (Enhancement or improvement to existing feature or request); Indexing (Indexing, Bulk Indexing and anything related to indexing); stalled (Issues that have stalled)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support for Experimental Codecs
4 participants