Add remove_by_pattern ingest processor #11920

Merged

Conversation

gaobinlong
Collaborator

@gaobinlong gaobinlong commented Jan 18, 2024

Description

Inspired by #10967, this PR adds a new ingest processor, remove_by_pattern, which removes fields whose names match wildcard patterns such as a* or *b. The processor has two main parameters, and they are mutually exclusive:

  • field_pattern: optional; a single value or an array. Supports wildcard patterns such as a*, *b, or a*b. Fields that match the pattern are removed.
  • exclude_field_pattern: optional; a single value or an array. Fields that do not match the pattern are removed.
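
The behaviour of the two parameters can be modelled with a short Python sketch. This is not the processor's Java implementation; the function names `simple_match` and `remove_by_pattern` are hypothetical, `*` is assumed to be the only wildcard character, and raising an error when neither parameter is given is an assumption of this sketch rather than something the description states.

```python
import re
from typing import Any, Dict, List, Optional, Union

Patterns = Union[str, List[str]]


def simple_match(pattern: str, name: str) -> bool:
    """Return True if name matches pattern; '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def remove_by_pattern(
    doc: Dict[str, Any],
    field_pattern: Optional[Patterns] = None,
    exclude_field_pattern: Optional[Patterns] = None,
) -> Dict[str, Any]:
    """Model of the two mutually exclusive parameters (illustrative only)."""
    if (field_pattern is None) == (exclude_field_pattern is None):
        # Assumption: exactly one of the two parameters must be supplied.
        raise ValueError("specify exactly one of field_pattern or exclude_field_pattern")

    raw = field_pattern if field_pattern is not None else exclude_field_pattern
    # Both parameters accept a single value or an array.
    patterns = [raw] if isinstance(raw, str) else list(raw)

    def matches_any(name: str) -> bool:
        return any(simple_match(p, name) for p in patterns)

    if field_pattern is not None:
        # field_pattern: fields that match are removed.
        return {k: v for k, v in doc.items() if not matches_any(k)}
    # exclude_field_pattern: fields that do NOT match are removed.
    return {k: v for k, v in doc.items() if matches_any(k)}
```

For instance, with the hypothetical document `{"alpha": 1, "tab": 2, "a_b": 3, "c": 4}`, passing `field_pattern="a*b"` drops only `a_b`, while passing `exclude_field_pattern=["a*", "c"]` drops only `tab`.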

This processor does not touch metadata fields in the document, such as _index or _id. To remove metadata fields, use the remove ingest processor. Field name patterns are validated the same way as index patterns when creating an index template: a pattern that starts with _ is not allowed.

In addition, this processor only targets root-level fields in the document. To remove nested fields, use the remove processor and specify the full field names.
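
The three constraints above (metadata fields are skipped, patterns may not start with _, and only root-level fields are considered) can be illustrated with a small Python model. It is a sketch under stated assumptions: the helper names are hypothetical, the metadata set lists only the two fields named in the description, and the validation checks only the leading-underscore rule, not the full set of index pattern rules.

```python
import re
from typing import Any, Dict

# Assumption: only the metadata fields named in the description are listed.
METADATA_FIELDS = frozenset({"_index", "_id"})


def validate_pattern(pattern: str) -> None:
    """Reject patterns that start with '_' (the one rule stated above)."""
    if pattern.startswith("_"):
        raise ValueError(f"pattern [{pattern}] must not start with '_'")


def matches(pattern: str, name: str) -> bool:
    """'*' matches any run of characters; everything else is literal."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None


def remove_matching_root_fields(doc: Dict[str, Any], pattern: str) -> Dict[str, Any]:
    """Drop root-level, non-metadata keys that match; nested keys are never visited."""
    validate_pattern(pattern)
    return {
        key: value
        for key, value in doc.items()
        if key in METADATA_FIELDS or not matches(pattern, key)
    }


doc = {
    "_index": "logs",
    "_id": "1",
    "user": {"uid": 7, "name": "x"},
    "uid": 9,
    "msg": "hi",
}
result = remove_matching_root_fields(doc, "*id")
```

In this model the pattern `*id` removes only the root-level `uid`: `_id` is kept because it is a metadata field, and `user.uid` is kept because the sketch never descends into nested objects.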

Here is an example:

Keep the fields whose names match a* or b*, and remove the others:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "remove_by_pattern": {
          "exclude_field_pattern": [
            "a*",
            "b*"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "a": 1,
        "b": 2,
        "c": 3
      }
    }
  ]
}

Why not support field name patterns in the existing remove ingest processor? Because a literal field name can itself be a* or b*, the remove processor would need two extra parameters, such as field_pattern and exclude_field_pattern, to tell patterns apart from names. That would complicate its usage and confuse users, so a separate processor is the better solution.
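
The ambiguity behind this decision can be seen in a minimal Python sketch. Both helpers are hypothetical stand-ins: `remove_by_name` models removal by an exact field name, and `remove_by_wildcard` models pattern-based removal, assuming `*` is the only wildcard.

```python
import re
from typing import Any, Dict


def remove_by_name(doc: Dict[str, Any], field: str) -> Dict[str, Any]:
    """Treat the argument as a literal field name."""
    return {k: v for k, v in doc.items() if k != field}


def remove_by_wildcard(doc: Dict[str, Any], pattern: str) -> Dict[str, Any]:
    """Treat the argument as a pattern where '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return {k: v for k, v in doc.items() if re.fullmatch(regex, k, flags=re.DOTALL) is None}


# A document whose first field is literally named "a*".
doc = {"a*": 1, "ab": 2, "c": 3}

literal_result = remove_by_name(doc, "a*")       # only the field named "a*" goes
pattern_result = remove_by_wildcard(doc, "a*")   # every field starting with "a" goes
```

The same string `a*` gives two different results depending on how it is interpreted, which is why a single `field` parameter could not serve both purposes.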

Related Issues

#1578

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

github-actions bot commented Jan 18, 2024

Compatibility status:

Checks if related components are compatible with change cf3722d

Incompatible components

Incompatible components: [https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/geospatial.git]

Contributor

❕ Gradle check result for 00054dd: UNSTABLE

  • TEST FAILURES:
      2 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.classMethod
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing
      1 org.opensearch.common.util.concurrent.QueueResizableOpenSearchThreadPoolExecutorTests.testResizeQueueDown

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Contributor

❕ Gradle check result for 4ea2323: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testShardRoutingWithNetworkDisruption_FailOpenEnabled
      1 org.opensearch.repositories.azure.AzureBlobStoreRepositoryTests.testRequestStats
      1 org.opensearch.remotestore.RemoteIndexPrimaryRelocationIT.testPrimaryRelocationWhileIndexing

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

❕ Gradle check result for 96298c1: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.index.IndexServiceTests.testAsyncTranslogTrimTaskOnClosedIndex

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@gaobinlong
Collaborator Author

Hi @deshsidd, I've changed the code according to your comments. Could you take a second look? Thanks!

@gaobinlong
Collaborator Author

@msfroh @reta, could you take a second look at this PR? Thank you!

Contributor

✅ Gradle check result for 130d6dd: SUCCESS

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Contributor

❕ Gradle check result for cf3722d: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@gaobinlong
Collaborator Author

@msfroh could this PR be approved now? It targets 2.12.0, so we want to get it merged before the code freeze date. The documentation PR is ready.

@msfroh msfroh merged commit 8d54278 into opensearch-project:main Feb 6, 2024
33 of 40 checks passed
@msfroh msfroh added the backport 2.x Backport to 2.x branch label Feb 6, 2024
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11920-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 8d54278cc5e2b9b73c0825cc430747f03ed96349
# Push it to GitHub
git push --set-upstream origin backport/backport-11920-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-11920-to-2.x.

gaobinlong added a commit to gaobinlong/OpenSearch that referenced this pull request Feb 7, 2024
* Add remove_by_pattern ingest processor
* Modify change log
* Remove some duplicated checks
* Add more yml test case
* Fix typo

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 8d54278)
msfroh pushed a commit that referenced this pull request Feb 7, 2024
* Add remove_by_pattern ingest processor
* Modify change log
* Remove some duplicated checks
* Add more yml test case
* Fix typo

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
(cherry picked from commit 8d54278)
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
* Add remove_by_pattern ingest processor
* Modify change log
* Remove some duplicated checks
* Add more yml test case
* Fix typo

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
* Add remove_by_pattern ingest processor
* Modify change log
* Remove some duplicated checks
* Add more yml test case
* Fix typo

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
* Add remove_by_pattern ingest processor
* Modify change log
* Remove some duplicated checks
* Add more yml test case
* Fix typo

---------

Signed-off-by: Gao Binlong <gbinlong@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
@reta reta mentioned this pull request Jul 17, 2024
3 tasks