Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INGEST: Enable default pipelines #32286

Merged

Conversation

original-brownbear
Copy link
Member

* Add `default_pipeline` index setting
* Empty string pipeline argument is interpreted as no pipeline
* closes elastic#21101
@original-brownbear original-brownbear added >enhancement :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP v7.0.0 v6.4.0 labels Jul 23, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@@ -78,7 +78,7 @@ protected void doRun() throws Exception {
innerExecute(indexRequest, getPipeline(indexRequest.getPipeline()));
//this shouldn't be needed here but we do it for consistency with index api
// which requires it to prevent double execution
indexRequest.setPipeline(null);
indexRequest.setPipeline("");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to change this since null now triggers an empty loop because it causes the default pipeline to be set over and over.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on what triggers this? I'm unsure about the original comment, but it seems like that should be fixed first, so that we can have null as meaning "no pipeline" in the index request. Empty string should be an error. We need some type of literal string (which we disallow from pipeline names) to mean "no pipeline". In the original default pipelines issue, I suggested _none or something like that. Using empty string is too easy to happen with a malformed request (like accidentally not serializing http params correctly into the request).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst the problem is that I need to somehow differentiate between "no pipeline so set default" (when the request first hits org.elasticsearch.action.bulk.TransportBulkAction#doExecute) and "no pipeline, period" (when it hits org.elasticsearch.action.bulk.TransportBulkAction#doExecute and I want to prevent the default to be applied again). Previously double execution here could be prevented by setting null, but now I used null to trigger setting the default and "" to mean that no default should be set (since that is also what I'd see if someone used a url ending in ?pipeline=).

So better move to _none for when we want to prevent setting a pipeline and also use the default on "" (i.e. URLs ending in ?pipeline=)? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So better move to _none for when we want to prevent setting a pipeline and also use the default on "" (i.e. URLs ending in ?pipeline=)? :)

Sorry I don't understand what you are suggesting here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst

simply:

?pipeline= means no pipeline is set i.e is equivalent to not ?pipeline parameter in the uri => apply default pipeline
?pipeline=_none => don't run any pipeline

sorry, seems it's too late for my writing skills :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel strongly ?pipeline= should be an error. It is too easy too have someone mean to override a pipeline, but then get the default (which is the opposite of the problem when using ?pipeline= to mean don't run any pipeline).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rjernst I like it :) Error for ?pipeline= it is :) => on it

hasIndexRequestsWithPipelines = true;
}
}
} else if (!pipeline.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We prefer == false instead of ! as the latter is susceptible to being missed when reading through code.

ignore: 404

---
"Test index with default pipeline":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While having rest tests is fine to ensure the param is passed through the rest layer correctly, we should first have unit tests.

@original-brownbear
Copy link
Member Author

@rjernst added unit tests and moved to _none for the loop setting and returning a 400 on an empty setting as discussed. I think this is good for another review whenever you have a sec :)

@original-brownbear
Copy link
Member Author

@rjernst ping :)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a couple comments.

IndexMetaData indexMetaData = indicesMetaData.get(indexRequest.index());
if (indexMetaData != null) {
String defaultPipeline = IndexSettings.DEFAULT_PIPELINE.get(indexMetaData.getSettings());
if (defaultPipeline.isEmpty() == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be possible if we are validating a non-empty string?

@@ -254,6 +254,9 @@
public static final Setting<Integer> MAX_REGEX_LENGTH_SETTING = Setting.intSetting("index.max_regex_length",
1000, 1, Property.Dynamic, Property.IndexScope);

public static final Setting<String> DEFAULT_PIPELINE =
Setting.simpleString("index.default_pipeline", "", Property.Dynamic, Property.IndexScope);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add validation to this setting so it is not empty. We can make the default the _none constant.

continue;
}
String pipeline = indexRequest.getPipeline();
if (Strings.hasText(pipeline) && IngestService.NOOP_PIPELINE_NAME.equals(pipeline) == false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is hasText necessary if we have validated this is a non empty string?

@original-brownbear
Copy link
Member Author

@rjernst thanks for spotting that, fixed all the points you mentioned by moving to the _none setting for the default and validating for a non-empty string for the setting as well.
This worked fine before because it validated for non empty string on the request layer and then used the empty string as the internal default (which is really meh once you spell it out like that :)) => fixed I think.

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @original-brownbear. LGTM.

@original-brownbear
Copy link
Member Author

@rjernst sorry my bad here, had to push one last fix (maybe take one last quick look? :)) in 6a9ebae (we have to correctly set _none now when the index doesn't exist and also when there was a null pipeline on the request to avoid the null check downstream).

Should be green now :)

Copy link
Member

@rjernst rjernst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looks fine.

@original-brownbear
Copy link
Member Author

Jenkins test this

@original-brownbear
Copy link
Member Author

@rjernst thanks! => merging :)

@original-brownbear original-brownbear merged commit be31cc6 into elastic:master Aug 2, 2018
@original-brownbear original-brownbear deleted the default-pipelines branch August 2, 2018 15:11
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 2, 2018
* INGEST: Enable default pipelines

* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes elastic#21101
original-brownbear added a commit that referenced this pull request Aug 2, 2018
* INGEST: Enable default pipelines

* Add `default_pipeline` index setting
* `_none` is interpreted as no pipeline
* closes #21101
dnhatn added a commit that referenced this pull request Aug 3, 2018
* master:
  HLRC: Move commercial clients from XPackClient (#32596)
  Add cluster UUID to Cluster Stats API response (#32206)
  Security: move User to protocol project (#32367)
  [TEST] Test for shard failures, add debug to testProfileMatchesRegular
  Minor fix for javadoc (applicable for java 11). (#32573)
  Painless: Move Some Lookup Logic to PainlessLookup (#32565)
  TEST: Avoid merges in testSeqNoAndCheckpoints
  [Rollup] Remove builders from HistoGroupConfig (#32533)
  Mutes failing SQL string function tests due to #32589
  fixed elements in array of produced terms (#32519)
  INGEST: Enable default pipelines (#32286)
  Remove cluster state initial customs (#32501)
  Mutes LicensingDocumentationIT due to #32580
  [ML] Remove multiple_bucket_spans (#32496)
  [ML] Rename JobProvider to JobResultsProvider (#32551)
  Correct minor typo in explain.asciidoc for HLRC
  Build: Add elastic maven to repos used by BuildPlugin (#32549)
  Clarify the error message when a pipeline agg is used in the 'order' parameter. (#32522)
  Revert "[test] turn on host io cache for opensuse (#32053)"
  Enable packaging tests on suse boxes
  [ML] Improve error when no available field exists for rule scope (#32550)
  [ML] Improve error for functions with limited rule condition support (#32548)
  Painless: Clean Up PainlessField (#32525)
  Add @AwaitsFix for #32554
  Remove broken @link in Javadoc
  Scripting: Conditionally use java time api in scripting (#31441)
  [ML] Fix thread leak when waiting for job flush (#32196) (#32541)
  Add AwaitsFix to failing test - see #32546
  Core: Minor size reduction for AbstractComponent (#32509)
  SQL: Added support for string manipulating functions with more than one parameter (#32356)
  [DOCS] Reloadable Secure Settings (#31713)
  Watcher: Reenable HttpSecretsIntegrationTests#testWebhookAction test (#32456)
  [Rollup] Remove builders from TermsGroupConfig (#32507)
  Use hostname instead of IP with SPNEGO test (#32514)
  Switch x-pack rolling restart to new style Requests (#32339)
  NETWORKING: Fix Netty Leaks by upgrading to 4.1.28 (#32511)
  [DOCS] Small fixes in rule configuration page (#32516)
  Painless: Clean up PainlessMethod (#32476)
  Build: Remove shadowing from benchmarks (#32475)
  Docs: Add all JDKs to CONTRIBUTING.md
  Add licensing enforcement for FIPS mode (#32437)
  SQL: Add test for handling of partial results (#32474)
  Mute testFilterCacheStats
  [ML][DOCS] Fix typo applied_to => applies_to
  Scripting: Fix painless compiler loader to know about context classes (#32385)
dnhatn added a commit that referenced this pull request Aug 6, 2018
* 6.x:
  [Kerberos] Use canonical host name (#32588)
  Cross-cluster search: preserve cluster alias in shard failures (#32608)
  [TEST] Allow to run in FIPS JVM (#32607)
  Handle AlreadyClosedException when bumping primary term
  [Test] Add ckb to the list of unsupported languages (#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (#32068) (#32629)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (#32613)
  [Rollup] Improve ID scheme for rollup documents (#32558)
  Mutes failing SQL string function tests due to #32589
  Suppress Wildfly test in FIPS JVMs (#32543)
  Add cluster UUID to Cluster Stats API response (#32206)
  [ML] Add some ML config classes to protocol library (#32502)
  [TEST]Split transport verification mode none tests (#32488)
  [Rollup] Remove builders from DateHistogramGroupConfig (#32555)
  [ML] Add Detector config classes to protocol library (#32495)
  [Rollup] Remove builders from MetricConfig (#32536)
  Fix race between replica reset and primary promotion (#32442)
  HLRC: Move commercial clients from XPackClient (#32596)
  Security: move User to protocol project (#32367)
  Minor fix for javadoc (applicable for java 11). (#32573)
  Painless: Move Some Lookup Logic to PainlessLookup (#32565)
  Core: Minor size reduction for AbstractComponent (#32509)
  INGEST: Enable default pipelines (#32286) (#32591)
  TEST: Avoid merges in testSeqNoAndCheckpoints
  [Rollup] Remove builders from HistoGroupConfig (#32533)
  fixed elements in array of produced terms (#32519)
  Mutes ReindexFailureTests.searchFailure dues to #28053
  Mutes LicensingDocumentationIT due to #32580
  Remove the SATA controller from OpenSUSE box
  [ML] Rename JobProvider to JobResultsProvider (#32551)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5.0 Default pipelines
4 participants