Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address mapping and compute engine runtime field issues #117792

Merged
merged 30 commits into from
Dec 5, 2024

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Nov 30, 2024

This PR addresses the following issues:

  • Fields mapped as runtime fields not getting stored if source mode is synthetic.
  • Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)
  • Address concurrency issue when runtime fields get pushed down to Lucene. (2)

1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access.

Closes #117644

This change attempts to solve the following issue:

ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same `SourceProvider` evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by `SourceProvider`.

Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

Closes elastic#117644
@@ -108,7 +108,12 @@ public DocValueFormat docValueFormat(String format, ZoneId timeZone) {

@Override
public BlockLoader blockLoader(BlockLoaderContext blContext) {
return new LongScriptBlockDocValuesReader.LongScriptBlockLoader(leafFactory(blContext.lookup()));
boolean isSyntheticSource = SourceFieldMapper.isSynthetic(blContext.indexSettings());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have two different implementations based on source mode?


@Override
public final StoredFieldsSpec rowStrideStoredFieldSpec() {
return StoredFieldsSpec.NEEDS_SOURCE;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed because otherwise the follow error occurs:

[2024-12-03T04:52:11,002][WARN ][o.e.x.e.a.EsqlResponseListener] [test-cluster-0] Request failed with status [INTERNAL_SERVER_ERROR]: java.lang.IllegalStateException: found row stride readers [[RowStrideReaderWork[reader=ScriptLongs, builder=org.elasticsearch.compute.data.LongBlockBuilder@19f6840e, loader=org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader$RowStrideLongScriptBlockLoader@6f332c9, offset=1], RowStrideReaderWork[reader=ScriptLongs, builder=org.elasticsearch.compute.data.LongBlockBuilder@12a1d61f, loader=org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader$RowStrideLongScriptBlockLoader@7b7a5f6c, offset=2]]] without stored fields [StoredFieldsSpec[requiresSource=false, requiresMetadata=false, requiredStoredFields=[]]]
        at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.loadFromSingleLeaf(ValuesSourceReaderOperator.java:254)
        at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.process(ValuesSourceReaderOperator.java:143)
        at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:76)
        at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
        at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
        at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
        at org.elasticsearch.server@9.0.0-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at java.base/java.lang.Thread.run(Thread.java:1575)

However it isn't really required. Since the SearchLook / SourceLoader will load stored fields and not compute engine.

…te engine and scripting infrastructure to load _source
@martijnvg martijnvg requested a review from dnhatn December 2, 2024 16:34
@dnhatn
Copy link
Member

dnhatn commented Dec 4, 2024

Another issue is the thread switching, as the SyntheticSourceProvider is designed to work with only one thread. I think we need to address this before merging the change. It appears we are loading the source once for each script field, but I think it's okay to address the performance issue later.

сне 04, 2024 1:48:31 AM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
WARNING: Uncaught exception in thread: Thread[#143,elasticsearch[node_s1][esql_worker][T#2],5,TGRP-SyntheticSourceIT]
java.lang.AssertionError: StoredFieldsReader are only supposed to be consumed in the thread in which they have been acquired. But was acquired in Thread[#145,elasticsearch[node_s1][esql_worker][T#4],5,TGRP-SyntheticSourceIT] and consumed in Thread[#143,elasticsearch[node_s1][esql_worker][T#2],5,TGRP-SyntheticSourceIT].
	at __randomizedtesting.SeedInfo.seed([273E3739EEB5DBA8]:0)
	at org.apache.lucene.tests.codecs.asserting.AssertingCodec.assertThread(AssertingCodec.java:44)
	at org.apache.lucene.tests.codecs.asserting.AssertingStoredFieldsFormat$AssertingStoredFieldsReader.document(AssertingStoredFieldsFormat.java:75)
	at org.apache.lucene.index.CodecReader$1.document(CodecReader.java:101)
	at org.elasticsearch.search.internal.FieldUsageTrackingDirectoryReader$FieldUsageTrackingLeafReader$2.document(FieldUsageTrackingDirectoryReader.java:137)
	at org.elasticsearch.index.fieldvisitor.StoredFieldLoader$ReaderStoredFieldLoader.advanceTo(StoredFieldLoader.java:205)
	at org.elasticsearch.search.lookup.SyntheticSourceProvider$SyntheticSourceLeafLoader.getSource(SyntheticSourceProvider.java:57)
	at org.elasticsearch.search.lookup.SyntheticSourceProvider.getSource(SyntheticSourceProvider.java:42)
	at org.elasticsearch.search.lookup.LeafSearchLookup.lambda$new$0(LeafSearchLookup.java:40)
	at org.elasticsearch.script.AbstractFieldScript.extractFromSource(AbstractFieldScript.java:107)
	at org.elasticsearch.script.AbstractFieldScript.emitFromSource(AbstractFieldScript.java:127)
	at org.elasticsearch.script.LongFieldScript$1$1.execute(LongFieldScript.java:29)
	at org.elasticsearch.script.AbstractFieldScript.runForDoc(AbstractFieldScript.java:159)
	at org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader.read(LongScriptBlockDocValuesReader.java:69)
	at org.elasticsearch.index.mapper.LongScriptBlockDocValuesReader.read(LongScriptBlockDocValuesReader.java:65)
	at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator$LoadFromMany.read(ValuesSourceReaderOperator.java:420)
	at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator$LoadFromMany.run(ValuesSourceReaderOperator.java:359)
	at org.elasticsearch.compute.lucene.ValuesSourceReaderOperator.process(ValuesSourceReaderOperator.java:159)
	at org.elasticsearch.compute.operator.AbstractPageMappingOperator.getOutput(AbstractPageMappingOperator.java:76)
	at org.elasticsearch.compute.operator.Driver.runSingleLoopIteration(Driver.java:258)
	at org.elasticsearch.compute.operator.Driver.run(Driver.java:189)
	at org.elasticsearch.compute.operator.Driver$1.doRun(Driver.java:378)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
	at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:34)
	at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:1023)
	at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:27)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1575)

@martijnvg martijnvg removed the needs:triage Requires assignment of a team area label label Dec 5, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Member

@dnhatn dnhatn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two workarounds are good! Feel free to merge the PR. Thanks, Martijn!

@martijnvg martijnvg changed the title Avoid reusing source providers for script based block loaders. Address SourceProvider issues in compute engine. Dec 5, 2024
@martijnvg martijnvg changed the title Address SourceProvider issues in compute engine. Address runtime field issues Dec 5, 2024
@martijnvg martijnvg changed the title Address runtime field issues Address mapping and compute engine runtime field issues Dec 5, 2024
@martijnvg martijnvg added the auto-backport Automatically create backport pull requests when merged label Dec 5, 2024
@martijnvg martijnvg merged commit 2051401 into elastic:main Dec 5, 2024
16 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 5, 2024
This change addresses the following issues:

Fields mapped as runtime fields not getting stored if source mode is synthetic.
Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access.

Closes elastic#117644
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.17
8.x

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 5, 2024
This change addresses the following issues:

Fields mapped as runtime fields not getting stored if source mode is synthetic.
Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access.

Closes elastic#117644
elasticsearchmachine pushed a commit that referenced this pull request Dec 5, 2024
…) (#118048)

* Address mapping and compute engine runtime field issues (#117792)

This change addresses the following issues:

Fields mapped as runtime fields not getting stored if source mode is synthetic.
Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access.

Closes #117644

* fixed compile error after backporting
martijnvg added a commit that referenced this pull request Dec 5, 2024
… (#118049)

* Address mapping and compute engine runtime field issues (#117792)

This change addresses the following issues:

Fields mapped as runtime fields not getting stored if source mode is synthetic.
Address java.io.EOFException when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic. (1)
Address concurrency issue when runtime fields get pushed down to Lucene. (2)
1: ValueSourceOperator can read values in row striding or columnar fashion. When values are read in columnar fashion and multiple runtime fields synthetize source then this can cause the same SourceProvider evaluation the same range of docs ids multiple times. This can then result in unexpected io errors at the codec level. This is because the same doc value instances are used by SourceProvider. Re-evaluating the same docids is in violation of the contract of the DocIdSetIterator#advance(...) / DocIdSetIterator#advanceExact(...) methods, which documents that unexpected behaviour can occur if target docid is lower than current docid position.

Note that this is only an issue for synthetic source loader and not for stored source loader. And not when executing in row stride fashion which sometimes happen in compute engine and always happen in _search api.

2: The concurrency issue that arrises with source provider if source operator executes in parallel with data portioning set to DOC. The same SourceProvider instance then gets access by multiple threads concurrently. SourceProviders implementations are not designed to handle concurrent access.

Closes #117644
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 10, 2024
The previous fix to ensure that each thread uses its own SearchProvider wasn't good enough.
When multiple threads access `ReinitializingSourceProvider` the simple thread accounting could still result in returned `SourceProvider` being used by multiple threads concurrently.

The ReinitializingSourceProvider was introduced via elastic#117792

Closes elastic#118238
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Dec 11, 2024
This reverts the workaround that was introduced in elastic#117792 to avoid EOF error when an es|ql query uses multiple runtime fields that fallback to source when source mode is synthetic.
This is now covered by the `ReinitializingSourceProvider` workaround that covers that and the concurrency problem.
With this change, the main code for the required workarounds are now in isolated in ReinitializingSourceProvider.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Automatically create backport pull requests when merged >bug :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.17.1 v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate synthetic source doc values error
3 participants