elasticsearch hung up / closing connection during collapse search query with size parameter #104647

Closed
sh3bang opened this issue Jan 23, 2024 · 13 comments · Fixed by #104666
Labels: >bug, :Search/Search (Search-related issues that do not fall into other categories), Team:Search (Meta label for search team)

Comments

@sh3bang

sh3bang commented Jan 23, 2024

Elasticsearch Version

8.12.0

Installed Plugins

No response

Java Version

bundled

OS Version

Linux behemoth 5.15.0-89-generic #99-Ubuntu SMP Mon Oct 30 20:42:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Problem Description

Hello,

If I run a collapse query with an inner_hits size limit, Elasticsearch suddenly closes the connection without sending a response!

Steps to Reproduce

POST http://elastic:9200/products/_search

{
    "query": {
        "match_all": {}
    },
    "collapse": {
        "field": "brand.name.keyword",
        "inner_hits": {
            "size": 50
        }
    },
    "fields": [
        "brand.name.keyword"
    ],
    "_source": false
}

Logs (if relevant)

{"@timestamp":"2024-01-23T10:00:46.202Z", "log.level":"ERROR", "message":"failure encoding chunk", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[460b78b2420d][transport_worker][T#6]","log.logger":"org.elasticsearch.rest.ChunkedRestResponseBody","elasticsearch.cluster.uuid":"X2gdOkX2Rca247aicNr33g","elasticsearch.node.id":"zuPqb5OFSPK4C6U9e8QqTQ","elasticsearch.node.name":"460b78b2420d","elasticsearch.cluster.name":"docker-cluster","error.type":"com.fasterxml.jackson.core.JsonGenerationException","error.message":"Can not start an object, expecting field name (context: Object)","error.stack_trace":"com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:2849)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.JsonGeneratorImpl._reportCantWriteValueExpectName(JsonGeneratorImpl.java:262)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.UTF8JsonGenerator._verifyValueWrite(UTF8JsonGenerator.java:1179)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeStartObject(UTF8JsonGenerator.java:375)\n\tat org.elasticsearch.xcontent.impl@8.12.0/org.elasticsearch.xcontent.provider.json.JsonXContentGenerator.writeStartObject(JsonXContentGenerator.java:148)\n\tat org.elasticsearch.xcontent@8.12.0/org.elasticsearch.xcontent.XContentBuilder.startObject(XContentBuilder.java:329)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.search.SearchHit.toXContent(SearchHit.java:621)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:119)\n\tat 
org.elasticsearch.server@8.12.0/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:839)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\n"}
{"@timestamp":"2024-01-23T10:00:46.205Z", "log.level": "WARN", "message":"caught exception while handling client http traffic, closing connection Netty4HttpChannel{localAddress=/172.21.0.3:9200, remoteAddress=/172.21.0.1:36538}", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[460b78b2420d][transport_worker][T#6]","log.logger":"org.elasticsearch.http.AbstractHttpServerTransport","elasticsearch.cluster.uuid":"X2gdOkX2Rca247aicNr33g","elasticsearch.node.id":"zuPqb5OFSPK4C6U9e8QqTQ","elasticsearch.node.name":"460b78b2420d","elasticsearch.cluster.name":"docker-cluster","error.type":"com.fasterxml.jackson.core.JsonGenerationException","error.message":"Can not start an object, expecting field name (context: Object)","error.stack_trace":"com.fasterxml.jackson.core.JsonGenerationException: Can not start an object, expecting field name (context: Object)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.JsonGenerator._reportError(JsonGenerator.java:2849)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.JsonGeneratorImpl._reportCantWriteValueExpectName(JsonGeneratorImpl.java:262)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.UTF8JsonGenerator._verifyValueWrite(UTF8JsonGenerator.java:1179)\n\tat com.fasterxml.jackson.core@2.15.0/com.fasterxml.jackson.core.json.UTF8JsonGenerator.writeStartObject(UTF8JsonGenerator.java:375)\n\tat org.elasticsearch.xcontent.impl@8.12.0/org.elasticsearch.xcontent.provider.json.JsonXContentGenerator.writeStartObject(JsonXContentGenerator.java:148)\n\tat org.elasticsearch.xcontent@8.12.0/org.elasticsearch.xcontent.XContentBuilder.startObject(XContentBuilder.java:329)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.search.SearchHit.toXContent(SearchHit.java:621)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.rest.ChunkedRestResponseBody$1.encodeChunk(ChunkedRestResponseBody.java:119)\n\tat 
org.elasticsearch.server@8.12.0/org.elasticsearch.rest.RestController$EncodedLengthTrackingChunkedRestResponseBody.encodeChunk(RestController.java:839)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.writeChunk(Netty4HttpPipeliningHandler.java:314)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.doFlush(Netty4HttpPipeliningHandler.java:296)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.flush(Netty4HttpPipeliningHandler.java:260)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:923)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\n"}
{"@timestamp":"2024-01-23T10:00:46.206Z", "log.level":"ERROR", "message":"unexpected error while releasing pipelined http responses", "ecs.version": "1.2.0","service.name":"ES_ECS","event.dataset":"elasticsearch.server","process.thread.name":"elasticsearch[460b78b2420d][transport_worker][T#6]","log.logger":"org.elasticsearch.http.netty4.Netty4HttpServerTransport","elasticsearch.cluster.uuid":"X2gdOkX2Rca247aicNr33g","elasticsearch.node.id":"zuPqb5OFSPK4C6U9e8QqTQ","elasticsearch.node.name":"460b78b2420d","elasticsearch.cluster.name":"docker-cluster","error.type":"java.lang.IllegalStateException","error.message":"complete already: DefaultChannelPromise@7c6a8580(failure: java.nio.channels.ClosedChannelException)","error.stack_trace":"java.lang.IllegalStateException: complete already: DefaultChannelPromise@7c6a8580(failure: java.nio.channels.ClosedChannelException)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:113)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:89)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.safeFailPromise(Netty4HttpPipeliningHandler.java:353)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.close(Netty4HttpPipeliningHandler.java:337)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeClose(AbstractChannelHandlerContext.java:751)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:727)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:560)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:957)\n\tat 
io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannel.close(AbstractChannel.java:244)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpChannel.close(Netty4HttpChannel.java:67)\n\tat org.elasticsearch.base@8.12.0/org.elasticsearch.core.IOUtils.close(IOUtils.java:71)\n\tat org.elasticsearch.base@8.12.0/org.elasticsearch.core.IOUtils.close(IOUtils.java:119)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.common.network.CloseableChannel.closeChannels(CloseableChannel.java:78)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.common.network.CloseableChannel.closeChannel(CloseableChannel.java:67)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.common.network.CloseableChannel.closeChannel(CloseableChannel.java:57)\n\tat org.elasticsearch.server@8.12.0/org.elasticsearch.http.AbstractHttpServerTransport.onException(AbstractHttpServerTransport.java:383)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpServerTransport.onException(Netty4HttpServerTransport.java:326)\n\tat org.elasticsearch.transport.netty4@8.12.0/org.elasticsearch.http.netty4.Netty4HttpPipeliningHandler.exceptionCaught(Netty4HttpPipeliningHandler.java:385)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:928)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:941)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)\n\tat 
io.netty.common@4.1.94.Final/io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)\n\tat io.netty.transport@4.1.94.Final/io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:569)\n\tat io.netty.common@4.1.94.Final/io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)\n\tat io.netty.common@4.1.94.Final/io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)\n\tat java.base/java.lang.Thread.run(Thread.java:1583)\nCaused by: java.nio.channels.ClosedChannelException\n\t... 26 more\n"}
sh3bang added the >bug and needs:triage labels Jan 23, 2024
andreidan added the :Search/Search label and removed the needs:triage label Jan 23, 2024
elasticsearchmachine added the Team:Search label Jan 23, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@DaveCTurner
Contributor

Reproduces for me:

PUT /testindex
{
  "mappings": {
    "properties": {
      "field": {
        "type": "keyword"
      }
    }
  },
  "settings": {
    "number_of_replicas": 0
  }
}

# 200 OK
# {
#   "acknowledged": true,
#   "index": "testindex",
#   "shards_acknowledged": true
# }

POST /testindex/_doc?refresh
{
  "field": "value"
}

# 201 Created
# {
#   "_id": "H1gKNo0BZmTqZV2nDUoP",
#   "_index": "testindex",
#   "_primary_term": 1,
#   "_seq_no": 0,
#   "_shards": {
#     "failed": 0,
#     "successful": 1,
#     "total": 1
#   },
#   "_version": 1,
#   "forced_refresh": true,
#   "result": "created"
# }

POST /testindex/_search
{
  "collapse": {
    "field": "field",
    "inner_hits": {
      "size": 50
    }
  },
  "query": {
    "match_all": {}
  }
}

# No response

benwtrent self-assigned this Jan 23, 2024
@benwtrent
Member

I have a YAML test that reproduces what @DaveCTurner did in his reproduction. I have verified that it fails all the way back to 8.9.

In 8.8, I get a different failure, not the chunked xcontent failure.

I am guessing we are running into the same bad xcontent objects but chunked serialization is just failing at a different part.


1>             "type" : "illegal_argument_exception",
  1>             "reason" : "Field name cannot be null",
  1>             "stack_trace" : "org.elasticsearch.ElasticsearchException$1: Field name cannot be null
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.ElasticsearchException.guessRootCauses(ElasticsearchException.java:669)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.ElasticsearchException.generateFailureXContent(ElasticsearchException.java:597)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.RestResponse.build(RestResponse.java:176)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:124)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.RestResponse.<init>(RestResponse.java:103)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestActionListener.onFailure(RestActionListener.java:55)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:40)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestCancellableNodeClient$1.onResponse(RestCancellableNodeClient.java:87)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestCancellableNodeClient$1.onResponse(RestCancellableNodeClient.java:81)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:205)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:199)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations$RunAfterActionListener.onResponse(ActionListenerImplementations.java:172)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListener$5.onResponse(ActionListener.java:333)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:723)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchLookupFieldsPhase.run(FetchLookupFieldsPhase.java:75)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:470)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:464)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.ExpandSearchPhase.onPhaseDone(ExpandSearchPhase.java:151)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.ExpandSearchPhase.lambda$run$0(ExpandSearchPhase.java:102)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListener$2.onResponse(ActionListener.java:158)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.SearchTransportService$ConnectionCountingHandler.handleResponse(SearchTransportService.java:611)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.transport.TransportService$UnregisterChildTransportResponseHandler.handleResponse(TransportService.java:1679)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1395)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1494)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1465)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:42)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.support.ChannelActionListener.lambda$onResponse$0(ChannelActionListener.java:31)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListener.run(ActionListener.java:357)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:31)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:19)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.TransportMultiSearchAction$1.finish(TransportMultiSearchAction.java:180)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.TransportMultiSearchAction$1.handleResponse(TransportMultiSearchAction.java:166)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.TransportMultiSearchAction$1.onResponse(TransportMultiSearchAction.java:154)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.TransportMultiSearchAction$1.onResponse(TransportMultiSearchAction.java:151)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:205)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:199)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListenerImplementations$RunAfterActionListener.onResponse(ActionListenerImplementations.java:172)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.ActionListener$5.onResponse(ActionListener.java:333)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.sendSearchResponse(AbstractSearchAsyncAction.java:723)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchLookupFieldsPhase.run(FetchLookupFieldsPhase.java:75)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:470)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:464)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.ExpandSearchPhase.onPhaseDone(ExpandSearchPhase.java:151)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.ExpandSearchPhase.run(ExpandSearchPhase.java:105)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executePhase(AbstractSearchAsyncAction.java:470)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.AbstractSearchAsyncAction.executeNextPhase(AbstractSearchAsyncAction.java:464)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchSearchPhase.moveToNextPhase(FetchSearchPhase.java:271)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchSearchPhase.lambda$innerRun$2(FetchSearchPhase.java:108)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:117)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:90)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
  1>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
  1>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
  1>    at java.base/java.lang.Thread.run(Thread.java:1623)
  1> Caused by: java.lang.IllegalArgumentException: Field name cannot be null
  1>    at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.ensureNotNull(XContentBuilder.java:1268)
  1>    at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.ensureNameNotNull(XContentBuilder.java:1263)
  1>    at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.field(XContentBuilder.java:363)
  1>    at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.startObject(XContentBuilder.java:334)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.search.SearchHit.toInnerXContent(SearchHit.java:784)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.search.SearchHit.toXContent(SearchHit.java:670)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.search.SearchHits.toXContent(SearchHits.java:193)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.SearchResponseSections.toXContent(SearchResponseSections.java:102)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.SearchResponse.innerToXContent(SearchResponse.java:299)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.action.search.SearchResponse.toXContent(SearchResponse.java:269)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestStatusToXContentListener.buildResponse(RestStatusToXContentListener.java:45)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestStatusToXContentListener.buildResponse(RestStatusToXContentListener.java:21)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:27)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:26)
  1>    at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:38)
  1>    ... 53 more
  1>    Suppressed: java.lang.IllegalStateException: Failed to close the XContentBuilder
  1>            at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.close(XContentBuilder.java:1254)
  1>            at org.elasticsearch.server@8.8.3-SNAPSHOT/org.elasticsearch.rest.action.RestBuilderListener.buildResponse(RestBuilderListener.java:26)
  1>            ... 55 more
  1>    Caused by: java.io.IOException: Unclosed object or array found
  1>            at org.elasticsearch.xcontent.impl@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.provider.json.JsonXContentGenerator.close(JsonXContentGenerator.java:560)
  1>            at org.elasticsearch.xcontent@8.8.3-SNAPSHOT/org.elasticsearch.xcontent.XContentBuilder.close(XContentBuilder.java:1252)
  1>            ... 56 more

@benwtrent
Member

OK, I have verified that this bug occurs because innerHitsBuilder allows the name to be omitted when building the collapsible inner hits, while we later access the name assuming it ISN'T null. I am not sure whether we should allow the name to be omitted and default to the field (like we do with nested inner hits), or require the name to be provided.

@sh3bang does your request fail when you provide a name?

{
    "query": {
        "match_all": {}
    },
    "collapse": {
        "field": "brand.name.keyword",
        "inner_hits": {
            "size": 50,
            "name": "brand_name"
        }
    },
    "fields": [
        "brand.name.keyword"
    ],
    "_source": false
}

@benwtrent
Member

@jimczi @javanna ^ what do y'all think? Should we default to the field name when name==null for the inner hits builders? Or add validation to CollapseBuilder that requires name to be set for the inner hits builders?

This bug seems like it's been around for a LONG time, but now it's worse due to new ways of serializing xcontent.

@DaveCTurner
Contributor

This bug seems like it's been around for a LONG time

That's a relief, I was worried we'd messed something up recently.

but now it's worse due to new ways of serializing xcontent.

I'm not sure it's meaningfully worse with the chunked encoding tbh, you never got a useful response to this request.

@sh3bang
Author

sh3bang commented Jan 23, 2024

@sh3bang does your request fail when you provide a name?

@benwtrent nope, it doesn't fail! Nice workaround - thanks. ;)
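
For anyone scripting the workaround, here is a minimal sketch of building the request body so that `collapse.inner_hits` always carries an explicit `name`. The class and method names (`CollapseBodySketch`, `body`) are hypothetical, and the sketch assumes the arguments contain no characters that need JSON escaping.

```java
final class CollapseBodySketch {
    // Builds a search body whose collapse.inner_hits includes an explicit name.
    // Assumption: the arguments contain no characters that need JSON escaping.
    static String body(String collapseField, String innerHitsName, int size) {
        return String.format(
            "{\"query\":{\"match_all\":{}},"
                + "\"collapse\":{\"field\":\"%s\","
                + "\"inner_hits\":{\"name\":\"%s\",\"size\":%d}}}",
            collapseField, innerHitsName, size);
    }

    public static void main(String[] args) {
        // Same field, name, and size as the request above.
        System.out.println(body("brand.name.keyword", "brand_name", 50));
    }
}
```

The resulting string can be sent as the body of `POST /products/_search` with whatever HTTP client is already in use.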

@benwtrent
Member

I'm not sure it's meaningfully worse with the chunked encoding tbh, you never got a useful response to this request.

@DaveCTurner you are correct, it isn't worse in any meaningful way, other than that the error_trace log doesn't indicate why the xcontent serialization failed :(. I had to go back to a version before chunking to see which part of the xcontent serialization broke.

Nice workaround - thanks. ;)

🎉 🎉 🎉
@sh3bang I am glad! Sorry you ran into this weird one. We will either validate eagerly so that folks know what to do (you are required to put a name) or we will allow empty names and default to the field value.

There is some funkiness around duplicate name values (e.g. providing the same name for two inner_hits or both defaulting to field). Right now we allow duplicate name values 🤦 but these could overwrite each other, meaning the inner_hit builder provided later in the list would overwrite the earlier one in the result set if they have the same name.

@benwtrent
Member

@jimczi @javanna the fact that we allow duplicate name values in collapse tells me we SHOULDN'T default to the field name like we do with nested. We would be encouraging extra weird and unexpected behavior (e.g. user defines two inner hit objects without names and we only return the results of one with no warning, really weird).

I vote we require name in collapse.inner_hits. All our docs use name, and if it's missing, we fail weirdly.

@jimczi
Contributor

jimczi commented Jan 23, 2024

Should we default to the field name when name==null for the inner hits builders?

+1, that would be in line with the behaviour of the nested query

There is some funkiness around duplicate name values (e.g. providing the same name for two inner_hits or both defaulting to field). Right now we allow duplicate name values 🤦 but these could overwrite each other, meaning the inner_hit builder provided later in the list would overwrite the earlier one in the result set if they have the same name.

That shouldn't be the case. See #37645 where the intent is to throw when there's a name clash. Did you find an example where the logic is flawed?

@benwtrent
Member

That shouldn't be the case. See #37645 where the intent is to throw when there's a name clash. Did you find an example where the logic is flawed?

See:

for (InnerHitBuilder innerHitBuilder : innerHitBuilders) {

We call

hit.getInnerHits().put(innerHitBuilder.getName(), innerHits);

With no checks. That PR only touches the nested code paths; it changes nothing in ExpandSearchPhase.
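
The overwrite is ordinary `Map.put` behavior. A minimal sketch, using a plain `HashMap` and made-up string values in place of the real inner hits (`DuplicateNameSketch`, `NamedHits`, and `collect` are hypothetical names for illustration, not Elasticsearch code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DuplicateNameSketch {
    // Hypothetical stand-in for an inner-hits definition: a name plus its results.
    static final class NamedHits {
        final String name;
        final String hits;

        NamedHits(String name, String hits) {
            this.name = name;
            this.hits = hits;
        }
    }

    // Mirrors the unchecked put(...) pattern: the last entry with a given name wins.
    static Map<String, String> collect(List<NamedHits> all) {
        Map<String, String> byName = new HashMap<>();
        for (NamedHits h : all) {
            byName.put(h.name, h.hits);
        }
        return byName;
    }

    public static void main(String[] args) {
        Map<String, String> result = collect(Arrays.asList(
            new NamedHits("by_brand", "first results"),
            new NamedHits("by_brand", "second results")));
        // Only one entry survives; the first results are silently dropped.
        System.out.println(result);
    }
}
```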

@benwtrent
Member

I think either:

  • We default name to field and then disallow duplicate names
  • We require name

Those were the two options back when this was fixed for nested. name has been de facto required in collapse for quite some time. But, disallowing duplicates does change the behavior now.
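
To make the first option concrete, here is a rough sketch of "default to the field, then disallow duplicates". It models each inner-hits definition only by its (possibly null) name; `DefaultNameSketch`, `resolveNames`, and the error text are made up for illustration and are not taken from the Elasticsearch code base.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DefaultNameSketch {
    // Resolves each inner-hits name, falling back to the collapse field when
    // the name is null, and rejects any resolved name that appears twice.
    static Set<String> resolveNames(String collapseField, List<String> names) {
        Set<String> resolved = new LinkedHashSet<>();
        for (String name : names) {
            String effective = (name != null) ? name : collapseField;
            if (!resolved.add(effective)) {
                throw new IllegalArgumentException(
                    "duplicate inner_hits name [" + effective + "]");
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // One unnamed definition falls back to the collapse field.
        System.out.println(resolveNames("brand", Arrays.<String>asList(null, "cheapest")));
        // Two unnamed definitions both resolve to "brand" and are rejected.
        try {
            resolveNames("brand", Arrays.<String>asList(null, null));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this option, a request with two unnamed definitions that is accepted today would start failing, which is the behavior change mentioned above.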

@jimczi
Contributor

jimczi commented Jan 23, 2024

ok bummer, although I think the duplicate name is a separate issue. Requiring name, or defaulting name to the collapse field, doesn't prevent the possible duplication.
Let's first ensure that it is always set to avoid the bug.
I am fine with either approach that you proposed, with a slight preference for setting the name to the collapse field by default.

benwtrent added a commit that referenced this issue Jan 29, 2024
`name` is de facto required for `collapse.inner_hits`. It always has been, but we have never validated up front. Instead we accidentally try to serialize `null`, which leads to exciting and confusing errors.

closes: #104647
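
As an illustration of what validating up front means here, the following is a sketch only and is not the code from the referenced commit. `RequireNameSketch`, `validate`, and the error message wording are hypothetical; the point is that a missing name is rejected before any search runs, so the caller gets a clear error instead of a dropped connection.

```java
import java.util.Arrays;
import java.util.List;

final class RequireNameSketch {
    // Fails fast if any inner-hits definition is missing a name.
    // The message text is invented for this sketch.
    static void validate(List<String> innerHitsNames) {
        for (String name : innerHitsNames) {
            if (name == null || name.isEmpty()) {
                throw new IllegalArgumentException(
                    "[name] is required for each collapse inner_hits definition");
            }
        }
    }

    public static void main(String[] args) {
        validate(Arrays.asList("brand_name")); // passes silently
        try {
            validate(Arrays.<String>asList("brand_name", null));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```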