[BUG] Document level bulk request error messages are overridden by bulk level error message when max limit is reached #3507

Open
graytaylor0 opened this issue Oct 16, 2023 · 5 comments
Assignees
Labels
bug Something isn't working

Comments

@graytaylor0
Member

graytaylor0 commented Oct 16, 2023

Describe the bug
When a bulk request fails to write to OpenSearch, the failures are handled only after max_retries has been exhausted. However, when the failure is logged or sent to the DLQ, the bulk-level message "Number of retries reached the limit of max retries (configured value %d)" is used instead of the document's bulkResponse, which carries the error code and the exception. As a result, the root cause of the document-level failure is hidden by the code here ( ).

Expected behavior
Given how clustered the code here is, I think the simplest fix is to always include both the bulk-level error message (if it exists) and the document-level failure in the failure message that is logged or sent to the DLQ.
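
A minimal sketch of what that combined message could look like, under stated assumptions. The class `FailureMessageSketch`, the method `combine`, the " | " separator, and the sample status and error text in `main` are all hypothetical and are not taken from the Data Prepper code base; only the bulk-level message text comes from this issue.

```java
// Hypothetical helper for illustration only; not Data Prepper's actual API.
final class FailureMessageSketch {
    private FailureMessageSketch() {}

    /**
     * Builds a single failure message. The bulk-level message (if present)
     * comes first, followed by the document-level error (if present).
     */
    static String combine(final String bulkMessage,
                          final Integer documentStatus,
                          final String documentError) {
        final StringBuilder sb = new StringBuilder();
        if (bulkMessage != null && !bulkMessage.isEmpty()) {
            sb.append(bulkMessage);
        }
        if (documentError != null && !documentError.isEmpty()) {
            if (sb.length() > 0) {
                sb.append(" | "); // separator is an arbitrary choice
            }
            sb.append("Document failure");
            if (documentStatus != null) {
                sb.append(" (status ").append(documentStatus).append(")");
            }
            sb.append(": ").append(documentError);
        }
        return sb.length() > 0 ? sb.toString() : "Unknown failure";
    }

    public static void main(final String[] args) {
        final String bulk = String.format(
                "Number of retries reached the limit of max retries (configured value %d)", 8);
        // The status and error text below are made-up sample values.
        System.out.println(combine(bulk, 400, "example document-level error"));
    }
}
```

With this shape, the retry-limit text stays at the front of the message and the document-level cause is never dropped when it is available.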


@graytaylor0
Member Author

Related to #3504

@dlvenable dlvenable added this to the v2.6 milestone Oct 18, 2023
@dlvenable dlvenable modified the milestones: v2.6, v2.6.1 Nov 14, 2023
@dlvenable
Member

Here is an example I got:

{
  "dlqObjects": [
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "test-pipeline",
      "failedData": {
        "index": "dlq-failures-aoss",
        "indexId": "a001",
        "status": 0,
        "message": "Number of retries reached the limit of max retries (configured value 8)",
        "document": {
          "id": "a001",
          "name": "Test001",
          "number": 145,
          "action": "index"
        }
      },
      "timestamp": "2023-11-14T17:18:10.326Z"
    },
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "test-pipeline",
      "failedData": {
        "index": "dlq-failures-aoss",
        "indexId": "a002",
        "status": 0,
        "message": "Number of retries reached the limit of max retries (configured value 8)",
        "document": {
          "id": "a002",
          "name": "Test002",
          "number": 200,
          "action": "index"
        }
      },
      "timestamp": "2023-11-14T17:18:10.328Z"
    },
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "test-pipeline",
      "failedData": {
        "index": "dlq-failures-aoss",
        "indexId": "a003",
        "status": 0,
        "message": "Number of retries reached the limit of max retries (configured value 8)",
        "document": {
          "id": "a003",
          "name": "Test003",
          "number": 200,
          "action": "index"
        }
      },
      "timestamp": "2023-11-14T17:18:10.329Z"
    },
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "test-pipeline",
      "failedData": {
        "index": "dlq-failures-aoss",
        "indexId": "a004",
        "status": 0,
        "message": "Number of retries reached the limit of max retries (configured value 8)",
        "document": {
          "id": "a004",
          "name": "Test004",
          "number": 400,
          "action": "index"
        }
      },
      "timestamp": "2023-11-14T17:18:10.329Z"
    },
    {
      "pluginId": "opensearch",
      "pluginName": "opensearch",
      "pipelineName": "test-pipeline",
      "failedData": {
        "index": "dlq-failures-aoss",
        "indexId": "a005",
        "status": 0,
        "message": "Number of retries reached the limit of max retries (configured value 8)",
        "document": {
          "id": "a005",
          "name": "Test005",
          "number": 500,
          "action": "index"
        }
      },
      "timestamp": "2023-11-14T17:18:10.329Z"
    }
  ]
}

@dlvenable
Member

I also saw this while working on #3644.

@dlvenable dlvenable modified the milestones: v2.6.1, v2.6.2 Dec 7, 2023
@dlvenable dlvenable modified the milestones: v2.6.2, v2.8 Jan 17, 2024
@dlvenable
Member

We should probably keep "Number of retries reached the limit of max retries (configured value %d)" as the prefix to these messages.

@KarstenSchnitter
Collaborator

Is it possible to add data from the source or the event itself?

We have a use case where data comes in from different applications and might fail due to field type collisions. In that case, it would be helpful to identify the origin of the events. For OTel events, this can be done through the resource attributes; for JSON messages, through particular fields of the message. Since Data Prepper has already parsed the message, it might have access to that kind of data and could add it to the DLQ message.
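
To make the request concrete, here is a rough sketch of the idea, not a proposal for the actual implementation. It assumes the parsed event can be viewed as a flat map and that the user names the fields that identify the origin; `DlqOriginSketch`, `selectOriginFields`, and the field names in `main` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not Data Prepper's actual API.
final class DlqOriginSketch {
    private DlqOriginSketch() {}

    /**
     * Copies the configured origin fields that are present on the event
     * into a new map that could be attached to the DLQ entry.
     */
    static Map<String, Object> selectOriginFields(final Map<String, Object> event,
                                                  final List<String> originKeys) {
        final Map<String, Object> origin = new LinkedHashMap<>();
        for (final String key : originKeys) {
            if (event.containsKey(key)) {
                origin.put(key, event.get(key));
            }
        }
        return origin;
    }

    public static void main(final String[] args) {
        // Sample event and key names are made up for this sketch.
        final Map<String, Object> event = new LinkedHashMap<>();
        event.put("service.name", "checkout");
        event.put("number", 145);
        System.out.println(selectOriginFields(event, List.of("service.name", "host.name")));
    }
}
```

Fields that are missing from the event are simply skipped, so the same key list could be shared across OTel and plain JSON sources.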

@dlvenable dlvenable removed this from the v2.8 milestone May 16, 2024