
Buffer kept growing after sink error #19379

Open

andresperezl opened this issue Dec 13, 2023 · 1 comment
Labels
type: bug A code related bug.

Comments

@andresperezl

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Running Vector in Kubernetes as an aggregator with a Loki sink, I saw the following error in its logs:

ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::variants::disk_v2::writer: Last written record was unable to be deserialized. Corruption likely. reason="invalid data: check failed for struct member payload: pointer out of bounds: base 0x7f35f0870f78 offset 308637025 not in range 0x7f35e7ed0000..0x7f35ee200000"

After the error happened, the buffer would not drain and just kept growing:

[image: graph of disk buffer size per StatefulSet replica; one replica's buffer grows continuously while the others stay flat]

As you can see from the image, this is a StatefulSet with several replicas; the other replicas were fine, and only this one was buffering heavily.

After I manually restarted that pod, Vector resolved the issue on its own: it dropped the corrupted batch and emptied its disk buffer:

ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::internal_events: Error encountered during buffer read. error=The reader detected that a data file contains a partially-written record. error_code="partial_write" error_type="reader_failed" stage="processing" internal_log_rate_limit=true
2023-12-13T16:09:47.113249Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:sink{buffer_type="disk"}: vector_buffers::internal_events: Events dropped. count=200 intentional=false reason=corrupted_events stage=0

From the documentation here, if I understood correctly, Vector should have exited on its own, which would have triggered an automatic restart of the pod, but that did not seem to happen.
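(Not part of the original report, but a possible mitigation.) Since the process apparently stayed up in a wedged state instead of exiting, a Kubernetes livenessProbe against Vector's API health endpoint can force the restart that never happened. This is a minimal sketch, assuming Vector's api is enabled and bound to 0.0.0.0:8686 (the default bind is 127.0.0.1:8686, which the kubelet cannot reach); the probe thresholds are illustrative. Caveat: /health reflects process liveness, not per-sink buffer health, so it may not catch every stuck-buffer state.

# Vector config: enable the API so GET /health is served.
# Assumption: bind to 0.0.0.0 so the kubelet can reach the probe.
api:
  enabled: true
  address: 0.0.0.0:8686

# StatefulSet pod spec (illustrative thresholds): restart the container
# if /health stops answering; restartPolicy defaults to Always.
containers:
  - name: vector
    livenessProbe:
      httpGet:
        path: /health
        port: 8686
      periodSeconds: 15
      failureThreshold: 3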

I do not know exactly how to reproduce this issue; I have been running several load tests, and this happened in just one of them, out of nowhere.

Configuration

sinks:
  loki:
    buffer:
      max_size: 10737418240
      type: disk
      when_full: drop_newest
    encoding:
      codec: json
    endpoint: http://loki-gateway:80
    inputs:
      - transform_logs
    out_of_order_action: accept
    request:
      retry_max_duration_secs: 60
    type: loki
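
(Not in the original report.) For tracking the kind of per-replica buffer growth shown in the graph above, Vector can export its own buffer gauges; a minimal sketch, assuming an internal_metrics source scraped via a prometheus_exporter sink (the component names internal_telemetry and prometheus are hypothetical; buffer_byte_size is reported per component):

sources:
  # Emits Vector's own telemetry, including buffer gauges such as
  # buffer_byte_size and buffer_events.
  internal_telemetry:
    type: internal_metrics

sinks:
  # Exposes those metrics for Prometheus to scrape.
  prometheus:
    type: prometheus_exporter
    inputs:
      - internal_telemetry
    address: 0.0.0.0:9598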

Version

0.34.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

andresperezl added the type: bug (A code related bug.) label on Dec 13, 2023
@xiaoxiongxyy

Is there a solution to this problem?
