
Buffer kept growing after sink error #19379

Open

andresperezl opened this issue Dec 13, 2023 · 1 comment
Labels
type: bug A code related bug.

Comments

@andresperezl

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

Running Vector in Kubernetes as an aggregator with a Loki sink, I saw the following error in its logs:

ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::variants::disk_v2::writer: Last written record was unable to be deserialized. Corruption likely. reason="invalid data: check failed for struct member payload: pointer out of bounds: base 0x7f35f0870f78 offset 308637025 not in range 0x7f35e7ed0000..0x7f35ee200000"

After the error happened, the buffer would not drain and just kept growing:

[image: graph of disk buffer size per StatefulSet replica; one replica's buffer grows continuously while the others stay flat]

As you can see from the image, this is a StatefulSet with several replicas; the other replicas were fine, and only this one was buffering heavily.

After I manually restarted that pod, Vector resolved the issue on its own: it dropped the corrupted batch and emptied its disk buffer:

ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::internal_events: Error encountered during buffer read. error=The reader detected that a data file contains a partially-written record. error_code="partial_write" error_type="reader_failed" stage="processing" internal_log_rate_limit=true
2023-12-13T16:09:47.113249Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:sink{buffer_type="disk"}: vector_buffers::internal_events: Events dropped. count=200 intentional=false reason=corrupted_events stage=0

From the documentation here, if I understood correctly, Vector should have exited on its own, which would have triggered an automatic restart of the pod, but that did not seem to happen.
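(Not part of the original report, but a possible mitigation.) Since the process apparently stayed up in a wedged state instead of exiting, a Kubernetes livenessProbe against Vector's API health endpoint can force the restart that never happened. This is a minimal sketch, assuming Vector's api is enabled and bound to 0.0.0.0:8686 (the default bind is 127.0.0.1:8686, which the kubelet cannot reach); the probe thresholds are illustrative. Caveat: /health reflects process liveness, not per-sink buffer health, so it may not catch every stuck-buffer state.

# Vector config: enable the API so GET /health is served.
# Assumption: bind to 0.0.0.0 so the kubelet can reach the probe.
api:
  enabled: true
  address: 0.0.0.0:8686

# StatefulSet pod spec (illustrative thresholds): restart the container
# if /health stops answering; restartPolicy defaults to Always.
containers:
  - name: vector
    livenessProbe:
      httpGet:
        path: /health
        port: 8686
      periodSeconds: 15
      failureThreshold: 3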

I do not know exactly how to reproduce this issue; I have been running several load tests, and this happened in just one of them, out of nowhere.

Configuration

sinks:
  loki:
    buffer:
      max_size: 10737418240
      type: disk
      when_full: drop_newest
    encoding:
      codec: json
    endpoint: http://loki-gateway:80
    inputs:
      - transform_logs
    out_of_order_action: accept
    request:
      retry_max_duration_secs: 60
    type: loki
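
(Not in the original report.) For tracking the kind of per-replica buffer growth shown in the graph above, Vector can export its own buffer gauges; a minimal sketch, assuming an internal_metrics source scraped via a prometheus_exporter sink (the component names internal_telemetry and prometheus are hypothetical; buffer_byte_size is reported per component):

sources:
  # Emits Vector's own telemetry, including buffer gauges such as
  # buffer_byte_size and buffer_events.
  internal_telemetry:
    type: internal_metrics

sinks:
  # Exposes those metrics for Prometheus to scrape.
  prometheus:
    type: prometheus_exporter
    inputs:
      - internal_telemetry
    address: 0.0.0.0:9598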

Version

0.34.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

andresperezl added the type: bug (A code related bug.) label on Dec 13, 2023
@xiaoxiongxyy

Is there a solution to this problem?
