A note for the community
Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
Running Vector in Kubernetes as an aggregator with a Loki sink, I saw the following error in its logs:
ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::variants::disk_v2::writer: Last written record was unable to be deserialized. Corruption likely. reason="invalid data: check failed for struct member payload: pointer out of bounds: base 0x7f35f0870f78 offset 308637025 not in range 0x7f35e7ed0000..0x7f35ee200000"
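For context, the sink in question is a Loki sink with a disk buffer. A minimal sketch of that kind of configuration is below; the input name, endpoint, labels, and buffer size are placeholders, not my exact settings:

```yaml
sinks:
  loki:
    type: loki
    inputs:
      - vector_source            # placeholder input; any log source works here
    endpoint: http://loki.monitoring.svc:3100   # placeholder Loki address
    encoding:
      codec: json
    labels:
      forwarder: vector          # placeholder label set
    buffer:
      type: disk                 # the disk_v2 buffer referenced in the error above
      max_size: 1073741824       # 1 GiB, placeholder size in bytes
      when_full: block
```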
After the error happened, the buffer wouldn't go down and just kept growing. As you can see from the image, this is a StatefulSet with several replicas; the other replicas were fine, and only this one was buffering heavily.
After manually restarting that pod, Vector was able to resolve the issue, drop the bad batch, and empty its disk buffer:
ERROR sink{component_kind="sink" component_id=loki component_type=loki}: vector_buffers::internal_events: Error encountered during buffer read. error=The reader detected that a data file contains a partially-written record. error_code="partial_write" error_type="reader_failed" stage="processing" internal_log_rate_limit=true
2023-12-13T16:09:47.113249Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:sink{buffer_type="disk"}: vector_buffers::internal_events: Events dropped. count=200 intentional=false reason=corrupted_events stage=0
From the documentation here, if I understood correctly, Vector should have exited by itself, which would have automatically triggered a restart of the pod, but that didn't seem to happen.
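Since the process kept running instead of exiting, the pod's default restartPolicy: Always never had anything to restart. As a possible mitigation, a liveness probe against Vector's API health endpoint could let Kubernetes restart a wedged replica. This is only a sketch: it assumes the API is enabled (api.enabled: true) and listening on a pod-reachable address such as 0.0.0.0:8686 rather than the default 127.0.0.1, and I have not verified that the health endpoint would actually report unhealthy in this buffered state.

```yaml
# Sketch of a container-level liveness probe for the aggregator pod.
# Assumes Vector's API is enabled and bound to 0.0.0.0:8686; /health is
# the API's health endpoint. Whether this probe would fail in the
# corrupted-buffer state described above is an assumption.
livenessProbe:
  httpGet:
    path: /health
    port: 8686
  initialDelaySeconds: 15
  periodSeconds: 30
  failureThreshold: 3
```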
I do not know exactly how to reproduce this issue; I have been running several load tests, and this just happened in one of them out of nowhere.
Configuration
Version
0.34.0
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response