
Negative Buffer Usage being reported in Datadog and Prometheus and S3 sink hangs forever #17666

Open
smitthakkar96 opened this issue Jun 12, 2023 · 2 comments
Labels
domain: buffers Anything related to Vector's memory/disk buffers type: bug A code related bug.

Comments


smitthakkar96 commented Jun 12, 2023

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

We had an incident today similar to this. Our S3 sink buffer was reporting negative values (see the screenshots).

[Screenshots: buffer usage graphs showing negative values, 2023-06-12 at 11:42 and 13:55]

I also noticed that outgoing events dropped to 0 during this time, and our source time lag increased.

[Screenshots: outgoing events dropping to zero and source time lag increasing, 2023-06-12 around 14:00]

We have seen this in the past when a sink with a blocking buffer fills up. Even after the client slows down or stops sending events in response to backpressure, the reported buffer usage stays the same, and the sink hangs forever.
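As an aside, one plausible way a usage gauge can end up negative (an illustration only, not Vector's actual buffer code): if the gauge is maintained by applying increment/decrement deltas, a decrement that is applied twice for the same batch, e.g. once when it is drained and again when its delivery is acknowledged, underflows the value below zero:

```python
# Hypothetical sketch of delta-based gauge accounting going negative.
# None of these names come from the Vector codebase.
class BufferGauge:
    def __init__(self):
        self.bytes = 0  # current buffer usage as tracked by deltas

    def enqueue(self, n):
        self.bytes += n  # bytes written into the buffer

    def drain(self, n):
        self.bytes -= n  # bytes removed from the buffer

gauge = BufferGauge()
gauge.enqueue(100)
gauge.drain(100)   # batch drained to the sink
gauge.drain(100)   # bug: same batch decremented again on acknowledgement
print(gauge.bytes) # -100, the kind of negative value the graphs show
```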

Configuration

  aws_s3_archive.toml: |-
    type = "aws_s3"
    inputs = [ "remap_archive_logs_for_datadog" ]
    key_prefix = "dt=%Y%m%d/hour=%H/"
    compression = "gzip"
    content_encoding = "none"
    bucket = "vector-logs-archive-prod-asia"
    content_type = "application/x-gzip"
    filename_extension = "json.gz"
    filename_time_format = "archive_%H%M%S.%3f0.e"

    framing.method = "newline_delimited"
    encoding.codec = "json"

    batch.max_bytes = 256000000

    buffer.type = "disk"
    buffer.max_size = 8000000000

    buffer.when_full = "block"
  datadog_logs.toml: |-
    type = "datadog_logs"
    inputs = [ "internal_logs", "drop_nginx_ingress_logs" ]
    default_api_key = "${DATADOG_API_KEY}"
    site = "datadoghq.eu"


    buffer.type = "disk"
    buffer.max_size = 36000000000

    buffer.when_full = "block"
  internal_metrics_exporter.toml: |-
    type = "prometheus_exporter"
    inputs = [ "remap_enrich_internal_metrics_with_static_tags" ]
    distributions_as_summaries = true
    address = "0.0.0.0:9598"
  nginx_ingress_logs_s3_archive.toml: |-
    # S3 Sink to archive nginx ingress logs

    type = "aws_s3"
    inputs = [ "filter_nginx_logs_for_archival" ]
    key_prefix = "dt=%Y%m%d/hour=%H/"
    compression = "gzip"
    content_encoding = "none"
    bucket = "nginx-ingress-logs-archive-prod-asia"
    content_type = "application/x-gzip"
    filename_extension = "json.gz"
    filename_time_format = "archive_%H%M%S.%3f0.e"

    framing.method = "newline_delimited"
    encoding.codec = "json"

    batch.max_bytes = 256000000

    buffer.type = "disk"
    buffer.max_size = 4000000000

    buffer.when_full = "drop_newest"
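With the `prometheus_exporter` sink above listening on `0.0.0.0:9598`, the buffer gauges can be spot-checked for negative values directly from the scrape output. A small sketch (the metric and label names below are assumptions based on Vector's internal metrics, and the sample text stands in for a real scrape of `http://localhost:9598/metrics`):

```python
# Filter a Prometheus text-format scrape for negative buffer gauges.
# In production the input would be fetched from the exporter endpoint;
# a canned sample stands in here.
sample = """\
vector_buffer_byte_size{component_id="aws_s3_archive"} -1.2e+07
vector_buffer_byte_size{component_id="datadog_logs"} 3.6e+09
vector_buffer_events{component_id="aws_s3_archive"} -512
"""

# Each sample line ends with its numeric value; flag any below zero.
negative = [
    line for line in sample.splitlines()
    if float(line.rsplit(" ", 1)[1]) < 0
]
for line in negative:
    print("NEGATIVE:", line)
```

The same check could back an alert, so a buffer gauge dipping below zero is caught before the sink silently stalls.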

Version

vector 0.28.0

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

@smitthakkar96 smitthakkar96 added the type: bug A code related bug. label Jun 12, 2023
@smitthakkar96 smitthakkar96 changed the title Negative Buffer Usage being reported in Datadog and Prometheus Negative Buffer Usage being reported in Datadog and Prometheus and S3 sink hangs forever Jun 12, 2023
@smitthakkar96

@jszwedko is it related to #15683 by any chance?

@tobz tobz added the domain: buffers Anything related to Vector's memory/disk buffers label Oct 16, 2023
@smitthakkar96

[Screenshot: buffer events metric dropping to a negative value, 2024-01-10 at 10:48]

A similar issue popped up in v0.33.0. The buffer events metric dropped to a negative number, and at the same time we got an alert about the Datadog Agent receiving a small number of errors when communicating with Vector. I didn't see anything unusual in the logs around that time, and restarting Vector solved the problem.
