unable to set wal_truncate_frequency (greater than scrape_frequency) #390

Closed
oneacl opened this issue Feb 6, 2021 · 4 comments · Fixed by #403
Labels
bug (Something isn't working), frozen-due-to-age (Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.)

Comments

oneacl commented Feb 6, 2021

Hi,

I'm using: https://github.com/grafana/agent/releases/latest/download/agent-linux-amd64.zip

agent config:

integrations:
  node_exporter:
    scrape_interval: 10m
    enabled: true
    textfile_directory: /textfile
  prometheus_remote_write:
    - basic_auth:
        password: (redacted)
        username: (redacted)
      url: https://prometheus-us-central1.grafana.net/api/prom/push
prometheus:
  configs:
  - wal_truncate_frequency: 10m
    name: integrations
    remote_write:
    - basic_auth:
        password: (redacted)
        username: (redacted)
      url: https://prometheus-us-central1.grafana.net/api/prom/push
  wal_directory: /tmp/grafana-agent-wal
server:
  http_listen_port: 12345

results in:

level=error ts=2021-02-06T11:36:10.082340821Z caller=manager.go:209 msg="failed to apply integration. integration will not run. THIS IS A BUG!" err="failed to apply grouped configs for config integration/node_exporter: failed to validate instance 5412fdfaa4: scrape interval greater than wal_truncate_frequency for scrape config with job name \"integrations/node_exporter\"" integration=node_exporter

Setting scrape_interval to 1m works.
Setting wal_truncate_frequency: 0s returns the expected error (must be greater than zero).

I'm not sure how I can check the wal_truncate_frequency after the agent has started (to confirm it's indeed 10m, although based on the error message it doesn't appear to be).
Is there any way I can set the scrape_interval greater than 1m?

Thank you.

@rfratto rfratto added the bug label Feb 6, 2021

rfratto commented Feb 6, 2021

Hey, sorry you're running into this. Like the log message says, this is a bug (and an oversight on my part). Integrations you configure are given their own unique Prometheus config, which currently doesn't expose a way to change the default wal_truncate_frequency (1m). The Prometheus config named integrations that you provided manually isn't used by the integrations system, which is why you're still seeing the error. We'll get this fixed and into the next release. Thanks for reporting!
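
In the meantime, a minimal workaround sketch, based on the report above that a 1m interval works (the integrations subsystem currently keeps wal_truncate_frequency at its 1m default): keep the integration's scrape_interval at or below 1m until the fix ships. The values below simply mirror the config from the original report and are only illustrative.

integrations:
  node_exporter:
    enabled: true
    scrape_interval: 1m   # was 10m; must not exceed the integrations' fixed 1m truncate frequency for now
    textfile_directory: /textfile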

@rfratto rfratto added the size/s label Feb 6, 2021

oneacl commented Feb 7, 2021

Hi Robert,
No worries at all, thanks for the quick reply - I wasn't sure if I was doing something wrong, as I'm not very familiar with Grafana Agent.

I'm not sure if this is related or not (so I didn't raise a new issue) but this is something I noticed.
I have this working config (snip):

prometheus:
  configs:
  - name: integrations
    scrape_configs:
      - job_name: job1
        scrape_interval: 1m
        static_configs:
        - targets:
          - 192.0.2.1:9100
          - 192.0.2.2:9100
      - job_name: job2
        scrape_interval: 1m
        static_configs:
        - targets:
          - 192.0.2.1:9101
          - 192.0.2.2:9101
      - job_name: job3
        scrape_interval: 1m
        static_configs:
        - targets:
          - 192.0.2.3:9100
    remote_write:
    - basic_auth:
[..]

If I increase the scrape_interval of job3 to anything more than 1m (e.g. 2m), then everything stops working (tcpdump shows no traffic).
The logs don't show anything unusual and they suggest everything is OK.

Thanks.


rfratto commented Feb 8, 2021

Hmm, that's odd. A few tips for debugging that:

  1. Query the Agent's Targets API to make sure your targets show up and that they've been scraped recently.
  2. Enable debug logging, which is generally where Prometheus will output most of its information.
  3. Optionally, you can use the agentctl tools (wal-stats, sample-stats, etc.) to ensure that your WAL has samples for the jobs you've defined.
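
For point 2, a minimal sketch assuming the standard server block from the config above (log_level is assumed to default to info):

server:
  http_listen_port: 12345
  log_level: debug   # surfaces Prometheus scrape and remote_write details in the agent logs

For point 3, agentctl is typically pointed at the WAL directory from the config, e.g. agentctl wal-stats /tmp/grafana-agent-wal; exact arguments may vary by version, so check agentctl --help.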


oneacl commented Feb 9, 2021

Heh, it was related after all:

2021-02-09 06:15:26.652080 I | error in config file: error validating instance integrations: scrape interval greater than wal_truncate_frequency for scrape config with job name "foo"

I'll patiently wait until the next release - thanks a lot for your fantastic work on this!
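
For the manually defined scrape jobs (unlike integrations), the instance-level wal_truncate_frequency is already configurable, as in the first config in this thread. A minimal sketch that should satisfy the validation in the error above, which requires wal_truncate_frequency to be at least as long as each job's scrape_interval:

prometheus:
  configs:
  - name: integrations
    wal_truncate_frequency: 10m   # must be >= the longest scrape_interval among this instance's jobs
    scrape_configs:
      - job_name: job3
        scrape_interval: 2m
        static_configs:
        - targets:
          - 192.0.2.3:9100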

@github-actions github-actions bot added the frozen-due-to-age label Feb 24, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 24, 2024