Harvest service doesn't work as expected #122
Comments
I have a theory on what's going on, but first a few questions:
One issue you hit is running Later when you try to start|stop|restart using To fix this (as root):
As root, you could also su to |
Hi faguayot, Thanks for the details and screenshots. The fact that you have a harvest binary in When using Interesting that in your last example none of the pollers started. We're going in the wrong direction :) Let's try starting just one poller in the foreground. Maybe there are errors that are being missed. Make a copy of your harvest.yml like so: Edit Login or su as the
Hopefully you'll get some logging to the terminal that helps us figure out what's wrong. |
Hi Chris, Sorry, I think I confused you because I forgot to mention that /bin/harvest is only a symbolic link, which I created so that the command would be recognized. I removed the file /etc/harvest/harvest.yml and then tried to follow the steps to start one poller in the foreground, but it seems that my harvest doesn't recognize the flag. I also tried without the foreground flag, and the result is the same, minus the error about the foreground flag. Thanks. |
Yes, I left out the |
That's a good start! If you do that and then run the following, do you see the poller running? What does the current log file for this poller show (see |
More progress - options at this point: With option B, make sure all pollers are stopped first, just so we're at a known state, then do the systemctl dance: |
Excellent! Looks like everything is working now. |
@faguayot Pls close the issue if resolved. |
Thanks Chris. |
* script to validate metrics at runtime * typo * check trailing newline needs to be done before splitlines * make sure stream trails with newline * label value can be empty * fix mistake in label regex * include empty keys, to make sure label set is consistent * fix export options, to avoid duplicate labels * properly parse boolean parameters * avoid metric name conflict * fix return value when nothing is scraped * drop using lib alias * typo in plugin params * check for duplicate metatags, since telegraf complains about this as well * ugly temporary solution against duplicate metatags * temporary fix to duplicate node labels, until fixed in Aggregator plugin * resolve conflicting names with system_node.yaml, to prevent label inconsistency * harvest yml changes deb/rpm harvest.example changes Handle special characters in passwords This change only addresses passwords in Pollers and Defaults. The bigger refactor is to use HarvestConfig through out the codebase, but that was too big a change at the moment. That change touches a lot more code. When that change is made, the code in conf.LoadConfig can be removed. 
fix remaining merge Enable GitHub code scanning Remove extra fmt workflow action Remove redundant Slack section and polish Add Dev team to clabot Add license check and GitHub action add zerolog pretty print for console InsecureSkipVerify with basicauth Correct httpd logging pattern Replace snake case with camel Fix mistyped package Shelf purges instances too soon Fixes #75 update clabot allow user-defined URL for the influxDB server update conf tests, move allow_addrs_regex: not influxdb parameter auth test cases Change triage label Replace CCLA.pdf with online link to CCLA Remove CONTRIBUTING_CCLA.pdf uniform structure of collector doc, add explanation about metric collection/calculation add known issue on WSL update toc add rename example, remove tabs disliked by markdown removed allow_addrs_regex, not a parameter tab to space tab to space remove redundant TOC; spelling typos in docs support/hacks for workload objects templates for 4 workload objects re-add earlier removed disk counters chrishenzie has signed the CCLA Make vendored copy of dependencies handle panic in collector Allow insecure Grafana TLS connections `harvest/grafana` should not rewrite https connections into http Fixes #111 enable caller for zerolog Remove buildmode=plugin Add support for cluster simulator WIP Implement Caddy style plugins for collectors Fix go vet warnings in node.go enable stacktrace during errors InfluxDB exporter should pass url unchanged Thanks to @steverweber for the suggestion Fixes #63 Add unique prom ports and export type checks to doctor Prometheus dashboards don't load when exemplar = true Fixes #96 Don't run harvest as root on RHEL/Deb See also #122 Improve harvest start behavior Two cases are improved here: 1) Harvest detects when there is a stale pidfile and correctly restarts the poller process. A stale pidfile is when the pidfile exists in `/var/run/harvest` but there is no running process associated with that pid. 
1) Harvest no longer suggests killing an already running poller when you try to start it. This is a a no-op. Fixes #123 stop renamed pollers resolved comments for stop pollers in case of rename Addressed review comments Fixes #20 Restore Zapiperf support workload changes add missing tag for labels pseudometric cache ZAPI counters to distinct from own metircs Update needs triage label rpb deb bugs Fixes #50 Fixes #129 Auth_style should not be redacted Run workflows on release branch Remove unused graphite_leaves PrometheusPort should be int Trim absolute file system paths Add -trimpath to go build so errors and stacktraces print with module path@version instead of this {"level":"info","Poller":"infinity","collector":"ZapiPerf:WAFLAggr","caller":"/var/jenkins_home/workspace/BuildHarvestArtifacts/harvest/cmd/poller/collector/collector.go:318","time":"2021-06-11T13:40:03-04:00","message":"recovered from standby mode, back to normal schedule"} correct ghost poll kill Sridevi has signed CCLA Update README.md Added Upgrade steps to README file Removed specific links in the Installation steps Overall updated format Polish README.md Reduce redundant information Make tar gz example copy pasteable Fix panic in unix.go When a poller in harvest.yml is changed while a unix collector is running it panics Fixes #160 Remove pidfiles - Improve poller detection by injecting IS_HARVEST into exec-ed process's environment. 
- Simplify management code and improve accuracy - Remove /var/run logic from RPM and Deb script to validate metrics at runtime typo update changelog update support md update readme run ghost kill poller during harvest start Store reason as a label for disk.yaml so that disk status is correctly reported Fixes #182 check trailing newline needs to be done before splitlines make sure stream trails with newline label value can be empty fix mistake in label regex include empty keys, to make sure label set is consistent fix export options, to avoid duplicate labels properly parse boolean parameters avoid metric name conflict fix return value when nothing is scraped drop using lib alias typo in plugin params Correcting Grafana Cluster Dashboard Typo plus other same typos port range changes resolved merge commits port range review comments Encapsulate port mapping port range changes Reduce the amount of time and attempts spinning for status checks Makes a big difference on Mac when process is not found Goes from 19.5 seconds to (not) start 27 pollers to 1.9 seconds Add README on how to setup per poller systemd services. Add generate systemd subcommand check for duplicate metatags, since telegraf complains about this as well ugly temporary solution against duplicate metatags temporary fix to duplicate node labels, until fixed in Aggregator plugin resolve conflicting names with system_node.yaml, to prevent label inconsistency shelf dashboard: adding ovverride option for shelf field Node Dashboard Bugs Co-authored-by: rahulg2 <rahul.gupta@netapp.com>
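The stale pidfile case described in the commit notes above (a pidfile exists in `/var/run/harvest` but no process has that pid) can be checked by hand with a small shell helper. This is only a sketch, not Harvest's own logic: the function name `is_stale_pidfile` is hypothetical, and it assumes each pidfile contains a single numeric pid. The same commit list later mentions removing pidfiles altogether, so this applies only to versions that still write them.

```shell
# Hypothetical helper, not part of Harvest.
# Returns 0 (true) when the pidfile exists but no live process has that pid.
is_stale_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1              # no pidfile, so nothing can be stale
    pid=$(tr -d ' \t\r\n' < "$pidfile")        # assumption: the file holds one pid
    case $pid in
        ''|*[!0-9]*) return 0 ;;               # empty or non-numeric content: treat as stale
    esac
    # kill -0 sends no signal; it only tests whether the pid can be signalled.
    # Caveat: it also fails for a live process owned by another user,
    # so run this as the user that owns the pollers (or as root).
    if kill -0 "$pid" 2>/dev/null; then
        return 1                               # process is alive, pidfile is valid
    fi
    return 0                                   # no such process, pidfile is stale
}

# Example usage (the file names inside /var/run/harvest are an assumption):
# for f in /var/run/harvest/*; do
#     if is_stale_pidfile "$f"; then echo "stale: $f"; fi
# done
```

Treating an empty or non-numeric pidfile as stale is a design choice of this sketch; a stricter check could report such files separately.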
Describe the bug
When I start the harvest service, it runs only two of the 11 pollers defined in harvest.yml. If I run harvest directly with either of the following commands, all the pollers run correctly:
**/opt/harvest/bin/harvest start**
or
**/opt/harvest/bin/harvest restart**
The same thing happens with every service action, that is, start, restart, status and stop. I attached an image of what we see when the service starts.
Environment
bin/harvest start --config=foo.yml --collectors Zapi
To Reproduce
Start the service with systemctl start harvest.service
Expected behavior
It should run a process for every poller in my harvest.yml.
Actual behavior
It only runs two pollers, sometimes none of them.
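To put a number on the gap between expected and actual behavior, the pollers defined in harvest.yml can be counted and compared with what is actually running. The sketch below is an illustration only: the function name `count_defined_pollers` is hypothetical, and it assumes pollers are the keys nested directly under a top-level `Pollers:` key with consistent indentation.

```shell
# Hypothetical helper, not part of Harvest.
# Prints the number of pollers defined in a harvest.yml-style file.
# Assumption: each poller is a key one indentation level below "Pollers:".
count_defined_pollers() {
    awk '
        # Any non-indented, non-comment line starts a new top-level section.
        /^[^ \t#]/ { in_pollers = ($0 ~ /^Pollers:[ \t]*$/); next }
        # Inside Pollers: count keys that end the line, at the first indent level seen.
        in_pollers && /^[ \t]+[^ \t#-][^:]*:[ \t]*$/ {
            match($0, /^[ \t]+/)
            if (indent == 0) indent = RLENGTH
            if (RLENGTH == indent) n++
        }
        END { print n + 0 }
    ' "$1"
}

# Example usage (the path is an assumption, adjust to your install):
# count_defined_pollers /opt/harvest/harvest.yml
```

The result can then be compared with the pollers reported by `/opt/harvest/bin/harvest status` after `systemctl start harvest.service`; a lower number of running pollers would confirm the behavior described here.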
Possible solution, workaround, fix
Start data collection with the executable /opt/harvest/bin/harvest instead of the service.