[inputs.systemd_units] Error in plugin: listing unit files failed: context deadline exceeded - after updating to 1.30.0 (git: HEAD@3c03ddcf) #14980
Comments
@electrofloat can you please check the binary in #14987, available once CI has finished the tests? Let me know if this fixes the issue!
Yes. I've upgraded to this version (Telegraf 1.31.0-30d5d365 (git: pull/14987@30d5d365)) and it no longer reports the errors. I tested it on both machines, so it seems to be fixed.
@electrofloat thanks for the quick testing!
The binary from #14987 also fixes the "unit name is missing" errors for me (tested on RHEL 8/9 and AlmaLinux 8/9).
@DStrand1 You closed this as completed, but do we know when this will be released?
Hi, it will be part of v1.30.1 on or around April 1:
April 1? But this is a regression in 1.30. How is this not fixed and released immediately?
@electrofloat you can use a nightly build starting from tomorrow.
This was fixed within two days! What do you expect? You have three options: use 1.29.5 until the release, use the binary in the PR, or use a nightly build starting from tomorrow. We do not have the resources to bake a release for every single commit to master!
I expect a new release after a regression fix! What you need to understand is that on a Debian-based system like Ubuntu, you install software by adding an apt line to a sources.list file (exactly as your docs describe for Ubuntu) and then using apt. Now, to upgrade packages on Debian-based systems, you type in a command like:

This is the second time in a short window that a new release has broken previously working functionality. So the problem is not a slow patch; the problem is that we have to wait 3 weeks for a release before we can upgrade our package.
We are well aware of how package managers work. As you also mention, they provide mechanisms for you to avoid package versions with issues. As Sven already said, we do not release a new version for every single fix, security issue, or regression. Throughout its history, Telegraf has used time-based releases with great success. When issues do arise, users can run a nightly build, run a custom build, or revert to a previous version, whether they are using our own package repo, downloading tarballs, or using the official Docker images.
Yes, it is, and I can tell you we hate when this happens; it literally keeps us up at night after a release. It is why we jump on these types of issues and make every attempt to resolve them ASAP. Additionally, when we are landing PRs, we consider the potential for regression. For a tool with literally millions of deployments across a wide range of architectures and operating systems, each of which can have huge numbers of varying environments and configurations, we cannot replicate every deployment or scenario. We released the Docker image of 1.30 today, which means a lot more users may run into this or other issues. I would personally feel better about waiting until early next week to see if anything else has come up before we jump on another release.
One more thing I wanted to mention: do you have the ability to run the nightly build as a test? It would help both us and you enormously if you could. That way you could catch issues before we do a release and report any problems you find. I take it you have a large deployment, so catching issues earlier would help both of us.
@powersj Unfortunately no. We have strict rules on what software we can install on prod machines, which only includes stable releases. I also forgot to mention, though you probably know this already, that Debian/Ubuntu has a feature called "phased updates", which apt itself has supported since 2.1.16 (Fri, 08 Jan 2021 22:01:50 +0100). That means a new package update does not reach all repo users at once, but in phases. In the event of a regression, they can immediately set the phasing back to 0%, which stops the update from being installed. So maybe in the future you could use this feature too. (As far as I remember, on the server side this only needs a new
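To make the phased-rollout idea above concrete, here is a minimal Python sketch of how a client could decide deterministically whether it falls inside the current rollout percentage. It is not apt's actual algorithm: the function name `in_phase`, the SHA-256 bucketing scheme, and the sample identifiers are all assumptions made for illustration.

```python
import hashlib


def in_phase(machine_id: str, package: str, version: str, percentage: int) -> bool:
    """Return True if this machine falls inside the rollout percentage.

    Hypothetical scheme: hash a stable machine identifier together with the
    package name and version, map the hash to a bucket in 0..99, and install
    the update only if the bucket is below the published percentage.
    """
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")

    key = f"{machine_id}:{package}:{version}".encode("utf-8")
    digest = hashlib.sha256(key).digest()

    # Use the first 8 bytes of the digest as an integer and reduce to 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100

    # percentage == 0 admits no machine; percentage == 100 admits all of them.
    return bucket < percentage
```

Because the bucket depends only on the machine, package, and version, a given machine gets the same answer every time it checks, and raising the percentage only ever adds machines to the rollout. Setting the percentage back to 0 excludes every machine, which matches the "set the phasing back to 0%" behaviour described in the comment.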
It absolutely would!
Any issues that you come across should be filed as issues in this repo.
@powersj @srebhan First, thanks for your work on this bugfix. 🙏 I've evaluated https://repos.influxdata.com/rhel/7/x86_64/stable/telegraf-1.30.1-1.x86_64.rpm (appeared in repo yesterday) and still see this error. Are we too early? I notice that https://github.com/influxdata/telegraf still shows "Latest" as v1.30.0. Example error:
@JamieSimon2 which systemd version is installed? Is the dbus interface running?
@srebhan thanks for your quick response! (This is CentOS 7 😢 )
This is the issue, a 9-year-old systemd... ;-) @JamieSimon2 could you please open a new issue with the information above? We will discuss internally how we handle the situation...
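For anyone hitting the same error on an older distribution, one quick check is to parse the first line of `systemctl --version`, which starts with `systemd` followed by the version number. The Python sketch below shows one way to do that. Note that `MIN_SYSTEMD_VERSION` is a placeholder: this thread does not say which systemd version the plugin needs. The helper names are likewise hypothetical.

```python
import re
from typing import Optional

# Placeholder threshold for illustration only; the thread does not state
# which systemd version the plugin actually requires.
MIN_SYSTEMD_VERSION = 230


def parse_systemd_version(output: str) -> Optional[int]:
    """Extract the major version from the text printed by `systemctl --version`.

    Returns None if the first line does not look like "systemd <number> ...".
    """
    first_line = output.splitlines()[0] if output else ""
    match = re.match(r"systemd\s+(\d+)", first_line)
    return int(match.group(1)) if match else None


def is_supported(output: str, minimum: int = MIN_SYSTEMD_VERSION) -> bool:
    """Return True if the parsed version is at least `minimum`."""
    version = parse_systemd_version(output)
    return version is not None and version >= minimum
```

The text to parse could be captured with `subprocess.run(["systemctl", "--version"], capture_output=True, text=True).stdout`. Keeping the parsing separate from the command invocation makes the check easy to test on machines without systemd.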
Relevant telegraf.conf
Logs from Telegraf
on a different machine:
System info
Ubuntu 22.04
Docker
No response
Steps to reproduce
I've just updated to: Telegraf 1.30.0 (git: HEAD@3c03ddcf)
and after restarting telegraf, I'm getting the above error in the logs.
Expected behavior
No error.
Actual behavior
Error logs
Additional info
I had to revert back to: Telegraf 1.29.5 (git: HEAD@138d0d54)