-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timer GC delays in the Linux filesystem collector #2169
Conversation
Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
5a6a7cb
to
a289a6a
Compare
Hi, not sure I understand the implications of this change. If you have 500k mountpoints, having 500k timer objects that disappear again after the timeout is expected. And from what I can tell, with your change this would still be the case, right? |
89f3c96
to
8a87b71
Compare
cdcb117
to
ce7799d
Compare
@SuperQ hi, why |
ce7799d
to
a8ee98a
Compare
I think the issue with CircleCI is because you've selected to "Follow" the node_exporter fork in your CircleCI account. This triggers CircleCI to think that it should build under your account, rather than the Prometheus account. If you click the "Unfollow Project" for node_exporter in your CircleCI account, we can try to re-trigger builds. |
Okay, maybe this is my problem; |
You might need to make a commit to trigger the build. Maybe squash up the commit history with |
a8ee98a
to
95c1241
Compare
time.NewTimer
instead of using time.After
that maybe cause memory leak.
Signed-off-by: bawenmao <jordy1024@163.com> Use Timer instead of using time.After that can avoid memory leak. Signed-off-by: bawenmao <jordy1024@163.com> Avoid time.after memory leak when calling time.After Signed-off-by: bawenmao <jordy1024@163.com> Use Timer instead of using time.After that maybe cause memory leak. Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
95c1241
to
d028a85
Compare
Okay,Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, for the record if you read the Go docs for time.After()
this is explicitly mentioned.
After waits for the duration to elapse and then sends the current time on the returned channel. It is equivalent to NewTimer(d).C. The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.
So, yes, it's much better to explicitly call Stop()
when there's nothing to wait for anymore.
time.NewTimer
instead of using time.After
that maybe cause memory leak.
Awesome findings! |
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector #2146 * [CHANGE] Exclude filesystems under /run/credentials #2157 * [FEATURE] Add support for monitoring GPUs on Linux #1998 * [FEATURE] Add Darwin thermal collector #2032 * [FEATURE] Add os release collector #2094 * [FEATURE] Add netdev.address-info collector #2105 * [ENHANCEMENT] Support glob textfile collector directories #1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics #2123 * [ENHANCEMENT] Add DMI collector #2131 * [ENHANCEMENT] Add threads metrics to processes collector #2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169 * [BUGFIX] ethtool: Sanitize metric names #2093 * [BUGFIX] Fix ethtool collector for multiple interfaces #2126 * [BUGFIX] Fix possible panic on macOS #2133 * [BUGFIX] Collect flag_info and bug_info only for one core #2156 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector #2146 * [CHANGE] Exclude filesystems under /run/credentials #2157 * [FEATURE] Add darwin powersupply collector #1777 * [FEATURE] Add support for monitoring GPUs on Linux #1998 * [FEATURE] Add Darwin thermal collector #2032 * [FEATURE] Add os release collector #2094 * [FEATURE] Add netdev.address-info collector #2105 * [ENHANCEMENT] Support glob textfile collector directories #1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics #2123 * [ENHANCEMENT] Add DMI collector #2131 * [ENHANCEMENT] Add threads metrics to processes collector #2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169 * [BUGFIX] ethtool: Sanitize metric names #2093 * [BUGFIX] Fix ethtool collector for multiple interfaces #2126 * [BUGFIX] Fix possible panic on macOS #2133 * [BUGFIX] Collect flag_info and bug_info only for one core #2156 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector #2146 * [CHANGE] Exclude filesystems under /run/credentials #2157 * [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ #1771 * [FEATURE] Add darwin powersupply collector #1777 * [FEATURE] Add support for monitoring GPUs on Linux #1998 * [FEATURE] Add Darwin thermal collector #2032 * [FEATURE] Add os release collector #2094 * [FEATURE] Add netdev.address-info collector #2105 * [ENHANCEMENT] Support glob textfile collector directories #1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics #2123 * [ENHANCEMENT] Add DMI collector #2131 * [ENHANCEMENT] Add threads metrics to processes collector #2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169 * [BUGFIX] ethtool: Sanitize metric names #2093 * [BUGFIX] Fix ethtool collector for multiple interfaces #2126 * [BUGFIX] Fix possible panic on macOS #2133 * [BUGFIX] Collect flag_info and bug_info only for one core #2156 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector #2146 * [CHANGE] Exclude filesystems under /run/credentials #2157 * [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ #1771 * [FEATURE] Add darwin powersupply collector #1777 * [FEATURE] Add support for monitoring GPUs on Linux #1998 * [FEATURE] Add Darwin thermal collector #2032 * [FEATURE] Add os release collector #2094 * [FEATURE] Add netdev.address-info collector #2105 * [ENHANCEMENT] Support glob textfile collector directories #1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics #2123 * [ENHANCEMENT] Add DMI collector #2131 * [ENHANCEMENT] Add threads metrics to processes collector #2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169 * [BUGFIX] ethtool: Sanitize metric names #2093 * [BUGFIX] Fix ethtool collector for multiple interfaces #2126 * [BUGFIX] Fix possible panic on macOS #2133 * [BUGFIX] Collect flag_info and bug_info only for one core #2156 Signed-off-by: Ben Kochie <superq@gmail.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector #2146 * [CHANGE] Exclude filesystems under /run/credentials #2157 * [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ #1771 * [FEATURE] Add darwin powersupply collector #1777 * [FEATURE] Add support for monitoring GPUs on Linux #1998 * [FEATURE] Add Darwin thermal collector #2032 * [FEATURE] Add os release collector #2094 * [FEATURE] Add netdev.address-info collector #2105 * [ENHANCEMENT] Support glob textfile collector directories #1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric #2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering #2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics #2123 * [ENHANCEMENT] Add DMI collector #2131 * [ENHANCEMENT] Add threads metrics to processes collector #2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector #2169 * [BUGFIX] ethtool: Sanitize metric names #2093 * [BUGFIX] Fix ethtool collector for multiple interfaces #2126 * [BUGFIX] Fix possible panic on macOS #2133 * [BUGFIX] Collect flag_info and bug_info only for one core #2156 Signed-off-by: Ben Kochie <superq@gmail.com>
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling. Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector prometheus#2146 * [CHANGE] Exclude filesystems under /run/credentials prometheus#2157 * [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ prometheus#1771 * [FEATURE] Add darwin powersupply collector prometheus#1777 * [FEATURE] Add support for monitoring GPUs on Linux prometheus#1998 * [FEATURE] Add Darwin thermal collector prometheus#2032 * [FEATURE] Add os release collector prometheus#2094 * [FEATURE] Add netdev.address-info collector prometheus#2105 * [ENHANCEMENT] Support glob textfile collector directories prometheus#1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric prometheus#2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering prometheus#2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics prometheus#2123 * [ENHANCEMENT] Add DMI collector prometheus#2131 * [ENHANCEMENT] Add threads metrics to processes collector prometheus#2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector prometheus#2169 * [BUGFIX] ethtool: Sanitize metric names prometheus#2093 * [BUGFIX] Fix ethtool collector for multiple interfaces prometheus#2126 * [BUGFIX] Fix possible panic on macOS prometheus#2133 * [BUGFIX] Collect flag_info and bug_info only for one core prometheus#2156 Signed-off-by: Ben Kochie <superq@gmail.com>
Use `time.NewTimer()` and explicit `Stop()` to avoid memory bloat / GC problems with `time.After()` in the Linux filesystem collector timeout handling. Signed-off-by: bawenmao <bawenmao@sogou-inc.com>
NOTE: In order to support globs in the textfile collector path, filenames exposed by `node_textfile_mtime_seconds` now contain the full path name. * [CHANGE] Add path label to rapl collector prometheus#2146 * [CHANGE] Exclude filesystems under /run/credentials prometheus#2157 * [FEATURE] Add lnstat collector for metrics from /proc/net/stat/ prometheus#1771 * [FEATURE] Add darwin powersupply collector prometheus#1777 * [FEATURE] Add support for monitoring GPUs on Linux prometheus#1998 * [FEATURE] Add Darwin thermal collector prometheus#2032 * [FEATURE] Add os release collector prometheus#2094 * [FEATURE] Add netdev.address-info collector prometheus#2105 * [ENHANCEMENT] Support glob textfile collector directories prometheus#1985 * [ENHANCEMENT] ethtool: Expose node_ethtool_info metric prometheus#2080 * [ENHANCEMENT] Use include/exclude flags for ethtool filtering prometheus#2165 * [ENHANCEMENT] Add flag to disable guest CPU metrics prometheus#2123 * [ENHANCEMENT] Add DMI collector prometheus#2131 * [ENHANCEMENT] Add threads metrics to processes collector prometheus#2164 * [ENHANCMMENT] Reduce timer GC delays in the Linux filesystem collector prometheus#2169 * [BUGFIX] ethtool: Sanitize metric names prometheus#2093 * [BUGFIX] Fix ethtool collector for multiple interfaces prometheus#2126 * [BUGFIX] Fix possible panic on macOS prometheus#2133 * [BUGFIX] Collect flag_info and bug_info only for one core prometheus#2156 Signed-off-by: Ben Kochie <superq@gmail.com>
Hello,I found a potential hazard in
filesystem_linux.go
that may cause memory leaks,the details are as follows:function
GetStats
will loop through mount points, according to the number of all mount points.Recently I encountered a problem: the number of mount points of a certain system in the production environment is abnormal, as many as more than 500,000 rows; mainly because the for loop is blocked, and then the
Timer Heap objects
in the statementfor {select case->time.After
cause accumulation, and eventually a memory leak occurs.It may be a better way to control the number of mount points to be checked, but this may require more changes; therefore, I only made a slight optimization for this problem: change
time.After
totime.NewTimer
.Expect to be accepted or answered.