[WIP] Rate limit reading expensive files from /proc #814

Closed
wants to merge 7 commits from the 812-rate-limit-proc branch

Conversation

tomwilkie
Contributor

Fixes #812

@tomwilkie self-assigned this on Jan 14, 2016
@tomwilkie changed the title from "Rate limit reading expensive files from /proc" to "[WIP] Rate limit reading expensive files from /proc" on Jan 14, 2016
@tomwilkie
Contributor Author

This seems to have some unintended side effects on the edges that get detected. Needs more work.

Example without this change:

[screenshot: 2016-01-15 at 10:46:25]

With this change:

[screenshot: 2016-01-15 at 10:43:56]

@tomwilkie force-pushed the 812-rate-limit-proc branch 2 times, most recently from e835d21 to a0806f9, on January 15, 2016 18:32
@2opremio assigned 2opremio and unassigned tomwilkie on Jan 21, 2016
@tomwilkie force-pushed the 812-rate-limit-proc branch from a0806f9 to 5c29c6f on January 26, 2016 06:19
@tomwilkie
Contributor Author

Have rebased, increased the limits, and made it so results are deleted once used (a bit racy right now, just an experiment). Results are improved, but the connections are still flaky and sometimes outright odd.
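For illustration only, a minimal sketch of the "deleted once used" idea, not the code in this branch; it assumes results are cached per PID and shows one way to make the take-and-delete step atomic:

```go
// Hypothetical delete-on-read cache for parsed /proc results, keyed by PID.
package proccache

import "sync"

type procCache struct {
	mu      sync.Mutex
	entries map[int][]byte // PID -> cached, already-parsed /proc data
}

func newProcCache() *procCache {
	return &procCache{entries: make(map[int][]byte)}
}

// Put stores (or overwrites) the cached result for a PID.
func (c *procCache) Put(pid int, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[pid] = data
}

// Take returns the cached result and deletes it, so each result is used
// at most once.
func (c *procCache) Take(pid int) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	data, ok := c.entries[pid]
	if ok {
		delete(c.entries, pid)
	}
	return data, ok
}
```

Doing the lookup and the delete under the same lock is what closes the read-then-delete race mentioned above.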

It's worth testing without conntrack (as that has known bugs) so you know any incorrect connections are due to this change:

./scope launch --probe.conntrack=false

@2opremio
Contributor

I've experimented with grouping the processes by namespace to ensure that all the reads of /proc/PID/fd/* are done immediately after reading /proc/PID/net/tcp{,6}.

However, I still miss a lot of connections.

The cause most likely lies in the inconsistencies resulting from caching /proc/PID/net/tcp{,6} but not /proc/PID/fd/*. I will start by caching both to see if that solves the problem.
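For reference, a sketch of that grouping, purely illustrative rather than the actual probe code: PIDs are bucketed by the target of their /proc/PID/ns/net symlink (e.g. "net:[4026531956]"), and each process's fd directory is read straight after its namespace's tcp tables:

```go
package main

import (
	"fmt"
	"io/ioutil"
	"os"
	"strconv"
)

// netNSKey returns the target of /proc/PID/ns/net, e.g. "net:[4026531956]",
// which is identical for every process sharing that network namespace.
func netNSKey(pid int) (string, error) {
	return os.Readlink(fmt.Sprintf("/proc/%d/ns/net", pid))
}

func main() {
	// Bucket all PIDs by network namespace.
	byNS := map[string][]int{}
	dirs, _ := ioutil.ReadDir("/proc")
	for _, d := range dirs {
		pid, err := strconv.Atoi(d.Name())
		if err != nil {
			continue // not a PID directory
		}
		key, err := netNSKey(pid)
		if err != nil {
			continue // process may have exited, or we lack permission
		}
		byNS[key] = append(byNS[key], pid)
	}

	for ns, pids := range byNS {
		// /proc/PID/net/tcp{,6} is per-namespace, so reading it through any
		// one member is enough; then walk every member's fds immediately,
		// keeping the two views as close together in time as possible.
		tcp, _ := ioutil.ReadFile(fmt.Sprintf("/proc/%d/net/tcp", pids[0]))
		tcp6, _ := ioutil.ReadFile(fmt.Sprintf("/proc/%d/net/tcp6", pids[0]))
		for _, pid := range pids {
			fds, _ := ioutil.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
			fmt.Printf("%s: pid %d, %d fds (tcp %dB, tcp6 %dB)\n",
				ns, pid, len(fds), len(tcp), len(tcp6))
		}
	}
}
```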

@2opremio force-pushed the 812-rate-limit-proc branch 4 times, most recently from 4ea7668 to 2c32110, on January 28, 2016 20:09
@2opremio
Contributor

2opremio commented Feb 1, 2016

This is still hacky/work in progress, but it works well. There's now a background goroutine walking /proc/PID/fd/* and /proc/PID/net/tcp{,6} in a loop, with dynamically adjusted rate limiting.
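As a rough illustration of that loop (not the code in this PR), here is how the pacing could look with golang.org/x/time/rate, assuming the rate is re-tuned after every pass so that a full walk of /proc takes roughly a fixed target duration; the per-PID work is reduced to the two expensive reads:

```go
package main

import (
	"context"
	"fmt"
	"io/ioutil"
	"strconv"
	"time"

	"golang.org/x/time/rate"
)

const targetWalkTime = 10 * time.Second // aim for one full /proc pass per 10s

func main() {
	limiter := rate.NewLimiter(rate.Limit(50), 1) // start at 50 reads/sec
	ctx := context.Background()

	go func() {
		for {
			start, reads := time.Now(), 0
			dirs, _ := ioutil.ReadDir("/proc")
			for _, d := range dirs {
				pid, err := strconv.Atoi(d.Name())
				if err != nil {
					continue // not a PID directory
				}
				limiter.Wait(ctx) // pace each batch of expensive reads
				ioutil.ReadFile(fmt.Sprintf("/proc/%d/net/tcp", pid))
				ioutil.ReadDir(fmt.Sprintf("/proc/%d/fd", pid))
				reads++
			}
			if reads > 0 {
				// Re-tune: pick the rate that would make a full pass take
				// about targetWalkTime, and use it for the next pass.
				limiter.SetLimit(rate.Limit(float64(reads) / targetWalkTime.Seconds()))
				fmt.Printf("pass of %d reads took %v\n", reads, time.Since(start))
			}
		}
	}()

	select {} // block forever; real code would publish the results of each pass
}
```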

Along the way I found out that we were not reading the network namespace inodes properly (the inode of the /proc/PID/ns/net symlink itself is not the inode of the namespace), which resulted in reading /proc/PID/net/tcp{,6} for every process (see f63b637).
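A small self-contained illustration of that pitfall (not the actual f63b637 change): os.Lstat reports the inode of the symlink itself, which differs for every process, whereas os.Stat follows the link and reports the inode of the namespace, which is shared by all processes in it:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

func inode(fi os.FileInfo) uint64 {
	return fi.Sys().(*syscall.Stat_t).Ino
}

func main() {
	path := fmt.Sprintf("/proc/%d/ns/net", os.Getpid())

	lfi, err := os.Lstat(path) // does NOT follow the symlink
	if err != nil {
		panic(err)
	}
	sfi, err := os.Stat(path) // follows the symlink to the namespace
	if err != nil {
		panic(err)
	}

	fmt.Printf("inode of the symlink:   %d (per-process)\n", inode(lfi))
	fmt.Printf("inode of the namespace: %d (shared across the namespace)\n", inode(sfi))
}
```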

In the average case that should give the PID walker more than an order-of-magnitude speedup (e.g. on my machine I have 162 processes but only 7 network namespaces).

So, for now, maybe it's enough to simply cherry-pick f63b637 and probably 2c32110.

@2opremio force-pushed the 812-rate-limit-proc branch 2 times, most recently from 004ab28 to 31a78ec, on February 2, 2016 12:17
@2opremio force-pushed the 812-rate-limit-proc branch from 31a78ec to 629d41e on February 2, 2016 12:25
@2opremio
Contributor

2opremio commented Feb 2, 2016

I have made some measurements to see whether it's still worth having a background reader after fixing the namespace reading, and it doesn't seem to be the case.

With 60 network namespaces, the probe is at ~15% CPU in both cases. BTW, master is over 30%, so this gives us a 2x gain :).

So, I am closing this in favor of #898 and I will cherry-pick f799dc8 into a separate PR.

@2opremio closed this on Feb 2, 2016
@2opremio deleted the 812-rate-limit-proc branch on February 2, 2016 12:51
@tomwilkie
Contributor Author

+1


@2opremio mentioned this pull request on Feb 4, 2016