Fix root span sampling #263
I'm not sure if
%% hack to set the created span as local not sampled
?set_current_span(SpanCtx1#span_ctx{is_remote=false,
                                    is_recording=false,
                                    trace_flags=0}),
Codecov Report
@@ Coverage Diff @@
## main #263 +/- ##
==========================================
- Coverage 36.47% 36.39% -0.09%
==========================================
Files 41 41
Lines 3169 3165 -4
==========================================
- Hits 1156 1152 -4
Misses 2013 2013
ab06852 to 28cfcc5
I think I've fixed it properly now.
Yup, I think this is right. Thanks! I've got to look a little closer but will merge soon I think.
I converted this PR back to a draft. You were right to have a closer look ;) because there is still a problem: I noticed that sometimes root spans are missing. It seems this PR broke the other use case: when you do have parent spans, they are apparently not exported?
Never mind, I was also testing with #261 and that seems to be the actual problem. If I use both PRs but don't use any samplers, there are no issues.
Please merge #264 first, it fixes the failing test in this PR (the underlying issue is not introduced in this PR, hence the separate PR).
@tsloughter could it be that there is a bug (introduced by this PR/related) with the sweeper? We're running a long process where we use |
@dvic are you sure the sweeper is running? Not recorded is a tricky situation. Originally a span that wasn't sampled wouldn't be inserted into the ets table, but the addition of "non-recording" complicated that. I think this was done to be safe but it may be the case that we don't need it. I'll see if I can find anything in the spec that relates to a non-recording span being used that would mean we need to keep it in the table. But either way they should be swept away, so that is a separate problem I think.
This is the default, right? (I guess so with recent Elixir versions, where, as I understand it, deps are started automatically.) Anyway, it seems to be running:
Clear, I'll have a look to see if I can replicate with a test.
I can't replicate with a test (tried really quickly) but non-recording spans really seem to stay put in the ETS table. How is the sweeper supposed to work? Does it run every 10 minutes and remove the items that are 30 minutes or older? Is this the default setting?
Yea, couldn't remember if it defaults to sweeping anything. But yea, looks like it has defaults of dropping any spans over 30 minutes old.
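For reference, a rough sketch of how those defaults might be overridden in `sys.config` so that the sweeper runs more often and expires spans sooner. It assumes the SDK reads a `sweeper` map from the `opentelemetry` application environment with `interval`, `strategy` and `span_ttl` keys, and that the values are in milliseconds. Both the key names and the units are assumptions to verify against the SDK version in use, and the 1 minute / 5 minute values are purely illustrative.

```erlang
%% sys.config (sketch only; key names and millisecond units are assumptions
%% to check against the installed opentelemetry SDK version)
[
 {opentelemetry,
  [{sweeper, #{interval => 60000,     %% run the sweeper every 1 minute (default discussed above: 10 minutes)
               strategy => drop,      %% drop expired spans, as the default does
               span_ttl => 300000}}   %% expire spans older than 5 minutes (default discussed above: 30 minutes)
  ]}
].
```

A shorter `span_ttl` only makes sense if no legitimate span in the system stays open longer than that, since the sweeper would drop it too.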
Alright, well if there is no good reason for having the non-recorded ones in ETS I suggest we skip inserting them, because for this particular type of operation (a cron job "sync" operation that involves many database operations) we now see a 10x increase in memory usage :)
I've just tested a patch that does not insert non-recording spans anymore and indeed it solves our memory problems (1100+ MB vs ~200 MB usage). Let me know if you want me to open a PR for this. By the way: all tests seem to pass, so if there is some behavior associated with these non-recording spans, it's not tested yet.
Thanks, yes, I think a separate PR would be great. I'm guessing it should be merged but am still not positive, will try to figure out today. Would be great to also figure out why the sweeper isn't working in this case. I can try to look into it today as well, but if you have the opportunity to run a regular Erlang trace on the sweeper to see what it is doing in your case where the table is growing, that'd be a huge help. Hit me up on Slack if you are able to do that and want some help.
Sure, I'll submit one then.
The problem is not that the sweeper is not working but that the default config results in high memory usage. We run a process that executes many queries, and each query generates a span. This puts a large number of spans in ETS, and since the sweeper only runs every 10 minutes and only drops spans older than 30 minutes, it takes 30 minutes or so to clear all non-recording spans. Here is an example log statement that was output:
@dvic aaah, ok. Yea, those defaults were picked to be very conservative.
This fixes #262