Flamegraph does not match table data and callgraph #212
I need help understanding this. |
I think there is a bug in the flamegraph. Notice how on the main function [screenshot]. A bug ticket was opened here (#207), though I have not had a chance to fix it. |
Side note: check your server charset settings; it looks like you may be serving ISO-8859-1 encoding instead of UTF-8. |
I'm confused: if the flamegraph is incorrect, how is the total wall time calculated? |
The inclusive wall time of the main function is the sum of all its secondary functions, right? So why is the sum of the self wall times of its children not equal to the inclusive wall time of the main function? |
It should be close, but it won't be exact as xhprof is not able to capture 100% of every operation, there is always a bit of slippage/loss. |
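The relationship being discussed can be sketched in a few lines. This is an illustrative model only (Python rather than PHP, and the tree and timings are made up, not taken from XHProf): inclusive wall time is a node's own self time plus the inclusive time of all its children, so children's self times alone never add up to the parent's inclusive time.

```python
# Illustrative sketch (not XHProf code): inclusive wall time of a node
# is its self time plus the inclusive wall time of each of its children.

def inclusive(node):
    """Inclusive wall time = self time + sum of children's inclusive times."""
    return node["self"] + sum(inclusive(c) for c in node["children"])

# Hypothetical profile: main() spends 2ms in its own frame, and calls
# two children that do all the real work.
main = {
    "self": 2,
    "children": [
        {"self": 32, "children": []},   # e.g. a db_connect()-like call
        {"self": 400, "children": []},  # e.g. a render()-like call
    ],
}

print(inclusive(main))  # 434: the children's self times sum to only 432,
                        # because main() has 2ms of self time of its own
```

In a real profiler the gap is larger still, since (as noted above) some overhead is never captured at all.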
Can you see that, for me, the most logical graph is the flamegraph? The sum seems correct, but in the callgraph I do not understand why the time is different: [screenshot]
More information: [screenshot]. Please help me to understand it. |
First you need to find out what the true correct inclusive wall time values really are. Your data (the index table, which should be the correct thing) is reporting that db_connect took 32ms inclusive wall time... that means it took 32ms if you did something like this:

```php
$t = microtime(true);
$whatever->db_connect();
error_log(microtime(true) - $t);
```

Most surely that is correct, but please confirm by actually using that snippet above. Then do the same around the whole request:

```php
$t = microtime(true);
MyApplication::run(); // whatever the main entry point to your app that does everything.
error_log(microtime(true) - $t);
```
|
You could also look at the data in MongoDB. If xhprof has recorded the data 'wrong', we can never make the page look right. |
Inclusive Wall Time will always be greater than Self Wall Time. |
Oh, right, I misread it... I thought the time was 468,000 but it is just 468; my error. |
Is the wall time sum in the flamegraph the same as in the callgraph? |
It should be. The callgraph and flamegraph use the same data under the hood. |
@markstory Yes, the display issues are made apparent by #216; however, the issue is pre-existing. In my opinion, the current flame graph implementation is almost useless, and highly deceptive. The only reason it doesn't look broken is that various bugs cancel each other out. But a close look at any part of the stack exposes its information as nonsensical.

I looked at this earlier when we talked at #216, but after another very confusing debug round today, I realised the problem is much bigger than I thought. Below is an example.

Actual call tree (unbeknownst to xhprof): [image]

Stacks from xhprof (unsampled): [image]

XHGui currently converts the above as follows:

```php
'data' => array(
    'name' => 'main()',
    'value' => 1000,
    'children' => array(
        array(
            'name' => 'smallStuff()',
            'value' => 100,
            'children' => array(
                array(
                    'name' => 'process()',
                    'value' => 1000, # bigger than parent
                    'children' => array(
                        array(
                            'name' => 'processOne()',
                            'value' => 1000,
                        ),
                    ),
                ),
            ),
        ),
        array(
            'name' => 'bigStuff()',
            'value' => 900,
            'children' => array(
                array(
                    'name' => 'process()',
                    'value' => 1000, # bigger than parent
                ),
            ),
        ),
    ),
),
'sort' => array(
    'main()' => 0,
    'smallStuff()' => 1,
    'bigStuff()' => 4,
    'process()' => 2,
    'processOne()' => 3,
)
```

Resulting flame graph: [image]
I originally thought that the only problem was that D3-flamegraph sees each node as self-time (instead of inclusive time, per #207). But that is not exactly the case. If D3-flamegraph were treating each node's value as self-time, then smallStuff/process would be computed as 2000ms (1000 self, plus 1000 from its children), smallStuff as 3000ms, and so on. In actuality, D3-flamegraph throws away 90% of all timing data in favour of summing only the leaf nodes. This means any time spent in any method anywhere in the system is not represented in the graph, unless that method is a leaf node. Of course, the problem is not that D3-flamegraph uses a wrong formula for aggregating self-time. The problem is that it tries to aggregate at all, because the XHProf data is not self-time; it is already aggregated.

Problems with the old D3-flamegraph:
Problems with the flamegraph-data export:
|
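The leaf-only aggregation described above can be sketched as follows. This is a Python illustration of the behaviour (the actual renderer is the d3-flame-graph JavaScript library), using the same example tree as the comment; every `value` is XHProf's already-aggregated inclusive wall time.

```python
# Sketch of a renderer that distrusts non-leaf values: it discards them
# and recomputes every node's total by summing only the leaves beneath it.

def leaf_sum(node):
    """Total a leaf-summing renderer displays for a node."""
    if not node["children"]:
        return node["value"]
    return sum(leaf_sum(c) for c in node["children"])

# The tree from the comment above, with XHProf's inclusive values.
tree = {
    "name": "main()", "value": 1000, "children": [
        {"name": "smallStuff()", "value": 100, "children": [
            {"name": "process()", "value": 1000, "children": [
                {"name": "processOne()", "value": 1000, "children": []},
            ]},
        ]},
        {"name": "bigStuff()", "value": 900, "children": [
            {"name": "process()", "value": 1000, "children": []},
        ]},
    ],
}

print(leaf_sum(tree))  # 2000 -- double main()'s true 1000ms, and the
                       # smallStuff() branch is drawn as 1000ms, not 100ms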
I'm OK with removing the flamegraph output entirely. If it's not helpful and can't be made helpful, there is no point in keeping it. |
@markstory Yeah, I'm sad to see it go, but in the short-to-medium term we should probably look into the Callgraph visualisation instead. Callgraph is a bit harder to navigate, but it is true to the data and super accurate. It is also useful in ways that Flamegraphs can't be, such as visually emphasising methods that are called a lot and take much time, but are called from different sub-trees. The Callgraph shows that combined time really well. Would you accept a PR that removes the component? |
Yes 👍 |
This unfortunately never worked correctly due to a fundamental limitation of the XHProf data format: it only records metadata per parent-child method combination, so it cannot be used to build a complete call tree. The feature was added in pull perftools#177. More information about this limitation is described in detail at perftools#219, perftools#216 and perftools#212. A brief summary:

* The visualisation showed call stacks that did not actually exist, and was missing call stacks that did exist. (Due to assuming that every combination of a parent-child pair is valid, and due to it randomly assigning repeated calls to the first encountered parent.)
* The visualisation discarded all timing values from XHProf, except for the timing of leaf nodes (methods without children), which were then added up recursively. The end result was a visually well-balanced tree, but with timing values that were not related to the actual performance (up to 100x inflated), and the proportions were incorrect as well, making some code look fast instead of slow, and vice versa.

These are inherent problems that cannot be solved, because the information logically required to make a flamegraph (call stacks) is not collected by XHProf. This closes perftools#216, perftools#212, perftools#211, perftools#207. This fixes perftools#212.
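The limitation comes from the shape of XHProf's raw output: it is keyed by `parent==>child` pairs with aggregated counts and timings per edge, never by full call stacks. A sketch of that shape (a Python dict for illustration; the `==>` key convention and the `ct`/`wt` field names match XHProf's real format, but the functions and timings are invented to mirror the example in this thread):

```python
# XHProf raw output: one aggregate entry per parent==>child edge.
# 'ct' is the call count, 'wt' the total wall time for that edge.
xhprof_data = {
    "main()":                   {"ct": 1, "wt": 1000},
    "main()==>smallStuff()":    {"ct": 1, "wt": 100},
    "main()==>bigStuff()":      {"ct": 1, "wt": 900},
    "smallStuff()==>process()": {"ct": 1, "wt": 90},
    "bigStuff()==>process()":   {"ct": 1, "wt": 850},
    "process()==>processOne()": {"ct": 2, "wt": 900},
}

# The ambiguity: processOne() ran twice under process(), but nothing in
# the data says how those calls split between the process() invocation
# under smallStuff() and the one under bigStuff(). Any full call tree
# reconstructed from these edges is a guess.
parents_of_process_one = [
    key.split("==>")[0]
    for key in xhprof_data
    if key.endswith("==>processOne()")
]
print(parents_of_process_one)  # ['process()'] -- one edge, two real call sites
```

That one-edge-per-pair aggregation is exactly why the flamegraph had to invent stacks and misattribute repeated calls.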
The main function's inclusive wall time is the sum of the time spent in the function itself plus the time spent in its children. This value is correct in the flamegraph, but in the callgraph and table data the values aren't the same.