
Flamegraph does not match table data and callgraph #212

Closed
josdagaro opened this issue Oct 19, 2017 · 18 comments · Fixed by #246

Comments

@josdagaro

josdagaro commented Oct 19, 2017

The inclusive wall time of the main function is the sum of the time spent in itself plus the time spent in its children. This value is correct in the flamegraph, but the values in the callgraph and table data don't match it.

@josdagaro
Author

I need help understanding this.

@michaelbutler

I think there is a bug in the flamegraph. Notice how on the main Recent runs page it says 150ms wall time; that's the overall time it took for the whole request (you can confirm this with the Chrome network inspector). The callgraph also shows main() taking 150ms wall time, which is also correct. The flamegraph says 454ms, which is not correct. The reason is that it adds the total time (150ms) to the total time of all children, greatly inflating the value.

Bug ticket opened here #207 though I have not had a chance to fix it.

@michaelbutler

Side note: check your server charset settings; it looks like you may be serving ISO-8859-1 instead of UTF-8.

@josdagaro
Author

I'm confused. If the flamegraph is incorrect, how is the total wall time calculated?

@josdagaro
Author

The inclusive wall time of the main function is the sum of all its child functions, right? So why is the sum of the self wall times of its children not equivalent to the inclusive wall time of the main function?

@markstory
Member

So why is the sum of the self wall times of their children not equivalent to the inclusive wall time of the main function?

It should be close, but it won't be exact, as xhprof is not able to capture 100% of every operation; there is always a bit of slippage/loss.

@josdagaro
Author

josdagaro commented Oct 20, 2017

Check this:
[screenshots 1-6]
Can you see that, to me, the most logical graph is the flamegraph? Its sum seems correct, but I do not understand why the time in the callgraph is different:

Flamegraph: CI_DB_driver::simple_query (44,176 ms) + pg_connect (20,067 ms) = CI_DB_postgre_driver::db_connect (64,243 ms)

Callgraph: CI_DB_driver::simple_query (44,176 ms) + pg_connect (20,067 ms) = CI_DB_postgre_driver::db_connect (32,229 ms)

More information:

[screenshot 7]

Please help me understand this.

@michaelbutler

First you need to find out what the true, correct inclusive wall time values really are.

Your data (the index table, which should be the correct thing) is reporting that db_connect took 32ms inclusive wall time; that means it took 32ms if you did something like this:

$t = microtime(true);
$whatever->db_connect();
error_log(microtime(true) - $t);

Most likely that is correct, but please confirm by actually running the snippet above and logging the value with error_log. Next, use the same snippet to get the overall time of your request:

$t = microtime(true);
MyApplication::run(); // whatever the main entry point to your app is that does everything
error_log(microtime(true) - $t);

@markstory
Member

You could also look at the data in MongoDB. If xhprof has recorded the data 'wrong', we can never make the page look right.

@josdagaro
Author

josdagaro commented Oct 23, 2017

I'm very confused. Is it possible for "self wall time" to be greater than "inclusive wall time"?

@michaelbutler

Inclusive wall time will always be greater than or equal to self wall time.

@josdagaro
Author

Oh, right, I misread it; I thought the time was 468,000 but it is just 468. My mistake.
OK, I checked this page from a third party (xhprof) and the results are the same.

@josdagaro
Author

josdagaro commented Oct 23, 2017

Is the wall time sum in the flamegraph the same as in the callgraph?
That is, does the callgraph sum wall time in the same way as the flamegraph?

@markstory
Member

Is the wall time sum in the flamegraph the same as in the callgraph?

It should be. The callgraph and flamegraph use the same data under the hood.

@Krinkle
Contributor

Krinkle commented Sep 16, 2018

@markstory wrote at #216 (comment):

Last I remember there were different display issues with these changes. I haven't had much time to dedicate to this project as of late though.

@markstory Yes, the display issues are made apparent by #216; however, the issue is pre-existing.

In my opinion, the current flame graph implementation is almost useless and highly deceptive. The only reason it doesn't look broken is that various bugs cancel each other out. But a close look at any part of the stack exposes its information as nonsensical.

I looked at this earlier when we talked in #216, but after another very confusing debugging round today, I realised the problem is much bigger than I thought. Below is an example:

Actual call tree

(Unbeknownst to xhprof)

  • main() - [wt: 1000ms]
    • 1x smallStuff() [wt: 100ms]
      • 1x process('small') [wt: 100ms]
        • 10x processOne() [wt: 100ms, 10ms each]
    • 1x bigStuff() [wt: 900ms]
      • 1x process('big') [wt: 900ms]
        • 90x processOne() [wt: 900ms, 10ms each]

stacks

From xhprof (unsampled)

main() => smallStuff()    [ c: 1, wt: 100ms ]
main() => bigStuff()      [ c: 1, wt: 900ms ]
smallStuff() => process() [ c: 1, wt: 100ms ]
bigStuff() => process()   [ c: 1, wt: 900ms ]
process() => processOne() [ c: 100, wt: 1000ms ]
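The pairwise records above are essentially the shape of XHProf's raw output: an array keyed by "parent==>child" strings, with a call count (ct) and inclusive wall time (wt) per pair. A minimal sketch (Python used for illustration; the real data is a PHP array returned by xhprof_disable()):

```python
# Hypothetical raw XHProf output for the call tree above: one record
# per parent==>child pair, with call count ("ct") and inclusive wall
# time ("wt"). There is no record of full call stacks, only adjacent
# pairs.
xhprof_data = {
    "main()": {"ct": 1, "wt": 1000},
    "main()==>smallStuff()": {"ct": 1, "wt": 100},
    "main()==>bigStuff()": {"ct": 1, "wt": 900},
    "smallStuff()==>process()": {"ct": 1, "wt": 100},
    "bigStuff()==>process()": {"ct": 1, "wt": 900},
    "process()==>processOne()": {"ct": 100, "wt": 1000},
}

# All 100 processOne() calls are merged into a single record, so it is
# impossible to tell how many came via smallStuff() vs bigStuff().
total_processOne_calls = xhprof_data["process()==>processOne()"]["ct"]
```

Note how nothing in this data distinguishes the 10 processOne() calls under smallStuff() from the 90 under bigStuff().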

flamegraph-data

XHGui currently converts the above as follows:

            'data' => array(
                'name' => 'main()',
                'value' => 1000,
                'children' => array(
                    array(
                        'name' => 'smallStuff()',
                        'value' => 100,
                        'children' => array(
                            array(
                                'name' => 'process()',
                                'value' => 1000, # bigger than parent
                                'children' => array(
                                    array(
                                        'name' => 'processOne()',
                                        'value' => 1000,
                                    ),
                                ),
                            ),
                        ),
                    ),
                    array(
                        'name' => 'bigStuff()',
                        'value' => 900,
                        'children' => array(
                            array(
                                'name' => 'process()',
                                'value' => 1000 # bigger than parent
                            ),
                        ),
                    ),
                ),
            ),
            'sort' => array(
                'main()' => 0,
                'smallStuff()' => 1,
                'bigStuff()' => 4,
                'process()' => 2,
                'processOne()' => 3,
            )

flamegraph-visual

  • main() – 2000ms
    • smallStuff() – 1000ms
      • process() – 1000ms
        • processOne() – 1000ms
    • bigStuff() – 100ms
      • process() – 1000ms

[screenshot: flamegraph as rendered, 2018-09-16]

I originally thought that the only problem was that D3-flamegraph treats each node's value as self-time (instead of inclusive time, per #207). But that is not exactly the case.

If D3-flamegraph were treating each node's value as self-time, then smallStuff/process would be computed as 2000ms (1000ms self plus 1000ms from its children), and smallStuff as 3000ms, etc.

In actuality, D3-flamegraph throws away 90% of all timing data in favour of summing only the leaf nodes. This means that time spent in any method anywhere in the system is not represented in the graph unless that method is a leaf node.

Of course, the problem is not that D3-flamegraph uses the wrong formula for aggregating self-time. The problem is that it tries to aggregate at all, because the XHProf data is not self-time; it is already aggregated.
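To make the leaf-only summation concrete, here is a small Python sketch of what the old D3-flamegraph effectively computes over the exported tree above (an illustration of the behaviour, not XHGui's or D3's actual code):

```python
# The tree XHGui exports (values as shown above, in ms).
tree = {
    "name": "main()", "value": 1000, "children": [
        {"name": "smallStuff()", "value": 100, "children": [
            {"name": "process()", "value": 1000, "children": [
                {"name": "processOne()", "value": 1000},
            ]},
        ]},
        {"name": "bigStuff()", "value": 900, "children": [
            {"name": "process()", "value": 1000},
        ]},
    ],
}

def displayed_value(node):
    """What the old D3-flamegraph renders: for a leaf, its own 'value';
    for an inner node, the sum of its descendant leaves only."""
    children = node.get("children")
    if not children:
        return node["value"]
    return sum(displayed_value(c) for c in children)

print(displayed_value(tree))  # 2000 -- double the real 1000ms for main()
```

The 'value' stored on every non-leaf node (including the correct 1000ms on main()) is simply ignored.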

Problems with old D3-flamegraph:

  • It discards all time spent in any method, except for leaf nodes.
  • It replaces the time for each method with the sum of the descendant leaf nodes.
  • The time spent in main() is often 10x or 1000x larger than it should be.
  • The overall proportions are incorrect. Slow code can appear fast, and fast code can appear slow. If the problem were only that it was doing self-time aggregation (and doing it correctly), it would at least have correct overall proportions (although the values would be exaggerated).

Problems with flamegraph-data export:

  • Whenever a duplicate parent>child pair is seen, only the first encounter (seemingly at random) gets a child stack; the rest have no children.
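That last problem is visible in the exported tree above: process() appears under both smallStuff() and bigStuff(), but only the first occurrence received its processOne() child. A Python sketch of this flawed expansion (build_tree and the deterministic "first encountered wins" rule are my modelling assumptions; the real code's choice appears random):

```python
# parent -> child pairs, as recorded by XHProf (counts omitted).
pairs = [
    ("main()", "smallStuff()"),
    ("main()", "bigStuff()"),
    ("smallStuff()", "process()"),
    ("bigStuff()", "process()"),
    ("process()", "processOne()"),
]

def build_tree(root, pairs):
    """Flawed expansion: a function's children are attached only to the
    first encountered occurrence of that function; repeats get none."""
    seen = set()
    def expand(name):
        node = {"name": name}
        if name in seen:          # repeated function: no children attached
            return node
        seen.add(name)
        kids = [c for (p, c) in pairs if p == name]
        if kids:
            node["children"] = [expand(c) for c in kids]
        return node
    return expand(root)

tree = build_tree("main()", pairs)
small_process = tree["children"][0]["children"][0]  # process() under smallStuff()
big_process = tree["children"][1]["children"][0]    # process() under bigStuff()
# Only the first process() got its processOne() child; the second is bare,
# matching the exported data shown earlier.
```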

@markstory
Member

markstory commented Sep 17, 2018

I'm OK with removing the flamegraph output entirely. If it's not helpful and can't be made helpful, there is no point in keeping it.

@Krinkle
Contributor

Krinkle commented Sep 17, 2018

@markstory Yeah, I'm sad to see it go, but in the short-to-medium term we should probably look into the callgraph visualisation instead.

The callgraph is a bit harder to navigate, but it is true to the data and highly accurate. It is also useful in ways that flamegraphs can't be, such as visually emphasising methods that are called often and take a lot of time but are called from different sub-trees. The callgraph shows that combined time really well.

Would you accept a PR that removes the component?

@markstory
Member

Would you accept a PR that removes the component?

Yes 👍

Krinkle added a commit to Krinkle/xhgui that referenced this issue Sep 23, 2018
This unfortunately never worked correctly due to a fundamental
limitation of the XHProf data format: it only records metadata
per parent-child method combination, so it cannot be used to
build a complete call tree.

The feature was added in pull request perftools#177. More
information about this limitation is described in detail at
perftools#219, perftools#216 and perftools#212.

A brief summary:

* The visualisation showed callstacks that did not actually exist,
  and was missing callstacks that did exist. (Due to assuming
  that every combination of a parent-child pair is valid, and
  due to it randomly assigning repeated calls to the first
  encountered parent.)

* The visualisation discarded all timing values from XHProf,
  except for the timing of leaf nodes (methods without children),
  which were then added up recursively. The end result was a
  visually well-balanced tree, but with timing values that were
  not related to the actual performance (up to 100x inflated), and
  the proportions were incorrect as well, making some code look
  fast instead of slow, and vice versa.

These are inherent problems that cannot be solved because the
information logically required to make a flamegraph (call stacks)
is not collected by XHProf.

This closes perftools#216, perftools#212, perftools#211, perftools#207.
This fixes perftools#212.
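The fundamental limitation can be made concrete: two different call trees can collapse to identical parent==>child records, so no converter can recover the real tree from the profile. A hypothetical Python sketch (pair_profile is an illustrative helper, not XHProf code):

```python
def pair_profile(tree, parent=None, acc=None):
    """Aggregate a call tree into XHProf-style parent==>child call counts."""
    if acc is None:
        acc = {}
    name, children = tree
    key = name if parent is None else f"{parent}==>{name}"
    acc[key] = acc.get(key, 0) + 1
    for child in children:
        pair_profile(child, name, acc)
    return acc

# Tree A: d() is reached via a()->c(); b()'s c() has no children.
tree_a = ("main", [("a", [("c", [("d", [])])]), ("b", [("c", [])])])
# Tree B: d() is reached via b()->c() instead.
tree_b = ("main", [("a", [("c", [])]), ("b", [("c", [("d", [])])])])

# Both trees collapse to the same pairwise records, so the full call
# stacks (needed for a flamegraph) cannot be recovered.
print(pair_profile(tree_a) == pair_profile(tree_b))  # True
```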