-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Overhaul display of a Trial #217
Conversation
This looks neat and seems easy to read. I like the colour showing mean/median, and I like that the units are clear at a glance. Going horizontally gets enough bins without making you scroll too much. I have suggestions. Perhaps printing "Histogram: frequency by time" every time is a bit much. Perhaps printing the min/max twice is also unnecessary. It would be nicer to have mean & median on an equal footing, instead of median being tacked on at the right somewhere. So I'd propose that it look like this:
Perhaps min/max need not be coloured, if their position makes it obvious (or perhaps white not grey?) Perhaps grayscale plus two colours is enough. I'd also say that the standard deviation should not share the mean's colour -- it's data about the grey bars, not the green one. Edit -- now I see there are two maxima, one is a cutoff to make the plot show the interesting part. Then I'm less sure but change my suggestion to something like this (with the [137 ns] in light grey?) with some marker for the broken axis?
In the screenshot above, I think |
Thanks for the feedback @mcabbott !
The reason why I have this is because it's not always "frequency by time". From the suite of expressions I benchmarked I found cases in a frequency vs. time plot distribution was not very helpful. So, conditional on a little heuristic sometimes "log(frequency) by time" is used. If you have a suggestion that is more succinct but still captures this that would be great.
The max isn't printed twice. Because long tails seem common, but are not indicative of the performance characteristics of the expression I chop off the top 1% of times when creating the histogram. You can see this in the screenshot where the max time is ~710ns but the max on the graph is 140ns. Not doing this causes what I often consider to be the region of interest to become undesirably squished.
I think the colouring gives it the same prominence, but this seems reasonable. I hope this makes the motivation behind some of my choices a bit more clear :) Edit: Regarding your new suggestion, I must admit that this: Some sort of a broken axis could be a decent idea though, but then I suddenly need to cram a lot more information onto one line:
and I suspect that might just be too much. |
Ah, if it's sometimes log, then perhaps that needs to be noted. Can you add an example above maybe? I guess my complaint there is about saying "time" twice, and labelling a histogram "histogram" every time. But here's a possible idea; it doesn't seem too crammed to put min & max on the same line although I agree that 6 things would probably be too much. (I picture
My main complaint about the existing printing is that I never remember what order min/max/median/mean get printed; I think |
I've just realised that your approach would be misleading, as min/max GC don't necessarily correspond to min/max time 🙁. |
Are any of the JuliaCI people planning on taking a look at this? Perhaps @Roger-luo, @vtjnash, or @vchuravy ? |
Ah oops, I assumed that Roger-luo and vtjnash were part of JuliaCI based on recent commits, but I see that that isn't the case. Sorry for the bogus ping. |
This is a question I meant to raise too. What exactly is that GC percentage printed next to I would argue that the current printing certainly suggests that the GC percentages for min/max/median belong to the same runs as the times shown:
If this isn't now the case, perhaps it should be... or perhaps the printing should separate these if in fact it's the median GC time, as a percentage (or median GC percentage?) not the GC percentage of the median-time run. I don't see anything about this point in the documentation. |
That would usually be true for all the options I mentioned, right? |
What do you mean? |
Pushed the change in the screenshot above. |
Codecov Report
@@ Coverage Diff @@
## master #217 +/- ##
==========================================
- Coverage 85.63% 75.77% -9.86%
==========================================
Files 6 6
Lines 738 834 +96
==========================================
Hits 632 632
- Misses 106 202 +96
Continue to review full report at Codecov.
|
Before this PR is merged, can I ask you please to take a pause and think it through again? In my opinion, such visual change is a big thing and shouldn't be taken lightly, even despite the best intentions, for accessibility reasons. First of all, you are using lots of colors. While colors can be very expressive, at the same time they can be very distracting. The good thing about the current design that it is color neutral, which means that while it doesn't give you the best possible visual it is not offending anyone. Secondly, there were issues when various colorschemes were broken when extra colors were introduced in logs in Julia 1.6 for example (JuliaLang/julia#38730). One can argue that it is a problem of the colorscheme itself, but there were also issues when people had problems since they had problems with their eyes or monitors. I think, there can be multiple ways to solve this problem. These solutions can be for example:
For color schemes or Just to clarify, it is nice to have good looking output and it should be kept as a feature, but it should be opt-in. Colors shouldn't be a distraction or an obstacle to fulfilling the main purpose of |
@Arkoniak I appreciate your concern, but the only colours used here are 8-color terminal code. These are set by the terminal/user and should be readable at all times. My take is that this is the responsibility of the terminal/user. Should I be using 256-color codes, I would be inclined to agree that the use of colour would warrant more thought. This is why after trying a shaded gradient with the histogram I reverted to solid 'white'. The lone exception to this here is How "distracting" you want terminal colours to be is indeed a personal preference, and it can be addressed by users changing the base terminal colours to match their preference --- as they already do. |
I like colors if they are (1) customizable and (2) highlighting the most important things. In that regard I would ask the question: do we really need to highlight mean median min and max, i.e. all of them? Afaik, min and median are the most important ones, so why not just highlight those with colors? The full information is still there. One might argue that If you highlight / color everything you don't highlight anything anymore. |
The colours here aren't currently customisable, but if there's much demand I can make them so.
I think there's an important point to be made here. If you emphasise everything you emphasise nothing, but colour can also be used for categorisation, as I do here. Which number corresponds to I think it's also worth bearing in mind that once/if this PR is accepted, it's not as if this feature is locked in stone. Tweaks (like colour customisation) can always be added later. |
The colors are great. But light_black should be reserved for text that carries no important information, or a lot of people wont be able to see it. |
Hmmm, maybe I should just make the GC text white then |
And probably the timings underneath each end of the histogram... I just did the colorsheme in DimensionalData.jl and light black does look good, but I only used it for additional comments, brackets etc that wont be missed if you cant see them. Here the divider, brackets, colons, probably "Histogram: frequency by time" is also non-essential, people will figure that out. |
That seems like a pretty reasonable approach, I'll give that a shot 👍 |
Looks great! But there's still that one dark I'm using this already it's a vast improvement for benchmarking. Thanks. |
I make the |
IDK, but when I read it thats the value I read, not the one in the middle |
FWIW: This is how things look for me (I was wondering because @tecosaur's (I'd think that my terminal color scheme is pretty "standard", i.e. no drastic color changes.) |
Thanks @crstnbr. It's nice to see how things look with other people. The purple |
Perhaps it would be worth keeping the old format around for the 2-arg signature ( |
What do you think about keeping the text ordering roughly the same, so that |
@vchuravy I just implemented C without red, so unless there's anything else you'd like to see, I'd say this is ready to merge 🙂. |
Would be good to get a new release. |
The new output format is so nice! Bravo 👏🏻 |
Glad you like it 😁 I love how Julia seems much more enthusiastic about making a rich terminal experience than most other languages I've used ❤️. |
Uh? JuliaLang/julia#40703 was fixed before you posted the message here. |
I saw, the fact of the matter is that I wasn't able to build it though, and I have all of the dependencies. |
So now we have had this for a while and I am curious what people think. Personally, I think the new output is really cool but the level of noise to signal is in most cases very high. This makes me doubt that having this as a default display is that good. Something terser that would fit on one line would be great, and then there could be an option to expand this to have the richer printing. |
Isn't that basically
|
Yes, something like that. There could fit be a +-σ there as well. |
I'm quite happy to say that the new histogram benchmark, while displaying more information, is actually more compact than the previous display to the point that it's actually less lines. Regarding,
It sounds like this is more an argument for expanding |
I like the status quo. The histogram makes it crystal-clear what's happening, and gets people thinking more deeply about performance measurement. I'd support an |
In the vast majority of cases, it shows literally noise (your computer doing various background tasks which is a bad thing to show because you are likely to confuse the runtime of your code based on external "pollution".) I have yet seen a single comment about the histogram result, in fact, I have not seen a single comment about almost any of all the new numbers. Just as a reference, for people who are interested, https://youtu.be/vrfYLlR8X8k?t=915 (and about 7 minutes forward) is a bit of an interesting discussion of timing code from the guy who used to do a lot of performance work at facebook. |
So then why not just use My question is often "is the difference in performance between two implementations meaningful, or noise?" That's the whole reason I just submitted #255, with a real-world demo in JuliaLang/julia#42227 (comment). You could (correctly) make the argument that if I'm going to the effort to supply an For reference where seeing the raw data can be helpful: https://stackoverflow.com/questions/69164027/unreliable-results-from-timeit-cache-issue which came up in working on https://github.com/JuliaImages/image_benchmarks. And the video is useful, thanks. I think it actually provides fodder for the view that we should show the histogram: he points out that it is a bit complicated, that there are circumstances like networking where the minimum is not the only measure you're interested in. I should make it clear that I can live with any outcome; I'm mostly glad this is available, and if we switch to a less verbose printing I will turn it on manually. |
I feel the same way as @timholy here. If brevity is so crucial, why not use |
if Basically, I think the idea is sometimes people want to communicate, via non-screenshot format, potentially in a limited width setting like slack thread, benchmark with real-life workload (=> having allocation, GC time, etc.) *without wrapping 30 lines with black-and-white tofus. I'm all for keeping everything in |
Not to beat a dead horse, but yesterday I got this histogram:
It instantly told me that I had some kind of "jackpot event" (of low quantal count and thus high variability) dominating the runtime, which in this case proved to be thread scheduling. Took about half a second to diagnose thanks to the histogram. |
I think histograms are worth looking at whenever using julia> @benchmark fetch(Threads.@spawn 1+1)
BenchmarkTools.Trial: 7468 samples with 10 evaluations.
Range (min … max): 10.931 μs … 360.687 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 66.520 μs ┊ GC (median): 0.00%
Time (mean ± σ): 66.620 μs ± 5.501 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▅█▆▁
▂▁▂▂▁▁▂▁▁▁▁▁▂▂▁▁▁▁▁▁▁▁▂▁▁▁▁▁▁▂▂▁▂▁▂▁▁▂▂▁▂▂▁▁▂▁▁▁▁▂▂▂▄▇████▆▃ ▃
10.9 μs Histogram: frequency by time 70.8 μs <
Memory estimate: 480 bytes, allocs estimate: 5. The minimum time isn't representative at all. If all we ever look at is a single number summary, I think the median is better than the minimum. |
There is space on the line to make Edit: this is now #258. The hope would be that this could be used more places which people now use |
I agree. While with a single number summary the mean can be biased by a single slow run, given two numbers I think it's a better addition to the minimum than the median. |
I think many folks used to report the So I feel like it's the fact that users switched which one they use to report results (probably because of the fancier printing) rather than what the default of |
Agreed, that's really awful. But I agree with @ericphanson that the reason more people are using |
julia> std(t) | ||
BenchmarkTools.TrialEstimate: | ||
time: 25.161 μs | ||
gctime: 23.999 μs (95.38%) | ||
memory: 16.36 KiB | ||
allocs: 19 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cross-link to #146 for reexporting std
too.
Use layout inspired by hyperfine, with an added histogram. Closes #215.
Sample:
Initial version