[perf] remove extra combine logic #1515
Force-pushed from ff7fd6d to 6f2d044.
I don't think there is a particularly compelling reason to revisit this code. The original intention was probably to avoid having to allocate an array to pass to `combine`.
@chancancode could we keep just one of the changes from this PR?
No, that's what I was saying in the second half of the response: I specifically don't think we need to/should be doing that. Unless we have evidence that this ends up being a hot spot in macro app benchmarks, I don't think those kinds of changes are a good idea to make based on just the microbenchmarks you did. Even if you set up the benchmark perfectly, it is still too small to be meaningful, and even if it happens to be slightly faster in the version of the browser you tested, that can easily change in a different browser, or in a different version.

The current code is making a single call to `set.add`. So we just do the straightforward thing everywhere unless there is specific evidence that one spot needs to be optimized in a particular way, and even then you should heavily document why it deviates from the normal pattern in code comments, etc.
I'm re-opening because there is no evidence one way or the other, and the point of re-opening this PR is to communicate that we need to run benchmarks on this, before and after.
Some extra notes here: in the V8 codebase I found `HAS`, `ADD`, and `ADD_TO_ORDERED_HASH_TABLE`. And it seems …
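As context for reading those builtins, here is a toy paraphrase of how ordered-hash-table sets behave in general; this is not V8's actual Torque code, and the class and names below are made up:

```js
// Toy model of an ordered-hash-set (illustrative only, not V8 source):
// `add` must first locate the key, i.e. do the same work `has` does,
// and only on a miss pay for insertion (and possibly growth/rehash).
class ToyOrderedHashSet {
  constructor() {
    this.entries = new Map(); // stand-in for the real hash table storage
  }
  has(key) {
    return this.entries.has(key); // lookup only
  }
  add(key) {
    if (this.entries.has(key)) return this; // duplicate: lookup cost already paid
    this.entries.set(key, this.entries.size); // new key: insert; real tables may also grow here
    return this;
  }
}
```

The only point of the sketch is that `add` subsumes the lookup `has` performs, and does extra work only when the key is new.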
A few things: @lifeart, your analysis of the v8 codebase is impressive and a really good start for the work that would go into a possible optimization. Let's circle back around once the errors feature has landed and figure out what it would take to verify that the performance benefits are significant enough to meet the "changes to low-level features" bar.

In the meantime, feel free to apply the change to a local version of Ember and see if it impacts any of the ecosystem's macro-benchmarks (such as https://github.com/krausest/js-framework-benchmark). Any information along those lines would be a good head start.
@chancancode @wycats @NullVoxPopuli Here are the results (benchmark screenshots, left to right) and the memory allocation numbers. It seems the results are a bit better (…).
that's amazing!!!! nice work!!
Sorry for being a bit terse and not fully unpacking the reasoning the first time; if I seemed a bit impatient, it is because this is not the first time we have had to restate these things. There is a particular piece of institutional knowledge/learned experience from the project that I am deliberately trying to pass on here.

Usually, it would be faster and sufficient to say "we don't prefer to accept these kinds of changes, maybe you can find something else to work on?", especially when there are plenty of other low-hanging fruits here. But since you seem to be particularly motivated to work on these things, you deserve a longer explanation, and hopefully you can internalize these principles, find places in the codebase to invest your effort in a way that would be productive for everyone, and maybe pass it on in the future.

So let's unpack. Let's take this PR as an example, but the principles apply more generally. There are two distinct changes that you are trying to make here:

1. Adding a `has` check before calling `add` on the tag set, on the theory that it is faster when the item is already present.
2. Removing an existing micro-optimization: the pre-checks in front of `combine` that special-case the 0- and 1-tag cases.
Let's look at these separately. The first one, at its core, is the claim that:

```js
// Version A: Existing
set.add(potentiallyExistingItem);
```

...is slower than...

```js
// Version B: Proposed
if (!set.has(potentiallyExistingItem)) {
  set.add(potentiallyExistingItem);
}
```

This seems counter-intuitive on the surface, as the new code appears to be doing more work, potentially duplicating work that is already happening inside `Set.prototype.add`.

Now, let's say it actually is the case that the new code is universally faster across various engines and use cases. Then I think the appropriate thing to do with that discovery is to upstream it to V8/etc. If it is always faster for us to do this in JavaScript/userland, surely they could do this even faster in native code. If this is indeed universally faster, even if we don't propose the change, someone else will eventually notice it and optimize this in the engines. That's great, everyone/everywhere will benefit from the change!

But, in that world, if we had accepted this change now, then we would be stuck with the slower code, because we would really be duplicating work that is already happening in the engine anyway, the engine can't necessarily/easily optimize away the extra call we are making into the native code, and probably no one would be around to notice and remove the workaround. That would be a net negative in the long term. There are plenty of these workarounds in Ember and in here, accumulated over the years, and they are very difficult to maintain and even remove! (We will discuss that one later.)

Now, the above was with the assumption that "this is universally faster". If you actually tried to upstream that change, I suspect the pushback you may get is that this is a workload-dependent tradeoff, not a universal win.
What does that mean? Well, take a step back and think about why version B may be faster in the case you measured:

- on a duplicated call, version B pays only for the `has` lookup and skips the `add` call entirely;
- on a call with a new item, version B pays for the `has` lookup *and* the `add`, so it is strictly slower there.
For version B to be faster, the following has to hold: the extra `has` lookup you now pay on every single call must cost less than what you save by skipping `add` on the duplicated calls.
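To put that into symbols (my notation, a sketch of the tradeoff rather than anything from the thread): let $p$ be the fraction of calls made with an already-present item, $c_{has}$ the cost of `has`, $c_{new}$ the cost of `add` on a new item, and $c_{dup}$ the cost of `add` on a duplicate. Then

$$
\mathrm{cost}_A = (1-p)\,c_{new} + p\,c_{dup}, \qquad
\mathrm{cost}_B = c_{has} + (1-p)\,c_{new},
$$

so version B wins exactly when $c_{has} < p \cdot c_{dup}$, that is, when $p > c_{has}/c_{dup}$.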
Which is a fancy way to say: assuming the performance profiles of these methods are fixed (which is not even that great of an assumption, as they can easily vary between desktop/mobile/browsers/CPU architectures, etc.), whether version A or version B is faster entirely depends on how often you call this method with duplicated items! You can easily intuit this too, because, clearly, if you never call this method with duplicate items, then version B has to be slower; so it follows that there is a crossover point: once you call the method enough times with duplicate items, the extra code you added will start to pay off, by eliminating the slower path in `add`.

In order for this to be a win in the real world, we essentially need to:

1. measure how expensive the extra `has` actually is relative to a duplicate `add`, across the engines and devices we care about; and
2. measure how often representative, real-world workloads actually hit the duplicate case, i.e. whether they sit past the crossover point.
(1) is frankly quite annoying to do and difficult to measure correctly, but at least it's tractable. But what about (2)?

You know what is not representative for these kinds of scenarios? Microbenchmarks and artificial benchmarks. You can probably intuit why – here we are talking about the percentage of calls to the auto-tracker where we attempt to add an already-existing tag to an auto-tracking frame that already contains the same tag. Think about the diversity of auto-tracked getters/helpers/etc. in your app, the vast amount of different things they do, and then go look at the code for the JS framework benchmark – how many tracked getters does it have? Does it remotely resemble the diversity of code in your production app? Does it give you confidence that optimizing for that benchmark in this case would definitely yield good results for your actual app? It doesn't for me. Can you see at least a possibility that this may actually make things slower in your actual app once you account for real-world usage patterns? I can definitely see that.

Do you see the problem here? But there is more! Even if you did all the work to measure things right now, things can change over time. Browsers can optimize their implementations, swinging the crossover point in either direction.

Speaking of which, here is something better to investigate: evidently, your measurements are showing that the duplicate case is hit often enough to matter. That seems quite surprising to me – how often do you reference the same piece of tracked state over and over again within, say, the same getter? I would expect it to happen occasionally, but not that often (other than …).

Anyway. I don't think these hurdles are necessarily impossible to overcome, and I am not saying that there are zero cases where it is worth it. But I hope you can appreciate why it is so important to measure these things correctly, and why doing that properly is very difficult and a lot of work. These are not easy PRs to review either, despite the seemingly tiny code change – you are basically reviewing the methodology and data, as well as thinking about any unintended consequences elsewhere. Balancing all of these, hopefully you can see why chasing these tiny gains is, in the grand scheme of things, not very appealing and generally not worth everyone's time, and in the long term can easily flip around and become a liability.

Now let's talk about the second part of your change: removing an existing micro-optimization that someone else already added. Ironically, you are now witnessing the consequence of chasing these gains from the opposite side!! Presumably, once upon a time, someone added that optimization with good intentions, perhaps even did some of those measurements we discussed and found it to be well worth it. But it wasn't written down in the code, the benchmarks weren't maintained, so, who knows. Even if it were benchmarked at the time, it was probably in the era well before Apple Silicon. Are those results still valid? Has anyone retested them lately? Do you see why these are a maintenance hazard in the long run?

Still, we can't be reckless about it. IMO the default/conservative affinity for existing code is to assume that it is there for a good reason unless we can prove otherwise with reasonable confidence. I think it should be a lower bar to clear for removing existing optimizations compared to adding new ones, but still, the benchmarks you did aren't remotely enough to be conclusive.
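To make "measuring correctly" concrete: a more targeted microbenchmark would at least sweep the duplicate ratio rather than measure a single point. A minimal sketch of that idea (my illustration, not code from this PR; the `bench` helper and all numbers are made up, and a recent Node.js with the global `performance` is assumed):

```js
// Hypothetical sketch: compare unconditional add vs. guarded add across
// duplicate ratios, using small short-lived sets to loosely model tracking frames.
function bench(dupRatio, frames = 200_000, perFrame = 10) {
  // Pre-build identical per-frame input for both versions.
  const tags = Array.from({ length: perFrame }, (_, i) => ({ id: i }));
  const input = Array.from({ length: frames }, () =>
    Array.from({ length: perFrame }, (_, j) =>
      Math.random() < dupRatio ? tags[0] : tags[j]
    )
  );

  let start = performance.now();
  for (const frame of input) {
    const set = new Set();
    for (const tag of frame) set.add(tag); // Version A: unconditional add
  }
  const a = performance.now() - start;

  start = performance.now();
  for (const frame of input) {
    const set = new Set();
    for (const tag of frame) {
      if (!set.has(tag)) set.add(tag); // Version B: guarded add
    }
  }
  const b = performance.now() - start;

  console.log(`dupRatio=${dupRatio}: A=${a.toFixed(0)}ms B=${b.toFixed(0)}ms`);
}

for (const p of [0, 0.3, 0.6, 0.9]) bench(p);
```

Even this only probes one engine on one machine with a made-up duplicate distribution, which is exactly the objection above.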
Even the js-framework-benchmark one isn't that great – here it essentially boils down to the distribution of the number of items typically found in the set. The artificial benchmark has two components, and I would not consider that distribution to be representative of your real apps; I hope you can see why.

Or, you may be making an even stronger claim – that the existing optimization never pays off regardless of the number of items. That is perhaps possible to prove in a more targeted microbenchmark, but even then, the possibility of much more aggressive inlining (and the much more pronounced effect of deopts) tends to make even these targeted microbenchmarks not very representative. Still, I think this would be a promising candidate to look into, especially from the memory angle of eliminating that extra field (though it is a little suspect: tracking frames are short-lived, transient objects, they have a tiny lifetime in the grand scheme of things, so it feels suspicious that they would have a strong impact on memory consumption – a high watermark I can believe, but a straight-up increase in memory consumption would imply they are being held longer than expected, which could indicate other issues).

Anyway, if you understand what I am saying and you are still down to do more comprehensive testing (at least a variety of browsers on real app workloads) to make sure we aren't doing anything bad, I am pretty open to removing that if we can't find a compelling reason to keep it around.

Hopefully I have said enough this time to show that I am not just turning you away for no reason or throwing up arbitrary roadblocks. I am not saying stop working on performance improvements, but I am saying that you may want to pick your battles wisely (also considering review bandwidth, among other things), and I think that would help you pick areas to focus on that justify the overall cost. If the starting point of this had been "this is my work app, it is slow in these situations, we profiled it and landed on X being the culprit", I think that would be a very different conversation.
@chancancode thank you for the detailed answer! I'm not pushing this change, and I really appreciate your time and patience while reviewing all this stuff and providing answers. I think it's a great round-up in general and we could use it as a reference for further related PRs. I also agree that we have to highlight …

As for the already existing micro-optimizations: I think micro-optimizations should be explicitly marked by comments; otherwise we should count that code as a place for improvements. I do believe we need to try to cover cases where we could improve 2-3%, because after 10-20-30 such iterations we could get up to 7-10% performance improvement; small steps add up to a notable difference.

You mentioned low-hanging perf fruits in the codebase – it would be great to have an issue-like thing with references to those places, so that everyone who would like to improve the codebase can look into it and make meaningful changes blessed by the core team.
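To illustrate the "explicitly marked by comments" idea above (a hypothetical convention, not an existing project rule; `addUnique` is a made-up name):

```js
// PERF(#1515): guarded add – measured faster than a plain `add` when the
// duplicate ratio is high (see benchmark link in the PR). Re-benchmark on
// current engines before removing or copying this pattern elsewhere.
function addUnique(set, item) {
  if (!set.has(item)) set.add(item);
}
```

A marker like this records the evidence and an expiry condition, which is exactly what the earlier discussion says old micro-optimizations tend to lack.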
Force-pushed from 6f2d044 to 46c32e6.
Force-pushed from 010d935 to 9c480ca.
I meant to come back to this and write a longer response, but I never found the time to do it, so here is a short one: I didn't mean low-hanging fruits in performance work – I think part of working on performance is to embrace that interactions at scale are messy and often unpredictable, and to accept that most of the time you don't really know for sure until you are able to measure at the appropriate scale for the change, and a lot of the time things don't end up panning out.

We should also get on the same page about what we mean by "macro" benchmarks: while certainly better than just a micro isolated loop, the js-framework-benchmark is very focused on a particular (and IMO not representative) kind of workload and skews things pretty massively in a certain direction. In general we really need to be testing in real apps, and that is what I said I would prefer as a starting point – start with a real app, find out what's slow in real life, and then hone in from that direction. It is much easier to validate the motivation/assumptions in a perf PR that way.

I think making it easier to link and test things in a real app would go a long way; #1523 is a step in the right direction, but there is much more to do. I don't know if you consider that "low hanging", but those are the things I meant. Other examples are implementing the … I offered this to @NullVoxPopuli before (they ended up getting busy with other things, which is fine), and I'd extend it to anyone else: get the tests (e.g. for …)
Closing as not significant |
The `combine` function already handles the 0 and 1 cases, so I think it's safe to remove the pre-checks here.
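Paraphrasing the pattern in question (a sketch based on the PR description, not the exact glimmer-vm source; the function names here are made up, while `combine` and `CONSTANT_TAG` come from `@glimmer/validator`):

```js
import { combine, CONSTANT_TAG } from '@glimmer/validator';

// Before: the call site special-cases the empty and single-tag cases
function combineWithPrechecks(tags) {
  if (tags.length === 0) return CONSTANT_TAG;
  if (tags.length === 1) return tags[0];
  return combine(tags);
}

// After: rely on `combine`, which already handles 0 and 1 internally
function combineSimple(tags) {
  return combine(tags);
}
```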
Attachment: artifact-1.pdf
Set check ideas (no longer relevant):
In addition, we check whether the item is already in the set first; it seems 20% faster (https://jsben.ch/ti5Ts) than forcing the set to overwrite.