-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Separate Tokens and Identifiers from other Nodes #33431
Conversation
Use this together with `node --expose-gc` to get more stable results
With some further optimization to their properties, this shrinks the objects by quite a bit.
@typescript-bot perf test this |
Heya @jack-williams, I've started to run the perf test suite on this PR at 391a3c8. You can monitor the build here. It should now contribute to this PR's status checks. Update: The results are in! |
So I know essentially zero about JS runtimes, but is one consideration here that certain runtimes only support inline-caches for one shape, while v8 supports up to four? I'm not sure that a possible regression introduced by this change would be observed testing with node. I've been quite keen to learn more about the details around the perf design of the compiler, so any insights here would be a welcomed learning experience (for me). |
I’m no expert either. Just tried to apply some things I learned from talk recordings, and digging through the heap profiler. But mostly all engines agree that its best to initialize all properties at least in the same order to avoid shape changes. Which TS is notoriously bad at, since it just throws on additional properties onto I also do see a ton of stuff when running with Also interesting to actually run the compiler with All of these things are really badly documented though and its pretty much guesswork to figure out what all that means :-( And regarding your note that this would only be observed when testing with node: I would argue running |
@jack-williams Here they are:Comparison Report - master..33431
System
Hosts
Scenarios
|
wow, those results are really encouraging, 6-8% wins on memory usage. total time seems to be ±1%, so not really conclusive. |
What is the call to gc for? Will that mislead the results when comparing against master? |
That trigger a manual gc when run with |
I'd consider opening an issue with the questions and discussion points you list. It would probably get more visibility by those on the team, and also for other users of the tracker. |
Does anyone know what the "Lines" diagnostic is for? Is it the number of lines emitted? My source files are about 30k lines total. But when I build, "Lines" diagnostic says 5,266,173 and takes 300s (5 minutes) How many LOC do your source files have? How simple or complex are the types in your "super huge project"? For reference, when building my composite project, my total stats are,
I'm very interested in knowing if your changes will improve the memory usage without negatively affecting check times for me. The total stats are huge because it's building 42 sub projects. Those stats are about a few weeks old. I am at 52 sub projects now =/ The perf tests don't show much of a change in check times; +3.64% at worst. If we pack this, I can try running it on my project and report back with the results. |
This might be a bit off-topic for this PR, but maybe something is going wrong with your dependencies. Take a look at Oh, and is your tsc srsly using 32G of ram? How is that even possible? |
I'm not using any fancy tooling. Just regular tsc with a composite project set up. So, unless tsc itself is bugged, I should be good in that area. I'll try out --listFiles when I get home. It's using 32G total for all 42 sub projects. Not in one compilation =P it's basically creating 42 ts programs, I think. All 42 sub projects use the same @jack-williams can we have this packed? =x |
@RyanCavanaugh @rbuckton I had done initial change for LS only as part of #9529 and as part of decreasing memory usage for Salsa.. We had then decided to only do this for LS at that time, don't remember the reason for that though. |
Well this has conflicts now, should be trivial to fix. I would suggest to just land a very minimal version first, which only copy-pastes the I would be willing to further work on this to add shortcuts, remove unused props, maybe better organize the different types, etc. if someone wants to mentor me. However, I’m a bit… well… disappointed that this has been sitting idle for almost 3 months now, so I don’t really want to waste my time if this isn’t going anywere… 😒 @sheetalkamat @rbuckton @RyanCavanaugh any guidance? |
@Swatinem I apologize for not looking into this sooner. The memory wins are appreciated, and the few cases of perf losses are either within the margin-of-error or targeting NodeJS 8.x which is now outside of Node's LTS maintenance schedule. If you are willing to resolve the conflicts, I can take a thorough review in the next few days. |
Awesome! I’m gone now for an extended weekend, will rebase monday evening. :-) |
The change in utilities.ts is a fairly trivial conflict to resolve, and the change in tsc.ts can just be dropped per my comment above. I can easily resolve the conflicts myself. |
go ahead ;-) |
Oh, while you are at it… There is also an allocator for |
I'll look into that one in a separate PR, we also have one for |
Hmm. The GitHub resolve conflicts web UI seems to have messed with the linebreaks in utilities.ts and tsc.ts. I'll have to fix that manually. |
@Swatinem: Thanks! I've merged these changes. |
I have been profiling tsc quite extensively recently, with the following super huge project:
Analyzing a heap snapshot, I noticed that a large portion of the memory usage is due to
Node
, which is 160 bytes. (~370M worth)Looking at the code, I noticed that TS already has the code to split
Token
andIdentifier
fromNode
, which I did.Token
is now 88 bytes, andIdentifier
is at 104 bytes.In total, this gives before stats of:
and after stats of:
Which in total is a ~6% reduction in Memory used, without any significant change in runtime, by such a simple change.
Thanks to using
node --expose-gc
, those numbers are really stable!But I am not done yet. Digging deeper into the code, and instrumenting things a bit, I noticed that literals (strings, numbers, etc) are also of type
Token
, but they participate in type-checking, and get some additional properties attached, which might spill out of the allocated storage, which might slow down accesses.I suspect splitting this up further into syntax only Tokens, and
Literal
types would even lead to improved performance.So I would really appreciate some help further building on this Idea:
Literal
nodes?--expose-gc
to get more stable numbers? How about using it inside of TS own perf infrastructure?