
NPM3 is coming #233

Closed
kant2002 opened this issue Jun 26, 2015 · 11 comments

@kant2002
Contributor

With the upcoming release of npm v3 and its flat dependency tree, we have to find another way to fix performance problems with large applications. The nested depth limit will no longer work.

Do you have any plans or ideas for how this could be improved further?

Potentially affected areas:

  • IntelliSense
    • General analysis logic - this needs the most investigation
    • nested modules limit
  • npm integration in solution explorer
  • We should verify this doesn't break require intellisense by displaying too many packages to choose from.

Potential actions to take:

  • We'll probably end up using npm ls --json to get the tree. (This command was not available when NTVS was started)
@mousetraps
Contributor

Yeah, we've been discussing this lately - npm v3 is one big core issue to handle - more so than just that one feature in analysis.

So the first step is to investigate the places where we rely on the nested dependency structure, so that we have a thorough understanding of all of the affected areas. Then we can discuss the appropriate fix.

The potentially affected feature areas include:

  • IntelliSense
    • General analysis logic - this needs the most investigation
    • nested modules limit
  • npm integration in solution explorer

And we should verify everything else.

@mousetraps
Contributor

And we'll probably end up using npm ls --json to get the tree (I don't believe this command was available when we started on NTVS)
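For illustration, `npm ls --json` prints the dependency tree as nested JSON (`{ name, version, dependencies: { ... } }`). A minimal sketch of flattening that output into a package list - the helper name and sample data here are illustrative, not NTVS code:

```javascript
// Flatten the nested JSON that `npm ls --json` prints into a flat
// list of "name@version" strings. The { dependencies: { name: { version,
// dependencies } } } shape matches npm's documented --json output.
function flattenNpmTree(tree, out = []) {
  for (const [name, info] of Object.entries(tree.dependencies || {})) {
    out.push(name + '@' + info.version);
    flattenNpmTree(info, out); // recurse into nested (conflicting) deps
  }
  return out;
}

// In practice the JSON would come from spawning npm, e.g.:
//   const { execSync } = require('child_process');
//   const tree = JSON.parse(execSync('npm ls --json', { encoding: 'utf8' }));
const sample = {
  name: 'my-app',
  dependencies: {
    'module-a': { version: '1.0.0', dependencies: { 'module-b': { version: '2.0.0' } } },
    'module-c': { version: '3.0.0' }
  }
};
console.log(flattenNpmTree(sample));
// → [ 'module-a@1.0.0', 'module-b@2.0.0', 'module-c@3.0.0' ]
```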

@kant2002
Contributor Author

One thing I think of when trying to improve the overall speed of analysis on modules with a deep structure is that we could possibly have a global cache of all analyzed modules. That way analysis could be done just on user code, since the other modules would be pre-cached. We would still have a large initial startup time, but data for IntelliSense would already be available on the second run, and hopefully for free for other projects.

I will take a look at other bugs which I could fix, to familiarize myself with the code base, then come back to discuss the details. The more I work with Node.js, the more I need VS. Thanks for the product 👍

@mousetraps
Contributor

Adding things as they come to me. We should verify this doesn't break require intellisense by displaying too many packages to choose from.

@mousetraps mousetraps added this to the July milestone Jun 28, 2015
@mousetraps mousetraps self-assigned this Jul 2, 2015
@mousetraps
Contributor

Okay, I'm starting on this one now. Some working notes so far:

  • got the npm node working-ish (need to fix a few quirks, but otherwise all is well and good.)
  • excluding all files in the .staging directory that npm v3 creates as it runs
  • working on fixing the nested modules limit. We can't quite combine it with the npm node logic because the npm nodes will update slower than files are getting included in the project. So for this I plan to do something similar to npm ls --depth=<nestedModulesLimit> | findstr <dirname>. The cool thing is that npm v3 creates a staging directory when it is working, so npm ls will always reflect the current state.
  • potentially need to work on perf - executing npm and parsing large json files isn't super efficient.
  • getting lots of exceptions in analysis, not sure what the cause is, but could be to do with a sad .ntvs_analysis file after changing npm versions.
  • no reason we can't maintain back compat with v2 afaik
  • we should probably get this change into the next release because it's going to be disruptive enough that I wouldn't feel comfortable including it without bake-time, but not so disruptive that it won't fit into the next release. Even though npm v3 won't be widely used, v2 will follow the same codepath, so it should get plenty of gametime.
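The .staging exclusion from the notes above could be sketched as a simple path filter (the predicate name and the staging-folder suffix in the sample are hypothetical; the `node_modules/.staging` location itself is npm v3's):

```javascript
// npm v3 unpacks packages into node_modules/.staging while an install
// is running; those half-downloaded files should be skipped by analysis.
function isInStagingDir(filePath) {
  return filePath.split(/[\\/]/).includes('.staging');
}

const files = [
  'node_modules/module-a/index.js',
  'node_modules/.staging/module-b-1a2b3c/index.js'
];
const analyzable = files.filter(f => !isInStagingDir(f));
console.log(analyzable); // → [ 'node_modules/module-a/index.js' ]
```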

@mousetraps
Contributor

The exceptions could also be due to how we reference dependencies during analysis that is somehow incompatible with v3.

Require IntelliSense is also broken (too many entries in the list)

@mousetraps
Contributor

More working thoughts now that I've played around with this some more and also chatted with Dino... much of this is a gross simplification of the actual work required, but gets the idea across...

Our static analysis and IntelliSense will be affected in five ways:

1. it will be more efficient

Let's start with the good 😃. Rather than analyzing the same package multiple times as we currently do, things will be de-duplicated by default and only added to the ModuleTree once.

2. we are showing too many entries in the require list

This one is a matter of using the package.json file rather than the folder structure under node_modules to understand what we should include in the require list. This will, however, mean that we won't get require completions for items that are top level but not added to package.json. So we may want to consider having different codepaths for the interactive window (where you may be working with packages that have not been added to package.json) and the editor. Either way, it's not a huge deal.
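A rough sketch of that package.json-driven completion list (the helper is illustrative; real code would also need to handle scoped packages, optionalDependencies, the interactive-window case, etc.):

```javascript
// Build the require() completion list from package.json's declared
// dependencies rather than from every directory under node_modules,
// so npm v3's flattened tree doesn't flood the list with transitive deps.
function requireCompletions(packageJson) {
  return Object.keys(Object.assign(
    {},
    packageJson.dependencies || {},
    packageJson.devDependencies || {}
  )).sort();
}

const pkg = {
  name: 'my-app',
  dependencies: { express: '^4.13.0' },
  devDependencies: { mocha: '^2.2.0' }
};
console.log(requireCompletions(pkg)); // → [ 'express', 'mocha' ]
```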

3. npm creates a .staging directory during install

We need to ignore this directory during analysis. The upside is that once we do that, we will get fewer race conditions and such because we won't be dealing with half-downloaded packages.

4. breaks our optimizations around node_modules depth

See the below bullet for a deeper dive - there may be a way we can re-use the code path described below to help solve this problem, so we shouldn't approach this issue until we fix the below issue.

5. we will not analyze enough stuff

This is the tougher issue, so I'll elaborate much more on it. The analysis engine analyzes all of the files in a module, and that module's children. Then, it creates links between the module and its dependencies. So if one of them changes, then we'll know to update the linked analysis unit. The problem with npm v3 is that... let's say module-a depends on module-b. In npm v2, the directory structure would be:

module-a
    node_modules
        module-b

module-a includes an index.js, that includes something like:

module-a

var b = require('module-b');
var myStringArg = 'my argument';
var a = b.foo(myStringArg);

Now, we would try to resolve module-b, and we would find it just fine, because it had been added to module-a's ModuleTree by our directory-based approach of looking at the child nodes.

However, with npm v3, module-b would not be found because the node_modules directory structure looks like this:

module-a
module-b

Now one question might be: can't we just let everything see everything?

Unfortunately no, and I'll communicate that using an example. If everything could see everything, then we would run into performance issues in the event that module-c depends on module-b as well. So the dependency tree would look like:

module-a
    module-b
module-c
    module-b

And let's say that module-c is using module-b differently than module-a is using module-b.

module-c

var b = require('module-b');
var myIntArg= 1;
var a = b.foo(myIntArg);

The difference between module-a and module-c in this example is that module-a is using the foo function, and passing in a string argument, whereas module-c is using the foo function and passing in an integer argument.

If you'll recall from some of our demos... the analysis engine analyzes the usages of a function to understand what types flow in and out. If you are using a function with a string argument, then the engine will pick that up as it tries to understand the rest of your code. If you are using a function with an integer argument, then the engine will pick that up. If you are using the same function with both string and integer arguments, the analysis will pick up that an argument is either a string or an integer, and will analyze both codepaths (and eventually we give up trying to keep track if you're using it in too many ways).

So back to our example, if we say that everything can see everything, then there will be three issues:

  1. we're going to give up trying to keep track of modules too soon
  2. significantly more noise
  3. As a result of #2, the scope of the usages will not be limited, which means that whereas before a change in module-b's usages would only be reflected in module-b's scope, now that change would be reflected in module-a's scope. This means we would be trying to analyze both the string and the int cases even though we should only need to analyze the string argument case. These unnecessary updates will likely lead to performance issues.

Great, now we understand the problem... but what about the solution? What options do we have?

It's clear we need to somehow understand the dependency tree. And there are two ways we can do that.

  1. Initially, we considered using npm ls --json to understand the tree. This makes sense for the solution explorer integration, which already takes a dependency on npm and where race conditions are not an issue. However, it takes a while (several seconds), spikes the CPU, and is going to cause all sorts of performance issues and race conditions if we attempt to use it during analysis, which is far more complicated and triggered far more frequently.
  2. The second approach is simply to rely on the filesystem, but just in a different way than we're currently doing. Rather than understanding that children are all of the names of the directories under the node_modules folder, we should base it on the dependencies as described in package.json. So the logic will roughly be (again it's a simplification)...
    • read package.json
    • grab dependency names, and attempt to resolve first in the node_modules folder below as we currently do (this would be the case in both npm v2, and npm v3 if there are conflicts)
    • then, search up the tree until we can properly resolve the dependency.

The second approach clearly makes the most sense, so (barring any unforeseen circumstances) that's what we're going to do. 😃

+@billti since you were asking about this the other day

@kant2002
Contributor Author

Excellent write-up. Thanks @mousetraps and @billti

About 4

Could you consider another option for IntelliSense and other areas?
Right now all analysis is one big .ntvs_analysis.dat. From my observations while debugging large modules, we have a very inefficient analyzer which does all of the work again and again. When I introduced the node_modules limit, it was solely a workaround to hide these inefficiencies.

While working on the modules limit, I thought about the following idea: what if we had an analysis cache for individual modules? That way:

  1. We could reuse analyses done on different projects.
  2. And on the same project, for sub-dependencies located in different tree paths.
  3. Speculation, since I'm not familiar with npm v3: if .staging replaces node_modules, then we would have to reanalyze the whole tree again, which would be expensive. I'd be interested to know how install actually works.

About 5.

Could the analysis engine, after analysis, return not the actual types from dependencies, but rather some proxy types?
For example

module-a
    module-b

Content of module-b

module.exports = function () {
    return "string";
}

Content of module-a

var f = require('module-b');
var x = f();

So when analyzing variable x, it would not have the type String, but rather "the type returned from invoking function f".
Actual type resolution would be done at runtime, and we could replace the analysis tree module by module in the case of an install.

I understand that this just sounds easy, but it would not be very easy. On the other hand, our analysis modules would be self-contained.
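One way to picture the proxy-type idea (purely illustrative - this is not how NTVS's analyzer works, and all names here are made up): instead of baking String into module-a's analysis, store a deferred lookup into a per-module cache, so re-analyzing module-b invalidates only that one link:

```javascript
// Illustrative only: a "proxy type" is a deferred lookup into a
// per-module analysis cache, resolved on demand instead of being
// copied into the dependent module's analysis.
const analysisCache = new Map();

function proxyReturnType(moduleName) {
  return {
    resolve: () => {
      const entry = analysisCache.get(moduleName);
      return entry && entry.returnType;
    }
  };
}

// Analysis of module-b records its result once...
analysisCache.set('module-b', { returnType: 'String' });
// ...and module-a's analysis stores only the proxy.
const xType = proxyReturnType('module-b');
console.log(xType.resolve()); // → 'String'

// Re-analyzing module-b (e.g. after an npm install) updates the cache;
// module-a's stored proxy yields the new type without re-analysis.
analysisCache.set('module-b', { returnType: 'Number' });
console.log(xType.resolve()); // → 'Number'
```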

Short summary

All of these thoughts come from a performance-optimization perspective, but I see npm v3 as a huge opportunity to also improve the performance of analysis along the way.

@mousetraps
Contributor

Regarding re-using dependencies: we're going to get a lot of that for free with npm v3 because dependencies within a project will be de-duplicated by default, so we should re-evaluate once we understand the perf gain we get from that (and whether it's sufficient). If Node were more similar to Python, in that it had a virtual environment you could re-use between projects, then it would make sense to have a global cache. But we also need to consider versioning between projects, and how we manage the cache. What if part of the cache gets corrupted? How do we fix issues there without re-analyzing every project?

Re: npm v3 staging, we need to look into this more. From what I can tell, it's not replacing the entire tree (just providing an area to start downloading the new stuff, so we just need to ignore it during analysis). We may run into issues during the "rebuild" command, but even then, I think it will only affect dependencies with conflicting versions.

Re: the hybrid static/dynamic analysis. It's an interesting idea, but unfortunately it would require much broader, sweeping changes than we can possibly consider at this stage of the product (and a much deeper understanding of this portion of the codebase than I currently have to offer 😉)

Re: other changes - we should definitely discuss and consider ways to improve the analysis engine, but there's a sense of urgency with some of these changes as well. To give you an idea of the timeline... it's always subject to change, but we're aiming to RTM 1.1 in about 1-1.5 months, and we're actually looking into getting these npm v3 changes into the 1.1 release, so we really can't afford to destabilize things that much this late in the 1.1 cycle.

Another idea for improving perf, triggered by your idea of a componentized analysis cache and the hybrid approach you mentioned: using .d.ts files to get us partway there, so we have a quick and effectively cached analysis of the most popular packages and would only need to analyze the remaining packages.

@kant2002
Contributor Author

Using .d.ts files would require us to create a parser for them, unless you could borrow some code from the current TypeScript support in VS. I will think about something like that in my free time.

Thanks for the clarification about the release schedule; by no means would I want to interfere. Over that time I will learn the codebase more, so I will be prepared if this is still needed.

One more question. It seems the main issues have more or less been discovered, and this issue could be split into individual topics, so we have more explicit problems to track during implementation. For example, IntelliSense, as you pointed out, is easy, so just by implementing and testing it we could uncover another set of subtle issues which are unknown from the current point. I could try to help with that.

For sub-issue 5, I am also in favor of relying on the filesystem (approach 2) and walking backward until we find the dependency. Maybe after that everything will just work, and we'll have npm v3 support almost for free, without large changes.

Having even partial support for npm v3 is a huge win for early adopters like me. Right now I can't try it, since I would lose the benefits which NTVS gives me.

@mousetraps mousetraps modified the milestones: July, August Aug 4, 2015
mousetraps added a commit to mousetraps/nodejstools that referenced this issue Aug 26, 2015
- support solution explorer tree view
- analysis depth optimizations
- add .staging to ignored directories
- ensure global scope visibility is not limited during analysis

microsoft#308 VS2015 RTM exceptions on project load
- ensure key is not null before adding package
billti added a commit that referenced this issue Aug 27, 2015
#233 npm v3 support, #308 VS2015 RTM exceptions on project load
@mousetraps
Contributor

The bulk of this work has been completed now (see #233), and will be available in our next release. Closing this, and filing new issues for the remaining work.
