
NPM3 is coming #233

Closed
kant2002 opened this issue Jun 26, 2015 · 11 comments

@kant2002
Contributor

With the upcoming release of npm v3 and its flat dependency tree, we have to find another way to fix performance problems with large applications. The nested depth limit will no longer work.

Do you have any plans or ideas for how this could be improved further?

Potentially affected areas:

  • IntelliSense
    • General analysis logic - this needs the most investigation
    • nested modules limit
  • npm integration in solution explorer
  • We should verify this doesn't break require intellisense by displaying too many packages to choose from.

Potential actions to take:

  • We'll probably end up using npm ls --json to get the tree. (This command was not available when NTVS was started)
@mousetraps
Contributor

Yeah, we've been discussing this lately - npm v3 is one big core issue to handle - more so than just that one feature in analysis.

So the first step is to investigate the places where we rely on the nested dependency structure, so that we have a thorough understanding of all of the affected areas. Then we can discuss the appropriate fix.

The potentially affected feature areas include:

  • IntelliSense
    • General analysis logic - this needs the most investigation
    • nested modules limit
  • npm integration in solution explorer

And we should verify everything else.

@mousetraps
Contributor

And we'll probably end up using npm ls --json to get the tree (I don't believe this command was available when we started on NTVS)
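For illustration, `npm ls --json` prints the dependency tree as nested JSON (`{ name, version, dependencies: { ... } }`). A minimal sketch of flattening that output into a package list - the helper name and sample data here are illustrative, not NTVS code:

```javascript
// Flatten the nested JSON that `npm ls --json` prints into a flat
// list of "name@version" strings. The { dependencies: { name: { version,
// dependencies } } } shape matches npm's documented --json output.
function flattenNpmTree(tree, out = []) {
  for (const [name, info] of Object.entries(tree.dependencies || {})) {
    out.push(name + '@' + info.version);
    flattenNpmTree(info, out); // recurse into nested (conflicting) deps
  }
  return out;
}

// In practice the JSON would come from spawning npm, e.g.:
//   const { execSync } = require('child_process');
//   const tree = JSON.parse(execSync('npm ls --json', { encoding: 'utf8' }));
const sample = {
  name: 'my-app',
  dependencies: {
    'module-a': { version: '1.0.0', dependencies: { 'module-b': { version: '2.0.0' } } },
    'module-c': { version: '3.0.0' }
  }
};
console.log(flattenNpmTree(sample));
// → [ 'module-a@1.0.0', 'module-b@2.0.0', 'module-c@3.0.0' ]
```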

@kant2002
Contributor Author

One thing I think of when trying to improve the overall speed of analysis on modules with a deep structure is that we could possibly have a global cache of all analyzed modules. That way analysis could be done just on user code, since the other modules would be pre-cached. We would still have a large initial startup time, but data for IntelliSense would already be available on the second run, and hopefully for free for other projects.

I will take a look at other bugs which I could fix, to familiarize myself with the code base, then come back to discuss the details. The more I work with Node.js, the more I need VS. Thanks for the product 👍

@mousetraps
Contributor

Adding things as they come to me. We should verify this doesn't break require intellisense by displaying too many packages to choose from.

@mousetraps mousetraps added this to the July milestone Jun 28, 2015
@mousetraps mousetraps self-assigned this Jul 2, 2015
@mousetraps
Contributor

Okay, I'm starting on this one now. Some working notes so far:

  • got the npm node working-ish (need to fix a few quirks, but otherwise all is well and good.)
  • excluding all files in the .staging directory that npm v3 creates as it runs
  • working on fixing the nested modules limit. We can't quite combine it with the npm node logic because the npm nodes will update slower than files are getting included in the project. So for this I plan to do something similar to npm ls --depth=<nestedModulesLimit> | findstr <dirname>. The cool thing is that npm v3 creates a staging directory when it is working, so npm ls will always reflect the current state.
  • potentially need to work on perf - executing npm and parsing large json files isn't super efficient.
  • getting lots of exceptions in analysis, not sure what the cause is, but could be to do with a sad .ntvs_analysis file after changing npm versions.
  • no reason we can't maintain back compat with v2 afaik
  • we should probably get this change into the next release because it's going to be disruptive enough that I wouldn't feel comfortable including it without bake-time, but not so disruptive that it won't fit into the next release. Even though npm v3 won't be widely used, v2 will follow the same codepath, so it should get plenty of gametime.
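The .staging exclusion from the notes above could be sketched as a simple path filter (the predicate name and the staging-folder suffix in the sample are hypothetical; the `node_modules/.staging` location itself is npm v3's):

```javascript
// npm v3 unpacks packages into node_modules/.staging while an install
// is running; those half-downloaded files should be skipped by analysis.
function isInStagingDir(filePath) {
  return filePath.split(/[\\/]/).includes('.staging');
}

const files = [
  'node_modules/module-a/index.js',
  'node_modules/.staging/module-b-1a2b3c/index.js'
];
const analyzable = files.filter(f => !isInStagingDir(f));
console.log(analyzable); // → [ 'node_modules/module-a/index.js' ]
```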

@mousetraps
Contributor

The exceptions could also be due to how we reference dependencies during analysis that is somehow incompatible with v3.

Require IntelliSense is also broken (too many entries in the list)

@mousetraps
Contributor

More working thoughts now that I've played around with this some more and also chatted with Dino... much of this is a gross simplification of the actual work required, but gets the idea across...

Our static analysis and IntelliSense will be affected in five ways:

1. it will be more efficient

Let's start with the good 😃. Rather than analyzing the same package multiple times as we currently do, things will be de-duplicated by default and only added to the ModuleTree once.

2. we are showing too many entries in the require list

This one is a matter of using the package.json file rather than the folder structure under node_modules to understand what we should include in the require list. This will, however, mean that we won't get require completions for items that are top level but not added to package.json. So we may want to consider having different codepaths for the interactive window (where you may be working with packages that have not been added to package.json) and the editor. Either way, it's not a huge deal.
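A rough sketch of that package.json-driven completion list (the helper is illustrative; real code would also need to handle scoped packages, optionalDependencies, the interactive-window case, etc.):

```javascript
// Build the require() completion list from package.json's declared
// dependencies rather than from every directory under node_modules,
// so npm v3's flattened tree doesn't flood the list with transitive deps.
function requireCompletions(packageJson) {
  return Object.keys(Object.assign(
    {},
    packageJson.dependencies || {},
    packageJson.devDependencies || {}
  )).sort();
}

const pkg = {
  name: 'my-app',
  dependencies: { express: '^4.13.0' },
  devDependencies: { mocha: '^2.2.0' }
};
console.log(requireCompletions(pkg)); // → [ 'express', 'mocha' ]
```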

3. npm creates a .staging directory during install

We need to ignore this directory during analysis. The upside is that once we do that, we will get fewer race conditions and such because we won't be dealing with half-downloaded packages.

4. breaks our optimizations around node_modules depth

See the below bullet for a deeper dive - there may be a way we can re-use the code path described below to help solve this problem, so we shouldn't approach this issue until we fix the below issue.

5. we will not analyze enough stuff

This is the tougher issue, so I'll elaborate much more on it. The analysis engine analyzes all of the files in a module, and that module's children. Then, it creates links between the module and its dependencies. So if one of them changes, then we'll know to update the linked analysis unit. The problem with npm v3 is that... let's say module-a depends on module-b. In npm v2, the directory structure would be:

module-a
    node_modules
        module-b

module-a includes an index.js, that includes something like:

module-a

var b = require('module-b');
var myStringArg = 'my argument';
var a = b.foo(myStringArg);

Now, we would try to resolve module-b, and we would find it just fine, because it had been added to module-a's ModuleTree by our directory-based approach of looking at the child nodes.

However, with npm v3, module-b would not be found because the node_modules directory structure looks like this:

module-a
module-b

Now one question might be: can't we just let everything see everything?

Unfortunately no, and I'll communicate that using an example. If everything could see everything, then we would run into performance issues in the event that module-c depends on module-b as well. So the dependency tree would look like:

module-a
    module-b
module-c
    module-b

And let's say that module-c is using module-b differently than module-a is using module-b.

module-c

var b = require('module-b');
var myIntArg= 1;
var a = b.foo(myIntArg);

The difference between module-a and module-c in this example is that module-a is using the foo function, and passing in a string argument, whereas module-c is using the foo function and passing in an integer argument.

If you'll recall from some of our demos... the analysis engine analyzes the usages of a function to understand what types flow in and out. If you are using a function with a string argument, then the engine will pick that up as it tries to understand the rest of your code. If you are using a function with an integer argument, then the engine will pick that up. If you are using the same function with both string and integer arguments, the analysis will pick up that an argument is either a string or an integer, and will analyze both codepaths (and eventually we give up trying to keep track if you're using it in too many ways).

So back to our example, if we say that everything can see everything, then there will be three issues:

  1. we're going to give up trying to keep track of modules too soon
  2. significantly more noise
  3. As a result of #2, the scope of the usages will not be limited, which means that whereas before a change in module-b's usages would only be reflected in module-b's scope, now that change would be reflected in module-a's scope. This means we would be trying to analyze both the string and the int cases even though we should only need to analyze the string argument case. These unnecessary updates will likely lead to performance issues.

Great, now we understand the problem... but what about the solution? What options do we have?

It's clear we need to somehow understand the dependency tree. And there are two ways we can do that.

  1. Initially, we considered using npm ls --json to understand the tree. This makes sense for the solution explorer integration, which already takes a dependency on npm and where race conditions are not an issue. However, it takes a while (several seconds), spikes the CPU, and is going to cause all sorts of performance issues and race conditions if we attempt to use it during analysis, which is far more complicated and triggered far more frequently.
  2. The second approach is simply to rely on the filesystem, but just in a different way than we're currently doing. Rather than understanding that children are all of the names of the directories under the node_modules folder, we should base it on the dependencies as described in package.json. So the logic will roughly be (again it's a simplification)...
    • read package.json
    • grab dependency names, and attempt to resolve first in the node_modules folder below as we currently do (this would be the case in both npm v2, and npm v3 if there are conflicts)
    • then, search up the tree until we can properly resolve the dependency.

The second approach clearly makes the most sense, so (barring any unforeseen circumstances) that's what we're going to do. 😃

+@billti since you were asking about this the other day

@kant2002
Contributor Author

Excellent write-up. Thanks @mousetraps and @billti

About 4

Could you consider another option for IntelliSense and other areas?
Right now all analysis is one big .ntvs_analysis.dat. From my observations while debugging large modules, we have a very inefficient analyzer which does all of the work again and again. When I introduced the node_modules limit, it was solely a workaround to hide these inefficiencies.

While working on the modules limit, I thought about the following idea: what if we had an analysis cache for individual modules? That way:

  1. We could reuse analyses done on different projects.
  2. And on the same project, for sub-dependencies located in different tree paths.
  3. Speculation, since I'm not familiar with npm v3: if .staging replaces node_modules, then we would have to reanalyze the whole tree again, which would be expensive. I'd be interested to know how install actually works.

About 5.

Could the analysis engine, after analysis, return not the actual types from dependencies, but rather some proxy types?
For example

module-a
    module-b

Content of module-b

module.exports = function () {
    return "string";
}

Content of module-a

var f = require('module-b');
var x = f();

So when analyzing variable x, it would not have the type String, but rather "the type returned from invoking function f".
Actual type resolution would be done at runtime, and we could replace the analysis tree module by module in the case of an install.

I understand that this just sounds easy, but it would not be very easy. On the other hand, our analysis modules would be self-contained.
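One way to picture the proxy-type idea (purely illustrative - this is not how NTVS's analyzer works, and all names here are made up): instead of baking String into module-a's analysis, store a deferred lookup into a per-module cache, so re-analyzing module-b invalidates only that one link:

```javascript
// Illustrative only: a "proxy type" is a deferred lookup into a
// per-module analysis cache, resolved on demand instead of being
// copied into the dependent module's analysis.
const analysisCache = new Map();

function proxyReturnType(moduleName) {
  return {
    resolve: () => {
      const entry = analysisCache.get(moduleName);
      return entry && entry.returnType;
    }
  };
}

// Analysis of module-b records its result once...
analysisCache.set('module-b', { returnType: 'String' });
// ...and module-a's analysis stores only the proxy.
const xType = proxyReturnType('module-b');
console.log(xType.resolve()); // → 'String'

// Re-analyzing module-b (e.g. after an npm install) updates the cache;
// module-a's stored proxy yields the new type without re-analysis.
analysisCache.set('module-b', { returnType: 'Number' });
console.log(xType.resolve()); // → 'Number'
```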

Short summary

All of these thoughts come from a performance-optimization perspective, but I see npm v3 as a huge opportunity to also improve the performance of analysis along the way.

@mousetraps
Contributor

Regarding re-using dependencies: we're going to get a lot of that for free with npm v3 because dependencies within a project will be de-duplicated by default, so we should re-evaluate once we understand the perf gain we get from that (and whether it's sufficient). If Node were more similar to Python, in that it had a virtual environment you could re-use between projects, then it would make sense to have a global cache. But we also need to consider versioning between projects, and how we manage the cache. What if part of the cache gets corrupted? How do we fix issues there without re-analyzing every project?

Re: npm v3 staging, we need to look into this more. From what I can tell, it's not replacing the entire tree (just providing an area to start downloading the new stuff, so we just need to ignore it during analysis). We may run into issues during the "rebuild" command, but even then, I think it will only affect dependencies with conflicting versions.

Re: the hybrid static/dynamic analysis. It's an interesting idea, but unfortunately it would require much broader, sweeping changes than we can possibly consider at this stage of the product (and a much deeper understanding of this portion of the codebase than I currently have to offer 😉)

Re: other changes - we should definitely discuss and consider ways to improve the analysis engine, but there's a sense of urgency with some of these changes as well. To give you an idea of the timeline... it's always subject to change, but we're aiming to RTM 1.1 in about 1-1.5 months, and we're actually looking into getting these npm v3 changes into the 1.1 release, so we really can't afford to destabilize things that much this late in the 1.1 cycle.

Another idea for improving perf, triggered by your idea of a componentized analysis cache and the hybrid approach you mentioned: using .d.ts files to get us partway there, so we have a quick and effectively cached analysis of the most popular packages and would only need to analyze the remaining packages.

@kant2002
Contributor Author

Using .d.ts files would require us to create a parser for them, unless you could borrow some code from the current TypeScript support in VS. I will think about something like that in my free time.

Thanks for the clarification about the release schedule; by no means would I want to interfere. Over that time I will learn the codebase more, so I will be prepared if this is still needed.

One more question. It seems the main issues have more or less been discovered, and this issue could be split into individual topics, so we have more explicit problems to track during implementation. For example, IntelliSense, as you pointed out, is easy, so just by implementing and testing it we could uncover another set of subtle issues which are unknown from the current point. I could try to help with that.

For sub-issue 5, I am also in favor of relying on the filesystem (approach 2) and walking backward until we find the dependency. Maybe after that everything will just work, and we'll have npm v3 support almost for free, without large changes.

Having even partial support for npm v3 is a huge win for early adopters like me. Right now I can't try it, since I would lose the benefits which NTVS gives me.

@mousetraps mousetraps modified the milestones: July, August Aug 4, 2015
mousetraps added a commit to mousetraps/nodejstools that referenced this issue Aug 26, 2015
- support solution explorer tree view
- analysis depth optimizations
- add .staging to ignored directories
- ensure global scope visibility is not limited during analysis

microsoft#308 VS2015 RTM exceptions on project load
- ensure key is not null before adding package
billti added a commit that referenced this issue Aug 27, 2015
#233 npm v3 support, #308 VS2015 RTM exceptions on project load
@mousetraps
Contributor

The bulk of this work has been completed now (see #233), and will be available in our next release. Closing this, and filing new issues for the remaining work.
