Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Node.js vendor namespace #43413

Closed
jasnell opened this issue Jun 13, 2022 · 16 comments
Closed

Proposal: Node.js vendor namespace #43413

jasnell opened this issue Jun 13, 2022 · 16 comments
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. util Issues and PRs related to the built-in util module.

Comments

@jasnell
Copy link
Member

jasnell commented Jun 13, 2022

One of the discussions at the Node.js Collaborator Summit in Austin last week was about the strategy of vendoring in Node.js ecosystem modules to distribute with the node binary. The proposal discussed was to introduce a new vendor: or vendored: specifier prefix to access such modules included in the distribution. This issue is for discussing that.

This issue was prompted by discussions on the @nodejs/tooling team.

For the sake of this introduction, we'll assume the namespace is vendor:, but that name still needs to be decided on.

The Proposal ...

The vendor: namespace provides a clear namespace for exposing vendored-in dependencies distributed with the Node.js binary. For instance, assuming some arbitrary npm module foo that Node.js decided, for whatever reason, to bundle and distribute with the binary, the module specifier vendor:foo can be used to load that module from within the node binary as opposed to searching for it in the node_modules path.

For a more practical example, Node.js currently bundles the undici HTTP client to provide the implementation of the fetch() API but does not expose the full undici module API. If Node.js did decide to expose the full undici module underlying the fetch implementation, it could decide to do so, using the vendor:undici specifier.

Modules exported within the vendor: namespace are not Node.js built-in modules. They are simply ecosystem modules that are distributed with the node binary. These modules are:

  • Developed independently of Node.js core.
  • May or may not be developed by the Node.js project under the OpenJS Foundation
  • May or may not have been developed to the same standards/guidelines as the Node.js project (for instance, they might not make use of the Node.js internal errors or primordials mechanisms).
  • May or may not be within the nodejs GitHub Organization

Bugs in the vendored module are not considered to be bugs in Node.js (this is similar to how we treat bugs in npm, for instance, which is also distributed with the Node.js binary).

Why would we do this?

Some modules are used by nearly every one, provide core functionality needed by Node.js itself, and are so important to the ecosystem that bundling them just makes sense. Many of these are already bundled with the node.js distribution via dependencies to the npm client. Whether or not there is sufficient justification to include these in the distribution itself is part of this conversation. Specifically, what is our decision matrix for accepting these?

Small Core!!?

Adding a vendor module does not increase the LTS supported core API surface area of Node.js, and most of the modules that will be considered for bundling are already distributed with Node.js as dependencies of the npm client.

What about versions?

Yes, vendored modules might end up out of date with what is published on the npmjs registry. Node.js will make a best effort to keep the dependencies up to date but it is expected to be common that a vendored version will be older than the version available on npm. If an application wants to make sure that it is using the latest version and not the one bundled in Node.js, then it can continue to list that module as a dependency in its package.json and use the non-prefixed specifier when importing/requiring the module.

What about LTS?

Node.js will support vendored modules only in the sense that Node.js will ensure that non-breaking or security updates to vendored modules will be handled as part of LTS. For instance, if there is a security release for a version of a vendored module, then a new Node.js release with the updated version will be provided. Semver-major and some semver-minor releases to those vendored modules, however, will not be updated in LTS versions, following the spirit of our existing LTS contract. The Node.js release team will have final discretion on details relating to the handling of updates to vendored modules in LTS releases.

What can go into vendor: ?

Adding a new vendor: module is a big decision that can have significant ecosystem impact, it also can have the effect of greatly increasing the size of the node binary distribution. Great care and consideration must be taken in the decision to add a module. A defined set of criteria must be established. These can include the number of dependencies in the ecosystem, considerations about the quality of the code, desire of the maintainers of the code to have the module bundled, generalized utility of the module to the largest number of use cases, and utility of the module to Node.js' own use (e.g. we bundle undici because we directly depend on undici for fetch(), exposing it in vendor: would be optional but also fairly straightforward without introducing a significant additional burden).

Strong consideration should be given to limiting vendored modules to only those that have been contributed to either the Node.js project (the way, for instance, undici has been) or are covered under the OpenJS Foundation open governance model. Especially important is that vendored modules are not maintained by a single source or a single decision maker. In fact, as part of the discussion as to whether a module should be added to vendor:, it is reasonable to request that the module is contributed to Node.js or to the OpenJS Foundation as a first step, and that the project have at least 3+ core contributors with commit/write/publish access..

Why vendor:foo and not @node/foo or vendor/foo or node:foo?

There is already a vendor module on npm. Using a specifier such as vendor/foo would directly conflict with that existing module. Using any other /-prefix could potentially suffer from the same constraint. It's doable, yes, but using the :-prefix avoids this challenge.

Using node:foo would run the risk of a vendored module being confused with an official Node.js built-in module.

Using something like @node could be problematic if the module that is being vendored already has an @-prefix. For example, if we have a module like @abc/foo, then @node/@abc/foo is rather clumsy and awkward. vendor:@abc/foo is clearer and less awkward (at least to me)

Vulnerabilities

The risk of shipping additional vulnerabilities in Node.js exists. A few mitigation must be put in place to avoid such risk:

  • Processes to update those dependencies in LTS releases if/when a vulnerabilities in them is disclosed
  • An API to query which version of said module is being shipped within Node.js (e.g. process.versions.vendor or equivalent)
  • Before vendoring, the frequency of vulnerabilities in a module should be considered and evaluated. For example, something that consistently ships vulnerability releases might not be a good fit for vendoring due to the increased workload on releasers and increased churn for end-users.

/cc @bcoe @darcyclarke @nodejs/tooling

@jasnell jasnell added the feature request Issues that request new features to be added to Node.js. label Jun 13, 2022
@mhdawson
Copy link
Member

mhdawson commented Jun 14, 2022

Overall I think this is a good discussion to have so thanks for capturing the discussion at the collaborator summit and submitting a comprehensive issue for it.

One initial comment:

Adding a vendor module does not increase the LTS supported core API surface area of Node.js,

I'm not sure this is a valid assumption. If we add something to Node.js and ship it I think the expectation will be that we will meet the same expectations in terms of LTS guarantees (ie no breaking changes as you say above). It think we can say that fixes for non-security related issues will wait until they make it into the vendored dependency and we have a good time to update. On the other hand I think we'll still have to repect SemVer in a given LTS release and will have to do security releases if vulnerabilities are reported against the vendored components. Using the npm example we often get questions about vulnerabilities reported against npm and need to follow up and respond to questions about impact and when a fix will be available.

@jasnell
Copy link
Member Author

jasnell commented Jun 14, 2022

I think I just worded that piece incorrectly. Essentially, we treat changes in these pieces just as we would any dependency -- that is, the changes are made upstream in each respective project and not within the nodejs/node. Then those changes flow in with version updates. In other words, core contributors are not responsible for supporting/triaging those APIs, although they may be the ones that are updating the dependency in our tree.

@mhdawson
Copy link
Member

@jasnell thanks, for the clarification, your description in #43413 (comment) sounds better.

@BethGriggs
Copy link
Member

For instance, if there is a security release for a version of a vendored module, then a new Node.js release with the updated version will be provided. Semver-major and some semver-minor releases to those vendored modules, however, will not be updated in LTS versions, following the spirit of our existing LTS contract.

One challenge with that is that many modules do not maintain multiple release lines and will only provide security patches for the latest major version. This likely wouldn't be a problem for the cases where the security patches are simple enough to cherry-pick and adjust. And also a number of the vendored module candidates may be very unlikely to ever cut new major versions. But, I do think it's a consideration we should bear in mind.

Perhaps @nodejs/npm could speak to any of the challenges in that regard (from maintaining both npm@8/npm@6 in node).

@jasnell jasnell added the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Jun 27, 2022
@GeoffreyBooth GeoffreyBooth added util Issues and PRs related to the built-in util module. module Issues and PRs related to the module subsystem. esm Issues and PRs related to the ECMAScript Modules implementation. labels Jul 10, 2022
@GeoffreyBooth
Copy link
Member

One thing to consider is breaking changes in new versions of vendored dependencies. To take one example, Undici is currently at version 5.6.1. Imagine that we include this in Node 20.0.0. Then Undici releases 6.0.0; do we include this breaking-change version of Undici in Node 20.1.0? I assume Node 20.x is stuck with Undici 5.x—or maybe even 5.6.x?

Or we could bundle several versions of Undici and provide access to all of them, like vendor:undici@5, vendor:undici@6 etc. But then our binary gets bloated, and it becomes difficult to remove old versions because users’ code might explicitly import an @5 version that maybe we can technically drop in Node 21.0.0 but might not want to because it’s such a bad breaking change.

So assuming we can only include one major version of a vendored dependency with each Node major version, this implies that we should be very judicious in which packages we choose to vendor within core: we probably only want to include very stable ones like semver or chalk that rarely change. And we should write down a policy along the lines of “vendored dependencies’ minor versions will bump along with Node’s minor versions, and major versions with major versions” or whatever we decide upon.

@bnb
Copy link
Contributor

bnb commented Jul 11, 2022

@GeoffreyBooth I'm less concerned about this issue, though I agree a specific process should be documented, since it's relatively solved by falling back to the current status quo - add the package to your package.json and pin it if you need to.

One factor to consider here, I suppose, is that this will potentially increase friction for those who are upgrading from version to version, increasing potential drift across versions.

Another factor that I've not seen considered here: what's the mechanism to patch a vendored module? I've seen many organizations float internal patches of third-party modules either for feature reasons (adding/removing features) or for security reasons (fix a bug until a release is shipped). Heck, patch-package has 1m weekly downloads, which is high for that kind of module. It would likely be smart to provide such an option if we do go down this path. Such an interface might also be a good mechanism for Node.js itself to still maintain control and be able to ship fixes for vendored modules faster.

@GeoffreyBooth
Copy link
Member

what's the mechanism to patch a vendored module?

Along those lines, how will we make ESM vendor packages available to CommonJS?

@Snapstromegon
Copy link

I have some strong opinions regarding this and I believe that this is not a good idea. In the following I'll try to present my reasons to opposing this proposal as a daily user of nodejs from everything in small tools, large web apps, from constrained devices to large servers and from my personal machine to larger FaaS deployments.

Necessaty

I do not think that nodejs needs such a system, simply because it's already well established to install additional dependencies with other ways. Also while it's nice to have everything you need bundled in, I often resort to at least some smaller packages in every project which would probably not be bundled in. So it's very likely that the majority of users will likely need to install additional packages anyways.

Binary Size

Especially when working on constrained devices (not only CPU/RAM constrained) or when downloading the runtime very often, the size of the node installation does matter. This might not / won't be a problem at the start, what about 10 years in the future. When will we stop including a package or how do we limit the number of included packages. In my projects bundle size of the runtime is one of the factors to use nodejs with JS over e.g. python.

Versioning

I also share @GeoffreyBooth's concerns regarding versioning. At the moment with the example undici, the exposed surface is part of nodejs's API and versioned with it. If undici itself is exposed, versioning will be significantly harder and shipping multiple versions of a single package again would add to bloating the installation.

Promoting suboptimal packages

Raising a package to become a vendored package will likely aid the distribution of said package while hindering competitive packages. While this can also be seen as an upside, I think we have good examples in the ecosystem where we see good competition improving the situation for everyone. I personally think e. g. fastify would not have grown that big if express were included as a vendored package.

Release Frequency

While we all try our best to keep security risks away from our projects, they do occur and when they come up and get fixed, we need to act fast and distrubute the fixes as fast as possible. This would also mean that security fixes in vendored packages will likely result in a nodejs release. Again, the more packages are included in the installation, the more often we'll likely see releases because of this. Even more important for admins caring about release notes, this is likely to create more work, since you need to get, read, understand and filter every release note for such a release to decide wether or not you use the affected packages and wether or not to warrant potential issues of an update.

Removing and Abandoning Packages

People and even groups of people can leave projects unannounced and in fast frequency for many reasons. Even if there is a plan to remove packages in future versions of nodejs, we need a way to keep all vendored packages maintained for at least the lifetime of the last LTS it was included in. This might mean worst case that e. g. the OpenJS Foundation or nodejs will have to maintain packages for about 3 years (initial relase - EOL) before it can finally be abandoned.

Deciding on what to include

This is a big topic with probably way too many opinions floating around ranging from include everything to take only what's already distributed.

most of the modules that will be considered for bundling are already distributed with Node.js as dependencies of the npm client

This quote is both good and concerning IMO. On the one hand it would fight against bloat if only packages that are used anyways get included, but on the otherhand it leaves the potential of this proposal nearly untouched.
As I see it (and correct me if I'm wrong), the base idea of this is to avoid everyone having to install these modules, but especially when they are indirect dependencies of a project, I think the impact will be minimal and if they are direct ones, I think the group of users who don't need/want the control over the version via an explicit installation and need the packages included in the vendored group (if it's kept minimal) is fairly small.

So overall I understand the proposal and what the potential of it is, but I personally think that the risks and downsides outweigh the benefits. Because of that I personally oppose this proposal.

@jasnell
Copy link
Member Author

jasnell commented Jul 11, 2022

@GeoffreyBooth:

Along those lines, how will we make ESM vendor packages available to CommonJS?

Keep in mind that vendor packages will be known to node.js at compile time and can be included in any startup snapshot mechanisms. We would have the opportunity to optimize the loading of such modules such that they can be require()'d synchronously or imported statically or dynamically. It certainly would not be trivial, but absolutely possible.

@GeoffreyBooth:

...this implies that we should be very judicious in which packages we choose to vendor within core

I don't think there is anyone that would argue with this at all. In fact, we should keep the bar raised as high as we possibly can here. This proposal isn't about opening a flood gate to let a bunch of modules in but to provide a reasonable approach for the very limited handful of modules we may want to consider for this.

@mcollina
Copy link
Member

We talked briefly about this in today's TSC meeting. The conversation is happening here, @jasnell let us know if there is a specific question for the TSC. Removing from the agenda for now.

@mcollina mcollina removed the tsc-agenda Issues and PRs to discuss during the meetings of the TSC. label Jul 13, 2022
@GeoffreyBooth
Copy link
Member

I’m starting to reconsider whether the benefits outweigh the costs. The issues with versioning and support imply that we can’t really have our cake and eat it too; if we vendor a library, we become responsible for it in many respects. And I agree that the “picking winners” issue is real, and could really anger package authors.

Even Chalk, to take an isolated case; is it really so essential that core include this functionality? Maybe if we could implement it in a few dozen lines of code, but as we’ve learned in #43382 , getting colorized terminal text right is insanely complicated. Perhaps this is best left to the ecosystem, where competing packages can take varying approaches and differing opinions on terminal standards to support.

Or put another way, if something is important enough to move into core, like fetch, then maybe it’s important enough to just move into core, and get full support as a part of Node itself? What’s a case besides Chalk that we might consider wanting to vendor?

@wraithgar
Copy link

What’s a case besides Chalk that we might consider wanting to vendor?

semver was one that came up in originating discussions for this issue.

@GeoffreyBooth
Copy link
Member

semver was one that came up in originating discussions for this issue.

I remember that coming up as a candidate to move into the Node organization, like github.com/nodejs/semver, because the semver maintainers wanted more support; and that would be fine. I’m not sure how much sense it makes to distribute with Node, both because Node core itself doesn’t use it and also because I assume there are relatively few users who depend on it directly? I know its download counts are huge but I assume that’s mostly because some very popular tools depend on semver, like npm itself.

@Snapstromegon
Copy link

@GeoffreyBooth I totally agree with that assessment. While there are projects I maintain, which use semver as a direct dependency, those are far from usual and most of the time I have it as an indirect dependency.

Semver would on the other hand be one of those cases where you could say that there is a "correct" package/implementation to use (like one would strain away from using a custom version of openssl), which would make it okay to include in core/vendored packages.

@mhdawson
Copy link
Member

As mentioned by @GeoffreyBooth

if we vendor a library, we become responsible for it in many respects.

That along with versioning as he also brought up are significant issues we need to discuss/figure out in advance.

@jasnell
Copy link
Member Author

jasnell commented Jul 14, 2022

Given the responses here and the complexity of the issues involved, I'm going to close this issue and table this proposal for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
esm Issues and PRs related to the ECMAScript Modules implementation. feature request Issues that request new features to be added to Node.js. module Issues and PRs related to the module subsystem. util Issues and PRs related to the built-in util module.
Projects
None yet
Development

No branches or pull requests

8 participants