Experimental: Symlinks Just Work #9719
@VanCoding @isaacs You might be interested in this |
Major unless proven otherwise, sorry.
That is to say, at least, do not expect when/if it will land but we appreciate the effort and care about it nonetheless. |
@Fishrock123 Researching this will require participation from the ecosystem. Expecting them to git clone a fork/branch from an unknown org, and then build a version of Given how simple (and easily reviewable) the changes are, and that by simply including the PR the default behavior of Both of your concerns I have spoken to and addressed in the description, and leave me with the general impression that you are simply ignoring the details. How unfortunate for those of us who use |
This is mistaken, the tests unfortunately do not cover all of what is possible via the APIs. Sorry if I came off wrong, I wanted to let you know the likely response upfront. I don't have time to read it right now and am mostly out of the conversation other than that landing of this needs extra review. The module system is "locked" with reason, although it is possible for changes to happen. |
... pardon? Any other collaborators here have about as much say as I do.
I know little about how the module system works. I, myself specifically, have other more pressing things to do at this time. Maybe others will chime in (American thanksgiving is this week so people may be busy). Long explanations attempting to... I don't even know what about my position will not help anyone else read this thread. Your best course of action was, is, and usually will be to wait a bit on large issues. |
As I said I'm new to this so my sincere apologies; I was only wanting to better understand you and your statements and this process. Github indicates that you are a I understand it will take time for this to be reviewed and so forth, but again it came across in a way that implied it might never come to that, and given what I thought was your position I reasoned others might give weight to your words and conclude this PR would never come to be. My sincerest apologies as I detect I've irritated you. |
@Trott That was incredibly kind of you. Thank you. |
@Trott I'm presuming you're running the tests with the switch off, to verify default behavior has not been altered? I hope that will help get the PR merged someday. What would also be interesting is to run with the switch on. Is it relatively straight forward to run the tests, but ensuring every spawned process have an environment variable set? |
@Trott Thanks again for running those tests on I'm looking at the early results and the tests seem to be failing for reasons such as node's version being 8.0.0-pre, and not necessarily from a failure in the Modules subsystem, so I'm wondering if there's a baseline of some kind to distinguish between failures in the 8.0.0-pre master and those as a result of the PR? |
Yes.
I don't use that CI job too much. Let me @-mention someone who uses it a lot and see what they have to say: @thealphanerd |
Think I've scared everyone away, probably before they even got here. Anyways, I've been developing this PR on Windows. In getting citgm running on a linux vm, I ran with the PR switch on, and hit the first issue. This is actually an issue in In my testing, This worked fine on Windows, and I could also still run globally installed commands. However, on linux when trying to run a global command, like npm, it failed with the PR active. This is because on linux, npm creates an executable filetype-symlink directly to the You have to follow the symlink for the So, probably how |
@phestermcs I don't want to discourage you on this. You've done a chunk of work and you clearly feel strongly about it and have thought about it a lot. But there are big obstacles for this. It doesn't mean it's never going to happen. But it does mean it might never happen and, if it does happen, you will need to be patient and persistent. The big obstacles are:
Therefore:
If you want to see this (or something similar to it) land, here's how I suggest you proceed:
Don't be surprised if you work hard to advocate for this and it doesn't happen anyway, because you are definitely facing uphill on this. Again, that's not to say it absolutely won't happen. But it is to say that the burden of convincing a lot of people that this is worth doing is on you (and anyone else who wishes to advocate for it). I'm personally unconvinced (at least as of this moment) that this is something we want to do, but I wouldn't say that I'm impossible to convince. (I'll also admit that I haven't yet read yarnpkg/yarn#1761 and probably other related things that may or may not affect my opinion.) |
FWIW I think a good tl;dr would be good to have at the top of the OP. That is a lot of text to take in ... |
An EP may be most appropriate here, I agree. It is great to have working code to go along with it though. |
Not commenting the nature of this PR, this change doesn't look semver-major to me. All code paths that are added are wrapped in If the new flag isn't switched on (it's not by default), the behaviour is (or should be after minor tweaks to this PR) completely unaffected. Note: I mean the actual observed behaviour. Timings, optimizations, etc. could be (or not be) significantly affected by this PR. Another question is what happens when the flag is switched on and how far does this break user expectations, and how far would enabling this flag break existing ecosystem packages — that I can't estimate atm. |
Thanks
@Trott thank you so kindly for replying so thoughtfully and providing actionable feedback; your time in doing so is greatly appreciated. I understand and agree with everything you say, and I'm fully prepared to push the rock uphill. I was just getting the impression over the last month or so, that no one would be at the top of the hill to actually take an honest look at the rock as something that annihilates a real problem. Regarding risking the eventual need to support an undocumented feature, totally get it. It's just one way to get bits into people's hands in order to collect empirical real-world evidence; maybe, after I've convinced you of the value and merits of the thing, you can help in the other ways.
TL;DR
My sincere apologies to @mscdex, @Fishrock123, and any others trying to even contemplate reading the PR comment because of its length; I've spent more time trying to succinctly define the essence of the thing than actually fixing the thing, because this issue in particular has a number of unique challenges:
I've found it extremely difficult to engage with anyone at a deep technical level about any of this, and practically all input has been anecdotal, like 'symlinks have broken vast amounts of the ecosystem', with adjacent yet unrelated issues thrown in, while actual technical descriptions of actual problems have been few and far between. Frustrating to say the least. The only thing I ask, is to please grill the ever living s__t out of me about all of this; give me an honest shot to convince you! And when I have branches of node and yarn showing it all working, get my fork/branches, do the build, and see it all working for yourself! But just give me an honest shot, and I won't let you down, even if that means the whole idea needs to be s__tcanned; despite how it may seem, I am a very reasonable person. I love
Just imagine for one second
This PR will, once and for all, give us symlinks that just work, will be backwards compatible, and will give package managers the option to store all module@versions once on a machine, and symlink to from anywhere on the machine. Memory bloat and addon crashing be damned. I would even take the bet, that within 2 years, the I'm currently running I'm going to test the hell out of this for sure, but the initial results look quite promising. Just give me a fair and reasoned shot, please. |
@ChALkeR Thanks kindly for actually taking a look at the code. I've been doing dev on Windows. I just started last night testing on 'nix, and it did uncover an issue. I've since coded for and I'm currently testing (with great success btw), but unfortunately it's just a smidge more intrusive, and will be more controversial.. such is life |
...and not to ruffle any feathers... well, not too much anyways... there's a reason |
Can someone tell me exactly what would be the best format of |
@Trott Maybe you can answer two simple questions:
My opinion is either don't support them at all, or fully support them. Right now I would say
Symlinks are a very valuable way to optimize use of the file system for certain classes of problems, and if ever there was a real world example of the kinds of problems they best solve, it's If your answer to both those questions is yes (and why wouldn't it be), then I've already convinced you this PR is of substantial value; I just now have to show you this solution works beautifully. (I'm not even going to pretend you answered no ;) ) |
@phestermcs Why did you close this PR? I finally took the time to read your proposal, and I think it's pretty smart! The reason for this is, that your proposed "Adjacent Node Modules" would not work without changing how symlinks currently work. But changing symlinks to behave like you proposed does not require also implementing adjacent modules. This made it a bit harder for me to understand it. But now that I understand it, I think it's awesome! |
@VanCoding Thanks kindly for taking the time :) I created the PR primarily to begin a discussion. I'm still learning how best to do these types of things in
uhm... maybe TL;DR??
You are correct, fixing --preserve-symlinks should have come first, and in fact I'm in the process of rejiggering some test branches, but I kinda back-ended into it having first started with adj-nm. There's also an additional fix in --preserve-symlinks to only follow main.js if it's a file-symlink, and not when it's just a file potentially through a directory symlink (this was breaking tooling). Fwiw, adj-nm still 'works' without using symlinks, but it enables using symlinks to a machine store of modules, so they 'kinda' go hand-in-hand. What I'll have later today in my fork:
I don't really intend any of these to ever be directly pulled by I'm going to 'publish' all this as an issue titled 'fixing --preserve-symlinks', rather than a PR, with things explained in simple bullets so it will be 10x's shorter than the description in this PR. It will include links to my fork/branches so others can clone and test, and links to gists showing all results from Stay tuned. |
That's a good plan!
|
We're thinking alike :). I've looked quite a bit at npm, yarn, ied, and pnpm. I will be creating issues on all their repo's to pull them into the loop, hopefully get their support. I love node, a lot. It lets me run super fast, except with a thorn in the ball of my foot. Hopefully someday we can run without the thorn (and have installs take like 1 second) |
Are you able to, or would be willing to build my fork/branch and see how it works for you? |
Please offer both myself and symlinks a second chance to make a first impression with this new issue |
Checklist
`make -j8 test` (UNIX), or `vcbuild test nosign` (Windows) passes
The same tests pass/fail with and without the change
At the end of the comment
This is an intentionally undocumented, experimental opt-in switch
Affected core subsystem(s)
Modules
Symlinks that Just Work
The request to merge this PR also comes with a request that for the time being, the opt-in command line switch that activates the change in behavior be left undocumented and considered experimental.
To fully realize what the switch enables will require changes to package managers, and will then need to be used and vetted by some representative portion of the ecosystem to uncover and address any potential issues, all of which will take some time.
This is ultimately to determine if the solution should be announced and broadly publicized because it works great, or removed from `node` because it fundamentally does not solve the problem in a way that can be reliably used on a daily basis, or it in any way breaks the current ecosystem.
What I've Experienced
I'm a professional software engineer of 35+ years who in some way uses `node` on a daily basis, and has been doing so for about 2 years.
I have 150+ projects on just one of my development machines, spanning front-end, back-end, and library type projects, from developing my own modules, modules as part of teams, and open source modules I've git cloned and have contributed to or used for learning.
I enjoy developing with `node`, except whenever I need to install or update or change module dependencies for a project, as the time it takes to always copy modules, or sometimes delete the `node_modules` directory and reinstall everything, is quite frustrating. It feels like torture by a thousand paper cuts.
I know that in many cases I've already used many of the module@versions being copied during an install, and they're already on my machine. I know symlinking exists, and from past experiences outside `node` know that if it could be used correctly, installs and `node_modules` directory deletion would go from taking a couple minutes to a couple of seconds. It's like pouring rubbing alcohol all over the paper cuts.
I can also see by looking at the average size of the `node_modules` directory in my projects, and the size of `npm`'s module cache, even factoring in a 90% compression ratio on the tarballs, that all those copies are taking several orders more space than would be needed if `node` and `npm` could exploit symlinking to its fullest; I could get many Gigabytes of storage back.
I understand several attempts have been made to use symlinks, but they have often just not worked in ways that can be generally relied on, and have caused show-stopping issues with memory consumption, module dependency-version resolution, addon loading, filesystem cycles, and tooling failure. Because of this, practically speaking, symlinking just shouldn't be used during development, and the general perception is that it will never quite work.
What I'd Like
I'd like `node` and `npm` (or any package manager) to exploit symlinks to optimize in all possible ways how modules are stored on my machine. In other words, for a given module@version there should be only one physical copy on my machine, and it should be symlinked to from anywhere else it may happen to be used on my machine.
Other than telling `node` and `npm` it's OK to maximally exploit symlinking, I'd like to never have to do something different in how I'm using development tooling, or how I author a module and its module dependencies, just because modules may have been symlinked vs having been copied. In other words, module behavior and tooling behavior should always work the same, like they have no idea nor care if the module's directory was a symlink or an actual copy.
I'd like all the current show-stopping issues with using symlinks to just disappear.
I'd like it to be easy to point a symlink away from the central copy to somewhere else in my development root folder, to support those situations where I'm concurrently developing dependent modules at the same time.
I'd require a means to alter a particular symlink within a given project, by replacing it with a copy, so in those rare cases I want to temporarily muck with dependencies source in order to understand some behavior or find a bug, I can do so without affecting the centrally stored copy.
I'd require this all to be done without breaking anything in any way with how `node` and `npm` work today.
I'd require that if there were a solution that I began to use, and then some issue arose, I could always very quickly and easily just go back to not using symlinks for the particular project, simply by deleting `node_modules` and telling `npm` to copy-install everything.
While it would be nice if this would work in other contexts, such as on production servers, I only require that it work this way during development on a development machine.
How it can be done
This PR enables `node`, when explicitly opted-into, and working in conjunction with sufficiently enhanced package managers conforming to a normalized use of symlinks, to leverage symlinks in a way that Just Works. Additionally, when deploying modules, package managers can easily wring every last drop of value out of symlinks to fully optimize the use and consumption of a given machine's filesystem, regardless of how many places a given module may be redundantly used across the entire machine.
It enables this while also addressing all known issues within `node` when using symlinks to module directories, and would not require developers to alter the way in which they currently specify, author, and consume modules, or expect modules to behave. In other words, the only things that would need to be aware of symlinks would be `node` and package managers (and bundlers like webpack and their brethren); developers and tooling would not.
Because the behavior is opt-in, merging the PR will not change `node`'s default behavior. When opted-into, except for one edge case, the behavior should be fully backwards compatible with respect to `node`'s behavior during link-time dependency-version resolution.
The behavior-affecting changes in this PR (i.e. those absent `housekeeping`, like reading the command line switch) are about 10 lines within the locked Modules subsystem.
I fully appreciate how preserving `node`'s `node_modules` based elegant mechanic is fundamental to the entire ecosystem, even when not directly being used by `node`, but by bundlers such as webpack. It is currently the de-facto mechanism for dealing with link-time dependency-version resolution for the javascript ecosystem, and ES6 Modules won't change that as they don't address the problem of versioning.
I believe this PR not only preserves that, but enhances it just as elegantly while allowing us to get symlinks that just work.
Adjacent Node Modules
In order to precisely control what dependencies get resolved for a given module, they must be stored in a subdirectory, `node_modules`, of the dependent module's directory. `node`'s resolution logic does allow for shared modules to be stored in a common ancestor `node_modules` directory, but this is merely an optimization. If a shared module has more than one of its versions used anywhere in a given dependency tree, the package manager can only bubble one of the versions to some common ancestor, with the remaining most likely needing to be copied into each dependent module's `node_modules` subdirectory.
The `node_modules` subdirectory constraint, while extremely elegant in fundamentally dealing with link-time dependency-version resolution in those cases where several versions of a common module are used in a single tree, is the core reason modules cannot be centrally stored on a machine and then symlinked to from anywhere else used on the machine. It is also the core reason filesystem cycles can appear when using symlinks. (Note: The `ied` package manager is a little more clever in how it ensures modules resolve their dependencies, but it is also prevented from using centrally stored copies for the same fundamental reason)
Understanding why it's the cause of cycles should be straightforward, but some don't at first understand why it prevents centrally storing. It basically comes down to the fact that module dependencies specified in a package.json usually come with a version specifier that is a range of valid versions, and based on simply when a module is installed and what version of its dependencies happen to be most recently released, the actual dependency versions used can change from install to install; i.e. from project to project. This is one reason `npm` implements a "shrinkwrap" option, and why `yarn` has a "lock file". If a module was centrally stored with its dependencies underneath it in its `node_modules` subdirectory, no matter if they be copies or symlinks, it would not be possible to honor the shrinkwrap or lock files between projects, or for the same project between developer machines.
The solution is to offer an additional, equivalently scoping directory as the `node_modules` directory, that is located adjacent to the dependent module's directory rather than underneath it. This solves both the cycle problem when using symlinks, rendering them physically impossible, and the problems with symlinking to a central copy, all in one fell swoop.
With this approach, the central copy of a module would never have a `node_modules` subdirectory. Instead, an adjacent equivalently scoping directory would be an actual directory within a given project, itself containing symlinks to the particular module's dependencies@version, thereby keeping them specific to their inclusion in that specific project, which then enables shrinkwrap and lock files to work as expected, and directory cycles to never occur.
Implementation
Implementing Adjacent Node Modules is incredibly simple to do and understand. When `node` processes a `require()` call whose request path is neither relative nor absolute (i.e. doesn't start with '..', '.', or '/'), the first thing it does is create an ordered list of directory paths to search, starting from the directory location of the `.js` that made the `require()` call, ensuring or suffixing `'/node_modules'` as necessary. A list of directories could look something like this:
Making an ordered list of equivalently scoping adjacent node_modules directories is as simple as changing `'/node_modules'` to `'.node_modules'` (the `/` becomes a `.`), so the list would now look like this:
`node` would still mechanically implement link-time dependency-version resolution, practically speaking, in exactly the same way it currently does. But of course if this PR only did that, activating the new behavior would obviously break running node against trees deployed using `'/node_modules'`. The solution for that is also very simple; we interleave both paths in the search, giving priority to `'/node_modules'`, and so now the search list looks something like this for deployments using `'/node_modules'` based tree structure:
and then like this for new deployments using the `'.node_modules'` based tree structure:
The actual implementation merely creates an additional search path by suffixing `'.node_modules'`, right after `node` decides it needs to make a search path suffixing `'/node_modules'`, where the same parent folder is used in each case; super simple. This means with the new behavior active, it's not only backwards compatible, but both structures are interoperable, giving precedence to `'/node_modules'`.
I earlier alluded to one edge case that could be a problem, and that would be if someone named their module something like `'myModule.node_modules'`; i.e. `'.node_modules'` was actually part of the module's full name. While not impossible, I would opine it a highly improbable thing to occur. Package managers could easily check for such a thing when installing, and prevent with a warning, and the developer could then decide to simply not use the new behavior.
Simultaneously Preserving and Following Symlinks
By default, `node` will always 'follow' symlinks, meaning it will get the 'real' path of a symlink, and use that to identify the module. In other words the `__dirname` of a module is always the 'real' path of where the module physically exists. The 'real' path is also the path used inside `node` to cache/map a path to a module instance. This ensures `node` never creates multiple instances of the same logical module, which prevents memory from being consumed unnecessarily, and prevents `node` from possibly crashing when trying to load the same `addon` twice. However, it's at the expense that other kinds of resolutions relative to the module's `__dirname` now occur outside the symlink, which is often not what is wanted and prevents many uses of symlinks.
When `node` is told to `--preserve-symlinks`, it uses the symlink path for both the `__dirname` and its internal cache/mapping, and while it does let symlinks work a bit more like one would expect (kinda, as the directory of the entry.js passed on the command line is not preserved), the above problems can then occur regarding memory consumption and crashing from multiply loading the same addon.
The fix for this is also incredibly simple, but first let's understand how a module's path is, and is not, significant.
When following symlinks, if there are multiple symlinks to the same module, no matter what, the module will always have the same `__dirname`; its 'real' path. In fact, from execution to execution, one could change the physical location, i.e. the 'real' path, update all the symlinks, and the program would still operate exactly like it did on the previous executions, even though at no time were any of the symlink paths used as the `__dirname`.
This makes it quite obvious the actual `__dirname` of a module isn't important to the module's behavior, but only to resolutions that are relative to its `__dirname`. And this is what lets us fundamentally simultaneously preserve and follow symlinks. This is accomplished by no longer coupling the path we use to cache/map a module or addon instance, to the path we use to initialize its `__dirname`.
Implementation
With the new behavior active, the path we use to cache/map will always be the 'real' path of a module, but the path we use as the module's `'__dirname'` will always be the first symlink path that was used to initially load the module (or the real path if it wasn't a symlink to begin with). This means that the next symlink to that module that gets `required()`, will still get the first instance that was created, but that instance would still have its `__dirname` set to the symlink that it was first loaded through. But going back to our previous thought experiment regarding changing the real path from run to run, it won't matter. However, because the `__dirname` is in fact still a symlink, things will work exactly as one would expect.
Also, unlike the `--preserve-symlinks` switch, this behavior is applied to the directory of the `entry.js` file passed to `node` on its command line, so that tooling and scripts that launch `node` with `entry.js` files somewhere down in the `node_modules` tree also still behave as expected when they're coming from a symlinked directory.
This does place an additional responsibility on package managers to ensure that in each case the symlinked modules will resolve their dependencies exactly the same, but that's an easy thing, and not part of `node`'s behavior.
All Together
By implementing Adjacent Node Modules and Simultaneous Preservation and Resolution of symlinks, `node` can very easily and efficiently use symlinks all the time in all cases, without tooling and developers having to do anything different than they otherwise would.
However, there are going to be some issues with package managers and tooling. For example, package managers run preinstall, install, and postinstall lifecycle scripts. How and when those run would need to change. Also, some scripts simply launch `node` to run some module that was actually installed somewhere in the tree, and package managers will need to know to set the environment variable that activates the new behavior before spawning a process to run the `node` script.
There are also other things that might be issues, but I'm already working on a first POC fork/branch of yarn, and this is also why, for now, it would be nice to merge this PR, but keep it as a hidden and undocumented switch.
Lastly
I'm looking forward to answering any questions in more detail, and responding to any concerns. However, if you decide to comment or question, please, please, please don't say something like 'This will break things with ES6 Modules', or 'This will still likely break vast amounts of the ecosystem'. I can't respond to those types of comments, they're not at all helpful in bringing about technical awareness, and I will simply ignore them. Please reply with actionable, technically descriptive responses.
Test Results