Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI: Work around a weird bug in Yarn v1.x #101

Merged
merged 1 commit into from
Oct 24, 2023
Merged

CI: Work around a weird bug in Yarn v1.x #101

merged 1 commit into from
Oct 24, 2023

Conversation

DeeDeeG
Copy link
Member

@DeeDeeG DeeDeeG commented Oct 23, 2023

Install a global copy of node-gyp for Yarn to use. Works around a really weird bug.

Details of the bug (and explanation of the workaround) below:


Yarn install in Yarn v1.x has a really obscure bug that only happens for certain repos that "require node-gyp", and even then, only if the globally installed copy of npm is v9.7.2 or newer (or if there is no global copy of npm installed, which is quite rare.)

In these unusual (and complex to describe) circumstances, this repo's postinstall script will not run, due to the bug in Yarn v1.x.

Unfortunately for us, the versions of npm Yarn v1.x is incompatible with in this manner, npm v9.7.2 or newer, are bundled with NodeJS v18.18.0 and newer, as well as NodeJS v20.4.0 and newer. So, this is the new normal.

Working around this is easy IRL. Yarn globally installs a copy of node-gyp it can use the first time, so this is a one-time issue. The only problem is it not handing off to its own package lifecycle script handler afterwrd, thus skipping the postinstall.

So, to work around this problem IRL, simply run 'yarn install' a second time, or run 'yarn postinstall' once, in the ppm repo.

ppm repo still needs a workaround (like this commit) to keep testing in CI with newer versions of Node. Either we install older npm in CI (not a sustainable fix, since this will become incompatible with newer versions of Node eventually), or give Yarn a copy of node-gyp it can use, (anything to ensure the postinstall script actually runs), which is what this commit does.

(We could just manually run 'yarn postinstall' after the install, but this commit's approach is tidier, faster on machines where the workaround isn't needed, and clearer about what is going on.)

(We may also want to consider switching to the newer versions of Yarn, which are actively maintained. Yarn v1.x doesn't accept bug fixes anymore, only "maybe" critical security fixes. Something like this could happen again, and we'd be on our own, again.)

Install a global copy of node-gyp for Yarn to use.
Works around a really weird bug.

Details of the bug (and explanation of the workaround) below:

---

Yarn install in Yarn v1.x has a really obscure bug that only happens
for certain repos that "require node-gyp", and even then, only if the
globally installed copy of npm is v9.7.2 or newer (or if there is no
global copy of npm installed, which is quite rare.)

In these unusual (and complex to describe) circumstances, this repo's
postinstall script will not run, due to the bug in Yarn v1.x.

Unfortunately for us, the versions of npm Yarn v1.x is incompatible
with in this manner, npm v9.7.2 or newer, are bundled with NodeJS
v18.18.0 and newer, as well as NodeJS v20.4.0 and newer.
So, this is the new normal.

Working around this is easy IRL. Yarn globally installs a copy of
node-gyp it can use the first time, so this is a one-time issue.
The only problem is it not handing off to its own package lifecycle
script handler afterwrd, thus skipping the postinstall.

So, to work around this problem IRL, simply run 'yarn install'
a second time, or run 'yarn postinstall' once, in the ppm repo.

ppm repo still needs a workaround (like this commit) to keep testing
in CI with newer versions of Node. Either we install older npm in CI
(not a sustainable fix, since this will become incompatible with newer
versions of Node eventually), or give Yarn a copy of node-gyp it *can*
use, (anything to ensure the postinstall script actually runs),
which is what this commit does.

(We could just manually run 'yarn postinstall' after the install,
but this commit's approach is tidier, faster on machines where the
workaround isn't needed, and clearer about what is going on.)

(We may also want to consider switching to the newer versions of Yarn,
which are actively maintained. Yarn v1.x doesn't accept bug fixes
anymore, only "maybe" critical security fixes. Something like this
could happen again, and we'd be on our own, again.)
@DeeDeeG
Copy link
Member Author

DeeDeeG commented Oct 23, 2023

Repeating myself a fair bit, but doing this a bit more informally to get my thoughts across (feel free to skip or skim, I bolded key parts and context for the repo, and so on):

This fixes the weird CI failures in the most recent versions of NodeJS 18 for this repo. And yeah, this turned about to be a really weird one, even after I mostly understand what's happening here.

Looks like Yarn v1.x does some funky stuff[1],[2] that's meant to find node-gyp either on the PATH, or as bundled with the copy of npm bundled with NodeJS. (Unlike npm, Yarn does not bundle its own copy of node-gyp out of the box).

If Yarn v1.x can't find node-gyp in any of those places, and it believes it needs a copy, it will add a copy of node-gyp in its own special bins dir.

That's all well and good, and it works in and of itself, but what's not good is that this happens during Yarn's package lifecycle script handling, like just before the postinstall script would normally run, and it doesn't properly hand back off to the lifecycle script handling code. It just effectively exits that code early right after downloading its own copy of node-gyp to use.

So... this PR simply does the "grab a copy of node-gyp for Yarn to use" thing early, before the bug in Yarn's package lifecycle script code can be hit. The postinstall script runs as normal with this workaround. CI should finally be green again for us.

This only started happening as of npm 9.7.2,[3] which is included in NodeJS 18.18.0+,[4] and NodeJS 20.4.0+.[5] That's because the subdir of npm (bin/node-gyp-bin), which is where Yarn v1.x checks for npm's copy of node-gyp, was deleted in npm v9.7.2.

We'll probably want to look at migrating to newer Yarn at this point, IMO, since Yarn v1.x isn't accepting bug fixes anymore, only maybe (?) critical security fixes. Hopefully the transition path is decently smooth now? But I don't want to debug obscure things like this again in the future, knowing there's no real fix coming, only strange workarounds. (And boy was this a weird one to get to the bottom of. Took a while, too.)

By the way, this would have no merge conflicts with #95, so there should be no worries about merging this before that one if need be to get back to green/passing CI for this repo.

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Oct 23, 2023

This is definitely one of those "really small changes I went overboard about writing the PR for and explaining it to death" things.

I do like to get my thoughts down while they're fresh once I've tracked down a weird and obscure problem, or done my research on something I usually don't have to think hard about.

But this is another real simple PR that I hope will be easy to review if looking at the diff alone.

Sorry about the walls of text. I hope they are informative to anyone who reads them, and inoffensive to anyone who skims or skips them! If anything, I get the satisfaction of being thorough.

Copy link
Member

@confused-Techie confused-Techie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sufficiently bizarre and obscure bug that I'm inclined to fully trust your research on finding the best solution, especially because as we can see we finally have green CI again.

Seriously amazing work on this one. I know I've always said you're the best person to go to for the bugs that don't make any sense, but I feel like you've thoroughly outdone yourself here. Fantastic write up on the issue, and I'd be seriously curious on how this discovery came about.

But like you mentioned there are no big conflicts here, so I'm more than happy to merge now, and even ask #95 to update from master with these changes

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Oct 23, 2023

I'd be seriously curious on how this discovery came about.

I have a log of stuff I tried: https://github.com/DeeDeeG/ppm/commits/oldddd-node-18-npm-fix

  • Well, today I was checking how many NodeJS versions happened between our last green CI and the next red one. It was between Node 18.17.1 (here) and 18.18.0 (here).
  • I'm super distrustful of CI and GitHub's first-party Actions at this point, so I tried bumping actions/setup-node to the latest version, but no dice.
  • Began to suspect one of my key fact-finding from before (my own internal monologue practically echoed in my head like "~It doesn't even happen locally! Must be a CI thing!~". Hmmmmm. Maybeeeee not?
    • Looked into updating actions/checkout, nothing there [in the actions/checkout changelog] sounded relevant
    • Checked the rest of [ppm's] CI [workflow definition yaml], it's super bare-bones, not very heavily "CI-only"'d up.
      • Starting to get suspicious it's not a CI problem
    • Suddenly had a strong vibe of like "What if isn't not a CI thing?" -- the amount of effort I'd put into other stuff made this suddenly a much more attractive thing to re-visit, I just had a sense this is where the fruitful path had to be now, I'd basically looked into what there was to look into in CI-land, and this problem was way too reproducible to not try it IRL again some more outside of CI
  • Investigated how Yarn might think it had node-gyp already, since this was the odd thing out, [in CI I could see] it was reporting it "required node-gyp and was gonna go fetch it" every time it failed, and none of the times when it didn't fail [in CI]
  • Somehow landed on testing yarn global command (maybe I did yarn --help to see available commands as a starting place? [yarn global probably stood out at this point, since I'm already looking for global node-gyp installs that shouldn't be there, per the weird "fetching node-gyp" message from Yarn in CI. In hindsight, the weird message in CI mentions yarn global add node-gyp outright, but not in a blindingly obvious way necessarily, but it's not a long message so it stands out. I forget what led me to run yarn global list exactly, though.])
    • yarn global list showed I did have node-gyp installed globally!
    • I had always assumed Yarn re-used the global install dir that npm installs global packges to. Nope. It has its own.
    • So, when I said "I don't even have node-gyp installed locally!" -- this was true of npm's global installed packages, but not true of yarn's global installed packages
    • WTF this might be a real not-CI issue, OMGWTFBBQ Yarn why are you like this, or is Node the prime suspect, or is it npm IDK
  • Went back and tried to reproduce with different Node versions, since that's what changed in CI.
    • Node 18.18.0 is affected, Node 18.17.1 isn't. (As long as I keep deleting Yarn's copy of node-gyp it keeps global installing. Just like CI.) Okay, definitely not a CI issue.
      • Deleted Yarn's global copy of node-gyp.
      • Deleted node_modules and tried yarn install again locally. It's not a CI issue. Reproduced it in-person on my own machine. Jackpot.
      • By the way, the text output where Yarn says it's fetching node-gyp is blank/effectively just newlines and "..."s interleaved with the "building fresh dependencies" Yarn output on my screen, so I never would have known I had this copy of node-gyp installed before. I never asked for it, never saw it being installed, but in CI you can see Yarn does in fact fetch node-gyp. But yeah, this never printed right on my machine, no wonder I didn't know about it.
      • Gotta read the Changelogs of Node and NPM now I guess? Since it's based on the Node or npm version, maybe?
      • Yarn [1.x] has had no releases in like, years it feels like? So it's not a Yarn update. Nope.
    • Gotta research the dang heck outta this now, though, what the heck is going on? ^ See "gotta read changelogs"
    • Searched the very specific string about "this package requires node-gyp" in Yarn's code-base, found this translation string: https://github.com/yarnpkg/yarn/blob/158d96dce95313d9a00218302631cd263877d164/src/reporters/lang/en.js#L229
      • Searched the corresponding translation key, found this: https://github.com/yarnpkg/yarn/blob/158d96dce95313d9a00218302631cd263877d164/src/util/execute-lifecycle-script.js#L317
        • This is where our weird message is being emanated from in Yarn's code
          • Puzzled over this for way too long.
            • Thought about if I could fix it. Yarn is heavily transpiled before being shipped, so the code does really weird stuff. Pretty much none of this stuff ends up being promises by the end. I tried a fix, it did nothing.
            • Checked Yarn's CONTRIBUTING.md to see if there are hints how to build it, it basically got an update real recently to just say it's the end of the line for Yarn v1.x.
            • Noticed the most-recent commits are updating README.md and the PR template to say "don't submit bug fixes, we'll only maybe do critical security fixes", so I gave up trying to fix it pretty quickly, admittedly
    • Tried different npm versions on Node 18, between the version in the last green CI run (npm 9.6.7) and the next red CI run after that (npm 9.8.1).
  • Wrapped up my research making sure I understood everything I could, made sure there's no appropriate place to report this, but no, it's a Yarn 1.x issue, and Yarn team don't want 1.x bugfix PRs or really to have non-critical non-security issues reported, they want folks to migrate to newer Yarn. Understandable

I never write blog posts, so this is about as close as I'm gonna get lol. Since you asked.

It was a journey, I might as well memorialize it here before I (as quick as I can) forget all this crazy stuff, lol. I mean, only half joking, this is pretty useless knowledge other than to know "it needs a workaround" and "here's the workaround". Heh.

And that's about as much as I should write about this, actually debatably I could have profitably stopped writing a lot earlier, LMAO. But I do like write-ups. I think that's pretty abundantly apparent by now, lol.

@DeeDeeG
Copy link
Member Author

DeeDeeG commented Oct 23, 2023

I believe #95 can be left alone, I can do a run on another branch maybe just to confirm it works well on NodeJS 18, but 2colours has been through a lot, another merge or rebase on that PR feels too much disruption to their PR (in what is merely my personal feeling about this rather subjective matter).

DeeDeeG added a commit to pulsar-edit/whats-my-line that referenced this pull request Oct 24, 2023
Workaround for a weird Yarn v1.x issue,
which only happens when global npm is v9.7.2 or newer.

See pulsar-edit/ppm#101 for an explanation.
@DeeDeeG
Copy link
Member Author

DeeDeeG commented Oct 24, 2023

Thanks for the review-approve, I'm merging this!

@DeeDeeG DeeDeeG merged commit 03b81ad into master Oct 24, 2023
11 checks passed
@DeeDeeG DeeDeeG deleted the fix-CI-node-18 branch December 13, 2023 06:17
corneliusroemer added a commit to corneliusroemer/hyper that referenced this pull request Aug 25, 2024
Need to add node-gyp as devDep due to weird node 18.18 bug, see e.g. pulsar-edit/ppm#101
philip-peterson pushed a commit to quine-global/hyper that referenced this pull request Nov 23, 2024
Need to add node-gyp as devDep due to weird node 18.18 bug, see e.g. pulsar-edit/ppm#101
philip-peterson pushed a commit to quine-global/hyper that referenced this pull request Nov 23, 2024
Need to add node-gyp as devDep due to weird node 18.18 bug, see e.g. pulsar-edit/ppm#101
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants