This repository has been archived by the owner on Mar 28, 2021. It is now read-only.

XZ compression support #8

Closed
DeeDeeG opened this issue Aug 27, 2019 · 44 comments

Comments

@DeeDeeG
Contributor

DeeDeeG commented Aug 27, 2019

Hi again.

I recently made the pull request to add an xz compression option to n. I noticed NVH doesn't have this option at the moment.

I wondered about the way to do it "properly," since it's a bit obscure the way it's implemented in n right now. As such, I decided to open an issue and not jump straight to a pull request this time.

The feature could be added similarly to how it was done over at tj/n:

  • Simply read the variable NVH_USE_XZ from the environment, and decide to use xz or gzip based on that
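As a rough sketch of that approach (the helper name here is made up for illustration; this is not tj/n's actual code):

```shell
# Hedged sketch of the env-var approach: pick the tarball suffix from
# NVH_USE_XZ, defaulting to gzip when the variable is unset or anything
# other than "true".
tarball_compression() {
  if [ "${NVH_USE_XZ-}" = "true" ]; then
    printf 'tar.xz\n'
  else
    printf 'tar.gz\n'
  fi
}

NVH_USE_XZ=true  tarball_compression   # prints tar.xz
NVH_USE_XZ=false tarball_compression   # prints tar.gz
```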

But I also thought about the following:

  • Whether to make xz compression the default (keeping gz as a fallback)
  • Whether to specify compression algorithm (xz or gz) with a command-line argument instead of an environment variable
    • Or even to implement both ways of specifying compression?

Overall I figured this would be a low-pressure way to work out the "right way" to do xz compression in this repo. As a fork, there is a bit more freedom. And I suppose whatever solution is decided here can probably be ported to tj/n relatively easily if desired.

If you have thoughts on the preferred way to do this, I would be interested in working on and/or collaborating on an implementation.

Best regards.

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 27, 2019

I made some example implementations:

Side note: I found working with the command-line arguments section of the script pretty easy, so I'm thankful it is set up so neatly to begin with!

@shadowspawn
Owner

Hi @DeeDeeG

Whether to make xz compression the default (keeping gz as a fallback)

Making xz the default is my preferred method, if it is robust enough and not too complicated!

Looking at other managers:

  • nvs has an apparently undocumented NVS_USE_XZ environment variable, like n now has.
  • nvm-sh defaults to xz if the tool is present and the version is known to have an xz tarball in nvm_supports_xz

I do not want all the complicated version testing that nvm does; I'm thinking just the >4 test. Older downloads can always use gzip, and more recent versions can use xz if available.

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 28, 2019

Thanks for the reply.

Here is an implementation with those checks:


  • I did a check similar to the one in nvm-sh to see if the xz command is available on the system.
  • Another check to see if the Node version to be downloaded is greater than 3. (I can change this to "greater than or equal to 4," which is a bit more intuitive/readable, I suppose.)

If both of these pass, xz use is allowed. I also added NVH_USE_XZ="true" at the top, in the setup phase, which seemed like the best way to tell NVH to use xz by default.
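The two gates described above might look something like this in shell (a sketch with made-up function names, not the exact nvh code):

```shell
# Sketch: tool gate plus version gate. xz must be on the PATH, and the
# requested node version must be >= 4 (older releases have no xz tarballs).
version_has_xz() {
  major="${1#v}"            # tolerate a leading "v": "v12.9.1" -> "12.9.1"
  major="${major%%.*}"      # keep the major component: "12"
  [ "$major" -ge 4 ]
}

can_use_xz() {
  command -v xz >/dev/null 2>&1 && version_has_xz "$1"
}
```

So `can_use_xz 12.9.1` would succeed only on systems where the xz tool is actually installed.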


My thoughts regarding this implementation:

  • Perhaps it makes sense not to have command-line options/switches for xz versus gzip. I'm not sure why people would use them, or whether this is worth making end-user configurable: most people just want the smaller tarball, and we don't keep the tarballs around (I don't think?), i.e. we download and immediately consume/discard them. xz is basically better wherever it's available and the system can extract it.
  • Even when setting xz to be used by default, it seems prudent to have gzip be the default, general case internally, and xz be the "special case" activated after passing checks. I say this because, if something goes wrong with the checks, it seems saner to bail out to using gzip, rather than the inverse, bailing out to xz if the checks themselves fail to work properly. As such, in my implementation, xz is mostly plumbed in as if it is the "special case" (despite actually being the default, and probably 99%+ of node.js downloads will use it these days).
    • Unless the xz program has a different name than xz on other platforms than Linux, I think the tests are good and robust. So hopefully there will never be a need to anticipate the tests breaking. Just in case, though.
  • This looks to be pretty much done, and it's ready for a PR and a final couple of rounds of review, I think.

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 29, 2019

Update about macOS:

On recent macOS, the xz program (as such) is not present on a fresh install. But oddly, their gunzip utility can extract xz archives (see this StackExchange discussion), even though gunzip on other platforms cannot. (Refer to man gunzip on a Linux machine and a macOS machine to see the differences; macOS has several other non-gzip algorithms rolled in, whereas Linux does not.)

In any case...

curl [URL] | tar -Jx will work, seemingly regardless of whether the -J flag is used.

(Oddly, and mildly off-topic/a bit of a sidebar: downloading the xz archive and running tar -Jx [path/to/archive.xz] doesn't work. But a generic tar -xf [path/to/archive.xz] does work.)

I don't think macOS tar properly knows what to make of the -J flag, or how to recognize that a local file on disk is an xz archive... but it can extract one if told "this is the archive I want to extract" with the -f flag, or if it is piped in via stdin.

I don't 100% understand why it all behaves that way, but the behavior has been mapped out. This requires a bit of extra handling, so I will think about it and look for a good solution for the "supports xz" test. I am thinking of blanket whitelisting macOS as supporting xz, but this is probably not totally accurate? Not sure how far back macOS tar can decompress xz.

Edit: Came back to add a link/source

@shadowspawn
Owner

I had noticed the Mac setup when we were looking at adding the environment variable to n (tj/n#571 (comment)) and didn't work out the details. I have xz installed via homebrew as a dependency of something else.

You can see why I was happier initially to add xz as an env var than on by default! 🙂

I did a bit of research and found tar mentions libarchive, and that lists support for xz from 2009. I have not tied that back to tar and Mac and Linux et al, so can't draw any conclusions yet.

https://github.com/libarchive/libarchive/wiki/ReleaseNotes#libarchive-270

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 29, 2019

Short summary of this comment: It looks like we can enable xz on macOS 10.7 and above.
macOS version can be checked on command-line with a utility: sw_vers -productVersion


Found some info about OS X and libarchive versions. According to this: https://opensource.apple.com/

OS X 10.6.x shipped with libarchive 2.6.2. And OS X 10.7.x shipped with libarchive 2.8.3.

(2.8.3 is still used in Mojave, albeit with various, minor-looking, darwin-specific patches accumulated over the years if I understand correctly. For example, see libarchive.plist from macOS Mojave 10.14.5 for details.)

So without being able to test a bunch of old mac computers, I'd speculate the correct cutoff is OSX 10.7 and above can use xz, 10.6 and below can't.


Implementation note: It looks like the user's macOS/OSX version can be read on the command-line in a script-friendly way: https://www.cyberciti.biz/faq/mac-osx-find-tell-operating-system-version-from-bash-prompt/

Like so: sw_vers -productVersion (sample output: 10.11.1)


I also poked around in the libarchive source history, and found references to unxz as an xz decompression provider in some cases, and the bsdtar command (separate from GNU tar). And found out that the -J flag is documented via tar --help, despite it not being in the man pages (man tar).

On a mac running El Capitan, I found the following:

  • unxz isn't installed by default, so we don't have to think about that
  • tar is a symlink to bsdtar (from libarchive)
  • confirmed that libarchive is version 2.8.3 in El Capitan
  • confirmed that sw_vers works as expected
  • Update: confirmed curl [archive.xz] | tar -Jx works as expected (as previously tested on Mojave)
  • Mildly off-topic: gunzip can actually decompress xz compression, but you still have to use tar to extract.

@shadowspawn
Owner

Good digging, thanks.

I did some digging tonight and came across a few interesting links. I have not formed conclusions yet, but I am starting to wonder whether doing this fully "properly" is too hard! Interesting that Node is talking about reducing the number of options.

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 31, 2019

Mini update to the update about macOS:

As this StackOverflow answer rightly points out, the xz/lzma feature of bsdtar is configurable at compile time. The feature is present back to macOS 10.7, but it is configured off until macOS 10.9.

So I'd say move the cutoff version to macOS 10.9 and above. (Wish I had an array of old macs to test this on!)
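For illustration, the comparison against that cutoff might look like the following (the helper name is made up, and only the pure string comparison is shown so it can be exercised without a Mac):

```shell
# Sketch: compare a sw_vers -productVersion string against the 10.9 cutoff
# discussed above.
macos_supports_xz() {
  ver="$1"                 # e.g. "$(sw_vers -productVersion)" -> "10.11.1"
  major="${ver%%.*}"       # "10.11.1" -> "10"
  rest="${ver#*.}"
  minor="${rest%%.*}"      # "10.11.1" -> "11"
  [ "$major" -gt 10 ] || { [ "$major" -eq 10 ] && [ "$minor" -ge 9 ]; }
}

macos_supports_xz "10.11.1" && echo "xz ok"       # prints xz ok
macos_supports_xz "10.6.8"  || echo "gzip only"   # prints gzip only
```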


I can test a bunch of old Linux releases, so once the more theoretical/research parts are done, I plan to do that (test a bunch of Linux distros against the draft PR) and confirm this works.

I don't feel that working this out for SunOS or AIX is worth it, or feasible, since as far as I can see I can't test either. Actually, I just did a quick search, and you can download SmartOS, which is apparently what Node really means by SunOS. So I might give that a shot.

Regarding AIX: There are no xz tarballs for AIX (e.g. this is the only AIX tarball for node v12.9.1: node-v12.9.1-aix-ppc64.tar.gz). So xz is off the table for AIX. But we can tell that the arch for AIX should always be "ppc64"; that could be addressed in a separate PR.


Node is talking about reducing the number of options.

We can always do is_ok "${xz-url}" to make sure it exists. I actually have a WIP implementation of this, but it's rather messy. Waiting on more research before tidying it up. (Will post as a draft if requested to though. I don't mind posting early/often for review.)
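For what it's worth, a stand-in for that kind of existence probe using plain curl (this is not nvh's actual is_ok; the flags here are just one way to do a HEAD check):

```shell
# Sketch: HEAD-request the tarball URL; --fail makes curl exit non-zero on
# HTTP errors (e.g. 404), so the function's status reflects existence.
url_ok() {
  curl --silent --head --fail --location "$1" >/dev/null 2>&1
}

# Example (requires network; URL shown for illustration):
# url_ok "https://nodejs.org/dist/v12.9.1/node-v12.9.1-linux-x64.tar.xz" \
#   && echo "xz tarball exists"
```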

I have not formed conclusions yet, but I am starting to wonder whether doing this fully "properly" is too hard!

That's what I presumed when I began, but I just figured I'd give it a good try and see how far I get. I'm happy with progress so far.

@DeeDeeG
Contributor Author

DeeDeeG commented Aug 31, 2019

Here are the work-in-progress implementations:

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 3, 2019

Short summary: Linux looks good with these changes.

(I know that's a broad claim, but I did test quite a few distros. All Linux distros that I tested use GNU tar. GNU tar uses xz from the PATH by default (unless configured otherwise; no distro appears to have done so). The test added to nvh for this issue, which similarly looks for xz on the PATH, correlated 1:1 with the given distro's [GNU] tar successfully using xz to extract tarballs.)

(SmartOS is a different story, since it's not Linux, and has its own (unique?) tar.)

(As mentioned in previous comments, macOS uses bsdtar and acts a bit different compared to Linux, so I would like to test on old macOS.)


I have been able to test the script as updated for this issue. I tested it on quite a variety of Linux distros, including some older ones to check how far back xz support goes.

Here are my results:

| Operating System | nvh status | nvh status http | tar version on iso | latest tar in repos | xz installed on iso | xz available in repos | curl installed on iso | wget installed on iso |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Debian 5.0.10 Lenny | https error ** | no rsync | GNU tar 1.20 | GNU tar 1.20 | no | no | no | yes |
| Debian 6.0.10 Squeeze | https error ** | good (gzip/xz) | GNU tar 1.23 | GNU tar 1.23 | yes | yes ("xz-utils") | no | yes |
| Ubuntu 9.10 Karmic Koala | https error ** | good (gzip) | GNU tar 1.22 | GNU tar 1.22 | no | yes ("xz-utils") * | no | yes |
| Ubuntu 10.04 Lucid Lynx | https error ** | good (gzip/xz) | GNU tar 1.22 | GNU tar 1.22 | no | yes ("xz-utils") | no | yes |
| Ubuntu 12.04.5 Precise Pangolin | good (gzip/xz) | http not needed | GNU tar 1.26 | GNU tar 1.26 | yes | yes ("xz-utils") | no | yes |
| Ubuntu 14.04.6 Trusty Tahr | good (gzip/xz) | http not needed | GNU tar 1.27.1 | GNU tar 1.27.1 | yes | yes ("xz-utils") | yes | yes |
| Fedora 11 | good (gzip) | http not needed | GNU tar 1.22 | GNU tar 1.22 | no | no | yes | no |
| Fedora 12 | good (gzip/xz) | http not needed | GNU tar 1.22 | GNU tar 1.22 | yes | yes ("xz") | yes | no |
| Red Hat Enterprise Linux 7.7 | no rsync | http not needed | GNU tar 1.26 | no main repo | yes | no main repos | yes | no |
| OpenSUSE 11.3 LiveCD | good (gzip/xz) | http not needed | GNU tar 1.23 | GNU tar 1.23 | yes | yes ("xz") | yes | yes |
| OpenSUSE Leap 15.1 | good (gzip/xz) | http not needed | GNU tar 1.30 | GNU tar 1.30 | yes | yes ("xz") | yes | yes |
| Arch Linux 2019 09 01 iso | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz") | yes | yes |
| Manjaro Architect 18.04 Stable | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz") | yes | yes |
| Gentoo 2019 09 01 iso | good (gzip/xz) | http not needed | GNU tar 1.32 | GNU tar 1.32 | yes | yes ("xz-utils") | yes | yes |
| Triton SmartOS | https/tar fail | good/workaround | ?????? | ????????????? | yes | presumably | yes | yes |

Legend:

  • "nvh status" means how well the script worked on the live image, or if there was no live image, immediately after installing.
  • "nvh status http" means how well nvh worked after applying workarounds for broken https.
    • (Workarounds: use nvh with --insecure; OR edit nvh to have only http URLs; OR use curl with --insecure or wget with --no-check-certificate.)


Statuses:

  • "https error" means tools like curl and wget fail to fetch resources from nodejs.org, due to being unable to validate the website's ssl certificates. This is not nvh's fault, and implies a system that would be unable to properly use curl/wget for https transfers. As a workaround, to download nvh from raw.githubusercontent.com (for testing), curl can be used with --insecure, wget can be used with --no-check-certificate; nvh itself can be run with --insecure.

  • "http not needed" means that the script worked without using --insecure, and without editing the script to use "http" in urls.

  • "good (gzip)" means that the script successfully uses/falls back to gzip.

  • "good (xz)" means that the script successfully uses xz if xz support is installed.

  • "good (gzip/xz)" means that the script successfully uses gzip and xz, depending on which compression schemes are installed and whether an xz tarball is available for the requested node version and the user's platform (this means the script works fully as intended.)

Notes on OSes:

  • * In Ubuntu Karmic Koala, the "xz-utils" package conflicts with the already-installed "liblzma" package, apparently depended on by "dpkg". Unlikely to be installed for fear of breaking dpkg? But xz-utils is available in the repos, and can work, if one is brave enough to uninstall conflicting packages (and break their system??)
  • ** In Ubuntu Karmic Koala & Lucid Lynx, wget fails to download files over https. In Karmic, curl also fails to download files over https. In Lucid, curl, which must be manually installed from the repositories (old-releases.ubuntu.com), succeeds at downloading files over https.
  • *** In Ubuntu Precise Pangolin, wget fails to authenticate e.g. GitHub https URLs, but succeeds at authenticating Nodejs.org https URLs. For the purposes of the script, this is not a problem (we only need to access nodejs.org anyways.)
  • RHEL Server 7.7's "minimal" installation does not include rsync, so installing node with nvh fails at the last step (rsync from the cache folder to the "installed" folders). RHEL 7.7 works with gzip or xz up to that last (rsync) step. There are no package repositories enabled by default, and no obvious public package repos anywhere (that I could find) for RHEL, so I gave up trying to install rsync. The tj/n "develop" branch works fine with xz or gzip. I had to enable networking in RHEL manually after install, otherwise the system was unable to reach the outside internet. (RHEL Server 7.7 does not have a command-line interface accessible on the install iso; it has no "LiveCD" environment.)
  • Debian Lenny works fine with gzip, but only after manually installing rsync. xz support is not available on Lenny.
  • SmartOS has its own tar. It either doesn't extract files that are piped in, or I can't figure out the syntax to do so. When nvh is allowed to download the tarball, and then have tar extract that as a local file, it works. See the man page for tar at smartos.com. Also, SmartOS has problems with https connections via wget/curl. Can use http as a workaround.

None of the operating systems/distros tested were adversely affected by the changes for this issue. (i.e. no OS had less ability to successfully download node with nvh after the changes for this issue, compared to the script as it is on the develop branch.)

Some distros failed to download node with nvh, but this was due to various things unrelated to the changes proposed for this issue. (These issues could be worked around with varying degrees of difficulty.)

For example: RHEL 7 was a bit wonky because it had no rsync installed. (unrelated to xz/this issue).

SmartOS (what Node means when they host tarballs for "sunos") was another case entirely. (Technically not Linux, though.) It rolled its own unique tar that I haven't seen anywhere else. This doesn't really work with piped in tarballs, or else I couldn't figure out the correct syntax to do so. But SmartOS does work with a minimal patch: DeeDeeG/nvh@develop...DeeDeeG:smartos

(Apparently SmartOS isn't intended to be used directly as a general-purpose OS; rather, you are supposed to install VMs and containers on it. But the tarballs for SmartOS are up at nodejs.org/dist, so I thought I'd test anyway.)

Would still want to test this on older macOS, in an ideal world. But I am not exactly sure if I will get the opportunity. I know you can make bootable USB installers for old macOS, so maybe I will try it in Terminal on the bootable installer. (Don't want to actually re-install macOS multiple times.)

Misc note: xz support was added to GNU tar in version 1.22, back in March 2009. Here's the changelog entry.
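That 1.22 cutoff could be checked mechanically from tar's version banner, something like the following (the parsing here is illustrative, not nvh's shipped code):

```shell
# Sketch: decide xz support from a GNU tar version banner, using the 1.22
# cutoff from the changelog mentioned above.
gnu_tar_supports_xz() {
  ver="${1##* }"            # "tar (GNU tar) 1.27.1" -> "1.27.1"
  major="${ver%%.*}"        # "1"
  rest="${ver#*.}"
  minor="${rest%%.*}"       # "27"
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 22 ]; }
}

gnu_tar_supports_xz "tar (GNU tar) 1.32" && echo "xz ok"      # prints xz ok
gnu_tar_supports_xz "tar (GNU tar) 1.20" || echo "gzip only"  # prints gzip only
```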


[Edited 8 September to correct/clarify results and explanatory notes for Karmic Koala.]

@DeeDeeG DeeDeeG closed this as completed Sep 3, 2019
@DeeDeeG DeeDeeG reopened this Sep 3, 2019
@DeeDeeG
Contributor Author

DeeDeeG commented Sep 7, 2019

I was able to test on OS X. Here are my results:

| Operating System | nvh status | nvh status http | tar version in fresh install | latest tar in repos | liblzma.dylib present on fresh install | xz present in fresh install | xz available in repos | curl present on fresh install | wget present on fresh install |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| OS X Snow Leopard (10.6) | good (gzip) | http not needed | bsdtar 2.6.2 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Snow Leopard (10.6.8) | good (gzip) | http not needed | bsdtar 2.6.2 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Mountain Lion (10.8.5) | good (gzip) | http not needed | bsdtar 2.8.3 | no apple repos | no | no | N/A (but yes w/ brew) | yes | no |
| OS X Mavericks (10.9.5) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |
| OS X El Capitan (10.11.6) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |
| macOS Mojave (10.14.5) | good (gzip/xz) | http not needed | bsdtar 2.8.3 | no apple repos | yes | no | N/A (but yes w/ brew) | yes | no |

Legend:

  • "liblzma.dylib" refers to the file "/usr/lib/liblzma.dylib" (a symlink to "/usr/lib/liblzma.5.dylib") being present on macOS. This presumably derives from the same "XZ Utils" project as the "xz" executable does on other platforms.

Notes:

  • All testing (for versions less than OS X 10.11 "El Capitan") was done in VirtualBox, on a Mac host running OS X El Capitan.

OS Notes:

  • Mac OS X Snow Leopard was installed from an old installer CD on VirtualBox. Had to modify script to download only x64 images, since my install of OS X ended up being 32-bit. After that, nvh fundamentally worked fine. (Irrespective of 32/64-bit, there was no xz support in this OS's tar, so only gzip worked).
    • Technically, Snow Leopard 10.6.8 was an upgrade over a fresh install of 10.6, via the "Software Update" utility.

I found that Mavericks (OS X 10.9.x) can extract xz tarballs piped into tar.

So I believe the platform checks in place for macOS are correct.


[Edited 8 September to clarify that testing was done in VirtualBox, and that nvh was tested after installing to the hard disk, rather than testing "on the iso", or in the Terminal available on the installer image.]
[Updated again to add whether /usr/lib/liblzma.dylib was present on the system.]
[Edited 12 September to add OS X Mountain Lion]

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 7, 2019

I have confidence in these platform checks, but I admit there is increased complexity by trying to check the platform for xz support. It is a low impact problem if it's wrong and off-by-default, but it does become a big deal if it's wrong and on-by-default. Other version managers seem to have decided it was worth it.

I do think it works pretty well, but would like this to have a way to turn it off: (a) just so people have control, and (b) for the (seemingly unlikely) scenario where the platform checks are wrong. I feel that the environment variable, a command-line switch, or both would be adequate end-user control, which I have already made work in drafts posted to this issue.
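The override idea might be sketched like this (semantics assumed for illustration; this is not nvh's actual code): auto-detection picks a default, but an explicit NVH_USE_XZ always wins.

```shell
# Sketch: platform checks produce a detected default ("true"/"false"), and
# an explicit user-set NVH_USE_XZ overrides it.
resolve_use_xz() {
  detected="$1"
  case "${NVH_USE_XZ-}" in
    true|false) printf '%s\n' "$NVH_USE_XZ" ;;  # explicit user choice wins
    *)          printf '%s\n' "$detected" ;;    # fall back to detection
  esac
}

NVH_USE_XZ=false resolve_use_xz true   # prints false
```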

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 7, 2019

This comment is mostly about whether to do checks that help us when the user has installed a non-platform-default tar... Which it would be pretty easy to decide NOT to do. But if we did want to do that, here's the relevant info. (Also discusses limits of my macOS platform check.)


For what it's worth: Right now I only anticipate the platform checks to be wrong if the user has installed a non-platform-default tar that has no xz support.* I don't know why someone would do that, but... you never know?

And the other scenario is if Apple breaks my macOS version check. e.g. If they increment the major version ("10") of macOS, or release/switch to a new platform other than macOS, or stop having the sw_vers utility installed...


* (We could check for this by making sure Linux users' tar is GNU tar 1.22+, and that macOS users' tar is bsdtar 2.8.3 or above from libarchive... Unsure if we can check if macOS users have bsdtar linked against liblzma, but we could check if liblzma is present at /usr/lib/liblzma.dylib...)

Again, I am really unclear why someone would go through the trouble of installing a non-platform-standard tar, and it feels like a lot of extra gymnastics to go out of our way to check for that. But it would be doable. I can picture what the implementation looks like and could draft it up without much difficulty.


Review of prior art... Do they check which tar is being used?

NVS decided to check if the local tar supports extracting xz: here and here in nvs.sh. (But their check is probably wrong (false positive) for macOS 10.7-10.8; I would need an install disc for OS X Lion and Mountain Lion to confirm.)

NVM mostly doesn't: see here. (I note that they check only whether xz is installed, which isn't used by tar on macOS, and is unrelated to successfully extracting xz with macOS's default tar, which is bsdtar.) They do check if the platform is AIX, and if so they use gtar instead of tar: here. Without being able to test on AIX, I can neither dispute nor confirm whether this is correct. I do not attempt to address AIX in my implementations, since I can't test any of it.

@shadowspawn
Owner

shadowspawn commented Sep 8, 2019

Some great research, and I love the Linux and macOS tables of tests.

Good work tracking down the release note for GNU tar adding xz support. Likewise, digging into the nvs and nvm checks.

@shadowspawn
Owner

Not sure if you discovered it, but there is support for trying things across multiple docker containers (currently a hard-coded list in the script).

$ pwd
/Users/john/Documents/Sandpits/nvh/test
$ bin/for-each-container nvh lsr lts
archlinux
v10.16.3

centos
v10.16.3

centos-old
v10.16.3

ubuntu-curl
v10.16.3

ubuntu-old-curl
v10.16.3

ubuntu-wget
v10.16.3

ubuntu-old-wget
v10.16.3

Darwin
v10.16.3

@shadowspawn
Owner

shadowspawn commented Sep 8, 2019

The https errors are presumably due to old certificates in the docker containers. The best fix is to update the certificates, like here:

or as you mentioned, configure curl or wget or nvh to be insecure:

nvh --insecure install lts

or change the protocol to http when defining NVH_NODE_MIRROR (I came across this approach when researching proxy support):

NVH_NODE_MIRROR=http://nodejs.org/dist nvh doctor

@shadowspawn
Owner

Is xz-wip-2 still the best branch to comment on implementation?

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 8, 2019

Indeed, xz-wip-2 is the best.

I would also point out that running the xz-wip-debug-messy branch will have it narrate what it "thinks" it is doing, which I have found useful, and so I have actually been testing with that, but end results should be identical between both branches. [Edit to add: I did test the xz-wip-2 branch on some, but not all distros/OSes, just to be sure. It worked the same as debug-messy in those cases.] I made xz-wip-2 by "compiling" down the checks in debug-messy into non-redundant checks, and discarding the echoes. Again, they should pass/reject a system for xz with identical results (given identical inputs).

Last note: I had it not set to xz on by default, just to make sure I was awake and testing with/without xz on all platforms.

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 8, 2019

On second thought: If you have interest in a command-line option for xz, such as nvh --use-xz, those are available here, and are just about trivial to implement: DeeDeeG/nvh@xz-environment-variable...DeeDeeG:xz-commandline-argument

They are not in xz-wip-2.


Decided questions:

  • The basic concept "use xz" (in some way or other) seems decided already. (leaning yes)
  • Also it seems things are leaning toward "do platform checks." (leaning yes)
  • Last I read, things were leaning toward "xz feature is on by default" as well. (leaning yes)

Other Questions:

  • Develop more intense/"paranoid" platform checks? [yes/no]
    • I would lean "no", but open to it if the research results point to an elegant, "better" check
  • xz can be turned on/off by the end-user? [yes/no]
    • Method of end-user control? [environment variable/command-line option/both/neither]
      • For n it would probably be good practice to keep alive the N_USE_XZ method of enabling xz, regardless of what happens in this repo. Maybe consistency means at least NVH_USE_XZ should be read from environment here in this repo as well?

Research question:

  • Can xz be extracted properly on a fresh install of OS X Lion/Mountain Lion (10.7-10.8)?
    • Unsure but assuming "no". Install DVDs [downloads from the Mac App Store] cost $20 USD each... hmm. It would be possible to update the checks to include these versions at a later date. (By assuming "no," nvh simply uses gzip instead on these OS versions, which works fine.)
    • In theory, the answer is no, since libarchive appears configured not to build with xz enabled, when looking over at https://opensource.apple.com/, credit to a StackExchange user for pointing this out.

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 8, 2019

Not sure if you discovered it, but there is support for trying things across multiple docker containers

I noticed this after doing much of the testing with LiveCDs in VirtualBox. It looks really convenient. (Probably a good candidate for CI such as Travis CI, if you wanted to run it on all Pull Requests, but I would equally understand leaving it to be run manually. Personal preferences and all.)

I had tried doing run-all-tests, but it didn't work for me at first with docker, so I carried on testing on Linux LiveCDs in VirtualBox.

(I realize now my docker setup requires running with sudo, e.g. sudo docker-compose, so that's why it wasn't working. I can edit the test runner scripts to use sudo docker-compose on my machine, and then it works.)

For the tests themselves, I am happy to report they all passed on xz-wip-2 equivalently to master branch.


Some tests didn't work for me outside of docker, so I wrote up the details of that below (in an expandable section of this comment).

  1. The script sets NVH_PREFIX as follows:

export NVH_PREFIX="${TMPDIR}/nvh/test/run-which"

Whereas $TMPDIR is turning out to be undefined or empty. Outside of docker, I get errors for making a /nvh directory, "permission denied" (sudo required). Understandable, since it (inadvertently) tries to make a directory directly under root (/). (All docker commands run as root, so this is fine under docker. It just probably installs things to /nvh.)

 ✗ setupAll for run/which/exec # (2 installs)
   (from function `setup' in test file tests/run-which.bats, line 13)
     `nvh install --preserve 4.9.1' failed
   
     installing : node/v4.9.1
          mkdir : /nvh/test/run-which/nvh/versions/node/v4.9.1
   mkdir: cannot create directory ‘/nvh’: Permission denied
   
     Error: sudo required (or change ownership, or define NVH_PREFIX)
  2. TMPDIR is undefined or empty

I don't think TMPDIR is defined anywhere. When I do git grep -i TMPDIR in the root dir of this repo, I just get the one line from test/tests/run-which.bats. Perhaps it should be TMP_PREFIX_DIR which is used in some of the other test files?

  3. Need to run setup_tmp_prefix so TMP_PREFIX_DIR is defined (and a corresponding temporary directory is created).

When I ran git grep TMP, it appeared the variable used in other tests is TMP_PREFIX_DIR, rather than TMPDIR. And this appears to be set in those tests by calling setup_tmp_prefix, which seems to come from test/tests/shared-functions.bash.

So I added that to the run-which.bats test.

  4. install steps were being (conditionally) skipped
  if [[ "${BATS_TEST_NUMBER}" -eq 1 ]] ; then
    # Using --preserve to speed install, as only care about the cached versions.
    nvh install --preserve 4.9.1
    nvh install --preserve lts
  fi

Apparently, and perhaps because this is not the first test file being run, these installs are being conditionally skipped. I commented out the if and fi to make these installs unconditional.


I came up with this patch to make things work outside of docker (Ubuntu Linux in my case): DeeDeeG/nvh@develop...DeeDeeG:test-patch-for-linux-outside-of-docker Unclear if this defeats some optimization for use in docker, but it doesn't break the tests in docker, at least.

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 8, 2019

Extra nerdy research for Linux that I refrained from posting thus far (in favor of real-world tests), but this is how I knew which OS versions to test to get interesting results:

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 10, 2019

For ease of review, I prepared some updated implementations. They differ in how many ways the end-user can control which compression algorithm is used.

(Edit to add: Commits are separated neatly, so any of these features can easily be included in a final PR. Neither of these two extremes necessarily needs to be the final form of this feature.)

@DeeDeeG
Contributor Author

DeeDeeG commented Sep 13, 2019

I was able to test on OS X Mountain Lion (10.8.5). I can confirm it does not support extracting xz tarballs. (This has been added to the results chart above.) That supports a macOS cutoff of 10.9 or above, as is present in my existing implementations for this issue.


For thoroughness' sake: Mountain Lion does not support extracting xz-compressed tarballs locally, via tar xf [tarball].tar.xz, nor with gunzip, in contrast with later releases that generally do support xz decompression in these ways. There is no sneaky or hidden xz/lzma support as far as I could tell.

Further details: OS X Mountain Lion does not have /usr/lib/liblzma.dylib, nor /usr/lib/liblzma.5.dylib. This means that the presence of liblzma.dylib has a 1:1 correlation with successfully extracting xz tarballs with system-native tar on macOS. That could be a more low-level, down-to-the-features test than testing against the macOS version. But this gates the same versions of macOS, (assuming no-one uses non-platform-native tar?) so both approaches are basically equivalent. (Furthermore, using xz is a compile-time feature of bsdtar, so if users were to download liblzma.dylib and plunk it in /usr/lib, system-native tar on macOS still would not be able to extract xz tarballs.) The only way to extract xz with tar on old macOS appears to be to install newer tar, && either liblzma or xz as appropriate.


I suppose we could test for this:

([tar is GNU tar 1.22+] && [system has `xz` on the path]) || ([tar is bsdtar 2.7+] && [the file /usr/lib/liblzma.dylib exists])

But again, I suspect this will gate the same systems, and it should be roughly equivalent. This might make it easier to support platforms other than Linux/macOS/SmartOS/AIX, but the tarballs are only prebuilt for those platforms anyway. So IMO being more obvious that we are targeting certain distros/releases of Linux and macOS makes the code more readable and perhaps more to-the-point. I like a principle of code-what-you-mean, so that the code is more readable, and to discourage expansive scope creep of the given project. Just my personal thoughts on this.

And FWIW, neatly version-testing tar would take several more lines in can_use_xz(), as far as I can picture it.
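For illustration, the condition above could be sketched roughly like this; the helper name tar_supports_xz, the banner parsing, and the use of sort -V are all my assumptions, not code from any of the linked branches:

```shell
# Hypothetical helper: decide xz support from a `tar --version` banner.
# Cutoffs (GNU tar 1.22, bsdtar 2.7) follow the condition described above.
tar_supports_xz() {
  local version_line="$1" ver
  case "${version_line}" in
    *"GNU tar"*)
      # GNU tar 1.22+ can use xz (it still needs xz on the PATH at runtime).
      ver="${version_line##* }"
      [ "$(printf '%s\n' "1.22" "${ver}" | sort -V | head -n1)" = "1.22" ]
      ;;
    bsdtar*)
      # bsdtar 2.7+ supports xz when built against liblzma.
      ver="$(printf '%s' "${version_line}" | sed 's/^bsdtar \([0-9.]*\).*/\1/')"
      [ "$(printf '%s\n' "2.7" "${ver}" | sort -V | head -n1)" = "2.7" ]
      ;;
    *)
      return 1
      ;;
  esac
}

tar_supports_xz "tar (GNU tar) 1.30" && echo "GNU tar 1.30: can use xz"
tar_supports_xz "tar (GNU tar) 1.15" || echo "GNU tar 1.15: cannot"
```

Even this compact version doubles the size of the check compared to a simple OS test, which is the trade-off discussed above.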


Edit: here's an implementation with tar version checks instead of OS checks:
DeeDeeG/nvh@develop...DeeDeeG:xz-with-tar-version-check

Not really OS-neutral for macOS, since I don't believe other platforms ship libraries as .dylibs. The GNU tar check is, in theory, OS-neutral and generally platform-neutral. But again, GNU tar tends to mean Linux, and Linux tends to mean GNU tar, so...

I do note that the homebrew repos offer both GNU tar and xz, so someone could go out of their way to get GNU tar and xz, and make that the default system tar, but that is about the only scenario where this would help.

  • Pro: Incrementally more technically-correct and precise implementation
  • Con: Half of the lines in can_use_xz(), in this implementation, are dedicated to parsing/checking tar versions, and the function is about twice as long. I could probably write it more compactly if I gave it a second go.

@shadowspawn
Copy link
Owner

(I have been busy on another project. I'll try and take a look at this soon. Thanks for all the investigations!)

@shadowspawn
Copy link
Owner

shadowspawn commented Sep 17, 2019

TMPDIR. Oops, that is a bug!

  • (my) macOS has it defined and I did not notice it was not defined in containers
  • as you say, docker runs as root by default so it still worked (I have wondered about writing some non-root tests)
  • using TMP_PREFIX_DIR as a replacement changes value for every test, whereas TMPDIR is meant to be stable so it can be reused across tests (hence the BATS_TEST_NUMBER issue)
  • just using /tmp might work everywhere we care about (I have read about robust ways of getting a temp dir in the past but forgotten)
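For what it's worth, one common pattern is to honour TMPDIR when set and fall back to /tmp; this is a sketch only, not a claim about what the tests should do:

```shell
# Sketch: use the caller's TMPDIR if set and non-empty, else fall back to /tmp.
TMPDIR="${TMPDIR:-/tmp}"
mkdir -p "${TMPDIR}"
echo "temp dir: ${TMPDIR}"
```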

@shadowspawn
Copy link
Owner

shadowspawn commented Sep 17, 2019

Implementation feedback

Yes, also leaning towards auto-detection for xz rather than just manual.
Yes, also prefer OS check over tar check.

I think an env override is worthwhile. I don't feel a command line switch is high value, but the implementation is tidy and more findable when users hit problems, so ok with keeping that too.

I do not want a short flag of -xz. Short flags are technically a single character, and in standard command line processing this expands to -x -z. Like when you do ls -al.

NVH_USE_XZ is currently true/blank/undefined. Rather than blank I would prefer "0" or "false" (pick one, not suggesting support both), and then forcing on would be defined and non-zero or non-false respectively. I do not have a feeling for whether 0 or false is more common. I have a slight preference for false because it is more readable, but NO_COLOR and CLICOLOR both use zero, and false is actually a string and not a boolean as such. Perhaps 0 is better?
Edit: leaning towards "0" for forcing off rather than "false". (Partly because I was originally confused when working with n code as I assumed false was the command or a built-in rather than just a string!)

Rather than testing for xz url using is_ok, I would prefer to test target node version to avoid the network call. I feel a bit guilty for suggesting performance over robustness, but want to minimise the network calls where reasonable.

Ask questions if any of these seem dubious or unclear, I might have misunderstood. :-)

@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 17, 2019

Responding to feedback, then will work on an implementation:

Yes, also leaning towards auto-detection for xz rather than just manual.
Yes, also prefer OS check over tar check.

I think an env override is worthwhile. I don't feel a command line switch is high value but the implementation is tidy, and more findable when hit problems, so ok with keeping that too.

I do not want short flag of -xz. [ . . . ]

✔️ Agreed on these points. Thank you.

NVH_USE_XZ is currently true/blank/undefined. Rather than blank I would prefer "0" or "false" [ . . . ]

Okay. This is certainly doable, if a little less compact/neat in the setup phase. Will try a proof-of-concept for this in my next iteration. Wrote up a bunch below on why alternatives are more complicated than parameter expansion, but this is all theory, and it's probably not a huge deal in practice. (I think we need to use if AND test, to check truthiness and non-null, respectively.)

Lots of thoughts about truthiness/non-null in bash, and what to do about it in nvh (click to expand)
Aside (some background info) about test and [ (click to expand)

test, aka [, is a little weird. (test, aka [, exists as an actual command as well as a bash built-in. See man [ and man test, which are both aliases for each other and share a man entry.)

Perhaps un-intuitively, the command [ -z "string" ] (or test -z "string") only tests for zero-length of a string. It does not actually test for logical zero, nor logical truthiness/falsiness in general. Why? Because test never checks truthiness in the expected way; it treats all input as text strings. The following are all non-null (non-empty strings), so test evaluates them to true:

  • [ true ]
  • [ false ]
  • [ 0 ]
  • [ null ]

The only thing test evaluates to "false" is the empty argument:

  • [ ]
.
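This behaviour is easy to confirm at a prompt:

```shell
# [ tests string emptiness only; any single non-empty argument is "true".
[ false ] && echo "[ false ] is true"   # "false" is just a non-empty word
[ 0 ] && echo "[ 0 ] is true"
[ "" ] || echo "only the empty string is false"
```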

Other than parameter expansion without the colon, e.g. "${var-true}", I don't know of a way to check (with a single command/builtin) for three states of a single variable (unset,set-null,set-nonempty) in bash. We can chain if and test, though. We can do if "${var}" to evaluate truthiness/falsiness. So we could do if ! "${var}" && [ -n "${var}" ] to check logical falsiness (evaluates 0 as false) and non-zero length (distinguishes between null and the string "0").
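As a sketch of the three-way distinction (the helper name state_of and the bash-only ${!name} indirection are mine, not from any of the branches):

```shell
# Sketch: classify a variable as unset / set-null / set-nonempty using
# parameter expansion. Bash-specific due to ${!name} indirection.
state_of() {
  local name="$1"
  if [ -z "${!name+set}" ]; then
    echo "unset"          # ${var+set} expands to nothing only when var is unset
  elif [ -z "${!name}" ]; then
    echo "set-null"
  else
    echo "set-nonempty"
  fi
}

unset MY_FLAG;  state_of MY_FLAG   # prints "unset"
MY_FLAG="";     state_of MY_FLAG   # prints "set-null"
MY_FLAG="0";    state_of MY_FLAG   # prints "set-nonempty" ("0" is non-empty)
```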

Aside: why recognizing three states is important

For this variable, we do need to preserve the three states.

  • "set and something falsy (currently null)" is requesting xz off.
  • "set and true-ish" (currently just any non-blank string) is requesting it on (pending platform checks).
  • "unset" means we register no preference from the user.

IMO having three states is important, in case someone modifies the script to have xz off by default. The environment variable and/or command-line switch should still both do what they say on the tin. (If this is not important, we can simplify, by setting xz on by default, and no-op for requests to "turn on xz").

.

I find the parameter expansion that handles "set-null/set-true/unset" is the most compact way to present these three options to the later parts of the script. This is also backward-compatible with the existing implementation with n. The n implementation is currently like this:

  • empty or unset is requesting xz off. (However, xz is off by default, so the brief "set xz" code paths basically no-op in this case, leaving the default gzip implementation to do its thing. We can back-port "0 means force off" to n if you like.)
  • "set and true-ish" (just checks that N_USE_XZ is non-blank) is requesting/forcing it on.

So yeah, IMO as long as n and nvh eventually match, I am on board. I will see how tidy I can make the "unset/0/true" branching logic, rather than "unset/set-null/set-true".

.

Rather than testing for xz url using is_ok, I would prefer to test target node version to avoid the network call. I feel a bit guilty for suggesting performance over robustness, but want to minimise the network calls where reasonable.

This is a subjective one and I leave it as your call in the end. Cost-benefitting this out, from my perspective:

  • Draft implementation increases the most-common code path from having one invocation of is_ok to two, thus increasing network calls, as you point out.
    • This is only a HEAD (headers-only) network request, so it is a very small size over the wire both ways, but for links with long round-trip-time, delay could be significant.
    • Minimal impact on links with low round-trip-time to mirror.
  • Draft implementation is robust to Node.js possibly dropping .tar.gz tarballs for some platforms (Proposal: remove .gz downloads for some platforms nodejs/build#1584)
    • Version checks would not be robust to this.
    • Seems somewhat unlikely they will drop .tar.gz, so this may be only a theoretical advantage

I would propose this as a potential solution:

  • Drop is_ok check from can_use_xz()
  • Rely on the is_ok from install()
    • If this fails, and NVH_USE_XZ is true, try setting NVH_USE_XZ to "0"/"false" and doing is_ok again.

This proposal doesn't sacrifice any robustness, and moves the dual network request onto a rarer code path. (Still easy to test, e.g. by attempting to download an older version of node, such as 0.8.)
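A sketch of that fallback; is_ok is stubbed here so the fragment runs standalone, and the function name preflight_tarball_url is hypothetical:

```shell
# Stand-in for nvh's is_ok HEAD-request check; here we pretend the mirror
# only has .tar.gz tarballs (as for old versions such as node 0.8).
is_ok() {
  case "$1" in *.tar.gz) return 0 ;; *) return 1 ;; esac
}

# Hypothetical: preflight the tarball URL, retrying once with gzip if the
# xz tarball is missing.
preflight_tarball_url() {
  local url="$1"
  if ! is_ok "${url}" && [ "${NVH_USE_XZ}" = "true" ]; then
    NVH_USE_XZ="false"            # downgrade and retry once with gzip
    url="${url%.tar.xz}.tar.gz"
  fi
  is_ok "${url}" && echo "${url}"
}

NVH_USE_XZ="true"
preflight_tarball_url "https://nodejs.org/dist/v0.8.0/node-v0.8.0-linux-x64.tar.xz"
# → https://nodejs.org/dist/v0.8.0/node-v0.8.0-linux-x64.tar.gz
```

The common case (xz tarball exists) still costs a single HEAD request; the second request only happens on the rarer fallback path.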

Related, but a mild tangent: If version checking rather than using is_ok for the xz tarball URL, I would like to version check for version "3" and up, since io.js had xz on both Linux and macOS by that version. (io.js forked node.js and immediately incremented the version to 1.0, then node.js took back over and immediately incremented the version to 4.0, so the version scheme and development of features are relatively continuous and unidirectional, despite the forking.)

@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 19, 2019

Example implementation with true/false for NVH_USE_XZ: DeeDeeG/nvh@develop...DeeDeeG:xz-pr-candidate-with-feedback-true-false


I'm inclined to go with "true/false" for this feature at the moment, if only to match the rest of the script's internals, but I suppose "1/0" would also be okay. (The internals are a code style/developer consideration, but what we expect users to set their environment variables to is also a UI/UX consideration.)

Might not need the input-normalizing step now, since we can tell users to set the variable to "true/false" (or "1/0"), and other values would simply not be supported. (This would be different from the way it is in tj/n now, though, where "null" is implied to be meaningful. Maybe the existing implementation there should be deprecated/replaced with one that matches what is decided on here? I doubt anyone has started using it there yet.)


Aside about performance of "full-fat string comparisons" (click to expand)

As for "full-fat string comparisons"... For some reason it's been drilled into my head that "string comparisons are bad" because they're supposedly "very slow." I tried changing NVH_USE_XZ to be "true/false" or "1/0" and it didn't seem slower to me than "empty/not-empty", really. Maybe it's slower if you do it thousands of times in a loop, but this isn't thousands of times in a loop. I just tested that, and 10,000 full string comparisons ([ "${var}" = "true" ]) and 10,000 string-non-null checks ([ -n "${var}" ]) both take about 200 milliseconds, give or take 20 milliseconds. Integer comparison checking the variable against "0" seemed consistently a bit slower, oddly. But even so, the performance impact is quite negligible. Conclusion: the people who said string comparisons are slow seem to be wrong, at least for bash, and I suspect the advice is outdated now that CPUs have gone from MHz to GHz; a few extra instructions don't mean a lot. (And it's not like we're compiling down to assembly/machine code, where these optimizations would be more likely to matter.)
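The timing claim is easy to reproduce; numbers will vary by machine, and this is a rough sketch rather than a rigorous benchmark:

```shell
var="true"
# 10,000 full string comparisons vs 10,000 non-empty checks; both finish
# in a few hundred milliseconds on typical hardware.
time for ((i = 0; i < 10000; i++)); do [ "${var}" = "true" ]; done
time for ((i = 0; i < 10000; i++)); do [ -n "${var}" ]; done
```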

@shadowspawn
Copy link
Owner

I had something like this in mind. Note, this code fragment is not passing in the target version to can_use_xz as I wrote it looking at one of your branches without a version check!

# User may set NVH_USE_XZ to 0 to disable, or set to anything else to enable, or undefined for automatic.
# Normalise to true/false.
if [[ "${NVH_USE_XZ}" = "0" ]]; then
  NVH_USE_XZ="false"
elif [[ -n "${NVH_USE_XZ+defined}" ]]; then
  NVH_USE_XZ="true"
elif can_use_xz; then
  NVH_USE_XZ="true"
else
  NVH_USE_XZ="false"
fi

# May be overridden by command line flags.

@shadowspawn
Copy link
Owner

Style: I am using [[ x ]] rather than [ x ] or test x, per:

https://google.github.io/styleguide/shell.xml?showone=Test,_%5B_and_%5B%5B#Test,_%5B_and_%5B%5B

@shadowspawn
Copy link
Owner

I think you have all the pieces I would want, scattered among the branches! Should I try and sort out the pieces so you can put together a PR?

@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 19, 2019

Sure, that sounds good. It's getting into the smaller details now where I think a PR makes sense.

@DeeDeeG DeeDeeG closed this as completed Sep 19, 2019
@DeeDeeG DeeDeeG reopened this Sep 19, 2019
@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 19, 2019

Re: this comment (which suggests using a single if-else[...] statement to handle NVH_USE_XZ, with "xz off, xz on, and auto xz" as the three code paths)... Looks pretty reasonable.


Unclear if the "xz on" code path should "force" xz on, i.e. skip the minimum node version check (or xz URL is_ok check). (As opposed to the "Auto" code path, which could check more rigorously that xz works before settling on using xz.)

Pros:

  • If we go with a minimum node version check for xz, users could override this to download versions 0.10.42+ and 0.12.10+ for node.js, 1.0+ (Linux) and 2.3.2+ (macOS) from iojs.org/dist
  • If we go with a backup gzip is_ok after failed xz is_ok, users could override this and skip the second network request
  • Makes nvh easier to use with forks/mirrors of node with custom version schemes? (Do these forks exist? Would anyone need this particular use-case with nvh?)

Con:

  • Increased script complexity (for a tiny feature that IMO seems somewhat unlikely to be used)

Related: If we don't distinguish between "forced on" and "auto on" later in the script, the if-else statement could potentially be simplified a bit. (But I suppose it is a little less fork-friendly to remove the distinction, since that would mean it's a tiny bit more work to turn xz off by default in a fork. (Is fork-friendliness a priority? Do people fork nvh/n often?))

@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 21, 2019

Re: #8 (comment) again: I just noticed we can't use the function can_use_xz() in the setup phase, at least without refactoring, because the setup phase precedes all the function definitions.

Working on putting together a PR.

@shadowspawn
Copy link
Owner

Unclear if the "xz on" code path should "force" xz on, i.e. skip the minimum node version check

Interesting question, and I have been thinking about that. I started out thinking that command line should override all checks, but have changed my mind.

I think it is reasonable that the version check is independent of the auto check. The auto check is looking at the system support for xz, and can be overridden by the user due to preference or incorrect or incomplete detection. The version check is a limitation in availability of that archive format. (While in theory there could be mirrors with different combinations of archive formats, that is only worth revisiting if it proves an actual use case.)

@shadowspawn
Copy link
Owner

Makes nvh easier to use with forks/mirrors of node with custom version schemes?

Not a consideration at this time.

On a related note, there was some support in n for using it with other "products" by defining environment variables. I never found an actual use of this, and removed the support in nvh early on; it is now gone from n too.

@shadowspawn
Copy link
Owner

shadowspawn commented Sep 21, 2019

(Is fork-friendliness a priority? Do people fork nvh/n often?)

Simple answer is no. I don't see reports from people having trouble integrating with forks, including after the recent major changes.

Focus on making the code simple and robust for our use case. 😄

@DeeDeeG
Copy link
Contributor Author

DeeDeeG commented Sep 21, 2019

On a related note, there was some support in n for using with other "products" by defining environment variables. I never found an actual use of this, and removed the support in nvh early on and now gone from n too.

In other words, trying to get io.js doesn't really work anymore? (because the "product name" would be "iojs" rather than "node" or something like that?)

I tried setting iojs.org as the custom mirror, but it fails with this:
Error: download preflight failed for v2.5.0 (https://iojs.org/dist/v2.5.0/node-v2.5.0-linux-x64.tar.gz)

whereas the proper URL would have v[num]/iojs-v[num]-[os]-[arch].[ext]
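Side by side, the two layouts only differ in the product prefix (constructed by hand here for illustration):

```shell
version="2.5.0"; os="linux"; arch="x64"
# What nvh constructs against the custom mirror (404s on iojs.org):
echo "https://iojs.org/dist/v${version}/node-v${version}-${os}-${arch}.tar.gz"
# The layout iojs.org actually uses:
echo "https://iojs.org/dist/v${version}/iojs-v${version}-${os}-${arch}.tar.gz"
```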

@shadowspawn
Copy link
Owner

A musing, rather than a suggestion. If the internal and external formats for a variable are different, then it may be appropriate to use a different name. For example, if externally NVH_USE_XZ is undefined/0/defined and internally we want a variable which is true/false, then it might be cleaner to use a different variable name.

It is convenient to reuse when supplying a default and doing minor normalising, like with NVH_NODE_MIRROR.

@shadowspawn
Copy link
Owner

In other words, trying to get io.js doesn't really work anymore? (because the "product name" would be "iojs" rather than "node" or something like that?)

It was more a general comment than about io in particular. There used to be explicit support for io with variations on a number of the routines, separate from the generic "product" support. The io support got removed too, as extra complication and obsolete.

DeeDeeG added a commit to DeeDeeG/nvh that referenced this issue Sep 21, 2019
Adapted from this comment:
shadowspawn#8 (comment)

Co-authored-by: John Gee <john@ruru.gen.nz>
DeeDeeG added a commit to DeeDeeG/nvh that referenced this issue Sep 21, 2019
Skips node version checks, which would otherwise gate xz usage.

In other words:
Overrides any checks that would normally be run,
so that nvh always attempts to use xz.

---

Partly adapted from this comment (with liberties taken on implementation):
shadowspawn#8 (comment)

Co-authored-by: John Gee <john@ruru.gen.nz>

---

Also updates "--use-xz" to not set the "xz forced on" state.
(This matches the flag's description in the help messages.)
@shadowspawn
Copy link
Owner

Preference for xz downloads, if detected, released in nvh 9.0.0

Thank you for your contributions.
