Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore `Denied behaviour of PATH resolution #4265

Merged
merged 8 commits into from
Jul 21, 2020

Conversation

dra27
Copy link
Member

@dra27 dra27 commented Jul 8, 2020

Follow-on from #4072, which alas is evidence that absence of evidence is not evidence of absence 🙂

This PR alters OpamSystem.t_resolve_command so that `Not_found is only returned if name was not present in of the directories of PATH. If name is present in at least one directory, but all the matches fail the permissions check, then `Denied is now returned instead, which more accurately mirrors the previous behaviour (and is also consistent with the shell's behaviour).

This is a draft PR, as sadly this re-opens my need to fix the Cygwin's permission check...

While working on this, I also noticed that check_perms was not using the correct set of uids/gids - it now correctly uses the EUID and also ensures that EGID is included in the list of groups considered (since getgroups is not required to return the EGID).

The fix to the behaviour restores the breakage which we'd seen on Cygwin CI since late September last year. I chased down what was broken in January, but erroneously put fixing it properly on the backburner as CI started working when #4072 was merged.

The history is mildly interesting, and yields another cautionary tale for our testing:

  • The fundamental error being seen was "cmd": permission denied. on the Cygwin32 build during initial switch creation. This was coming from the command in OpamSysPoll.os_version_lazy failing the permissions check in OpamSystem.t_resolve_command. As we didn't think we'd changed anything, I initially assumed that Cygwin had changed the way the ACL handling was working, but that was incorrect (it hasn't changed since at least Cygwin 1.7, which is a very long time ago). It turns out that 2 packages from issuu/ocaml-protoc-plugin at 0.9 opam-repository#14908 innocently created the first opam package in opam-repository to use os-version in an available field which means that determining the set of available packages will now always force that variable. This was merged on 26 Sep 2019, and you can then see the error appearing in our AppVeyor logs for all PRs and branches after this time. One can test with the previous SHA opam init git+https://github.com/ocaml/opam-repository.git#3303f6 and opam built at that time. However, depexts integration  #3975 then cemented the issue further - there have been several packages (correctly) using os-version in depext so in fact you can only get the pre 26 Sep 2019 behaviour by running opam init git+https://github.com/ocaml/opam-repository.git#3303f6 --no-depexts!
  • My conclusion from this is that, as with the locking of opam-repository for the Lwt test, our CI should always use a known SHA for opam-repository. The bug has always been there, but bug reports should be logged in issues, not by completely blocking our CI for 3 months! I have done this by generalising the SHA already used for the LWT test on Travis to have OPAM_REPO_SHA defined for both Travis and AppVeyor and all opam init instructions now use the opam-repository Git repo. The only downside is that it means Travis and AppVeyor no longer test the HTML backend - if this is a concern, then I guess we could mirror opam-repository in order to have an index.tar.gz at our own SHA?
  • That explains both what the error is and why we unexpectedly started seeing it. The actual fault is to do with Windows ACLs mapping to POSIX ACLs in Cygwin. Windows is unencumbered by the limited POSIX permissions model, but Cygwin has to map Windows ACLs onto something which look Unix-like. Its solution for many years for non-Cygwin files has been to use the file's owner for both st_uid and st_gid and then use the Everyone well-known SID for other permissions. Cygwin then puts the remaining Windows ACL entries as Posix 1003.1e entries. Herein lies the problem: %COMSPEC% has this ACL:
    • ALL APPLICATION PACKAGES: rx
    • ALL RESTRICTED APP PACKAGES: rx
    • SYSTEM: rx
    • BUILTIN\Administrators: rx
    • BUILTIN\Users: rx
    • NT SERVICE\TrustedInstaller: rwx ("Full Control") - this is also the Owner
      Therefore:
$ ls -l /cygdrive/c/Windows/system32/cmd.exe
-rwxr-x---+ 2 NT SERVICE+TrustedInstaller NT SERVICE+TrustedInstaller 236032 Nov 14  2019 /cygdrive/c/Windows/system32/cmd.exe

Is entirely expected. The execute permission cannot be determined from stat, therefore, but only by checking the ACL.

  • Which is what this PR now does. POSIX 1003.1e is an example of everything that's truly dreadful about Unix being completely flawed, a draft, and - though abandoned - universally partially implemented on just about all Unixes. Fortunately, at this time, we only need to follow Cygwin's example which, in this instance, follows the draft precisely and includes the bizarrely-omitted functions missing from the spec (who would ever want to determine if a permission is present?!), making its support equivalent to Linux. I have therefore simplified this by just adding --with-libacl which requires libacl-like support rather than --enable-acl, which would be the more complete "do whatever is necessary to get ACL support on this platform" (the horrific detail of how you get a cross-platform version of the so-niche-that-they-didn't-include-it-in-the-draft function acl_get_perm can be seen in libarchive's sources). On Cygwin, libacl support is required (i.e. --with-libacl is the default), on all other platforms it must be explicitly selected. You can explicitly disable libacl support on Cygwin with --without-libacl but your opam won't work properly. The support appears to work OK on Ubuntu (where it correctly detects the need for -lacl) and Cygwin (where it correctly detects that no linker options are required).

The final entertainment is that while Cygwin's libacl works correctly on 64-bit, it turns out that there's a bug affecting two of the functions on 32-bit Cygwin! My fix to that bug has been merged upstream but, until Cygwin 3.1.7 is released, AppVeyor is switched to use a developer snapshot of the Cygwin DLL (as it happens, this fix is the only new thing in that snapshot), I'll keep an eye on Cygwin releases and update AppVeyor after the next Cygwin release. If used on Cygwin 3.1.6 and earlier, the ACL simply fails to see the execute permission and you get the "Permission denied" error, as before.

@dra27 dra27 added this to the 2.1.0~alpha2 milestone Jul 8, 2020
@dra27
Copy link
Member Author

dra27 commented Jul 8, 2020

Excellent, our old friend the "cmd": permission denied error has returned! (https://ci.appveyor.com/project/dra27/opam/builds/33984887/job/893wh8rrka4chiq5#L4353)

@dra27 dra27 force-pushed the break-cygwin-again branch 6 times, most recently from 614344d to 0921640 Compare July 13, 2020 13:28
@dra27 dra27 marked this pull request as ready for review July 13, 2020 13:42
@AltGr
Copy link
Member

AltGr commented Jul 13, 2020

Wow! Impressive detective work, and I agree with your recommendations about testing

When resolving a command in PATH, `Not_found should be returned only if
the command was not in all the directories listed in PATH. If the
command was found in at least one, but none of the commands found were
executable, then `Denied should be returned instead.
A principal does not get the union of user, group and other permissions
- you only ever get one part of the mask.
Permissions checking is based on effective UID. It is undefined whether
getgroups includes the effective GID, so include it in the list
explicitly.
libacl is only mandatory for Cygwin and must be explicitly selected for
other platforms. For now, expressly binding to libacl in order to have
acl_get_perm, rather than adding general ACL support (see libarchive for
the full horrendous details of how to do this portably otherwise).
Need 3.1.7 when it's released.
@rjbou
Copy link
Collaborator

rjbou commented Jul 21, 2020

Rebased & fix upgrade job (no hash needed for opam 1.2.2 init)

@dra27
Copy link
Member Author

dra27 commented Jul 21, 2020

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants