Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XDG: Broken cabal installation in macos 20240105 GHA runners #62

Closed
jasagredo opened this issue Jan 10, 2024 · 26 comments
Closed

XDG: Broken cabal installation in macos 20240105 GHA runners #62

jasagredo opened this issue Jan 10, 2024 · 26 comments

Comments

@jasagredo
Copy link

These macos runners come with existing XDG directories for cabal. Since the start of the runner, the following files are present:

  • .config/cabal/config
  • .cache/cabal/packages/...

In this haskell-actions/setup there is a deliberate choice (decided in haskell/actions#210) to create ~/.cabal/bin (here) in order to disable XDG, however, cabal will still think it is using XDG because of the existence of ~/.config/cabal. See this logic.

I do not know which one should be the path forward:

  1. This action relocates/deletes those files in the MacOS runners if they exist.
  2. This action deletes those files in all runners if they exist to disable XDG always.
  3. Users of this action should take care of this when using MacOS runners.
  4. The MacOS runners should not create those files in the first place.
  5. The logic in Cabal should be made more accurate to even deal with cases where ~/.cabal and ~/.config/cabal/config exist but not ~/.cabal/config (as is happening here).

The 20240105 image is still very new, only 9.79% of rollout progress as I'm writing this, but this still has resulted in CI pipelines failing for example in Cabal itself. 1 2 and more if you look at the Validate workflow.

@andreasabel andreasabel changed the title Broken cabal installation in macos 20240105 GHA runners XDG: Broken cabal installation in macos 20240105 GHA runners Jan 11, 2024
@hasufell
Copy link

I told you it's not a good idea to enable/disable XDG based on directory existence :)

@Mikolaj
Copy link

Mikolaj commented Jan 11, 2024

There is one more alternative, due to

If the CABAL_DIR environment variable is set, the indicated directory will be used to store all Cabal-related files, as in previous versions.

(e.g., see haskell/actions#210). So, a user of this action could set this variable and be safe.

I vote for all of above. I don't see a reason the choice should be made by the [edit: images or the] runner and when it is, let's revert it early.

@andreasabel
Copy link
Member

There is one more alternative, due to

If the CABAL_DIR environment variable is set, the indicated directory will be used to store all Cabal-related files, as in previous versions.

So maybe simply the action should set CABAL_DIR at the start to point to ~/.cabal and then XDG would be totally disabled?

@Mikolaj
Copy link

Mikolaj commented Jan 12, 2024

So maybe simply the action should set CABAL_DIR at the start to point to ~/.cabal and then XDG would be totally disabled?

I think @jasagredo proposed that and it sounds fine. Then, if people want to switch their Haskell CI to XDG, creating the XDG directory won't work, they'd need to set the CABAL_DIR accordingly. A bit more of a hassle, but less likely to be done by accident, either by the CI users or by the container image creators.

@hasufell
Copy link

Doesn't seem like a good idea to set this variable. It'll cause more confusion.

@Mikolaj
Copy link

Mikolaj commented Jan 12, 2024

Doesn't seem like a good idea to set this variable. It'll cause more confusion.

What kind of confusion? By people that already switched their CI to XDG and it would switch their setup back? Are there any such brave souls? Or in the future, if the cabal manual does not mention the variable prominently enough (does it?)?

@hasufell
Copy link

What kind of confusion?

This changes the behavior of the action.

What is the interplay with users setting the env var somewhere in the action? Does the action pick it up or overwrite it? Will it now detect the store-dir properly?

https://github.com/haskell-actions/setup/blob/b1dc5f1786a382e9a2532d039e8c53cf56f21392/src/setup-haskell.ts#L77C1-L77C1

What is the interplay with users not setting the env var in the action?

How do users switch to XDG or away from XDG?


Avoid pre-setting more user-exposed env vars. Have sane defaults. Don't make decisions based on directory existence.

@andreasabel
Copy link
Member

One problem we are facing with the XDG migration is that users might have hardwired the path ~/.cabal/store into their caching step. Failure of caching does not trigger failure of the action, so users might for a long time not notice that their caching of dependencies is non-functional when we switch this action to XDG based on a pre-installed XDG-cabal.

The goal is thus to force cabal away from XDG.

I don't see a way to achieving this goal (in our multi-cabal context) without setting CABAL_DIR or removing the XDG cabal directories, but I am happy to learn otherwise.

Of course, forcing cabal to use the old directory structure means that we hardcode it here in the action, which is an ugly thing to do.

@hasufell if you have an idea how to do it otherwise, I'd happy to learn, but please be concrete, as I am failing to understand how you envision this to work atm.

@hasufell
Copy link

hasufell commented Jan 12, 2024

One problem we are facing with the XDG migration is that users might have hardwired the path ~/.cabal/store into their caching step. Failure of caching does not trigger failure of the action, so users might for a long time not notice that their caching of dependencies is non-functional when we switch this action to XDG based on a pre-installed XDG-cabal.

I don't think it's this actions responsibility to keep up with every breakage that cabal or ghc cause.

That said, I think the current approach here isn't proper either:

const storeDir =
process.platform === 'win32'
? 'C:\\sr'
: `${process.env.HOME}/.cabal/store`;
fs.appendFileSync(configFile, `store-dir: ${storeDir}${EOL}`);
core.setOutput('cabal-store', storeDir);

I would:

  1. stop creating any files or folders on unix and macOS
  2. set CABAL_DIR on windows only (I'm hoping cabal doesn't try to do XDG there)
  3. re-implement the logic to return/detect store-dir (I know cabal doesn't have a proper interface, but it can be done)
  4. tell users to use cabal-store for their caching instead of hardcoding paths?
  5. comment on cabal issue tracker that the current XDG logic was a mistake

@Mikolaj
Copy link

Mikolaj commented Jan 16, 2024

I'm absolutely fine to handle step 5 and announce anywhere and everywhere I've made a mistake. However, we still need volunteers to handle the other tasks, which includes a design discussion for store-dir, revisiting the XDG design discussion with the original author and others than took part in it and agreed on the design, the actual implementations, transition plan, etc. I don't think this is going to be ready any time soon.

For now, I'm guessing, haskell-actions still requires a way to work with the current XDG implementation and prevent both the caching problem Andreas mentions and the CI breakage we just experienced. Making sure (adding checks? adding rm -rf?) the XDG directories are not passed to the user is one way. Setting CABAL_DIR is another. I don't have an informed opinion on which is better.

Whatever the decision, I don't recommend migrating haskell-actions to XDG in the near future. The original plan was that the details of XDG are subject to change and that we count on feedback from brave users [edit: or new users that don't care, since XDG is the default for new setups] enabling it for their particular projects, not from tools or systems that'd move many users to XDG at once and at unawares. As expected, unforeseen problems emerge and that may not be the end of them.

@athas
Copy link

athas commented Jan 16, 2024

One way to simplify this is to remove the automatic backwards compatibility code in cabal and always use XDG unless CABAL_DIR is set.

@hasufell
Copy link

One way to simplify this is to remove the automatic backwards compatibility code in cabal and always use XDG unless CABAL_DIR is set.

This is going to cause even more problems. Please don't.

@athas
Copy link

athas commented Jan 16, 2024

I suggest setting CABAL_DIR=$HOME/.cabal. This will make cabal behave as it did before the XDG changes (including sensitivity to users further changing CABAL_DIR somewhere in their action). Using XDG can work fine in CI, but carries no advantages.

@hasufell
Copy link

Please open an issue at https://github.com/actions/runner-images/issues and investigate why XDG is enabled automatically.

It's possible that it has to do with brew doing shenanigans. The actual macOS ghcup provisioning does not enable XDG: https://github.com/actions/runner-images/blob/main/images/macos/scripts/build/install-haskell.sh

Maybe related: actions/runner-images#8778

@athas
Copy link

athas commented Jan 16, 2024

If something runs cabal update along the way, that'd explain why the config file is created. I don't see that in those scripts, but I don't really understand how the infrastructure works.

@hasufell
Copy link

hasufell commented Jan 16, 2024

If something runs cabal update along the way

That's likely the installation of yq then, which depends on pandoc and hence ghc/cabal:


edit: well, scratch that, of course ghcup also runs cabal update: https://github.com/haskell/ghcup-hs/blob/55030d83da6d59196df28c0eb115cdf19ff20393/scripts/bootstrap/bootstrap-haskell#L823

@athas
Copy link

athas commented Jan 16, 2024

That's true, but Homebrew resets $HOME and such to a temporary directory when building a formula. It should not have any effect on the user's real home directory and cabal state. It certainly didn't when I (re-)wrote Homebrew's Haskell infrastructure a few years ago, but I no longer have a macOS system to test it. Homebrew's sandboxing during building is quite crude, and there may actually be a risk that the XDG change in cabal broke it.

Edit: oh yeah, that ghcup command is definitely the cause.

@hasufell
Copy link

hasufell commented Jan 16, 2024

oh yeah, that ghcup command is definitely the cause

What is the proper way of disabling XDG during cabal update (unless it's already enabled)?

@athas
Copy link

athas commented Jan 16, 2024

Same as any other cabal command: set CABAL_DIR or point CABAL_CONFIG to a config file that sets all directories explicitly.

@hasufell
Copy link

Same as any other cabal command: set CABAL_DIR or point CABAL_CONFIG to a config file that sets all directories explicitly.

Those are both poor options, imo.

I've decided to attempt this: haskell/ghcup-hs#977

If that works correctly, then the github runners will fix themselves in the next deployment. It might make sense to add a test-case though to check that XDG is indeed disabled: https://github.com/actions/runner-images/blob/main/images/macos/scripts/tests/Haskell.Tests.ps1

@hasufell
Copy link

hasufell commented Jan 16, 2024

So it was probably the recommended tag bump from 3.6.x.x to 3.10.x.x that lead to this whole thing, because previously the bootstrap script would install 3.6 and then run cabal update with that old version.

@athas
Copy link

athas commented Jan 16, 2024

That seems like it has the exact same behaviour as setting CABAL_DIR=$HOME/.cabal. Am I missing something?

@hasufell
Copy link

That seems like it has the exact same behaviour as setting CABAL_DIR=$HOME/.cabal. Am I missing something?

The difference is that I don't set CABAL_DIR.

@hasufell
Copy link

haskell/ghcup-hs#977 has been merged and deployed, which means the next time the github runners are deployed, they should pick up the change (which is every 2 weeks or so).

@andreasabel
Copy link
Member

I have now also been bitten by this: https://github.com/haskell/alex/actions/runs/7680499933/job/20932609305?pr=257

Image: macos-12
Version: 20240119.1

erikd pushed a commit to erikd/cabal that referenced this issue Apr 22, 2024
@jasagredo
Copy link
Author

I'm closing this issue as not relevant anymore as MacOS no longer is shipped with Haskell (see actions/runner-images#9242 (comment)).

@jasagredo jasagredo closed this as not planned Won't fix, can't repro, duplicate, stale Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants