Avoid writing large amounts of data to system temp directory #996

Closed
mgsloan opened this issue Sep 16, 2015 · 19 comments

@mgsloan
Contributor

mgsloan commented Sep 16, 2015

Reported by joseph07 in #haskell: stack setup failed because the device ran out of space, despite there being 38GB. According to pdxleif, Arch Linux defaults to using tmpfs for /tmp, which keeps its contents in RAM and is therefore limited in size.

Here's where a system temp directory is used as a place to unpack the GHC tarball:

withSystemTempDirectory "stack-setup" $ \root' -> do

It'd also likely be worthwhile to take a look at other usages of the system temp directory. On one hand, it seems odd to have a system configuration that has such limited space in tmp. On the other hand, I see few downsides to instead storing such temp directories somewhere in ~/.stack (and the big upside of resolving this issue)
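
A minimal sketch of the ~/.stack alternative, assuming the `temporary` package's withTempDirectory and a hypothetical `tmp` subdirectory under the stack root. The names withTempDirectoryUnder and withStackTempDirectory are made up for illustration and are not stack's actual code; note also that getAppUserDataDirectory knows nothing about a STACK_ROOT override.

```haskell
import System.Directory (createDirectoryIfMissing, getAppUserDataDirectory)
import System.FilePath ((</>))
import System.IO.Temp (withTempDirectory)

-- | Create a fresh temporary directory under the given root, run the
-- action, and remove the directory afterwards (also on exceptions).
withTempDirectoryUnder :: FilePath -> String -> (FilePath -> IO a) -> IO a
withTempDirectoryUnder root template action = do
  createDirectoryIfMissing True root
  withTempDirectory root template action

-- | Same, but rooted at @~/.stack/tmp@ on Unix. The @tmp@ subdirectory
-- name is an assumption made for this sketch.
withStackTempDirectory :: String -> (FilePath -> IO a) -> IO a
withStackTempDirectory template action = do
  stackRoot <- getAppUserDataDirectory "stack"
  withTempDirectoryUnder (stackRoot </> "tmp") template action
```

Like withSystemTempDirectory, this only cleans up when the process gets to run its exception handlers, so a crash or power loss leaves the directory behind.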

@LeifW

LeifW commented Sep 16, 2015

The Arch wiki article says tmpfs on Arch defaults to a maximum size of half of total RAM: https://wiki.archlinux.org/index.php/Tmpfs#Examples

@snoyberg
Contributor

The downside to not using a proper temp directory is that if the process fails halfway through, the disk space is never reclaimed.

@LeifW

LeifW commented Sep 16, 2015

If that's a concern, what about using say https://hackage.haskell.org/package/temporary-1.2.0.3/docs/System-IO-Temp.html#v:withSystemTempDirectory ? Or are you referring to the Haskell process crashing?

@snoyberg
Contributor

I'm referring to the Haskell process crashing (or system simply shutting down). And that's the function we're using already; the problem is Arch's setup not providing enough disk space in the temp directory.

@LeifW

LeifW commented Sep 16, 2015

I think other distros using systemd will have /tmp be tmpfs, too (edit: Sorry, apparently that's not the case. E.g. I don't think RedHat / CentOS have this; maybe just Arch and Fedora?). E.g. Fedora since version 18. http://fedoraproject.org/wiki/Features/tmp-on-tmpfs

@mgsloan
Contributor Author

mgsloan commented Sep 16, 2015

"The downside to not using a proper temp directory is that if the process fails halfway through, the disk space is never reclaimed."

I figured we'd have stack check for leftover directories and free them up. This way, the only case where disk space is never reclaimed is if stack isn't run again. It could get a bit tricky with concurrent stack executions, though.
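
A rough sketch of that cleanup step, assuming stack keeps all of its temp directories under one root that it owns. cleanStaleTempDirs is a hypothetical name, and this naive version ignores the concurrency problem entirely: it would happily delete a directory that another running stack process is still using, so a real implementation would need a lock file or similar.

```haskell
import Control.Monad (filterM, forM_)
import System.Directory
  ( createDirectoryIfMissing
  , doesDirectoryExist
  , listDirectory
  , removeDirectoryRecursive
  )
import System.FilePath ((</>))

-- | Remove every subdirectory found under the given temp root and
-- return the names of the directories that were removed.
-- Plain files in the root are left alone.
cleanStaleTempDirs :: FilePath -> IO [FilePath]
cleanStaleTempDirs tmpRoot = do
  createDirectoryIfMissing True tmpRoot
  entries <- listDirectory tmpRoot
  stale <- filterM (\name -> doesDirectoryExist (tmpRoot </> name)) entries
  forM_ stale $ \name -> removeDirectoryRecursive (tmpRoot </> name)
  pure stale
```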

@LeifW

LeifW commented Sep 16, 2015

That Fedora page mentions a workaround: getTemporaryDirectory respects $TMPDIR, so on systems that have issues one could override the default of /tmp by running TMPDIR=/var/tmp stack ... That Fedora page suggests applications needing to write large files to /tmp should use /var/tmp instead, or maybe "XDG user-dir's Download directory". It sounds like there's a cron job cleaning out /var/tmp, too?

If /var/tmp is indeed supposed to become some standard alternative to /tmp for large files, it makes one think there ought to be a similar env var for referencing it, and a corresponding function in System.Directory for getting its value in some cross-platform way?
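
For what it's worth, the TMPDIR behaviour can be seen from Haskell with a small sketch like the following. tempDirWithOverride is a hypothetical helper, and this assumes a Unix system; on Windows getTemporaryDirectory consults TMP and TEMP instead.

```haskell
import System.Directory (getTemporaryDirectory)
import System.Environment (setEnv)

-- | Point TMPDIR at the given directory for the current process and
-- report what getTemporaryDirectory returns afterwards.
-- On Unix this is the value of TMPDIR, or /tmp if TMPDIR is unset.
tempDirWithOverride :: FilePath -> IO FilePath
tempDirWithOverride dir = do
  setEnv "TMPDIR" dir
  getTemporaryDirectory
```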

@mgsloan
Contributor Author

mgsloan commented Sep 17, 2015

I've just noticed that there are a couple other related issues:

#623

#841 (this one mentions the TMPDIR workaround)

@LeifW

LeifW commented Sep 17, 2015

http://fedoraproject.org/wiki/Features/tmp-on-tmpfs#Detailed_Description says /tmp on tmpfs is becoming more common.
I was thinking the problem would go away on its own as the average system gradually got more memory, but I forgot about VPSes. Also, I'm wondering when that space is reclaimed: filling up your RAM with large files would be unhelpful while you're running a compiler.

Not sure of a cross-platform way of specifying /var/tmp; seems like that functionality might belong in System.Directory, anyways. Perhaps we could just check if it exists. Proposed behaviour: Check for $TMPDIR (to still allow override of location), check if /var/tmp exists (and is writeable), and finally fall back to getTemporaryDirectory (/tmp).
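
The proposed lookup order could be sketched roughly like this. chooseTempRoot and isWritableDir are hypothetical names, the hard-coded /var/tmp only makes sense on Unix-like systems, and the writability check relies on getPermissions, which may not catch every case (a full disk or quota, for instance).

```haskell
import System.Directory
  ( doesDirectoryExist
  , getPermissions
  , getTemporaryDirectory
  , writable
  )
import System.Environment (lookupEnv)

-- | Is the path an existing directory that the current user may write to?
isWritableDir :: FilePath -> IO Bool
isWritableDir dir = do
  exists <- doesDirectoryExist dir
  if exists then writable <$> getPermissions dir else pure False

-- | Pick a temp root: a non-empty $TMPDIR wins, then /var/tmp if it is
-- a writable directory, and finally whatever getTemporaryDirectory says.
chooseTempRoot :: IO FilePath
chooseTempRoot = do
  override <- lookupEnv "TMPDIR"
  case override of
    Just dir | not (null dir) -> pure dir
    _ -> do
      ok <- isWritableDir "/var/tmp"
      if ok then pure "/var/tmp" else getTemporaryDirectory
```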

Though, articles I read seem to discourage using /tmp and /var/tmp (especially when they are not being used to communicate between processes) due to security concerns, and suggest something like XDG user dirs (or somewhere in ~/.stack seems perfect for that).

@harendra-kumar
Collaborator

I just ran into this issue. I have a VM setup where the root fs is mounted read-only and (therefore) /tmp is mounted as tmpfs with a limited amount of space available. There is sufficient space available in /var/tmp, though.

I know of at least one big installation where many systems are booted from the same image, which is mounted over NFS. In such cases /tmp is mounted as a RAM file system. In enterprise settings the available RAM is usually enough to provide sufficient space, but I guess there may be cases where /tmp is limited. However, /var/tmp should usually have enough space, so preferring /var/tmp over /tmp might be a good idea.

@rrnewton
Contributor

Unfortunately, ~/.stack is usually NFS for any of our user accounts and thus a lot slower than /tmp/...

It sounds like there's a strong need for a portable way of finding a tmp directory that's on local disk, not in memory, and not on the network.

@borsboom
Contributor

There doesn't seem to be any perfect solution to this, but I think writing big temporary files under ~/.stack is going to be the best default.

Advantages:

  • /tmp on tmpfs is increasingly common
  • modern practice recommends /tmp and /var/tmp only for inter-process communication
  • no portable solution to find the "best" temporary location for every case, so let's go with the simplest option

Disadvantages:

  • Will be slower if $HOME is on NFS, but this will only affect stack setup so it won't be a drag on everyday performance
  • Users with a small quota on $HOME may have trouble, but they already do since ~/.stack gets big fast, so this will only make it happen a bit sooner

We must ensure any error messages about disk space are clear and offer workarounds.

@borsboom borsboom modified the milestones: P2: Should, P1: Must Nov 30, 2015
@harendra-kumar
Collaborator

Sounds good.

Can we first check the amount of space available in a given volume before we try another one? We can try /tmp, $HOME, /var/tmp in that order based on the amount of space available. That way we will be able to try our best and bail out only if it is not at all possible to install. Space check will also allow us to provide a way to gracefully exit with an error message rather than trying and running out of space.
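
The selection part of that idea might look like the sketch below. firstWithSpace is a hypothetical name, and the free-space probe is deliberately passed in as a parameter, since base and directory offer no portable query for it; on POSIX it would presumably wrap something like statvfs.

```haskell
-- | Return the first candidate directory whose available space, as
-- reported by the supplied probe, is at least the required number of
-- bytes. Candidates are tried in the order given; the result is
-- Nothing when none of them has enough room.
firstWithSpace
  :: Monad m
  => (FilePath -> m Integer)  -- ^ probe: available bytes on the volume
  -> Integer                  -- ^ bytes required
  -> [FilePath]               -- ^ candidates, in order of preference
  -> m (Maybe FilePath)
firstWithSpace _ _ [] = pure Nothing
firstWithSpace availBytes required (dir : rest) = do
  free <- availBytes dir
  if free >= required
    then pure (Just dir)
    else firstWithSpace availBytes required rest
```

With a Nothing result the caller can exit up front with a clear message about how much space is needed, rather than failing partway through unpacking.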

@mgsloan
Contributor Author

mgsloan commented Jan 3, 2016

Another reason to manage our own tmp directory is that we can make the process invocations in the verbose log copy+pasteable. Currently, some commands don't work because the temporary files are eagerly deleted. See the first point of #1596 for more info.

@maxzinkus

Note that in some instances you need an absolute path, e.g. TMPDIR=/home/user/tmp

@mgsloan
Contributor Author

mgsloan commented Aug 9, 2016

I've fixed this! The downside is that there will now be leftover dirs that the user will need to clean up (unless they run setup again). This is the case even when stack gets to handle the exceptions, etc. The reason is that configure errors and the like say things like "See `config.log' for more details", so it's convenient for the user to be able to manually run the command:

[Screenshot: 2016-08-08-192859_583x115_scrot]

@mgsloan mgsloan closed this as completed Aug 9, 2016
mgsloan added a commit that referenced this issue Aug 9, 2016
Also gives a good error message letting you know that directories now
exist which won't be used by stack
mgsloan added a commit that referenced this issue Aug 9, 2016
+ remove readInNull utility. I think "exitFailure" should be mentioned
upfront.
mgsloan added a commit that referenced this issue Aug 9, 2016
Also gives a good error message letting you know that directories now
exist which won't be used by stack

Also removes readInNull utility. I think "exitFailure" should be
mentioned upfront.
@ruuda
Contributor

ruuda commented Nov 5, 2016

I am now experiencing the exact opposite issue: stack setup fails because there is no space left on the disk that contains $LOCALAPPDATA\Programs\stack, but I have other disks with plenty of space. My home directory is on a small SSD that contains only my operating system and a few critical files. Unfortunately it is not possible on Windows to move the home directory after installation, so I am stuck with $LOCALAPPDATA\Programs\stack living on the small disk.

I tried the following:

  • Set $STACK_ROOT to be on a different disk. This works fine for most things: the index and snapshots are stored there, but the download is not.
  • Set $TMP to a directory on a different disk, as indicated in the FAQ. However, Stack still downloads to $LOCALAPPDATA\Programs\stack.
  • As a hack, I tried to set $LOCALAPPDATA to a custom temporary directory when running stack setup. This allowed the download to complete, but then Stack went on to actually install GHC in that location, not in $STACK_ROOT.

To support both the original scenario in this issue and the case of a small home directory (or a slow one, as mentioned earlier in this thread), it would be nice if the download location were configurable. As there is TMPDIR already, would that be a good way to override the default download location?

And shouldn’t GHC be installed to $STACK_ROOT?

@mgsloan
Contributor Author

mgsloan commented Nov 7, 2016

@ruuda That is a known issue, contribution appreciated! #1644

ruuda added a commit to ruuda/stack that referenced this issue Nov 8, 2016
As far as I can git grep, Stack does not use TMP or TEMP or TEMPDIR any
more. As of commercialhaskell#996, stack setup does not download to the system temp
directory any more.
@ruuda
Contributor

ruuda commented Nov 8, 2016

@mgsloan: Thanks for pointing me in the right direction. I opened #2766.

ruuda added a commit to ruuda/stack that referenced this issue Nov 26, 2016
As far as I can git grep, Stack does not use TMP or TEMP or TEMPDIR any
more. As of commercialhaskell#996, stack setup does not download to the system temp
directory any more.