Providing option to override temp location would be a great enhancement #172

Open
scottporter opened this issue Oct 16, 2020 · 13 comments
Labels
feature a feature request or enhancement

Comments

@scottporter

Our server cluster is set up so each server has a small /tmp partition. When I need to save out large temporary files, I do so by manually specifying another partition when I create the temporary files. I don't see any facility in callr for specifying a different location other than the one returned by tempdir(). If I understand correctly, I would have to start R by specifying a different location either in .Renviron or using an environment variable. Our R setup works fine for our regular use, but I started testing out callr using future.callr and quickly brought a server to its knees and incurred the wrath of our sys admins. With some convincing I can get the setup of our cluster changed, but it would be nice to have control over the temp location callr uses from inside R (instead of having to set it before R starts). However, not sure if that's even possible based on how each callr session starts.

@gaborcsardi
Member

That seems reasonable to me. OTOH the advantage of an environment variable is that it is inherited in subprocesses by default, whereas an option will not be set in a subprocess. If you set an option in .Rprofile, then you might as well set TMPDIR (or a callr specific env var) in .Renviron, no?
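The inheritance difference described above can be checked without callr. The following is a minimal sketch that assumes `Rscript` is available under `R.home("bin")`; the variable name `DEMO_TMP_HINT` and the option name `demo.tmp.hint` are hypothetical and are used only for illustration.

```r
# Set an environment variable and an option in the parent session.
# Both names are made up for this demonstration.
Sys.setenv(DEMO_TMP_HINT = "hello")
options(demo.tmp.hint = "hello")

# Start a fresh child R process and ask it what it can see.
rscript <- file.path(R.home("bin"), "Rscript")
code <- 'cat(Sys.getenv("DEMO_TMP_HINT"), is.null(getOption("demo.tmp.hint")))'
out <- system2(rscript, c("--vanilla", "-e", shQuote(code)), stdout = TRUE)

# The child should report the inherited environment variable, followed by
# TRUE because the option was never set in the child.
print(out)
```

The same reasoning suggests why `TMPDIR` set in the parent's environment reaches subprocesses, while a value stored with `options()` would have to be passed along explicitly.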

Here is a workaround to change the temporary directory from within the session. It works from R 3.5:

> tempdir()
[1] "/var/folders/59/0gkmw1yj2w7bf2dfc3jznv5w0000gn/T//RtmpUI1O6O"
> Sys.setenv(TMPDIR = "/Users/gaborcsardi")
> unlink(tempdir(), recursive = TRUE)
> tempdir(check = TRUE)
[1] "/Users/gaborcsardi/RtmpXaziuR"
> tempdir()
[1] "/Users/gaborcsardi/RtmpXaziuR"

@gaborcsardi gaborcsardi added the feature a feature request or enhancement label Dec 18, 2020
@gaborcsardi
Member

@scottporter So, how about setting TMPDIR? Is there anything wrong with that?

@scottporter
Author

Just noticed the response. I'll give the workaround a try and give feedback.

@scottporter
Author

scottporter commented Dec 23, 2020

At least on Linux, changing the temp location the way you've laid out doesn't work (I tried R 3.4.1 and R 3.5.3). Here is the log from R 3.5.3.

> tempdir()
[1] "/tmp/Rtmpi9Icnx"
> Sys.setenv(TMPDIR = tools::file_path_as_absolute("~/tmp/rcall"))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/tmp/Rtmpi9Icnx"
> tempdir()
[1] "/tmp/Rtmpi9Icnx"
> 

Setting the environment variable in the session is too late for the tempdir; it has already been set. I also ran my process to see whether the callr sessions would pick up that environment variable even though it isn't reflected here, but no such luck.

So, the only workaround that I have found is adding the environment variable to my ~/.Renviron file.

@gaborcsardi
Member

gaborcsardi commented Dec 23, 2020

You have to remove the old temp dir first, like here: #172 (comment)

But setting it in .Renviron is completely fine as well.

EDIT: now you edited and added the unlink() line, but with that I am pretty sure that it works, assuming the new TMPDIR exists. This is Linux and R 3.5.3:

> tempdir()
[1] "/tmp/RtmpEpCIDc"
> newtmp <- "~/tmp/rcall"
> dir.create(newtmp, recursive = TRUE)
> Sys.setenv(TMPDIR = tools::file_path_as_absolute(newtmp))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/root/tmp/rcall/RtmpmGFIia"

@scottporter
Author

scottporter commented Dec 23, 2020

My edit was because I tried it again, adding the unlink, and got the same result. However, I tried your code above, and it worked. I'm not sure what I did wrong last time.

Thanks.

R version 3.5.3 (2019-03-11) -- "Great Truth"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
Type 'q()' to quit R.

Memory limits on this session set with `unix::rlimit_as`
Soft limit: 1e+10 

Completed loading customized R settings from Rprofile.site

##------ [/sasdata/uat/rs/sporter/repos/method/crest_u21] Wed Dec 23 14:41:17 2020 ------##
> tempdir()
[1] "/tmp/RtmpHPRloQ"
> newtmp <- "~/tmp/rcall"
> dir.create(newtmp, recursive = TRUE)
Warning message:
In dir.create(newtmp, recursive = TRUE) :
  '/users/sporter/tmp/rcall' already exists
> Sys.setenv(TMPDIR = tools::file_path_as_absolute(newtmp))
> unlink(tempdir(), recursive = TRUE)
> tempdir(check=TRUE)
[1] "/users/sporter/tmp/rcall/RtmpB0piE4"

@scottporter
Author

I'm guessing that last time I accidentally ran it on R 3.4.1. I have both that and R 3.5.3 installed on my server cluster. I don't think this workaround works for R 3.4.1, which is why I ended up getting so confused.
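The steps from the transcripts above can be wrapped in a small helper. This is a sketch, not part of callr: the function name `switch_tempdir` is hypothetical, and the version guard relies on the `check` argument of `tempdir()` having been added in R 3.5.0, which matches the 3.4.1 failure described here. Note that it deletes the current session temp directory, so anything already stored there is lost.

```r
# Hypothetical helper: move the session temp directory under `new_root`.
switch_tempdir <- function(new_root) {
  # tempdir(check = TRUE) is not available before R 3.5.0.
  if (getRversion() < "3.5.0") {
    stop("this workaround needs R >= 3.5.0")
  }

  # Make sure the new parent directory exists and use its absolute path.
  dir.create(new_root, recursive = TRUE, showWarnings = FALSE)
  new_root <- normalizePath(new_root, mustWork = TRUE)

  # Point TMPDIR at the new location, then remove the old session tempdir
  # so that tempdir(check = TRUE) has to create a fresh one.
  old <- tempdir()
  Sys.setenv(TMPDIR = new_root)
  unlink(old, recursive = TRUE)

  tempdir(check = TRUE)
}

# Example call (the path is only an illustration):
# switch_tempdir("~/tmp/rcall")
```

Running this before callr is first used seems safest, since paths created under the old temp directory stop being valid once it is removed.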

@gaborcsardi
Member

gaborcsardi commented Dec 23, 2020 via email

@scottporter
Author

Thanks again.

@gaborcsardi
Member

Since there is an adequate workaround for this, I am going to close it.

@richarddmorey

Apologies for posting on an old-ish issue, but I thought it was most relevant here.

I've tried this workaround for a problem I'm having where the R tempdir() (apparently?) gets removed by a child process, and as a result callr fails. The workaround here fails because it still depends on tempdir. I've reported this here: rexyai/RestRserve#174 (comment) (see also https://stat.ethz.ch/pipermail/r-devel/2017-February/073748.html) and I've managed to work around it by manually setting working folders for packages such as cachem that allow it, but it seems that callr does not.

When I try to recreate a tempdir using check=TRUE, a new tempdir is created, but callr still points back to the old one:

td = tempdir()
td
# [1] "/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//RtmpQNELAf"
callr::r(function(){ 2+2 })
# [1] 4

Deleting the temp folder causes callr::r to fail, as expected:

unlink(td, recursive = TRUE)
callr::r(function(){ 2+2 })
# Error in file(con, "wb") : cannot open the connection
# In addition: Warning message:
# In file(con, "wb") :
#   cannot open file '/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp1IhdbV/callr-client--f520c18.so': No such file or directory

Now recreate a tempdir() and try again:

tempdir(check = TRUE)
# [1] "/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp8wyQAC"
callr::r(function(){ 2+2 })
# Error in file(con, "wb") : cannot open the connection
# In addition: Warning message:
# In file(con, "wb") :
#   cannot open file '/var/folders/59/r7wjy0gn6yv59w19694x5gy80000gp/T//Rtmp1IhdbV/callr-client--f520c18.so': No such file or directory

Honestly, I wish I knew how the tempdirs were being deleted and how to fix that, but I can't get the problem to occur reliably. Relying on tempdir() seems to cause issues, so a workaround would be appreciated.

@gaborcsardi
Member

gaborcsardi commented Mar 8, 2023

Can you set the TMPDIR env var before starting R, e.g. in a shell, to a place other than the default "/tmp", so the temporary file cleaning processes of the system do not delete tempdirs of long-running R processes?

@richarddmorey

Can you set the TMPDIR env var before starting R, e.g. in a shell, to a place other than the default "/tmp", so the temporary file cleaning processes of the system do not delete tempdirs of long-running R processes?

Yes; in my application, this is insufficient to avoid the error. I think something (maybe a child process exiting) is deleting the folder while the main process is still running.

In my .Renviron I have:

TMPDIR = /Users/saprm3/tmp/

and sometimes, when callr::r is run, I get (e.g.):

In file(con, "wb") :
  cannot open file '/Users/saprm3/tmp//RtmpcYeSpd/callr-client--f520c18.so': No such file or directory

and RtmpcYeSpd no longer exists (though the main R process is still running).

But this is annoying because I can't replicate it reliably enough to pin down what is deleting the tempdir. That's why it would be useful to use a folder that doesn't rely on tempdir().
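One way to narrow down which step removes the directory is to check for it before and after each suspect call. This is only a diagnostic sketch: the helper name `with_tempdir_check` is hypothetical, and it relies on `tempdir()` (without `check = TRUE`) returning the cached path without recreating it.

```r
# Hypothetical diagnostic wrapper: run `expr` and report if the session
# temp directory existed before the call but is gone afterwards.
with_tempdir_check <- function(label, expr) {
  before <- dir.exists(tempdir())
  result <- force(expr)   # the promise is evaluated here
  after <- dir.exists(tempdir())

  if (before && !after) {
    message("tempdir removed during step '", label, "': ", tempdir())
  } else if (!before) {
    message("tempdir was already missing before step '", label, "'")
  }

  invisible(result)
}

# Example: wrap a step that is suspected of removing the directory.
value <- with_tempdir_check("simple arithmetic", 1 + 1)
```

Wrapping the calls that start or stop child processes in this way would not prevent the deletion, but it might show which step it coincides with.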
