Cannot delete files after vroom #351

Closed
MyKo101 opened this issue Jul 2, 2021 · 1 comment

MyKo101 commented Jul 2, 2021

I have run into an issue where I cannot delete files once they have been accessed by vroom::vroom().

This has been discussed in two previous issues (#177 and #280), but those do not seem to resolve the problem: in some cases, the file is still held open.

Take this example:

write.csv(data.frame(x=1:3,y=c("a","b","c")),"test.csv",row.names=FALSE)
df <- vroom::vroom("test.csv")
ps::ps_open_files()

This returns a list of the currently open files (as described in #280). Materializing the object closes this connection:

vroom:::vroom_materialize(df,FALSE)
ps::ps_open_files()

However, this seems to require explicit materialization, as merely manipulating the data does not close the connection. After restarting R, the following still shows the open connection after mutate():

df <- vroom::vroom("test.csv")
dplyr::mutate(df,z=paste(x,y))
ps::ps_open_files()

It also persists if we delete the object (again restarting R):

df <- vroom::vroom("test.csv")
rm(df)
ps::ps_open_files()

And it persists if we do not assign the result to an object (restarting R again to be certain), although working out whether an object is still being stored is an entirely different problem:

vroom::vroom("test.csv")
ps::ps_open_files()

Other than materializing an object (obviously not ideal for large files) or restarting R, how do we close this connection? It would be useful to have a vroom_delete() function.

jimhester (Collaborator) commented

You cannot delete the file until all the data has been read from it, i.e. it must be fully materialized if you want to delete it.

If you access elements individually, vroom has no way to know which parts of the file have been read. Knowing that would require tracking the accesses with an occupancy map or a similar idea (e.g. #101).

If you expect to be able to remove the file after reading, I would suggest you use altrep = FALSE when reading the file.
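
A minimal sketch of that suggestion, assuming the vroom package is installed. The temporary file path and the variable names are illustrative only, and whether the file handle is released right away may depend on the platform and the vroom version:

```r
# Write a small CSV to a temporary location (the path is illustrative).
path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), path, row.names = FALSE)

# altrep = FALSE asks vroom to read every column eagerly instead of lazily,
# so the resulting data frame should no longer depend on the file on disk.
df <- vroom::vroom(path, altrep = FALSE)

# With the data fully in memory, the file should now be removable.
removed <- file.remove(path)
```

The trade-off is that the whole file is read up front, so the lazy-loading benefit for large files is lost.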
