Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Figure out a way to use tempfile.NamedTemporaryFile on Windows #2743

Closed
fmassa opened this issue Oct 2, 2020 · 18 comments · Fixed by #2768
Closed

Figure out a way to use tempfile.NamedTemporaryFile on Windows #2743

fmassa opened this issue Oct 2, 2020 · 18 comments · Fixed by #2768

Comments

@fmassa
Copy link
Member

fmassa commented Oct 2, 2020

Using tempfile.NamedTemporaryFile or TemporaryDirectory on Windows is very brittle, and a Google search returns multiple users facing similar issues.

We obtain errors on Windows such as

PermissionError: [WinError 5] Access is denied: 'C:\\Users\\circleci\\AppData\\Local\\Temp\\tmpuwm1pkc1\\test1'

and from there it was a rabbithole.

The latest example of it was in #2728 (comment), where in the end I gave up on using tempfile and instead went for saving the temporary files on an existing directory.

This is a recurring issue in torchvision (many tests are disabled on Windows because of this), and on PyTorch as well, so it would be good to have a canonical solution for this problem.

cc @peterjc123 @nbcsm @guyang3532 @maxluk @gunandrose4u @smartcat2010 @mszhanyi

@pmeier
Copy link
Collaborator

pmeier commented Oct 2, 2020

I always use

def onerror(func, path, exc_info):
    """Error handler for ``shutil.rmtree``.

    If the error is due to an access error (read only file)
    it attempts to add write permission and then retries.

    If the error is for another reason it re-raises the error.

    Usage : ``shutil.rmtree(path, onerror=onerror)``
    """
    if not os.access(path, os.W_OK):
        # Is the error an access error ?
        os.chmod(path, stat.S_IWUSR)
        func(path)
    else:
        raise

from here. Could you try this and see if it works for your tests?

@fmassa
Copy link
Member Author

fmassa commented Oct 2, 2020

I tried that in the past, see 3038c21

I have a guess of why this wasn't working now -- I didn't add a file extension, and maybe that's what was breaking it all on Windows.

I'll report back if this is indeed the case, and then close this issue if that fixes it


This didn't work. The problem doesn't seem to be specific to TemporaryFile or temporary directories.

@peterjc123
Copy link
Contributor

peterjc123 commented Oct 2, 2020

@fmassa According to my experience, the message "permission failed" here doesn't actually refer to a problem related to access/rights. Instead, it is complaining about sth else like you couldn't delete a file/directory if it is still in use, or you couldn't open a file twice with different acess rights, which makes tempfile not so useful under Windows.
@fmassa Do you think it is a good idea to add similar functions to PyTorch? I could implement one using the Windows API if you want. But actually, it won't be too easy because of the aforementioned limitations(especially for a temp file).

@fmassa
Copy link
Member Author

fmassa commented Oct 2, 2020

@peterjc123 it looks like the issue is in our code, either the stat call in C++ or on from_file, following 2d60d28

Do you have any idea on what this might be?

@jamt9000
Copy link
Contributor

jamt9000 commented Oct 2, 2020

Related: #1662

@jamt9000
Copy link
Contributor

jamt9000 commented Oct 5, 2020

If you del the returned tensor before unlinking does that work?

Since CloseHandle(handle_); may be called in the THMapAllocator destructor (although that wouldn't make sense with the expected values of the flags...)

https://github.com/pytorch/pytorch/blob/235f62417d9d66b8f0bd62db385e8eba14a6c1ed/aten/src/TH/THAllocator.cpp#L361

Edit:

In fact it seems more likely that the MapViewOfFile path is being taken, which will be released on destruction with UnmapViewOfFile

Windows docs say: Although an application may close the file handle used to create a file mapping object, the system holds the corresponding file open until the last view of the file is unmapped.

@fmassa
Copy link
Member Author

fmassa commented Oct 6, 2020

Thanks a lot @jamt9000 , your comment was spot on!

Deleting the tensor fixes the issue on Windows.

test/test_image.py::ImageTester::test_read_file PASSED                   [ 29%]

@fmassa
Copy link
Member Author

fmassa commented Oct 6, 2020

The root cause of the issue that sparkled this issue was fixed via #2743 (comment) , so this is less of a priority for now. Using get_tmp_dir (or TemporaryDirectory probably works as well) should be fine.

@peterjc123
Copy link
Contributor

peterjc123 commented Oct 6, 2020

It seems that I have looked into this before and made the PR here for PyTorch but it gets rejected. Actually, it is okay to delete the file while it's open if all file read/write requests to the file are opened with FILE_SHARE_DELETE. The challenge is that how to determine whether a file is a temporary one or not.

@peterjc123
Copy link
Contributor

@fmassa @jamt9000 If you have any good suggestion for this, please don't hesitate to tell me. If we got a good idea, we should send the patch to PyTorch.

@fmassa
Copy link
Member Author

fmassa commented Oct 6, 2020

@peterjc123 I might be missing something, but given that from_file doesnt' lazily load the contents of the file in memory (contrary to numpy.memmap, could we close all references to the file upon returning it?

A quick test in #2763 seems to indicate that numpy handles this case correctly.

@peterjc123
Copy link
Contributor

@fmassa The problem is that torch::from_file is using THMapAllocator which uses CreateFileMapping & MapViewOfFile. We should not automatically release those resources unless the allocator is destroyed because we don't know when we finish reading the file. I guess to implement what you want, we should use ReadFile instead.

@jamt9000
Copy link
Contributor

jamt9000 commented Oct 7, 2020

I guess a workaround would be for the torchvision read_file wrapper to clone the tensor from torch::from_file() and delete the original (at the expense of an extra copy).

@fmassa
Copy link
Member Author

fmassa commented Oct 7, 2020

@jamt9000 yes, I considered that as a workaround as well. It would be a bit slower and double the amount of memory used, but might be better.
We could even add an #ifdef _WIN32 so that this only affects Windows, I would be ok with that.

Would you be willing to send a PR with this workaround?

@peterjc123
Copy link
Contributor

Another workground is mentioned in pytorch/pytorch#45962, that is, to create a context manager to delete the tensors automatically.

@fmassa
Copy link
Member Author

fmassa commented Oct 7, 2020

@peterjc123 I would prefer to avoid the extra context manager, as it might have some weird side effects (does it delete all tensors created within the context managers, or only those created via from_file ?)

I wonder what would be the drawbacks of not using file mapping, and instead reading the content of the file in memory, if shared=False ?

@peterjc123
Copy link
Contributor

File mapping is great for random IO. Also, it is more efficient because all the data are mapped to a virtual address so it is benefiting from the system caching. But if you just read a file from head to tail, I don't see too many differences it would make.

@fmassa
Copy link
Member Author

fmassa commented Oct 7, 2020

@peterjc123 Hum, ok, so maybe the system caching could be beneficial.

My first thought was that given that we don't expose anything in PyTorch similar to numpy.memmap, which allows us to benefit from only reading partially the file, there wouldn't be much benefits from file mapping, but I missed what you just mentioned about the system caching.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants