
Open file leak with file uploads with wsgi server in daemon mode #427

Closed
epeisach opened this issue Mar 3, 2021 · 6 comments

Comments

@epeisach

epeisach commented Mar 3, 2021

Python 3.8.6, webob 1.8.7

Normally this runs under mod_wsgi, but a plain WSGI server displays the same issue.
Scripts are attached.

You need to upload a file larger than 8K to demonstrate the issue. Essentially, webob leaves open (but already deleted) temp files behind when the application goes out of scope.

Minimal server code looks like:

```python
from webob import Request, Response

def application(environment, responseApplication):

    myRequest  = Request(environment)

    myResponse = Response()
    myResponse.status       = '200 OK'
    myResponse.content_type = 'text/html'

    print("Handle request")

    p = myRequest.params  # parsing the upload is what triggers the leak
    return myResponse(environment, responseApplication)
```
Without the `p = myRequest.params` line, there is no file descriptor leak.
Why does this matter? The files are created in the temp directory and never released.

Test client (requires `requests`): `python test_upload.py`
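The attached test_upload.py is inside issue.tar.gz and not shown here; a minimal sketch of what such a client might look like (URL, field name, and payload size are assumptions) is:

```python
# Hypothetical reconstruction of test_upload.py: post a multipart upload
# larger than 8 KiB so the server-side FieldStorage spools it to a temp file.
import io
import requests

PAYLOAD = b"x" * 16384  # comfortably past the 8 KiB spool threshold

def upload(url="http://localhost:8000/"):
    files = {"upload": ("big.bin", io.BytesIO(PAYLOAD), "application/octet-stream")}
    return requests.post(url, files=files)
```

Running `upload()` against either server should leave the deleted temp files visible under /proc as described below.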

Run one of two servers: either `mod_wsgi-express start-server minimal.wsgi` or `python simple_server.py`.

mod_wsgi results in three processes; simple_server.py runs as a single process.

In both cases, if you look at /proc/####/fd you will see something like:

```
lrwx------. 1 peisach annotator 64 Mar  3 14:00 0  -> /dev/pts/9
lrwx------. 1 peisach annotator 64 Mar  3 14:00 1  -> /dev/pts/9
lrwx------. 1 peisach annotator 64 Mar  3 14:00 11 -> anon_inode:[eventpoll]
lrwx------. 1 peisach annotator 64 Mar  3 14:00 12 -> /tmp/#151170102 (deleted) <------
lrwx------. 1 peisach annotator 64 Mar  3 14:00 13 -> /tmp/#151170105 (deleted) <------
l-wx------. 1 peisach annotator 64 Mar  3 13:59 2  -> /tmp/mod_wsgi-localhost:8000:50070/error_log
lrwx------. 1 peisach annotator 64 Mar  3 14:00 3  -> socket:[724118505]
lr-x------. 1 peisach annotator 64 Mar  3 14:00 4  -> pipe:[724106754]
lr-x------. 1 peisach annotator 64 Mar  3 14:00 5  -> pipe:[724118518]
l-wx------. 1 peisach annotator 64 Mar  3 14:00 6  -> pipe:[724118518]
lrwx------. 1 peisach annotator 64 Mar  3 14:00 7  -> socket:[724118520]
l-wx------. 1 peisach annotator 64 Mar  3 14:00 8  -> pipe:[724106754]
```

If you look at the file through /proc/###/fd/12, you will find the FieldStorage header plus the file contents; the other descriptor is just the file.

I think the issue is in MultiDict.from_fieldstorage(): when a filename is present, simply copying the class is a shallow copy, and I suspect it messes up the reference counts of the underlying structures. I tried using copy.copy(field), which works in this simple case, but if you then try to read the file it is closed, so that is not the full solution.

issue.tar.gz

@stevepiercy
Member

@epeisach I am pretty sure that the maintainers will not download the .tar file. Would you please create a repo or gist so that maintainers can view the code?

Also the list of your procs is not readable. Can you reformat that by enclosing it in triple backticks, instead of single?

Making it easier for contributors to respond will help resolve this issue sooner. Thank you!

@digitalresistor
Member

When you call .params on a Request (which is generally a bad idea, since it mixes GET/POST together), it will parse the POST request. When the FieldStorage is garbage collected, the file will be closed (since Python 3.4). However, Python makes no guarantee about when garbage collection runs and calls the FieldStorage destructor that closes the file.

The entire thing is cached in the environment and will live as long as the environment is alive (this avoids the need to re-parse the uploaded content multiple times if you call myRequest.POST multiple times in your code).

I don't believe the MultiDict is doing a shallow copy; it directly holds a reference to the original field inside the top-level cgi.FieldStorage, which keeps the originally opened file alive for the lifetime of the MultiDict. Using copy.copy(field) will not keep the original file alive: it makes a copy (a shallow one, see https://docs.python.org/3/library/copy.html#copy.copy), after which the original FieldStorage is no longer referenced and thus gets garbage collected.
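The point about copy.copy can be illustrated with a small stand-in class (this is not webob or cgi itself; `Owner` mimics cgi.FieldStorage, whose `__del__` closes its file):

```python
# A shallow copy shares the underlying file object, so when the original
# owner is collected and its __del__ closes the file, the copy's file is
# closed too -- exactly the "file is closed" symptom described above.
import copy
import io

class Owner:
    """Stand-in for cgi.FieldStorage: __del__ closes self.file."""
    def __init__(self):
        self.file = io.BytesIO(b"uploaded data")
    def __del__(self):
        self.file.close()

original = Owner()
dup = copy.copy(original)          # shallow copy: dup.file is original.file
assert dup.file is original.file
del original                       # CPython refcount hits zero, __del__ runs
print(dup.file.closed)             # True -- the copy's file is now closed
```

This relies on CPython's immediate refcount-based collection; with a deferred collector the close would simply happen later.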

See __del__ on the cgi.FieldStorage class: https://github.com/python/cpython/blob/master/Lib/cgi.py#L484

If you don't want to leave those files hanging around until the garbage collector gets to them, you will have to loop over all of the fields manually, check whether each one is a file, and call .close() on it yourself.
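That manual cleanup loop might look like the following sketch (the helper name is illustrative; file-backed fields are identified by duck typing on the `filename` and `file` attributes that cgi.FieldStorage uploads carry):

```python
# Close any file-backed upload fields so their spooled temp files are
# released immediately instead of waiting for garbage collection.
def close_uploaded_files(request):
    for field in request.POST.values():
        # upload fields have a filename and an open file object
        if getattr(field, "filename", None) and getattr(field, "file", None):
            field.file.close()
```

Call this at the end of request handling, after the application is done reading the uploaded data.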

A finalizer may get added to WebOb so that there is a place where a programmer can say "clean up all caches/all resources WebOb created". If this is a real leak, please show a growing number of temporary files along with a growing number of file descriptors in use.

@digitalresistor
Member

Related: #415

@epeisach
Author

epeisach commented Mar 6, 2021

So, garbage collection issues. I will reformat my example. I was limited in what I could upload, but I will make the examples inline here.

I am able to handle the cleanup.

@digitalresistor
Member

@epeisach most likely there are no issues at all; Python garbage collection is non-deterministic as to when it runs. If you run hundreds of requests against your example, does the number of open file descriptors continue to increase?
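One way to answer that question empirically (a sketch, not from the thread; Linux-specific since it reads /proc) is to count the process's open descriptors before and after a burst of requests:

```python
# Count this process's open file descriptors via /proc (Linux only).
# A true leak shows a count that keeps growing across many requests;
# a plateau means the descriptors are just awaiting garbage collection.
import os

def open_fd_count():
    return len(os.listdir("/proc/self/fd"))
```

Record `open_fd_count()` before the test loop, run a few hundred uploads, and compare.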

@digitalresistor
Member

Closing this issue in favor of #415
