-
-
Notifications
You must be signed in to change notification settings - Fork 797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforced data size in channels 2.1.7 prevent large file uploads #1242
Conversation
Okay, awesome. I’m not going to get to this until next week now BUT, your breakdown is exactly what I was half-thinking , so if you’d like to sketch that out too, that would be great. (First draft doesn’t have to be perfect.) |
e93a5de
to
3e448c2
Compare
@carltongibson I think I got a working fix, it took less time than expected :D I've added another test case, (multipart + file data + too large POST data) just to ensure we've got that right too. Let me know what you think of it. Basically:
|
I tested on my project and it indeed fix the denied file upload issue, by the way! |
Why didn't you decide to set DATA_UPLOAD_MAX_MEMORY_SIZE=None in your project which should disable this check at all, or why did you decide not to increase your DATA_UPLOAD_MAX_MEMORY_SIZE ? |
@jpic as explained in #1240, there is a bug in the current implementation. As stated in Django's documentation about
I do want to benefit from this protection in my project, I just don't want to have it applied to file uploads, which is the behaviour implemented in Django. |
Nice, thanks @EliotBerriot. So, your implementation works for me in the sense that it does what it means to do. However, I'm wondering if it's not the occasion to move that check "upon access" as it is it Django, rather than keeping it in the constructor. I mean, having a too big request should prevent me to read it with RequestTooBig, but should it really prevent me from instantiating a request object at all ? Sorry if it's out of the scope of your PR which does fix a bug. |
@jpic oh, I understand what you mean. Well that's really an API design decision. I don't see anything inherently wrong with the current approach. In terms of security/resource usage, failing upon instanciation is probably the best choice. Now, if we need to instanciate a big request, it could indeed be impractical. I trying to think of a few situations/use cases where we'd need to do that, but cannot find any right now :) |
@jpic I've pushed a commit that factorize the boundary / multipart request thing, let me know what you think about it :) |
+ b'Content-Disposition: form-data; name="pdf"; filename="book.pdf"\r\n\r\n' | ||
+ b"FAKEPDFBYTESGOHERETHISISREALLYLONGBUTNOTUSEDTOCOMPUTETHESIZEOFTHEREQUEST" | ||
+ END_BOUNDARY | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if body is the same as above, would it work to define a module or class level variable ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpic I can, but my own experience is that we usually don't need the same level of factorization in tests and in runtime code, and I tend to avoid shared state between tests.
Tests should be readable, and independant. If we try to factorize everything, it makes it really harder to understand and maintain tests. Quick example: it took me literally 30 seconds to write those tests, because I could read above tests and copy-paste the code, and edit it as needed.
It would have take me way more time if I had to jump between 5 constants, understand if each matched my needs. Then figure if I should create a new one for my test (which may be slightly different), or edit the one in place (which could break other tests). What if someone edit that contstant in the future? They would need to understand their tests, and mine, to avoid breaking anything.
I like that you can screenshot/copy-paste a test and understand what's going on. It's far more important than factorization IMHO, which quickly become a nightmare with huge codebases and 1000s of tests.
That being said, I'm not maintaining this project, and I'll happily move the whole body in a constant if you think it's the best thing :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I can live with anything, it was just my advice because I like to refactor tests after writing them, let's see what others think about it ;)
Grats for your contribution and have a great day !
Hey there, what are we doing about this? :) |
Hi @EliotBerriot did you see #1240 (comment)? |
Hi @carltongibson :) I did, but I don't really know what this mean for the current PR, hence my question. Do we close this? Do we merge this? Do I need to amend this PR? I understand the need to refactor how channels handle request body. I still think there was a backward incompatible change and a non-standard behaviour (according to django's documentation) with the 2.1.7 release, and in this context I'd say fixing the issue is still relevant. It does not prevent a refactor at a later point. Setting |
Hey @EliotBerriot. Yes, I understand. It doesn't make any sense from a project management POV to just merge this, just to alter it again immediately. Better to think through the approach, do it right and then roll a release, with appropriate change notes so people know what has occurred. If you're in a super rush here, you can deploy your branch (using Pip's super VCS support) or create a middleware that just wraps the check for you, which you can remove later. I am grateful for your effort here, and in particular the test cases that help show what's going on! At heart, in patching one issue, we've revealed a sub-optimal area. We'll improve that and make progress. I appreciate that you having found it want a fix to be released now, but as I said, measure twice, cut once here. (It's top of my list.) |
Sure, this makes complete sense, and we can close this. I wanted to avoid having an open PR lying around for months :D
For now, I've pinned the previous version, and will follow the progress on this topic as well (I'm not sure if I'll be able to help with the dev part, but I can test things, which is still something) Thank you for the dedication and maintenance effort :) |
No, don't close it. 😀 It won't be months. It'll be a week or so. (I just need a hour or two to sit down with it.) I need it open so I don't think I've done it, when I haven't. Thanks. 👍 |
OK, @EliotBerriot. Thanks for your efforts here. I've begun work on #1251, which incorporates the tests you've provided. (Very helpful, thank you!) so I'll close this now. I'll ping you on #1251 when that's ready for you to run. 👍 |
Fix #1240
Todo:
I think the actual fix will be a bit tricky because right now, we rely on the
Content-Length
header value to accept or refuse the request. Mimicking Django's implementation would probably be a better option:If that looks good to you, then I can start working on a fix :)
cc @Zarathustra2 @carltongibson