Hello!
Often, when downloading a file, I need to parse the "Content-Disposition" header from the response. Usually I just use regular expressions, which is far from ideal, because this header format can be quite complex.
For example, the parameter can be filename*= instead of filename=, in which case the value is URL-encoded and carries a utf-8'' prefix. The header may also contain both filename= and filename*=, and according to the spec the client should always prioritize filename*=.
After parsing, it should become this:
But how should we do it in Python?
There's cgi.parse_header, which is already used in httpx, but it does not know about *= and does not handle the utf-8 part.
Using regular expressions here is not ideal either, because there are quite a few things that are easy to miss.
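To show what handling "the utf-8 part" involves, here is a minimal sketch that decodes a filename*= value of the form charset'language'percent-encoded-bytes (the extended value syntax from RFC 5987). The function name decode_ext_value and the sample values are made up for illustration; this is not an httpx or cgi API, and it assumes the value is well formed (exactly this three-part shape, with a charset Python knows).

```python
from urllib.parse import unquote


def decode_ext_value(value: str) -> str:
    """Decode an extended parameter value such as utf-8''na%C3%AFve%20file.txt.

    Hypothetical helper for illustration; raises ValueError if the value
    does not contain the two single quotes that separate its three parts.
    """
    # Three parts separated by single quotes: charset ' [language] ' encoded bytes.
    charset, _language, encoded = value.split("'", 2)
    # Percent-decode the bytes and interpret them in the declared charset.
    return unquote(encoded, encoding=charset or "utf-8", errors="replace")


print(decode_ext_value("utf-8''na%C3%AFve%20file.txt"))  # naïve file.txt
```

The language tag between the two quotes is usually empty and is ignored here.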
There's werkzeug.http.parse_options_header, which produces this:
('attachment', {'filename': 'a filename with spaces (and $pec1al ch@aracter$)(en-ru).xliff'})
It looks almost ideal, but it does not know to pick filename*= over filename= when both are present, so it returns whichever one appears last in the string. And it is also an external dependency, from an HTTP server library at that. :)
There's also https://github.com/g2p/rfc6266 and several forks (1, 2), but none of them (AFAIK) work on modern Python versions.
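For illustration only, the "prefer filename*= over filename=" rule could be sketched with just the standard library roughly as below. split_params and pick_filename are hypothetical names, not functions from httpx, werkzeug, or rfc6266, and the sample headers are invented. The sketch covers only a small part of the header grammar (quoted strings, backslash escapes, a single extended value) and should not be read as a complete RFC 6266 parser.

```python
from typing import Dict, Optional
from urllib.parse import unquote


def split_params(header: str) -> Dict[str, str]:
    """Split 'type; k=v; k2="v; 2"' into a dict of parameters.

    Semicolons inside double-quoted strings are kept as part of the value.
    """
    parts, buf, in_quotes, escaped = [], [], False, False
    for ch in header:
        if escaped:                      # character after a backslash is literal
            buf.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':                  # toggle quoting, drop the quote itself
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf))

    params = {}
    for part in parts[1:]:               # parts[0] is the disposition type
        name, sep, value = part.partition("=")
        if sep:
            params[name.strip().lower()] = value.strip()
    return params


def pick_filename(header: str) -> Optional[str]:
    """Return the filename, preferring filename*= regardless of order."""
    params = split_params(header)
    extended = params.get("filename*")
    if extended and extended.count("'") >= 2:
        charset, _language, encoded = extended.split("'", 2)
        try:
            return unquote(encoded, encoding=charset or "utf-8", errors="strict")
        except (LookupError, UnicodeDecodeError):
            pass                         # unusable extended value: fall back
    return params.get("filename")


header = "attachment; filename=\"fallback.txt\"; filename*=utf-8''na%C3%AFve%20file.txt"
print(pick_filename(header))  # naïve file.txt
```

Because the parameters are collected into a dict first, the result does not depend on which of the two appears last in the header.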
What do you use in your projects?
Do you think it is a good idea to add this functionality to httpx? Maybe as a function in the _utils module (exposed in the top-level package)? Or to create a separate package for that purpose? 🤔
Would love to hear your thoughts on this, as it seems to be a common use case and it looks like Python lacks proper support for it.