Parsing "Content-Disposition" response header #1767

senpos · 2021-07-22T07:06:05Z


Hello!

Often, when downloading a file, I need to parse the "Content-Disposition" header from the response. Usually, I just use regular expressions, which is definitely not ideal, because the format of this header can be quite complex.

For example, it can use filename*= instead of filename=, in which case the value is also percent-encoded and carries a utf-8'' prefix:

attachment; filename*=utf-8''a%20filename%20with%20spaces%20%28and%20$pec1al%20ch%40aracter$%29%28en-ru%29.xliff

It may also contain both filename= and filename*=, and according to the spec the client should always prefer filename*= in that case.

After parsing, it should become this:

('attachment', {'filename': 'a filename with spaces (and $pec1al ch@aracter$)(en-ru).xliff'})
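The percent-decoding step on its own is the easy part; the hard part is everything around it. As a minimal sketch, assuming a well-formed value of the form charset'language'percent-encoded (decode_ext_value is a hypothetical helper, not an existing library API), the standard library is enough to recover the filename above:

```python
from urllib.parse import unquote


def decode_ext_value(value: str) -> str:
    """Decode a value like utf-8''a%20b (charset'language'percent-encoded)."""
    # Split into at most three parts; the language tag is usually empty.
    charset, _language, encoded = value.split("'", 2)
    # Percent-decode the bytes and interpret them in the declared charset
    # (hypothetical fallback to UTF-8 if the charset is missing).
    return unquote(encoded, encoding=charset or "utf-8")


raw = "utf-8''a%20filename%20with%20spaces%20%28and%20$pec1al%20ch%40aracter$%29%28en-ru%29.xliff"
print(decode_ext_value(raw))
# a filename with spaces (and $pec1al ch@aracter$)(en-ru).xliff
```

This sketch raises ValueError if the value has fewer than two apostrophes, so a real parser would need to guard against malformed input.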

But how should we do it in Python?

There's cgi.parse_header, which httpx already uses, but it does not understand the *= syntax and does not decode the utf-8'' part:

('attachment', {'filename*': "utf-8''a%20filename%20with%20spaces%20%28and%20$pec1al%20ch%40aracter$%29%28en-ru%29.xliff"})
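One possible workaround is to post-process a dict like the one above instead of re-parsing the header. This is a minimal sketch, assuming the input has parameter names as keys and unquoted raw values (as in the output shown); resolve_extended_params is a hypothetical name. It deliberately does not call cgi itself, since that module was deprecated in Python 3.11 and removed in 3.13.

```python
from typing import Dict
from urllib.parse import unquote


def resolve_extended_params(params: Dict[str, str]) -> Dict[str, str]:
    """Fold RFC 5987 `name*` parameters into plain `name` keys.

    Extended values win over plain ones, following RFC 6266's rule
    that `filename*` takes precedence over `filename`.
    """
    # Start from the plain parameters only.
    resolved = {k: v for k, v in params.items() if not k.endswith("*")}
    for key, value in params.items():
        if not key.endswith("*"):
            continue
        try:
            charset, _language, encoded = value.split("'", 2)
            decoded = unquote(encoded, encoding=charset, errors="strict")
        except (ValueError, LookupError):
            # Malformed value or unknown charset: keep the plain fallback, if any.
            continue
        resolved[key[:-1]] = decoded
    return resolved


cgi_style = {
    "filename*": "utf-8''a%20filename%20with%20spaces%20%28and%20$pec1al%20ch%40aracter$%29%28en-ru%29.xliff"
}
print(resolve_extended_params(cgi_style))
# {'filename': 'a filename with spaces (and $pec1al ch@aracter$)(en-ru).xliff'}
```

Because the plain keys are copied first and the decoded extended keys are written afterwards, the result does not depend on the order in which the two parameters appeared.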

Using regular expressions here is not ideal either, because there are quite a few details that are easy to miss.

There's werkzeug.http.parse_options_header, which produces this:

('attachment', {'filename': 'a filename with spaces (and $pec1al ch@aracter$)(en-ru).xliff'})

It looks almost ideal, but it does not know to pick filename*= over filename= when both of them are present, so it takes whichever one appears last in the string. And it would also mean pulling in a server-side library as an external dependency. :)
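For comparison, a standard-library-only parser is not much code. The sketch below is not an existing httpx or stdlib API: parse_content_disposition and _decode_ext_value are hypothetical names, and it assumes a single, reasonably well-formed header value. It still tokenizes with a regular expression, so it shares the "easy to miss" caveat (for example, it ignores RFC 2231-style continuations such as filename*0*=). Unlike the behavior described above, it applies the RFC 6266 rule that filename* wins over filename regardless of their order.

```python
import re
from typing import Dict, Optional, Tuple
from urllib.parse import unquote

# One `; name=value` parameter; the value is a quoted-string or a bare token.
_PARAM = re.compile(
    r'\s*;\s*'                     # separator before each parameter
    r'([^\s;=]+)\s*=\s*'           # name, e.g. filename or filename*
    r'("(?:[^"\\]|\\.)*"|[^;]*)'   # quoted-string (with \-escapes) or token
)


def _decode_ext_value(value: str) -> Optional[str]:
    """Decode charset'language'percent-encoded; return None if malformed."""
    try:
        charset, _language, encoded = value.split("'", 2)
        return unquote(encoded, encoding=charset, errors="strict")
    except (ValueError, LookupError):
        return None


def parse_content_disposition(header: str) -> Tuple[str, Dict[str, str]]:
    disposition, _, rest = header.partition(";")
    plain: Dict[str, str] = {}
    extended: Dict[str, str] = {}
    for match in _PARAM.finditer(";" + rest):
        name, value = match.group(1).lower(), match.group(2).strip()
        if name.endswith("*"):
            decoded = _decode_ext_value(value)
            if decoded is not None:
                extended[name[:-1]] = decoded
        else:
            if len(value) >= 2 and value[0] == value[-1] == '"':
                # Strip the surrounding quotes and undo backslash escapes.
                value = re.sub(r'\\(.)', r'\1', value[1:-1])
            plain[name] = value
    # RFC 6266: prefer name* over name, whichever comes first in the header.
    return disposition.strip().lower(), {**plain, **extended}


header = "attachment; filename=\"EURO rates.txt\"; filename*=UTF-8''%E2%82%AC%20rates.txt"
print(parse_content_disposition(header))
# ('attachment', {'filename': '€ rates.txt'})
```

If the filename*= value cannot be decoded, the sketch silently falls back to the plain filename=, which seems like the more forgiving choice for a client.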

There's also https://github.com/g2p/rfc6266 and several forks (1, 2), but none of them (AFAIK) work on modern Python versions.

What do you use in your projects?

Do you think it is a good idea to add this functionality to httpx? Maybe as a function in the _utils module (exposed in the top-level package)? Or should it be a separate package? 🤔

I'd love to hear your thoughts on this, as it seems to be a common use case and Python appears to lack proper support for it.
