Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add charset to content type in getHandler #545

Merged
merged 1 commit into from
Apr 29, 2023
Merged

Conversation

snowphone
Copy link
Contributor

@snowphone snowphone commented Apr 28, 2023

When I use https/transfer.sh/inline/...., CJK letters are crashed, at least on MSEdge. It seems it's because of absence of charset (UTF-8).

So I added charset to content type in the getHandler function to fix CJK-letter related issues. If the content type is empty after trimming, set it to "text/plain; charset=utf-8".

BTW, I'm not familiar with REST APIs and this change is not tested yet, feel free to give me feedback.

Add charset to content type in the getHandler function to fix CJK-letter related issues.
If the content type is empty after trimming, set it to "text/plain; charset=utf-8".
@aspacca
Copy link
Collaborator

aspacca commented Apr 28, 2023

@snowphone thanks for the PR

with your code the charset will be set only if the content type is empty: this will happen probably only uploading a file with an empty extension

could you please validate that if you upload a file with .txt extension and CJK content your fix won't solve the issue?

I'm not sure how not relaying on the content-type header of the request is safer than relaying on the extension: both can be forged to produce a specific result, it's just that using the golang stdlind the possible results available are lower

@cr-sh, sorry if I chime you in, do you see anything security related if we change this?

@snowphone
Copy link
Contributor Author

Since your official server (transfer.sh) seems down, I tested on my server.
This is the link and The original content is hello 안녕하세요. You'll see that something is wrong

BTW, My docker instance uses dutchcoders/transfer.sh:edge.

image

@stefanbenten
Copy link
Collaborator

@snowphone The server is not down, it sadly has geoblocking enabled due to heavy abuse. I am sorry for the inconvenience caused and we hope to lift that restriction at some point 👍

@aspacca
Copy link
Collaborator

aspacca commented Apr 28, 2023

This is the link and The original content is hello 안녕하세요. You'll see that something is wrong

yes, that was clear

what I mean is that even with your change, uploading greeting.txt won't fix the issue, since the content type will not be empty, and then set to text/plain; charset=utf-8

it will fix only if you upload gretting most probably

@snowphone
Copy link
Contributor Author

snowphone commented Apr 28, 2023

Hmm.. It works on trasfer.sh. It seems I missed some configurations on my docker instance.

Here's the link and The original content is hi 안녕.

EDIT: It works if the extension is .txt, but the problem occurs if extensions do not exist.
Here's another link (transfer.sh/greeting) and the content is same.
https://transfer.sh/inline/rOEg0s/greeting

image

@cr-sh
Copy link

cr-sh commented Apr 28, 2023

@cr-sh, sorry if I chime you in, do you see anything security related if we change this?

No worries, it's always my pleasure to be mentioned by you @aspacca :)

Client-side, from a security perspective, relaying on the content-type header of the request is quite the same than relaying on the file extension, both are (of course) in the control of a possible attacker... "ideally" the relative server-side security best practice in file upload endpoints is to match the header with the provided file extension.

In opposite direction the content-type header on http responses are more prone to bad things when empty or evil, because the browser will try to guess the file type sniffing the content, so in case of "empty" content types the best option is to add X-Content-Type-Options: nosniff when serving the file. And always using text/plain for every dangerous content for browser attacks (html, js, css, etc).

In any case forcing to text/plain any "empty" Content-Type as I see on handlers.go is also an effective solution for this use case and I don't see any evident security implication in adding a UTF-8 charset to it.

[edit] reference:
https://www.invicti.com/blog/web-security/importance-content-type-header-http-requests
https://www.invicti.com/white-papers/whitepaper-http-security-headers/#XContentTypeOptionsHTTPHeader

Cheers, crash

@snowphone
Copy link
Contributor Author

Hmm.. It works on trasfer.sh. It seems I missed some configurations on my docker instance.

Here's the link and The original content is hi 안녕.

EDIT: It works if the extension is .txt, but the problem occurs if extensions do not exist. Here's another link (transfer.sh/greeting) and the content is same. https://transfer.sh/inline/rOEg0s/greeting

image

I can't understand why greeting.txt is okay on transfer.sh but not on my docker instance. I use docker-compose and overridden configs are the following:

    - TZ=Asia/Seoul
- PROVIDER=local
- BASEDIR=/tmp/
- PURGE_DAYS=3
- PURGE_INTERVAL=6

and a reverse-proxy server is in front of transfer.sh instance. Do you guys know why this difference happened?

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

I can't understand why greeting.txt is okay on transfer.sh but not on my docker instance. I use docker-compose and overridden configs are the following:

my guess is something about the locale env variables

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

Here's the link and The original content is hi 안녕.

this has content-type: text/plain; charset=utf-8 in the response headers

but the problem occurs if extensions do not exist. Here's another link (transfer.sh/greeting) and the content is same. https://transfer.sh/inline/rOEg0s/greeting

this has no content-type header at all in the response
I will try to reproduce and check what happens exactly

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

thanks @cr-sh for the feedback

"ideally" the relative server-side security best practice in file upload endpoints is to match the header with the provided file extension.

what if they don't match? should we reject the upload?

something like this is correct from your point of view?

contentTypeExt := mime.TypeByExtension(filepath.Ext(vars["filename"]))
contentTypeHeader := r.Header.Get("content-type")
if !strings.HasPrefix(contentTypeHeader, contentTypeExt) && !strings.HasPrefix(contentTypeExt, contentTypeHeader) {
    // reject upload
}

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

this has no content-type header at all in the response
I will try to reproduce and check what happens exactly

I cannot reproduce: https://github.com/dutchcoders/transfer.sh/blob/main/server/handlers.go#L1221 is properly reached, and I can see locally the header content-type: text/plain in the response.
still it has to be changed to content-type: text/plain; charset=utf-8 to fix showing the proper enconding

@stefanbenten is it possible that you have some kind of proxy on top of transfer.sh that drops the content-type header in such case?

@snowphone
Copy link
Contributor Author

I can't understand why greeting.txt is okay on transfer.sh but not on my docker instance. I use docker-compose and overridden configs are the following:

my guess is something about the locale env variables

Unfortunately, locale was not the one. I tried LANG=en_US.UTF-8 and LC_ALL=en_US.UTF-8 on my docker compose environments but it seems nothing has changed.

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

Unfortunately, locale was not the one. I tried LANG=en_US.UTF-8 and LC_ALL=en_US.UTF-8 on my docker compose environments but it seems nothing has changed.

and a reverse-proxy server is in front of transfer.sh instance. Do you guys know why this difference happened?

maybe something on the reverse-proxy?

are you able to bypass the proxy and reproduce the issue directly connecting to docker?

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

my guess is something about the locale env variables

Unfortunately, locale was not the one. I tried LANG=en_US.UTF-8 and LC_ALL=en_US.UTF-8 on my docker compose environments but it seems nothing has changed.

sorry, indeed not specific to local env variables, but in general to the locale.
since the docker image is based on scratch, i'd say the locale settings are not related to the locale env variables, but most probably to the locale of the host where the docker container runs. not sure about the details, I'd have to check how locale works in docker between the image and the host

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

sorry, indeed not specific to local env variables, but in general to the locale.

indeed the locale concerns might be off-point and in case come in the picture after the proper content-type header is set in the response

could you please check the content-type header value for greetings.txt when getting the file through the proxy and directly from the docker instance?

the proper value should be text/plain; charset=utf-8, if it's not returned directly by the docker instance I have to check what mime.TypeByExtension does exactly and if its behaviour is somehow related to the locale. if the value is returned directly by docker, but not from the proxy, the problem is in the proxy

what's your host locale?

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

what mime.TypeByExtension does exactly and if its behaviour is somehow related to the locale

https://pkg.go.dev/mime#TypeByExtension
the docker image is based on scratch, so only the built-in table should be relevant: https://cs.opensource.google/go/go/+/refs/tags/go1.20.3:src/mime/type.go;l=60-77
.txt is missing, so we should end up in https://github.com/dutchcoders/transfer.sh/blob/main/server/handlers.go#L1221, that without your change will return only text/plain.

if you run the binary on a *nix machine you'll get much more extensions, that's the reason why I could not reproduce. I don't know if on https://transfer.sh the service runs from docker or not (@stefanbenten ?)

it might be worth to add at least a few of following files to the docker image:

/usr/local/share/mime/globs2
/usr/share/mime/globs2
/etc/mime.types
/etc/apache2/mime.types
/etc/apache/mime.types

@snowphone
Copy link
Contributor Author

First, the problem is NOT related to a reverse-proxy server. I tried without reverse-proxy (I stopped my nginx instance for make it sure) but the problem still happens.
Here's the link and I uploaded images too.

And, my host locale is en_US.UTF-8. I hope this would be helpful.

image
image
image

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

First, the problem is NOT related to a reverse-proxy server. I tried without reverse-proxy (I stopped my nginx instance for make it sure) but the problem still happens.

yes, @snowphone , please read my last comment at #545 (comment)

let's split the two problems you are facing:

  1. empty content type will fallback to text/plain
  2. docker image produces empty content type for .txt, vs binary running on an *nix host producing text/plain; charset=utf-8

solutions:

  1. your PR addressess this adding charset=utf-8
  2. please mount on the docker container the following files:
/usr/local/share/mime/globs2
/usr/share/mime/globs2
/etc/mime.types
/etc/apache2/mime.types
/etc/apache/mime.types

(only the one you have available on your system)

  1. should be enough for the .txt case. once you validate I will change the Dockerfile in order to copy those files in the final image

  2. your PR will solve the issue for empty content type fallback, that's always the case of file upload with no extension

@snowphone
Copy link
Contributor Author

Thanks! mounting /etc/mime.types and /usr/share/mime is the key for files with extension.
So, I think this should be documented in README.md or included in docker image. What do you guys think about it? (Maybe a different PR).

@aspacca
Copy link
Collaborator

aspacca commented Apr 29, 2023

So, I think this should be documented in README.md or included in docker image. What do you guys think about it? (Maybe a different PR).

inlcuding them in the docker image is the preferred solution, if you have some times to work on it a PR is welcome

I will merge this current one in the meanwhile

@aspacca aspacca merged commit e837849 into dutchcoders:main Apr 29, 2023
@cr-sh
Copy link

cr-sh commented Apr 29, 2023

what if they don't match? should we reject the upload?

yep, i think it's the best policy

something like this is correct from your point of view?

Looks good, but some testing is mandatory here, because there are some cases where we can have multiple valid content-type headers for a given file extension (like for images and videos). Also remember to place this snippet after handling the case of uploads without extension.

Probably, to push it even further, you can actively check on the server the mimetype of uploaded files, this is a good example https://github.com/h2non/filetype

snowphone added a commit to snowphone/transfer.sh that referenced this pull request Apr 29, 2023
This commit includes /etc/mime.types file to the container, which is necessary to properly select the charset using MIME typing during file upload.

For more information, read dutchcoders#545 (comment)
aspacca added a commit that referenced this pull request May 19, 2023
* Add charset to content type in getHandler

Add charset to content type in the getHandler function to fix CJK-letter related issues.
If the content type is empty after trimming, set it to "text/plain; charset=utf-8".

* Add mailcap and mime.types to transfer.sh container

This commit includes /etc/mime.types file to the container, which is necessary to properly select the charset using MIME typing during file upload.

For more information, read #545 (comment)

---------

Co-authored-by: Andrea Spacca <andrea.spacca@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants