Header encoding changes, leading to ERR_INVALID_CHAR #287

Closed
SimenB opened this issue Jan 27, 2023 · 16 comments · Fixed by #289

Comments

@SimenB
Member

SimenB commented Jan 27, 2023

Prerequisites

  • I have written a descriptive issue title
  • I have searched existing issues to ensure the bug has not already been reported

Fastify version

4.12.0

Plugin version

8.3.1

Node.js version

16.19.0

Operating system

macOS

Operating system version (i.e. 20.04, 11.3, 10)

13.1

Description

When a header is set using a character like å (a Norwegian character), its encoding gets mangled. A Fastify server serving it serializes the header correctly, but if reply-from is used, the encoding changes. If another server with reply-from is chained, the app crashes with TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["content-disposition"].

Steps to Reproduce

const replyFrom = require('@fastify/reply-from');
const fastify = require('fastify');

const app = fastify();

app.get('/', (_request, reply) => {
  reply.header('content-disposition', 'år.pdf').send('OK');
});

app.listen({ port: 6666 });

const proxy = fastify();

proxy.register(replyFrom);

proxy.get('/', (_request, reply) => {
  return reply.from('http://localhost:6666');
});

proxy.listen({ port: 7777 });

const secondProxy = fastify();

secondProxy.register(replyFrom);

secondProxy.get('/', (_request, reply) => {
  return reply.from('http://localhost:7777');
});

secondProxy.listen({ port: 8888 });

$ curl -i http://localhost:6666
HTTP/1.1 200 OK
content-disposition: år.pdf
content-type: text/plain; charset=utf-8
content-length: 2
Date: Fri, 27 Jan 2023 15:47:49 GMT
Connection: keep-alive
Keep-Alive: timeout=72


$ curl -i http://localhost:7777
HTTP/1.1 200 OK
content-disposition: r.pdf
content-type: text/plain; charset=utf-8
content-length: 2
date: Fri, 27 Jan 2023 15:47:52 GMT
connection: keep-alive
keep-alive: timeout=72


$ curl -i http://localhost:8888
curl: (52) Empty reply from server

Expected Behavior

The header should be correctly encoded and decoded (i.e. 7777 and 8888 should have the same result as 6666).

If that's not possible, I'd expect the first server to complain when the header is added.

@mcollina
Member

Ouch! Have you got a full stack trace? Would you like to send a PR?

@SimenB
Member Author

SimenB commented Jan 28, 2023

Stack trace:

TypeError [ERR_INVALID_CHAR]: Invalid character in header content ["content-disposition"]
    at storeHeader (node:_http_outgoing:572:5)
    at processHeader (node:_http_outgoing:567:3)
    at ServerResponse._storeHeader (node:_http_outgoing:457:11)
    at ServerResponse.writeHead (node:_http_server:387:8)
    at onSendEnd (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:596:7)
    at onSendHook (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:529:5)
    at fallbackErrorHandler (/Users/simen/repos/monorepo/node_modules/fastify/lib/error-handler.js:130:3)
    at handleError (/Users/simen/repos/monorepo/node_modules/fastify/lib/error-handler.js:60:5)
    at onErrorHook (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:742:5)
    at _Reply.Reply.send (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:133:5)
    at defaultErrorHandler (/Users/simen/repos/monorepo/node_modules/fastify/lib/error-handler.js:91:9)
    at handleError (/Users/simen/repos/monorepo/node_modules/fastify/lib/error-handler.js:64:18)
    at onErrorHook (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:742:5)
    at BodyReadable.<anonymous> (/Users/simen/repos/monorepo/node_modules/fastify/lib/reply.js:630:9)
    at BodyReadable.<anonymous> (node:internal/util:453:5)
    at BodyReadable.onerror (node:internal/streams/end-of-stream:133:14)
    at BodyReadable.emit (node:events:513:28)
    at BodyReadable.emit (/Users/simen/repos/monorepo/node_modules/undici/lib/api/readable.js:66:18)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21)

As for a PR, I have no idea where to start. 😅 I'm unsure if the bug is here, in Fastify core, or even in Node core itself.

@mcollina
Member

HTTP Headers are not supposed to contain UTF-8 characters.

There are two bugs:

  1. the first server actually sending this header out without throwing. I guess we bypass this check in Fastify for throughput reasons... we might rethink it.
  2. the second server should definitely catch the error while it validates this.

@climba03003
Member

climba03003 commented Jan 28, 2023

This issue fastify/fastify#3808 just popped up in my memory when I saw the non-ASCII string in the header.

the first server actually sending this header out without throwing. I guess we bypass this check in Fastify for throughput reasons... we might rethink it.

According to the research from that issue, nodejs/node#42579:
If there is a "Transfer-Encoding: chunked" header (exactly chunked), setHeader works properly; it sets the input header (ByteString) as is.

From the above, I assume we are not bypassing anything in Fastify; Node.js just serializes the header as-is.

the second server should definitely catch the error while it validates this.

We need to think about why only chaining sets Transfer-Encoding: chunked.

@climba03003
Member

I still see it as an upstream bug in Node.js.
It treats the header differently depending on whether a Transfer-Encoding header is present or not.

It should throw regardless of whether the Transfer-Encoding header exists.
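
For context, a minimal sketch of what Node's own header-value check accepts (this assumes current http.validateHeaderValue semantics, where code points up to 0xFF are allowed):

const { validateHeaderValue } = require('node:http')

// 'å' is U+00E5, inside the range Node's check allows, so no error here:
validateHeaderValue('content-disposition', 'år.pdf')

// The U+FFFD replacement character that shows up after a bad decode downstream
// is outside that range, so this throws ERR_INVALID_CHAR:
validateHeaderValue('content-disposition', '\uFFFDr.pdf')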

@mcollina
Member

@climba03003 could you open an issue on Node.js describing the problem?

@SimenB
Member Author

SimenB commented Jan 28, 2023

Ah, based on the links in the Node issue, adding encodeURIComponent works. 👍 A bit too aggressive (e.g. encoding spaces etc.), but it doesn't hurt, I think. At least Chrome names the downloaded file correctly with it.
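
For reference, the workaround is just percent-encoding the value before it is set; a minimal sketch against the first server from the repro above (a workaround, not a proposed fix):

app.get('/', (_request, reply) => {
  // Percent-encode the value so only ASCII bytes reach the wire.
  // encodeURIComponent is more aggressive than strictly needed (it also
  // encodes spaces etc.), but it avoids the non-ASCII bytes that break
  // the chained proxies.
  reply.header('content-disposition', encodeURIComponent('år.pdf')).send('OK');
});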


Note that content-disposition does not pick this up (possibly jshttp/content-disposition#27), so @fastify/static's reply.download will produce an invalid header if the file path includes UTF-8 characters:

https://github.com/fastify/fastify-static/blob/c95eaf7bf20489007c7d85f5516c421b2d6b4334/index.js#L306-L307

@climba03003
Member

content-disposition should be encoding the UTF-8 character, according to its own claims.

When the latin1 form is not the same as the input, it is passed to the filename* field.
https://github.com/jshttp/content-disposition/blob/73bf21e7c3f55f754932844584061027767289f4/index.js#L203-L205

When there is a filename* field, it uses percent encoding.
https://github.com/jshttp/content-disposition/blob/73bf21e7c3f55f754932844584061027767289f4/index.js#L246-L248

@mcollina
Member

Should we fork content-disposition?

@climba03003
Member

climba03003 commented Jan 30, 2023

content-disposition actually works as intended.
The character å is also allowed by Node itself.
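
To illustrate (a sketch based on the library's documented behavior, not verified here): å is representable in latin1, so the plain filename field is used as-is; only characters outside latin1 trigger the percent-encoded filename* fallback.

const contentDisposition = require('content-disposition')

// 'å' (U+00E5) fits in latin1, so no filename* field is generated:
contentDisposition('år.pdf')
// => attachment; filename="år.pdf"

// '€' (U+20AC) does not fit in latin1, so the RFC 5987 fallback kicks in:
contentDisposition('€ rates.pdf')
// => attachment; filename="? rates.pdf"; filename*=UTF-8''%E2%82%AC%20rates.pdf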

@SimenB
Member Author

SimenB commented Jan 30, 2023

å is not encoded the same way in latin1 and utf8, so there needs to be (or should be) some sort of transcoding (or usage of filename*=utf-8):

const {transcode: bufferTranscode} = require('node:buffer');

const utf8Buffer = Buffer.from('å');
const latin1Buffer = bufferTranscode(utf8Buffer, 'utf8', 'latin1');

console.log(Buffer.compare(utf8Buffer, latin1Buffer)); // logs -1

@climba03003
Member

climba03003 commented Jan 30, 2023

It still confuses me why the first comparison is true if the bytes should not be the same (assuming a UTF-8 to latin1 conversion happens).
Only after the second write to the header does it become totally wrong.

const contentDisposition = require('content-disposition')
const http = require('http')
const { request } = require('undici')

const header = contentDisposition('år.pdf') // C3A5

http.createServer(function(_, response) {
  response.writeHead(200, {
    'content-length': 2,
    'content-disposition': header
  })
  response.end('OK')
}).listen({ port: 6666 })

http.createServer(async function(_, response) {
  const { statusCode, headers, body } = await request('http://localhost:6666', {
    method: "GET"
  })
  console.log(headers)
  console.log(headers['content-disposition'] === header) // true C3A5
  delete headers['transfer-encoding']
  response.writeHead(statusCode, headers)
  body.pipe(response)
}).listen({ port: 7777 })

http.createServer(async function(_, response) {
  const { statusCode, headers, body } = await request('http://localhost:7777', {
    method: "GET"
  })
  console.log(headers)
  console.log(headers['content-disposition'] === header) // false EFBFBD
  delete headers['transfer-encoding']
  response.writeHead(statusCode, headers)
  body.pipe(response)
}).listen({ port: 8888 })

@SimenB
Member Author

SimenB commented Jan 30, 2023

When fetching 7777, you can see the header has seemingly lost the character:

$ curl -i http://localhost:6666
HTTP/1.1 200 OK
content-length: 2
content-disposition: attachment; filename="år.pdf"
Date: Mon, 30 Jan 2023 10:47:44 GMT
Connection: keep-alive
Keep-Alive: timeout=5

OK

$ curl -i http://localhost:7777
HTTP/1.1 200 OK
content-length: 2
content-disposition: attachment; filename="r.pdf"
date: Mon, 30 Jan 2023 10:42:49 GMT
connection: keep-alive
keep-alive: timeout=5

OK

I think this is just bad printing by curl, though. Using e.g. httpie shows this:

$ http http://localhost:6666
HTTP/1.1 200 OK
Connection: keep-alive
Date: Mon, 30 Jan 2023 10:44:26 GMT
Keep-Alive: timeout=5
content-disposition: attachment; filename="Ã¥r.pdf"
content-length: 2

OK


$ http http://localhost:7777
HTTP/1.1 200 OK
connection: keep-alive
content-disposition: attachment; filename="år.pdf"
content-length: 2
date: Mon, 30 Jan 2023 10:44:35 GMT
keep-alive: timeout=5

OK

6666 shows å encoded weirdly (probably since it prints the UTF-8 bytes as latin1), while at 7777 it's back. 8888 still crashes of course, regardless of curl vs httpie (thankfully 😅).
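
For what it's worth, that Ã¥ shown for 6666 is exactly what you get when the UTF-8 bytes for å are read back as latin1:

// The UTF-8 bytes for 'å' (C3 A5) reinterpreted as latin1 give the mojibake above:
console.log(Buffer.from('år.pdf', 'utf8').toString('latin1')) // Ã¥r.pdf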

So I do think there's an encoding issue here, and I do think content-disposition should deal with it (and ideally Node would validate it, of course).

@climba03003
Member

climba03003 commented Jan 30, 2023

curl is showing the UTF-8 character, but httpie is showing the latin1 character.
Weird, since in the 7777 server's log it is properly received as UTF-8 bytes.

If a conversion happened:
6666 -> latin1 -> 7777 (converted to UTF-8 in the header list?) -> why broken? Shouldn't it convert from UTF-8 back to latin1 again -> 8888

@climba03003
Member

climba03003 commented Jan 30, 2023

I see the problem: whenever you are not writing a string to the response, the encoding actually changes.

So, 6666 displays correctly (the header is sent out as UTF-8).
7777 receives it correctly, but since it pipes a stream, it sends it out wrongly (as latin1 or even something else).

The problematic behavior is actually caused by
https://github.com/nodejs/node/blob/2a29df64645a70bbb833298423a29206c4ec6a2e/lib/_http_outgoing.js#L361-L373

The problem exists because:

  1. The validation of header characters is incomplete (some UTF-8 characters may pass through the regexp).
  2. The header is not always enforced to be latin1. It varies between UTF-8 and latin1 depending on the first chunk (see the sketch below).
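
A minimal sketch of point 2, assuming the _http_outgoing.js behavior linked above (illustrative servers and ports, not the actual reply-from code): when the first chunk written is a string, the header is prepended to it and both go out with that chunk's encoding (UTF-8 by default); when the first chunk is a Buffer, as it is when piping an upstream body, the header is flushed separately as latin1.

const http = require('node:http')

// First chunk is a string: header and body are written together as UTF-8,
// so 'å' goes out as the two bytes C3 A5 (what the 6666 server ends up doing).
http.createServer((_, res) => {
  res.setHeader('content-disposition', 'år.pdf')
  res.end('OK')
}).listen(9998)

// First chunk is a Buffer (as when piping a stream): the header is flushed
// separately with latin1, so 'å' goes out as the single byte E5
// (what the 7777 proxy effectively does).
http.createServer((_, res) => {
  res.setHeader('content-disposition', 'år.pdf')
  res.end(Buffer.from('OK'))
}).listen(9999)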

@SimenB
Member Author

SimenB commented Feb 13, 2023

Can confirm that this is fixed by nodejs/undici#1911 (without #289 applied) 🎉
