Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undici fails to read bodies from responses with implicit lengths #1414

Closed
pimterry opened this issue May 5, 2022 · 11 comments
Closed

Undici fails to read bodies from responses with implicit lengths #1414

pimterry opened this issue May 5, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@pimterry
Copy link
Member

pimterry commented May 5, 2022

Bug Description

Paraphrasing the HTTP spec (RFC 7230 3.3.3), the message length for a normal response body can be defined in a few ways:

  • An explicit content-length header with a fixed length (point 3 in the RFC list)
  • Given a transfer-encoding header, a transfer coding chunk that explicitly ends the body (point 5 in the list)
  • If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)

Undici handles the last case incorrectly, making it impossible to read the HTTP response body for this simple "respond and then close the connection" case.

This doesn't come up much for big fancy servers, which tend to use keep-alive etc and explicit framing wherever they can, but it is common on quick simple HTTP server implementations, which avoid state and complexity by just streaming responses and closing the connection when they're done. It's a legitimate way to send responses according to the HTTP spec, and it is supported correctly by browsers and Node's http module, but not by Undici.

Reproducible By

Running in Node 18.0.0:

const http = require('node:http');

http.createServer((req, res) => {
    res.removeHeader('transfer-encoding');
    res.writeHead(200, {
        // Header isn't actually necessary, but tells node to close after response
        'connection': 'close'
    });
    res.end('a response body');
}).listen(8008);

fetch('http://localhost:8008').then(async (response) => { // Global fetch from Undici
    console.log('got response', response.status, response.headers);

    try {
        const responseText = await response.text()
        console.log('got body', responseText); // Should log the body OK here
    } catch (e) {
        console.log('error reading body', e); // Throws a 'terminated' error instead
    }
});

The server returns a simple response, just containing the default date header and a connection: close header, then sends the body, then closes the connection.

The connection: close header is for clarity - this should work equally well without that, just using res.socket.end() explicitly instead.

Expected Behavior

The above should print the status, headers, and then the response body, which is everything after the headers until the connection is closed.

This does work using fetch in browsers. To test this, run the script above (which will print the fetch error, but then keep running) then load localhost:8008 in your browser - the string loads successfully.

With that page loaded (for CORS) you can also run the equivalent command in the browser dev console, which also works and prints the response body correctly:

fetch('http://localhost:8008').then(async (response) =>
    console.log(await response.text()) // Logs the body correctly, no errors
);

This also works using Node's built-in HTTP module:

const http = require('http');
const streamConsumers = require('stream/consumers');

http.get('http://localhost:8008', async (res) => {
    const body = await streamConsumers.text(res);
    console.log('GET got body:', body); // Logs the body correctly
});

Logs & Screenshots

Undici's fetch does not print the response body, instead it throws an error:

error reading body TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:7881:53)
    at Fetch.emit (node:events:527:28)
    at Fetch.terminate (node:internal/deps/undici/undici:7135:14)
    at Object.onError (node:internal/deps/undici/undici:7968:36)
    at Request.onError (node:internal/deps/undici/undici:696:31)
    at errorRequest (node:internal/deps/undici/undici:2774:17)
    at Socket.onSocketClose (node:internal/deps/undici/undici:2236:9)
    at Socket.emit (node:events:527:28)
    at TCP.<anonymous> (node:net:715:12) {
  [cause]: SocketError: closed
      at Socket.onSocketClose (node:internal/deps/undici/undici:2224:35)
      at Socket.emit (node:events:527:28)
      at TCP.<anonymous> (node:net:715:12) {
    code: 'UND_ERR_SOCKET',
    socket: {
      localAddress: undefined,
      localPort: undefined,
      remoteAddress: undefined,
      remotePort: undefined,
      remoteFamily: 'IPvundefined',
      timeout: undefined,
      bytesWritten: 171,
      bytesRead: 90
    }
  }
}

Environment

Ubuntu

  • Node v18.0.0 with built-in fetch
  • Node v16.14.2 with Undici 5.1.1
@pimterry pimterry added the bug Something isn't working label May 5, 2022
@ronag
Copy link
Member

ronag commented May 5, 2022

If otherwise unspecified, all remaining bytes on the connection until the connection is closed (point 7 in the list)

My reading of the spec is different.

If a Transfer-Encoding header field is present in a response and
the chunked transfer coding is not the final encoding, the
message body length is determined by reading the connection until
it is closed by the server.

i.e. the transfer-encoding header still needs to be present. Hence your example actually goes under the 4. invalid response clause.

@pimterry
Copy link
Member Author

pimterry commented May 5, 2022

I don't think that paragraph applies. It starts with "If a transfer-encoding header field is present" and in this case there is no transfer-encoding header.

I'm referring to the final case, point 7:

Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.

Also, this seems very unlikely to be an invalid response as that would imply that the HTTP stacks of Chrome, Firefox and Node.js are all doing the wrong thing here. Point 4 regarding invalid response bodies says:

If this is a response message received by a user agent, the user agent MUST close the connection to the server and discard the received response.

Nobody else does so - everybody except Undici happily accepts this response as legit.

@ronag
Copy link
Member

ronag commented May 5, 2022

PR welcome

@pimterry
Copy link
Member Author

pimterry commented May 5, 2022

I'd love to, but it looks like this is going to be deep in llhttp, which doesn't seem easy to dive into.

I've skimmed the README from https://github.com/nodejs/llhttp and the llparse intro (https://llparse.org/) and I get the gist, but that's pretty spare and it's quite hard to immediately work out how the details work in practice.

Is there a contributing guide for llhttp, or any more documentation for llparse anywhere? I'm not clear how to write test cases or debug it, and it's not clear exactly where Undici interacts with it to check inputs & outputs at the entrypoints.

Looks like the trail starts here:

} else {
/* Read body until EOF */
return 4;
}

That seems intended to start handling this exact situation to me, but something's going wrong in the parsing for that case at a later step. Or maybe parsing is working correctly, but then Undici is failing due to the connection close somewhere before it realises the response is complete & OK?

@pimterry
Copy link
Member Author

pimterry commented May 5, 2022

Doing some digging, I've just run into #1412.

Pretty sure that's caused by this underlying issue (and the HTTP/2 mention there is an unrelated red herring). I can reproduce the exact same terminated error as here with the example site they use:

fetch('https://www.dailymail.co.uk/').then(res => res.text()).then(console.log)

Logging the headers too, they do include connection: close with no transfer-encoding or content-length, and Undici fails to read the body in the same way.

That suggests this failing edge case is a lot more common in reality than I'd expected, since this is the landing page of one of the top 100 most visited websites in the world (https://www.semrush.com/website/dailymail.co.uk/ says it's 73rd - they're terrible but sadly popular) and other headers suggest this response is being served by Akamai.

@mcollina
Copy link
Member

mcollina commented May 6, 2022

it looks like this is going to be deep in llhttp

No it's not in llhttp. Node.js uses llhttp as well and it implements this.

@jcheese1
Copy link

yep getting the same error trying to fetch github graphql api https://api.github.com/graphql

@mcollina
Copy link
Member

A PR to fix this would be nice.

@evanderkoogh
Copy link

I have figured out the problem and I am currently figuring out how best to fix it.. Might not get the PR up before dinner, but hopefully in the next couple of hours.

@ronag
Copy link
Member

ronag commented Aug 22, 2022

@evanderkoogh did we fix this?

@pimterry
Copy link
Member Author

@ronag I'm going to close this because the original problem no longer exists. There are other issues being reported as dupes of this, but I think that's not correct and might be confusing things. Both the examples I provided above that originally failed do now pass successfully, and have for a while.

The first standalone example passes in Undici since v5.12.0, and in Node v18.12.0+ and all v20 versions.

The second example (with the daily mail) passes Undici since v5.13.0, and in Node v18.13.0+ and all v20 versions.

Any other related issues are due to something else, or a much more specific version of this issue, and need investigating separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
5 participants