Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Readable.fromWeb is never finished #49233

Closed
jan-tosovsky-cz opened this issue Aug 18, 2023 · 18 comments
Closed

Readable.fromWeb is never finished #49233

jan-tosovsky-cz opened this issue Aug 18, 2023 · 18 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@jan-tosovsky-cz
Copy link

jan-tosovsky-cz commented Aug 18, 2023

Version

v18.17.1

Platform

Win64

Subsystem

No response

What steps will reproduce the bug?

Run this script:

const fs = require("fs")
const path = require("path")
const { Readable } = require("stream")
const { finished } = require("stream/promises")

async function saveURLToFile(url, path) {
    try {
        const response = await fetch(url)
        const fileStream = fs.createWriteStream(path)
        await finished(Readable.fromWeb(response.body).pipe(fileStream))

    } catch (err) {
        console.log(err)
    }
}

saveURLToFile("http://www.milicovice.cz/", path.resolve(__dirname, "output.html"))

How often does it reproduce? Is there a required condition?

Always

What is the expected behavior? Why is that the expected behavior?

The content (HTML page) is stored to the file.

What do you see instead?

The node crashes.

Additional information

This method works for other sites. It is unclear why this one is problematic. That error is apparently uncaught by the catch block.
This site belongs to the municipality of a small village. Please minimize the traffic (during testing) to a minimum. Thanks.

@bnoordhuis
Copy link
Member

Define "crashes"? I'd like to see the stack trace or whatever you get.

@jan-tosovsky-cz
Copy link
Author

jan-tosovsky-cz commented Aug 19, 2023

Node.js exits during executing this line:
await finished(Readable.fromWeb(response.body).pipe(fileStream))
No exception is thrown, hence it is not caught.
When trying to capture the error this way, that err.out stays empty:
node script.js 1> log.out 2> err.out
However, when looking into Windows Event Viewer, no crash was reported for Node.js, so it may not be a typical app crash.

So I think that site has something unexpected (SSL certificate, server settings, ...) which triggers an exception not handled correctly and node.js exits abruptly.

Is there any option to enable some node.js logging to inspect what is going on?

@bnoordhuis
Copy link
Member

Ah, okay. I think your report is a duplicate of #48466 and one or two other similar reports (that I can't find right now but I remember commenting on them. They're in the bug tracker... somewhere.)

If you agree it's the same bug, then please go ahead and close this issue.

@jan-tosovsky-cz
Copy link
Author

The linked issue, if I understand correctly, causes an exiting the Node.js when the stream is consumed, but it is interrupted before calling close() method. Ok, this should be fixed. But there is another issue here. Why that stream interruption is not detected and the corresponding exception thrown, so it could be processed in my catch block?

When accessing different URLs I can see many errors (certificate problems, timeouts) so I would expect to handle such interruption events in the same way.

From the linked issue I deduce here the server closes the connection before sending the response. If so, it should be handled gracefully. But I can be wrong. I am not an experienced Node.js user to be able to dig into it more deeply.

@aduh95
Copy link
Contributor

aduh95 commented Aug 20, 2023

Why that stream interruption is not detected and the corresponding exception thrown, so it could be processed in my catch block?

Most likely because the promise returned by finished is never rejected (nor resolved), instead the process exits because nothing is keeping it alive. If you were using ESM (a .mjs file), you could use top-level await, and you would see an exit code of 13.

ESM version
import fs from "node:fs";
import { Readable } from "node:stream";
import { finished } from "node:stream/promises";

async function saveURLToFile(url, path) {
  try {
    const response = await fetch(url);
    const fileStream = fs.createWriteStream(path);
    await finished(Readable.fromWeb(response.body).pipe(fileStream));
  } catch (err) {
    console.log(err);
  }
}

await saveURLToFile(
  "http://www.milicovice.cz/",
  new URL("./output.html", import.meta.url)
);
$ node entry.mjs
$ echo $?
13

@aduh95 aduh95 changed the title Crash when storing specific URL to file Readable.fromWeb is never finished Aug 21, 2023
@aduh95
Copy link
Contributor

aduh95 commented Aug 21, 2023

I've rename the issue because there's no crash.

@nodejs/whatwg-stream can you comment wether the current behavior is expected or it should be fixed?

@VoltrexKeyva VoltrexKeyva added the stream Issues and PRs related to the stream subsystem. label Aug 22, 2023
@mcollina
Copy link
Member

mcollina commented Sep 4, 2023

I'm not sure what is the actual issue. The following code works correctly on my machine (it prints finished):

const fs = require("fs")
const path = require("path")
const { Readable } = require("stream")
const { finished } = require("stream/promises")

async function saveURLToFile(url, path) {
    try {
        const response = await fetch(url)
        const fileStream = fs.createWriteStream(path)
        await finished(Readable.fromWeb(response.body).pipe(fileStream))
        console.log('finished')
    } catch (err) {
        console.log(err)
    }
}

saveURLToFile("http://www.milicovice.cz/", path.resolve(__dirname, "output.html"))

@nodejs/undici I think I saw this problem already but I can't find an issue/pr.

@jan-tosovsky-cz
Copy link
Author

@mcollina That code still doesn't work on my laptop in Nodejs v18.17.1 on Win64 Pro (10.0.22621 Build 22621). But now this stacktrace is printed:

C:\Program Files\nodejs\node.exe .\save-url-to-file.js
Uncaught TypeError TypeError: terminated
    at onAborted (internal/deps/undici/undici:11442:53)
    at emit (events:526:35)
    at terminate (internal/deps/undici/undici:10695:14)
    at fetchParams.controller.resume (internal/deps/undici/undici:11421:36)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- TickObject ---
    at init (internal/inspector_async_hook:25:19)
    at emitInitNative (internal/async_hooks:202:43)
    at emitInitScript (internal/async_hooks:504:3)
    at nextTick (internal/process/task_queues:132:5)
    at onDestroy (internal/streams/destroy:103:15)
    at done (internal/webstreams/adapters:526:11)
    at destroy (internal/webstreams/adapters:544:7)
    at _destroy (internal/streams/destroy:109:10)
    at destroy (internal/streams/destroy:71:5)
    at destroyer (internal/streams/destroy:319:12)
    at <anonymous> (internal/webstreams/adapters:555:7)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- Promise.then ---
    at newStreamReadableFromReadableStream (internal/webstreams/adapters:548:3)
    at Readable.fromWeb (internal/streams/readable:1403:27)
    at saveURLToFile (c:\test\save-url-to-file.js:10:33)
    at processTicksAndRejections (internal/process/task_queues:95:5)
    --- await ---
    at <anonymous> (c:\test\save-url-to-file.js:17:1)
    at Module._compile (internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (internal/modules/cjs/loader:1310:10)
    at Module.load (internal/modules/cjs/loader:1119:32)
    at Module._load (internal/modules/cjs/loader:960:12)
    at executeUserEntryPoint (internal/modules/run_main:81:12)
    at <anonymous> (internal/main/run_main_module:23:47)
Process exited with code 1

@jan-tosovsky-cz
Copy link
Author

Exactly the same stracktrace I am getting also when saving the content for http://www.hornimarsov.cz/ so this issue is not isolated for the above single server URL.

@mcollina
Copy link
Member

mcollina commented Sep 5, 2023

Something that would be really useful to debug this is to have a complete reproduction, i.e. a local Node.js server that can trigger this problem in the client.

The bug might even be Windows specific.

@metcoder95
Copy link
Member

https://github.com/orgs/nodejs/teams/undici I think I saw this problem already but I can't find an issue/pr.

Couldn't find it, the closest one was nodejs/undici#2026

@aduh95
Copy link
Contributor

aduh95 commented Sep 5, 2023

FWIW when I run the snippet in #49233 (comment) using the latest main, I am getting exit code 13 and 'finished' is never logged. Or sometimes I'm getting an error, it seems to be flaky:

node:events:492
      throw er; // Unhandled 'error' event
      ^

TypeError: terminated
    at Fetch.onAborted (node:internal/deps/undici/undici:11463:53)
    at Fetch.emit (node:events:526:35)
    at Fetch.terminate (node:internal/deps/undici/undici:10717:14)
    at fetchParams.controller.resume (node:internal/deps/undici/undici:11442:36)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
Emitted 'error' event on Readable instance at:
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
  [cause]: Error: incorrect header check
      at Zlib.zlibOnError [as onerror] (node:zlib:189:17) {
    errno: -3,
    code: 'Z_DATA_ERROR'
  }
}

Node.js v21.0.0-pre

@mcollina
Copy link
Member

mcollina commented Sep 5, 2023

Funny insight: right now I can reproduce your results @aduh95.

My understanding is that our decompression algorithm is failing to decompress that payload: this is likely some kind of incompatibility with the meaning of deflate compression.
Then because promises are resolved in the next tick and there is nothing to keep that event loop busy, the value or error are never emitted.

@jan-tosovsky-cz
Copy link
Author

Wappalyzer plugin + Dev Tools reports this set of server details

http://www.milicovice.cz/
Kentico CMS (written in C#) powered by ASP.NET 4.0.30319 running on IIS 10.0 on Windows Server. The response is compressed using deflate. Keep-alive is not specified.

http://www.hornimarsov.cz/
PHP 5.5.38 running on Apache, encoded using gzip. Keep-alive timeout=5, max=100.

Each server's root cause will most likely differ as server configurations differ significantly.

Something about compression on IIS is here https://learn.microsoft.com/en-us/iis/extensions/iis-compression/iis-compression-overview There is stated two libs are available for the deflate compression, gzip.dll and iiszlib.dll. The second is not OOTB, but is more efficient and has better dynamic compression support. But it could explain incompatibilities.

@mcollina
Copy link
Member

mcollina commented Oct 3, 2023

I think this is due to nodejs/undici#2128.

@huseyinacacak-janea
Copy link
Contributor

I've investigated this issue with Node v18.20.2, v20.16.0 and v22.9.0. Both v20 and v22 worked as expected.
@jan-tosovsky-cz Do you think this can be closed now?

@RedYetiDev
Copy link
Member

Per the comment above. See nodejs/undici#2650

@jan-tosovsky-cz
Copy link
Author

Thanks a lot for managing the fix. I downloaded 20.17.0 LTS (Win MSI installer) and ran the script again. The task is now completed without any issues.

Making this functionality more robust (i.e. supporting even obscure old sites) will eliminate headaches for many future developers. Kudos to all!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests

8 participants