Streams, Buffers, and Worker Threads #33240
Comments
Oh, move semantics in JavaScript. How exciting. I would prefer if the ownership was transferred down the pipe. However, I find it unlikely that we will be able to enforce this in userland... It's not the most performant way, but I think in this case you should not transfer the Buffer, but copy it.
Maybe stream implementers can set a standardised flag on the Buffer to indicate it is safe to transfer?
Yep, for my case, I'm going to allow an option that does both, with copy as the default. We should, however, deal with the Abort case here more elegantly to avoid the crash, as it's just a matter of time before other end users hit it.
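As a sketch of what such an opt-in could look like (the helper name and the `transfer` option below are illustrative, not Piscina's or Node's actual API):

```js
const { MessageChannel } = require('worker_threads');

// Sketch of "copy by default, transfer only on request". The `transfer`
// option is hypothetical; only the caller can know whether anything else
// shares chunk.buffer, so moving it must be opt-in.
function postChunk(port, chunk, { transfer = false } = {}) {
  if (transfer) {
    // Fast path: moves the backing ArrayBuffer to the other side and
    // detaches it here. Only safe if this chunk owns that memory outright.
    port.postMessage(chunk, [chunk.buffer]);
  } else {
    // Safe default: structured clone copies the bytes, leaving the pool and
    // any sibling Buffer views untouched.
    port.postMessage(chunk);
  }
}

// Example with an in-process channel:
const { port1, port2 } = new MessageChannel();
port2.once('message', (data) => {
  console.log('received', data.byteLength, 'bytes');
  port1.close();
  port2.close();
});
postChunk(port1, Buffer.from('hello world'));
```

With copy as the default, downstream consumers such as zlib that still hold a reference to the chunk are unaffected; the transfer path only makes sense when the writer truly owns the whole ArrayBuffer.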
I think the question you posed in the linked issue is the relevant one here for the second issue: […]

We kind of do have an answer for that: modifications to the Buffer that happen between […] There kind of is a good and performant solution here, which would be having copy-on-write semantics for […]

And for issue 1, yeah, it might make sense to audit the codebase for calls that read from buffers and check how they react to detaching.

(None of this is really Worker-thread specific, btw.)
Nope, definitely not worker-thread specific, but it makes the issue easy to spot :-)
@jasnell I think your issue is due to your stream "breaking" the pooling contract here: https://github.com/jasnell/piscina/pull/34/files#r419929366
@mafintosh yeah, that's precisely the point here tho. The streams contract is underdefined here, with an implicit, undocumented rule that the writer owns the buffer most of the time. That doesn't hold true all of the time, however, such as when the Buffer is sliced off the pool, or when the stream instance itself creates the Buffer because it was given a string instead. Meanwhile, there aren't protections downstream to prevent the Buffer from being misused (such as being put into a transfer list or triggering an abort). There are two actions here: […]
@jasnell I don't think this is a stream issue tho. Node pooling makes this impossible to solve, in general.

```js
const a = Buffer.from('hello')
const b = Buffer.from('world')
console.log(a.buffer === b.buffer) // true: both small allocations come from the same pooled ArrayBuffer
```
To expand on the above: is there ever a case where transferring an unsafe buffer using the transfer list won't cause undefined behaviour in your Node program, since it'll "free" all kinds of unrelated buffers?
Given that, is there any case where we can allow a buffer in the transfer list? Shouldn't we always make a copy (even if the user puts the buffer into the transfer list), i.e. we support the API but with a slow implementation, until we have a better solution?
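One partial answer, as an editorial sketch: a Buffer can only ever be a candidate for transfer when it spans its entire backing ArrayBuffer, which at best can be approximated with a geometry check (the helper below is illustrative, not an existing API):

```js
// Heuristic: a Buffer that does not cover its whole backing ArrayBuffer is a
// slice of something shared (for example Node's allocation pool), so detaching
// buf.buffer would take unrelated data with it.
function maybeSafeToTransfer(buf) {
  return buf.byteOffset === 0 && buf.byteLength === buf.buffer.byteLength;
}

console.log(maybeSafeToTransfer(Buffer.from('hello')));         // false in practice: pooled
console.log(maybeSafeToTransfer(Buffer.alloc(16)));             // true: owns its ArrayBuffer
console.log(maybeSafeToTransfer(Buffer.alloc(16).slice(0, 4))); // false: shares with its parent
```

Even a passing check only proves geometry, not exclusivity: some other code may still hold a reference to the same ArrayBuffer, which is why copying remains the only generally safe default.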
@mafintosh Just FYI, you are right, but in the near future that is no longer going to be a concern here: #32759
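(Related background, as a hedged aside: worker_threads later gained markAsUntransferable(), which lets an allocator protect a shared backing ArrayBuffer from exactly this failure mode, and Node uses it for its own Buffer pool. The exact reaction to a marked buffer in a transfer list may differ between Node versions, so the sketch below handles either outcome rather than asserting one:)

```js
const { MessageChannel, markAsUntransferable } = require('worker_threads');

// An allocator can mark its backing ArrayBuffer so that handing a slice's
// .buffer to a transfer list cannot silently detach memory that other Buffers
// carved from the same allocation still rely on.
const poolBacking = new ArrayBuffer(8192);
markAsUntransferable(poolBacking);

const slice = Buffer.from(poolBacking, 0, 16);
const { port1, port2 } = new MessageChannel();

port2.once('message', (received) => {
  console.log('received', received.byteLength, 'bytes');
  console.log('pool backing still intact:', poolBacking.byteLength === 8192);
  port1.close();
  port2.close();
});

try {
  // Depending on the Node.js version, the marked ArrayBuffer in the transfer
  // list is either ignored (the slice gets cloned) or rejected with an error.
  port1.postMessage(slice, [slice.buffer]);
} catch (err) {
  console.log('transfer rejected:', err.message);
  port1.close();
  port2.close();
}
```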
@addaleax excellent! I still don't think this is a stream issue, though. If you call write with a buffer, you're implicitly giving ownership of that buffer to the stream. If you mess with the ArrayBuffer without taking into account byteOffset and byteLength, you're breaking out of the "Buffer view" sandbox, which requires some kind of coupling with whoever else uses that ArrayBuffer.
For streams, I think the rule is actually that the stream has zero ownership over the Buffer at all. For handling of […], assuming the setup:

```js
const Piscina = require('piscina');
const { resolve } = require('path');

const pool = new Piscina({
  filename: resolve(__dirname, 'worker.js')
});

const b = Buffer.from([1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10]);
const b1 = b.slice(0, 10);
const b2 = b.slice(10);

console.log(b.buffer === b1.buffer);
console.log(b.buffer === b2.buffer);
console.log(b1.buffer === b2.buffer);

(async () => {
  console.log(await pool.runTask({data:b2}, [b2.buffer]));
  console.log(b2, b2.buffer);
})();
```

Note the use of […]. Then, in worker.js:

```js
module.exports = ({data}) => {
  const u = new Uint8Array(data.buffer);
  u[0] = 1;
  u[1] = 2;
  u[2] = 3;
  data.fill(2);
  console.log('\t', data, data.buffer);
};
```

Running this in Node.js 12.x and 13.x yields an error.
Running this in Node.js 14.1 yields: […]
Note that, in this case, b2 (and its entire underlying ArrayBuffer) ends up being cloned rather than transferred, despite use of the transfer list, and all three Buffer instances in the main thread continue to be usable. Things change, however, if we switch from using Buffer.from() to Buffer.alloc():

```js
const Piscina = require('piscina');
const { resolve } = require('path');

const pool = new Piscina({
  filename: resolve(__dirname, 'worker.js')
});

const b = Buffer.alloc(20).fill(1);
// const b = Buffer.from([1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10]);
const b1 = b.slice(0, 10);
const b2 = b.slice(10);

console.log(b.buffer === b1.buffer);
console.log(b.buffer === b2.buffer);
console.log(b1.buffer === b2.buffer);

(async () => {
  console.log(await pool.runTask({data:b2}, [b2.buffer]));
  console.log(b2, b2.buffer);
  console.log(b1, b1.buffer);
})();
```

If we run this in Node.js 14, 13, or 12 we get: […]

The […] So, with […]
One bit to point out... using […]
@jasnell Unless something has changed, almost all of core's APIs return unsafely allocated buffers (I did some research into this for a potential security issue). So any buffer from fs.readFile, fs.createReadStream, etc. won't work.
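To make that concrete, here is a small check that can be run against any core API result; it only prints the returned Buffer's view geometry rather than assuming which APIs pool and which don't:

```js
const fs = require('fs');

fs.readFile(__filename, (err, buf) => {
  if (err) throw err;
  // A non-zero byteOffset, or a byteLength smaller than the backing
  // ArrayBuffer, means the Buffer shares memory with something else and its
  // .buffer must not be placed in a transfer list.
  console.log({
    byteOffset: buf.byteOffset,
    byteLength: buf.byteLength,
    backingByteLength: buf.buffer.byteLength,
  });
});
```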
For the streams discussion, I'd still argue that the stream owns the Buffer when you pass it to it, but not the ArrayBuffer. I.e. when you do […]
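A hedged reading of the distinction being drawn, as an editorial illustration rather than the original snippet:

```js
const { Writable } = require('stream');

const sink = new Writable({
  write(chunk, encoding, callback) {
    // Under this argument the stream may treat `chunk` (the view) as its own:
    // the writer should not touch those bytes again after write(). The stream
    // gains no claim over the rest of chunk.buffer, though; other Buffers
    // sharing that ArrayBuffer are outside the contract.
    console.log(chunk.byteOffset, chunk.byteLength, chunk.buffer.byteLength);
    callback();
  }
});

sink.write(Buffer.from('hello'));  // a small, pooled allocation
sink.end();
```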
That's not really true tho... because if that […]

```js
const { createWriteStream, readFileSync } = require('fs')
const assert = require('assert');

const u = new Uint8Array(10);
const b = Buffer.from(u.buffer);
assert.strictEqual(b[0], 0);

const out = createWriteStream('./test');
out.cork();
out.write(b);
u[0] = 1;   // mutate the shared memory after write() but before the corked data is flushed
out.uncork();
out.end();
out.on('close', () => {
  const check = readFileSync('./test');
  console.log(check);
  assert.strictEqual(check[0], 0); // fails: the file contains the mutated byte
}) // and...
```
@jasnell That's why I wrote the Buffer and not the attached ArrayBuffer. Messing around with the attached ArrayBuffer will have side effects and security issues all over the program. The only one that owns the ArrayBuffer is Node.js internals for pooled buffers, and whoever allocated it for slices of safe ones.
Btw, conversation moving over to the PR at #33252
Closing as the PR #33252 landed |
There is a bit of undefined grey area when passing a `Buffer` instance into a `stream.Writable`: who has ownership responsibility for the `Buffer`?

See this example as reference: https://github.com/jasnell/piscina/pull/34/files#diff-123d6328fe4f2579686ee49111e38427

In the example, I create a `stream.Duplex` instance that wraps a `MessagePort`. On `_write`, I post the given `Buffer` to the `MessagePort` and include it in the transfer list to avoid copying the data.

This is where it gets fun. In my example, from within a worker thread, I open a pipeline that reads from a file, passes that into a gzip compressor, then out to my custom duplex. What happens is a fun little Abort...
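The original snippet isn't reproduced here, but the duplex in question is roughly shaped like the following sketch (an editorial illustration with simplified names and no error handling; see the linked PR for the real code):

```js
const { Duplex } = require('stream');

// Rough sketch of a Duplex that forwards written chunks to a MessagePort and
// pushes whatever arrives from the port as readable data.
class MessagePortDuplex extends Duplex {
  constructor(port, options) {
    super(options);
    this.port = port;
    this.port.on('message', (chunk) => this.push(chunk));
  }

  _write(chunk, encoding, callback) {
    // The contentious part: placing chunk.buffer in the transfer list detaches
    // it from this thread, even though an upstream producer (e.g. zlib) may
    // still be reading from that memory.
    this.port.postMessage(chunk, [chunk.buffer]);
    callback();
  }

  _read() {
    // Data is pushed from the 'message' handler in the constructor.
  }

  _final(callback) {
    this.port.close();
    callback();
  }
}

module.exports = MessagePortDuplex;
```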
The reason for the Abort is that my custom `Duplex` just transferred ownership of the `Buffer` away to the main thread while the gzip compressor was still using it.

Replace the gzip compressor with the brotli compressor, and things work! Change the `postMessage()` call to remove the transferList and things work!

Obviously the Abort itself is problematic and we need to figure out how to avoid that in zlib, so that's issue #1... issue #2 is that we really should try to specify some ownership expectations on `Buffer` instances passed into stream instances.

/cc @nodejs/streams @ronag @addaleax @nodejs/zlib