Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "tls: add highWaterMark option for connect" #33387

Closed
wants to merge 2 commits into from

Conversation

puzpuzpuz
Copy link
Member

  1. Reverts commit 58682d8

As it was discussed in #33262, TLS assumes 16KB as the maximum record size, so it makes no sense to expose hwm option in tls.connect(). Even if we do so, it's also a problem as connections may be reused in https (see #30107 (comment)).

  1. Adds a note on applicability of highWaterMark for tls.TLSSocket (though, I'm not sure if that's the right place for this note).
Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@puzpuzpuz puzpuzpuz added tls Issues and PRs related to the tls subsystem. doc Issues and PRs related to the documentations. labels May 13, 2020
@richardlau
Copy link
Member

Can we revert this (as opposed to (doc?) deprecate) as #32786 went out in 13.4.0 and 14.1.0?

doc/api/https.md Outdated Show resolved Hide resolved
doc/api/tls.md Outdated
@@ -650,6 +650,10 @@ Methods that return TLS connection metadata (e.g.
[`tls.TLSSocket.getPeerCertificate()`][] will only return data while the
connection is open.

The `highWaterMark` parameter of the duplex [Stream][] makes no sense for
`tls.TLSSocket`, because the TLS protocol assumes 16KB as the maximum record
size.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to move this comment down below and to mention that it's not supported by extending the following entry * ...: Any [`socket.connect()`][] option not already listed..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved it. Please check the update.

@puzpuzpuz
Copy link
Member Author

Can we revert this (as opposed to (doc?) deprecate) as #32786 went out in 13.4.0 and 14.1.0?

@richardlau that's a good question. I don't think anyone started using this option already. I could keep entries in changes sections and add a new one for the removal. This will help to avoid confusion when users see an option that silently disappeared. WDYT?

@richardlau
Copy link
Member

Can we revert this (as opposed to (doc?) deprecate) as #32786 went out in 13.4.0 and 14.1.0?

@richardlau that's a good question. I don't think anyone started using this option already. I could keep entries in changes sections and add a new one for the removal. This will help to avoid confusion when users see an option that silently disappeared. WDYT?

I think I'd be okay with that. The code will just ignore the option in the case anyone is setting it, correct?

Also restores change entries for `highWaterMark` option
in `tls` and `https` docs.
@puzpuzpuz
Copy link
Member Author

I think I'd be okay with that. The code will just ignore the option in the case anyone is setting it, correct?

Yes, that's right. Also that option wasn't working as expected when set (see #33262), so this PR shouldn't break any user code.

@addaleax
Copy link
Member

@benjamingr Is bringing this over in the issue thread, and it probably makes sense to discuss it there, but what does the highWaterMark option have to do with the TLS record size? I don’t really see how they are related, conceptually. The highWaterMark option is telling us how much data to buffer at most in order to reduce latency at the cost of extra memory – but it doesn’t matter whether it takes 1 or 2 or 20 TLS frames to write that, does it?

Copy link
Member

@addaleax addaleax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above – I’m not opposed but we should be clear about why it makes sense to remove this.

@puzpuzpuz
Copy link
Member Author

The highWaterMark option is telling us how much data to buffer at most in order to reduce latency at the cost of extra memory – but it doesn’t matter whether it takes 1 or 2 or 20 TLS frames to write that, does it?

@addaleax as it was discussed in #33262, such buffering does not happen in tls for some reasons. So, we either need to revert the change, or to fix tls. I came up with the revert as the fastest option, but the fix is certainly more valuable for users. So, I'm going to check what it takes to fix the option.

@addaleax
Copy link
Member

@puzpuzpuz This test passes for me:

Test in the fold
'use strict';
const common = require('../common');
if (!common.hasCrypto)
  common.skip('missing crypto');

const assert = require('assert');
const tls = require('tls');
const { PassThrough } = require('stream');
const fixtures = require('../common/fixtures');

const pem = (n) => fixtures.readKey(`${n}.pem`);

const server = tls.createServer({
  key: pem('agent1-key'),
  cert: pem('agent1-cert')
}, common.mustCall((client) => {
  // Do not read from the client here.
  client.destroy();
  server.close();
}));

server.listen(0, common.mustCall(() => {
  const stream = tls.connect({
    port: server.address().port,
    rejectUnauthorized: false,
    highWaterMark: 128000,
  }, common.mustCall(() => {
    const pt = new PassThrough();
    pt.pipe(stream);
    for (let i = 0; i < 256; i++)
      pt.write('a'.repeat(1000));
    process.nextTick(() => {
      assert.strictEqual(stream.writableLength, 128000);
    });
  }));
  stream.on('error', () => {});
}));

So I’d take that as proof that the HWM option does work?

@benjamingr
Copy link
Member

@addaleax

but it doesn’t matter whether it takes 1 or 2 or 20 TLS frames to write that, does it?

That has been my understanding as well. I agree we shouldn't remove highWatermark support from tls just because it doesn't work with HTTPs.

In general setting a highWatermark in one part of a stream doesn't ensure that backpressure will be correctly respected throughout the different transformations of the stream.

(A semi related interesting side note is the time when David Fowler who built the new .net web framework raised this issue of buffering streams when discussing I/O interfaces in Deno a while ago denoland/deno#387 (comment) to denoland/deno#387 (comment) )

@puzpuzpuz I'm sorry but I'm still confused:

as it was discussed in #33262, such buffering does not happen in tls for some reasons.

In #33262 (comment) @bnoordhuis said:

I don't think readableHighWaterMark can be made meaningful for TLS/HTTPS because that 16k will still be sitting in a buffer somewhere.

There is a difference between "doesn't currently work" and "doesn't conceptually make sense". I'm not sure why having 16k sitting in a buffer somewhere mean Node.js can't buffer multiples of 16K with higher highWaterMark.

@puzpuzpuz
Copy link
Member Author

puzpuzpuz commented May 13, 2020

@puzpuzpuz This test passes for me:
...
So I’d take that as proof that the HWM option does work?

Not really or, to be more precise, it works partially. The original discussion was about readableHighWaterMark part of HWM, so the test should roughly look like the following one.

Test in the fold
'use strict';
const common = require('../common');
if (!common.hasCrypto)
  common.skip('missing crypto');

const assert = require('assert');
const tls = require('tls');
const { Buffer } = require('buffer');
const fixtures = require('../common/fixtures');

const pem = (n) => fixtures.readKey(`${n}.pem`);

const hwm = 16 * 1024; // this value works, while 128000 does not

const server = tls.createServer({
  key: pem('agent1-key'),
  cert: pem('agent1-cert')
}, common.mustCall((socket) => {
  socket.write(Buffer.allocUnsafe(hwm), common.mustCall(() => {
    socket.destroy();
    server.close();
  }));
}));

server.listen(0, common.mustCall(() => {
  const socket = tls.connect({
    port: server.address().port,
    rejectUnauthorized: false,
    highWaterMark: hwm,
  }, common.mustCall(() => {}));
  socket.on('data', (data) => {
    assert.strictEqual(data.length, hwm);
  });
  socket.on('error', () => {});
}));

@puzpuzpuz
Copy link
Member Author

@puzpuzpuz I'm sorry but I'm still confused:

as it was discussed in #33262, such buffering does not happen in tls for some reasons.

In #33262 (comment) @bnoordhuis said:

I don't think readableHighWaterMark can be made meaningful for TLS/HTTPS because that 16k will still be sitting in a buffer somewhere.

There is a difference between "doesn't currently work" and "doesn't conceptually make sense". I'm not sure why having 16k sitting in a buffer somewhere mean Node.js can't buffer multiples of 16K with higher highWaterMark.

@benjamingr I wasn't saying "doesn't conceptually make sense", but "doesn't currently work" (and it doesn't work as expected - see #33387 (comment)). Could you specify what exactly confuses you?

@addaleax
Copy link
Member

@puzpuzpuz But that’s not what the HWM indicates – it’s not the size of the chunks in which data is read, it is the limit up to which data is read before a 'data' handler or .read() call consumes it.

@puzpuzpuz
Copy link
Member Author

@puzpuzpuz But that’s not what the HWM indicates – it’s not the size of the chunks in which data is read, it is the limit up to which data is read before a 'data' handler or .read() call consumes it.

@addaleax then my understanding wasn't correct. Let me double check how HWM works. Thanks for the clarification.

@benjamingr
Copy link
Member

@puzpuzpuz the goal of watermarks is to enable steaming and save I/O at the cost of memory. As an example see this diagram from the Chromium wiki that describes how Chrome downloads MP4 files:

image

In your example setting the highWatermark to 128k only changes the amount of buffering not the size of each 'data'.

@puzpuzpuz
Copy link
Member Author

@addaleax then my understanding wasn't correct. Let me double check how HWM works. Thanks for the clarification.

@addaleax after checking the documentation it seems that my understanding was correct, i.e. HWM sets the up to limit for a readable stream. Considering this, I agree that the option in tls works correctly by the definition of HWM.

Yet, I'm curious about the 16KB limitation. Do you happen to know where it comes from? TLS record size may be the answer, but as discussed here multiple records could be buffered when reading from the socket.

@bnoordhuis
Copy link
Member

I'm basing my comment in the other issue on this line from doc/api/stream.md:

The amount of data potentially buffered depends on the highWaterMark option

In the context of receiving data (sending might a different matter), a HWM < 16k is meaningless because it's effectively ignored (because TLS frame size.)

HWM > 16k is really HWM + 16k because the TLS frame record buffer isn't under control of JS land. Any HWM buffering is on top of that frame record buffer.

@benjamingr
Copy link
Member

, a HWM < 16k is meaningless because it's effectively ignored (because TLS frame size.)

That makes sense, I'm +1 on at least documenting HWM < 16K not being supported.

Any HWM buffering is on top of that frame record buffer.

Right, that is my expectation and why I'd use HWM - why is there an issue with hwm > 16K?

@puzpuzpuz
Copy link
Member Author

That makes sense, I'm +1 on at least documenting HWM < 16K not being supported.

I'll close this PR and submit another one with the doc change. But I'm also curious about the reason behind this restriction.

@benjamingr
Copy link
Member

But I'm also curious about the reason behind this restriction.

What restriction in particular? HWM must be at least 16K because buffering less than that in TLS doesn't make much sense because any size smaller than one frame record can't be understood without the rest of the frame.

I don't think it has to be at most 16Kb, I think Anna's test shows it (with the support already added) can be more than 16Kb and be multiple frames.

We can discuss another property that lets users control the size of each chunk emitted (separate from hwm), that would presumably have to be multiples of 16kb though I'm not sure we'd even want to support customizing that on the stream (as it's an implementation detail).

@puzpuzpuz
Copy link
Member Author

What restriction in particular?

The "I'd use HWM - why is there an issue with hwm > 16K?" one from #33387 (comment)

@puzpuzpuz
Copy link
Member Author

I don't think it has to be at most 16Kb, I think Anna's test shows it (with the support already added) can be more than 16Kb and be multiple frames.

The test shows that the option applies to writableHighWaterMark in the underlying TLS stream. And if I'm not mistaken the latest messages consider readableHighWaterMark behavior.

@bnoordhuis
Copy link
Member

why is there an issue with hwm > 16K?

It's not a reliable measure if you use it to limit memory usage.

Say I have M bytes of memory. Most people probably expect S=M/HWM, where S is the number of TLS streams that can exist simultaneously.

But it doesn't work that way. The real formula is S=M/(HWM+X), where X is the additional overhead of the TLS stream: frame record buffer, keying material, etc.

X can be sizable (an extra 40 or 50k is not exceptional) and can vary over time. Maybe I set HWM=32k but the actual memory footprint might well be double that.

@benjamingr
Copy link
Member

I always thought of hwm like one would a literal watermark, it's the amount of data I have where I stop asking for more. It was never the absolute maximum amount of data I can hold in a stream. It's only the amount of data I stop asking for new data at.

There is no expectation on my part that setting the hwm somehow limits how much memory my program could consime. If you believe users have such an expectation we should probably document the fact it is not possible to calculate the number of streams a server can support with a given amount of memory based on hwm

@bnoordhuis
Copy link
Member

I asked a few people and if what you're saying is the right way to think about HWM, then our docs are supremely terrible at conveying that because no one guessed it correctly (including me.)

From stream.md:

Once the total size of the internal read buffer reaches the threshold specified
by `highWaterMark`, the stream will temporarily stop reading data from the
underlying resource until the data currently buffered can be consumed (that is,
the stream will stop calling the internal `readable._read()` method that is
used to fill the read buffer).

My, ah, proof readers all thought that means it's a memory usage cap. And yes, I was careful to avoid leading questions. :-)

@jasnell
Copy link
Member

jasnell commented May 15, 2020

@benjamingr :

I always thought of hwm like one would a literal watermark, it's the amount of data I have where I stop asking for more. It was never the absolute maximum amount of data I can hold in a stream. It's only the amount of data I stop asking for new data at.

This is correct. Once the highwatermark is hit the object will still happily keep buffering more but will keep telling you to stop. Specific stream implementations could choose to enforce stricter limits but doing so is optional. Specifically, it's a threshold, not a limit.

@benjamingr
Copy link
Member

@jasnell @bnoordhuis I opened a pull request partially based on James's comment and my previous one for the docs. #33432

Feel free to make changes/suggestions directly or if you believe we should make a more substantial change in the docs

@BridgeAR
Copy link
Member

I am going to close this due to the discussion above. Please reopen in case someone would rather keep this open. I think it might be a good idea to open a PR that adds the test mentioned above by @addaleax

@BridgeAR BridgeAR closed this May 23, 2020
@puzpuzpuz
Copy link
Member Author

@BridgeAR thanks for closing this PR. I wasn't able to follow the discussion for some time and after reading the updates I agree with all points.

@benjamingr thanks for opening the doc enhancement PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Issues and PRs related to the documentations. tls Issues and PRs related to the tls subsystem.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants