Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readline: add support for async iteration #23916

Closed
wants to merge 2 commits into from

Conversation

TimothyGu
Copy link
Member

Rewritten version of #18904, using more existing streams mechanisms.

Depends on #23901 for some of the edge case tests (relevant commits included within this PR).

Co-authored-by: Ivan Filenko ivan.filenko@protonmail.com
Fixes: #18603
Refs: #18904

/cc @mcollina @devsnek @prog1dev

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added doc Issues and PRs related to the documentations. readline Issues and PRs related to the built-in readline module. tools Issues and PRs related to the tools directory. labels Oct 27, 2018
@TimothyGu
Copy link
Member Author

doc/api/readline.md Outdated Show resolved Hide resolved
@vsemozhetbyt
Copy link
Contributor

Should we benchmark this comparatively with the 'line' event way and maybe add a note if this currently proves to be significantly slower? (Refs: #23032 (comment))

@vsemozhetbyt vsemozhetbyt added the promises Issues and PRs related to ECMAScript promises. label Oct 27, 2018
Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work!

};
this.on('line', lineListener);
this.on('close', closeListener);
this[kLineObjectStream] = readable;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense expose this as a separate method? converting to a stream might be an issue for multiple people.

Copy link
Member Author

@TimothyGu TimothyGu Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider this an implementation detail of @@asyncIterator method. A major reason of why the performance of this method isn't up to par to 'line' event, as you have noted in #23916 (comment), is because of the double buffering necessitated by the intermediate stream, so I'd rather not expose the stream at the moment.

@vsemozhetbyt
Copy link
Contributor

I've compiled the branch and tested with 1 GB file (22 514 395 lines):

Scripts and results:
'use strict';

const fs = require('fs');
const readline = require('readline');

let counter = 0;
let dummy;

console.time('event');

const rl = readline.createInterface({
  input: fs.createReadStream('big-file.txt', 'utf8'),
  crlfDelay: Infinity,
});

rl.on('line', (line) => {
  counter++;
  dummy = line;
}).on('close', () => {
  console.timeEnd('event');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
});
event: 15763.332ms
Lines: 22514395, last line length: 1
'use strict';

const fs = require('fs');
const readline = require('readline');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIterator');

  const rl = readline.createInterface({
    input: fs.createReadStream('big-file.txt', 'utf8'),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    counter++;
    dummy = line;
  }

  console.timeEnd('asyncIterator');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
asyncIterator: 44865.388ms
Lines: 22514395, last line length: 1

So event implementation currently 3 times as fast as async iterator implementation. Maybe we should warn about this.

@mcollina
Copy link
Member

Can you check reading the same file with fs.createReadStream()  and iterating? I think our last check was about 2 times slower. Considering that we are doing double-buffering, a 3x factor seems ok to me. I don't know if we can improve this at all, a good part of that is due to async iteration overhead.

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 28, 2018

@mcollina Do you mean to compare async iterating over unsplit chunks vs async iterating over split lines? If so, I have ~ 4x factor:

Scripts and results:
const fs = require('fs');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorChunks');

  for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
    counter++;
    dummy = chunk;
  }

  console.timeEnd('asyncIteratorChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
asyncIteratorChunks: 10048.069ms
Chunks: 15811, last chunk length: 22280
'use strict';

const fs = require('fs');
const readline = require('readline');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorLines');

  const rl = readline.createInterface({
    input: fs.createReadStream('big-file.txt', 'utf8'),
    crlfDelay: Infinity,
  });

  for await (const line of rl) {
    counter++;
    dummy = line;
  }

  console.timeEnd('asyncIteratorLines');
  console.log(`Lines: ${counter}, last line length: ${dummy.length}`);
})();
asyncIteratorLines: 43501.427ms
Lines: 22514395, last line length: 1

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 28, 2018

@mcollina Or do you mean to compare async iterating over unsplit chunks vs event implementation for unsplit chunks? If so, I have 1:1 factor, i.e. the same speed:

Scripts and results:
'use strict';

const fs = require('fs');

let counter = 0;
let dummy;

console.time('eventChunks');

const readable = fs.createReadStream('big-file.txt', 'utf8');

readable.on('data', (chunk) => {
  counter++;
  dummy = chunk;
}).on('close', () => {
  console.timeEnd('eventChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
});
eventChunks: 9978.033ms
Chunks: 15811, last chunk length: 22280
'use strict';

const fs = require('fs');

(async function main() {
  let counter = 0;
  let dummy;

  console.time('asyncIteratorChunks');

  for await (const chunk of fs.createReadStream('big-file.txt', 'utf8')) {
    counter++;
    dummy = chunk;
  }

  console.timeEnd('asyncIteratorChunks');
  console.log(`Chunks: ${counter}, last chunk length: ${dummy.length}`);
})();
asyncIteratorChunks: 9922.896ms
Chunks: 15811, last chunk length: 22280

@vsemozhetbyt
Copy link
Contributor

vsemozhetbyt commented Oct 28, 2018

#23901 has landed, it seems those commits can be excluded to simplify reviews.

@vsemozhetbyt
Copy link
Contributor

And beware #23929, we may have conflicts.

@mcollina
Copy link
Member

@vsemozhetbyt thanks for those benchmarks, those are quite interesting. Specifically the fact that using stream iteration is now essentially on par with on('data') is fantastic!

Related to 'readline', I think the culprit of the problem is double-buffering. Essentially we would need a custom implementation that does not rely on the streams one if we want to improve it.

TimothyGu and others added 2 commits November 19, 2018 16:38
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
@TimothyGu
Copy link
Member Author

@vsemozhetbyt @mcollina I've updated this PR to address the documentation comments. Please take a look.

Copy link
Member

@mcollina mcollina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@vsemozhetbyt
Copy link
Contributor

Docs LGTM. Thank you!

@vsemozhetbyt
Copy link
Contributor

Maybe cc @nodejs/streams ?

@mcollina
Copy link
Member

@devsnek maybe? Anyway this could land because it's older than a week. I'd recommend to wait for 2 days and then land if no one objects.

refack pushed a commit to refack/node that referenced this pull request Jan 14, 2019
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: nodejs#18603
Refs: nodejs#18904
PR-URL: nodejs#23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>
refack pushed a commit to refack/node that referenced this pull request Jan 14, 2019
Notable Changes:

* console,util:
  * `console` functions now handle symbols as defined in the spec.
    nodejs#23708
  * The inspection `depth` default is now back at 2.
    nodejs#24326
* dgram,net:
  * Added ipv6Only option for `net` and `dgram`.
    nodejs#23798
* http:
  * Chosing between the http parser is now possible per runtime flag.
    nodejs#24739
* readline:
  * The `readline` module now supports async iterators.
    nodejs#23916
* repl:
  * The multiline history feature is removed.
    nodejs#24804
* tls:
  * Added min/max protocol version options.
    nodejs#24405
  * The X.509 public key info now includes the RSA bit size and the
    elliptic curve. nodejs#24358
* url:
  * `pathToFileURL()` now supports LF, CR and TAB.
    nodejs#23720
* Windows:
  * Tools are not installed using Boxstarter anymore.
    nodejs#24677
  * The install-tools scripts or now included in the dist.
    nodejs#24233
* Added new collaborator:
  * [antsmartian](https://github.com/antsmartian) - Anto Aravinth.
    nodejs#24655

PR-URL: nodejs#24854
vsemozhetbyt added a commit that referenced this pull request Mar 8, 2019
PR-URL: #26472
Refs: #23916
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Mar 13, 2019
PR-URL: nodejs#26472
Refs: nodejs#23916
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BridgeAR pushed a commit that referenced this pull request Mar 14, 2019
PR-URL: #26472
Refs: #23916
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BethGriggs pushed a commit that referenced this pull request Apr 16, 2019
PR-URL: #26472
Refs: #23916
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
BethGriggs pushed a commit that referenced this pull request Apr 17, 2019
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
PR-URL: #23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>
BethGriggs pushed a commit that referenced this pull request Apr 28, 2019
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
PR-URL: #23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>
@BethGriggs BethGriggs mentioned this pull request May 1, 2019
MylesBorins pushed a commit that referenced this pull request May 16, 2019
Co-authored-by: Ivan Filenko <ivan.filenko@protonmail.com>
Fixes: #18603
Refs: #18904
PR-URL: #23916
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: Gus Caplan <me@gus.host>
const TOTAL_LINES = 18;

(async () => {
const readable = new Readable({ read() {} });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this now (to solve another bug) I think this backpressure behaviour is confusing since other consumers can listen to line on the stream and it's surprising we pause() it for them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. doc Issues and PRs related to the documentations. promises Issues and PRs related to ECMAScript promises. readline Issues and PRs related to the built-in readline module. semver-minor PRs that contain new features and should be released in the next minor version. tools Issues and PRs related to the tools directory.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants