Parser
This is the workhorse of the package. It is a Transform stream, which consumes text and produces a stream of data items corresponding to high-level tokens. It is always the first in a pipe chain, directly fed with text from a file, a socket, the standard input, or any other text stream.
Its Writable part operates in a buffer/text mode, while its Readable part operates in objectMode.
(Since 1.2.0) Parser assumes that the input is well-formed; otherwise it produces simple errors. If you want to troubleshoot a file and pinpoint a problem, use Verifier.
(Since 1.6.0) If you deal with JSONL and you know that individual items will fit in memory, take a look at jsonl/Parser. This module can be faster than Parser.
A simple example (streaming from a file):
const Parser = require('stream-json/Parser');
const parser = new Parser();
const fs = require('fs');
let objectCounter = 0;
parser.on('data', data => data.name === 'startObject' && ++objectCounter);
parser.on('end', () => console.log(`Found ${objectCounter} objects.`));
fs.createReadStream('sample.json').pipe(parser);
An alternative example:
const {parser} = require('stream-json/Parser');
const fs = require('fs');
const pipeline = fs.createReadStream('sample.json').pipe(parser());
let objectCounter = 0;
pipeline.on('data', data => data.name === 'startObject' && ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
The module returns the constructor of Parser. Being a stream, Parser doesn't have any special interfaces. The only thing required is to configure it during construction.
Parser produces a rigid stream of tokens whose order is strictly defined. It is impossible to get an item out of sequence. All data items (strings, numbers, even object keys) are streamed in chunks, and they can potentially be of any size: gigabytes, terabytes, and so on.
In many real cases, while files are huge, individual data items can fit in memory. It is better to work with them as a whole, so they can be inspected. In that case, Parser can optionally pack items efficiently.
The details of the stream of tokens are described later.
options is an optional object described in detail in Node.js' Stream documentation. Additionally, the following custom flags are recognized; each can be truthy or falsy:
- jsonStreaming controls the parsing algorithm. If truthy, a stream of JSON objects is parsed as described in JSON Streaming as "Concatenated JSON". Technically it will recognize "Line delimited JSON" AKA "JSON Lines" AKA JSONL as well. Otherwise, it will follow the JSON standard assuming a singular value. The default: false.
  - It allows streaming any number of values one after another.
  - It handles empty streams producing no values.
  - (Since 1.6.0) If you deal with JSONL, you may want to use jsonl/Parser to improve the performance.
- Packing options control packing values. They have no default values.
  - packValues serves as the initial value for packing strings, numbers, and keys.
  - packKeys specifies if we need to pack keys and send them as a value.
  - packStrings specifies if we need to pack strings and send them as a value.
  - packNumbers specifies if we need to pack numbers and send them as a value.
  - More details in the section below.
- Streaming options control sending unpacked values. They have no default values.
  - streamValues serves as the initial value for the other three options.
  - streamKeys specifies if we need to send items related to unpacked keys.
  - streamStrings specifies if we need to send items related to unpacked strings.
  - streamNumbers specifies if we need to send items related to unpacked numbers.
  - More details in the section below.
By default, Parser follows the strict JSON format, streams all values in chunks, and sends individual (packed) values as well.
This is the list of data objects produced by Parser in the correct order:
// a sequence can have 0 or more items
// a value is one of: object, array, string, number, null, true, false
// a parser produces a sequence of values
// object
{name: 'startObject'};
// sequence of object properties: key, then value
{name: 'endObject'};
// array
{name: 'startArray'};
// sequence of values
{name: 'endArray'};
// key
{name: 'startKey'};
// sequence of string chunks:
{name: 'stringChunk', value: 'string value chunk'};
{name: 'endKey'};
// when packing:
{name: 'keyValue', value: 'key value'};
// string
{name: 'startString'};
// sequence of string chunks:
{name: 'stringChunk', value: 'string value chunk'};
{name: 'endString'};
// when packing:
{name: 'stringValue', value: 'string value'};
// number
{name: 'startNumber'};
// sequence of number chunks (as strings):
{name: 'numberChunk', value: 'string value chunk'};
{name: 'endNumber'};
// when packing:
{name: 'numberValue', value: 'string value'};
// null, true, false
{name: 'nullValue', value: null};
{name: 'trueValue', value: true};
{name: 'falseValue', value: false};
All value chunks (stringChunk and numberChunk) should be concatenated in order to produce a final value. Empty string values may have no chunks. String chunks may have empty values.
Important: values of numberChunk and numberValue are strings, not numbers. It is up to downstream code to convert them to numbers using parseInt(x), parseFloat(x), or simply x => +x.
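To make the chunk rules concrete, here is a minimal sketch. assembleValues is a hypothetical helper, not part of stream-json; it processes an already-collected array of tokens for simplicity:

```javascript
// Reassemble whole values from a token sequence: concatenate chunks
// between the matching startXXX/endXXX tokens, and convert numbers,
// which arrive as strings, with a unary plus. Keys would be handled
// the same way as strings.
const assembleValues = tokens => {
  const values = [];
  let buffer = null; // accumulates chunks between startXXX and endXXX
  for (const token of tokens) {
    switch (token.name) {
      case 'startString':
      case 'startNumber':
        buffer = '';
        break;
      case 'stringChunk':
      case 'numberChunk':
        if (buffer !== null) buffer += token.value;
        break;
      case 'endString':
        values.push(buffer);
        buffer = null;
        break;
      case 'endNumber':
        values.push(+buffer); // numberChunk values are strings
        buffer = null;
        break;
    }
  }
  return values;
};

console.log(assembleValues([
  {name: 'startNumber'},
  {name: 'numberChunk', value: '12'},
  {name: 'numberChunk', value: '3.5'},
  {name: 'endNumber'}
])); // → [ 123.5 ]
```

This is essentially what the packing options do for you internally, so in practice you rarely need to write such code by hand.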
All items follow in the correct order. If something goes wrong, the parser will produce an error event. For example:
- All startXXX are balanced with endXXX.
- Between startKey and endKey there can be zero or more stringChunk items. No other items can be seen.
- After startObject, optional key-value pairs are emitted in a strict pattern: a key-related item, then a value, and this cycle continues until all key-value pairs are streamed.
  - It is not possible for a key to be missing a value.
- All endObject are balanced with the corresponding startObject.
- endObject cannot close startArray.
- Between startString and endString there can be zero or more stringChunk items, but no other items.
- endKey can optionally be followed by keyValue; then a new value is started, but never endObject.
In short, the item sequence is always correctly formed, so there is no need for extra checks.
Parser packs keys, strings, and numbers separately. A frequent case is when keys and numbers are known to fit in memory, but strings are not.
Internally each type of value is controlled by a flag:
- By default, this flag is true.
- If packValues is set, it is assigned to each flag.
- If an individual option is set, it is assigned to the flag.
Examples:
Supplied options | packKeys | packStrings | packNumbers
---|---|---|---
{} | true | true | true
{packValues: false} | false | false | false
{packValues: false, packKeys: true} | true | false | false
{packKeys: true, packValues: false} | true | false | false
{packStrings: false} | true | false | true
{packKeys: true, packStrings: false, packNumbers: true} | true | false | true
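The resolution rules above can be sketched in plain JavaScript. This is an illustration of the documented rules with a made-up helper name, not stream-json's actual source:

```javascript
// Resolve the effective packing flags from a Parser options object:
// an individual packXXX option wins, then packValues, then the
// default of true.
const resolvePackFlags = (options = {}) => {
  const resolve = name => {
    if (name in options) return !!options[name];
    if ('packValues' in options) return !!options.packValues;
    return true;
  };
  return {
    packKeys: resolve('packKeys'),
    packStrings: resolve('packStrings'),
    packNumbers: resolve('packNumbers')
  };
};

console.log(resolvePackFlags({packStrings: false}));
// → { packKeys: true, packStrings: false, packNumbers: true }
```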
Parser can optionally skip streaming keys, strings, and/or numbers for optimization purposes, if a corresponding packing option is enabled. It means that only three configurations are supported for values and keys:
- The default: startXXX, 0 or more stringChunk (numberChunk for numbers), endXXX, xxxValue.
- packXXX is false: startXXX, 0 or more stringChunk (numberChunk for numbers), endXXX.
- packXXX is true, streamXXX is false: xxxValue.
Internally each type of value is controlled by a flag:
- By default, this flag is true.
- If streamValues is set, it is assigned to each flag.
- If an individual option is set, it is assigned to the flag.
- If a corresponding packing option is false, it is set to true.
Examples:
Supplied options | streamKeys | streamStrings | streamNumbers
---|---|---|---
{} | true | true | true
{packValues: true, streamValues: false} | false | false | false
{packKeys: true, streamKeys: false} | false | true | true
{packKeys: false, streamKeys: false} | true | true | true
{packValues: true, streamValues: false, streamKeys: true} | true | false | false
{streamStrings: false} | true | false | true
{packKeys: true, streamKeys: false, streamStrings: false, streamNumbers: true} | false | false | true
make() and parser() are two aliases of the factory function. It takes the options described above and returns a new instance of Parser. parser() helps to reduce boilerplate when creating data processing pipelines:
const {chain} = require('stream-chain');
const {parser} = require('stream-json/Parser');
const fs = require('fs');
const pipeline = chain([
fs.createReadStream('sample.json'),
parser()
]);
let objectCounter = 0;
pipeline.on('data', data => data.name === 'startObject' && ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
The Constructor property of make() (and parser()) is set to Parser. It can be used for indirect creation of parsers or for metaprogramming if needed.