Add streamingProfile flag that is disabled by default
rubensworks committed Mar 23, 2020
1 parent 42c2f7e commit 410a8b6
Showing 7 changed files with 86 additions and 79 deletions.
44 changes: 25 additions & 19 deletions README.md
@@ -4,11 +4,14 @@
[![Coverage Status](https://coveralls.io/repos/github/rubensworks/jsonld-streaming-parser.js/badge.svg?branch=master)](https://coveralls.io/github/rubensworks/jsonld-streaming-parser.js?branch=master)
[![npm version](https://badge.fury.io/js/jsonld-streaming-parser.svg)](https://www.npmjs.com/package/jsonld-streaming-parser)

A fast and lightweight _streaming_ and 100% _spec-compliant_ [JSON-LD](https://json-ld.org/) parser,
A fast and lightweight _streaming_ and 100% _spec-compliant_ [JSON-LD 1.1](https://json-ld.org/) parser,
with [RDFJS](https://github.com/rdfjs/representation-task-force/) representations of RDF terms, quads and triples.

The streaming nature allows triples to be emitted _as soon as possible_, and documents _larger than memory_ to be parsed.

Make sure to enable the `streamingProfile` flag when parsing a JSON-LD document with a streaming profile
to exploit the streaming capabilities of this parser, as this is disabled by default.
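
As a minimal sketch of what "disabled by default" means here: the `ParsingContext` constructor in this commit coerces the option with `!!options.streamingProfile`, so leaving the option out resolves to `false`. The `StreamingOptionsSketch` interface and `resolveStreamingProfile` function below are hypothetical names used only for illustration; they are not part of the library.

```typescript
// Hypothetical stand-in for the parser options; it only models the flag discussed here.
interface StreamingOptionsSketch {
  streamingProfile?: boolean;
}

// Mirrors the `!!options.streamingProfile` coercion from the ParsingContext constructor:
// an omitted (undefined) flag resolves to false.
function resolveStreamingProfile(options: StreamingOptionsSketch): boolean {
  return !!options.streamingProfile;
}

const byDefault = resolveStreamingProfile({});
const optedIn = resolveStreamingProfile({ streamingProfile: true });
console.log(byDefault, optedIn);
```

Callers that want the streaming behavior therefore have to pass `streamingProfile: true` explicitly.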

## Installation

```bash
@@ -118,7 +121,7 @@ Optionally, the following parameters can be set in the `JsonLdParser` constructo
* `dataFactory`: A custom [RDFJS DataFactory](http://rdf.js.org/#datafactory-interface) to construct terms and triples. _(Default: `require('@rdfjs/data-model')`)_
* `context`: An optional root context to use while parsing. This can be anything that is accepted by [jsonld-context-parser](https://github.com/rubensworks/jsonld-context-parser.js), such as a URL, object or array. _(Default: `{}`)_
* `baseIRI`: An initial default base IRI. _(Default: `''`)_
* `allowOutOfOrderContext`: If @context definitions should be allowed as non-first object entries. When enabled, streaming results may not come as soon as possible, and will be buffered until the end when no context is defined at all. _(Default: `false`)_
* `streamingProfile`: If this parser can assume that parsed documents follow the streaming JSON-LD profile. If true, and a non-streaming document is detected, an error may be thrown. If false, non-streaming documents will be handled by preemptively buffering entries, which will lose many of the streaming benefits of this parser. _(Default: `false`)_
* `documentLoader`: A custom loader for fetching remote contexts. This can be set to anything that implements [`IDocumentLoader`](https://github.com/rubensworks/jsonld-context-parser.js/blob/master/lib/IDocumentLoader.ts) _(Default: [`FetchDocumentLoader`](https://github.com/rubensworks/jsonld-context-parser.js/blob/master/lib/FetchDocumentLoader.ts))_
* `produceGeneralizedRdf`: If blank node predicates should be allowed, they will be ignored otherwise. _(Default: `false`)_
* `processingMode`: The maximum JSON-LD version that should be processable by this parser. _(Default: `1.0`)_
@@ -134,7 +137,7 @@ new JsonLdParser({
dataFactory: require('@rdfjs/data-model'),
context: 'https://schema.org/',
baseIRI: 'http://example.org/',
allowOutOfOrderContext: false,
streamingProfile: true,
documentLoader: new FetchDocumentLoader(),
produceGeneralizedRdf: false,
processingMode: '1.0',
@@ -202,27 +205,30 @@ For example:
As such, JSON-LD documents that meet these requirements will be parsed very efficiently.
Other documents will still be parsed correctly as well, with a slightly lower efficiency.

## Specification Compliance
## Streaming Profile

By default, this parser is not 100% spec-compliant.
The main reason for this being the fact that this is a _streaming_ parser,
and some edge-cases are really inefficient with the streaming-nature of this parser.
This parser adheres to both the [JSON-LD 1.1](https://www.w3.org/TR/json-ld/) specification
and the [JSON-LD 1.1 Streaming specification](https://w3c.github.io/json-ld-streaming/).

However, by changing a couple of settings, it can easily be made **fully spec-compliant**.
The downside of this is that the whole document will essentially be loaded in memory before results are emitted,
which will void the main benefit of this parser.
By default, this parser assumes that JSON-LD documents
are *not* in the [streaming document form](https://w3c.github.io/json-ld-streaming/#streaming-document-form).
This means that the parser may buffer large parts of the document before quads are produced,
to make sure that the document is interpreted correctly.

```javascript
const mySpecCompliantParser = new JsonLdParser({
allowOutOfOrderContext: true,
validateValueIndexes: true,
});
```
Since this buffering negates the streaming benefits of this parser,
the `streamingProfile` flag *should* be enabled when a [streaming JSON-LD document](https://w3c.github.io/json-ld-streaming/#streaming-document-form)
is being parsed.

If non-streaming JSON-LD documents are encountered when the `streamingProfile` flag is enabled,
an error may be thrown.
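
The ordering requirement can be illustrated with a small, self-contained check. This is only a sketch under simplifying assumptions: `followsStreamingOrder` is a hypothetical helper (not part of this parser), it inspects a single object without recursing, and it treats every `@type` as if it defined a type-scoped context, which is stricter than the rule described in the option documentation.

```typescript
// Hypothetical helper, not part of jsonld-streaming-parser.
// Checks one object: `@context` must be the first entry, and `@type` must come
// directly after it (or be the first entry when there is no `@context`).
function followsStreamingOrder(node: Record<string, unknown>): boolean {
  const keys = Object.keys(node); // insertion order for non-numeric string keys
  const contextIndex = keys.indexOf('@context');
  const typeIndex = keys.indexOf('@type');

  // An `@context` anywhere but the first position is out of order.
  if (contextIndex > 0) {
    return false;
  }

  // Simplification: every `@type` is assumed to carry a type-scoped context.
  const expectedTypeIndex = contextIndex === 0 ? 1 : 0;
  if (typeIndex !== -1 && typeIndex !== expectedTypeIndex) {
    return false;
  }
  return true;
}

const ordered = followsStreamingOrder({ '@context': {}, '@type': 'Person', name: 'Alice' });
const unordered = followsStreamingOrder({ name: 'Alice', '@context': {} });
console.log(ordered, unordered);
```

A document that passes a check like this at every nesting level is a candidate for parsing with `streamingProfile` enabled; anything else is better left to the default buffering mode.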

## Specification compliance

Concretely, this parser implements the following [JSON-LD specifications](https://json-ld.org/test-suite/):
This parser implements the following [JSON-LD specifications](https://json-ld.org/test-suite/):

* JSON-LD 1.0 - Transform JSON-LD to RDF
* JSON-LD 1.0 - Error handling
* JSON-LD 1.1 - Transform JSON-LD to RDF
* JSON-LD 1.1 - Error handling
* JSON-LD 1.1 - Streaming Transform JSON-LD to RDF

## Performance

27 changes: 14 additions & 13 deletions lib/JsonLdParser.ts
@@ -46,11 +46,11 @@ export class JsonLdParser extends Transform {
private readonly util: Util;

private readonly jsonParser: any;
// Jobs that are not started yet that process a @context (only used if allowOutOfOrderContext is true)
// Jobs that are not started yet that process a @context (only used if streamingProfile is false)
private readonly contextJobs: (() => Promise<void>)[][];
// Jobs that are not started yet that process a @type (only used if allowOutOfOrderContext is true)
// Jobs that are not started yet that process a @type (only used if streamingProfile is false)
private readonly typeJobs: { job: () => Promise<void>, keys: string[] }[];
// Jobs that are not started yet because of a missing @context or @type (only used if allowOutOfOrderContext is true)
// Jobs that are not started yet because of a missing @context or @type (only used if streamingProfile is false)
private readonly contextAwaitingJobs: { job: () => Promise<void>, keys: string[] }[];

// The last depth that was processed.
@@ -344,7 +344,7 @@ export class JsonLdParser extends Transform {

if (!this.isParsingContextInner(depth)) { // Don't parse inner nodes inside @context
const valueJobCb = () => this.newOnValueJob(keys, value, depth, true);
if (this.parsingContext.allowOutOfOrderContext
if (!this.parsingContext.streamingProfile
&& !this.parsingContext.contextTree.getContext(keys.slice(0, -1))) {
// If an out-of-order context is allowed,
// we have to buffer everything.
@@ -370,7 +370,7 @@ export class JsonLdParser extends Transform {
}

// Execute all buffered jobs on deeper levels
if (this.parsingContext.allowOutOfOrderContext && depth === 0) {
if (!this.parsingContext.streamingProfile && depth === 0) {
this.lastOnValueJob = this.lastOnValueJob
.then(() => this.executeBufferedJobs());
}
@@ -466,16 +466,17 @@ export interface IJsonLdParserOptions {
*/
baseIRI?: string;
/**
* If @context definitions should be allowed as non-first object entries,
* and @type definitions not as next next entries.
* When enabled, streaming results may not come as soon as possible,
* and will be buffered until the end when no context/type is defined at all.
* Defaults to false.
* If this parser can assume that parsed documents follow the streaming JSON-LD profile.
* If true, and a non-streaming document is detected, an error may be thrown.
* If false, non-streaming documents will be handled by preemptively buffering entries,
* which will lose many of the streaming benefits of this parser.
*
* Spec-compliance: to be fully spec-compliant,
* this must be explicitly set to true.
* Concretely, if true, @context definitions must come as first object entries,
* followed by @type (if they define a type-scoped context).
*
* Defaults to false for spec-compliance.
*/
allowOutOfOrderContext?: boolean;
streamingProfile?: boolean;
/**
* Loader for remote contexts.
*/
4 changes: 2 additions & 2 deletions lib/ParsingContext.ts
@@ -24,7 +24,7 @@ export class ParsingContext {
};

public readonly contextParser: ContextParser;
public readonly allowOutOfOrderContext: boolean;
public readonly streamingProfile: boolean;
public readonly baseIRI?: string;
public readonly produceGeneralizedRdf: boolean;
public readonly allowSubjectList: boolean;
@@ -78,7 +78,7 @@ export class ParsingContext {
constructor(options: IParsingContextOptions) {
// Initialize settings
this.contextParser = new ContextParser({ documentLoader: options.documentLoader });
this.allowOutOfOrderContext = !!options.allowOutOfOrderContext;
this.streamingProfile = !!options.streamingProfile;
this.baseIRI = options.baseIRI;
this.produceGeneralizedRdf = !!options.produceGeneralizedRdf;
this.allowSubjectList = !!options.allowSubjectList;
6 changes: 3 additions & 3 deletions lib/entryhandler/keyword/EntryHandlerKeywordContext.ts
@@ -19,9 +19,9 @@ export class EntryHandlerKeywordContext extends EntryHandlerKeyword {
public async handle(parsingContext: ParsingContext, util: Util, key: any, keys: any[], value: any, depth: number)
: Promise<any> {
// Error if an out-of-order context was found when support is not enabled.
if (!parsingContext.allowOutOfOrderContext && parsingContext.processingStack[depth]) {
parsingContext.emitError(new Error('Found an out-of-order context, while support is not enabled.' +
'(enable with `allowOutOfOrderContext`)'));
if (parsingContext.streamingProfile && parsingContext.processingStack[depth]) {
parsingContext.emitError(new Error('Found an out-of-order context, while streaming is enabled. ' +
'(disable `streamingProfile`)'));
}

// Find the parent context to inherit from.
7 changes: 4 additions & 3 deletions lib/entryhandler/keyword/EntryHandlerKeywordType.ts
@@ -50,10 +50,11 @@ export class EntryHandlerKeywordType extends EntryHandlerKeyword {
// If at least one type-scoped context applies, set them in the tree.
if (hasTypedScopedContext) {
// Error if an out-of-order type-scoped context was found when support is not enabled.
if (!parsingContext.allowOutOfOrderContext
if (parsingContext.streamingProfile
&& (parsingContext.processingStack[depth] || parsingContext.idStack[depth])) {
parsingContext.emitError(new Error('Found an out-of-order type-scoped context, while support is not enabled.' +
'(enable with `allowOutOfOrderContext`)'));
parsingContext.emitError(
new Error('Found an out-of-order type-scoped context, while streaming is enabled. ' +
'(disable `streamingProfile`)'));
}

// Do not propagate by default
1 change: 0 additions & 1 deletion spec/parser.js
@@ -14,7 +14,6 @@ module.exports = {
return require('arrayify-stream')(require('streamify-string')(data)
.pipe(new JsonLdParser(Object.assign({
baseIRI,
allowOutOfOrderContext: true,
validateValueIndexes: true,
normalizeLanguageTags: true, // To simplify testing
}, options))));