Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bail out when chain detection is expensive + edge case fix #1173

Merged
merged 4 commits into from
Aug 2, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion src/rdf.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -587,7 +587,7 @@ describe("fromRdfJsDataset", () => {
);
});

it("can parse chained Blank Nodes with a single lnk that end in a dangling Blank Node", () => {
it("can parse chained Blank Nodes with a single link that end in a dangling Blank Node", () => {
const mockDataset: ImmutableDataset = {
type: "Dataset",
graphs: { default: {} },
Expand Down
6 changes: 4 additions & 2 deletions src/rdfjs.internal.ts
Original file line number Diff line number Diff line change
Expand Up @@ -502,9 +502,11 @@ function getCycleBlankNodes(
)
.map((quad) => quad.object as RdfJs.BlankNode);

// If no Blank Nodes are connected to `currentNode`, we're done:
// If no Blank Nodes are connected to `currentNode`, and `currentNode` is not
// part of a cycle, we're done; the currently traversed Nodes do not form a
// cycle:
if (blankNodeObjects.length === 0) {
return traversedBlankNodes;
return [];
}

// Store that we've traversed `currentNode`, then move on to all the Blank
Expand Down
149 changes: 149 additions & 0 deletions src/resource/solidDataset.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,155 @@ describe("responseToSolidDataset", () => {
});
});

it("does not include non-deterministic identifiers when it detects non-cyclic chains of Blank Nodes", async () => {
const turtle = `
@prefix : <#>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix vcard: <http://www.w3.org/2006/vcard/ns#>.
@prefix acl: <http://www.w3.org/ns/auth/acl#>.

<> a foaf:PersonalProfileDocument; foaf:maker :me; foaf:primaryTopic :me.

:me
a foaf:Person;
vcard:fn "Vincent";
acl:trustedApp
[
acl:mode acl:Append, acl:Control, acl:Read, acl:Write;
acl:origin <http://localhost:3000>
],
[
acl:mode acl:Append, acl:Control, acl:Read, acl:Write;
acl:origin <https://penny.vincenttunru.com>
].
`;

const response = new Response(turtle, {
headers: {
"Content-Type": "text/turtle",
},
});
jest
.spyOn(response, "url", "get")
.mockReturnValue("https://some.pod/resource");
const solidDataset = await responseToSolidDataset(response);

expect(solidDataset).toStrictEqual({
graphs: {
default: {
"https://some.pod/resource": {
predicates: {
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": {
namedNodes: [
"http://xmlns.com/foaf/0.1/PersonalProfileDocument",
],
},
"http://xmlns.com/foaf/0.1/maker": {
namedNodes: ["https://some.pod/resource#me"],
},
"http://xmlns.com/foaf/0.1/primaryTopic": {
namedNodes: ["https://some.pod/resource#me"],
},
},
type: "Subject",
url: "https://some.pod/resource",
},
"https://some.pod/resource#me": {
predicates: {
"http://www.w3.org/1999/02/22-rdf-syntax-ns#type": {
namedNodes: ["http://xmlns.com/foaf/0.1/Person"],
},
"http://www.w3.org/2006/vcard/ns#fn": {
literals: {
"http://www.w3.org/2001/XMLSchema#string": ["Vincent"],
},
},
"http://www.w3.org/ns/auth/acl#trustedApp": {
blankNodes: [
{
"http://www.w3.org/ns/auth/acl#mode": {
namedNodes: [
"http://www.w3.org/ns/auth/acl#Append",
"http://www.w3.org/ns/auth/acl#Control",
"http://www.w3.org/ns/auth/acl#Read",
"http://www.w3.org/ns/auth/acl#Write",
],
},
"http://www.w3.org/ns/auth/acl#origin": {
namedNodes: ["http://localhost:3000"],
},
},
{
"http://www.w3.org/ns/auth/acl#mode": {
namedNodes: [
"http://www.w3.org/ns/auth/acl#Append",
"http://www.w3.org/ns/auth/acl#Control",
"http://www.w3.org/ns/auth/acl#Read",
"http://www.w3.org/ns/auth/acl#Write",
],
},
"http://www.w3.org/ns/auth/acl#origin": {
namedNodes: ["https://penny.vincenttunru.com"],
},
},
],
},
},
type: "Subject",
url: "https://some.pod/resource#me",
},
},
},
internal_resourceInfo: {
contentType: "text/turtle",
isRawData: false,
linkedResources: {},
sourceIri: "https://some.pod/resource",
},
type: "Dataset",
});
});

it("does not attempt to detect chains when there are many Blank Nodes, to avoid performance bottlenecks", async () => {
function getChainedBlankNode(iteration: number): string {
if (iteration === 1000) {
return `<https://some.predicate/${iteration}> "Base case"`;
}
return `<https://some.predicate/${iteration}> [${getChainedBlankNode(
iteration + 1
)}]`;
}
const turtle = `
@prefix : <#>.
@prefix vcard: <http://www.w3.org/2006/vcard/ns#>.

:me vcard:fn [${getChainedBlankNode(0)}].
`;

const response = new Response(turtle, {
headers: {
"Content-Type": "text/turtle",
},
});
jest
.spyOn(response, "url", "get")
.mockReturnValue("https://some.pod/resource");

const t0 = performance.now();
const solidDataset = await responseToSolidDataset(response);
const t1 = performance.now();

// Parsing a document with over 1000 statements will always be somewhat slow
// (hence allowing it to take a second), but if it attempts to detect
// chains, it will take on the order of >20 seconds.
expect(t1 - t0).toBeLessThan(1000);
// Blank Nodes should be listed explicitly, rather than as properties on
// https://some.pod/resource#me:
expect(Object.keys(solidDataset.graphs.default)).not.toStrictEqual([
"https://some.pod/resource#me",
]);
});

it("throws a meaningful error when the server returned a 403", async () => {
const response = new Response("Not allowed", { status: 403 });
jest
Expand Down
12 changes: 10 additions & 2 deletions src/resource/solidDataset.ts
Original file line number Diff line number Diff line change
Expand Up @@ -233,11 +233,19 @@ export async function responseToSolidDataset(
solidDataset = addRdfJsQuadToDataset(solidDataset, quad);
}
});
parser.onComplete(() => {
parser.onComplete(async () => {
// If a Resource contains more than this number of Blank Nodes,
// we consider the detection of chains (O(n^2), I think) to be too
// expensive, and just incorporate them as regular Blank Nodes with
// non-deterministic, ad-hoc identifiers into the SolidDataset:
const maxBlankNodesToDetectChainsFor = 20;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a constant in the constants.ts file ? So far it's only RDF constants, but it feels weird to have a hard-coded limit defined so locally.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, since it's really only used locally, i.e. there's just a single mention of this variable - I mostly extracted it to be able to give it a name that explained its purpose, rather than for it to be a magic number. I'm afraid that pulling it out of its context will make it harder to understand what it does, and it's not meant to be re-used elsewhere in the codebase either.

// Some Blank Nodes only serve to use a set of Quads as the Object for a
// single Subject. Those Quads will be added to the SolidDataset when
// their Subject's Blank Node is encountered in the Object position.
const chainBlankNodes = getChainBlankNodes(quadsWithBlankNodes);
const chainBlankNodes =
quadsWithBlankNodes.length <= maxBlankNodesToDetectChainsFor
? getChainBlankNodes(quadsWithBlankNodes)
: [];
const quadsWithoutChainBlankNodeSubjects = quadsWithBlankNodes.filter(
(quad) =>
chainBlankNodes.every(
Expand Down