Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby): use lmdb.removeSync so getNode can't return deleted nodes #33554

Merged
merged 6 commits into from
Oct 20, 2021
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 38 additions & 13 deletions packages/gatsby/src/datastore/lmdb/lmdb-datastore.ts
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
import { RootDatabase, open } from "lmdb-store"
import { RootDatabase, open, ArrayLikeIterable } from "lmdb-store"
// import { performance } from "perf_hooks"
import { ActionsUnion, IGatsbyNode } from "../../redux/types"
import { ActionsUnion, IDeleteNodeAction, IGatsbyNode } from "../../redux/types"
import { updateNodes } from "./updates/nodes"
import { updateNodesByType } from "./updates/nodes-by-type"
import { IDataStore, ILmdbDatabases, IQueryResult } from "../types"
Expand All @@ -27,6 +27,8 @@ const lmdbDatastore = {
getNodesByType,
}

const preSyncDeletedNodeIdsCache = new Set()

function getDefaultDbPath(): string {
const dbFileName =
process.env.NODE_ENV === `test`
Expand Down Expand Up @@ -122,10 +124,8 @@ function iterateNodes(): GatsbyIterable<IGatsbyNode> {
return new GatsbyIterable(
nodesDb
.getKeys({ snapshot: false })
.map(
nodeId => (typeof nodeId === `string` ? getNode(nodeId) : undefined)!
)
.filter(Boolean)
.map(nodeId => (typeof nodeId === `string` ? getNode(nodeId) : undefined))
.filter(Boolean) as ArrayLikeIterable<IGatsbyNode>
)
}

Expand All @@ -134,13 +134,22 @@ function iterateNodesByType(type: string): GatsbyIterable<IGatsbyNode> {
return new GatsbyIterable(
nodesByType
.getValues(type)
.map(nodeId => getNode(nodeId)!)
.filter(Boolean)
.map(nodeId => {
if (preSyncDeletedNodeIdsCache.has(nodeId)) {
return undefined
}

return getNode(nodeId)
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant as getNode has the same check now.

Suggested change
.map(nodeId => {
if (preSyncDeletedNodeIdsCache.has(nodeId)) {
return undefined
}
return getNode(nodeId)
})
.map(nodeId =>getNode(nodeId)!)

.filter(Boolean) as ArrayLikeIterable<IGatsbyNode>
)
}

function getNode(id: string): IGatsbyNode | undefined {
if (!id) return undefined
if (!id || preSyncDeletedNodeIdsCache.has(id)) {
return undefined
}

const { nodes } = getDatabases()
return nodes.get(id)
}
Expand All @@ -151,9 +160,11 @@ function getTypes(): Array<string> {

function countNodes(typeName?: string): number {
if (!typeName) {
const stats = getDatabases().nodes.getStats()
// @ts-ignore
return Number(stats.entryCount || 0) // FIXME: add -1 when restoring shared structures key
const stats = getDatabases().nodes.getStats() as { entryCount: number }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

return Math.max(
Number(stats.entryCount) - preSyncDeletedNodeIdsCache.size,
0
) // FIXME: add -1 when restoring shared structures key
}

const { nodesByType } = getDatabases()
Expand Down Expand Up @@ -192,15 +203,29 @@ function updateDataStore(action: ActionsUnion): void {
break
}
case `CREATE_NODE`:
case `DELETE_NODE`:
case `ADD_FIELD_TO_NODE`:
case `ADD_CHILD_NODE_TO_PARENT_NODE`:
case `DELETE_NODE`:
case `MATERIALIZE_PAGE_MODE`: {
const dbs = getDatabases()
lastOperationPromise = Promise.all([
updateNodes(dbs.nodes, action),
updateNodesByType(dbs.nodesByType, action),
])

// if create is used in the same transaction as delete we should remove it from cache
if (action.type === `CREATE_NODE`) {
preSyncDeletedNodeIdsCache.delete(action.payload.id)
}

if (action.type === `DELETE_NODE`) {
preSyncDeletedNodeIdsCache.add(
((action as IDeleteNodeAction).payload as IGatsbyNode).id
Copy link
Contributor

@pieh pieh Oct 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://app.circleci.com/pipelines/github/gatsbyjs/gatsby/76288/workflows/e843318b-e1c5-4cdf-b155-b82bf562e44e/jobs/898140 shows cases where DELETE_NODE payload might be falsy:

error Cannot read property 'id' of undefined


  TypeError: Cannot read property 'id' of undefined
  
  - lmdb-datastore.ts:218 updateDataStore
    [artifacts]/[gatsby]/src/datastore/lmdb/lmdb-datastore.ts:218:66
  
  - lmdb-datastore.ts:259 callback
    [artifacts]/[gatsby]/src/datastore/lmdb/lmdb-datastore.ts:259:7
  
  - mett.ts:50 forEach
    [artifacts]/[gatsby]/src/utils/mett.ts:50:11
  
  - Set.forEach
  
  - mett.ts:49 Object.emit
    [artifacts]/[gatsby]/src/utils/mett.ts:49:17
  
  - index.ts:153 
    [artifacts]/[gatsby]/src/redux/index.ts:153:11

so we should check if payload is truthy before reaching to id. We do checks like that in other places handling DELETE_NODE action - for example in

case `DELETE_NODE`: {
if (action.payload) {
return nodesDb.remove(action.payload.id)
}
return false
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also casting payload as IGatsbyNode pretty much removes information that this might be falsy as our action type does mention it:

export interface IDeleteNodeAction {
type: `DELETE_NODE`
// FIXME: figure out why payload can be undefined here
payload: IGatsbyNode | void
}

I really think we should avoid casts like this. In fact none of casts in this line are needed because after checking for type action already become that specific type in closure below

)
lastOperationPromise.then(() => {
preSyncDeletedNodeIdsCache.clear()
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can lead to a race condition when some DELETE_NODE is called while the transaction is being submitted by lmdb in another thread. Ideally, if clear is pending and DELETE_NODE happens we should cancel the previous clear attempt.

Pretty edge case, but it will be very hard to debug if we actually hit it.

}
}
}
}
Expand Down
1 change: 1 addition & 0 deletions packages/gatsby/src/datastore/lmdb/updates/nodes.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ export function updateNodes(
if (action.payload) {
return nodesDb.remove(action.payload.id)
}

return false
}
case `MATERIALIZE_PAGE_MODE`: {
Expand Down