Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues with named fragment handling code #2619

Merged

Conversation

pcmanus
Copy link
Contributor

@pcmanus pcmanus commented Jun 7, 2023

This PR attempts to fix a number of remaining issues around the code to handle named fragments, namely at least:

  1. the assertion failure reported in v1.20.0: Unexpected query planner behavior with fragments within an interface (Cannot add fragment of condition "..." to parent type "...") router#3227.
  2. the assertion failure reported in v1.20.0: query planning of fragments yields Cannot add fragment of condition "..." to parent type "..."  router#3217 (which is the same assertion failing than in the previous point, but on a slightly different code path and due to a different issue).
  3. an issue whereby named fragments are reused in a rare situation where this is invalid, resulting in an invalid (in the sense of graphQL validations) subgraph fetch.

The 2 first point have repro on the linked tickets, but for the last point, the problem is that the graphQL validation for conflicting fields specified by FieldsInSetCanMerge traverses all fragments to look for conflicts, even "branches" that can be statically proven to be dead branches, but because the named fragment reuse code was taking those dead branch into account, it was sometimes using a named fragment that created a validation issue (in such a dead branch). See e1d1044 for an example of the problem with some explanations.

@pcmanus pcmanus requested a review from a team as a code owner June 7, 2023 14:59
@netlify
Copy link

netlify bot commented Jun 7, 2023

👷 Deploy request for apollo-federation-docs pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 28d9b92

@changeset-bot
Copy link

changeset-bot bot commented Jun 7, 2023

🦋 Changeset detected

Latest commit: 28d9b92

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages
Name Type
@apollo/query-planner Patch
@apollo/federation-internals Patch
@apollo/gateway Patch
@apollo/composition Patch
@apollo/query-graphs Patch
@apollo/subgraph Patch
apollo-federation-integration-testsuite Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@codesandbox-ci
Copy link

codesandbox-ci bot commented Jun 7, 2023

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

@pcmanus pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from f0d937e to a497b20 Compare June 7, 2023 15:05
// Returns a copy of this operation with the provided updated selection set. Optionally, a new set of fragments
// can be also provided. If `newFragments` is undefined, then the existing fragments will be reused, but if it
// `null`, then the new operation will have no fragments.
private withUpdatedSelectionSet(newSelectionSet: SelectionSet, newFragments?: NamedFragments | null): Operation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've done it a couple of times before, but I'm not totally sure how I feel about controlling the behavior of a function by passing null rather than undefined. It would be nice to make things more explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I've split it into 2 separately named methods.

* any unecessary top-level inline fragments, possibly multiple layers of them, but we never recurse
* inside the sub-selection of an selection that is not removed by the normalization.
*/
normalize(parentType: CompositeType, options?: { recursive? : boolean }): SelectionSet {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having named parameters for options, I would just make the whole function use named parameters and get rid of options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it for the sake of not being too contrarian, but I'll admit not entirely seeing how this isn't almost entirely a stylistic preference.

@@ -2806,7 +2788,6 @@ export class QueryPlanner {
// going to expand everything during the algorithm anyway. We'll re-optimize subgraph fetches with fragments
// later if possible (which is why we saved them above before expansion).
operation = operation.expandAllFragments();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to normalize at this point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not now that expandAllFragments always ensure it's output is normalized.

pcmanus added 7 commits June 9, 2023 15:53
The naming of `trimUnsatisfiableBranches` was misleading because it
does not just "remove unsatisfiable branches", it more generally remove
things that are not-very-optimal (even if they are not full branches).
So this rename this method to `normalize`, which more clearly explains
then intent (on top of being shorter).

Additionally, the method was not actually removing unsatisfiable
branches in all cases, which made it a bit error prone to use in
practice and led to some small inefficiencies in the
`NamedFragmentDefinition.selectionAtType` method. So fixed that,
making the method more regular and fixing the aforementioned
inefficiency.

This also force a normalisation after `expandAllFragments` because
we were relying on it in practice so this is less fragile this way.
The QP code never truly rely on operation validation as it is assumed
the operation passed to it have already been validated prior to being
passed to the planner, but having the added validation helps a bit in
other testing context when that pre-validation hasn't happened, and
since this is fairly cheap...
The code for normalizing fields was not taking into account that if we
normalize on a more precise type, this may make a field definition "more
precise", which can impacts the type we should use to normalize the
sub-selection of said field.
@pcmanus pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch 2 times, most recently from 9074cf6 to e8ede8a Compare June 9, 2023 14:47
@pcmanus pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from e8ede8a to b54033d Compare June 9, 2023 14:51
@pcmanus pcmanus requested a review from clenfest June 9, 2023 15:01
internals-js/src/__tests__/operations.test.ts Outdated Show resolved Hide resolved
internals-js/src/__tests__/operations.test.ts Outdated Show resolved Hide resolved
notCoveredByFragments = notCoveredByFragments.minus(fragment.expandedSelectionSetAtType(parentType));
for (const { fragment, atType} of filteredApplyingFragments) {
const notCovered = subSelection.minus(atType.selectionSet);
if (atType.trimmed && !selectionSetsDoMerge(notCovered, atType.trimmed)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this checks trimmed against the parts of the original sub-selection it hasn't already been checked against, presumably you also need to check the trimmed of each fragment against the trimmeds of other fragments you add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point. Added some testing for this case and fixed.

internals-js/src/operations.ts Outdated Show resolved Hide resolved
selectionSet = typeRuntimes.length === conditionRuntimes.length
? expandedSelectionSet
: expandedSelectionSet.filter((s) =>
s.kind !== 'FragmentSelection' || !s.element.typeCondition || typeRuntimes.some((t) => t.name == s.element.typeCondition?.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To check for intersection, shouldn't

typeRuntimes.some((t) => t.name == s.element.typeCondition?.name)

be

runtimeTypesIntersects(type, s.element.typeCondition)

instead? (Or at least, I don't think s.element.typeCondition is necessarily guaranteed to be an object type here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right; I brain-farted. I added test for this, and in the process realised the that I was using the filter method wrong as the one implemented was recursing inside sub-selection, which we don't want. Anyway, this was confusing and I actually renamed the existing filter into filterRecursiveDepthFirst to be extra clear and kept a simpler filter method which only apply to top-level (so this part of the code hasn't changed, but just explaining why there is some filter-related changes in the last commit).

));
return newSet.toSelectionSet(this.parentType);
}
}

return this.selectionSet === trimmedSelectionSet ? this : this.withUpdatedSelectionSet(trimmedSelectionSet);
return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm noticing that unlike FieldSelection.normalize() and InlineFragmentSelection.normalize() which update this.parentType to parentType, InlineFragmentSelection.normalize() doesn't seem to, e.g.:

  • The return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet); doesn't update the element's parentType.
  • The return this.withUpdatedSelectionSet(selectionSetOfElement( ... ) above doesn't update the element's parentType.
  • The return newSet.toSelectionSet(this.parentType) above uses this.parentType instead of parentType.
  • There's a (this.element.typeCondition ?? this.parentType).typenameField()! call above instead of (thisCondition ?? parentType).typenameField()!,

Also I noticed that when you lift selections out of the fragment in this function, the parentType isn't accordingly updated for those selections.

Is this fine/expected? (It might be that the field is only used in certain cases, or maybe some other code requires it to not be changed, but I'm not familiar enough with the code here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No,I was being sloppy.

I mean, it is probably mostly fine in the grand scheme of stuff because when we build a SelectionSet with SelectionSetUpdates (which SelectionSet.normalize does through lazyMap), then it ensures the elements have the proper parent type by calling rebasOn, so I believe the selections still ended up with the proper parent type ultimately.

Still, that was sloppy and would be unexpected at least out of the Selection.normlize call and could cause issue someday, so fixed it.

you lift selections out of the fragment in this function, the parentType isn't accordingly updated for those selections

As mentioned above, the fact that we passed the lifted selections to a SelectionSetUpdates took care of updating the parent type of the elements (of course, as you pointed out, the toSelectionSet call for that SelectionSetUpdates was incorrect, so did fixed that, but just calling out that there is nothing additional to do).

@pcmanus
Copy link
Contributor Author

pcmanus commented Jun 12, 2023

Note: the node 20 circleCI check on this PR seems to have failed a bunch of times in a row, yet the error seems CI related. Check the link for detail, but the error is:

npm ERR! code 2
npm ERR! path /home/circleci/project/node_modules/@parcel/watcher
npm ERR! command failed
npm ERR! command sh -c node-gyp-build
npm ERR! sh: 1: node-gyp-build: Text file busy

during npm package installing, and for a PR that don't even change any dependency, so it screams "unrelated to the PR". Yet, I haven't (yet) seen this on other PRs and as said, I got this here a few time in a row now, so wondering if anyone has a clue.

@pcmanus pcmanus requested a review from sachindshinde June 12, 2023 14:03
Copy link
Contributor

@sachindshinde sachindshinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the fixes! There's a few small things below, but other than that LGTM

}
}

return this.selectionSet === normalizedSelectionSet ? this : this.withUpdatedSelectionSet(normalizedSelectionSet);
return this.selectionSet === normalizedSelectionSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need to be

this.parentType === parentType && this.selectionSet === normalizedSelectionSet

instead of

this.selectionSet === normalizedSelectionSet

if the first branch of the ternary conditional is going to return this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch, thanks.

// have the same number of runtimes, then they have the same set of runtimes and no point doing any filtering.
const typeRuntimes = possibleRuntimeTypes(type);
const conditionRuntimes = possibleRuntimeTypes(this.typeCondition);
selectionSet = typeRuntimes.length === conditionRuntimes.length
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

selectionSet in the if block above will be rebased on type due to the SelectionSet.normalize() call, but it looks like we'd have to manually do that rebasing here. (Not sure if it really matters, since it looks like this function is only used by tryOptimizeSubselectionWithFragments().)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, and this actually made me realise that I was making this more complicated than it needed to be.

Because calling rebaseOn(type) on the if block above is actually usually incorrect. Typically, that if block was taken when, say, type was some interface I1 while this.typeCondition was another interface I2 (assuming the runtimes match what expect). But in general, you can't just rebase fields of I2 into I1 directly, even it happens that it "would" be alright from the POV of the underlying possible runtimes.

So really, the issue was that the canApplyAtType method, which was guarding calls to this expandedSelectionSetAtType should just not allow those cases where calling normalize(type) does not work. So fixed that and simplified accordingly, and I've validated that fragments are still reused as well as before.

@sachindshinde since the change is probably a bit bigger than you intended, I'll give you the chance to have a look at the last commit/check my logic before merging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give you the chance to have a look at the last commit/check my logic before merging.

Slightly changed my mind given circumstance: I optimistically merged, but do raise any concern if you have and we'll address.

@sachindshinde
Copy link
Contributor

Regarding the Node 20 issues, I couldn't immediately pinpoint a cause.

I tried to SSH into the box and look at the debug logs. Toward the end I saw:

4635 timing build:link Completed in 11ms
4636 info run @parcel/watcher@2.1.0 install node_modules/@parcel/watcher node-gyp-build
4637 info run dtrace-provider@0.8.8 install node_modules/dtrace-provider node-gyp rebuild || node suppress-error.js
4638 info run @parcel/watcher@2.1.0 install { code: 2, signal: null }
4639 timing reify:rollback:createSparse Completed in 1708ms
4640 timing reify:rollback:retireShallow Completed in 0ms
4641 timing command:ci Completed in 19138ms
4642 verbose stack Error: command failed
4642 verbose stack     at ChildProcess.<anonymous> (/home/circleci/.nvm/versions/node/v20.3.0/lib/node_modules/npm/node_modules/@npmcli/promise-spawn/lib/index.js:53:27)
4642 verbose stack     at ChildProcess.emit (node:events:511:28)
4642 verbose stack     at maybeClose (node:internal/child_process:1098:16)
4642 verbose stack     at Socket.<anonymous> (node:internal/child_process:456:11)
4642 verbose stack     at Socket.emit (node:events:511:28)
4642 verbose stack     at Pipe.<anonymous> (node:net:334:12)
4643 verbose pkgid @parcel/watcher@2.1.0
4644 verbose cwd /home/circleci/project
4645 verbose Linux 5.15.0-1030-aws
4646 verbose node v20.3.0
4647 verbose npm  v9.7.1
4648 error code 2
4649 error path /home/circleci/project/node_modules/@parcel/watcher
4650 error command failed
4651 error command sh -c node-gyp-build
4652 error sh: 1: node-gyp-build: Text file busy
4653 verbose exit 2
4654 timing npm Completed in 19317ms
4655 verbose unfinished npm timer reify 1686630640693
4656 verbose unfinished npm timer reify:build 1686630657224
4657 verbose unfinished npm timer build 1686630657226
4658 verbose unfinished npm timer build:deps 1686630657227
4659 verbose unfinished npm timer build:run:install 1686630657258
4660 verbose unfinished npm timer build:run:install:node_modules/@parcel/watcher 1686630657259
4661 verbose unfinished npm timer build:run:install:node_modules/dtrace-provider 1686630657309
4662 verbose code 2

From some googling, sh: 1: node-gyp-build: Text file busy seems to mean that something was trying to overwrite node-gyp-build, but it was in the middle of being executed.

In any case, I changed my volta config in my package.json to the above (node 20.3.0 and npm 9.7.1) and ran tests, and they passed. So it's probably fine to ignore the Node 20 CI error (hopefully it won't re-appear on main).

@abernix
Copy link
Member

abernix commented Jun 13, 2023

@pcmanus:

Note: the node 20 circleCI check on this PR seems to have failed a bunch of times in a row, yet the error seems CI related. Check the link for detail, but the error is:

Just triangulating a theme I'm seeing around Node.js v20.3 with the Text file busy error, a very similar occurrence of that error happened with something totally different in this PR from @jerelmiller on another repository: apollographql/apollo-utils#302.

I think that either Node.js v20.3 OR CircleCI's Node.js v20.3 image are potentially the culprit. I don't yet know if that's foreshadowing something bigger that will break in that version or just a bug. Hoping for the latter.

@pcmanus pcmanus force-pushed the no-fragment-reuse-if-create-invalid-op branch from 6740dba to 28d9b92 Compare June 13, 2023 14:20
@jerelmiller
Copy link
Member

@abernix interestingly I was able to get it working again using the Node 20.2 image: apollographql/apollo-utils#306 so seems its specific to the 20.3 image. I hope this is just a temp fix and as you said, hope its not a foreshadowing of something worse.

@pcmanus pcmanus merged commit 7f1ef73 into apollographql:main Jun 13, 2023
pcmanus added a commit to pcmanus/federation that referenced this pull request Jun 14, 2023
…ches

In apollographql#2619, code was added to avoid using a named fragment if some
"trimmed" parts of that fragment may have field conflicts with the rest
of the selection (or other trimmed part of other fragment at the same
level). But the logic added is actually not generic enough: because
field conflict validation traverses all fragments, it is not enough
to check a fragment against the selection at the "current" fragment,
we need to check againt every sub-selection that "merges" at the same
level, which may avoid some "sibling" branch of the current one.

To make this easier, the code adds some initial phase to the
`Selection.optimize` method (that try reusing named fragment) which
basically pre-compute the job of collecting the result of the 1st
step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()).
The code from apollographql#2619 is then adapted to use that, thus taking "all
branches" into account regardless of where the recursion of `optimize`
is at.
pcmanus added a commit to pcmanus/federation that referenced this pull request Jun 15, 2023
…ches

In apollographql#2619, code was added to avoid using a named fragment if some
"trimmed" parts of that fragment may have field conflicts with the rest
of the selection (or other trimmed part of other fragment at the same
level). But the logic added is actually not generic enough: because
field conflict validation traverses all fragments, it is not enough
to check a fragment against the selection at the "current" fragment,
we need to check againt every sub-selection that "merges" at the same
level, which may avoid some "sibling" branch of the current one.

To make this easier, the code adds some initial phase to the
`Selection.optimize` method (that try reusing named fragment) which
basically pre-compute the job of collecting the result of the 1st
step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()).
The code from apollographql#2619 is then adapted to use that, thus taking "all
branches" into account regardless of where the recursion of `optimize`
is at.
pcmanus pushed a commit that referenced this pull request Jun 15, 2023
#2627)

In #2619, code was added to avoid using a named fragment if some
"trimmed" parts of that fragment may have field conflicts with the rest
of the selection (or other trimmed part of other fragment at the same
level). But the logic added is actually not generic enough: because
field conflict validation traverses all fragments, it is not enough
to check a fragment against the selection at the "current" fragment,
we need to check againt every sub-selection that "merges" at the same
level, which may avoid some "sibling" branch of the current one.

To make this easier, the code adds some initial phase to the
`Selection.optimize` method (that try reusing named fragment) which
basically pre-compute the job of collecting the result of the 1st
step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()).
The code from #2619 is then adapted to use that, thus taking "all
branches" into account regardless of where the recursion of `optimize`
is at.
sachindshinde pushed a commit to sachindshinde/federation that referenced this pull request Jun 15, 2023
…ches

In apollographql#2619, code was added to avoid using a named fragment if some
"trimmed" parts of that fragment may have field conflicts with the rest
of the selection (or other trimmed part of other fragment at the same
level). But the logic added is actually not generic enough: because
field conflict validation traverses all fragments, it is not enough
to check a fragment against the selection at the "current" fragment,
we need to check againt every sub-selection that "merges" at the same
level, which may avoid some "sibling" branch of the current one.

To make this easier, the code adds some initial phase to the
`Selection.optimize` method (that try reusing named fragment) which
basically pre-compute the job of collecting the result of the 1st
step of [`FieldsInSetCanMerge`](https://spec.graphql.org/draft/#FieldsInSetCanMerge()).
The code from apollographql#2619 is then adapted to use that, thus taking "all
branches" into account regardless of where the recursion of `optimize`
is at.
pcmanus added a commit to pcmanus/federation that referenced this pull request Aug 18, 2023
…with conflicting fields

Trying to reuse fragments, if done without specific case, can in some context result in invalid
selections due to fields conflicting. It is to avoid that problem that apollographql#2619 introduced the
`FieldsConflictValidator` mechanism (later improved by apollographql#2635).

Unfortunately, in some fairly specific setups with nested fragments, the
code was missing some data in the validation mentioned above, which led
to still having case where a subgraph fetch may be invalid due to some
fields (within reusing fragments) conflicting at some point of the
query.

This commit fix that issue by ensuring we take everything we should into
account when doing the aforementioned validation.
pcmanus pushed a commit that referenced this pull request Aug 21, 2023
…with conflicting fields (#2740)

Trying to reuse fragments, if done without specific case, can in some context result in invalid
selections due to fields conflicting. It is to avoid that problem that #2619 introduced the
`FieldsConflictValidator` mechanism (later improved by #2635).

Unfortunately, in some fairly specific setups with nested fragments, the
code was missing some data in the validation mentioned above, which led
to still having case where a subgraph fetch may be invalid due to some
fields (within reusing fragments) conflicting at some point of the
query.

This commit fix that issue by ensuring we take everything we should into
account when doing the aforementioned validation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants