(aws-appsync): resolver replacement issues #13269

Closed
MrArnoldPalmer opened this issue Feb 24, 2021 · 29 comments · Fixed by #23322
Labels
@aws-cdk/aws-appsync Related to AWS AppSync bug This issue is a bug. effort/medium Medium work item – several days of effort needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. p2

Comments

@MrArnoldPalmer
Contributor

MrArnoldPalmer commented Feb 24, 2021

There seem to be issues during AppSync deployment when resolvers for specific fields are replaced. If you change the unique ID of a resolver, the resulting CFN deployment removes the old resolver and creates the new one. Since only one resolver can exist on a field, the deployment fails with Only one resolver is allowed per field.

Alternatively, this can also result in 'detached' resolvers, which are not actually triggered on their corresponding queries.

This may require resolution on the CloudFormation or AppSync side, since the changes submitted by the CDK are valid and replacing existing resolvers is allowed, though it could obviously incur some downtime.

See: #13238 and #12635 for relevant details.
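
The failure mode described above can be illustrated with a small model. This is only a sketch under stated assumptions, not how CloudFormation or AppSync are actually implemented: it assumes that a replacement creates the new resource before deleting the old one, and that the API rejects a second resolver on the same type and field. `FakeAppSync`, `replaceCreateFirst`, and the logical IDs are hypothetical names.

```typescript
// Simplified model, not the real services. Assumptions: AppSync allows one
// resolver per "Type.field", and a replacement creates the new resource
// before the old one is deleted.
class FakeAppSync {
  // logical ID -> "Type.field" the resolver is attached to
  private attached: { [logicalId: string]: string } = {};

  createResolver(logicalId: string, typeName: string, fieldName: string): void {
    const key = typeName + "." + fieldName;
    const taken = Object.keys(this.attached).some((id) => this.attached[id] === key);
    if (taken) {
      throw new Error("Only one resolver is allowed per field.");
    }
    this.attached[logicalId] = key;
  }

  deleteResolver(logicalId: string): void {
    delete this.attached[logicalId];
  }
}

// Replace a resolver in create-first order; returns "ok" or the error message.
function replaceCreateFirst(
  api: FakeAppSync,
  oldId: string,
  newId: string,
  typeName: string,
  fieldName: string,
): string {
  try {
    api.createResolver(newId, typeName, fieldName); // new resource first
    api.deleteResolver(oldId); // old resource is cleaned up afterwards
    return "ok";
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

const fakeApi = new FakeAppSync();
fakeApi.createResolver("OldLogicalId", "Query", "listTeams");

// Same field, new logical ID: the create step collides with the old resolver.
const outcome = replaceCreateFirst(fakeApi, "OldLogicalId", "NewLogicalId", "Query", "listTeams");
console.log(outcome); // Only one resolver is allowed per field.
```

In this model the same change succeeds if the old resolver is deleted before the new one is created, which is why the ordering of the two operations matters.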

@bigkraig

This one is pretty serious because once your CF stack is in a bad state, you can't roll back CDK to a working deployment without manual intervention and an outage. I'm going to escalate it with my TAM, but is there a potential workaround to ensure that the unique IDs do not change?

@bigkraig

It looks like it's been a known issue for a few years: https://forums.aws.amazon.com/thread.jspa?messageID=884492

@bigkraig

This can be worked around as described in aws-amplify/amplify-cli#682. What I did was version my resolver typeName and now I can increment this if CDK needs to change the CF template structure.

@MrArnoldPalmer
Contributor Author

@bigkraig that's a great find! Could you post an example of the versioning? Perhaps this is something we could handle in the construct opaquely.

@bigkraig

Just to show my work, here is a selection of the diff. How we might work this into the construct is something else, as we need to modify the schema blob.

My original types were named Query & Mutation, and I appended a V2.

Schema change:

diff --git a/graphql/schema.graphql b/graphql/schema.graphql
index 3ecb9a2..6f79805 100644
--- a/graphql/schema.graphql
+++ b/graphql/schema.graphql
@@ -2966,13 +2966,13 @@ type XXXXX @aws_api_key {
 }
 
 schema {
-  query: Query
-  mutation: Mutation
+  query: QueryV2
+  mutation: MutationV2
 }
 
 scalar AWSJSON
 
-type Query {
+type QueryV2 {
@@ -3013,7 +3013,7 @@ type Query {
 }
 
-type Mutation {
+type MutationV2 {

I then also created some exports in the same path that contains the schema with:

export const QUERY_TYPE = 'QueryV2';
export const MUTATION_TYPE = 'MutationV2';

And the modification to my resolvers look like:

      .createResolver({ fieldName: 'YYYYYY', typeName: MUTATION_TYPE });
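
One way to read why this helps, sketched below under the assumption that resolvers are unique per type and field: bumping the type name gives every resolver a new key, so a newly created resolver no longer collides with the old one. `versionedType`, `resolverKey`, `SCHEMA_VERSION`, and the field name `createTeam` are hypothetical and only for illustration; the schema file still has to be renamed by hand to match.

```typescript
// Hypothetical helper: keep the schema version in one place and derive the
// root type names from it. Version 1 keeps the original, unsuffixed name.
const SCHEMA_VERSION = 2;

function versionedType(base: string, version: number): string {
  return version <= 1 ? base : base + "V" + version;
}

const QUERY_TYPE = versionedType("Query", SCHEMA_VERSION);
const MUTATION_TYPE = versionedType("Mutation", SCHEMA_VERSION);

// Assumed uniqueness key: one resolver per type and field.
function resolverKey(typeName: string, fieldName: string): string {
  return typeName + "." + fieldName;
}

// "createTeam" is a made-up field name.
const oldKey = resolverKey("Mutation", "createTeam");
const newKey = resolverKey(MUTATION_TYPE, "createTeam");

// Different keys mean the new resolver does not compete with the old one.
console.log(QUERY_TYPE, MUTATION_TYPE, oldKey !== newKey);
```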

@MrArnoldPalmer
Contributor Author

ahh I see. Yeah this exact strategy may not be possible to handle within the construct as changing the schema won't be easy unless the user is generating the schema via CDK and not bringing their own. Versioning in general seems like something we should play with though.

@alextriaca

I've raised a similar issue on the AppSync community (aws/aws-appsync-community#146). There seems to be some disagreement about what causes this. The only way I could reliably reproduce it was to deploy a broken schema followed by the correct schema. There is also an issue where you rename fields/resolvers and they become detached, but I couldn't reproduce that reliably. It seems to affect larger stacks more: a change to one resolver can cause others to become detached, and the larger the stack, the higher the chance of this happening.
Currently we are renaming our resolvers each time we deploy which has worked up to now. We've updated to the latest version of CDK and this workaround no longer seems to be working either.

@MrArnoldPalmer
Contributor Author

@alextriaca I'm not sure what would have caused your workaround to break in recent releases. I've had a hard time reliably reproducing the detached resolvers as well.

@mmccall10

Spent many hours debugging and working around the only one resolver is allowed per field CFN error and the detached resolver issue. Have not encountered these problems since we started creating the resolver from the api construct and not the datasource.

@Dachmian

Dachmian commented Jun 11, 2021

Is this being worked on actively by the Appsync/Cloudformation team?

The CFN error only one resolver is allowed per field has been reported on various forums and posts all the way back to 2018, and it seems like there is complete silence from the AppSync/CloudFormation team on why this is happening. The only things you can find are various people suggesting different workarounds, such as renaming resources.

Can we have an official statement of what is causing the issue and an ETA on a fix? OR if this is user error please explain the correct way to update/replace resolvers and datasources.

@MrArnoldPalmer MrArnoldPalmer removed their assignment Jun 21, 2021
@pszabop

pszabop commented Jun 28, 2021

I can reliably get the only one resolver is allowed per field error if I'm promoting a resolver from an existing built in (VTL) resolver to a Lambda resolver.

There doesn't seem to be a way of changing to a Lambda resolver short of changing the API. Changing the API because of a CFT/appsync bug seems like a terrible idea, my UI folks will be unhappy.

@alextriaca

Wicked find @pszabop this is awesome news. Reliably reproducing this issue has been seriously difficult so far. I really hope this helps AWS get to the bottom of this issue!

@mmccall10

@pszabop this is my experience too. I can reliably reproduce the error by first creating the resolver FROM the data source then attempting to change the data source type. Creating the resolver against the data source results in the inability to update it. You'll need to delete the resolver then re-create it. Does not go over well in production. In my experience to avoid this undesired behavior DO NOT create the resolver from the data source, create it from the appsync api.

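const graphqlApi = new GraphqlApi(...);
// The data source is added on the same API instance
const demoDS = graphqlApi.addDynamoDbDataSource(...);

// Do this: the resolver's scope is the API
graphqlApi.createResolver(...)

// Not this: the resolver's scope is the data source
demoDS.createResolver(...)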

https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-appsync.GraphqlApi.html#createwbrresolverpropsspan-classapi-icon-api-icon-experimental-titlethis-api-element-is-experimental-it-may-change-without-noticespan

https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_aws-appsync.CfnResolver.html

@kylevillegas93
Contributor

Hi!

The above solutions are not working for me - I need to add a caching configuration to the resolver and the only way to do that seems to be using CfnResolver. Changing the schema type names would be somewhat of a heavy lift. Do we have a path forward for this?

Thanks,
Kyle

@peterwoodworth peterwoodworth added the needs-cfn This issue is waiting on changes to CloudFormation before it can be addressed. label Oct 7, 2021
@revmischa
Contributor

This is the workaround I'm using for now:

  // creates a resolver for a datasource but with a static name so it's safe to move around
  // CDK gets mad ("Only one resolver is allowed per field") if you give a new resolver a different name for the same field
  // so this helps with that
  // https://github.com/aws/aws-cdk/issues/13269
  createResolver<TA, TR>({ fieldName, typeName, dataSource, ...rest }: CreateResolver<TA, TR>) {
    const nameBase = `${upperFirst(typeName)}${upperFirst(fieldName)}`

    // create resolver from datasource
    const resolver = dataSource.createResolver({
      typeName,
      fieldName,
      ...rest,
    })

    // graphQL resolver CFN - make name static
    const resolverCfn = resolver.node.defaultChild as CfnResolver
    resolverCfn.overrideLogicalId(`${nameBase}Resolver`) // don't rename if moved around

    return resolver
  }
}

This is a real problem though that makes AppSync and CDK extremely perilous to use.

@mmccall10

The CDK path changes as you move the construct around, which results in a new logical ID. "Updates in place" work if the logical ID stays the same. Overriding the logical ID is the only way to really get around this. My previous suggestion to attach the resolver to the api and not the datasource "solves" the issue because the path doesn't change. The AppSync GraphQL API construct names resolvers with ${typeName}${fieldName}Resolver.
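
As a rough sketch of that naming scheme, the function below derives an ID from the GraphQL type and field only, so it does not depend on where the construct sits in the tree. `stableResolverLogicalId` and `assertUniqueIds` are hypothetical helpers, not CDK APIs. The sanitizing step relies on CloudFormation logical IDs being alphanumeric; the field name `create_team` is made up.

```typescript
// Build an ID from the type and field alone, following the
// `${typeName}${fieldName}Resolver` pattern. Characters that are not
// alphanumeric (for example "_" in a field name) are dropped.
function stableResolverLogicalId(typeName: string, fieldName: string): string {
  const raw = typeName + fieldName + "Resolver";
  return raw.replace(/[^A-Za-z0-9]/g, "");
}

// Dropping characters can make two different fields collapse to the same ID
// (e.g. "a_b" and "ab"), so check for duplicates before using the IDs.
function assertUniqueIds(ids: string[]): void {
  ids.forEach((id, index) => {
    if (ids.indexOf(id) !== index) {
      throw new Error("Duplicate logical ID: " + id);
    }
  });
}

const ids = [
  stableResolverLogicalId("Query", "listTeams"),
  stableResolverLogicalId("Mutation", "create_team"),
];
assertUniqueIds(ids);
console.log(ids); // [ 'QuerylistTeamsResolver', 'MutationcreateteamResolver' ]
```

The resulting string is what would be passed to `overrideLogicalId` on the underlying `CfnResolver`.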

Another option is to do this dynamically. You can build an aspect and attach it to the api for example:
https://gist.github.com/mmccall10/c4070e5df1befd95b69e78d98005f099

import { CfnResolver } from '@aws-cdk/aws-appsync'
import { IConstruct } from '@aws-cdk/core'

interface IAspect {
  visit: (node: IConstruct) => void
}

class EnforceAppSyncResolverNaming implements IAspect {
  visit (node: IConstruct): void {
    if (node instanceof CfnResolver) node.overrideLogicalId(`${node.typeName}${node.fieldName}Resolver`)
  }
}

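export { EnforceAppSyncResolverNaming }

// Stack file: attaches the aspect to the API.
// GraphqlStackProps is the author's own props type, defined elsewhere.
import { App, Aspects, Stack } from '@aws-cdk/core'
import { AuthorizationType, GraphqlApi, Schema } from '@aws-cdk/aws-appsync'

export default class GraphqlApiStack extends Stack {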
  api: GraphqlApi

  constructor (scope: App, id: string, props: GraphqlStackProps) {
    super(scope, id, props)

    this.api = new GraphqlApi(this, 'GraphqlApi', {
      name: 'GraphqlApi',
      schema: Schema.fromAsset('graphql/schema.graphql'),
      authorizationConfig: {
        defaultAuthorization: {
          authorizationType: AuthorizationType.API_KEY
        }
      }
    })

    Aspects.of(this.api).add(new EnforceAppSyncResolverNaming())
  }
}

@p0wl

p0wl commented Jan 26, 2022

Hey, the suggested solutions from @mmccall10 and @revmischa only keep the logicalIds stable. They are not helpful if you encounter the issue in a production setup, since adding the Aspect will change the logicalId, which will result in the Only one resolver is allowed per field issue on deployment.

Did anyone find a solution that does not require manually detaching the resolvers before running CloudFormation with the new resolver names?

@endymion

endymion commented Apr 21, 2022

This can be a confusing land mine for newbies to step on. Here's an example from when this issue cost me many frustrated hours: https://repost.aws/questions/QUmiAqh543TR2MnU40ucFx1g/possible-to-override-default-graph-ql-model-resolvers-with-lambda-function-resolvers

Could I please suggest a more informative error message when this happens? That message is pithy and direct and meaningful and ... useless in this scenario. Especially for people who are not (yet?) AppSync experts.

It's not ideal that people need to understand this much mental model for how the nuts and bolts in the cloud affect their workflows. But it's even worse if the error is baffling. I have a high pain tolerance for this kind of thing but most people would have given up on this and used a less-ideal GraphQL schema as a workaround.

Just a usability suggestion from a customer. Thank you for a fantastic, game-changing product.

@jzybert

jzybert commented Jun 9, 2022

Has anyone come up with a better solution for this? I can't change to using this.api.addQuery(...) because I'm using a schema-first API which pre-defines the return types and inputs and things like that. And I can't use EnforceAppSyncResolverNaming because I'm already deployed to production, and adding this would break my CloudFormation deployment initially.

@MrArnoldPalmer
Contributor Author

For now, I think the best solution on the CDK side is to remove the dataSource.createResolver/createFunction methods to prevent logical ID changes that users aren't expecting. Going forward, the scope of resolvers and functions will be the GraphqlApi construct and never the DataSource, unless users explicitly set it themselves. For those wishing to migrate existing services without downtime, my suggestion would be to create a new GraphqlApi construct with the same schema, backed by the same data sources, deploy it alongside the existing one, migrate DNS (or clients) to point to the new service if possible, and then remove the old API.

MrArnoldPalmer added a commit that referenced this issue Dec 13, 2022
Fixes an issue where users couldn't control the ID's of appsync
resolvers and functions which would cause them to be replaced whenever
the data source of the resolver or function was replaced. For resolvers,
this could cause a deployment error as cloudformation would attempt to
create the new instance of the resolver before deleting the old one
throwing 'only one resolver per field' error.

Removing `dataSource.createFunction` and `dataSource.createResolver` in
favor of the same methods on the `GraphQlApi` construct makes it clear
that the parent scope of resolvers and functions is an API and not a
data source and therefore wont be replaced when changing data sources.

BREAKING CHANGE: removes `createFunction` and `createResolver` from
`IDataSource`.

Fixes: #13269
MrArnoldPalmer added a commit that referenced this issue Dec 13, 2022
Fixes an issue that would cause unexpected resource replacement for
appsync resolvers and functions because of construct nesting and ID
generation.

Changes `createResolver` and `createFunction` methods on `GraphQlApi`
and `DataSource` constructs to require explicitly passing an ID.
Additionally changes the scope of the constructs created in
`createResolver` and `createFunction` on the `DataSource` construct to
be `this.api` instead of `this`. This allows users to change the data
sources of resolvers and functions while keeping the IDs stable and
avoiding resource replacement.

This helps to avoid the `only one resolver per field` error that occurs
when deleting a resolver on a field, and adding a new one within the
same deployment.

BREAKING CHANGE: `DataSource.createResolver`,
`DataSource.createFunction`, and `GraphQlApi.createResolver` now require
2 arguments instead of 1.

Fixes: #13269
@mergify mergify bot closed this as completed in #23322 Dec 14, 2022
mergify bot pushed a commit that referenced this issue Dec 14, 2022
fix(appsync): unstable IDs on resolvers and functions

@MrArnoldPalmer
Contributor Author

Instead of removing these methods, we changed the parenting of the generated constructs to use the api instead of the data source and also require explicitly passing the ID so it cannot change unexpectedly from underneath you. If you're still running into this issue please let us know as I'd like to discuss additional solutions if needed.

@neilferreira
Contributor

@corymhall @MrArnoldPalmer For what it's worth, the fix in #23322 seemingly makes the upgrade path quite difficult, and it might need some TLC for users with existing stacks.

Say if you previously had this resolver:

lambdaDataSource.createResolver({
      typeName: 'Query',
      fieldName: 'listTeams',
    });

This now has to become:

lambdaDataSource.createResolver('listTeamsResolver', {
      typeName: 'Query',
      fieldName: 'listTeams',
    });

The diff for this now looks like:

Stack APIStack
Resources
[-] AWS::AppSync::Resolver ApiAppSyncORMLambdaDataSourceQuerylistTeamsResolver6F34919A destroy
[+] AWS::AppSync::Resolver Api/listTeamsResolver ApilistTeamsResolverD763779D

CloudFormation then processes the create statements first, which results in this error, as it is now trying to create listTeams twice.

Only one resolver is allowed per field. (Service: AWSAppSync; Status Code: 400; Error Code: BadRequestException

There is also seemingly no way to re-use the previously used ID.
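
A check along these lines could flag the problem before deploying. It is only a sketch: it compares two template objects (for instance the deployed template and the newly synthesized one, both assumed to be already parsed JSON) and reports fields whose `AWS::AppSync::Resolver` logical ID changed. `findReplacedResolvers` and the `Template` types are hypothetical; the logical IDs are the ones from the diff above.

```typescript
interface TemplateResource {
  Type: string;
  Properties?: { TypeName?: string; FieldName?: string };
}

interface Template {
  Resources: { [logicalId: string]: TemplateResource };
}

// Map "Type.field" to the logical ID of the resolver attached to it.
function resolverIdsByField(template: Template): { [field: string]: string } {
  const byField: { [field: string]: string } = {};
  Object.keys(template.Resources).forEach((logicalId) => {
    const resource = template.Resources[logicalId];
    if (resource.Type !== "AWS::AppSync::Resolver" || !resource.Properties) {
      return;
    }
    const key = resource.Properties.TypeName + "." + resource.Properties.FieldName;
    byField[key] = logicalId;
  });
  return byField;
}

// Fields that keep a resolver but under a different logical ID, i.e. the
// cases that would be deployed as a replacement.
function findReplacedResolvers(deployed: Template, next: Template): string[] {
  const before = resolverIdsByField(deployed);
  const after = resolverIdsByField(next);
  return Object.keys(after).filter(
    (field) => before[field] !== undefined && before[field] !== after[field],
  );
}

const deployedTemplate: Template = {
  Resources: {
    ApiAppSyncORMLambdaDataSourceQuerylistTeamsResolver6F34919A: {
      Type: "AWS::AppSync::Resolver",
      Properties: { TypeName: "Query", FieldName: "listTeams" },
    },
  },
};

const nextTemplate: Template = {
  Resources: {
    ApilistTeamsResolverD763779D: {
      Type: "AWS::AppSync::Resolver",
      Properties: { TypeName: "Query", FieldName: "listTeams" },
    },
  },
};

console.log(findReplacedResolvers(deployedTemplate, nextTemplate)); // [ 'Query.listTeams' ]
```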

@charlyB7

charlyB7 commented Dec 21, 2022

I encountered this exact behavior when trying to upgrade to v2.55.0 : I added IDs to all our resolvers and CloudFormation deployment failed with errors Only one resolver is allowed per field

I had to revert the CDK update as we have a stack in Production using the previous resolvers (which were created without an ID)

@MrArnoldPalmer
Contributor Author

You can work around this by specifying the same IDs that were being computed before the change (as you are doing @neilferreira) but instead of using DataSource.createResolver use the new Resolver(dataSource, 'listTeamsQuery', {}) specifying the same data source as the parent of the construct that was being used before.

You can also use node.overrideLogicalId similar to the aspects workaround above to override the logicalID to what was previously being generated to avoid replacement.

@charlyB7

charlyB7 commented Dec 22, 2022

It now works using the workaround :

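// Reuse the ID that was generated before the upgrade
const resolverId = `${type}${field}Resolver`;
// Keep the data source as the construct scope so the logical ID is unchanged
new Resolver(dataSource, resolverId, {
  api: api,
  dataSource: dataSource,
  typeName: type,
  fieldName: field,
});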

Thank you @MrArnoldPalmer :)

brennanho pushed a commit to brennanho/aws-cdk that referenced this issue Jan 20, 2023
brennanho pushed a commit to brennanho/aws-cdk that referenced this issue Feb 22, 2023
@phani-srikar

phani-srikar commented Feb 23, 2023

👋 we are experiencing this issue for our customer use-case which can be described as below:
We have several AppSync resolvers that are part of say "TestStack" that is already deployed successfully. Now our use-case is to move some resolvers out to a new stack say "CustomTestStack".
We currently do this as suggested in this thread, by keeping the logical IDs of the resolvers and attached pipeline functions the same in CustomTestStack (I have verified this from the generated CFN).

However, during the deployment it fails with only one resolver is allowed per field error.
The new stack also has a DependsOn relation to the original stack, so we would expect the resolvers to be deleted first before re-attaching them from the new stack.

I have attached the CFN logs below:
[Two screenshots of CloudFormation event logs were attached here; the second, pic2, shows the original Test stack.]

From the logs for original "Test" stack (pic2), we see that it's waiting in phase "update_complete_cleanup_in_progress" where I would assume the resolvers to be soft deleted and detached. Then it wouldn't cause the subsequent new stack to fail with the error mentioned above.

Please let me know if you need any other information.

@MrArnoldPalmer
Contributor Author

@phani-srikar the workarounds posted here don't work when moving resolvers to a new stack. A DependsOn relationship between stacks doesn't tell the service that some resources from one stack need to be deleted before resources from the other can be deployed. It just means that for one (CustomTestStack) to be deployed, the other (TestStack) must also be deployed, which it already is. So there is no way to declare a dependency on a resource from one stack being deleted before another is created.

For your case I would suggest deploying the changes to the original stack first, deleting the old resolvers, then deploying the new stack creating the new ones. I'm investigating alternatives with the service team in the meantime, but right now this isn't something that cloudformation supports.
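
The ordering argument can be sketched as a small check over deployment steps. This is a model, not CloudFormation behavior: it assumes a field can hold one resolver at a time and that, in a single deployment, the old stack only deletes its resolver during cleanup, after the new stack has tried to create its own. `DeployStep` and `firstConflict` are hypothetical names.

```typescript
interface DeployStep {
  stack: string;
  action: "create" | "delete";
  field: string; // "Type.field"
}

// Returns the first create step that targets a field which still has a
// resolver attached, or undefined if the ordering is safe.
function firstConflict(initiallyAttached: string[], steps: DeployStep[]): DeployStep | undefined {
  const attached = initiallyAttached.slice();
  for (const step of steps) {
    const index = attached.indexOf(step.field);
    if (step.action === "create") {
      if (index !== -1) {
        return step;
      }
      attached.push(step.field);
    } else if (index !== -1) {
      attached.splice(index, 1);
    }
  }
  return undefined;
}

const field = "Query.listTeams";

// Single deployment: the new stack creates before the old stack cleans up.
const singleDeploy: DeployStep[] = [
  { stack: "CustomTestStack", action: "create", field },
  { stack: "TestStack", action: "delete", field },
];

// Two deployments: remove from the old stack first, then create in the new one.
const twoPhase: DeployStep[] = [
  { stack: "TestStack", action: "delete", field },
  { stack: "CustomTestStack", action: "create", field },
];

console.log(firstConflict([field], singleDeploy)?.stack); // CustomTestStack
console.log(firstConflict([field], twoPhase)); // undefined
```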

@phani-srikar

Hi @MrArnoldPalmer. Thanks for your quick response and confirming the behavior.
To give you a little more information about how we do our deployments, both the TestStack and the new CustomTestStack are nested stacks with a common root stack which is updated first followed by the TestStack and CustomTestStack in that order but all this happens in a single deploy operation.
We have considered your suggestion to have separate iterative deployments, but that would introduce additional complexities like:

  • Iterative rollback in case of a failure needs to be also managed by us which essentially should be out-of-the-box managed by CFN instead.
  • Generating the iterative deployment steps is non-trivial as we need to account for any circular dependencies etc especially since we have multiple resolvers which re-use and share the pipeline functions.
  • Our customers run into issues like timeouts etc when using the iterative deployments in a CI environment since they take longer than normal deployments.

Given these drawbacks, it would be really useful if the service could detach resolvers once they are soft deleted and waiting for cleanup.
