AppSync detached resolvers #146
I'm not sure I follow how this can be expected behaviour. There are two issues I see here:
---
I agree with @alextriaca. This cannot be expected behavior as we have never manually attached the resolvers in the first place. The flow is something like this:
---
I've found a better example showing that this is a subtle bug rather than intended functionality. The resolvers below each have a timestamp embedded in their ID. Each time this stack is deployed it will create new resolvers, attach them to the API, and then delete the old ones.

Schema:

```graphql
type Query {
  ping1: String
  ping2: String
  ping3: String
}
```

API (in CDK):

```python
from aws_cdk import core, aws_appsync
from datetime import datetime


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)
        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )
        api.add_none_data_source(f"ping1_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping1",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong1")'),
        )
        api.add_none_data_source(f"ping2_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping2",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong2")'),
        )
        api.add_none_data_source(f"ping3_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping3",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong3")'),
        )
```

Where this breaks is if you then deploy a broken schema (as below): you will get a deploy error (resolver 2 will fail to attach), which triggers a rollback. After the rollback, none of the resolvers will be attached. This is definitely unintended.

```graphql
type Query {
  ping1: String
  ping2: String this_is_a_typo
  ping3: String
}
```

---
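The timestamped construct IDs above are worth isolating as a technique: an ID that changes on every synthesis makes CloudFormation treat the data source (and its resolver) as a brand-new logical resource each deploy, creating and attaching the new ones before deleting the old. A minimal Python sketch of the ID helper (the function name is mine, not a CDK API):

```python
from datetime import datetime


def timestamped_id(prefix: str) -> str:
    """Return a construct ID that changes on every synthesis.

    Embedding the current epoch second in the ID makes CloudFormation
    see a new logical resource each deploy, so new resolvers are created
    and attached before the old ones are cleaned up.
    """
    return f"{prefix}_{int(datetime.now().timestamp())}"
```

For example, `timestamped_id("ping1")` yields something like `ping1_1700000000`, changing on every deploy.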
I would love to get some attention on this. All of our resolvers were just detached during a deployment, and deploying again did not reattach them. I had to comment out the resolver section, deploy, uncomment it, and deploy again to get them reattached.

---
We identified a bug in the schema validation process, and are working on a fix so that it will not be possible to deploy an incorrect schema.

---
I don't think that will fix the root issue. I got this without deploying an invalid schema: I changed the name of some Lambdas and data sources, which resulted in all affected resolvers being detached.

---
I can also confirm I've seen the same behaviour as @ansman. The issue I had with reporting it was that I couldn't reliably reproduce it. It's exactly as @ansman says: when renaming a resolver/function/schema/field/something you can end up with a detached resolver. I've really struggled to reproduce this reliably, though. I do, however, think it's more closely related to how the resolvers fail to reattach than to an invalid schema causing this to occur.

---
If I understand correctly, I think there are two different root causes for resolvers being detached. Issue 1: the invalid schema update @alextriaca mentioned earlier in the thread. This should be fixed once we have the schema validation work ready. Issue 2: @ansman mentioned resolvers being detached due to a data source update. I think this is a different issue, with a different root cause from the one above. If that's correct, we can open a new GitHub issue for Issue 2 and discuss reproduction steps and related items there. Thanks.

---
@shaneyuan18 sounds like a plan. There is definitely something more subtle here on Issue 2, and a separate ticket to discuss it makes sense 👍

---
I think we've experienced a similar issue to Issue 2 in @shaneyuan18's comment above. Due to an update to CDK's AppSync libraries, our resolver logical resource IDs were changed. On deployment, this caused the creation of new resolvers, and the cleanup deletion of the old ones. I believe the new resolvers were created fine and attached to the fields, but when CloudFormation deleted the old resolvers, it also deleted the field->resolver mapping, leaving the field without an attached resolver. We haven't reproduced it yet, but if we do I'll post the results here, or in a new ticket if it's warranted.

---
Somewhat inspired by @Dave-McCraw's comment on a related issue, I have also added a workaround in our CDK code which computes an MD5 checksum of the schema and includes it as a comment in the resolver templates. Our CD pipeline only runs the CloudFormation step if the template has changed (a timestamp would force a run on every deploy and slow things down, when in reality we don't change the schema that often). I can't share the PR at the moment, but here's the implementation if it's useful:

```typescript
import * as crypto from "crypto";
import { join } from "path";
import * as appsync from "@aws-cdk/aws-appsync";

const gqlSchema = appsync.Schema.fromAsset(
  join(__dirname, "../shared/graphql/schema.graphql")
);
const gqlSchemaChecksum = crypto
  .createHash("md5")
  .update(gqlSchema.definition, "utf8")
  .digest("hex");

// Workaround for resolvers sometimes getting disconnected.
// See https://github.com/aws/aws-appsync-community/issues/146
const resolverBugWorkaround = (mappingTemplate: appsync.MappingTemplate) =>
  appsync.MappingTemplate.fromString(
    `## schema checksum : ${gqlSchemaChecksum}\n${mappingTemplate.renderTemplate()}`
  );

blahDataSource.createResolver({
  typeName: "Query",
  fieldName: "listBlah",
  responseMappingTemplate: resolverBugWorkaround(
    appsync.MappingTemplate.lambdaResult()
  ),
});
```

I only implemented this last night, but it seems to be working so far (it certainly reconnected all the resolvers when I deployed it to the test environment).

---
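The checksum trick above is independent of CDK: any deterministic hash of the schema text, prepended to the mapping template as a VTL comment, makes the template change exactly when the schema changes. A minimal Python sketch of the same idea (the function name is mine, not part of any AWS API):

```python
import hashlib


def with_schema_checksum(template: str, schema: str) -> str:
    """Prepend an MD5 checksum of the schema as a VTL comment.

    The resolver's mapping template then changes whenever the schema
    changes, forcing CloudFormation to update (and re-attach) the
    resolver, while leaving it untouched on schema-neutral deploys.
    """
    checksum = hashlib.md5(schema.encode("utf-8")).hexdigest()
    return f"## schema checksum : {checksum}\n{template}"


# Same schema -> identical template; changed schema -> changed template.
t1 = with_schema_checksum('$util.toJson("pong")', "type Query { ping: String }")
t2 = with_schema_checksum('$util.toJson("pong")', "type Query { ping: String }")
t3 = with_schema_checksum('$util.toJson("pong")', "type Query { pong: String }")
```

Unlike the timestamp approach, the checksum is stable across deploys that do not touch the schema, so it avoids needless resolver churn.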
I know this is not a productive comment, but the broader issue of resolvers becoming detached (irrespective of the cause) really makes AppSync unbearable on large schema deployments. Please can some time be invested at a CloudFormation or AppSync team level to resolve this. One would expect that on a valid CloudFormation template deployment, where you have a list of resolvers matching GraphQL types and fields, those resolvers are attached, and that on an invalid / failed deployment the state of the AppSync app returns to where it was before. I know there is complexity to solving this, but anything short of that is an unreliable product for a part of a tech stack that needs to be the most robust and failure-proof: your API. The fact that one can have successful API-layer deployments, with resolvers specified in the CloudFormation template, only to perform an API query and find out that a resolver is not attached to the AppSync API is a big problem. This is a topic that's been brought up on:
@ahmadizm or any other team members, can we bring more attention to this? I'm happy to invest my time working to help resolve this (whether it's our own use case examples or suggestions on how to resolve it) |
Unfortunately there are a lot of sharp edges with CloudFormation & AppSync, and they don't appear likely to be resolved anytime soon. We have renamed our Query/Mutation types to Query#/Mutation# and will increment # whenever we are likely to run into one of these problems. It causes a minor outage on the fields, but it's a better workaround than having an unknown number of detached resolvers. Some other related issues:

---
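The Query#/Mutation# renaming workaround above works because GraphQL lets the root operation types point at arbitrarily named object types via a schema definition, so bumping the type name forces every resolver under it to be recreated. A hedged sketch of what that looks like (type and field names illustrative, not from this thread):

```graphql
schema {
  query: Query2
  mutation: Mutation2
}

type Query2 {
  ping: String
}

type Mutation2 {
  doSomething: String
}
```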
Heya. AppSync team member here. I just wanted to check in on this issue. While the subject of this issue is still very much a sharp edge (i.e., the resolver just goes away, and your stack has no idea), we have worked to reduce some of the friction around managing resolvers. I wanted to check in with folks to understand what pain you're experiencing around this topic to focus our efforts. I'd appreciate it if you could post and re-articulate the friction or wishlist items here, or shoot me an email at pignataj at amazon.com. Thank you!

---
I just had the same issue. I accidentally deployed an invalid schema and my resolver got detached. Redeploying with a valid schema did not reattach the resolver. It happened on the test environment so it doesn't really matter right now, but it would have been a problem in the prod environment. (It also took me a while to pinpoint the issue because there was no error, just a null response.) I commented out my resolver, deployed, and commented it back in. I agree that we shouldn't have to do the above steps; the resolver should be reattached automatically.

---
@jpignata We have moved away from AppSync due to this and other performance related issues. |
Gotcha, @bigkraig. Thanks for letting me know. If you have any interest in chatting further, I'd love to hear more. @elishevarom Makes sense, and I agree. Looks like some folks have workarounds where they version resolvers and the schema, and increment each time. We can do better than this. |
I came across this issue. Is there any update? @twrichards , can I ask you if your workaround, i.e. inserting a hash value into response mapping templates as a comment, is still working? I thought that this would be the most feasible and simplest way to work around this issue at this point if it was still working. |
@mille-printemps Yes definitely still working. I'm not aware of whether the bug is fixed though. |
@twrichards , Thank you for the prompt reply. I will use this way to work around this issue. I hope that this issue is fixed soon... |
@jpignata Any update on this? We are facing the issue right now.. |
@jeremycare can you possibly shoot me an email? I’d like to look at your issue specifically with our team to see what’s up. pignataj at amazon dot com. |
My team just encountered this issue, or some variant of it. Three people spent hours pulling their hair out over it; the ultimate solution was to nuke everything underneath the GraphQL API itself (resolvers, data sources, and function configurations). We commented out all those resources, deployed via SAM, uncommented them, redeployed, and it worked like a charm.
We also ran into that yesterday and had to do the exact same thing. @jpignata it might be a regression on your side |
One workaround I've seen work here is to explicitly model dependencies between all resolvers and the AppSync Schema object (not the API) in CDK/CFN, as well as between all data sources and the Schema object. At least, that mitigation worked when I was encountering issues where updating my schema and some resolvers left the untouched resolvers detached. Based on the comments here about needing to explicitly rename all resolvers, I suspect something similar may be happening in your use-cases.
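In raw CloudFormation, the dependency-modelling workaround described above amounts to an explicit `DependsOn` from every resolver and data source to the schema resource, so CloudFormation orders schema updates before resolver updates. A sketch with hypothetical resource names (not taken from this thread):

```yaml
Resources:
  ApiSchema:
    Type: AWS::AppSync::GraphQLSchema
    Properties:
      ApiId: !GetAtt Api.ApiId
      Definition: |
        type Query {
          ping: String
        }

  PingDataSource:
    Type: AWS::AppSync::DataSource
    DependsOn: ApiSchema          # data source waits for the schema
    Properties:
      ApiId: !GetAtt Api.ApiId
      Name: PingDataSource
      Type: NONE

  PingResolver:
    Type: AWS::AppSync::Resolver
    DependsOn: ApiSchema          # resolver updates ordered after schema updates
    Properties:
      ApiId: !GetAtt Api.ApiId
      TypeName: Query
      FieldName: ping
      DataSourceName: !GetAtt PingDataSource.Name
```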
It is possible for resolvers to become detached when an incorrect schema is deployed, and correcting the schema does not reattach them. In addition, this issue can ripple out to affect resolvers on other fields.
Reproduction Steps
The most localised reproduction of this can be done with a single field schema and resolver. Given the following schema and API:
Schema:
API (In CDK):
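The schema and CDK snippets for this minimal example did not survive in this copy of the issue. Based on the surrounding description (a single `ping` field backed by a none data source returning `"pong"`), the schema was presumably along these lines (a reconstruction, not the original):

```graphql
type Query {
  ping: String
}
```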
If you then deploy this stack, it will deploy successfully, with a query of `ping` returning `pong`. If you then deploy the following schema, you will end up with a resolver that points to nothing:

If you then realise your mistake, correct the schema to add the `ping` field back, and redeploy, the schema will correct itself and the resolver will be present, but the resolver will not be attached and a query to `ping` will return `null`.

Reproduction Steps (a more sinister presentation)
We can extend the previous example to add three fields to our Query like so:
Schema:
API (In CDK):
If you then deploy a broken version of the schema:
This will deploy successfully; however, the resulting schema will be empty:
If you then correct this mistake and deploy the original schema, the schema will be corrected and all three resolvers will be deployed; however, all three will be detached.
This is a 🐛 Bug Report.