AppSync detached resolvers #146
I'm not sure I follow how this can be expected behaviour. There are two issues I see here:
---
I agree with @alextriaca. This cannot be expected behavior as we have never manually attached the resolvers in the first place. The flow is something like this:
---
I've found a better example showing that this is a subtle bug rather than intended functionality. The resolvers below each have a timestamp embedded in their ID. Each time this stack is deployed it will create new resolvers, attach them to the API, and then delete the old ones.

Schema:

```graphql
type Query {
  ping1: String
  ping2: String
  ping3: String
}
```

API (in CDK):

```python
from aws_cdk import core, aws_appsync
from datetime import datetime


class GraphqlTestStack(core.Stack):
    def __init__(self, scope: core.Construct, **kwargs) -> None:
        super().__init__(scope, "GraphqlTestStack", **kwargs)
        api = aws_appsync.GraphqlApi(
            self,
            "test_api-",
            name="test_api",
            schema=aws_appsync.Schema.from_asset("resources/schema.graphql"),
        )
        api.add_none_data_source(f"ping1_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping1",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong1")'),
        )
        api.add_none_data_source(f"ping2_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping2",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong2")'),
        )
        api.add_none_data_source(f"ping3_{int(datetime.now().timestamp())}").create_resolver(
            type_name="Query",
            field_name="ping3",
            request_mapping_template=aws_appsync.MappingTemplate.from_string('{"version": "2018-05-29"}'),
            response_mapping_template=aws_appsync.MappingTemplate.from_string('$util.toJson("pong3")'),
        )
```

Where this breaks is if you then deploy a broken schema (as below): you will get a deploy error (resolver 2 will fail to attach), which triggers a rollback. After the rollback, none of the resolvers will be attached. This is definitely unintended.

```graphql
type Query {
  ping1: String
  ping2: String this_is_a_typo
  ping3: String
}
```

---
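The timestamped construct IDs above are worth isolating as a technique: an ID that changes on every synthesis makes CloudFormation treat the data source (and its resolver) as a brand-new logical resource each deploy, creating and attaching the new ones before deleting the old. A minimal Python sketch of the ID helper (the function name is mine, not a CDK API):

```python
from datetime import datetime


def timestamped_id(prefix: str) -> str:
    """Return a construct ID that changes on every synthesis.

    Embedding the current epoch second in the ID makes CloudFormation
    see a new logical resource each deploy, so new resolvers are created
    and attached before the old ones are cleaned up.
    """
    return f"{prefix}_{int(datetime.now().timestamp())}"
```

For example, `timestamped_id("ping1")` yields something like `ping1_1700000000`, changing on every deploy.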
I would love to get some attention on this. All of our resolvers were just detached during a deployment, and deploying again did not reattach them. I had to comment out the resolver section, deploy, uncomment it, and deploy again to get them reattached.

---
We identified a bug in the schema validation process, and are working on a fix so that it will not be possible to deploy an incorrect schema.

---
I don't think that will fix the root issue. I got this without deploying an invalid schema: I changed the name of some Lambdas and data sources, which resulted in all affected resolvers being detached.

---
I can also confirm I've seen the same behaviour as @ansman. The issue I had with reporting it was that I couldn't reliably reproduce it. It's exactly as @ansman says: when renaming a resolver/function/schema/field/something you can end up with a detached resolver. I've really struggled to reproduce this reliably, though. I do, however, think it's more closely related to how the resolvers fail to reattach than to an invalid schema causing this to occur.

---
If I understand correctly, I think there are two different root causes for resolvers being detached. Issue 1: the invalid schema update @alextriaca mentioned earlier in the thread. This should be fixed once we have the schema validation work ready. Issue 2: @ansman mentioned resolvers being detached due to a data source update. I think this is a different issue, with a different root cause from the one above. If that's correct, we can open a new GitHub issue for Issue 2 and discuss reproduction steps and related items there. Thanks.

---
@shaneyuan18 sounds like a plan. There is definitely something more subtle here on Issue 2, and a separate ticket to discuss it makes sense 👍

---
I think we've experienced a similar issue to Issue 2 in @shaneyuan18's comment above. Due to an update to CDK's AppSync libraries, our resolver logical resource IDs were changed. On deployment, this caused the creation of new resolvers, and the cleanup deletion of the old ones. I believe the new resolvers were created fine and attached to the fields, but when CloudFormation deleted the old resolvers, it also deleted the field->resolver mapping, leaving the field without an attached resolver. We haven't reproduced it yet, but if we do I'll post the results here, or in a new ticket if it's warranted.

---
Somewhat inspired by @Dave-McCraw's comment on a related issue, I have also added a workaround in our CDK code which computes an MD5 checksum of the schema and includes it as a comment in the resolver templates. Our CD pipeline only runs the CloudFormation step if the template has changed (a timestamp would force a run on every deploy and slow things down, when in reality we don't change the schema that often). I can't share the PR at the moment, but here's the implementation if it's useful:

```typescript
import * as crypto from "crypto";
import { join } from "path";
import * as appsync from "@aws-cdk/aws-appsync";

const gqlSchema = appsync.Schema.fromAsset(
  join(__dirname, "../shared/graphql/schema.graphql")
);
const gqlSchemaChecksum = crypto
  .createHash("md5")
  .update(gqlSchema.definition, "utf8")
  .digest("hex");

// Workaround for resolvers sometimes getting disconnected.
// See https://github.com/aws/aws-appsync-community/issues/146
const resolverBugWorkaround = (mappingTemplate: appsync.MappingTemplate) =>
  appsync.MappingTemplate.fromString(
    `## schema checksum : ${gqlSchemaChecksum}\n${mappingTemplate.renderTemplate()}`
  );

blahDataSource.createResolver({
  typeName: "Query",
  fieldName: "listBlah",
  responseMappingTemplate: resolverBugWorkaround(
    appsync.MappingTemplate.lambdaResult()
  ),
});
```

I only implemented this last night, but it seems to be working so far (it certainly reconnected all the resolvers when I deployed it to the test environment).

---
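The checksum trick above is independent of CDK: any deterministic hash of the schema text, prepended to the mapping template as a VTL comment, makes the template change exactly when the schema changes. A minimal Python sketch of the same idea (the function name is mine, not part of any AWS API):

```python
import hashlib


def with_schema_checksum(template: str, schema: str) -> str:
    """Prepend an MD5 checksum of the schema as a VTL comment.

    The resolver's mapping template then changes whenever the schema
    changes, forcing CloudFormation to update (and re-attach) the
    resolver, while leaving it untouched on schema-neutral deploys.
    """
    checksum = hashlib.md5(schema.encode("utf-8")).hexdigest()
    return f"## schema checksum : {checksum}\n{template}"


# Same schema -> identical template; changed schema -> changed template.
t1 = with_schema_checksum('$util.toJson("pong")', "type Query { ping: String }")
t2 = with_schema_checksum('$util.toJson("pong")', "type Query { ping: String }")
t3 = with_schema_checksum('$util.toJson("pong")', "type Query { pong: String }")
```

Unlike the timestamp approach, the checksum is stable across deploys that do not touch the schema, so it avoids needless resolver churn.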
I know this is not a productive comment, but the broader issue of resolvers becoming detached (irrespective of the cause) really makes AppSync unbearable on large schema deployments. Please can some time be invested at a CloudFormation or AppSync team level to resolve this. One would expect that on a valid CloudFormation template deployment, where you have a list of resolvers matching GraphQL types and fields, those resolvers are attached, and that on an invalid / failed deployment the state of the AppSync app returns to where it was before. I know there is complexity to solving this, but anything short of that is an unreliable product for a part of a tech stack that needs to be the most robust and failure-proof: your API. The fact that one can have successful API-layer deployments, with resolvers specified in the CloudFormation template, only to perform an API query and find out that a resolver is not attached to the AppSync API is a big problem. This is a topic that's been brought up on:
@ahmadizm or any other team members, can we bring more attention to this? I'm happy to invest my time working to help resolve this (whether it's our own use case examples or suggestions on how to resolve it) |
Unfortunately there are a lot of sharp edges with CloudFormation & AppSync, and they don't appear likely to be resolved anytime soon. We have renamed our Query/Mutation types to Query#/Mutation# and will increment # whenever we are likely to run into one of these problems. It causes a minor outage on the fields, but it's a better workaround than having an unknown number of detached resolvers. Some other related issues:

---
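The Query#/Mutation# renaming workaround above works because GraphQL lets the root operation types point at arbitrarily named object types via a schema definition, so bumping the type name forces every resolver under it to be recreated. A hedged sketch of what that looks like (type and field names illustrative, not from this thread):

```graphql
schema {
  query: Query2
  mutation: Mutation2
}

type Query2 {
  ping: String
}

type Mutation2 {
  doSomething: String
}
```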
Heya. AppSync team member here. I just wanted to check in on this issue. While the subject of this issue is still very much a sharp edge (i.e., the resolver just goes away, and your stack has no idea), we have worked to reduce some of the friction around managing resolvers. I wanted to check in with folks to understand what pain you're experiencing around this topic to focus our efforts. I'd appreciate it if you could post and re-articulate the friction or wishlist items here, or shoot me an email at pignataj at amazon.com. Thank you!

---
I just had the same issue. I accidentally deployed an invalid schema and my resolver got detached. Redeploying with a valid schema did not reattach the resolver. It happened on the test environment so it doesn't really matter right now, but it would have been a problem in the prod environment. (It also took me a while to pinpoint the issue because there was no error, just a null response.) I commented out my resolver, deployed, and commented it back in. I agree that we shouldn't have to do the above steps; the resolver should be reattached automatically.

---
@jpignata We have moved away from AppSync due to this and other performance related issues. |
Gotcha, @bigkraig. Thanks for letting me know. If you have any interest in chatting further, I'd love to hear more. @elishevarom Makes sense, and I agree. Looks like some folks have workarounds where they version resolvers and the schema, and increment each time. We can do better than this. |
I came across this issue. Is there any update? @twrichards , can I ask you if your workaround, i.e. inserting a hash value into response mapping templates as a comment, is still working? I thought that this would be the most feasible and simplest way to work around this issue at this point if it was still working. |
@mille-printemps Yes definitely still working. I'm not aware of whether the bug is fixed though. |
@twrichards , Thank you for the prompt reply. I will use this way to work around this issue. I hope that this issue is fixed soon... |
@jpignata Any update on this? We are facing the issue right now.. |
@jeremycare can you possibly shoot me an email? I’d like to look at your issue specifically with our team to see what’s up. pignataj at amazon dot com. |
My team just encountered this issue, or some variant of it. Three people spent hours pulling their hair out over it; the ultimate solution was to nuke everything underneath the GraphQL API itself (resolvers, data sources, and function configurations). We commented out all those resources, deployed via SAM, uncommented them, redeployed, and it worked like a charm.
We also ran into that yesterday and had to do the exact same thing. @jpignata it might be a regression on your side |
One workaround I've seen work here is to explicitly model dependencies between all resolvers and the AppSync Schema object (not the API) in CDK/CFN, as well as between all data sources and the Schema object. At least, that mitigation worked when I was encountering issues where updating my schema and some resolvers left the untouched resolvers detached. Based on the comments here about needing to explicitly rename all resolvers, I suspect something similar may be happening in your use-cases.
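In raw CloudFormation, the dependency-modelling workaround described above amounts to an explicit `DependsOn` from every resolver and data source to the schema resource, so CloudFormation orders schema updates before resolver updates. A sketch with hypothetical resource names (not taken from this thread):

```yaml
Resources:
  ApiSchema:
    Type: AWS::AppSync::GraphQLSchema
    Properties:
      ApiId: !GetAtt Api.ApiId
      Definition: |
        type Query {
          ping: String
        }

  PingDataSource:
    Type: AWS::AppSync::DataSource
    DependsOn: ApiSchema          # data source waits for the schema
    Properties:
      ApiId: !GetAtt Api.ApiId
      Name: PingDataSource
      Type: NONE

  PingResolver:
    Type: AWS::AppSync::Resolver
    DependsOn: ApiSchema          # resolver updates ordered after schema updates
    Properties:
      ApiId: !GetAtt Api.ApiId
      TypeName: Query
      FieldName: ping
      DataSourceName: !GetAtt PingDataSource.Name
```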
It is possible for resolvers to become detached when an incorrect schema is deployed, and correcting the schema does not reattach them. In addition, this issue can ripple out to affect resolvers on other fields.
Reproduction Steps
The most localised reproduction of this can be done with a single field schema and resolver. Given the following schema and API:
Schema:
API (In CDK):
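The schema and CDK snippets for this minimal example did not survive in this copy of the issue. Based on the surrounding description (a single `ping` field backed by a none data source returning `"pong"`), the schema was presumably along these lines (a reconstruction, not the original):

```graphql
type Query {
  ping: String
}
```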
If you then deploy this stack, it will deploy successfully, with a query of `ping` returning `pong`. If you then deploy the following schema, you will end up with a resolver that points to nothing:

If you then realise your mistake, correct the schema to add the `ping` field back, and redeploy, the schema will correct itself and the resolver will be present, but the resolver will not be attached and a query to `ping` will return `null`.

Reproduction Steps (a more sinister presentation)
We can extend the previous example to add three fields to our Query like so:
Schema:
API (In CDK):
If you then deploy a broken version of the schema:
This will deploy successfully; however, the resulting schema will be empty:
If you then correct this mistake and deploy the original schema, the schema will be corrected and all three resolvers will be deployed; however, all three will be detached.
This is a 🐛 Bug Report.