DataStore max subscriptions reached error #7036

Closed
nubpro opened this issue Oct 22, 2020 · 29 comments
Assignees
Labels
bug Something isn't working DataStore Related to DataStore category

Comments

@nubpro
Contributor

nubpro commented Oct 22, 2020

Describe the bug
I found an easy and reliable way to reproduce the "MaxSubscriptionsReachedError" error, which comes from AppSync. This is a React Native project.

To Reproduce
This is the most explicit way I could figure out to reproduce the bug.

Updated repro methods:
#7036 (comment)

1. Set up Auth and DataStore in your app.
2. Log in and wait for the syncing process to finish. (I'd use Hub.listen('datastore') to wait for the syncQueriesReady event to trigger.)
3. On your physical device, turn WiFi off.
4. Log out.
5. Turn WiFi on.
6. Repeat steps 2 to 5. Depending on your schema, you may need to do this up to 10 times until it hits the max subscription error. In my case, I only need to repeat 5 times, as my schema contains 7 models with 21 subscriptions.
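
For step 2, waiting on the `syncQueriesReady` event could look roughly like the sketch below. The `HubLike` interface and the `onSyncQueriesReady` helper are made-up names for illustration; the interface only mirrors the part of Amplify's `Hub` that the steps mention, so the snippet runs without the library. In an app you would pass the real `Hub` instead.

```typescript
// Minimal shape of the Hub API used here. This is an assumption made so the
// sketch is self-contained; in a real app, pass the `Hub` from aws-amplify.
interface HubCapsuleLike {
  payload: { event: string; data?: unknown };
}
interface HubLike {
  listen(channel: string, callback: (capsule: HubCapsuleLike) => void): unknown;
}

// Calls `onReady` the first time the "datastore" channel reports
// `syncQueriesReady`, and ignores every later event.
function onSyncQueriesReady(hub: HubLike, onReady: () => void): void {
  let fired = false;
  hub.listen("datastore", (capsule) => {
    if (!fired && capsule.payload.event === "syncQueriesReady") {
      fired = true;
      onReady();
    }
  });
}
```

Registering the listener before logging in avoids missing the event if syncing finishes quickly.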

Expected behavior
The max subscriptions reached error shouldn't happen. When it does, the app can no longer receive incoming changes from the server.

Code Snippet
I don't have any project samples that I can share at the moment.

Screenshots
Here's a screenshot of what I get.
image

Environment
System:
    OS: macOS 10.15.5
    CPU: (4) x64 Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Memory: 53.96 MB / 8.00 GB
    Shell: 5.7.1 - /bin/zsh
  Binaries:
    Node: 14.3.0 - /usr/local/bin/node
    Yarn: 1.22.4 - /usr/local/bin/yarn
    npm: 6.14.8 - /usr/local/bin/npm
    Watchman: 4.9.0 - /usr/local/bin/watchman
  Browsers:
    Chrome: 86.0.4240.80
    Safari: 13.1.1
  npmGlobalPackages:
    @aws-amplify/cli: 4.29.5
    ios-deploy: 1.10.0
    npm: 6.14.8
    react-native-cli: 2.0.1

Smartphone (please complete the following information):

  • Device: iPad
  • OS: iOS 13
  • React Native
@nubpro nubpro added the to-be-reproduced Used in order for Amplify to reproduce said issue label Oct 22, 2020
@nubpro
Contributor Author

nubpro commented Oct 22, 2020

Tagging @iartemiev and @undefobj to get their attention. Thanks!!

@amhinson amhinson added the DataStore Related to DataStore category label Oct 22, 2020
@iartemiev iartemiev removed the to-be-reproduced Used in order for Amplify to reproduce said issue label Oct 26, 2020
@iartemiev iartemiev self-assigned this Oct 26, 2020
@iartemiev iartemiev added the bug Something isn't working label Oct 26, 2020
@nubpro
Contributor Author

nubpro commented Dec 16, 2020

Here's an updated reproduction method and I've included a repo here.

  1. Schema with 33 models.
  2. Open app.
  3. Wait until DataStore is ready.
  4. Disable wifi.
  5. DataStore.stop() or DataStore.clear().
  6. Enable wifi.
  7. DataStore.start().
  8. Max subscriptions reached error pops out.

Here's a video:
test

@nubpro
Contributor Author

nubpro commented Feb 15, 2021

@amhinson Were you guys able to consistently repro the issue with the repo I posted above?
If so, is there a resolution planned in the near term?

@amhinson
Contributor

Yes, thanks for the great reproduction sample! 🙏 We are aware of the issue, particularly as it relates to other similar issues around subscriptions, and it is in our internal queue.

@nubpro
Contributor Author

nubpro commented Apr 6, 2021

So it's been almost 2 months since the last reply. Is there any progress on this ticket?

@undefobj
Contributor

undefobj commented Apr 6, 2021

This is still being tracked for later this year.

@nubpro
Contributor Author

nubpro commented Apr 8, 2021

> This is still being tracked for later this year.

Is this the kind of fix that requires changes on the AppSync service, or does the client library need some tweaking?

From my observation, we just need to make sure that the client's subscription is properly closed before initiating a new one.
What are your thoughts on this?

@iartemiev
Member

On macOS, I can't repro this issue with the app you provided, @nubpro.

I've attempted to repro 2 different ways for step 4:

A. Ethernet unplugged throughout the steps. Disable WiFi via the menu bar in step 4, enable in step 6
As well as
B. WiFi disabled throughout the steps. Pull out ethernet cable in step 4, plug back in during step 6.

After going through the steps over a dozen times, I never experienced the error. DS always starts back up successfully without running into the max subs error.

This is not to say that this error isn't happening, just that I can't reproduce it in my dev environment with your app/given steps.

I have been able to inconsistently repro this error in the past with an RN app running on a physical iPhone specifically, so I'll go down that route again.

> From my observation, we just need to make sure that the client's subscription is properly closed before initiating a new one.

How would you do this from the client if the WebSocket connection is severed before it can be correctly closed?

@nubpro
Contributor Author

nubpro commented Apr 16, 2021

> How would you do this from the client if the WebSocket connection is severed before it can be correctly closed?

Pardon my lack of knowledge in this space; please do correct me if I'm wrong on the sequence or the technical details (on step 8 especially).

  1. Client establishes a WebSocket connection. Let's call it websocket A.
  2. Message 1 is established on Subscription A through websocket A.
  3. Client disconnects from the network (OFFLINE).
  4. Subscription A errors out and attempts to close.
  5. Client sends a GQL_STOP request over websocket A to end the subscription on the server side.
  6. However, the stop request fails to be sent over websocket A as there's no active network.
  7. The client reconnects to the network (ONLINE).
  8. The same websocket, websocket A, is still used <-- am I right about this?
  9. Client starts Message 1 on a brand new subscription (Subscription B) instead of reusing the old one (Subscription A).
  10. So if you repeat these steps (3 to 9) multiple times, the server side ends up with multiple subscriptions listening on Message 1 on the same websocket A. None of them is properly closed, and they won't time out by themselves.
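
The sequence above can be played through with a toy model. This is only a sketch of the theory in the list, not of how AppSync or Amplify actually behave: `ToyServer`, `ToyClient` and `serverSubscriptionsAfter` are hypothetical names, and the "server" is just a set of subscription ids.

```typescript
// Toy server: remembers every subscription until it is told to stop it.
class ToyServer {
  private active = new Set<string>();
  start(id: string): void { this.active.add(id); }
  stop(id: string): void { this.active.delete(id); }
  get count(): number { return this.active.size; }
}

// Toy client following steps 1-9 of the list above.
class ToyClient {
  private server: ToyServer;
  private online = true;
  private nextId = 0;
  private currentId: string | null = null;

  constructor(server: ToyServer) { this.server = server; }

  subscribe(): void {
    this.currentId = `sub-${this.nextId++}`;
    this.server.start(this.currentId);
  }

  dropNetwork(): void {
    this.online = false; // step 3
    this.unsubscribe();  // steps 4-6
  }

  restoreNetwork(): void {
    this.online = true;  // step 7
    this.subscribe();    // step 9: a brand new subscription
  }

  private unsubscribe(): void {
    if (this.currentId === null) return;
    // The stop request only reaches the server while online.
    if (this.online) this.server.stop(this.currentId);
    this.currentId = null; // the client forgets the subscription either way
  }
}

// Number of subscriptions the toy server holds after N offline/online cycles.
function serverSubscriptionsAfter(cycles: number): number {
  const server = new ToyServer();
  const client = new ToyClient(server);
  client.subscribe();
  for (let i = 0; i < cycles; i++) {
    client.dropNetwork();
    client.restoreNetwork();
  }
  return server.count;
}
```

In this model every cycle leaves one more orphaned subscription on the server, which is the growth that step 10 describes.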

The problem is at step 6: the existing Subscription A is not properly closed through websocket A.
Looking at the source code, this seems intentional, but I'm convinced it is not the right thing to do, assuming I'm right about step 8. This is all just my theory.

    } catch (err) {
      // If GQL_STOP is not sent because of disconnection issue, then there is nothing the client can do
      logger.debug({ err });
    }
  }

If I'm right about step 8, we should either reuse the existing subscription for Message 1 or ensure it is properly closed by listening for GQL_COMPLETE. If it's not closed, just try to close it again. Repeat X times, then back off.
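
A rough sketch of that "retry, then back off" idea follows. Everything in it is an assumption for illustration: `sendStop` and `sleep` are hypothetical hooks injected by the caller, the delays are arbitrary, and none of this is Amplify's actual implementation. It also only helps if the original connection is still usable, which is exactly the open question in step 8.

```typescript
// Hypothetical hooks, injected so the sketch does not depend on any library:
//   sendStop - tries to send the stop message; resolves true once the server
//              has acknowledged it (for example with a completion message)
//   sleep    - waits the given number of milliseconds
type SendStop = (subscriptionId: string) => Promise<boolean>;
type Sleep = (ms: number) => Promise<void>;

// Delay before the retry that follows attempt `attempt` (0-based):
// base, 2x base, 4x base, ... capped at `capMs`.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Tries to stop a subscription up to `maxAttempts` times.
// Returns true if the stop was acknowledged, false if every attempt failed.
async function stopWithRetry(
  subscriptionId: string,
  sendStop: SendStop,
  sleep: Sleep,
  maxAttempts = 5,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      if (await sendStop(subscriptionId)) return true;
    } catch {
      // Treat a thrown error (e.g. socket not open) like a failed attempt.
    }
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  return false; // give up; the caller decides what to do next
}
```

Injecting `sleep` keeps the retry loop testable without real timers.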

I got most of this information by reading through this doc:
https://docs.aws.amazon.com/appsync/latest/devguide/real-time-websocket-client.html#appsynclong-real-time-websocket-client-implementation-guide-for-graphql-subscriptions

Anyway, thanks for getting back to me @iartemiev. I'll try to reproduce with the repo on my Mac and see how it goes.

@nubpro
Contributor Author

nubpro commented Apr 16, 2021

Alright, time for some update.

On macOS, I've tested my repo with the same reproduction steps in different browsers, Chrome and Safari, on the same Mac.

Chrome v89:
I can't seem to repro the "Max subscription reached error".

Meanwhile on Safari 14.0.2:
Yep, I hit that error quite consistently.
Here's the video:

Screen.Recording.2021-04-17.at.1.18.32.AM.mp4

This is so odd, idk why they behave differently from one another. You can clearly see that Chrome on macOS isn't logging any errors for some reason.

@iartemiev
Member

iartemiev commented Apr 19, 2021

Thank you, @nubpro, I was also able to repro in Safari.

Browsers can use different WebSocket libraries/implementations under the hood, so it's not completely unreasonable that they're behaving differently. Perhaps Chrome's implementation is more resilient around network connectivity loss.

I'm going to dive deeper into this to make sure, but my understanding from reading the WebSocket Spec is that it's not possible to recover an improperly closed WebSocket connection. Attempting to reconnect, i.e., establishing a new subscription (step 9 in your comment) is likely the only way forward from the clientside. If so, the server (AppSync) would need to have a mechanism for closing these orphaned subscriptions sooner (AppSync currently disables subscriptions after 24 hours).

@vladimirzoyan

any update on this? It's becoming a really big issue with content admin users.

@iartemiev
Member

We're currently working on a solution to this issue with the AppSync team. Should be able to provide an update soon

@iartemiev
Member

@nubpro after further testing, it appears that this is only reproducible with the steps you provided above if you call DataStore.stop or DataStore.clear before the library destroys all of the open socket connections, i.e., before these logs are emitted:

[DEBUG] AWSAppSyncRealTimeProvider - closing WebSocket...
[DEBUG] Hub - Dispatching to datastore with - {event: "networkStatus", data: {active: false}}

Stopping DataStore at that point prevents it from cleaning up the socket connections correctly and when you call DataStore.start after the connection is re-established, it will attempt to open twice the necessary subscription connections (198 instead of 99 with the schema in your sample app), which causes that error.

In other words, if you wait until you see those ^ logs (2-3 seconds) before calling stop/clear in step 5. of your repro instructions, this error will not occur.
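
The doubling can be checked with a little arithmetic. This is a minimal sketch, assuming the 3 subscriptions per model and the limit of 100 that are quoted elsewhere in this thread; the constant and function names are made up for illustration.

```typescript
const SUBSCRIPTIONS_PER_MODEL = 3; // per the thread: 3 subscriptions per model
const MAX_SUBSCRIPTIONS = 100;     // per the error message quoted in the thread

// Subscriptions requested when `staleSets` earlier sets were never cleaned up
// and one fresh set is opened on top of them.
function openSubscriptions(models: number, staleSets: number): number {
  return models * SUBSCRIPTIONS_PER_MODEL * (1 + staleSets);
}

function exceedsLimit(models: number, staleSets: number): boolean {
  return openSubscriptions(models, staleSets) > MAX_SUBSCRIPTIONS;
}

// 33-model sample app: a clean start fits, one stale set does not.
console.log(openSubscriptions(33, 0), exceedsLimit(33, 0)); // 99 false
console.log(openSubscriptions(33, 1), exceedsLimit(33, 1)); // 198 true
```

By the same arithmetic, the 7-model schema from the original report reaches 105 with four stale sets plus the live one, which is consistent with needing about 5 repeats to hit the error.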

I think we can do a better job of making sure the subscriptions are closed correctly even if stop/clear is called. However, unless someone is calling DataStore.stop randomly in their app code, I'm not sure these repro steps are representative of what is occurring during real-world poor/intermittent network conditions. I was not able to repro when toggling Wi-Fi off/on, disconnecting/reconnecting ethernet, or toggling 100% loss in Network Link Conditioner unless I also called DataStore.stop at just the right time... We will continue investigating this further.

@iartemiev
Member

@vladimirzoyan could you please share how many models you have in your schema and exactly which steps/user actions lead to the "max subscriptions reached" error?

@nubpro
Contributor Author

nubpro commented May 26, 2021

> @nubpro after further testing, it appears that this is only reproducible with the steps you provided above if you call DataStore.stop or DataStore.clear before the library destroys all of the open socket connections, i.e., before these logs are emitted:
>
> [DEBUG] AWSAppSyncRealTimeProvider - closing WebSocket...
> [DEBUG] Hub - Dispatching to datastore with - {event: "networkStatus", data: {active: false}}
>
> Stopping DataStore at that point prevents it from cleaning up the socket connections correctly and when you call DataStore.start after the connection is re-established, it will attempt to open twice the necessary subscription connections (198 instead of 99 with the schema in your sample app), which causes that error.
>
> In other words, if you wait until you see those ^ logs (2-3 seconds) before calling stop/clear in step 5. of your repro instructions, this error will not occur.
>
> I think we can do a better job of making sure the subscriptions are closed correctly even if stop/clear is called. However, unless someone is calling DataStore.stop randomly in their app code, I'm not sure these repro steps are representative of what is occurring during real-world poor/intermittent network conditions. I was not able to repro when toggling Wi-Fi off/on, disconnecting/reconnecting ethernet, or toggling 100% loss in Network Link Conditioner unless I also called DataStore.stop at just the right time... We will continue investigating this further.

We have deployed the app to more than 30 stores in production, and roughly 5 of them are suffering from the same issue over and over again.
However, this was extremely hard to reproduce on our end, hence I came up with consistent reproduction steps to make it easier for you guys to debug further.

No, we aren't calling DataStore.stop() at random. In fact, we don't at all.

I see that you have posted a PR, does it totally address the underlying issue?

@iartemiev
Member

iartemiev commented May 26, 2021

> I see that you have posted a PR, does it totally address the underlying issue?

It addresses the issue that occurs when following the repro steps. Since we've never been able to reproduce this any other way, I can't say definitively whether it will totally address every occurrence of this error for schemas containing up to 33 models.

@nubpro
Contributor Author

nubpro commented May 26, 2021

> > I see that you have posted a PR, does it totally address the underlying issue?
>
> It addresses the issue that occurs when following the repro steps. Since we've never been able to reproduce this any other way, I can't say definitively whether it will totally address every occurrence of this error for schemas containing up to 33 models.

Welp, I really appreciate the effort for putting time into this. But damn, it took this long to get this issue prioritized. Part of it is because I wasn't able to provide clear steps but damn...

@iartemiev
Member

We've released a fix for this in aws-amplify@4.0.3. Closing the issue.

If you continue experiencing this issue after upgrading to the latest version of Amplify, please let us know and we will re-open it.

@cmgver

cmgver commented Jul 10, 2021

We are having the same issue with subscriptions to appsync, but we don't use datastore. Do you have a solution for this?

photo_2021-07-09_21-43-41

@rush86999

I am experiencing this issue with latest react native and datastore.

here's the error:

'[WARN] 16:49.728 DataStore - subscriptionError', 'Connection failed: {"errors":{"errorType":"MaxSubscriptionsReachedError","message":"Max number of 100 subscriptions reached"}}'

@iartemiev
Member

@rush86999 how many models do you have in your schema?

@rush86999

69 models

@iartemiev
Member

@rush86999 DataStore supports a maximum of 33 models, because it has to create 3 subscriptions per model to keep the local store in sync with AppSync. That error is expected for a schema containing > 33 models.
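
The 33-model ceiling follows directly from the two numbers in this thread (3 subscriptions per model, a limit of 100). A small sketch of that calculation, with hypothetical function names:

```typescript
// Largest number of models whose subscriptions still fit under the limit.
function maxSupportedModels(limit: number, perModel: number): number {
  return Math.floor(limit / perModel);
}

// Subscriptions DataStore would need for a schema with `models` models.
function subscriptionsFor(models: number, perModel: number): number {
  return models * perModel;
}

console.log(maxSupportedModels(100, 3)); // 33
console.log(subscriptionsFor(69, 3));    // 207, well above 100
```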

@rush86999

can i turn off the subscriptions? I think there is a cost involved with keeping a connection open. Shouldn't this be optional? Also I realized mutations are not instant so for certain cases I would just use the regular API for direct mutations on dynamodb

@rush86999

rush86999 commented Aug 21, 2021

I just realized you mean my app will not work. Is there an alternative approach in this situation? I wish this was in the documentation somewhere. I wrote my whole app already. I don't know where to go at this point.

@rush86999

Can we turn off DataStore and call the API directly without issues?

@TheMoums

Hey @rush86999.
Maybe take a look at this issue, it could help #6260

@github-actions

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.

Looking for a help forum? We recommend joining the Amplify Community Discord server *-help channels or Discussions for those types of questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 22, 2022
8 participants