Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataStore latter updates should take precedent on a slow network #6363

Closed
nubpro opened this issue Jul 18, 2020 · 17 comments
Closed

DataStore latter updates should take precedent on a slow network #6363

nubpro opened this issue Jul 18, 2020 · 17 comments
Assignees
Labels
bug Something isn't working DataStore Related to DataStore category

Comments

@nubpro
Copy link
Contributor

nubpro commented Jul 18, 2020

Describe the bug
Say you have a button to update a Post title.
With that you change the post title to 'Hairy Potter'. This mutation is queued in the outbox.
However before this mutation is processed, you decides to update the post title again to 'Marry in the Soberland'.
As a user, you would expect your post title to be permanently change to the latter title as the local updates reflects that.
Once the both mutations are processed, the post title would return to 'Hairy Potter' which isn't correct.

To Reproduce

  1. On iOS, enable Network Link Conditioner, choose Edge(or even worse) as the profile.
  2. Update a post title to 'Title A'.
  3. Before the mutation is processed, quickly update the post title to 'Title B'.
  4. After both mutations are done, you would notice the post title is back to 'Title A'.

Expected behavior
The expected post title should be 'Title B'.

What is Configured?
We are using 'Auto merge' as the Conflict Handler, however it shouldn't matter for this case. As I would expect DataStore to give precedent to the second mutation because the first mutation is not processed yet. Personally, I think the solution would be to increment the version number by 1 to the second mutation after knowing that the first mutation is processed.

Environment
  System:
    OS: macOS 10.15.5
    CPU: (4) x64 Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
    Memory: 59.48 MB / 8.00 GB
    Shell: 5.7.1 - /bin/zsh
  Binaries:
    Node: 14.3.0 - /usr/local/bin/node
    Yarn: 1.22.4 - /usr/local/bin/yarn
    npm: 6.14.4 - /usr/local/bin/npm
    Watchman: 4.9.0 - /usr/local/bin/watchman
  Browsers:
    Chrome: 83.0.4103.116
    Safari: 13.1.1
  npmPackages:
    @babel/core: ^7.9.0 => 7.10.3 
    @babel/runtime: ^7.9.2 => 7.10.3 
    @react-native-community/async-storage: ^1.11.0 => 1.11.0 
    @react-native-community/eslint-config: ^1.0.0 => 1.1.0 
    @react-native-community/masked-view: ^0.1.9 => 0.1.10 
    @react-native-community/netinfo: ^5.7.1 => 5.9.4 
    @react-navigation/drawer: ^5.4.1 => 5.8.4 
    @react-navigation/native: ^5.1.5 => 5.6.1 
    @react-navigation/stack: ^5.2.10 => 5.6.2 
    aws-amplify: ^3.0.9 => 3.0.20 
    aws-amplify-react-native: ^4.0.4 => 4.2.1 
    babel-jest: ^25.3.0 => 25.5.1 
    babel-plugin-root-import: ^6.5.0 => 6.5.0 
    eslint: ^6.8.0 => 6.8.0 
    eslint-import-resolver-babel-plugin-root-import: ^1.1.1 => 1.1.1 
    immutability-helper: ^3.0.2 => 3.1.1 
    jest: ^25.3.0 => 25.5.4 
    metro-react-native-babel-preset: ^0.59.0 => 0.59.0 
    moment: ^2.24.0 => 2.27.0 
    patch-package: ^6.2.2 => 6.2.2 
    postinstall-postinstall: ^2.1.0 => 2.1.0 
    prettier: ^2.0.5 => 2.0.5 
    react: 16.11.0 => 16.11.0 
    react-devtools: ^4.6.0 => 4.7.0 
    react-native: ^0.62.2 => 0.62.2 
    react-native-barcode-mask: ^1.2.2 => 1.2.4 
    react-native-camera: ^3.23.0 => 3.30.0 
    react-native-device-info: ^5.6.1 => 5.6.1 
    react-native-elements: ^1.2.7 => 1.2.7 
    react-native-fs: ^2.16.6 => 2.16.6 
    react-native-gesture-handler: ^1.6.1 => 1.6.1 
    react-native-get-random-values: ^1.4.0 => 1.4.0 
    react-native-logs: ^2.2.1 => 2.2.1 
    react-native-paper: ^3.8.0 => 3.10.1 
    react-native-permissions: ^2.1.1 => 2.1.5 
    react-native-reanimated: ^1.8.0 => 1.9.0 
    react-native-safe-area-context: ^0.7.3 => 0.7.3 
    react-native-screens: ^2.4.0 => 2.9.0 
    react-native-sound-player: ^0.10.4 => 0.10.4 
    react-native-svg: ^12.1.0 => 12.1.0 
    react-native-svg-transformer: ^0.14.3 => 0.14.3 
    react-native-vector-icons: ^6.6.0 => 6.6.0 
    react-native-webview: =9.2.1 => 9.2.1 
    react-native-zip-archive: ^5.0.4 => 5.0.4 
    react-test-renderer: 16.11.0 => 16.11.0 
    recyclerlistview: ^3.0.0 => 3.0.0 
  npmGlobalPackages:
    @aws-amplify/cli: 4.22.0
    ios-deploy: 1.10.0
    npm: 6.14.4
    react-native-cli: 2.0.1

Smartphone (please complete the following information):

  • Device: iPad
  • OS: iOS 13
  • React Native
@nubpro nubpro added the to-be-reproduced Used in order for Amplify to reproduce said issue label Jul 18, 2020
@mauerbac mauerbac added the feature-request Request a new feature label Jul 23, 2020
@ashika01 ashika01 removed the to-be-reproduced Used in order for Amplify to reproduce said issue label Jul 27, 2020
@nubpro
Copy link
Contributor Author

nubpro commented Aug 10, 2020

@mauerbac Thanks for parking this under feature-request, but isn't this more like an issue with DataStore rather than a feature request?

@jamesonwilliams
Copy link
Contributor

+1 @nubpro @mauerbac - if reproducible this is a bug. Update mutations should always affect the local data, regardless of the network state.

@undefobj undefobj added bug Something isn't working and removed feature-request Request a new feature labels Aug 21, 2020
@sammartinez sammartinez added DataStore Related to DataStore category React Native React Native related issue labels Sep 14, 2020
wei added a commit to wei/aws-amplify-react that referenced this issue Nov 10, 2020
@wei
Copy link
Contributor

wei commented Nov 10, 2020

I can reproduce in React too. Repo link & git checkout 6363.

Local data/mutations seems to be working fine, it's the appsync graphql request that is returning the previous data when _version remains unchanged.

For example I'm sending 5 saves A B C D E: https://github.com/wei/aws-amplify-react/blob/6363/src/App.js#L21-L27

2 Appsync graphql requests can be observed:

# Request body Response body
1 image image
2 image image

I tried to replay the second request as well sending the same _version and I can observe that _version and _lastChangedAt are being updated but name remain as Title A

When setting _version to the exact _version returned in the latest appsync graphql request, the request will work fine and return the new name.

When setting _version to a number higher than _version previously returned by appsync, an error is returned with message Client version is greater than the corresponding server version. Therefore, we cannot just increment _version on the client-side.

The second request in my screenshots is updating _version and _lastChangedAt but not name which looks like it's appsync's AUTOMERGE at play. I'll try with react native next.

Update
I can reproduce it in React Native as well. It is sending the exact same two graphql requests as seen in the screenshots above after network is restored. I'l do some more digging on the DataStore's offline queue / hashmap undefobj mentioned in the comment below.

@undefobj
Copy link
Contributor

Clients never update the version in DataStore. The consistency model is to have a central authority and therefore only AppSync updates the version. This is outlined in the AppSync documentation as well.

Up-leveling to the original question: If I make a local mutation, then change the data again before going online, the final state is what should be represented in the system. If that's not happening then this is a bug and should be looked into. DataStore's offline queue is a hashmap which operates in the following mechanism:

  • If you make more than one change to data without a predicate, then the final result is one mutation over the network representing the final state
  • If you make more than one change to data with a predicate, then the second change will be appended to the end of the hashmap resulting in two mutations over the network. This gives replay of operations and causal updates.

With this information lets go back to the original question:

With that you change the post title to 'Hairy Potter'. This mutation is queued in the outbox.
However before this mutation is processed, you decides to update the post title again to 'Marry in the Soberland'.

Based on this description I suspect a bug exists in the React Native implementation for draining the hashmap after a successful service confirmation of a write. We should review this in bugbash.

@wei
Copy link
Contributor

wei commented Nov 11, 2020

@undefobj Thanks for sharing your insights.

I performed some tests on react-native. Take my example of updating titles to A B C D E in sequence.

There are two different scenarios:

  1. After turning off network, and we wait for Reachability - Notifying reachability change false before sending all 5 saves. Everything is working as expected - Once network is back on and reachability becomes true, only one graphql mutation request is sent to appsync for E.
  2. If we turn network to edge or slower, reachability does NOT change to false as the network is still reachable just slow. The first save will send a request for mutation for A which will take a while. In the meantime B C D E saves are called and the outbox correctly removes B C D and leaving only E. Once graphql mutation request for A is finished, the request for E is sent. Since the two requests have the same _version, AUTOMERGE will ignore the second update (E) to "title" (a scala field). (See screenshots in my first reply)

To summarize: This issue arises when a request is enroute and another mutation on the same model is triggered in the meantime (with the same _version).

@nubpro One solution I can think of is to use a custom LAMBA conflict resolver to basically ignore _version and always update the field, not ideal but doable.

@nubpro
Copy link
Contributor Author

nubpro commented Nov 11, 2020

@undefobj Thanks for sharing your insights.

I performed some tests on react-native. Take my example of updating titles to A B C D E in sequence.

There are two different scenarios:

  1. After turning off network, and we wait for Reachability - Notifying reachability change false before sending all 5 saves. Everything is working as expected - Once network is back on and reachability becomes true, only one graphql mutation request is sent to appsync for E.
  2. If we turn network to edge or slower, reachability does NOT change to false as the network is still reachable just slow. The first save will send a request for mutation for A which will take a while. In the meantime B C D E saves are called and the outbox correctly removes B C D and leaving only E. Once graphql mutation request for A is finished, the request for E is sent. Since the two requests have the same _version, AUTOMERGE will ignore the second update (E) to "title" (a scala field). (See screenshots in my first reply)

To summarize: This issue arises when a request is enroute and another mutation on the same model is triggered in the meantime (with the same _version).

@nubpro One solution I can think of is to use a custom LAMBA conflict resolver to basically ignore _version and always update the field, not ideal but doable.

Thanks for the detailed investigation. I don't think this problem is exclusive to React Native only is it?

We are using a custom conflict resolver setup for the particular mutation to address this pain-point in prod.

@wei
Copy link
Contributor

wei commented Nov 11, 2020

I don't think this problem is exclusive to React Native only is it?

@nubpro yeah I believe it's across the board.

@undefobj
Copy link
Contributor

@nubpro One solution I can think of is to use a custom LAMBA conflict resolver to basically ignore _version and always update the field, not ideal but doable.

Please don't do this, let AppSync handle updates to the version. Modifying it could lead to data loss.

I've spoken with @iartemiev and I think there is an improvement to the hashmap implementation for slow networks that we can make by not allowing the write to the record when the item is in flight. He's tracking the issue internally.

@wei
Copy link
Contributor

wei commented Nov 20, 2020

Would the solution be throwing an error when user is trying to save a model meeting the following condition?

  • ID matches AND _version <= the one in-flight

Users can then catch this error on Datastore.save?

@danrivett
Copy link

danrivett commented Nov 20, 2020

I haven't been following this thread in detail, but @undefobj 's suggestion at first glance concerns me because the whole point of DataStore in my view is that it insulates the developer from having to worry about online-offline connectivity - things are queued on the way out and on the way in. What @undefobj suggests appears to deviate from this.

I don't know the internals here, but it appears to me that every local mutation should either increase the _version field or likely better some other distinct version field for local writes, and some sort of ever-increasing internal version field is used to ensure we always know the sequence of the local writes, so we can decide which one is ultimately needed to be synced externally, or which ones need to be synced and in what order, to ensure consistency.

(This ever-increasing versioning stategy is normally derived using the current time, but I've forgotten the exact name of this versioning strategy - something to do with atomic clocks but my brain and google-fu is failing me).

The summary of my concern is I wouldn't want local writes to be blocked because the network is either unavailable, slow, or intermittently available, as that would be a terrible user experience and also brings up questions like what @wei said in that we'd want to fail fast in those situations, but in a slow network situation, that's kind of a paradox.

/cc @iartemiev

@undefobj
Copy link
Contributor

@danrivett I think you're reading too much into the phrasing of my comment. The library should handle all of these implementations internally, not force the developer to think about them. @wei is one of the Amplify team fellows working with @iartemiev

@laclance
Copy link

same issue with react

@iartemiev
Copy link
Member

@nubpro and @laclance
Update: we've identified the issue and are working on a solution for this specifically when using Automerge as the conflict resolution strategy.

Note that this problem is exclusive to Automerge. Optimistic Concurrency abides by last-write-wins, so it will correctly keep the latter update.

@mauerbac mauerbac added the needs-discussion Used for internal discussions label Dec 16, 2020
@laclance
Copy link

Hi, is there any update please?

@sammartinez sammartinez removed the needs-discussion Used for internal discussions label Feb 3, 2021
@iartemiev
Copy link
Member

We just deployed changes that should fix this.
Please install aws-amplify@3.3.26 or @aws-amplify/datastore@2.9.15

@raybotha
Copy link

raybotha commented Apr 1, 2021

I'm having an issue that seems similar to this on the latest version (#8012), if it's the same bug this issue is not fixed.

@github-actions
Copy link

github-actions bot commented Apr 2, 2022

This issue has been automatically locked since there hasn't been any recent activity after it was closed. Please open a new issue for related bugs.

Looking for a help forum? We recommend joining the Amplify Community Discord server *-help channels or Discussions for those types of questions.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 2, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working DataStore Related to DataStore category
Projects
None yet
Development

No branches or pull requests