Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Discussion|Onyx] Mixing Onyx.set and Onyx.merge usages leads to race condition #5930

Closed
kidroca opened this issue Oct 18, 2021 · 13 comments
Closed
Labels
Engineering Monthly KSv2 Planning Changes still in the thought process

Comments

@kidroca
Copy link
Contributor

kidroca commented Oct 18, 2021

If you haven’t already, check out our contributing guidelines for onboarding and email contributors@expensify.com to request to join our Slack channel!


Discussion

Decide how to deal with a race condition that happens when mixing Onyx.merge and Onyx.set

Problem

It's not obvious that we can introduce changes leading to Onyx.set and Onyx.merge being called close in time with the same key and create a race condition.

Details

During one of our PRs we discovered an issue where storage is getting overwritten with an older value due to using Onyx.set and Onyx.merge to update the same key: #5726 (comment)

Debugging revealed that Onyx.merge and Onyx.set are called very close in time: #5726 (comment)

The problematic code did this

  1. Invite button is pressed
  2. One piece of code calls Onyx.merge to clear policy errors
  3. At the same time another piece of code calls Onyx.set to update the same policy key
  4. The merge from 1) completes last and overwrites local storage with older value

Even though merge and set are called at the same time. merge has to first read the full policy object in order to merge the changes to it. merge starts and gets the value before set has updated it, since it's promise based it would continue on the next tick. In the meantime set saves a new value in storage (step 2). On the next tick (step 3) the merge is applied with the stale data and another call to set overwrites the storage

A debug session that captured the problem
1.New.Expensify.-.Google.Chrome.2021-10-15.17-49-48.mp4
@kidroca kidroca added AutoAssignerTriage Auto assign issues for triage to an available triage team member Daily KSv2 labels Oct 18, 2021
@MelvinBot
Copy link

Triggered auto assignment to @NicMendonca (AutoAssignerTriage), see https://stackoverflow.com/c/expensify/questions/4749 for more details.

@MelvinBot MelvinBot removed the AutoAssignerTriage Auto assign issues for triage to an available triage team member label Oct 18, 2021
@kidroca
Copy link
Contributor Author

kidroca commented Oct 18, 2021

IMO to avoid this problem we should only allow one way to update Onyx. E.g. hide set from the interface and enforce updates to happen only through Onyx.merge.

But at the moment merge cannot cover all the cases (or maybe it's not clear how):

  • When we want to add an item to an array we can use Onyx.merge (e.g. add one more invite to policy)
  • When we wan't to remove an item from an array we can only use Onyx.set

The policy code originally used Onyx.set to be consistent, you can see here that it still uses Onyx.set when it has to remove items from the policy:

// If the operation failed, undo the optimistic addition
const policyDataWithoutLogin = _.clone(allPolicies[key]);
policyDataWithoutLogin.employeeList = _.without(allPolicies[key].employeeList, ...newEmployeeList);

Onyx.set(key, policyDataWithoutLogin);

@NicMendonca
Copy link
Contributor

I am not entirely sure how to triage this 😅 Adding the engineering label for more eyes.

@MelvinBot
Copy link

Triggered auto assignment to @bondydaa (Engineering), see https://stackoverflow.com/c/expensify/questions/4319 for more details.

@NicMendonca NicMendonca removed their assignment Oct 19, 2021
@parasharrajat
Copy link
Member

@NicMendonca This is a discussion issue and does not need to work on or exported.

@bondydaa
Copy link
Contributor

per the philosophy in the readme https://github.com/Expensify/App/#Philosophy

Actions should favor using Onyx.merge() over Onyx.set() so that other values in an object aren't completely overwritten.

So why is code being merged that uses Onyx.set() when we should be using .merge()?

Or is the problem that even multiple Onyx.merge() calls to update the same key can create the same race condition?

@bondydaa bondydaa removed their assignment Oct 19, 2021
@bondydaa bondydaa added Monthly KSv2 Planning Changes still in the thought process and removed Daily KSv2 labels Oct 19, 2021
@marcaaron
Copy link
Contributor

This is a challenging one to address because it feels like a potential problem with Onyx, but also feels like a potential problem with our conventions of syncing data in New Expensify.

Under normal conditions these race conditions should not be happening and the batching of merge() calls protects us from any really bad effects.

But it does seem likely that when we are using a combination of set() and merge() in the same tick then the outcome would be hard to predict. So maybe we can start by looking at where we made those decisions and why?

It should be more clear what sort of action to take if we audit the workarounds that we suspect cause race conditions.

Enforcing that merge() be preferred over set() could be a great solution. But it's hard to say without considering why we sometimes prefer to use set() over merge(). One reason I've observed is that we are trying to remove some values and "reset" the data by using set() instead of merging new data into stale data. "syncing" local data with an API response is another reason you'll see this happen.

@kidroca
Copy link
Contributor Author

kidroca commented Oct 22, 2021

IMO it would be best to expose only one method for updating Onyx that works similarly to merge, e.g.

export interface Onyx {
  update({ key: string, value: any, strategy: UpdateStrategy }): Promise<void>
  // or
  update(key: string, value: any, options: { strategy: UpdateStrategy }): Promise<void>
}

enum UpdateStrategy {
  DeepMerge,
  ShallowMerge,
  Overwrite,
}

strategy would have a default and it can be DeepMerge - this is the way the current merge works
Overwrite would be the successor of set
And ShallowMerge is for cases where we don't want to merge nested objects or arrays but replace them - this allows us to remove/replace inner arrays instead of always merging new items to them
Furthermore to delete something we can have a special value constant exposed from onyx

Onyx.update({ key: ONYXKEYS.TOKEN, value: Onyx.DELETE_VALUE })

This allows us to have a queue of updates similar to how merge works with a merge queue, but having an information for the type of the update e.g. overwrite or deletion would allow us to disregard any previous values in the queue since if a latter update is a complete overwrite - prior queued updates have no effect on the end result

@marcaaron
Copy link
Contributor

marcaaron commented Oct 22, 2021

Nice, I like the idea of batching and having different strategies to apply changes in a synchronous way. I feel the public interface should maybe be simpler only because I'm not sure if anyone would stop to think about whether they need a deep or shallow merge 😄

It feels like Expensify/App has outgrown Onyx a bit and overtime we've run into situations where we need a more complex way to modify data.

Just as a thought experiment, what if we extended the public methods Onyx has to solve some of the problems we might run into when working with data and then add the internal queue to help prevent the race conditions?

Some things that have been floated in the past and in this issue...

  • No easy way to delete a specific object properties (we can set it to null but that's a workaround)

We've discussed adding a callback to Onyx.set() but other priorities pulled me from that investigation. Another idea was to add something like Onyx.delete('key.property.nestedProperty').

  • No clear way to remove a specific item from an array

I wonder if we could add something like Onyx.filter() or other methods for working with arrays. I find it unintuitive that we can push items into an array with Onyx.merge() but there are no ways to remove them.


Anyway, that conversation could go on for a while 😅 to refocus...

  1. Can we reproduce the race condition with a unit test?
  2. Do we agree that we should start by making sure that Onyx.set() and Onyx.merge() calls happen in the order that they are called?

@kidroca
Copy link
Contributor Author

kidroca commented Oct 22, 2021

Just as a thought experiment, what if we extended the public methods Onyx has to solve some of the problems we might run into when working with data and then add the internal queue to help prevent the race conditions?

It's possible to go the other direction and expose a CRUD api
We can keep set but make it clear/update the mergeQueue if a merge queue exists for the key we're setting. The same thing could work for deletions.
The only reason we have a race condition is that one action uses a queue and can be applied with a delay, while the rest - set/remove they don't wait

I wonder if we could add something like Onyx.filter() or other methods for working with arrays. I find it unintuitive that we can push items into an array with Onyx.merge() but there are no ways to remove them.

The most flexible solution would be something as what you've brought up: callback to Onyx.set() but for .merge - merge already reads the stored value, so if

  • merge is called with an object value we can do the traditional merge
  • merge is called with a function value - we call that function with the read value and return the merged result. Then the next function in the merge queue is called with the product of the previous and so on until we're done with the merge queue

And then it's possible to do something like

Onyx.merge(ONYXKEYS.ARRAY_CONTAINING_KEY, (value) => {
  return {
    ...value,
    myNestedArray: value.myNestedArray.filter(entry => shouldKeepEntry(entry))
  }
})

I guess it makes sense to have this interface for both set and merge, but it would make set slowe and delay the write - currently set does not wait to read the value while merge does

No easy way to delete a specific object properties

The Onyx.DELETE_VALUE can help here

Onyx.merge(ONYXKEYS.ARRAY_CONTAINING_KEY, {
  // deletes the value, e.g. the whole underlying object,array,string etc...
  someKey: Onyx.DELETE_VALUE, 
  someNestedObj: {
    // deletes this nested key 
    someKey: Onyx.DELETE_VALUE,
  }
  // Deletes everything past the 2nd index, while prior items are merged
  someList: [{}, {}, Onyx.DELETE_VALUE],
})

Onyx.delete could work as well but I would prefer an interface like Onyx.delete(key, path)
But I don't see how you can use it to delete multiple array entries at once - you can only delete single items like

Onyx.delete(myKey, ['property', 'nested']);
// or
Onyx.delete(myKey, ['property', 'nestedList', 2]) 
// though it's not obvious whether remaining indexes get shifted down

  1. Can we reproduce the race condition with a unit test?

This should be easy, I can post a PR

  1. Do we agree that we should start by making sure that Onyx.set() and Onyx.merge() calls happen in the order that they are called?

In short yes - no matter what we decide (multiple or single ways to update storage) we still need to make changes to the merge queue so that it takes into consideration overwrites that happened from updates like set or delete

We just need to decide what we want to achieve - there are 2 scenarios, and only one causes a race condition

Ok case - set is called first

  1. set is called and immediately stores the new value in cache
  2. merge is called and reads the latest value from cache

Bug case - set is called after merge

  1. merge is called and reads currently stored value
  2. set is called and updates currently stored value before merge does
  3. merge saves the merged result and overwrites whatever happened in step 2)

ATM I can think of 2 ways to handle the bug case

  • when set is called - clear the mergeQueue for that key and put whatever was set at the tip of the mergeQueue. This way the merge is applied on the latest value (but it's questinoable whether we should apply an update to at all)
    • Update: this is tricky as a set could happen after a merge due to an API response where we might still want to apply the change started from the merge on the api response. What do you think?
  • so when set is called - clear the mergeQueue and abort the pending merge. Don't go with the pending merge since a set happens after hence it's more recent and it completely overwrites the key

@kidroca
Copy link
Contributor Author

kidroca commented Oct 22, 2021

Nice, I like the idea of batching and having different strategies to apply changes in a synchronous way. I feel the public interface should maybe be simpler only because I'm not sure if anyone would stop to think about whether they need a deep or shallow merge 😄

The public interface can be similar to current merge - Onyx.update(key, value) where we default to a deep merge, but now I see that it's not clear that we do a merge and the value can be just the delta...
We can mention it in the method documentation

Instead of deep and shallow merge we can stick to just 2 strategies - merge that works like current Onyx.merge, and set that works like current Onyx.set. If not specified we use merge by default - this should be documented on the method. Now people will be inclined to use merge and would only opt in to use set when it's really necessary
This includes everything I've said about Onyx.DELETE_VALUE and how to delete nested values
And can be extended with even more control with a method callback method like:

Onyx.update(ONYXKEYS.ARRAY_CONTAINING_KEY, (value) => {
  return {
    ...value,
    myNestedArray: value.myNestedArray.filter(entry => shouldKeepEntry(entry))
  }
})

Showcase

Typical usage (merging)

Onyx.update(MY_KEY, myValue);

Removing a nested key

Onyx.update(MY_KEY, { keyToDelete: Onyx.DELETE_VALUE });

Removing the whole key

Onyx.update(MY_KEY, Onyx.DELETE_VALUE);

Overwriting with an API response

Onyx.update(MY_KEY, myValue, { strategy: Onyx.UpdateStrategy.SET });

Removing array items (IRL example)

 // If the operation failed, undo the optimistic addition 
 const policyDataWithoutLogin = _.clone(allPolicies[key]); 
// set data back to what it was (policy), tell Onyx to discard any entries remaining past the original list
 policyDataWithoutLogin.employeeList = [...policy.employeeList, Onyx.DELETE_VALUE];

@marcaaron
Copy link
Contributor

so when set is called - clear the mergeQueue and abort the pending merge. Don't go with the pending merge since a set happens after hence it's more recent and it completely overwrites the key

Yeah I think we should keep things simple for now and do:

  • If you call set() after merge() then the set() should always cancel any merges
  • If you call merge() after set() the data should be merged into whatever is being set()
  • If two set() then a merge() happen then the merge should merge into the data of the last set() to be called

Once that is settled we'll have a better foundation to discuss other improvements - maybe by auditing the usages of set() and seeing whether we are using it as a "reset" or workaround where we set modified local data like in the Policy.js case.

@marcaaron
Copy link
Contributor

Quick update here. I don't think there's really much else to do here just yet so I want to wrap up this discussion.

I experimented with a test and proved the race condition exists. However, I'm not sure there are any actual cases where we're even calling a set() after a merge() so I added logging for detecting if someone tries to do it as it will currently work unexpectedly.

I think there's been some good stuff here regarding alternative methods to update Onyx data. But seems tangential to the race condition investigation so I'm gonna close this out.

It would be useful to re-approach this conversation by just looking at the Policy code and determining how we can or should modify Onyx to make it work better for us. Will create a new issue for that exploration and we can reference the convos here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Engineering Monthly KSv2 Planning Changes still in the thought process
Projects
None yet
Development

No branches or pull requests

6 participants