RCORE-1900 Make "next launch" metadata actions multiprocess-safe #7576
Conversation
Force-pushed from fa29e67 to 350ca2c
Coveralls: Pull Request Test Coverage Report for Build thomas.goyne_436
Force-pushed from b6589ac to c28f550
Force-pushed from c28f550 to 3916ad1
Still reviewing, but posting a couple of questions/comments I have so far.
create_metadata_store(config, file_manager);
REQUIRE(File::exists(path));

store_1.reset();

create_metadata_store(config, file_manager);
REQUIRE(File::exists(path));

store_2.reset();
Just to verify: in this case, store_1 and/or store_2 are still open, so reopening the metadata store doesn't run the file actions. It's not until both are closed that the file actions are performed the next time the metadata store is opened.
From the PR description, it sounds like this is the current behavior, but may be changing in the future?
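A rough, self-contained illustration of the behavior being asked about, using a stand-in MetadataStore type rather than the real one (open_count plays the role of the cross-process "is anyone else using this file" check):

```cpp
#include <cassert>
#include <memory>

// Stand-in for the real metadata store, for illustration only.
struct MetadataStore {
    static int open_count;
    bool ran_file_actions = false;
    MetadataStore() {
        if (open_count++ == 0)          // only an exclusive opener
            ran_file_actions = true;    // performs pending "next launch" actions
    }
    ~MetadataStore() { --open_count; }
};
int MetadataStore::open_count = 0;

int main() {
    auto store_1 = std::make_unique<MetadataStore>();  // exclusive: actions run
    auto store_2 = std::make_unique<MetadataStore>();  // store_1 still open: deferred
    assert(!store_2->ran_file_actions);
    store_1.reset();
    store_2.reset();
    auto store_3 = std::make_unique<MetadataStore>();  // everyone closed: actions run again
    assert(store_3->ran_file_actions);
}
```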
data->access_token.token.clear();
data->refresh_token.token.clear();
store->update_user("user 3", *data);
store->update_user("user 3", [](auto& data) {
Should you also check that the current user is "user 3" prior to making these changes?
m_metadata_store->update_user(user->user_id(), [&](auto& data) {
    data.identities = std::move(identities);
    data.profile = UserProfile(get<BsonDocument>(profile_json, "data"));
    user->update_backing_data(data); // FIXME
What's the difference between updating the user in the metadata store and updating the user's backing data?
Is one a cached copy vs. a persistent copy saved to disk? Or is this the trigger for the client app to update its own user metadata storage? Or is there some other purpose?
I see the FIXME comment, so perhaps this will be updated in the future?
The metadata store is the source of truth for the user data, and User stores an in-memory copy of it. The intention is to move to a one-way data flow where the metadata store pushes updates to the active users, but for now everything that updates the store also manually updates the users too.
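A minimal sketch of that two-step flow, using stand-in types rather than the actual realm-core classes:

```cpp
#include <functional>
#include <memory>
#include <string>

struct UserData { std::string access_token, refresh_token; };

// Stand-in for the on-disk metadata store (the source of truth).
struct MetadataStore {
    UserData persisted;
    void update_user(const std::string&, const std::function<void(UserData&)>& fn) { fn(persisted); }
};

// Stand-in for User, which keeps an in-memory copy of its metadata.
struct User {
    UserData cached;
    void update_backing_data(const UserData& d) { cached = d; }
};

// Current two-step flow: mutate the store, then manually mirror the result
// into the in-memory copy (the part a one-way data flow would remove).
void set_tokens(MetadataStore& store, const std::shared_ptr<User>& user,
                std::string access, std::string refresh) {
    store.update_user("some user id", [&](UserData& data) {
        data.access_token = std::move(access);
        data.refresh_token = std::move(refresh);
        user->update_backing_data(data);
    });
}
```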
m_metadata_store->update_user(user->user_id(), *data);
user->update_backing_data(std::move(data));
}
m_metadata_store->update_user(user->user_id(), [&](auto& data) {
Is this function a no-op if the user isn't found?
Yes. Another process could delete the user concurrently with us updating the user, so it's not an error for the user to be missing.
I've added another commit which fixes the race condition mentioned in the PR description and adds more of a description of what's going on.
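A small sketch of the no-op semantics described above, with a plain map standing in for the store's user table:

```cpp
#include <functional>
#include <map>
#include <string>

struct UserData { std::string refresh_token; };

// Stand-in for the metadata store's user table.
std::map<std::string, UserData> users;

// If another process removed the user in the meantime, the callback is simply
// never invoked and no error is raised.
void update_user(const std::string& user_id, const std::function<void(UserData&)>& fn) {
    auto it = users.find(user_id);
    if (it != users.end())
        fn(it->second);
    // user not found: nothing to do, by design
}
```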
Force-pushed from 1b39976 to f696863
I guess there needs to be a guarantee that the sync agent flag on the realm file is not claimed, right? Is it always a single process that claims both? (And so if the metadata one is claimed, it must mean that the realm file is not in use?)
There's a single metadata Realm per app, and claiming the sync agent on it has no connection to claiming the sync agent on any of the Realms which the user opens. One process claiming the sync agent on the metadata Realm and performing launch actions while another process is opening Realms and being the sync agent on those is completely fine.
I thought that deleting a realm file while another process is using it would be an issue.
The point of all this is to avoid deleting a Realm file in use. The metadata Realm is always open at any point where a sync Realm is open, and once the sync agent on the metadata Realm is claimed it remains claimed until everyone has closed the metadata Realm. This means that if we are able to claim the sync agent on the metadata Realm, we know that at that precise moment in time there were no open Realms associated with the current app in any process. It is invalid to reopen a Realm after creating a file action for that path, so once we have a point in time where we know that there are no open Realms, the Realms associated with all file actions created before that point cannot be open. While this is happening another process may be opening other Realms associated with the same app, but they cannot be the same ones as we're deleting.
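Roughly, the launch-action flow looks like the sketch below. The type and helper names (MetadataRealm, try_claim_sync_agent, pending_file_actions, run_file_action) are invented for illustration and are not the actual realm-core API:

```cpp
#include <string>
#include <vector>

// Stand-in types and invented helper names, for illustration only.
struct FileAction { std::string realm_path; };
struct MetadataRealm {
    bool try_claim_sync_agent();                   // exclusive cross-process claim
    std::vector<FileAction> pending_file_actions();
};
void run_file_action(const FileAction&);

// If the claim succeeds, no Realm belonging to this app was open in any
// process at that instant, so every file action created before that instant
// targets a Realm which cannot currently be open.
void perform_launch_actions(MetadataRealm& metadata) {
    if (!metadata.try_claim_sync_agent())
        return;                                    // someone else has app Realms open
    for (auto& action : metadata.pending_file_actions())
        run_file_action(action);
    // The claim is held until every process has closed the metadata Realm.
}
```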
Force-pushed from f696863 to 3855df6
I updated
"Invalid" in that it's a misuse of the API which may lead to data loss without warning. The manual client reset API is basically a giant pile of sharp edges that is very difficult to use correctly. |
The changes look good. You should try rebasing, as the migration test failure was fixed a few weeks ago. There are some other failures though (e.g., SharedRealm: async writes).
Force-pushed from 8d22399 to 3241380
The libuv scheduler used in tests on non-Apple platforms does not support this.
…de the write transaction. This avoids overwriting data with stale values in multiprocess scenarios.
Force-pushed from 3241380 to b2a419b
The CI failures were a few new calls which called
File actions are performed on "next launch" since that's a time when we know it's safe to delete or move Realm files which may have been in use. This previously meant when the metadata store was next constructed, which was incorrect when multiple processes share a metadata file. Instead, the metadata store now tries to claim the sync agent flag on the metadata Realm file and only performs the launch actions if it succeeds.
There is still a race condition here: process 1 could initialize the metadata Realm, claim the flag, then get suspended. Process 2 initializes, opens a Realm, and then removes the user. Process 1 then resumes and proceeds to run file actions, deleting the Realm that process 2 has open. This is probably not actually a problem, but I'll continue to try to find a fix.
Keeping the metadata Realm open is required for this to work and for change notifications (a future change), and it greatly improves the performance of metadata operations. It slightly increases memory usage, but the metadata Realm is typically very small.
update_user() could previously overwrite data with stale values read earlier. Given where and when we called it, it was very difficult for this to actually happen, but it's better to fix the problem anyway.
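A simplified sketch of the difference, with a stand-in Store type and a mutex standing in for the Realm write transaction:

```cpp
#include <functional>
#include <mutex>
#include <string>

struct UserData { std::string access_token; };

// Stand-in store; the mutex plays the role of the write transaction.
struct Store {
    std::mutex write_mutex;
    UserData data;

    // Old shape (racy): the caller read the data earlier, mutated a copy, and
    // wrote the whole copy back, clobbering anything written in between.
    void update_user_from_copy(const UserData& copy) {
        std::lock_guard<std::mutex> lock(write_mutex);
        data = copy;
    }

    // New shape: the callback runs inside the "write transaction", so it
    // always sees and mutates the latest committed state.
    void update_user(const std::function<void(UserData&)>& fn) {
        std::lock_guard<std::mutex> lock(write_mutex);
        fn(data);
    }
};
```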
There are two groups of not-directly-related updates to the tests. There were some spots where we called assert_no_open_realms() at a point where the sync metadata Realm is now expected to be open; these now check that the specific file we expect to be closed is closed instead. To help track down some scheduler-related problems, I added an assertion that the libuv scheduler is only created on the main thread, as it doesn't support background threads. This revealed a bunch of places where we were creating it on background threads, which I then fixed.
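A minimal sketch of that kind of main-thread assertion (the actual check in realm-core may look different):

```cpp
#include <cassert>
#include <thread>

// Captured during static initialization, which runs on the main thread.
static const std::thread::id s_main_thread_id = std::this_thread::get_id();

struct UvMainLoopScheduler {   // stand-in name for the libuv-based scheduler
    UvMainLoopScheduler() {
        // The libuv scheduler only drives the main run loop, so constructing
        // it on a background thread is a programming error.
        assert(std::this_thread::get_id() == s_main_thread_id);
    }
};
```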