Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relationship edge inconsistency #543

Open
MythicLionMan opened this issue Jan 14, 2021 · 1 comment
Open

Relationship edge inconsistency #543

MythicLionMan opened this issue Jan 14, 2021 · 1 comment

Comments

@MythicLionMan
Copy link

MythicLionMan commented Jan 14, 2021

I'm having an issue where relationships are causing the wrong object to be deleted. Unfortunately this is in a very complex code path, so I can't give any steps to reproduce the behaviour, but I can report my observations. While I can't guarantee that there are no bugs in our code, the behaviour I'm observing from Yap does seem inconsistent.

This is a pretty serious bug because it can cause seemingly random data deletions in a Yap app that uses relationships.

Mechanism

I tracked this down to [YapDatabaseTransaction removeObjectForCollectionKey:withRowId:] being called with the wrong row ID. I added the following code block to the start of the method and put a breakpoint in it.

    YapCollectionKey *storedKey = [self collectionKeyForRowid:rowid];
    if (![storedKey isEqual:cacheKey]) {
        int64_t correctRowID = 0;
        BOOL result = [self getRowid:&correctRowID forCollectionKey:cacheKey];
        
        NSLog(@"====rssr yap is removing rowid %ld. Caller thinks the key is %@ but it's actually %@. The correct rowid to delete is most likely %ld (fetch rowid result was %@)", rowid, cacheKey, storedKey, correctRowID, result ? @"TRUE": @"FALSE");
    }

When the issue happens I get the following log:

2021-01-14 18:14:12.558989-0500 Daylite[2351:1444308] ====rssr yap is removing rowid 9691. Caller thinks the key is <YapCollectionKey collection(RemoteMutationsReferences) key(72D89780-77D6-436F-BEEE-5024AA600D2F_1)> but it's actually <YapCollectionKey collection(accounts) key(6)>. The correct rowid to delete is most likely 0 (fetch rowid result was FALSE)

So it looks like the wrong row id is stored in the edge (it doesn't match the destination key)

The call to remove the object is in YapDatabaseRelationshipTransaction.m, at the end of "Enumerate all edges where source node is the deleted node" around line 3727:

								[databaseRwTransaction removeObjectForKey:edge->destinationKey
								                             inCollection:edge->destinationCollection
								                                withRowid:edge->destinationRowid];

My Steps To Reproduce

The workflow in our app is so complicated that it isn't possible to glean the real requirements for this issue from them, but I think there is some information about the preconditions.

  1. View the messages list (it's backed by a YapDatabaseAutoView).
  2. Start swiping to delete messages from the list.
  3. Each deletion will trigger a server sync that will push the deletion up to the server and then fetch any new data (There does not need to be any new data to trigger the bug however).
  4. After a certain number of messages are deleted one of them will come back. We do have code that will read back a deleted object if we fail to push the deletion to the server, but I haven't verified that this is the cause of the object coming back (due to some of the issues listed below).
  5. Swipe to delete the same message again. This second deletion succeeds locally, but when pushed to the server it causes the entire account object to be deleted while cleaning up some unrelated relationships.

Notes & Observations

  1. In my debug code above I try to fetch the correct rowID from the CollectionKey and it comes back as 0, so I'm guessing that the intended target of the edge has already been deleted.
  2. I don't think that the bug is caused when the edge is firing in step 5, I believe that it's actually in step 4 when the object comes back. If I quit the application between steps 4 and 5 the message that 'comes back' is not there, so it looks like it wasn't fetched back by our application and persisted, it's an in memory issue.
  3. If you look at the output of the debug log I put above it tries to resolve the real rowid of the destination, and it comes back as 0. I believe that the 'real' target of the relationship has already been deleted, but the machinery in edge processing doesn't realize this because of the rowid issue. (That's just a guess at this point). I have stepped through the call to fetch the rowid and the zero doesn't come from the cache and it does do a database query to attempt to retrieve the rowid.
  4. I have code that performs a rollback in some cases, but I did not see the log from this code at all in some cases where this issue has occurred.
  5. The edge that is firing was not created in the same transaction as the one that is deleting everything.
  6. While investigaing this we have noticed a few other inconsistencies where some of our objects were in an unexpected state. Broken references between objects that are updated in a single transaction. We weren't able to fully rule out a bug in our code, so I don't know if that issue is related to this one.
  7. If the message that 'came back' was caused by our sync code, the new message would have been assigned a new id, since we assign ids to objects sequentially and don't re-use them.
  8. We upgraded our version of Yap recently, and commits b3d4599 and dda1ece are both new to our repo. I don't believe that we had this problem before, but it is a bit hard to tell if this issue is happening or not.
  9. This issue is heavily timing dependant. I added a lot of logs to both our code and to Yap and the issue stopped happening. Clearly this is exposing an issue in some special case for concurrent code that doesn't get hit all the time.
  10. I can reproduce this issue with a debug build of Yap.
  11. The object that is incorrectly targeted by the edge is the root of our object hierarchy. Not only does this trigger the deletion of everything, I suspect that it is the first row in our database, which may be a clue to the mechanism of operation.

Workaround

As a workaround I'm going to try removing the edge that is firing and deleting the wrong objects. We don't need this particular relationship anymore, so hopefully it will prevent this issue. But that seems like a band-aid solution, since the underlying problem is likely to remain. I'm a bit worried about trusting the relationship extension.

@MythicLionMan
Copy link
Author

I've done some more work on this issue, and created some unit tests to reproduce it. The tests are here:

https://github.com/Marketcircle/YapDatabase/blob/UnitTestEdge/YapDatabaseTests/YapDatabaseTests.swift

I made the following observations:

  1. I discovered the bug in our code. We were resurrecting an object that we had deleted, which is what was triggering this bug. That wasn't technically incorrect use of Yap, but it was an error in our code. I also removed the relationships that were triggering the problem. But the underlying issue still remains that a Yap relationship can delete arbitrary objects.
  2. In the unit tests I was able to reproduce the issue by deleting an object and then creating an object with a relationship in the same transaction. The new object would re-use the rowid of the deleted object, which would result in the relationship thinking that the new object had been deleted because relationships are evaluated against row ids, not keys, and in this case the row id is changing.
  3. This is likely slightly different than the latest issue I encountered with this. In that issue the relationship was getting the destination row id wrong, whereas in the unit test it gets the source row id wrong.
  4. I found an issue that I filed in 2018, Cascading relationship deletes can select the wrong rule. #453 for the same bug. That issue references an issue from 2016, Relationship regression: Removing and inserting an object breaks the relationship #274 that identified this as an architectural issue in the relationship extension.
  5. I found another workaround for this issue. If the rowid column of the main database table is created with the AUTOINCREMENT option it avoids the problem by ensuring that row ids are never re-used. This is a bit of a hack, and I don't know if this is a fix for all issues caused by this pattern, but it is an easy solution while a redesign of the relationship extension is contemplated. (It also doesn't fix existing databases, but that would be possible).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant