Non-Converging Operations #238
Comments
@dozyio I see you are writing in JS; which protobuf library are you using? One thing you may want to double-check is what happens on removal (I have seen it differ across libraries):
FYSA, Helia uses https://www.npmjs.com/package/protons
I can reproduce this... but it seems to work when dropProb = 0 at the beginning. I need to look further, but it might be that PrintDAG() is just printing blocks that have not been processed by the replicas (the tests seem to use a common blockstore?). So it prints the same DAG because the DAGSyncer does not really sync anything, as there is a single underlying blockstore. Instead, perhaps some entries in that blockstore were never broadcast. Why not? I'm not sure, but it may give you a lead to look into; otherwise I will look again when I have some time. There is a processedBlocks namespace, and perhaps PrintDAG should check against that list and print whether each block has been processed or not.
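(For illustration, a minimal Go sketch of that suggestion. The `isProcessed` helper, the `processedNS` key, and the key layout are assumptions; go-ds-crdt's real internals may differ.)

```go
package main

import (
	"context"
	"fmt"

	"github.com/ipfs/go-cid"
	ds "github.com/ipfs/go-datastore"
)

// isProcessed is a hypothetical helper: it checks whether a block's CID
// has been recorded under the replica's processed-blocks namespace. The
// key layout is an assumption; go-ds-crdt's internals may differ.
func isProcessed(ctx context.Context, store ds.Datastore, processedNS ds.Key, c cid.Cid) (bool, error) {
	return store.Has(ctx, processedNS.ChildString(c.String()))
}

// printNode shows how a PrintDAG-style debug helper could annotate each
// block with its processed status, rather than printing blocks that the
// replica never actually processed.
func printNode(ctx context.Context, store ds.Datastore, processedNS ds.Key, c cid.Cid) {
	processed, err := isProcessed(ctx, store, processedNS, c)
	if err != nil {
		fmt.Printf("%s: error checking processed status: %s\n", c, err)
		return
	}
	fmt.Printf("%s processed=%t\n", c, processed)
}
```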
This is a bug! Good find, @dozyio. I'm trying to come up with a fix.
I've had a few hours to try to debug this, but it starts getting quite messy and no luck as yet. From my limited understanding of AWOR (add-wins observed-remove) sets, I think the converged key should end up deleted, as the delete has the highest priority - is that correct?
The problem is that while r0 has tombstoned both writes, when the x3nq update arrives (it is the last one), it does not realize that the current value (r0-2, priority 3) has been tombstoned and that (r1-1, priority 1) is therefore the value that should take effect. Replica 1 writes r1-1 first and then never replaces it, as it processes the other updates later.
The fix needs to touch the value/priority entries, either when putting tombstones (to leave the correct value/priority in place) or when putting new values (to check that the current value/priority has not been tombstoned). I need to sleep on the approach...
Thanks for looking into it!
The problem: a replica may tombstone a value and then receive a write for that value which had happened BEFORE the tombstone, from a different replica. The final value/priority pair should be set to the value/priority that was not tombstoned; however, we did not do that. We knew the key was not tombstoned, but the value returned corresponded to an update that was tombstoned.

The solution: every time a tombstone arrives, we need to look for the "best value/priority"; that is, among all the values we have for the key, we need to set the best one that was not tombstoned, according to the CRDT rules (highest priority, or lexicographical sorting when priorities are equal).

The consequences: this makes tombstoning a more expensive operation, but it also allows us to remove the value/priority entry altogether when all the values have been tombstoned. As such, we no longer need to check whether a value has been tombstoned when doing Gets/Lists before returning the element. That saves lookups, and it also means we no longer need the bloom filter, which was supposed to speed up that operation. In general, datastores that mostly add data will perform better afterwards.
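As a rough sketch of that selection rule (a hypothetical `entry` type and names, not go-ds-crdt's actual code; ties are assumed here to be broken over the value bytes):

```go
package main

import "bytes"

// entry is an illustrative stand-in for a stored value/priority pair,
// plus the ID of the delta that wrote it (used for tombstone lookups).
type entry struct {
	id       string
	value    []byte
	priority uint64
}

// bestValue applies the rule described above: among the entries for a
// key that have NOT been tombstoned, keep the one with the highest
// priority, breaking ties by the lexicographically greater value. The
// second return is false when every write was tombstoned, i.e. the
// key's value/priority entry can be removed entirely.
func bestValue(entries []entry, tombstoned func(id string) bool) (best entry, found bool) {
	for _, e := range entries {
		if tombstoned(e.id) {
			continue
		}
		if !found ||
			e.priority > best.priority ||
			(e.priority == best.priority && bytes.Compare(e.value, best.value) > 0) {
			best, found = e, true
		}
	}
	return best, found
}
```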
The fix for #238 does not mean that everything is fine. We will have databases in which the wrong value/priority pairs were written, and this would only fix itself on new writes or deletes to the same key. So we are unfortunately forced to fix it manually on start. For this we introduce a data migration: on a fresh start, we find all the keys affected by tombstones and loop over them, finding the best (correct) value for each and fixing it. Once done, we record that we are on version=1 and don't run this again. Future fuckups can be fixed with other migrations.
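A sketch of what such a startup migration could look like, with hypothetical helpers (`keysWithTombstones`, `bestSurvivingValue`, `valueKey`) standing in for go-ds-crdt's internal key layout:

```go
package main

import (
	"context"

	ds "github.com/ipfs/go-datastore"
)

// Hypothetical helpers standing in for go-ds-crdt's internal key
// layout; their real equivalents would scan the set's tombstone and
// value namespaces.
var (
	keysWithTombstones func(context.Context, ds.Datastore) ([]ds.Key, error)
	bestSurvivingValue func(context.Context, ds.Datastore, ds.Key) (value []byte, ok bool, err error)
	valueKey           func(ds.Key) ds.Key
)

// migrate0to1 sketches the described migration: for every key affected
// by tombstones, recompute the best surviving value/priority, rewrite
// (or remove) the stored entry, and finally record version=1 so the
// migration never runs again.
func migrate0to1(ctx context.Context, store ds.Datastore) error {
	keys, err := keysWithTombstones(ctx, store)
	if err != nil {
		return err
	}
	for _, k := range keys {
		best, ok, err := bestSurvivingValue(ctx, store, k)
		if err != nil {
			return err
		}
		if !ok {
			// Every write was tombstoned: drop the entry entirely.
			if err := store.Delete(ctx, valueKey(k)); err != nil {
				return err
			}
			continue
		}
		if err := store.Put(ctx, valueKey(k), best); err != nil {
			return err
		}
	}
	return store.Put(ctx, ds.NewKey("/version"), []byte{1})
}
```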
Fix #238: Non-converging operations with unordered deletes. Migration to fix existing datastores.
Hi,
I'm currently working on a JS port and encountered an edge case during interop testing where the values do not converge as expected, despite the DAGs being identical.
When performing a sequence of `put`, `put`, `delete` operations on replica0, and a single `put` on replica1 (all on the same key), replica0 reverts to the second `put` operation, while replica1 retains its initial `put`. This causes the replicas to hold different values even though their DAGs are the same. Any ideas?
Test case below
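(The original test case is not reproduced here. A minimal Go sketch of the described sequence, assuming a hypothetical two-replica harness over the datastore interface that go-ds-crdt implements, could look like this:)

```go
package main

import (
	"bytes"
	"context"
	"testing"

	ds "github.com/ipfs/go-datastore"
)

// Hypothetical harness: two CRDT replicas wired so each broadcasts to
// the other, plus a helper that blocks until their DAGs have synced.
// Neither exists in go-ds-crdt's API; they stand in for the test setup.
var (
	newConnectedReplicas func(t *testing.T) (r0, r1 ds.Datastore)
	waitForSync          func(t *testing.T, r0, r1 ds.Datastore)
)

func TestUnorderedDeleteConverges(t *testing.T) {
	ctx := context.Background()
	r0, r1 := newConnectedReplicas(t)
	k := ds.NewKey("/k")

	// replica0: put, put, delete on the same key.
	if err := r0.Put(ctx, k, []byte("v1")); err != nil {
		t.Fatal(err)
	}
	if err := r0.Put(ctx, k, []byte("v2")); err != nil {
		t.Fatal(err)
	}
	if err := r0.Delete(ctx, k); err != nil {
		t.Fatal(err)
	}

	// replica1: a single concurrent put on the same key.
	if err := r1.Put(ctx, k, []byte("other")); err != nil {
		t.Fatal(err)
	}

	waitForSync(t, r0, r1)

	// Both replicas must agree. In the reported bug, replica0 reverted
	// to "v2" while replica1 kept "other".
	v0, err0 := r0.Get(ctx, k)
	v1, err1 := r1.Get(ctx, k)
	if (err0 == nil) != (err1 == nil) || !bytes.Equal(v0, v1) {
		t.Fatalf("replicas diverged: %q (%v) vs %q (%v)", v0, err0, v1, err1)
	}
}
```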