in #100 I added a "data validation" task, but I'm breaking it out into a bigger issue here (#23 is also related) to capture my current thinking on data consistency and recovery from disk corruption or loss. I think that we should skip data validation for now, but have some specific recommendations at the end.
Some things to think about:
- the chain tail has corrupted data for an offset, but other chain members don't. this blocks reads of that offset even though it could be corrected.
- a non-tail node has corrupted data, which doesn't block reads, but silently reduces replication for long-term storage.
- a node comes up clean and must recover. we need to do this ASAP, but I think it's not bad to write with existing primitives once we add a 'verify CRC mode' to the client.
- nodes may have different index and log size settings, making bulk comparison difficult or impossible. that is, we can't simply md5 the whole file (which is fast), but must scan and compare per-value (and possibly across the whole chain), which can be slow.
- nodes may be at different stages of garbage collection, which also makes bulk comparison difficult or impossible, for the same reason as above. strongly consistent metadata could help here.
- it's not clear to me where the right place to check the CRC is. checking on the server is expensive and centralized, but checking at the client means the error is detected far from its source, which can make fixing it unclear.
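to make the per-value scan concrete, here's a minimal sketch of what a client-side 'verify CRC mode' pass could look like. none of these names or the record layout come from the actual storage format; they're placeholders assuming each value is stored next to the CRC32 computed at write time:

```python
import zlib

def verify_record(offset, value, stored_crc):
    """Return True if the value at this offset matches its stored CRC."""
    return zlib.crc32(value) == stored_crc

def scan_for_corruption(records):
    """Scan per-value, as a client in 'verify CRC mode' might.

    `records` is an iterable of (offset, value, stored_crc) tuples.
    Yields the offsets whose payloads fail their checksum.
    """
    for offset, value, stored_crc in records:
        if not verify_record(offset, value, stored_crc):
            yield offset

# Example: one deliberately corrupted record out of three.
good = b"hello"
records = [
    (0, good, zlib.crc32(good)),
    (1, b"wxrld", zlib.crc32(b"world")),  # bit rot: payload no longer matches
    (2, good, zlib.crc32(good)),
]
print(list(scan_for_corruption(records)))  # → [1]
```

this is exactly the slow path described above: linear in the number of values, rather than one hash over the whole file.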
Plan of action?
Things to do:
- clean node recovery
- single offset repair
- AAE (active anti-entropy)
clean node recovery is a bit of a project, but fairly pressing if we ever plan to grow chains. it might be possible to sidestep it by never growing chains and instead draining them into bigger chains, but for disaster recovery we should have it anyway. I think fully specifying the project is outside the scope of this issue, but the primary questions are: detecting the need for recovery, how to report to clients while we're repairing, and how we decide which value is the correct one to repair from.
simple single offset repair would be nice to have for "read-repair" style issues, but I think an initial version could be designed to be triggered manually. we also need to decide how to choose the right value, as above.
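one possible policy for "choosing the right value" is sketched below: gather the copies of the offset held across the chain, keep only the ones whose CRC still verifies, and prefer the most common surviving copy. this is an assumption about how repair could work, not the project's actual design; every name here is hypothetical:

```python
import zlib
from collections import Counter

def choose_repair_value(replica_values, stored_crc):
    """Pick a repair value for one offset from the copies across the chain.

    Sketch only: keep the copies whose CRC32 still verifies, then take
    the most common one. Returns None if every copy is corrupt, in which
    case the offset is unrecoverable from within the chain.
    """
    verified = [v for v in replica_values if zlib.crc32(v) == stored_crc]
    if not verified:
        return None
    return Counter(verified).most_common(1)[0][0]

# The tail holds a corrupted copy; two upstream nodes still agree.
good = b"payload"
crc = zlib.crc32(good)
print(choose_repair_value([good, good, b"paylxad"], crc))  # → b'payload'
```

note that when a stored CRC is available, any single verifying copy is sufficient; the majority vote only matters if repair ever has to run without trusted checksums.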
AAE is a big feature but important over the long run. standard merkle-tree implementations seem to work well enough; keeping them from thrashing the cache or CPU too badly is the primary concern here, I think.
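for reference, the standard merkle-tree approach amounts to hashing fixed ranges of offsets into leaves and comparing trees between nodes, so two replicas only need to exchange a handful of hashes to localize a divergent range. a minimal sketch (illustrative only; a real implementation would walk the tree top-down instead of comparing the whole leaf level):

```python
import hashlib

def leaf_hashes(values, chunk=4):
    """Hash fixed-size runs of offsets; each run becomes one merkle leaf."""
    leaves = []
    for i in range(0, len(values), chunk):
        h = hashlib.sha256()
        for v in values[i:i + chunk]:
            h.update(v)
        leaves.append(h.digest())
    return leaves

def merkle_root(leaves):
    """Fold leaves pairwise up to a single root hash."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the odd leaf out
        level = [hashlib.sha256(a + b).digest()
                 for a, b in zip(level[::2], level[1::2])]
    return level[0]

def divergent_ranges(local, remote, chunk=4):
    """Compare two nodes' logs; return the indexes of mismatched chunks."""
    la, lb = leaf_hashes(local, chunk), leaf_hashes(remote, chunk)
    return [i for i, (a, b) in enumerate(zip(la, lb)) if a != b]

# Two replicas that differ in exactly one value (offset 5).
local = [f"v{i}".encode() for i in range(8)]
remote = list(local)
remote[5] = b"corrupt"
print(divergent_ranges(local, remote))  # → [1]
```

if the roots match, the comparison stops after one hash exchange; only mismatching subtrees get walked further, which is what keeps AAE cheap in the common no-corruption case. the thrashing concern above is about how aggressively the leaf hashes are (re)computed over cold data.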
I don't know how to prioritize these. recovery can be done semi-manually by copying files and then letting write repair catch the chain up; single offset repair similarly. and it's not clear whether there is demand for AAE: does the data we care about live long enough, or are people mostly aging out of their retention window before data corruption is likely?