provide for data retention and storage payment #40
Comments
I just mentioned space rental to @leithaus earlier this week; he said there's an architecture / design from 2018 that just hasn't been built yet. Meanwhile...
The last finalized state (LFS) has exactly the channels that could ever be accessed again, so a straightforward way to garbage collect is to stop a node and resume from the LFS. This doesn't address the time-cost of storage. But storage does get cheaper over time... at a conference of librarians I saw a presentation on "endowed publication" where they charged, say, $1000 to store 1 GB "forever" using interest on an endowment.
Garbage collection is not the issue here, though it is certainly important. The issue is reachable data that is ancient. URIs and deployerIds may be the tip of the iceberg for names in contracts and other data structures that are forever reachable.
I cannot think of a possible solution that does not require a hard fork, which is why I think we need to address this before the planned hard forks.
Before the planned hard forks seems quite ambitious. This seems like a Venus thing.
We lose all the history in the LFS, so we cannot distinguish the old from the new. Keeping the block number of last access allows mimicking natural memory, where old memories that are not reinforced are lost.
Greg has a different, emergent notion of time in mind. Keeping the block number seems an easy fix for now, with some other notion of time added later, but it seems it will be delayed. Greg does not seem to be considering last access. I am not convinced the storage-charging scheme being considered is viable.
Greg wants a natural notion of relative time ordering to emerge. I agree. I suggest that we temporarily use the last accessed block number (or block time) to represent that ordering. I expect it to be a long time before we need to improve on it, and it can enable the entropy required by natural systems. The common clock referenced is the blockchain itself. All validators need to agree on which tuples are expired, based on a maximum allowed age measured in blocks since last access. A tuple encountered with an earlier last access is removed rather than executed, so that the validator stays in sync with the others. A periodic sweep can remove expired tuples, shrinking the tuple space and backing them up if desired. There is the possibility of resurrecting expired tuples from backups. A particular block number on a shard is one type of time ordering on blockchains, and we can prepare to add other time ordering types later. Behavioural types can ultimately enable Greg's dream. On blockchains, block height is a natural ordering that is available immediately, without the overhead of its emergence. @tgrospic suggested we experiment and measure the overhead of keeping the last accessed block in the tuple space.
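To make the proposal above concrete, here is a minimal Python sketch of block-number-based expiry. It is not RChain or RSpace code: the `TupleSpace` class, the `max_age` window, and the in-memory `backup` dictionary are all assumptions introduced for illustration. The point it shows is that expiry depends only on the current block number and the recorded last access, so every validator reaches the same verdict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TupleSpace:
    """Toy tuple space that records the block number of last access (hypothetical)."""
    max_age: int  # assumed retention window, in blocks since last access
    live: Dict[str, Any] = field(default_factory=dict)
    last_access: Dict[str, int] = field(default_factory=dict)
    backup: Dict[str, Any] = field(default_factory=dict)

    def is_expired(self, channel: str, block: int) -> bool:
        # Deterministic rule: depends only on chain state, never on wall-clock time.
        return block - self.last_access[channel] > self.max_age

    def write(self, channel: str, data: Any, block: int) -> None:
        self.live[channel] = data
        self.last_access[channel] = block

    def read(self, channel: str, block: int) -> Optional[Any]:
        if channel not in self.live:
            return None
        if self.is_expired(channel, block):
            # An expired tuple is removed, not executed.
            self._expire(channel)
            return None
        self.last_access[channel] = block  # an access reinforces the "memory"
        return self.live[channel]

    def _expire(self, channel: str) -> None:
        self.backup[channel] = self.live.pop(channel)
        del self.last_access[channel]

    def sweep(self, block: int) -> List[str]:
        """Periodic sweep: move every expired tuple to the backup."""
        expired = sorted(c for c in self.live if self.is_expired(c, block))
        for channel in expired:
            self._expire(channel)
        return expired

    def resurrect(self, channel: str, block: int) -> bool:
        """Restore an expired tuple from the backup, if one was kept."""
        if channel not in self.backup:
            return False
        self.write(channel, self.backup.pop(channel), block)
        return True
```

Measuring the overhead that was suggested would amount to comparing a store with and without the `last_access` bookkeeping; this sketch says nothing about that cost in the real implementation.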
In my career I have often documented a problem that was inevitable, but often the warning wasn't heeded until the crisis occurred. Most often I had already developed a plan for recovery. However, if the size of tuple space becomes unmanageable and we do not have last access for tuples, I see no path to resolution. I think it is worth at least determining the overhead of keeping the last access for tuples in rspace. The argument that we can always add last accessed later, assuming that is true, is okay as long as we anticipate needing it well in advance of a crisis.
Introduction/Motivation/Abstract
It has been stated that RChain will eventually delete data that has not paid for continued storage. We cannot allow tuple space to grow without bound, holding data for reads and writes that are never likely to occur. Currently there is no distinction between items in tuple space: no record is kept of when they were created or last accessed.
There is no immediate need to solve all the retention issues at this time, but there is a need to ensure we are creating a tuple space that will allow for a retention policy in the future.
Design
A minimal implementation is a modification of tuple space to keep a least-recently-used list of tuples along with the block number of last access. Additional fields can be provided for in the future.
At some block height, the least recently used tuples could be dropped. Then, after each propose, data can be purged for the lowest retained block height plus one. The number of blocks of history always kept may be constant or inflationary, TBD.
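As a rough illustration of this design, the sketch below keeps a least-recently-used index ordered by last accessed block and purges the oldest entries after each propose. The `LruIndex` class, its method names, and the constant `retention_blocks` window are hypothetical, and it assumes `touch` is called with non-decreasing block numbers, as would be the case when following the chain.

```python
from collections import OrderedDict
from typing import List


class LruIndex:
    """Least-recently-used index of channels, oldest first (illustrative only)."""

    def __init__(self, retention_blocks: int) -> None:
        # Assumed constant window; the design leaves constant vs. inflationary TBD.
        self.retention_blocks = retention_blocks
        self._entries: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, channel: str, block: int) -> None:
        """Record an access and move the channel to the most-recent end."""
        self._entries[channel] = block
        self._entries.move_to_end(channel)

    def purge(self, proposed_block: int) -> List[str]:
        """After a propose, drop entries last accessed before the cutoff height."""
        cutoff = proposed_block - self.retention_blocks
        dropped = []
        while self._entries:
            channel, block = next(iter(self._entries.items()))
            if block >= cutoff:
                break  # everything after this entry is at least as recent
            self._entries.popitem(last=False)
            dropped.append(channel)
        return dropped
```

Because the index is ordered, each purge only inspects the entries it removes plus one, so the per-propose cost tracks the amount of data that ages out rather than the size of tuple space.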
In the future, keeping the deployid for tuples could enable refreshing deploys.
A comprehensive solution might include a Rholang extension allowing names to listen for the deletion event.
Alternatives
?