
Millions of rows #6

Open
Raynos opened this issue Sep 12, 2012 · 7 comments

@Raynos
Contributor

Raynos commented Sep 12, 2012

There is the millions-of-rows & memory issue.

I'm envisioning a document with a large number of rows, N, which are lazily loaded.

Then you can create limited Sets (with smarter queries defining what belongs to a set) which you know to be of a small size M.

Then you just need a way to synchronize changes to Sets without synchronizing the entire document.
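Roughly what I have in mind, as a sketch (everything below is hypothetical; nothing like a lazy document or query-defined sets exists in crdt today):

```js
// Hypothetical sketch -- none of these APIs exist in crdt yet.
var lazydoc = require('lazy-doc') // imaginary disk-backed document

// The document may hold millions of rows; only an index of keys
// lives in memory, and rows are loaded from disk on demand.
var doc = lazydoc.open('/data/rows')

// A Set is defined by a query, so its size M stays small even
// though the document's N is huge.
var active = doc.createSet(function (row) {
  return row.status === 'active'
})

active.on('add', function (row) { console.log('entered set', row.id) })
active.on('remove', function (row) { console.log('left set', row.id) })

// Only changes to rows in the set are replicated, not the
// whole document:
// active.createStream().pipe(remote).pipe(active.createStream())
```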

@dominictarr
Owner

There is already some support for replicating only a single item. Hmm... no, I think I removed that feature when I switched to using scuttlebutt.

Basically, you need to detect when an object enters a set and when it leaves.
Hmm. When it enters you might need to send the entire state (the significant events that created the current state).

When it leaves a set, a node can just forget that object, but when it's added you need to send all the events for that object. Hmm. You can't resend old events if they're past the scuttlebutt timestamp for that source... you could send the current snapshot, but that would not be eventually consistent with concurrent updates to that object...

I think you will need a special event that attaches the history for that update to the enter event, and then that is inserted into the model.
Hmm, crdt values are always {}, so you could make an array have a special meaning...

[key, [HISTORY], ts, source]

ts would be the latest change in HISTORY, the one which added the object to the set.
Hmm, yeah, I think this would work.
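To make that concrete, here is a sketch (the [key, changes, ts, source] tuple follows the discussion above; applyEnter/applyUpdate are imaginary helpers, and the merge is deliberately simplified):

```js
// A normal crdt update: the change payload is always a plain object.
var normal = ['row42', { status: 'active' }, 1347500000000, 'nodeA']

// Proposed enter event: an array payload carries HISTORY, the
// significant events that produced the row's current state.
var enter = [
  'row42',
  [ // HISTORY, oldest first
    ['row42', { id: 'row42', status: 'new' }, 1347400000000, 'nodeB'],
    ['row42', { status: 'active' }, 1347500000000, 'nodeA']
  ],
  1347500000000, // ts: the latest change in HISTORY, the one that
                 // added the row to the set
  'nodeA'        // source
]

// Simplified merge: apply one update's changes to a row's state.
function applyUpdate (model, u) {
  var row = model[u[0]] = model[u[0]] || {}
  for (var k in u[1]) row[k] = u[1][k]
}

// On receiving an enter event, replay HISTORY into the model so
// concurrent updates to the row still merge.
function applyEnter (model, event) {
  event[1].forEach(function (u) { applyUpdate(model, u) })
}

var model = {}
applyEnter(model, enter)
console.log(model.row42) // { id: 'row42', status: 'active' }
```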

@dominictarr
Owner

I agree, this would be a really cool feature.

@Raynos
Contributor Author

Raynos commented Sep 13, 2012

@dominictarr not just really cool, but fundamentally important.

Alternatively, we just write a DHT.

The dataset for an application will never fit into memory unless it is a) very well designed or b) serving a low number of users.

@dominictarr
Owner

Yes, but this is why it's called a crdt Document, and not a crdt Database.
If you are using a database like couchdb, then the client will only load a few records at a time.

So I figured you could always just partition the documents onto many processes,
each having a more reasonable number of rows. Is that what you mean by DHT?

It would be possible to scale it larger, but we'd have to rewrite it in something that's not javascript.

Hmm, perhaps if we just made the scuttlebutt functions all async, that could work too.
Probably a combination of these approaches is best.
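A minimal sketch of that partitioning idea, hashing a document id to one of several processes (the shard addresses are made up for illustration):

```js
var crypto = require('crypto')

// Each process owns one shard: one crdt Doc with a manageable
// number of rows, instead of a single giant document.
var shards = [
  'tcp://10.0.0.1:4001',
  'tcp://10.0.0.2:4001',
  'tcp://10.0.0.3:4001'
]

function shardFor (docId) {
  // A stable hash of the document id always picks the same process.
  var h = crypto.createHash('sha1').update(docId).digest()
  return shards[h.readUInt32BE(0) % shards.length]
}

console.log(shardFor('orders-2012-09')) // routes to one shard, deterministically
```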

@dominictarr
Owner

A DHT should be pretty easy, actually: use crdt to replicate the list of nodes (which will still be a relatively small document; a few hundred nodes is pretty big), and then keep a replica of a given set of crdt documents on a couple of nodes. Just like dynamo, but REALTIME.
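Roughly like this, assuming the node list itself comes from a small replicated crdt document (the ring below is a sketch of the placement, not an implementation):

```js
var crypto = require('crypto')

function hash (s) {
  return crypto.createHash('sha1').update(s).digest('hex')
}

// In practice this list would come from a small crdt Doc that
// every node replicates; hardcoded here for the sketch.
var nodes = ['node1', 'node2', 'node3', 'node4', 'node5']

// Consistent-hash ring: each node sits at hash(nodeId).
var ring = nodes
  .map(function (id) { return { id: id, pos: hash(id) } })
  .sort(function (a, b) { return a.pos < b.pos ? -1 : 1 })

// A document lives on the r nodes clockwise from hash(docId),
// like dynamo's preference list.
function replicasFor (docId, r) {
  var pos = hash(docId)
  var i = 0
  while (i < ring.length && ring[i].pos < pos) i++
  var out = []
  for (var k = 0; k < r; k++) out.push(ring[(i + k) % ring.length].id)
  return out
}

console.log(replicasFor('doc-123', 2)) // e.g. [ 'node4', 'node2' ]
```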

@Raynos
Contributor Author

Raynos commented Sep 13, 2012

@dominictarr a few hundred nodes is a few hundred concurrent customers.

I should go read the dynamo paper and read up on DHTs.

@CoderPuppy

👍
