
Millions of rows #6

Open
Raynos opened this issue Sep 12, 2012 · 7 comments

@Raynos
Contributor

Raynos commented Sep 12, 2012

There is the millions-of-rows & memory issue.

I'm envisioning a document with a large number of rows, N, which are lazily loaded.

Then you can create limited Sets (with smarter queries defining what belongs to a set) which you know to be of a small size M.

Then you just need a way to synchronize changes to Sets without synchronizing the entire document.
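Roughly what I have in mind, as a sketch (everything below is hypothetical; nothing like a lazy document or query-defined sets exists in crdt today):

```js
// Hypothetical sketch -- none of these APIs exist in crdt yet.
var lazydoc = require('lazy-doc') // imaginary disk-backed document

// The document may hold millions of rows; only an index of keys
// lives in memory, and rows are loaded from disk on demand.
var doc = lazydoc.open('/data/rows')

// A Set is defined by a query, so its size M stays small even
// though the document's N is huge.
var active = doc.createSet(function (row) {
  return row.status === 'active'
})

active.on('add', function (row) { console.log('entered set', row.id) })
active.on('remove', function (row) { console.log('left set', row.id) })

// Only changes to rows in the set are replicated, not the
// whole document:
// active.createStream().pipe(remote).pipe(active.createStream())
```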

@dominictarr
Owner

There is already some support for replicating only a single item. Hmm... no, I think I removed that feature when I switched to using scuttlebutt.

Basically, you need to detect when an object enters a set and when it leaves.
Hmm. When it enters you might need to send the entire state (the significant events that created the current state).

When it leaves a set, a node can just forget that object, but when it's added you need to send all the events for that object. Hmm. You can't resend old events if they're past the scuttlebutt timestamp for that source... you could send the current snapshot, but that would not be eventually consistent with concurrent updates to that object...

I think you will need a special event that attaches the history for that update to the enter event, and then that is inserted into the model.
Hmm, crdt values are always {}, so you could make an array have a special meaning...

[key, [HISTORY], ts, source]

ts would be the latest change in HISTORY, the one which added the object to the set.
Hmm, yeah, I think this would work.
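To make that concrete, here is a sketch (the [key, changes, ts, source] tuple follows the discussion above; applyEnter/applyUpdate are imaginary helpers, and the merge is deliberately simplified):

```js
// A normal crdt update: the change payload is always a plain object.
var normal = ['row42', { status: 'active' }, 1347500000000, 'nodeA']

// Proposed enter event: an array payload carries HISTORY, the
// significant events that produced the row's current state.
var enter = [
  'row42',
  [ // HISTORY, oldest first
    ['row42', { id: 'row42', status: 'new' }, 1347400000000, 'nodeB'],
    ['row42', { status: 'active' }, 1347500000000, 'nodeA']
  ],
  1347500000000, // ts: the latest change in HISTORY, the one that
                 // added the row to the set
  'nodeA'        // source
]

// Simplified merge: apply one update's changes to a row's state.
function applyUpdate (model, u) {
  var row = model[u[0]] = model[u[0]] || {}
  for (var k in u[1]) row[k] = u[1][k]
}

// On receiving an enter event, replay HISTORY into the model so
// concurrent updates to the row still merge.
function applyEnter (model, event) {
  event[1].forEach(function (u) { applyUpdate(model, u) })
}

var model = {}
applyEnter(model, enter)
console.log(model.row42) // { id: 'row42', status: 'active' }
```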

@dominictarr
Owner

I agree, this would be a really cool feature.

@Raynos
Contributor Author

Raynos commented Sep 13, 2012

@dominictarr not just really cool, but fundamentally important.

Alternatively, we just write a DHT.

The dataset for an application will never fit into memory unless it is a) very well designed or b) serving a low number of users.

@dominictarr
Owner

Yes, but this is why it's called a crdt Document, and not a crdt Database.
If you are using a database like couchdb, then the client will only load a few records at a time.

So I figured you could always just partition the documents onto many processes,
each having a more reasonable number of rows. Is that what you mean by DHT?

It would be possible to scale it larger, but we'd have to rewrite it in something that's not javascript.

Hmm, perhaps if we just made the scuttlebutt functions all async, that could work too.
Probably a combination of these approaches is best.
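A minimal sketch of that partitioning idea, hashing a document id to one of several processes (the shard addresses are made up for illustration):

```js
var crypto = require('crypto')

// Each process owns one shard: one crdt Doc with a manageable
// number of rows, instead of a single giant document.
var shards = [
  'tcp://10.0.0.1:4001',
  'tcp://10.0.0.2:4001',
  'tcp://10.0.0.3:4001'
]

function shardFor (docId) {
  // A stable hash of the document id always picks the same process.
  var h = crypto.createHash('sha1').update(docId).digest()
  return shards[h.readUInt32BE(0) % shards.length]
}

console.log(shardFor('orders-2012-09')) // routes to one shard, deterministically
```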

@dominictarr
Owner

A DHT should be pretty easy, actually: use crdt to replicate the list of nodes (which will still be a relatively small document; a few hundred nodes is pretty big), and then keep a replica of a given set of crdt documents on a couple of nodes. Just like dynamo, but REALTIME.
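Roughly like this, assuming the node list itself comes from a small replicated crdt document (the ring below is a sketch of the placement, not an implementation):

```js
var crypto = require('crypto')

function hash (s) {
  return crypto.createHash('sha1').update(s).digest('hex')
}

// In practice this list would come from a small crdt Doc that
// every node replicates; hardcoded here for the sketch.
var nodes = ['node1', 'node2', 'node3', 'node4', 'node5']

// Consistent-hash ring: each node sits at hash(nodeId).
var ring = nodes
  .map(function (id) { return { id: id, pos: hash(id) } })
  .sort(function (a, b) { return a.pos < b.pos ? -1 : 1 })

// A document lives on the r nodes clockwise from hash(docId),
// like dynamo's preference list.
function replicasFor (docId, r) {
  var pos = hash(docId)
  var i = 0
  while (i < ring.length && ring[i].pos < pos) i++
  var out = []
  for (var k = 0; k < r; k++) out.push(ring[(i + k) % ring.length].id)
  return out
}

console.log(replicasFor('doc-123', 2)) // e.g. [ 'node4', 'node2' ]
```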

@Raynos
Contributor Author

Raynos commented Sep 13, 2012

@dominictarr a few hundred nodes is a few hundred concurrent customers.

I should go read the dynamo paper and read up on DHTs.

@CoderPuppy

👍
