initial implementation of the dagstore. #20

Merged: 17 commits merged from raulk/dagstore-initial into master on Jul 5, 2021
Conversation

@raulk (Member) commented Jul 1, 2021

🐉🐉🐉🐉🐉🐉 BEWARE OF DRAGONS!


This PR contributes an initial implementation of the DAG store.

Concurrency model

The DAG store operates on an event loop to perform mutations to shard state. This is important because it allows shard operations to be non-blocking (although we can provide a blocking stub on top of this async plumbing). In other words, the user does not need to wait for a shard registration to return before asking to acquire that shard. I explored and discarded numerous alternatives (a minimal sketch of the event-loop model follows the list), such as:

  • using locks all over the place; prone to concurrency bugs and a nightmare to debug races.
  • using per-shard goroutines; this is wasteful, as registering 1MM shards would immediately explode into 1MM goroutines (which is quite feasible and expected for Filecoin miners: 1 deal = 1 shard). Furthermore, the majority of those goroutines would be idle most of the time.
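To make the model concrete, here is a toy, lock-free sketch of the pattern; the task, store and control names are illustrative, not the actual dagstore types:

// Toy sketch of the event-loop concurrency model: a single goroutine
// owns all shard state, so no locks are needed; callers submit tasks
// and receive results on channels they supply.
type task struct {
	op  string     // e.g. "register", "acquire", "destroy"
	key string     // toy stand-in for shard.Key
	out chan error // caller-supplied result channel
}

type store struct {
	external chan *task        // intake of external tasks (public API)
	shards   map[string]string // toy shard state, touched only by the event loop
	done     chan struct{}
}

// control is the event loop: the only goroutine that mutates shard state.
func (s *store) control() {
	for {
		select {
		case t := <-s.external:
			s.shards[t.key] = t.op // lock-free state mutation
			t.out <- nil           // deliver the outcome on the caller's channel
		case <-s.done:
			return
		}
	}
}

With this shape, registering 1MM shards costs 1MM map entries plus queued tasks, rather than 1MM mostly idle goroutines.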

Finite state machine

Shards go through a finite state machine. It would be worthwhile to formalise this FSM.
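For illustration only, the states could be enumerated along these lines; ShardStateIndexing and ShardStateFetched appear in the diff, the remaining names are placeholders:

type ShardState int

const (
	ShardStateNew       ShardState = iota // registered, nothing fetched yet
	ShardStateFetching                    // transient copy being fetched from the mount
	ShardStateFetched                     // transient copy available locally
	ShardStateIndexing                    // index being generated from the CAR data
	ShardStateAvailable                   // indexed and ready to be acquired
	ShardStateErrored                     // a terminal error occurred
)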

Public API

The public API is quite simple:

RegisterShard(key shard.Key, mnt mount.Mount, out chan ShardResult, opts RegisterOpts) error
AcquireShard(key shard.Key, out chan ShardResult, _ AcquireOpts) error
DestroyShard(key shard.Key, out chan ShardResult, _ DestroyOpts) error

The caller supplies a channel on which they want the result delivered, when available. This is better and more flexible than having the DAG store return a channel, since it enables the caller to funnel all results into a single channel with a single goroutine servicing it.

The caller can also pass a per-call channel if they so desire. Moreover, we can easily create sugar *Sync() method counterparts that do this behind the scenes, giving the appearance of a synchronous interface.
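A hypothetical caller-side sketch (registerAndAcquire is an invented helper; the only signatures it calls are the three listed above):

func registerAndAcquire(ds *DAGStore, k shard.Key, mnt mount.Mount) error {
	results := make(chan ShardResult, 32)

	// a single goroutine services every outcome
	go func() {
		for res := range results {
			// handle registration/acquisition results as they arrive
			_ = res
		}
	}()

	// a nil error only means the task was queued; the real outcome
	// is delivered later on the results channel
	if err := ds.RegisterShard(k, mnt, results, RegisterOpts{}); err != nil {
		return err
	}

	// no need to wait for registration to complete before asking to acquire
	return ds.AcquireShard(k, results, AcquireOpts{})
}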

Mount upgrader

This PR contributes a "mount upgrader". This is in turn a Mount that acts as a locally cached copy of a remote mount (e.g. Lotus, HTTP, FTP, NFS, etc.)
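Roughly, and with the type name assumed (only the underlying and transient fields are visible in the diff excerpts below):

// Sketch: the upgrader wraps an underlying (typically remote) mount and
// keeps a local transient copy that it serves reads from when possible.
type Upgrader struct {
	underlying mount.Mount // the wrapped remote mount (Lotus, HTTP, FTP, NFS, ...)
	transient  string      // path of the local transient copy, if any
}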

Indices

I was able to remove the FullIndex abstraction, since the CAR library already provides an index.Index interface that we can use directly.

TODO

This is an initial, unstable, buggy implementation to get the ball rolling. There are many features still missing.

@iand commented Jul 2, 2021

I suggest that the public API accept contexts to facilitate request tracing, metrics and logging.

Comment on lines +114 to +118
if stat, err := u.underlying.Stat(ctx); err != nil {
	return fmt.Errorf("underlying mount stat returned error: %w", err)
} else if !stat.Exists {
	return fmt.Errorf("underlying mount no longer exists")
}
Contributor:
Can we rely on Fetch to return these errors if the file doesn't exist?

raulk (Member Author):
What's the rationale?

Comment on lines +63 to +74
if u.transient != "" {
	if _, err := os.Stat(u.transient); err == nil {
		return os.Open(u.transient)
	}
}

// transient appears to be dead, refetch.
if err := u.refetch(ctx); err != nil {
	return nil, err
}

return os.Open(u.transient)
Contributor:
This feels a little racy - can we return a file handle from u.refetch()?

raulk (Member Author):
What's racy? The body of this method and refetch are both performed under a lock.

We now have three "queues":

1. internal: tasks emitted by the event loop, buffer=1.
2. completion: tasks emitted by asynchronous jobs.
3. external: intake of external tasks through public API.

We now drain internal first; then move on to completion and external.
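Illustratively, the event loop can honour that priority along these lines; d.processTask is a hypothetical helper, while the channel fields match the constructor snippet further down:

for {
	// 1. drain internally staged tasks first
	select {
	case tsk := <-d.internalCh:
		d.processTask(tsk)
		continue
	default:
	}

	// 2. then completions emitted by async jobs
	select {
	case tsk := <-d.completionCh:
		d.processTask(tsk)
		continue
	default:
	}

	// 3. otherwise block until any task (or shutdown) arrives
	select {
	case tsk := <-d.internalCh:
		d.processTask(tsk)
	case tsk := <-d.completionCh:
		d.processTask(tsk)
	case tsk := <-d.externalCh:
		d.processTask(tsk)
	case <-d.ctx.Done():
		return
	}
}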
@aarshkshah1992 (Contributor) left a comment:
@raulk First review. Looking great! I think I'm convinced that having the caller pass in the (context, result channel) is the way to go.

dagstore.go
case OpShardIndex:
	s.state = ShardStateIndexing
	go func(ctx context.Context, mnt mount.Mount) {
		reader, err := mnt.Fetch(ctx)
Contributor:

Use the WaitGroup to refcount all goroutines spawned by the DAG store.

raulk (Member Author):

I'm not sure we want to do this for async goroutines, as not all operations are cancellable (e.g. CARv2 stuff).

dagstore.go
shards: make(map[shard.Key]*Shard),
externalCh: make(chan *Task, 128), // len=128, concurrent external tasks that can be queued up before putting backpressure.
internalCh: make(chan *Task, 1), // len=1, because eventloop will only ever stage another internal event.
completionCh: make(chan *Task, 64), // len=64, hitting this limit will just make async tasks wait.
Contributor:
Why do we need a separate completionCh to send requests produced by async computations to the control loop, when we could probably use the externalCh for it? Is it to make the code easier to reason about? I don't really see a difference between the two in terms of use.

raulk (Member Author):
We don't need it right now, but it separates concerns nicely, and it makes the code more future-proof in case we do need to process tasks differently going forward.

dagstore.go

func (d *DAGStore) acquireAsync(acqCh chan ShardResult, s *Shard, mnt mount.Mount) {
	k := s.key
	reader, err := mnt.Fetch(d.ctx)
Contributor:
For posterity:

Even if the shard is remote, this will not always result in actually fetching the shard data from a remote location, as the upgraded mount here serves the retrieval from a local transient copy if it can, before resorting to fetching from the remote.

dagstore.go

case OpShardFetchDone:
	s.state = ShardStateFetched
	if !s.indexed {
Contributor:
What purpose does this indexed field serve? Is the idea that indices will be kept around even when shards are destroyed, and therefore we wouldn't want to reindex on re-registration? If that is the case, until when will we keep those indices around?

raulk (Member Author):
I conceived the Fetching state as a general-purpose state for when the shard's contents are being fetched from the mount. That means that when the shard has been fully registered, but the local transient goes missing and a request is made, the shard would move to this state. Not sure if we'll end up using it this way after all; we can revisit later.

@raulk (Member Author) commented Jul 5, 2021

We have enough feedback here. I'm going to merge this PR and we can iterate on it. There's plenty to do still.

@raulk raulk merged commit 707a16b into master Jul 5, 2021
@raulk raulk deleted the raulk/dagstore-initial branch July 5, 2021 18:11