
Auto-throttling plus many improvements & fixes. #694

Merged
merged 28 commits into from
Oct 28, 2020

Conversation

@jsign (Contributor) commented Oct 28, 2020

This PR contains multiple improvements and updates.

Lotus load handling:

  • Add a flag/env var to cap the maximum number of deal preparations executed in parallel. Deal preparation (piece-size calculation and CommP calculation) is the most resource-intensive task for Lotus, so limiting this specific part surgically lets us raise the total parallel-job limits more safely.
  • Powergate now logs a warning if it detects that Lotus is falling behind in syncing. This helps operators notice that something might be going wrong, or simply be aware of the fact, e.g.: WARN lotus-client lotus/metrics.go:93 Louts behind in syncing with height diff 39, todo: 6. The message also includes the height difference and the remaining progress (similar to lotus sync wait).
  • Leveraging the above mechanism, Powergate will block any new deal-making to let the Lotus node recover. This is an important feature since it throttles load dynamically instead of relying on a hard number from flags/envs, which is hard to tune correctly. When this happens, the operator will see log messages indicating it, e.g.: WARN ffs-filcold filcold/filcold.go:296 backpressure from unsynced Lotus node.

To be clear about the last two points, since they might look similar: the former is a general monitoring control that warns whenever the Lotus node falls behind (even if no deals are being made). The latter indicates that Powergate is trying to make deals but is throttling because it detected the Lotus node is syncing; once the node is back in sync, Powergate continues with its work.
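The throttling decision described above can be sketched roughly as follows. This is an illustrative sketch only: syncMonitor, shouldThrottleDeals, and maxAllowedHeightDiff are assumed names, not the actual Powergate identifiers.

```go
package main

import "fmt"

// syncMonitor is an illustrative stand-in for the component that caches
// the Lotus sync status (names are assumptions, not Powergate's API).
type syncMonitor struct {
	heightDiff int64 // cached height difference vs. the chain head
}

func (m *syncMonitor) SyncHeightDiff() int64 { return m.heightDiff }

// Assumed threshold: how far behind Lotus may be before deals are paused.
const maxAllowedHeightDiff = 10

// shouldThrottleDeals reports whether new deal-making should pause until
// the Lotus node catches up with the chain head.
func shouldThrottleDeals(m *syncMonitor) bool {
	return m.SyncHeightDiff() > maxAllowedHeightDiff
}

func main() {
	m := &syncMonitor{heightDiff: 39}
	if shouldThrottleDeals(m) {
		fmt.Println("backpressure from unsynced Lotus node")
	}
}
```

The key property is that the check reads a cached value, so throttling decisions add no extra load on the Lotus node itself.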

Enhancements:

  • Stop using our custom calculatePieceSize logic. Originally we needed it because there was no API to resolve the piece size; now we use the existing Lotus API, which should keep size calculation future-proof.
  • Allow tuning how many retries are attempted before considering a Lotus connection an error, with a large default. The large default helps when the underlying Lotus node is restarted or has a transient problem: we give Lotus time to recover instead of breaking whatever flow Powergate was in, which in many cases would be undesirable.
  • Work around the known problem of ClientGetDealInfo not reporting active deals (until the real problem is fixed in Lotus).

Indices:

  • Include MaxPieceSize information in the Ask index.

Miner selection:

  • Consider miners' MinPieceSize and MaxPieceSize in reptop and sr2 miner selection, to avoid selecting miners that would reject the deal based on its data size.
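The bounds check amounts to something like the following sketch (minerAsk and filterByPieceSize are hypothetical names for illustration, not the actual selector code):

```go
package main

import "fmt"

// minerAsk is a hypothetical struct holding a miner's ask bounds.
type minerAsk struct {
	Miner        string
	MinPieceSize uint64
	MaxPieceSize uint64
}

// filterByPieceSize keeps only miners whose ask bounds admit the piece size.
func filterByPieceSize(asks []minerAsk, pieceSize uint64) []minerAsk {
	var out []minerAsk
	for _, a := range asks {
		// Note the ||: a miner is skipped when the piece is too small OR too big.
		if pieceSize < a.MinPieceSize || pieceSize > a.MaxPieceSize {
			continue
		}
		out = append(out, a)
	}
	return out
}

func main() {
	asks := []minerAsk{{"m1", 10, 100}, {"m2", 50, 60}}
	fmt.Println(filterByPieceSize(asks, 30)) // only m1 admits a 30-byte piece
}
```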

Chore:

  • Update the Lotus client to v1.1.2.
  • Update Lotus docker image to v1.1.2.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign added enhancement New feature or request patch labels Oct 28, 2020
@jsign jsign added this to the Sprint 48 milestone Oct 28, 2020
@jsign jsign self-assigned this Oct 28, 2020
@@ -54,6 +54,7 @@ func askFromPbAsk(a *rpc.StorageAsk) ask.StorageAsk {
return ask.StorageAsk{
Price: a.GetPrice(),
MinPieceSize: a.GetMinPieceSize(),
MaxPieceSize: a.GetMaxPieceSize(),
Contributor Author:

Include max piece size in the ask index.

@@ -64,8 +64,7 @@ import (
)

const (
datastoreFolderName = "datastore"
lotusConnectionRetries = 10
Contributor Author:

Now a config attribute. Line 115.

Comment on lines -159 to +160
lotus.MonitorLotusSync(clientBuilder)
lsm, err := lotus.NewSyncMonitor(clientBuilder)
Contributor Author:

What was previously a background daemon generating Prometheus metrics about Lotus sync height for the Grafana dashboard is now a more formal component that provides Lotus syncing information for decision making.

@@ -250,7 +254,7 @@ func NewServer(conf Config) (*Server, error) {
if conf.Devnet {
conf.FFSMinimumPieceSize = 0
}
cs := filcold.New(ms, dm, ipfs, chain, l, conf.FFSMinimumPieceSize)
cs := filcold.New(ms, dm, ipfs, chain, l, lsm, conf.FFSMinimumPieceSize, conf.FFSMaxParallelDealPreparing)
Contributor Author:

This new component is used by Cold Storage, plus a new flag that indicates how many data-intensive tasks we throw at Lotus in parallel. (More on this later.)

defaultDealStartOffset = 72 * 60 * 60 / util.EpochDurationSeconds // 72hs
defaultDealStartOffset = 48 * 60 * 60 / util.EpochDurationSeconds // 48hs
Contributor Author:

The right value for this is still a question mark.
Eventually we could move it to the Storage Config and give people the chance to play with it, but most probably they won't know the consequences yet.

Comment on lines +99 to +101
if f.PieceSize < uint64(sask.MinPieceSize) || f.PieceSize > uint64(sask.MaxPieceSize) {
log.Warnf("skipping miner %s since needed piece size %d doesn't fit bounds (%d, %d)", miners[i], f.PieceSize, sask.MinPieceSize, sask.MaxPieceSize)
}
Contributor Author:

Same with SR2 miner selector.

@@ -28,6 +28,7 @@ func (s *RPC) Get(ctx context.Context, req *GetRequest) (*GetResponse, error) {
storage[key] = &StorageAsk{
Price: ask.Price,
Contributor Author:

@asutula, I just did the usual RPC/CLI changes to stay consistent, but this will go away anyway in your changes.

Member:

👍

)

// SyncMonitor provides information about the Lotus
Contributor Author:

TL;DR: SyncMonitor analyzes the sync status every 2 minutes and provides a cached value to whoever wants to know about it.

Comment on lines +58 to +62
func (lsm *SyncMonitor) SyncHeightDiff() int64 {
lsm.lock.Lock()
defer lsm.lock.Unlock()
return lsm.heightDiff
}
Contributor Author:

Provide the cached value: we don't want callers asking Lotus about its sync status and adding load to it.
Since the syncing process is rather slow, a value up to 2 minutes old is usually good enough for decision making.

Comment on lines +135 to +138
lsm.lock.Lock()
lsm.heightDiff = maxHeightDiff
lsm.remaining = remaining
lsm.lock.Unlock()
Contributor Author:

The Lotus syncing process has many workers, each trying to sync a different possible branch.
I decided to be pessimistic and consider the worst-case scenario; I think that's the best decision.
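Taking the worst case over the sync workers amounts to a simple maximum, sketched below (maxHeightDiff is an illustrative helper, not the actual code):

```go
package main

import "fmt"

// maxHeightDiff returns the largest height difference reported by any sync
// worker: the pessimistic (worst-case) view of how far behind the node is.
func maxHeightDiff(workerDiffs []int64) int64 {
	var max int64
	for _, d := range workerDiffs {
		if d > max {
			max = d
		}
	}
	return max
}

func main() {
	fmt.Println(maxHeightDiff([]int64{3, 39, 6})) // prints 39
}
```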

@jsign jsign changed the title Many improvements & fixes. Auto-throttling plus many improvements & fixes. Oct 28, 2020
@jsign jsign requested a review from asutula October 28, 2020 15:22
@jsign jsign marked this pull request as ready for review October 28, 2020 15:22
@asutula (Member) left a comment:

This is very exciting, no doubt that things are going to work much much better than before.

I found one actual bug you need to fix, then just left a couple comments/questions for my own understanding of things.

Very nice work.

return nil, fmt.Errorf("client get deal info: %s", err)
}

// Workaround ClientGetDealInfo giving unreliable status.
Member:

Very nice.

// In the next Lotus version (v1.1.3) a new API will be used which also calculates
// CommP, so we can help Lotus avoid recalculating size and CommP when deals are made.
select {
case fc.semaphDealPrep <- struct{}{}:
Member:

I am sure I'm just missing some understanding about how this works, but I would have expected to see some goroutine spawned off after the fc.semaphDealPrep is used here to block execution until a slot opens up. As is, it looks to me like calls to calculatePieceSize are run serially. Like I said, I'm sure I just don't understand, but I'd like to understand this because it seems awesome!

Contributor Author:

When the scheduler decides to run a Job, it spawns a new goroutine and lets that goroutine execute all the logic.

This means that at this point, while executing the cold storage logic, the goroutine only needs to wait for the "OK" of this semaphore to keep doing its work.

Looking at the bigger picture, multiple goroutines (Jobs) may be waiting on this semaphore to be allowed to move on to executing the heavy stuff.

The nice part of this design is that you shouldn't think of the job-execution methods as a single goroutine doing things for multiple Jobs; you can always assume you're doing things for one Job, already in its own goroutine.

That's why calculatePieceSize looks serial: it is serial within a single Job execution, but there are multiple goroutines, one per executing Job. Whatever buffer this semaphore allows is the number of Jobs calculating their size in parallel.

Makes sense?
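The pattern described above, one goroutine per Job with a buffered channel acting as a semaphore, can be sketched as follows. Names are illustrative (runJobs is not an actual Powergate function), and the atomic peak tracking exists only to make the cap observable.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runJobs launches one goroutine per Job; the buffered channel caps how many
// of them may do the heavy deal-preparation work at once. It returns the
// number of finished jobs and the peak observed parallelism.
func runJobs(numJobs, maxParallel int) (finished int32, peak int32) {
	semaph := make(chan struct{}, maxParallel)
	var current int32
	var wg sync.WaitGroup
	for i := 0; i < numJobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			semaph <- struct{}{} // block here until a slot frees up
			cur := atomic.AddInt32(&current, 1)
			for { // record the highest concurrency seen (for illustration)
				p := atomic.LoadInt32(&peak)
				if cur <= p || atomic.CompareAndSwapInt32(&peak, p, cur) {
					break
				}
			}
			// ... heavy piece-size / CommP work would happen here ...
			atomic.AddInt32(&current, -1)
			atomic.AddInt32(&finished, 1)
			<-semaph // release the slot
		}()
	}
	wg.Wait()
	return finished, peak
}

func main() {
	finished, peak := runJobs(5, 2)
	fmt.Printf("finished=%d, peak parallelism capped at 2: %v\n", finished, peak <= 2)
}
```

Each goroutine's code path stays serial and easy to read, while the semaphore bounds how many Jobs hit the heavy section at the same time.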

Member:

Yea, that totally makes sense. I forgot that goroutines for jobs are created quite early and high up the stack. This is a really nice design because it makes this code look very simple and easy to understand.

Contributor Author:

Exactly!

Comment on lines -329 to -375
// estimatePieceSize estimates the size of the Piece that will be built by the filecoin client
// when making the deal. This calculation should consider the unique node sizes of the DAG, and
// padding. It's important to not underestimate the size since that would lead to deal rejection
// since the miner won't accept the further calculated PricePerEpoch.
func (fc *FilCold) calculatePieceSize(ctx context.Context, c cid.Cid) (uint64, error) {
// Get unique nodes.
seen := cid.NewSet()
if err := dag.Walk(ctx, fc.getLinks, c, seen.Visit); err != nil {
return 0, fmt.Errorf("walking dag for size calculation: %s", err)
}

// Account for CAR header size.
carHeader := car.CarHeader{
Roots: []cid.Cid{c},
Version: 1,
}
totalSize, err := car.HeaderSize(&carHeader)
if err != nil {
return 0, fmt.Errorf("calculating car header size: %s", err)
}

// Calculate total unique node sizes.
buf := make([]byte, 8)
f := func(c cid.Cid) error {
s, err := fc.ipfs.Block().Stat(ctx, path.IpfsPath(c))
if err != nil {
return fmt.Errorf("getting stats from DAG node: %s", err)
}
size := uint64(s.Size())
carBlockHeaderSize := uint64(binary.PutUvarint(buf, size))
totalSize += carBlockHeaderSize + size
return nil
}
if err := seen.ForEach(f); err != nil {
return 0, fmt.Errorf("aggregating unique nodes size: %s", err)
}

// Consider padding.
paddedSize := padreader.PaddedSize(totalSize).Padded()

return uint64(paddedSize), nil
}

func (fc *FilCold) getLinks(ctx context.Context, c cid.Cid) ([]*format.Link, error) {
return fc.ipfs.Object().Links(ctx, path.IpfsPath(c))
}

Member:

😿

Comment on lines 52 to 54
if f.PieceSize < sa.MinPieceSize && f.PieceSize > sa.MaxPieceSize {
continue
}
Member:

Think you mean ||

@@ -28,6 +28,7 @@ func (s *RPC) Get(ctx context.Context, req *GetRequest) (*GetResponse, error) {
storage[key] = &StorageAsk{
Price: ask.Price,
Member:

👍

@jsign jsign merged commit 564db8f into master Oct 28, 2020
@jsign jsign deleted the jsign/psizapi branch October 28, 2020 16:52