Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffs: enable auto-repair & update to new interopnet & ipfs pinset cache #272

Merged
merged 18 commits into from
Apr 24, 2020

Conversation

jsign
Copy link
Contributor

@jsign jsign commented Apr 22, 2020

  • Fixes Enable a Repair config #18
  • Fixes Cache Pinset of IPFS node #228
  • Fixes ffs: update to new interopnet #314 which updates to to new interop, which solved a bug we found which ended up making 3 or 4 tests to be skipped. Updating to the new interop involved some work on Powergate and:
  • Allowed all tests in ffs/integrationtest to run in parallel, and enabled 4 parallel runs in CI. I had to find the right tune between the devnets speeds and the parallelization count. Pushing too much start making tests flaky since CPU are %100. This decreased that package test runs from ~400s to ~100s.
  • I had to disable some index tests since in the new interop there're some problems. Those tests where quite meaningless already anyway (devnet is quite artificial for miners metadata), but I'll re-enable when gets fixed.
  • The good part is that I re-enabled most of the disabled ffs/integrationtest that were disabled because of the double-deal problem that is fixed, that feels good since I consider those more important ones.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign added the ffs label Apr 22, 2020
@jsign jsign added this to the Sprint 35 milestone Apr 22, 2020
@jsign jsign self-assigned this Apr 22, 2020
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign added 10 commits April 22, 2020 19:48
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign changed the title ffs: enable auto-repair ffs: enable auto-repair & update to new interopnet Apr 23, 2020
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign changed the title ffs: enable auto-repair & update to new interopnet ffs: enable auto-repair & update to new interopnet & ipfs pinset cache Apr 24, 2020
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@jsign jsign force-pushed the jsign/repair branch 2 times, most recently from 0c50452 to 23879be Compare April 24, 2020 11:01
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
EpochPrice: types.NewInt(c.EpochPrice),
Miner: maddr,
Wallet: addr,
MinBlocksDuration: dur,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New interopnet API change.

Comment on lines +12 to +13
"/dns4/t01000.miner.interopnet.kittyhawk.wtf/tcp/1347/p2p/12D3KooWNjqGTNZ592wG5DmX7Rbmb76V3NHMdZMqjqPJ1xiax7w6",
"/ip4/54.186.82.90/tcp/1347/p2p/12D3KooWGf4ggd3cdZJKZPHkgSe8Lxf4MDsKhVC7npRGXgsPP4fz",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New bootstrap peers of interopnet.

@@ -5,6 +5,7 @@ import (
"context"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make a pinset cache. Basically is loaded on first use instead of in New. From that point forward, since PG is the only one touching the pinset of the IPFS node it's safe to assume that cache will be always valid and the API call to the IPFS node to get pinsets can be avoided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an in-memory only thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, just a guarded map of cid.Cid.

@@ -209,7 +219,7 @@ func TestInfo(t *testing.T) {
requireCidConfig(t, fapi, cid, nil)
}

t.Run("WithOneAdd", func(t *testing.T) {
t.Run("WithThreeAdd", func(t *testing.T) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-enable the test that made 3 deals of different data with a single miner. 💯

@@ -323,10 +336,7 @@ func TestRepFactor(t *testing.T) {
}

func TestRepFactorIncrease(t *testing.T) {
// ToDo: unskip when testnet/3 allows more than one deal
// See https://bit.ly/2JxQSQk
t.SkipNow()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

}

// WithRepairable allows to enable/disable auto-repair.
func (c CidConfig) WithRepairable(enabled bool) CidConfig {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new option for CidConfig.

@@ -201,12 +201,12 @@ func generateIndex(ctx context.Context, api *apistruct.FullNodeStruct) (*IndexSn
defer func() { <-rateLim }()
ictx, cancel := context.WithTimeout(ctx, qaTimeout)
defer cancel()
pid, err := api.StateMinerPeerID(ictx, addr, types.EmptyTSK)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StateMinerPeerID was removed from Lotus API.
Now it seems that you should make a more general API call for miner's info which has the PeerID.
The returned PeerID still seems to have some extra problems when used with QueryAsks (as mentioned in PR description about index-tests).

p := mp.MinerPower.Uint64()
p := mp.MinerPower.RawBytePower.Uint64()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lotus API change.

Comment on lines +20 to +21
// Skipped until #235 lands.
t.SkipNow()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is green locally, but is flaky on CI.
There's some pending work to do about the slashing index anyway, so this test doesn't make much sense. In fact, it never made much sense since there isn't a good way to force slashing.

@@ -21,7 +21,8 @@ func LaunchDevnetDocker(t *testing.T, numMiners int) *dockertest.Resource {
panic(fmt.Sprintf("couldn't create ipfs-pool: %s", err))
}
envNumMiners := fmt.Sprintf("TEXLOTUSDEVNET_NUMMINERS=%d", numMiners)
lotusDevnet, err := pool.RunWithOptions(&dockertest.RunOptions{Repository: "textile/lotus-devnet", Tag: "sha-9aab2c6", Env: []string{envNumMiners}, Mounts: []string{"/tmp/powergate:/tmp/powergate"}})
speed := "TEXLOTUSDEVNET_SPEED=500"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made devnets run a bit slower than default. This allow running more in parallel without hurting the CI CPUs.
The default speed is quite fast anyway for a single test run, so in the bottom line this has better performance.

@jsign jsign requested a review from asutula April 24, 2020 12:13
@jsign jsign marked this pull request as ready for review April 24, 2020 12:13
@jsign jsign requested a review from sanderpick April 24, 2020 12:14
Copy link
Member

@asutula asutula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work. Huge step forward for CI and nice how simple it was to add repair!

@@ -207,7 +210,31 @@ func (s *Scheduler) run() {
}
}

func (s *Scheduler) scanRenewable(ctx context.Context) {
func (s *Scheduler) execRepairCron(ctx context.Context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@jsign
Copy link
Contributor Author

jsign commented Apr 24, 2020

Great work. Huge step forward for CI and nice how simple it was to add repair!

Thanks for the review!

@jsign jsign merged commit b585b89 into master Apr 24, 2020
@jsign jsign deleted the jsign/repair branch April 24, 2020 15:04
Copy link
Member

@sanderpick sanderpick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -5,6 +5,7 @@ import (
"context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet.

@@ -5,6 +5,7 @@ import (
"context"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an in-memory only thing?

@@ -106,6 +107,12 @@ func (ci *CoreIpfs) Store(ctx context.Context, c cid.Cid) (int, error) {
if err != nil {
return 0, fmt.Errorf("getting stats of cid %s: %s", c, err)
}
if err := ci.ensurePinsetCache(ctx); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this just be done on startup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that's an option. But I tried to avoid adding load to bootstrapping and just load it on first use. But this can perfectly be moved to New.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, makes sense. I guess the only downside of having it here is that there will one (the first after start) lucky user that gets a slow response time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a good point.
The other option is spinning a go routine on New... "nobody" pays a cost. :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, sounds good 💯

@@ -989,6 +1007,67 @@ func TestRemove(t *testing.T) {
require.Equal(t, api.ErrNotFound, err)
}

// This isn't very nice way to test for repair. The main problem is that now
// deal start is buffered for future start for 10000 blocks at the Lotus level.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😄

@jsign
Copy link
Contributor Author

jsign commented Apr 24, 2020

LGTM!

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ffs: update to new interopnet Cache Pinset of IPFS node Enable a Repair config
3 participants