
feat: FIP-0076: implement migration to populate ProviderSectors #246

Merged
merged 4 commits into master from asr/fip0076-migration on Feb 28, 2024

Conversation

arajasek
Contributor

Extracted from #236.

This is a direct copy of the code from there (except for changes made in earlier PRs #245 and #244). I have not yet examined the code for correctness, nor have I addressed some outstanding review on that PR, so I'm opening this as a draft for now. It effectively still needs a full review, including from me.

In particular, reviewers please examine whether the use of the migration cache is incorrect if the "pre-migrated" state is NOT an ancestor of the migration input state. It SHOULD be the case that this is ancestry-agnostic.

} else {
//fmt.Printf("deal %d not found in providerSectors: %v\n", deal, oldState)

newState.SectorNumber = 0 // FIP: if such a sector cannot be found, assert that the deal's end epoch has passed and use sector ID 0
Member

redundant?

}

func (m *marketMigrator) migrateProviderSectorsAndStatesWithDiff(ctx context.Context, store cbor.IpldStore, prevInStatesCid, prevOutStatesCid, prevOutProviderSectorsCid, inStatesCid, prevInProposals, inProposals cid.Cid) (cid.Cid, cid.Cid, error) {
diffs, err := amt.Diff(ctx, store, store, prevInStatesCid, inStatesCid, amt.UseTreeBitWidth(market12.StatesAmtBitwidth))
Member

So, unfortunately, we're diffing the wrong states here if we care about re-orgs. I.e., it's possible for a re-org to change the sector number to which a deal is assigned without changing anything about the deal itself, and this diff won't catch that because it's diffing the deal states which don't record sector information.

I discussed this with @arajasek and we think the answer is to just not allow reorgs. I.e., don't use the cached migration if it's not an ancestor. That'll ensure that neither sector nor deal IDs are reused.

Member

Basically:

  1. Wait for some measure of finality (not real finality, but, like, a few epochs) after the pre-migration epoch before running the pre-migration. OR, run the premigration at the specified epoch, but do a lookback (i.e., migrate state from a few epochs ago).
  2. In the "migration" struct itself, record the highest pre-migration tipset.
  3. When running the actual migration (or, really, any pre-migrations), if there is a pre-migration tipset recorded and it's not an ancestor of the tipset being migrated, discard the cache.

Member

Ugh. Ok, maybe that's not the way. We do diff the sectors anyways. So...

  1. When we diff the sectors, we need to record the changes.
  2. When we diff the deal states, we need to apply/remove the sector changes.
  3. Finally, we need to iterate over the rest of the sector changes, and fix their deal states.
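A rough shape for that plan, as a sketch (all names here are hypothetical, not the PR's code):

type sectorChange struct {
	Miner   abi.ActorID
	Sector  abi.SectorNumber
	Added   []abi.DealID // deal IDs gained vs. the premigration state
	Removed []abi.DealID // deal IDs lost vs. the premigration state (reorgs only)
}

// 1. populate []sectorChange while diffing sectors
// 2. consume the matching entries while diffing deal states
// 3. fix up deal states for whatever sectorChange entries are left over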

Contributor

@magik6k is the above note something you can look at addressing?

Base automatically changed from asr/fip0076-invariants to master February 14, 2024 21:00
@arajasek arajasek mentioned this pull request Feb 14, 2024
@arajasek arajasek marked this pull request as ready for review February 14, 2024 21:04
return nil, xerrors.Errorf("failed to unmarshal sector %d after: %w", sectorNo, err)
}

if len(sectorBefore.DealIDs) != len(sectorAfter.DealIDs) {
Member

What if we have the same number of deals, but different deals because of a reorg?
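A hedged sketch of a stricter check (helper name hypothetical): compare the deal IDs element-wise, so a reorg that swaps deals without changing the count is still caught.

func dealIDsEqual(a, b []abi.DealID) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}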

Member

@anorth anorth left a comment

I only reviewed FromScratch, which looks ok to me.

v13: Address migration review

v13: Handle deal proposal in case of reorgs

v13: organize imports

v13: perf nit: avoid repeated map lookups

implement new rules for uncached operations

implement new rules for cached operations
// no cached migration, so we simply iterate all sectors and collect deal IDs

err = inSectors.ForEach(&sector, func(i int64) error {
if len(sector.DealIDs) == 0 || sector.Expiration < UpgradeHeight {
Member

Can we pass this into the migration from somewhere?

Contributor Author

Yup, I have a note for that too.

}
// There can't already be an entry for this sector, so just set (not append)
// TODO: golang: I need to copy this, since sector is a reference that's constantly updated? Right? Steb?
sectorDealIDs := sector.DealIDs
Member

IIRC, we'll overwrite sector each time, but create new slices. However, I'd still feel safer copying.
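A minimal sketch of the defensive copy being suggested, in place of the direct assignment above:

// snapshot the slice instead of aliasing the iteration variable's field
sectorDealIDs := make([]abi.DealID, len(sector.DealIDs))
copy(sectorDealIDs, sector.DealIDs)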

return nil, xerrors.Errorf("didn't find sector %d in inSectors", sectorNo)
}

if len(sector.DealIDs) == 0 {
Member

Modify needs to be treated as remove + add, so we need to run the remove logic from below even in this case.

Contributor Author

Modify should strictly be an add, because the protocol doesn't actually let you remove deals ever.

Member

Yeah, this only matters for reorgs.

sectorDeals = make(map[abi.SectorNumber][]abi.DealID)
m.providerSectors.updatesToMinerToSectorToDeals[abi.ActorID(mid)] = sectorDeals
}
// There can't already be an entry for this sector, so just set (not append)
Member

reorg

addProviderSectorEntry := func(deal abi.DealID) (abi.SectorNumber, error) {
sid, ok := m.providerSectors.dealToSector[deal]
if !ok {
return 0, xerrors.Errorf("deal %d not found in providerSectors", deal) // todo is this normal and possible??
Contributor Author

See note here.

} else {
//fmt.Printf("deal %d not found in providerSectors: %v\n", deal, oldState)

newState.SectorNumber = 0 // FIP: if such a sector cannot be found, assert that the deal's end epoch has passed and use sector ID 0
Contributor Author

See note here.


if len(sectorBefore.DealIDs) != len(sectorAfter.DealIDs) {
if len(sectorBefore.DealIDs) != 0 {
return nil, xerrors.Errorf("WHAT?! sector %d modified, this not supported and not supposed to happen", i) // todo: is it? Can't happen w/o a deep, deep reorg, and even then we wouldn't use the cache??
Contributor Author

See note here.

@@ -17,6 +17,8 @@ import (
"golang.org/x/xerrors"
)

const UpgradeHeight = abi.ChainEpoch(1000)
Contributor Author

this should probably be an input to the top-level migration func / some config thing.
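One hedged way to do that (config type and plumbing hypothetical): carry the epoch on the migrator and compare against it wherever the constant is read today.

type migrationConfig struct {
	UpgradeEpoch abi.ChainEpoch // supplied by the caller, replacing the hardcoded UpgradeHeight
}

// mirrors the sector filter quoted earlier: skip dealless or already-expired sectors
func skipSector(cfg migrationConfig, dealCount int, expiration abi.ChainEpoch) bool {
	return dealCount == 0 || expiration < cfg.UpgradeEpoch
}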

return nil, xerrors.Errorf("failed to load sectors array: %w", err)
}

hasCached, prevSectors, err := in.Cache.Read(migration.MinerPrevSectorsInKey(in.Address))
Contributor Author

We need to think about use of the cache -- providerSectors is populated based on whether or not there's a cache hit, but the market migration doesn't consider the possibility that some miners were cached, but others weren't.


// now, process updatesToMinerToSectorToDeals, making the appropriate updates

for miner, sectorUpdates := range m.providerSectors.updatesToMinerToSectorToDeals {
Member

Let's add a "is not nil" check in both this case and the "without diff" case below, and initialize the one we're using. Otherwise, I'm a bit concerned we might have somehow cached the market migration but not the sector migration...

Also, it would be nice to make this independent somehow?

  1. Determine if we're doing a cached market migration.
  2. Determine independently if we're using a cache from a prior sector migration.
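A minimal guard for the first concern (sketch; field and types as quoted elsewhere in this thread):

if m.providerSectors.updatesToMinerToSectorToDeals == nil {
	m.providerSectors.updatesToMinerToSectorToDeals = make(map[abi.ActorID]map[abi.SectorNumber][]abi.DealID)
}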

Contributor Author

I agree, but I'd like this to be a future item too. We can unblock "having an nv22 network" for now.

SlashEpoch: oldState.LastUpdatedEpoch,
}

// TODO: We should technically only set this if the deal hasn't expired,
Member

Might as well check?

Contributor Author

That involves looking up the proposal for every single deal in here, which is significant work, I think.

Member

Ah. We don't include that info in the deal state.

What if we check whether the current epoch is more than DealMinDuration after SectorStartEpoch? If so, we look up the deal state.

Honestly, I think looking up the deal state each time won't be a huge issue either.

You're right that this shouldn't matter; it's just a bit annoying that we're relying on the time between the migration and the first pre-migration being short.
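A hedged sketch of that gate (DealMinDuration standing in for the market policy constant; currEpoch is whatever epoch the migration runs at):

// only pay for the per-deal lookup once the deal could plausibly have ended
func mightHaveEnded(currEpoch, sectorStart, dealMinDuration abi.ChainEpoch) bool {
	return currEpoch > sectorStart+dealMinDuration
}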

Contributor Author

I'm gonna track this as a future item, and see if looking it up each time isn't too much slower.


// if the deal is now slashed, UNSET the sector number
if newState.SlashEpoch != -1 {
newState.SectorNumber = 0
Member

re-org (if we decide to handle those): expiration/sector number can change.

Comment on lines 267 to 277
err = actorSectors.ForEach(nil, func(k string) error {
return errItemFound
})
nonEmpty := false
if err == errItemFound {
nonEmpty = true
err = nil
}
if err != nil {
return cid.Undef, cid.Undef, xerrors.Errorf("failed to iterate actor sectors: %w", err)
}
Member

Consider attaching this to the "map" definition in util/adt/map.go so I never have to look at this code again?

Contributor Author

Yessir.
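Something like the following could live next to the map definition (method name hypothetical; ForEach signature as used in the snippet above):

var errItemFound = errors.New("item found")

// IsEmpty reports whether the map has no entries, stopping at the first one.
func (m *Map) IsEmpty() (bool, error) {
	err := m.ForEach(nil, func(string) error { return errItemFound })
	if err == errItemFound {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return true, nil
}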


var sectorDealIDs market13.SectorDealIDs
for sector, deals := range sectors {
sectorDealIDs = deals
Member

I can't wait for the fixed loop to ship by default. This is one of the dumbest go design decisions.
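For readers landing here later, the pitfall in miniature: before Go 1.22, the range variable is a single shared variable, so closures capture its final value unless it is re-declared per iteration.

fns := make([]func() int, 0, 3)
for _, v := range []int{1, 2, 3} {
	v := v // per-iteration copy; Go 1.22 ("the fixed loop") makes this implicit
	fns = append(fns, func() int { return v })
}
// without the shadowing line, every fns[i]() returns 3 on Go 1.21 and earlier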

Member

@Stebalien Stebalien left a comment

LGTM. This PR is correct (in practice) assuming we detect and avoid using the cache for reorgs in lotus. That means:

  1. Performing the premigration at some small lookback from the actual premigration epoch.
  2. Recording the tipset where we did the premigration.
  3. During the real migration (or any subsequent premigrations), check if the tipset being migrated is a child of the premigration tipset. If not, discard the cache.
  4. During the real migration, DO NOT update the cache. If we reorg around the migration epoch, we just need to re-run the migration based on the last premigration.
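A hedged sketch of steps 2–3 on the lotus side (interface and names hypothetical; the real chain-store API may differ):

type ancestryChecker interface {
	IsAncestorOf(ctx context.Context, ancestor, descendant *types.TipSet) (bool, error)
}

// useCache reports whether a cached premigration may be reused for migTs.
func useCache(ctx context.Context, chain ancestryChecker, premigTs, migTs *types.TipSet) (bool, error) {
	if premigTs == nil {
		return false, nil // no premigration recorded (step 2 never ran)
	}
	ok, err := chain.IsAncestorOf(ctx, premigTs, migTs)
	if err != nil {
		return false, err
	}
	return ok, nil // step 3: discard the cache unless premigTs is an ancestor
}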

@arajasek arajasek merged commit 9c5492f into master Feb 28, 2024
5 checks passed
@arajasek arajasek deleted the asr/fip0076-migration branch February 28, 2024 01:55