Skip to content

Commit

Permalink
feat!: throttle with retries, consumer changes (#1024)
Browse files Browse the repository at this point in the history
* wip

* wip

* bouncing slash constructor and nits

* UT

* define cross chain ack enum

* wip

* tests

* tests

* update genesis tests

* comments

* migration and changelog

* migration test

* lints

* merge fixes

* clean

* Update ccv.pb.go

* add to ADR

* address some PR comments

* rebuild protos

* v3s

* lint

* regen pbs

* refactor for simplicity

* comment

* changes with slash record type

* wip

* Update ccv.pb.go

* Update ccv.pb.go

* update SendPackets and test

* test packet sending permitted

* test for OnAckPacket

* note on FSM design

* CRUD UT

* packet sending permitted UT

* nits

* Update throttle_retry.go

* v1 result and change tests

* Update relay.go

* expectation func

* reg test

* Update CHANGELOG.md

* lints

* doc on upgrade order

* small updates

* vsc matured handled res

* handle vsc matured acks

* adjust TestSendPacketsDeletion

* Update relay_test.go

* fix integration test

* fix send slash packet deletion test

* fix TestConsumerPacketSendExpiredClient

* lint

* fix more tests

* final test fixes

* disable diff tests

* smalls

* Update steps_downtime.go

* Update slashing.go

* Update CHANGELOG.md

* Update x/ccv/consumer/keeper/throttle_retry.go

Co-authored-by: Marius Poke <marius.poke@posteo.de>

* Update throttle_retry.go

* Update x/ccv/consumer/keeper/throttle_retry.go

Co-authored-by: Marius Poke <marius.poke@posteo.de>

* docstrings

* comment

* DeleteHeadOfPendingPackets unit tests

* fix dup deletion

* better comment

* const

* rm todo and unneeded call

* linting is very important

* break instead of return

* return instead of just print

* fix test

* fix slashing test

* comment

* camel case

* FSM event explanation

---------

Co-authored-by: Marius Poke <marius.poke@posteo.de>
  • Loading branch information
shaspitz and mpoke authored Aug 10, 2023
1 parent 5cb2ffa commit fbbac82
Show file tree
Hide file tree
Showing 21 changed files with 1,077 additions and 122 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
Add an entry to the unreleased section whenever merging a PR to main that is not targeted at a specific release. These entries will eventually be included in a release.
* `[x/ccv/provider]` (fix) [#1076](https://github.com/cosmos/interchain-security/pull/1076) Add `InitTimeoutTimestamps` and `ExportedVscSendTimestamps` to exported genesis.

* (feat!) [#1024](https://github.com/cosmos/interchain-security/pull/1024) throttle with retries, consumer changes
* (fix!) revert consumer packet data changes from #1037 [#1150](https://github.com/cosmos/interchain-security/pull/1150)
* (fix!) proper deletion of pending packets [#1146](https://github.com/cosmos/interchain-security/pull/1146)
* (feat!) optimize pending packets storage on consumer, with migration! [#1037](https://github.com/cosmos/interchain-security/pull/1037)
Expand Down
9 changes: 7 additions & 2 deletions docs/docs/adrs/adr-008-throttle-retries.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ title: Throttle with retries

* 6/9/23: Initial draft
* 6/22/23: added note on consumer pending packets storage optimization
* 7/14/23: Added note on upgrade order

## Status

Expand Down Expand Up @@ -47,6 +48,8 @@ With the behavior described, we maintain very similar behavior to the current th

In the normal case, when no or a few slash packets are being sent, the VSCMaturedPackets will not be delayed, and hence unbonding will not be delayed.

For implementation of this design, see [throttle_retry.go](../../../x/ccv/consumer/keeper/throttle_retry.go).

### Consumer pending packets storage optimization

In addition to the mentioned consumer changes above. An optimization will need to be made to the consumer's pending packets storage to properly implement the feature from this ADR.
Expand Down Expand Up @@ -86,9 +89,11 @@ If a consumer sends VSCMatured packets too leniently: The consumer is malicious

If a consumer blocks the sending of VSCMatured packets: The consumer is malicious and blocking vsc matured packets that should have been sent. This will block unbonding only up until the VSC timeout period has elapsed. At that time, the consumer is removed. Again the malicious behavior only creates a negative outcome for the chain that is being malicious.

### Splitting of PRs
### Splitting of PRs and Upgrade Order

This feature will implement consumer changes in [#1024](https://github.com/cosmos/interchain-security/pull/1024). Note these changes should be deployed to prod for all consumers before the provider changes are deployed to prod. That is the consumer changes in #1024 are compatible with the current ("v1") provider implementation of throttling that's running on the Cosmos Hub as of July 2023.

We could split this feature into two PRs, one affecting the consumer and one affecting the provider, along with a third PR which could setup a clever way to upgrade the provider in multiple steps, ensuring that queued slash packets at upgrade time are handled properly.
Once all consumers have deployed the changes in #1024, the provider changes from (TBD) can be deployed to prod, fully enabling v2 throttling.

## Consequences

Expand Down
8 changes: 8 additions & 0 deletions proto/interchain_security/ccv/consumer/v1/consumer.proto
Original file line number Diff line number Diff line change
Expand Up @@ -89,3 +89,11 @@ message MaturingVSCPacket {
google.protobuf.Timestamp maturity_time = 2
[ (gogoproto.stdtime) = true, (gogoproto.nullable) = false ];
}

// A record storing the state of a slash packet sent to the provider chain
// which may bounce back and forth until handled by the provider.
message SlashRecord {
bool waiting_on_reply = 1;
google.protobuf.Timestamp send_time = 2
[ (gogoproto.stdtime) = true, (gogoproto.nullable) = false ];
}
10 changes: 6 additions & 4 deletions tests/difference/core/driver/core_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package core

import (
"fmt"
"testing"
"time"

channeltypes "github.com/cosmos/ibc-go/v7/modules/core/04-channel/types"
Expand Down Expand Up @@ -331,9 +330,12 @@ func (s *CoreSuite) TestTraces() {
fmt.Println("Shortest [traceIx, actionIx]:", shortest, shortestLen)
}

func TestCoreSuite(t *testing.T) {
suite.Run(t, new(CoreSuite))
}
// TODO: diff tests will eventually be replaced by quint tests, and all this code could then be deleted.
// Until that decision is finalized, we'll just comment out the top-level test.

// func TestCoreSuite(t *testing.T) {
// suite.Run(t, new(CoreSuite))
// }

// SetupTest sets up the test suite in a 'zero' state which matches
// the initial state in the model.
Expand Down
43 changes: 38 additions & 5 deletions tests/e2e/steps_downtime.go
Original file line number Diff line number Diff line change
Expand Up @@ -278,7 +278,7 @@ func stepsThrottledDowntime(consumerName string) []Step {
validator: validatorID("bob"),
},
state: State{
// powers not affected on either chain yet
// slash packet queued on consumer, but powers not affected on either chain yet
chainID("provi"): ChainState{
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
Expand All @@ -295,6 +295,39 @@ func stepsThrottledDowntime(consumerName string) []Step {
},
},
},
// Relay packets so bob is jailed on provider,
// and consumer receives ack that provider recv the downtime slash.
// The latter is necessary for the consumer to send the second downtime slash.
{
action: relayPacketsAction{
chainA: chainID("provi"),
chainB: chainID(consumerName),
port: "provider",
channel: 0,
},
state: State{
chainID("provi"): ChainState{
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 0, // bob is jailed
validatorID("carol"): 500,
},
// no provider throttling engaged yet
GlobalSlashQueueSize: uintPointer(0),
ConsumerChainQueueSizes: &map[chainID]uint{
chainID(consumerName): uint(0),
},
},
chainID(consumerName): ChainState{
// VSC packet applying jailing is not yet relayed to consumer
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 500,
validatorID("carol"): 500,
},
},
},
},
{
action: downtimeSlashAction{
chain: chainID(consumerName),
Expand All @@ -305,11 +338,12 @@ func stepsThrottledDowntime(consumerName string) []Step {
chainID("provi"): ChainState{
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 500,
validatorID("bob"): 0,
validatorID("carol"): 500,
},
},
chainID(consumerName): ChainState{
// VSC packet applying jailing is not yet relayed to consumer
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 500,
Expand Down Expand Up @@ -338,10 +372,9 @@ func stepsThrottledDowntime(consumerName string) []Step {
},
},
chainID(consumerName): ChainState{
// no updates received on consumer
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 500,
validatorID("bob"): 0,
validatorID("carol"): 500,
},
},
Expand Down Expand Up @@ -373,7 +406,7 @@ func stepsThrottledDowntime(consumerName string) []Step {
// no updates received on consumer
ValPowers: &map[validatorID]uint{
validatorID("alice"): 511,
validatorID("bob"): 500,
validatorID("bob"): 0,
validatorID("carol"): 500,
},
},
Expand Down
24 changes: 19 additions & 5 deletions tests/integration/expired_client.go
Original file line number Diff line number Diff line change
Expand Up @@ -139,20 +139,34 @@ func (s *CCVTestSuite) TestConsumerPacketSendExpiredClient() {
// check that the packets were added to the list of pending data packets
consumerPackets = consumerKeeper.GetPendingPackets(s.consumerCtx())
s.Require().NotEmpty(consumerPackets)
// At this point we expect 4 packets, two vsc matured packets and two trailing slash packets
s.Require().Len(consumerPackets, 4, "unexpected number of pending data packets")

// upgrade expired client to the consumer
upgradeExpiredClient(s, Provider)

// go to next block to trigger SendPendingDataPackets
// go to next block to trigger SendPendingPackets
s.consumerChain.NextBlock()

// check that the list of pending data packets is emptied
// Check that the leading vsc matured packets were sent and no longer pending
consumerPackets = consumerKeeper.GetPendingPackets(s.consumerCtx())
s.Require().Empty(consumerPackets)
s.Require().Len(consumerPackets, 2, "unexpected number of pending data packets")

// relay committed packets from consumer to provider, first slash packet should be committed
relayAllCommittedPackets(s, s.consumerChain, s.path, ccv.ConsumerPortID, s.path.EndpointA.ChannelID, 3) // two vsc matured + one slash

// First slash has been acked, now only the second slash packet should remain as pending
consumerPackets = consumerKeeper.GetPendingPackets(s.consumerCtx())
s.Require().Len(consumerPackets, 1, "unexpected number of pending data packets")

// relay all packet from consumer to provider
relayAllCommittedPackets(s, s.consumerChain, s.path, ccv.ConsumerPortID, s.path.EndpointA.ChannelID, 4)
// go to next block to trigger SendPendingPackets
s.consumerChain.NextBlock()

// relay committed packets from consumer to provider, only second slash packet should be committed
relayAllCommittedPackets(s, s.consumerChain, s.path, ccv.ConsumerPortID, s.path.EndpointA.ChannelID, 1) // one slash

consumerPackets = consumerKeeper.GetPendingPackets(s.consumerCtx())
s.Require().Empty(consumerPackets, "pending data packets found")

// check that everything works
// - bond more tokens on provider to change validator powers
Expand Down
72 changes: 58 additions & 14 deletions tests/integration/slashing.go
Original file line number Diff line number Diff line change
Expand Up @@ -247,14 +247,34 @@ func (s *CCVTestSuite) TestSlashPacketAcknowledgement() {
s.SetupCCVChannel(s.path)
s.SetupTransferChannel()

packet := channeltypes.NewPacket([]byte{}, 1, ccv.ConsumerPortID, s.path.EndpointA.ChannelID,
// Mock a proper slash packet from consumer
spd := keepertestutil.GetNewSlashPacketData()

// We don't want truly randomized fields, infraction needs to be specified
if spd.Infraction == stakingtypes.Infraction_INFRACTION_UNSPECIFIED {
spd.Infraction = stakingtypes.Infraction_INFRACTION_DOUBLE_SIGN
}
cpd := ccv.NewConsumerPacketData(ccv.SlashPacket,
&ccv.ConsumerPacketData_SlashPacketData{
SlashPacketData: &spd,
},
)
packet := channeltypes.NewPacket(cpd.GetBytes(), // Consumer always sends v1 packet data
1, ccv.ConsumerPortID, s.path.EndpointA.ChannelID,
ccv.ProviderPortID, s.path.EndpointB.ChannelID, clienttypes.Height{}, 0)

ack := providerKeeper.OnRecvSlashPacket(s.providerCtx(), packet,
keepertestutil.GetNewSlashPacketData())
s.Require().NotNil(ack)
// Map infraction height on provider so validation passes and provider returns valid ack result
providerKeeper.SetValsetUpdateBlockHeight(s.providerCtx(), spd.ValsetUpdateId, 47923)

exportedAck := providerKeeper.OnRecvSlashPacket(s.providerCtx(), packet, spd)
s.Require().NotNil(exportedAck)

err := consumerKeeper.OnAcknowledgementPacket(s.consumerCtx(), packet, channeltypes.NewResultAcknowledgement(ack.Acknowledgement()))
// Unmarshal ack to struct that's compatible with consumer. IBC does this automatically
ack := channeltypes.Acknowledgement{}
err := channeltypes.SubModuleCdc.UnmarshalJSON(exportedAck.Acknowledgement(), &ack)
s.Require().NoError(err)

err = consumerKeeper.OnAcknowledgementPacket(s.consumerCtx(), packet, ack)
s.Require().NoError(err)

err = consumerKeeper.OnAcknowledgementPacket(s.consumerCtx(), packet, ccv.NewErrorAcknowledgementWithLog(s.consumerCtx(), fmt.Errorf("another error")))
Expand Down Expand Up @@ -492,9 +512,15 @@ func (suite *CCVTestSuite) TestValidatorDowntime() {
// clear queue, commit packets
suite.consumerApp.GetConsumerKeeper().SendPackets(ctx)

// check queue was cleared
// Check slash record is created
slashRecord, found := suite.consumerApp.GetConsumerKeeper().GetSlashRecord(suite.consumerCtx())
suite.Require().True(found, "slash record not found")
suite.Require().True(slashRecord.WaitingOnReply)
suite.Require().Equal(slashRecord.SendTime, suite.consumerCtx().BlockTime())

// check queue is not cleared, since no ack has been received from provider
pendingPackets = suite.consumerApp.GetConsumerKeeper().GetPendingPackets(ctx)
suite.Require().Empty(pendingPackets, "pending packets NOT empty")
suite.Require().Len(pendingPackets, 1, "pending packets len should be 1 is %d", len(pendingPackets))

// verify that the slash packet was sent
gotCommit := consumerIBCKeeper.ChannelKeeper.GetPacketCommitment(ctx, ccv.ConsumerPortID, channelID, seq)
Expand Down Expand Up @@ -579,9 +605,15 @@ func (suite *CCVTestSuite) TestValidatorDoubleSigning() {
// clear queue, commit packets
suite.consumerApp.GetConsumerKeeper().SendPackets(ctx)

// check queue was cleared
// Check slash record is created
slashRecord, found := suite.consumerApp.GetConsumerKeeper().GetSlashRecord(suite.consumerCtx())
suite.Require().True(found, "slash record not found")
suite.Require().True(slashRecord.WaitingOnReply)
suite.Require().Equal(slashRecord.SendTime, suite.consumerCtx().BlockTime())

// check queue is not cleared, since no ack has been received from provider
pendingPackets = suite.consumerApp.GetConsumerKeeper().GetPendingPackets(ctx)
suite.Require().Empty(pendingPackets, "pending packets NOT empty")
suite.Require().Len(pendingPackets, 1, "pending packets len should be 1 is %d", len(pendingPackets))

// check slash packet is sent
gotCommit := suite.consumerApp.GetIBCKeeper().ChannelKeeper.GetPacketCommitment(ctx, ccv.ConsumerPortID, channelID, seq)
Expand Down Expand Up @@ -644,10 +676,12 @@ func (suite *CCVTestSuite) TestQueueAndSendSlashPacket() {
// establish ccv channel by sending an empty VSC packet to consumer endpoint
suite.SendEmptyVSCPacket()

// check that each pending data packet is sent once
// check that each pending data packet is sent once, as long as the prev slash packet was relayed/acked.
// Note that consumer throttling blocks packet sending until a slash packet is successfully acked by the provider.
for i := 0; i < 12; i++ {
commit := consumerIBCKeeper.ChannelKeeper.GetPacketCommitment(ctx, ccv.ConsumerPortID, channelID, seq+uint64(i))
suite.Require().NotNil(commit)
relayAllCommittedPackets(suite, suite.consumerChain, suite.path, ccv.ConsumerPortID, channelID, 1)
}

// check that outstanding downtime flags
Expand All @@ -657,8 +691,8 @@ func (suite *CCVTestSuite) TestQueueAndSendSlashPacket() {
suite.Require().True(consumerKeeper.OutstandingDowntime(ctx, consAddr))
}

// send all pending packets - only slash packets should be queued in this test
consumerKeeper.SendPackets(ctx)
// SendPackets method should have already been called during
// endblockers in relayAllCommittedPackets above

// check that pending data packets got cleared
dataPackets = consumerKeeper.GetPendingPackets(ctx)
Expand All @@ -676,13 +710,21 @@ func (suite *CCVTestSuite) TestCISBeforeCCVEstablished() {
pendingPackets := consumerKeeper.GetPendingPackets(suite.consumerCtx())
suite.Require().Len(pendingPackets, 0)

// No slash record found (no slash sent)
_, found := consumerKeeper.GetSlashRecord(suite.consumerCtx())
suite.Require().False(found)

consumerKeeper.SlashWithInfractionReason(suite.consumerCtx(), []byte{0x01, 0x02, 0x3},
66, 4324, sdk.MustNewDecFromStr("0.05"), stakingtypes.Infraction_INFRACTION_DOWNTIME)

// Check slash packet was queued
pendingPackets = consumerKeeper.GetPendingPackets(suite.consumerCtx())
suite.Require().Len(pendingPackets, 1)

// but slash packet is not yet sent
_, found = consumerKeeper.GetSlashRecord(suite.consumerCtx())
suite.Require().False(found)

// Pass 5 blocks, confirming the consumer doesn't panic
for i := 0; i < 5; i++ {
suite.consumerChain.NextBlock()
Expand All @@ -698,6 +740,8 @@ func (suite *CCVTestSuite) TestCISBeforeCCVEstablished() {

// Pass one more block, and confirm the packet is sent now that ccv channel is established
suite.consumerChain.NextBlock()
pendingPackets = consumerKeeper.GetPendingPackets(suite.consumerCtx())
suite.Require().Len(pendingPackets, 0)

// Slash record should now be found (slash sent)
_, found = consumerKeeper.GetSlashRecord(suite.consumerCtx())
suite.Require().True(found)
}
Loading

0 comments on commit fbbac82

Please sign in to comment.