Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix double iterator bug #605

Merged
merged 5 commits into from
Dec 15, 2022
Merged

Conversation

shaspitz
Copy link
Contributor

@shaspitz shaspitz commented Dec 15, 2022

Description

This PR fixes an issue where we were opening two iterators at once, causing database crashes in some instances (including the crash during the Game of Chains testnet).

Linked issues

Intentionally not closing any issues with this PR. Applicable issues will be closed in base branch.

  • Fix: Changes and/or adds code behavior, specifically to fix a bug
  • Testing: Adds testing

New behavior tests

New test that validates fix: TestPacketSpam

@shaspitz shaspitz changed the base branch from main to circuit-breaker December 15, 2022 23:27
@shaspitz shaspitz marked this pull request as ready for review December 15, 2022 23:29
@shaspitz shaspitz merged commit bd2e4d0 into circuit-breaker Dec 15, 2022
@shaspitz shaspitz deleted the circuit-breaker-deadlock-fix branch December 15, 2022 23:52
Copy link
Contributor

@mpoke mpoke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @smarshall-spitzbart. I left some comments that can be dealt with in another PR after we merge #462.

}
// Obtain all global slash entries, where only some of them may be handled in this method,
// depending on the value of the slash meter.
allEntries := k.GetAllGlobalSlashEntries(ctx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that the size of this slice depends on the number of slash packets received from consumer chains. This is related to #594. A solution here would be to have a filter on the getter that only returns the entries the throttling mechanism can handled.

})
// don't handle any more global entries if meter becomes negative in value
if meter.IsNegative() {
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include this break into GetAllGlobalSlashEntries as meter is not modified in HandlePacketDataForChain. This also means that handledEntries is no longer needed.


// Persist current value for slash meter
k.SetSlashMeter(ctx, meter)
}

// Obtains the effective validator power relevant to a validator consensus address.
func (k Keeper) GetEffectiveValPower(ctx sdktypes.Context,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Safe to use in an iterator as it only contains getters.

@shaspitz
Copy link
Contributor Author

@mpoke I think your comments here can be addressed by an explanation.

If we were to add a filter into GetAllGlobalSlashEntries which called GetEffectiveValPower, this would essentially have to be the old design with two iterators open. Reasoning:

The value of GetEffectiveValPower is different before and after a validator is jailed. Think about this example.

Slash meter starts at value 5, 3 slash packets are recv and queued by provider, each requesting to jail the same validator with a voting power of 4.

With your proposed design, we would obtain 2 slash packets from store where the 3rd would be filtered out (5-4=+1, 1-4 = -3). This means we would not handle the 3rd slash packet, when we indeed could have.

With current design (and also the double-iterator design) we would obtain and handle all 3 slash packets, since the first slash packet would be handled, the validator jailed, then the subsequent calls to GetEffectiveValPower would return 0.

Lastly, GetEffectiveValPower contains getters to an external keeper, which do not contain iterators right now, but there's a small chance they could be refactored in the future to contain an iterator. Therefore the function is safe to use in an iterator right now, but could silently change in the underlying code it calls.

This is an example of where more effective iterator usage would be desirable for performance reasons, but would be dangerous and considered bad practice. It seems like iterator filtering functions should pretty much always be stateless, including reading from state

@shaspitz
Copy link
Contributor Author

TLDR: if we don't get all global slash entries, it'll make #594 worse

jtremback added a commit that referenced this pull request Dec 16, 2022
* wip

* Update module.go

* wip

* tests pass

* Update relay.go

* Update relay.go

* Update relay.go

* Update relay.go

* wip

* still wip

* panic with too many packets

* Update relay.go

* wip

* checkpoint

* Update throttle.go

* make relay.go closer to main

* merge fixes

* Update expected_keepers.go

* Update throttle_test.go

* Update throttle.go

* second queue is now implemented

* smalls

* Update throttle.go

* removed pointer silliness

* Update throttle_test.go

* test improvements

* small

* where it's called

* Update relay.go

* comments n stuff

* Update throttle.go

* Update throttle.go

* comments

* wip

* callbacks

* cleans

* mas tests

* Update README.md

* less diff

* mas

* keys

* wip

* changes

* Update params.go

* size constraints

* Update keys_test.go

* Update throttle_test.go

* wip

* on recv new behavior and test

* on recv slash packet

* so close

* clean

* Update slashing.go

* cleans

* wip

* params

* wip

* tests

* changes

* mas

* Update README.md

* Update README.md

* Update README.md

* sorry for the friday night emails

* Update instance_test.go

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* works

* Update generic_setup.go

* Update setup.go

* smol

* small

* path to ccv chan setup

* todos

* Update setup.go

* Create debug_test.go

* democ

* Update debug_test.go

* setup all ccv channels

* bump to main

* another bump, missed one

* fix after merge main

* fixes

* Update slashing.go

* expired client tests

* checks

* fixed the stuff

* smol

* changes

* updates

* cleans

* clean

* todo

* fixes

* cleans

* Update slashing.go

* Update slashing.go

* mod tidy

* fix build errs

* test fixes

* Update slashing.go

* Update throttle.go

* base rounding

* mas unit tests

* updates

* comments

* comments

* cleans

* clean

* small

* sin gas

* more understandable logic

* crypto rand

* utils

* one e2e

* helpers

* Update slashing.go

* helpers

* cleans

* shiz works

* tcs

* smalls

* un mas

* allowance changing test

* smol

* smol

* e2e tests are done

* lets try this

* Key Assignment -> goc-december (#527)

* add MsgAssignConsumerKey

* add MsgAssignConsumerKey

* fix package name

* add keys

* add keeper methods for key assignment

* handle MsgAssignConsumerKey

* map addresses in slash requests

* prune old consumer addresses

* move AssignConsumerKey logic to keeper

* update consumer initial valset

* add ApplyKeyAssignmentToValUpdates

* fix client creation

* do not check init valset on consumer

* clean state on val removal

* fix TestAssignConsensusKeyForConsumerChain

* delete on val removal

* remove reverse mapping on val removal

* remove pending key assignment in EndBlock

* add query endpoints
add summary of indexes
change ConsumerValidatorByVscID to ConsumerAddrsToPrune

* Refactor AssignConsumerKey for clarity (IMO)

* finish key assignment genesis code- untested

* FIxed mocks compile issue - not sure if it works right though.

* add test for init and export genesis

* set after get in AssignConsumerKey

* enable AssignConsumerKey to be called twice

* remove key assignment on chain removal

* apply some review comments

* fix bug: two validator with same consumer key

* rename key: ConsumerValidatorsByVscIDBytePrefix -> ConsumerAddrsToPruneBytePrefix

* PendingKeyAssignment -> KeyAssignmentReplacements

* msg.ProviderAddr is a validator addr

* fix: key assignment genesis tests (#517)

* Fix consumer init genesis test

* fix provider genesis tests

* fix key assignement handler

* fix linter

* fix merge conflict

* fix ProviderValidatorAddress

* remove unused expectation

Co-authored-by: Marius Poke <marius.poke@posteo.de>

* add key assignment CRUD operations unit tests (#516)

* test val consumer key related CRUD

* test val consumer addr related CRUD

* test pending key assignments related CRUD

* refactor after review session

* refactor after review session

* add prune key CRUD tests

* renamings in testfiles

* improve KeyAssignmentReplacement set and get

* remove ApplyKeyAssignmentToInitialValset (redundant)

* add invariant to docstring of AppendConsumerAddrsToPrune

* fix address conversion

* adding e2e tests

* fix linter

* add queries; setup integration tests (#519)

* add queries; setup integration testse

* test key assignment before chain start

* fix state queries; refactor

* rm extra comment

* rm unused action field

* bump voting times in all tests

* add provider address query to tests

* Adds some very basic random testing and unit tests (#522)

* Adds imports

* Does multi iterations: fails!

* Handle errs

* checkpoint debug

* Pre introduce dynamic mock

* Issue seems to be resolved

* Removes prints in key asisgn

* Removes debug, pre reintroduce all test features

* Fix some magic numbers, bring back prune check

* Pre rework initial assignments

* Refactor and tidyup

* Better docs, clarity, org

Co-authored-by: Daniel <danwtisdall@gmail.com>

* Enable key assignment testing for all e2e tests (#524)

* split CCVTestSuite.setupCallback in two

* pre-assign keys for all vals of first consumer

* fix linter

* remove TestConsumerGenesis

Co-authored-by: mpoke <marius.poke@posteo.de>
Co-authored-by: Simon Noetzlin <simon.ntz@gmail.com>
Co-authored-by: MSalopek <35486649+MSalopek@users.noreply.github.com>
Co-authored-by: Daniel T <30197399+danwt@users.noreply.github.com>
Co-authored-by: Daniel <danwtisdall@gmail.com>

* Fix errors in merge commit, comment out failing TestRelayAndApplySlashPacket test

* add simon's test fixes

* Update state.go

* last replenish time -> last full time

* readme

* increment i

* shared method

* iteration change

* Meter allowance lockstep (#553)

changes

* log

* requested key format changes (#560)

changes

* Throttle garbage collection (#557)

* changes

* Update proposal.go

* add log

* Update throttle_test.go

* fixes

* update invariant

* Throttle bug fixes + req refactors (#565)

* fixes

* Update throttle.go

* fix tests

* set replenish frac = 1.0 for all test runs

* rm unmarshal func

* req refactors

* small lint fix

* comment adjustment

* found <-> jailed order swap

* 100% change

* weird

* merge fixes to build

* tests now pass

* name change

* better ordering tests

* remove integration test diff

* avoid double call to address mapping

* Update throttle.go

* 0 included in iteration start

* naming refactors

* Update keys_test.go

* more refactors

* clarify allowance terminology

* update doc with explanation on min value

* md clarification

* Update throttle.md

* swap replenish order

* add max limit note

* #533 Adds normal operation diff testing

* reb

* reb

* Del unused

Co-authored-by: Daniel <danwtisdall@gmail.com>

* progress save

* cleans

* name change

* Bugfix (#605)

* Circuit breaker refactor (#606)

* quick fix

* small key correction

* smol

* don't store time length

* use big endian, shawn you dingus

Co-authored-by: Jehan <jehan.tremback@gmail.com>
Co-authored-by: mpoke <marius.poke@posteo.de>
Co-authored-by: Simon Noetzlin <simon.ntz@gmail.com>
Co-authored-by: MSalopek <35486649+MSalopek@users.noreply.github.com>
Co-authored-by: Daniel T <30197399+danwt@users.noreply.github.com>
Co-authored-by: Daniel <danwtisdall@gmail.com>
Co-authored-by: Jehan Tremback <hi@jehan.email>
@mpoke
Copy link
Contributor

mpoke commented Dec 16, 2022

With your proposed design, we would obtain 2 slash packets from store where the 3rd would be filtered out (5-4=+1, 1-4 = -3). This means we would not handle the 3rd slash packet, when we indeed could have.

@smarshall-spitzbart That's true. Didn't think of that.

Then, we really need to make sure that we don't add invalid packets to that queue, i.e., #594

@jtremback jtremback changed the title Bugfix Fix double iterator bug Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants