-
Notifications
You must be signed in to change notification settings - Fork 267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: Pull changes from upstream master #561
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fix the order of lines in docs/versions so that v0.34 is last (the current release). Related changes: - Update docs/DOCS_README.md to reflect the current state of how we publish the site. - Fix the build-docs target in Makefile to not perturb the package-lock.json during the build. - Fix the Makefile rule to not clobber package-lock.json.
This test reliably gets hung up on network configuration, (which may be a real issue,) but it's network setup is handcranked and we should ensure that the test focuses on it's core assertions and doesn't fail for test architecture reasons.
When shutting down blocksync, it is observed that the process can hang completely. A dump of running goroutines reveals that this is due to goroutines not listening on the correct shutdown signal. Namely, the `poolRoutine` goroutine does not wait on `pool.Quit`. The `poolRoutine` does not receive any other shutdown signal during `OnStop` becuase it must stop before the `r.closeCh` is closed. Currently the `poolRoutine` listens in the `closeCh` which will not close until the `poolRoutine` stops and calls `poolWG.Done()`. This change also puts the `requestRoutine()` in the `OnStart` method to make it more visible since it does not rely on anything that is spawned in the `poolRoutine`. ``` goroutine 183 [semacquire]: sync.runtime_Semacquire(0xc0000d3bd8) runtime/sema.go:56 +0x45 sync.(*WaitGroup).Wait(0xc0000d3bd0) sync/waitgroup.go:130 +0x65 github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStop(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:193 +0x47 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0000d3a00, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc00052c000) github.com/tendermint/tendermint/node/node.go:758 +0xc62 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00052c000, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000df6d20, 0x7f04a68da900, 0xc0004a8930, 0xc0005a72d8) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 161 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).poolRoutine(0xc0000d3a00, 0x0) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:464 +0x2b3 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:174 +0xf1 goroutine 162 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processBlockSyncCh(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:310 +0x151 created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:177 +0x54 goroutine 163 [select]: github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).processPeerUpdates(0xc0000d3a00) github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:363 +0x12b created by github.com/tendermint/tendermint/internal/blocksync/v0.(*Reactor).OnStart github.com/tendermint/tendermint/internal/blocksync/v0/reactor.go:178 +0x76 ```
Fix a couple of cases where we updated the keys in the config reader, but forgot to update some of their uses in the default template. Fixes #7031.
The layout of struct fields means that interior fields may not be properly aligned for 64-bit access. Fixes #7000.
When statesync is stopped during shutdown, it has the possibility of deadlocking. A dump of goroutines reveals that this is related to the peerUpdates channel not returning anything on its `Done()` channel when `OnStop` is called. As this is occuring, `processPeerUpdate` is attempting to acquire the reactor lock. It appears that this lock can never be acquired. I looked for the places where the lock may remain locked accidentally and cleaned them up in hopes to eradicate the issue. Dumps of the relevant goroutines may be found below. Note that the line numbers below are relative to the code in the `v0.35.0-rc1` tag. ``` goroutine 36 [chan receive]: github.com/tendermint/tendermint/internal/statesync.(*Reactor).OnStop(0xc00058f200) github.com/tendermint/tendermint/internal/statesync/reactor.go:243 +0x117 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc00058f200, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/node.(*nodeImpl).OnStop(0xc0001ea240) github.com/tendermint/tendermint/node/node.go:769 +0x132 github.com/tendermint/tendermint/libs/service.(*BaseService).Stop(0xc0001ea240, 0x0, 0x0) github.com/tendermint/tendermint/libs/service/service.go:171 +0x323 github.com/tendermint/tendermint/cmd/tendermint/commands.NewRunNodeCmd.func1.1() github.com/tendermint/tendermint/cmd/tendermint/commands/run_node.go:143 +0x62 github.com/tendermint/tendermint/libs/os.TrapSignal.func1(0xc000629500, 0x7fdb52f96358, 0xc0002b5030, 0xc00000daa0) github.com/tendermint/tendermint/libs/os/os.go:26 +0x102 created by github.com/tendermint/tendermint/libs/os.TrapSignal github.com/tendermint/tendermint/libs/os/os.go:22 +0xe6 goroutine 188 [semacquire]: sync.runtime_SemacquireMutex(0xc00026b1cc, 0x0, 0x1) runtime/sema.go:71 +0x47 sync.(*Mutex).lockSlow(0xc00026b1c8) sync/mutex.go:138 +0x105 sync.(*Mutex).Lock(...) sync/mutex.go:81 sync.(*RWMutex).Lock(0xc00026b1c8) sync/rwmutex.go:111 +0x90 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdate(0xc00026b080, 0xc000650008, 0x28, 0x124de90, 0x4) github.com/tendermint/tendermint/internal/statesync/reactor.go:849 +0x1a5 github.com/tendermint/tendermint/internal/statesync.(*Reactor).processPeerUpdates(0xc00026b080) github.com/tendermint/tendermint/internal/statesync/reactor.go:883 +0xab created by github.com/tendermint/tendermint/internal/statesync.(*Reactor.OnStart github.com/tendermint/tendermint/internal/statesync/reactor.go:219 +0xcd) ```
This is intended to fix a test failure that occurs in the p2p state provider. The issue presents as the state provider timing out waiting for the consensus params response. The reason that this can occur is because the statesync reactor has the possibility of attempting to respond to the params request before the state provider is ready to read it. This results in the reactor hitting the `default` case seen here and then never sending on the channel. The stateprovider will then block waiting for a response and never receive one because the reactor opted not to send it.
This script is referenced from the release documentation, we should make sure it's functional. This is helpful in generating the "Special Thanks" section of the changelog.
This commit should be one of the first to land as part of the v0.36 cycle *after* cutting the 0.35 branch. The blocksync/v2 reactor was originally implemented as an experiement to produce an implementation of the blockstack protocol that would be easier to test and validate, but it was never appropriately operationalized and this implementation was never fully debugged. When the p2p layer was refactored as part of the 0.35 cycle, the v2 implementation was not refactored and it was left in the codebase but not removed. This commit just removes all references to it.
…g peers (#7058) The race occurred as a result of a goroutine launched by `processPeerUpdate` racing with the `OnStop` method. The `processPeerUpdates` goroutine deletes from the map as `OnStop` is reading from it. This change updates the `OnStop` method to wait for the peer updates channel to be done before closing the peers. It also copies the map contents to a new map so that it will not conflict with the view of the map that the goroutine created in `processPeerUpdate` sees.
A few notes: - this is not all the deletion that we can do, but this is the most "simple" case: it leaves in shims, and there's some trivial additional cleanup to the transport that can happen but that requires writing more code, and I wanted this to be easy to review above all else. - This should land *after* we cut the branch for 0.35, but I'm anticipating that to happen soon, and I wanted to run this through CI.
This PR tackles the case of using the e2e application in a long lived testnet. The application continually saves snapshots (usually every 100 blocks) which after a while bloats the size of the application. This PR prunes older snapshots so that only the most recent 10 snapshots remain.
This code hasn't been battle tested, and seems to have grown increasingly flaky int tests. Given our general direction of reducing queue complexity over the next couple of releases I think it makes sense to remove it.
Addresses one of the concerns with #7041. Provides a mechanism (via the RPC interface) to delete a single transaction, described by its hash, from the mempool. The method returns an error if the transaction cannot be found. Once the transaction is removed it remains in the cache and cannot be resubmitted until the cache is cleared or it expires from the cache.
While discussing a question about the indexing interface (#7044), we found some confusion about the intent of the design decisions in ADR 065. Based on discussion with the original authors of the ADR, this commit adds some language to the Decisions section to spell out the intentions more clearly, and to call out future work that this ADR did not explicitly decide about.
…anch (#7067) Nightly branches run CI from master branch, and the configuration misses checking out the correct ref.
My earlier p2p cleanup code removed support for the p2p tests from the e2e generator and runner, but missed removing the CI configuration. This patch remedies that.
Bumps [github.com/adlio/schema](https://github.com/adlio/schema) from 1.1.13 to 1.1.14. - [Release notes](https://github.com/adlio/schema/releases) - [Commits](adlio/schema@v1.1.13...v1.1.14) --- updated-dependencies: - dependency-name: github.com/adlio/schema dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This is another batch of things to cleanup in the legacy P2P system.
This tweaks the connectivity of test configurations, in hopes that more will be viable. Additionally reduces the prevalence of testing the legacy mempool.
This is mostly just reading through the output of uparam, after noticing that there were a few places where we were ignoring some arguments.
This PR adds the 0.34.14 changes to the changelog in master
This is follow on to the work in #7112.
This PR adds an initial set of metrics for use ABCI. The initial metrics enable the calculation of timing histograms and call counts for each of the ABCI methods. The metrics are also labeled as either 'sync' or 'async' to determine if the method call was performed using ABCI's `*Async` methods. An example of these metrics is included here for reference: ``` tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0001"} 0 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.0004"} 5 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.002"} 12 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.009"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.02"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.1"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="0.65"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="2"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="6"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="25"} 13 tendermint_abci_connection_method_timing_bucket{chain_id="ci",method="commit",type="sync",le="+Inf"} 13 tendermint_abci_connection_method_timing_sum{chain_id="ci",method="commit",type="sync"} 0.007802058000000001 tendermint_abci_connection_method_timing_count{chain_id="ci",method="commit",type="sync"} 13 ``` These metrics can easily be graphed using prometheus's `histogram_quantile(...)` method to pick out a particular quantile to graph or examine. I chose buckets that were somewhat of an estimate of expected range of times for ABCI operations. They start at .0001 seconds and range to 25 seconds. The hope is that this range captures enough possible times to be useful for us and operators.
This metric describes itself as 'pending' but never actual decrements when the messages are removed from the queue. This change fixes that by decrementing the metric when the data is removed from the queue.
This change removes the partial gRPC interface to the RPC service, which was deprecated in resolution of #6718. Details: - rpc: Remove the client and server interfaces and proto definitions. - Remove the gRPC settings from the config library. - Remove gRPC setup for the RPC service in the node startup. - Fix various test helpers to remove gRPC bits. - Remove the --rpc.grpc-laddr flag from the CLI. Note that to satisfy the protobuf interface check, this change also includes a temporary edit to buf.yaml, that I will revert after this is merged.
This patch was needed to pass the buf breakage check for the proto file removed in #7121, but now that master contains the change we no longer need the patch.
This should have been part of #7121, but I missed it.
Remove v0 blocksync folder structure.
This is another small sliver of #7075, with the intention of removing the legacy shim layer related to channel registration.
A fourth #7075 component patch to simplify the channel creation interface
This is, perhaps, the trival final piece of #7075 that I've been working on. There's more work to be done: - push more of the setup into the pacakges themselves - move channel-based sending/filtering out of the - simplify the buffering throuhgout the p2p stack.
…ull-changes-from-upstream-master
evan-forbes
changed the title
Pull changes from upstream master
chore: Pull changes from upstream master
Oct 15, 2021
adlerjohn
approved these changes
Oct 16, 2021
will try the new approach that will hopefully make it easier to review this friday when we pull the changes again |
evan-forbes
pushed a commit
that referenced
this pull request
Jun 9, 2023
…472) (#561) * Fixes for OpenAPI (RPC) documents and QA docs restructuring (#472) * openapi doc fixes and QA docs fixes * rename title Co-authored-by: Thane Thomson <connect@thanethomson.com> * fix text Co-authored-by: Thane Thomson <connect@thanethomson.com> * add backquotes Co-authored-by: Thane Thomson <connect@thanethomson.com> * update contact --------- Co-authored-by: Thane Thomson <connect@thanethomson.com> (cherry picked from commit 3cd1037) # Conflicts: # docs/qa/CometBFT-QA-37.md # docs/qa/README.md # docs/qa/TMCore-QA-37.md # docs/qa/img37/200nodes_cmt037/all_experiments.png # docs/qa/img37/200nodes_cmt037/avg_mempool_size.png # docs/qa/img37/200nodes_cmt037/block_rate.png # docs/qa/img37/200nodes_cmt037/cpu.png # docs/qa/img37/200nodes_cmt037/e_75cb89a8-f876-4698-82f3-8aaab0b361af.png # docs/qa/img37/200nodes_cmt037/memory.png # docs/qa/img37/200nodes_cmt037/mempool_size.png # docs/qa/img37/200nodes_cmt037/peers.png # docs/qa/img37/200nodes_cmt037/rounds.png # docs/qa/img37/200nodes_cmt037/total_txs_rate.png # docs/qa/img37/200nodes_tm037/avg_mempool_size.png # docs/qa/img37/200nodes_tm037/block_rate_regular.png # docs/qa/img37/200nodes_tm037/cpu.png # docs/qa/img37/200nodes_tm037/memory.png # docs/qa/img37/200nodes_tm037/mempool_size.png # docs/qa/img37/200nodes_tm037/peers.png # docs/qa/img37/200nodes_tm037/rounds.png # docs/qa/img37/200nodes_tm037/total_txs_rate_regular.png # docs/qa/img37/200nodes_tm037/v037_200node_latencies.png # docs/qa/img37/200nodes_tm037/v037_latency_throughput.png # docs/qa/img37/200nodes_tm037/v037_r200c2_heights.png # docs/qa/img37/200nodes_tm037/v037_r200c2_load1.png # docs/qa/img37/200nodes_tm037/v037_r200c2_mempool_size.png # docs/qa/img37/200nodes_tm037/v037_r200c2_mempool_size_avg.png # docs/qa/img37/200nodes_tm037/v037_r200c2_peers.png # docs/qa/img37/200nodes_tm037/v037_r200c2_rounds.png # docs/qa/img37/200nodes_tm037/v037_r200c2_rss.png # docs/qa/img37/200nodes_tm037/v037_r200c2_rss_avg.png # docs/qa/img37/200nodes_tm037/v037_r200c2_total-txs.png # docs/qa/img37/200nodes_tm037/v037_report_tabbed.txt # docs/qa/img37/200nodes_tm037/v037_rotating_heights.png # docs/qa/img37/200nodes_tm037/v037_rotating_heights_ephe.png # docs/qa/img37/200nodes_tm037/v037_rotating_latencies.png # docs/qa/img37/200nodes_tm037/v037_rotating_load1.png # docs/qa/img37/200nodes_tm037/v037_rotating_peers.png # docs/qa/img37/200nodes_tm037/v037_rotating_rss_avg.png # docs/qa/img37/200nodes_tm037/v037_rotating_total-txs.png # rpc/openapi/openapi.yaml * mergify conflic fixes for v0.34 (#561) --------- Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
didn't want to get too far behind upstream, so pulling in changes while we decide on what to do with the bot #558