-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: file watcher for cosmovisor #8590
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8590 +/- ##
==========================================
- Coverage 63.60% 63.60% -0.01%
==========================================
Files 568 568
Lines 53509 53492 -17
==========================================
- Hits 34034 34021 -13
+ Misses 17550 17547 -3
+ Partials 1925 1924 -1
|
1ca11f1
to
091d54a
Compare
What problem does this solve? |
@alessio , This is related to our Slack chat. Cosmovisor doesn't handle correctly the upgrade restarts. The upgrade module generates an upgrade info file - which should be used for upgrade handling instead of logs. |
@robert-zaremba yeah but tread carefully. There is no real OS-agnostic file watching mechanism. Such a thing just does not exist, operating systems provide their own primitives. Behaviour may vary quite much across different OSes. |
Exactly, that's why I'm using the |
And this is exactly what I'm talking about: From https://github.com/fsnotify/fsnotify's README |
Re OSX - this is fine - w need at least one event. For NFS - I don't think anyone will store blockchain using a network filesystem in production. Cosmovisor is for production setups. Do you have a better solution in mind? |
Yes. Don't use libraries that try to be OS-agnostic whilst it's clearly impossible to guarantee behavior consistency across different platforms. Use a timer-based mechanism maybe? Or maybe just develop a socket-based interface between the two processes. |
A time watch, e.g. checking the file for changes every interval. But that is bad too - at least it's consistently bad across all archs. However, Robert mind updating the PR's description with rationale for this changeset, please? I'm thinking of any casual journeyman passing by here, I think they would struggle to understand what is the problem you're trying to solve 👍 |
Updated the description. |
@alessio , @aaronc - can we reach a consensus on the strategy?
|
That sounds like a good recipe for backdoors. It too risks to break
Mmm one second may be a bit too much. Why don't we better integrate external solutions such as |
How about using signals? @robert-zaremba SIGUSR{1,2} would be good for this |
ConceptACK. Using sockets or signals might be a reasonable approach in the future, but |
@aaronc - why supporting log message if we have a file? Log approach seams useless and as we identified - it might be error prone. |
@alessio - so an App would send a signal to cosmovisor (it's parent process)? So we would need to pass a PID as a parameter to the App? |
Yep! That's correct. And conversely, cosmovisor may send a signal back to the app too - if necessary |
Signals are not very well supported in Windows. But that's OK, I don't think there are actual clients running validators on Windows in prod. The good of this is that signals are standard across operating systems (e.g. Linux, Mac OS, others) that abide by the POSIX standards, so their behaviour would be consistent. Not only file watching mechanisms are very much OS-dependent but also their implementations and APIs have changed over time (and risk is that they will continue do to so). |
We should make sure this gets merged before changing how many logs we're putting in INFO: |
<!-- The default pull request template is for types feat, fix, or refactor. For other templates, add one of the following parameters to the url: - template=docs.md - template=other.md --> ## Description Ref: #9616 (comment) depends: #8590 <!-- Add a description of the changes that this PR introduces and the files that are the most critical to review. --> This PR adds a full backup option for cosmovisor. `UNSAFE_SKIP_BACKUP` is an `env` setting introduced newly. - if `false` (default, **recommended**), cosmovisor will try to take backup and then upgrade. In case of failure while taking backup, it will just halt the process there and won't try the upgrade. - If `true`, the cosmovisor will try to upgrade without any backup. This setting makes it hard to recover from a failed upgrade. Node operators either need to sync from a healthy node or use a snapshot from others. --- ### Author Checklist *All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.* I have... - [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] added `!` to the type prefix if API or client breaking change - [ ] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [x] provided a link to the relevant issue or specification - [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules) - [ ] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] added a changelog entry to `CHANGELOG.md` - [x] included comments for [documenting Go code](https://blog.golang.org/godoc) - [x] updated the relevant documentation or specification - [x] reviewed "Files changed" and left comments if necessary - [ ] confirmed all CI checks have passed ### Reviewers Checklist *All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.* I have... - [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] confirmed `!` in the type prefix if API or client breaking change - [ ] confirmed all author checklist items have been addressed - [ ] reviewed state machine logic - [ ] reviewed API design and naming - [ ] reviewed documentation is accurate - [ ] reviewed tests and test coverage - [ ] manually tested (if applicable)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#9652 has been merged. Pre-approving with a few minor documentation suggestions.
The first line of the documentation needs to be updated to reflect changes (no long monitoring stdout
):
cosmovisor
is a small process manager for Cosmos SDK application binaries that monitors the governance module via stdout for incoming chain upgrade proposals.
We will need to make it clear in the release notes that auto-download with this version of cosmovisor will not work with Stargate chains. It might not hurt to add a warning at the top of the documentation as well that explains this.
I've merged master (and the auto backup feature) |
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Ryan Christoffersen <12519942+ryanchristo@users.noreply.github.com>
it seems this PR broke CI. Other prs are failing on cosmos visor tests now fail. |
Yes, because some tests were downloading a file which was using a branch path. We fixed it. |
So, this release should works with SDK v0.42.9 or not? |
cosmovisor @ master and upcoming For latest cosmovisor and autodownload you need SDK v0.43.x or later. |
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
## Description This PR updates Cosmos to `v0.44.3` and also cherry picks the changes that have been made inside cosmos/cosmos-sdk#8590 to fix the issue described in cosmos/cosmos-sdk#10428 <!-- Add a description of the changes that this PR introduces and the files that are the most critical to review. --> --- ### Author Checklist *All items are required. Please add a note to the item if the item is not applicable and please add links to any relevant follow up issues.* I have... - [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] added `!` to the type prefix if API or client breaking change - [x] targeted the correct branch (see [PR Targeting](https://github.com/desmos-labs/desmos/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] provided a link to the relevant issue or specification - [ ] followed the guidelines for [building modules](https://docs.cosmos.network/v0.44/building-modules/intro.html) - [ ] included the necessary unit and integration [tests](https://github.com/desmos-labs/desmos/blob/master/CONTRIBUTING.md#testing) - [x] added a changelog entry to `CHANGELOG.md` - [ ] included comments for [documenting Go code](https://blog.golang.org/godoc) - [ ] updated the relevant documentation or specification - [x] reviewed "Files changed" and left comments if necessary - [x] confirmed all CI checks have passed ### Reviewers Checklist *All items are required. Please add a note if the item is not applicable and please add your handle next to the items reviewed if you only reviewed selected items.* I have... - [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title - [ ] confirmed `!` in the type prefix if API or client breaking change - [ ] confirmed all author checklist items have been addressed - [ ] reviewed state machine logic - [ ] reviewed API design and naming - [ ] reviewed documentation is accurate - [ ] reviewed tests and test coverage - [ ] manually tested (if applicable)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor. Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable: + depends on the log level output (x/upgrade uses INFO) + can be hacked by accidentally logging user user content + can be broken by using upgrade name which will break the regex pattern. closes: cosmos#7703 closes: cosmos#8523 closes: cosmos#8651 closes: cosmos#8793 closes: cosmos#8964 **Depends on**: - cosmos#9652 --- Before we can merge this PR, please make sure that all the following items have been checked off. If any of the checklist items are not applicable, please leave them but write a little note why. - [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting)) - [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work. - [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md). - [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing) - [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`) - [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code). - [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md` - [ ] Re-reviewed `Files changed` in the Github PR explorer - [ ] Review `Codecov Report` in the comment section below once CI passes (cherry picked from commit 13559f9)
Adding upgrade file watcher for cosmovisor.
Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
closes: #7703
closes: #8523
closes: #8651
closes: #8793
closes: #8964
Depends on:
Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.
docs/
) or specification (x/<module>/spec/
)godoc
comments.Unreleased
section inCHANGELOG.md
Files changed
in the Github PR explorerCodecov Report
in the comment section below once CI passes