Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: file watcher for cosmovisor #8590

Merged
merged 64 commits into from
Aug 11, 2021
Merged

feat: file watcher for cosmovisor #8590

merged 64 commits into from
Aug 11, 2021

Conversation

robert-zaremba
Copy link
Collaborator

@robert-zaremba robert-zaremba commented Feb 15, 2021

Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:

  • depends on the log level output (x/upgrade uses INFO)
  • can be hacked by accidentally logging user user content
  • can be broken by using upgrade name which will break the regex pattern.

closes: #7703
closes: #8523
closes: #8651
closes: #8793
closes: #8964

Depends on:


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (see CONTRIBUTING.md)
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Code follows the module structure standards.
  • Wrote unit and integration tests
  • Updated relevant documentation (docs/) or specification (x/<module>/spec/)
  • Added relevant godoc comments.
  • Added a relevant changelog entry to the Unreleased section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer
  • Review Codecov Report in the comment section below once CI passes

@codecov
Copy link

codecov bot commented Feb 15, 2021

Codecov Report

Merging #8590 (ed5c0cd) into master (0bcce5d) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head ed5c0cd differs from pull request most recent head 25a2378. Consider uploading reports for the commit 25a2378 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master    #8590      +/-   ##
==========================================
- Coverage   63.60%   63.60%   -0.01%     
==========================================
  Files         568      568              
  Lines       53509    53492      -17     
==========================================
- Hits        34034    34021      -13     
+ Misses      17550    17547       -3     
+ Partials     1925     1924       -1     
Impacted Files Coverage Δ
store/types/gas.go 84.44% <ø> (ø)
store/types/store.go 68.75% <ø> (ø)
x/upgrade/types/plan.go 96.29% <ø> (ø)
x/upgrade/abci.go 94.73% <100.00%> (+0.14%) ⬆️
x/upgrade/keeper/keeper.go 80.18% <100.00%> (+0.08%) ⬆️
x/bank/types/key.go 70.00% <0.00%> (-12.36%) ⬇️
x/bank/migrations/v044/store.go 68.42% <0.00%> (-5.27%) ⬇️
x/upgrade/keeper/grpc_query.go 75.60% <0.00%> (ø)
crypto/keys/internal/ecdsa/privkey.go 84.21% <0.00%> (+1.75%) ⬆️
x/bank/migrations/v043/keys.go 15.38% <0.00%> (+2.88%) ⬆️

@alessio
Copy link
Contributor

alessio commented Feb 16, 2021

What problem does this solve?

@robert-zaremba
Copy link
Collaborator Author

What problem does this solve?

@alessio , This is related to our Slack chat. Cosmovisor doesn't handle correctly the upgrade restarts. The upgrade module generates an upgrade info file - which should be used for upgrade handling instead of logs.

@alessio
Copy link
Contributor

alessio commented Feb 16, 2021

@robert-zaremba yeah but tread carefully. There is no real OS-agnostic file watching mechanism. Such a thing just does not exist, operating systems provide their own primitives. Behaviour may vary quite much across different OSes.

@robert-zaremba
Copy link
Collaborator Author

@robert-zaremba yeah but tread carefully. There is no real OS-agnostic file watching mechanism. Such a thing just does not exist, operating systems provide their own primitives. Behaviour may vary quite much across different OSes.

Exactly, that's why I'm using the fsnotify package.

@alessio
Copy link
Contributor

alessio commented Feb 16, 2021

And this is exactly what I'm talking about:

image

From https://github.com/fsnotify/fsnotify's README

@robert-zaremba
Copy link
Collaborator Author

Re OSX - this is fine - w need at least one event.

For NFS - I don't think anyone will store blockchain using a network filesystem in production. Cosmovisor is for production setups.

Do you have a better solution in mind?

@alessio
Copy link
Contributor

alessio commented Feb 16, 2021

Yes. Don't use libraries that try to be OS-agnostic whilst it's clearly impossible to guarantee behavior consistency across different platforms. Use a timer-based mechanism maybe? Or maybe just develop a socket-based interface between the two processes.

@robert-zaremba
Copy link
Collaborator Author

@alessio - what do you mean with time based mechanism? I was thinking about creating a socket, but that would require updating SDK. @aaronc what do you think?

@alessio
Copy link
Contributor

alessio commented Feb 17, 2021

what do you mean with time based mechanism?

A time watch, e.g. checking the file for changes every interval. But that is bad too - at least it's consistently bad across all archs.

However, Robert mind updating the PR's description with rationale for this changeset, please? I'm thinking of any casual journeyman passing by here, I think they would struggle to understand what is the problem you're trying to solve 👍

@robert-zaremba
Copy link
Collaborator Author

Updated the description.

@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Feb 17, 2021

@alessio , @aaronc - can we reach a consensus on the strategy?

  • The solution proposed in this PR (using fsnotify) works but may have problem with network file systems. I tested fsnotify on mac and linux with different partitions and it was working.
  • Alessio proposal of checking frequently a file change will work. We could do it once a second.
  • Previously with Aaron we were discussing socket communication between x/upgrade and cosmovisor, but we didn't want to change the SDK.

@alessio
Copy link
Contributor

alessio commented Feb 17, 2021

Previously with Aaron we were discussing socket communication between x/upgrade and cosmovisor,

That sounds like a good recipe for backdoors. It too risks to break x/upgrade determinism.

Alessio proposal of checking frequently a file change will work. We could do it once a second.

Mmm one second may be a bit too much. Why don't we better integrate external solutions such as systemd on GNU/Linux and launchd on Mac OS? They're designed to take care of things such as process start, stop and restart. Else we risk reinventing the wheel.

@alessio
Copy link
Contributor

alessio commented Feb 17, 2021

How about using signals? @robert-zaremba

SIGUSR{1,2} would be good for this

@aaronc
Copy link
Member

aaronc commented Feb 17, 2021

ConceptACK.

Using sockets or signals might be a reasonable approach in the future, but x/upgrade would need to support it and it doesn't yet. So first we should make sure cosmovisor supports what x/upgrade already supports (which is log messages + the file) so that this can be used to migrate Stargate networks -> 0.42. We can add sockets or signals to x/upgrade if that's desirable but it's a separate issue.

@robert-zaremba
Copy link
Collaborator Author

robert-zaremba commented Feb 17, 2021

@aaronc - why supporting log message if we have a file? Log approach seams useless and as we identified - it might be error prone.

@robert-zaremba
Copy link
Collaborator Author

SIGUSR{1,2} would be good for this

@alessio - so an App would send a signal to cosmovisor (it's parent process)? So we would need to pass a PID as a parameter to the App?

@alessio
Copy link
Contributor

alessio commented Feb 17, 2021

SIGUSR{1,2} would be good for this

@alessio - so an App would send a signal to cosmovisor (it's parent process)? So we would need to pass a PID as a parameter to the App?

Yep! That's correct. And conversely, cosmovisor may send a signal back to the app too - if necessary

@alessio
Copy link
Contributor

alessio commented Feb 17, 2021

Signals are not very well supported in Windows. But that's OK, I don't think there are actual clients running validators on Windows in prod. The good of this is that signals are standard across operating systems (e.g. Linux, Mac OS, others) that abide by the POSIX standards, so their behaviour would be consistent. Not only file watching mechanisms are very much OS-dependent but also their implementations and APIs have changed over time (and risk is that they will continue do to so).

@clevinson
Copy link
Contributor

We should make sure this gets merged before changing how many logs we're putting in INFO:

mergify bot pushed a commit that referenced this pull request Aug 5, 2021
<!--
The default pull request template is for types feat, fix, or refactor.
For other templates, add one of the following parameters to the url:
- template=docs.md
- template=other.md
-->

## Description

Ref: #9616 (comment)

depends: #8590

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

This PR adds a full backup option for cosmovisor. 
`UNSAFE_SKIP_BACKUP` is an `env` setting introduced newly.
- if `false` (default, **recommended**), cosmovisor will try to take backup and then upgrade. In case of failure while taking backup, it will just halt the process there and won't try the upgrade.
- If `true`, the cosmovisor will try to upgrade without any backup. This setting makes it hard to recover from a failed upgrade. Node operators either need to sync from a healthy node or use a snapshot from others.

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] added `!` to the type prefix if API or client breaking change
- [ ] targeted the correct branch (see [PR Targeting](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [x] provided a link to the relevant issue or specification
- [ ] followed the guidelines for [building modules](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules)
- [ ] included the necessary unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] added a changelog entry to `CHANGELOG.md`
- [x] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [x] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [ ] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed 
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
Copy link
Contributor

@ryanchristo ryanchristo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#9652 has been merged. Pre-approving with a few minor documentation suggestions.

The first line of the documentation needs to be updated to reflect changes (no long monitoring stdout):

cosmovisor is a small process manager for Cosmos SDK application binaries that monitors the governance module via stdout for incoming chain upgrade proposals.

We will need to make it clear in the release notes that auto-download with this version of cosmovisor will not work with Stargate chains. It might not hurt to add a warning at the top of the documentation as well that explains this.

cosmovisor/README.md Outdated Show resolved Hide resolved
cosmovisor/README.md Outdated Show resolved Hide resolved
cosmovisor/README.md Outdated Show resolved Hide resolved
cosmovisor/README.md Outdated Show resolved Hide resolved
cosmovisor/README.md Outdated Show resolved Hide resolved
cosmovisor/README.md Outdated Show resolved Hide resolved
@robert-zaremba robert-zaremba changed the title DO NOT MERGE feat: file watcher for cosmovisor feat: file watcher for cosmovisor Aug 6, 2021
@robert-zaremba
Copy link
Collaborator Author

I've merged master (and the auto backup feature)

cosmovisor/process.go Outdated Show resolved Hide resolved
robert-zaremba and others added 5 commits August 11, 2021 01:15
Co-authored-by: Dev Ojha <ValarDragon@users.noreply.github.com>
Co-authored-by: Ryan Christoffersen <12519942+ryanchristo@users.noreply.github.com>
@robert-zaremba robert-zaremba added the A:automerge Automatically merge PR once all prerequisites pass. label Aug 11, 2021
@mergify mergify bot merged commit 13559f9 into master Aug 11, 2021
@mergify mergify bot deleted the robert/cosmvisor-file-watch branch August 11, 2021 15:03
@tac0turtle
Copy link
Member

it seems this PR broke CI. Other prs are failing on cosmos visor tests now fail.

@robert-zaremba
Copy link
Collaborator Author

Yes, because some tests were downloading a file which was using a branch path. We fixed it.

@RaulBernal
Copy link

So, this release should works with SDK v0.42.9 or not?

@robert-zaremba
Copy link
Collaborator Author

cosmovisor @ master and upcoming cosmovisor-v1.0 doesn't support autodownload with chains running on SDK v0.42.x. Otherwise it works like a charm.

For latest cosmovisor and autodownload you need SDK v0.43.x or later.

RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this pull request Nov 2, 2021
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
mergify bot pushed a commit to desmos-labs/desmos that referenced this pull request Nov 2, 2021
## Description
This PR updates Cosmos to `v0.44.3` and also cherry picks the changes that have been made inside cosmos/cosmos-sdk#8590 to fix the issue described in cosmos/cosmos-sdk#10428

<!-- Add a description of the changes that this PR introduces and the files that
are the most critical to review. -->

---

### Author Checklist

*All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.*

I have...

- [x] included the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] added `!` to the type prefix if API or client breaking change
- [x] targeted the correct branch (see [PR Targeting](https://github.com/desmos-labs/desmos/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] provided a link to the relevant issue or specification
- [ ] followed the guidelines for [building modules](https://docs.cosmos.network/v0.44/building-modules/intro.html)
- [ ] included the necessary unit and integration [tests](https://github.com/desmos-labs/desmos/blob/master/CONTRIBUTING.md#testing)
- [x] added a changelog entry to `CHANGELOG.md`
- [ ] included comments for [documenting Go code](https://blog.golang.org/godoc)
- [ ] updated the relevant documentation or specification
- [x] reviewed "Files changed" and left comments if necessary
- [x] confirmed all CI checks have passed

### Reviewers Checklist

*All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.*

I have...

- [ ] confirmed the correct [type prefix](https://github.com/commitizen/conventional-commit-types/blob/v3.0.0/index.json) in the PR title
- [ ] confirmed `!` in the type prefix if API or client breaking change
- [ ] confirmed all author checklist items have been addressed
- [ ] reviewed state machine logic
- [ ] reviewed API design and naming
- [ ] reviewed documentation is accurate
- [ ] reviewed tests and test coverage
- [ ] manually tested (if applicable)
leobragaz pushed a commit to desmos-labs/cosmos-sdk that referenced this pull request Apr 5, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this pull request Oct 25, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
RiccardoM pushed a commit to desmos-labs/cosmos-sdk that referenced this pull request Nov 14, 2022
Adding upgrade file watcher for cosmovisor.

Currently the comswisor upgrade mechanism relays on parsing log messages. This is not reliable:
+ depends on the log level output (x/upgrade uses INFO)
+ can be hacked by accidentally logging user user content
+ can be broken by using upgrade name which will break the regex pattern.

closes: cosmos#7703
closes: cosmos#8523
closes: cosmos#8651
closes: cosmos#8793
closes: cosmos#8964

**Depends on**:
- cosmos#9652

---

Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

- [ ] Targeted PR against correct branch (see [CONTRIBUTING.md](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#pr-targeting))
- [ ] Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
- [ ] Code follows the [module structure standards](https://github.com/cosmos/cosmos-sdk/blob/master/docs/building-modules/structure.md).
- [ ] Wrote unit and integration [tests](https://github.com/cosmos/cosmos-sdk/blob/master/CONTRIBUTING.md#testing)
- [ ] Updated relevant documentation (`docs/`) or specification (`x/<module>/spec/`)
- [ ] Added relevant `godoc` [comments](https://blog.golang.org/godoc-documenting-go-code).
- [ ] Added a relevant changelog entry to the `Unreleased` section in `CHANGELOG.md`
- [ ] Re-reviewed `Files changed` in the Github PR explorer
- [ ] Review `Codecov Report` in the comment section below once CI passes

(cherry picked from commit 13559f9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A:automerge Automatically merge PR once all prerequisites pass. C:Cosmovisor Issues and PR related to Cosmovisor C:x/upgrade
Projects
None yet