Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CLI] upgrade urfave/cli to v2 #6285

Merged
merged 16 commits into from
Sep 27, 2024

Conversation

shijiesheng
Copy link
Contributor

@shijiesheng shijiesheng commented Sep 13, 2024

What changed?

  • changed GlobalFlag to normal flags as it's no longer supported
  • Flag aliases are separated into a separate field
  • populate error for CLI actions

Why?

V2 introduced a background context field in cli.Context which can be used to propagate traces and security tokens in a centralized way.

How did you test it?
Unit test

Potential risks

Release notes

Documentation Changes

Before

shengs@shengs-DQG34F2RHF cli % cadence --env development --do not-exist-domain domain describe
Error: Domain not-exist-domain does not exist.
Error Details: Domain not-exist-domain does not exist.
('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)

After

shengs@shengs-DQG34F2RHF cli % go run main.go --do not-exist-domain domain describe
Error: Domain not-exist-domain does not exist.
Error Details: Domain not-exist-domain does not exist.
('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)
Domain not-exist-domain does not exist.
exit status 1

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 67.78098% with 559 lines in your changes missing coverage. Please review.

Project coverage is 73.16%. Comparing base (34daef5) to head (3b9656f).
Report is 15 commits behind head on master.

Files with missing lines Patch % Lines
tools/cli/workflow_commands.go 44.61% 133 Missing and 11 partials ⚠️
tools/cli/admin_commands.go 11.11% 72 Missing ⚠️
tools/cli/domain_commands.go 51.16% 35 Missing and 7 partials ⚠️
tools/cli/admin_failover_commands.go 9.09% 40 Missing ⚠️
tools/cli/admin_elastic_search_commands.go 0.00% 39 Missing ⚠️
tools/cli/admin_dlq_commands.go 0.00% 26 Missing ⚠️
tools/cli/workflow_batch_commands.go 0.00% 22 Missing ⚠️
tools/cli/admin_config_store_commands.go 0.00% 20 Missing ⚠️
tools/cli/isolation-groups.go 0.00% 18 Missing ⚠️
tools/cli/admin_cluster_commands.go 15.00% 17 Missing ⚠️
... and 15 more
Additional details and impacted files
Files with missing lines Coverage Δ
tools/cli/app.go 98.73% <100.00%> (+0.04%) ⬆️
tools/cli/cluster.go 100.00% <100.00%> (ø)
tools/cli/database.go 46.72% <100.00%> (+0.23%) ⬆️
tools/cli/domain_utils.go 1.72% <ø> (ø)
tools/cli/flags.go 99.78% <100.00%> (+0.03%) ⬆️
tools/cli/render.go 85.82% <ø> (ø)
tools/cli/task_list.go 100.00% <100.00%> (ø)
tools/cli/workflow.go 100.00% <100.00%> (ø)
tools/common/schema/handler.go 30.55% <ø> (ø)
tools/cli/cluster_commands.go 55.55% <66.66%> (ø)
... and 24 more

... and 23 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34daef5...3b9656f. Read the comment docs.

tools/cli/domain_commands.go Outdated Show resolved Hide resolved
tools/cli/domain.go Outdated Show resolved Hide resolved
tools/cli/domain_commands.go Outdated Show resolved Hide resolved
tools/cli/domain_commands.go Outdated Show resolved Hide resolved
Copy link
Member

@Groxx Groxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far:

  • overall very glad for the PR 🙇
    • we've needed this for quite a while, lots of these changes are clearly good
    • I tried something similar a while back, and yea. this all looks reasonable / expected scale. it's a lot of work.
  • a number of missing returns on ErrorAndPrint, which used to exit, so that's a meaningful control-flow change
    • errcheck ./tools/... finds a fair number of these, they seem worth fixing
  • I suspect there are paths that do multiple ErrorAndPrint calls per call stack, where before the os.Exit ensured only one happened.
    • I haven't actually verified this though, and I think I would need to reread to make sure this is correct. so let me know if it's actually fine :)
    • I suspect this all means we would actually be better off making a custom error printer and returning a DisplayError{"Invalid arg x", cause: err} everywhere we have an ErrorAndPrint. and then the error's full chain is the "error details".

Seems very close but I think there are a few minor things to polish, and maybe that error-return change is worth doing instead. It'd be structurally identical to all this though, so probably not too hard to shift if it seems like it'll work.

tools/cli/domain_utils.go Outdated Show resolved Hide resolved
tools/cli/factory.go Outdated Show resolved Hide resolved
@Groxx
Copy link
Member

Groxx commented Sep 25, 2024

I'll say:

  • check new comments
  • see if the "what if ErrorAndPrint was return PrintableErr + an error-printer" works? that feels like it'd be a lot easier to be confident about "this does not print multiple times", and easier to maintain for the future.
    • I believe that's https://pkg.go.dev/github.com/urfave/cli/v2#ExitErrHandlerFunc
    • this is partly "'no duplicates' is hard to verify, which means it'll be hard to maintain" and partly "current output seems kinda worse as it duplicates the error message". seems worth trying to improve.
    • worst case is probably: ErrorAndPrint can return an alreadyPrintedErr{err} and you can errors.As(&err, &printed) to print nothing in the exit-err-handler because you know it has already been handled". that way we have both a fallback for un-special-printed errors, and no duplicates.

Otherwise LGTM and happy to stamp. A ton of this is a big improvement already.

@shijiesheng
Copy link
Contributor Author

I'll say:

  • check new comments

  • see if the "what if ErrorAndPrint was return PrintableErr + an error-printer" works? that feels like it'd be a lot easier to be confident about "this does not print multiple times", and easier to maintain for the future.

    • I believe that's https://pkg.go.dev/github.com/urfave/cli/v2#ExitErrHandlerFunc
    • this is partly "'no duplicates' is hard to verify, which means it'll be hard to maintain" and partly "current output seems kinda worse as it duplicates the error message". seems worth trying to improve.
    • worst case is probably: ErrorAndPrint can return an alreadyPrintedErr{err} and you can errors.As(&err, &printed) to print nothing in the exit-err-handler because you know it has already been handled". that way we have both a fallback for un-special-printed errors, and no duplicates.

Otherwise LGTM and happy to stamp. A ton of this is a big improvement already.

fixed with cli provided errorhandlerfunc. This is much better now. The merge logic is concatenation for duplicate printable errors. Below is just a sanity test to see what it looks like. Same as before.

shengs@shengs-DQG34F2RHF cli % go run main.go --do not-exist-domain domain describe
Error: Domain not-exist-domain does not exist.
Error Details: Domain not-exist-domain does not exist.
('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)
Domain not-exist-domain does not exist.
exit status 1

Comment on lines 537 to 545
func PrintableError(msg string, err error) error {
var printable *printableError
if errors.As(err, &printable) { // already printable error type
printable.msg = msg + ": " + printable.msg
} else {
printable = &printableError{msg: msg, err: err}
}
return printable
}
Copy link
Member

@Groxx Groxx Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably gonna want to make a "fresh" wrapper every time, as otherwise currently this will take

err{"a", err{"b", printable{"c", err}}}

and turn it into

printable{"d: c", err}

and that's probably not intended.

(also in principle mutating an error is very risky, as that could be happening concurrently. only read, never modify, or use mutexes)

Copy link
Member

@Groxx Groxx Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm. yea actually this is stranger the more I look at it.

for context (also in chat), generally the pattern with errors is:

  • always wrap to add info, build up ~infinite layers as necessary
  • never mutate, because it could be concurrently touched anywhere, because you don't know where it came from.
  • ^ due to these, it's a lot like a backwards Context. wrap to control, always assume concurrency (so read-only or add mutexes to protect it).
  • when "using" the error, either As to get the thing you care about and interrogate it, or iteratively Unwrap() and handle the whole chain somehow
    • generally in-error methods check their wrapped-err-chain for nested info, like if we had a PrintedError{err{PrintedErr{err}}}.Print() -> that Print() should probably include both printed-err messages somehow.
    • not being unwrap-able is reasonable if the outer-most layer supercedes all wrapped layers. but you might still want to pull out wrapped "cause" errors. or save that into each layer when wrapping like fmt.Errorf does, but generally this is hard to reuse.
    • ^ this is also pretty Context-like, if you ever .WithValue(...) things that care if there's a higher level instance above them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make sense. I'll always wrap

Copy link
Member

@taylanisikdemir taylanisikdemir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this migration. This will unblock us to easily write unit tests for CLI (probably a few more follow up changes to isolate variables to instance level and pass in context)

@shijiesheng
Copy link
Contributor Author

What changed?

changed GlobalFlag to normal flags as it's no longer supported
Flag aliases are separated into a separate field
populate error for CLI actions
Why?

V2 introduced a background context field in cli.Context which can be used to propagate traces and security tokens in a centralized way.

How did you test it?
Unit test

Potential risks

Release notes

Documentation Changes

Before

shengs@shengs-DQG34F2RHF cli % cadence --env development --do not-exist-domain domain describe
Error: Domain not-exist-domain does not exist.
Error Details: Domain not-exist-domain does not exist.
('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)
After

shengs@shengs-DQG34F2RHF cli % go run main.go --do not-exist-domain domain describe
Error: Domain not-exist-domain does not exist.
Error Details: Domain not-exist-domain does not exist.
('export CADENCE_CLI_SHOW_STACKS=1' to see stack traces)
Domain not-exist-domain does not exist.
exit status 1

@shijiesheng shijiesheng merged commit d19937f into cadence-workflow:master Sep 27, 2024
18 checks passed
Groxx added a commit that referenced this pull request Oct 1, 2024
Unfortunately I missed a "log.Printf -> return fmt.Errorf" mistake in #6285.
I think it just got lost in the surrounding "log.Fatalf -> return fmt.Errorf" flood.

Oh well.  Easy fix.  This appears to be the only one.
Groxx added a commit that referenced this pull request Oct 3, 2024
…where (#6322)

Clean-up and follow-up from #6285 and #6318, but only part 1 of [probably 2].
The next part will change our ErrorAndExit calls into error returns, so we get chains of causes rather than just one message when a fatal error is hit.

If there's a part 3, maybe it'll use https://github.com/bracesdev/errtrace or a https://github.com/pkg/errors-like wrapper to collect useful stack traces along with the error returns, to bring back `CADENCE_CLI_SHOW_STACKS` for all errors.

---

Contains three major changes:
1. creates some standard "literally all CLIs" tooling in common/commoncli (to avoid conflicts with "common" and "cli")
2. moves `cli.NewApp()` and `PrintableError` into commoncli (as `New` and `Problem`)
3. fancy shmancy wrapped error printing with indentation, nested-error cleanup, and colors (we had colors before)

1 and 2 are pretty straightforward, and I've added a simple lint to block using urfave directly to make CLIs (to prevent accidentally using it in the future).

3 is an attempt at "what if our error messages were similar but less bad".
And it also serves as an example of what you can do with enough force with wrapped errors.

---

Ultimately, this gives us (the ability to) have errors print like this:
```
Error: the most-recent cause of the problem
Error details:
  wrapped error
  details if they exist
  Error: another commoncli.Problem from further down the stack
  Error details:
    and other wrapped errors
    that it may contain
```
Rather than like this:
```
Error: the most-recent cause of the problem
Error details: wrapped error: details if they exist: another commoncli.Problem from further down the stack: and other wrapped errors: that it may contain
```
Because it looks "in the same spirit" as the printing before, but is hopefully more readable.
And hopefully it also helps show enough error-spelunking details that someone can figure out much fancier and more structured errors later, if desired.

This also ends up changing the main server binary to use this same fancy error printing, where before it did not:
```
❯ ./cadence-server start
2024/10/02 22:26:06 Loading config; env=development,zone=,configDir=./config
2024/10/02 22:26:06 Loading configFiles=[./config/base.yaml ./config/development.yaml]
2024/10/02 22:26:06 [WARN] dcRedirectionPolicy config is deprecated. Please replace it with clusterRedirectionPolicy.
2024/10/02 22:26:06 gocql: unable to dial control conn 127.0.0.1:9042: dial tcp 127.0.0.1:9042: connect: connection refused
Error: cassandra schema version compatibility check failed
Error details:
  creating CQL client
  gocql: unable to create session: control: unable to connect to initial hosts: dial tcp 127.0.0.1:9042: connect: connection refused
```
vs before:
```
❯ ./cadence-server start
2024/10/02 22:27:58 Loading config; env=development,zone=,configDir=./config
2024/10/02 22:27:58 Loading configFiles=[./config/base.yaml ./config/development.yaml]
2024/10/02 22:27:58 [WARN] dcRedirectionPolicy config is deprecated. Please replace it with clusterRedirectionPolicy.
2024/10/02 22:27:58 gocql: unable to dial control conn 127.0.0.1:9042: dial tcp 127.0.0.1:9042: connect: connection refused
cassandra schema version compatibility check failed: creating CQL client: gocql: unable to create session: control: unable to connect to initial hosts: dial tcp 127.0.0.1:9042: connect: connection refused
```
which seems like probably a positive change.  It is colorized, but fatih is smart enough to remove it when piped, and removing color from this binary all the time is fairly easy if we want.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants