Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add support for Pebble check events #151

Merged
merged 13 commits into from
Jul 24, 2024

Conversation

tonyandrewmeyer
Copy link
Collaborator

@tonyandrewmeyer tonyandrewmeyer commented Jul 10, 2024

Adds support for the {container-name}-pebble-check-failed and {container-name}-pebble-check-recovered events that will be available in Juju 3.6.

Example usage:

ctx = Context(MyCharm)
check_info = CheckInfo("http-check", failures=7, status=ops.pebble.CheckStatus.DOWN)
container = Container("my-container", check_infos={check_info})
state = State(containers={container})

ctx.run(ctx.on.pebble_check_failed(container=container, info=check_info), state)

Also updates the support for Pebble custom notice events to align with the 7.x approach (event from ctx.on, notices are a frozenset).

Updated usage:

ctx = Context(MyCharm)
notices = [
    Notice(key="example.com/a", occurrences=10),
    Notice(key="example.com/b", last_data={"bar": "baz"}),
    Notice(key="example.com/c"),
]
container = Container("my-container", notices=notices)
state = State(containers={container})

ctx.run(ctx.on.pebble_custom_notice(container=container, notice=notices[-1]), state)

Comment on lines +777 to +785
status: pebble.CheckStatus = pebble.CheckStatus.UP
"""Status of the check.

CheckStatus.UP means the check is healthy (the number of failures is less
than the threshold), CheckStatus.DOWN means the check is unhealthy
(the number of failures has reached the threshold).
"""

failures: int = 0
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the defaults should be reversed here, to CheckStatus.DOWN and e.g. failures=3. The normal situation should be that the check is passing, but if it's passing then I doubt people would be creating the event and putting it into the container (if that was done automatically with a tool, then it would not require the defaults).

I think observing and therefore testing pebble-check-failed is probably more likely than pebble-check-recovered, so that would lean me towards making a failing check the one that requires the least work. Or maybe charms will always observe both, in a kind of do/undo pattern?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you have (UP, 0) is better, as it matches the Pebble "defaults". So it seems more explicit and less surprising to me than having it (DOWN, 3) as a default, even if it's a bit more work for tests that use it.

jujubot added a commit to juju/juju that referenced this pull request Jul 16, 2024
#17655

This creates new `<container-name>-pebble-check-failed` and `<container-name>-pebble-check-failed` hooks, which are triggered by Pebble change-update notices of particular kinds & statuses, with the effect that we'll now respond to Pebble checks reaching the failure threshold, or passing after having reached the threshold, by sending a pebble-check-failed or pebble-check-recovered event to the charm.

## QA steps

### Test the hooks directly:
```shell
# Pack and deploy the test charm
$ juju add-model t
$ cd testcharms/charms/pebble-checks
$ charmcraft pack
$ juju deploy ./juju-qa-pebble-checks_ubuntu-22.04-amd64.charm --resource ubuntu-image=public.ecr.aws/ubuntu/ubuntu:22.04

# The check is set to succeed if `/trigger/` exists, so make it.
$ juju ssh --container ubuntu juju-qa-pebble-checks/0 mkdir /trigger/

$ juju status # unit status should settle to "active" after pebble-ready

# Make the check fail
$ juju ssh --container ubuntu juju-qa-pebble-checks/0 rmdir /trigger/
$ juju status # will now say status "maintenance" with message "check failed: exec-check"

# Make the check pass again
$ juju ssh --container ubuntu juju-qa-pebble-checks/0 mkdir /trigger/
$ juju status # will now say status "active" with message "check recovered: exec-check"
```

### Test the shell test:
`./main.sh -p k8s -c minikube sidecar test_pebble_checks`

## Documentation changes

 prdesc [CHARMTECH-160](https://warthogs.atlassian.net/browse/CHARMTECH-160), [CHARMTECH-161](https://warthogs.atlassian.net/browse/CHARMTECH-161), [CHARMTECH-162](https://warthogs.atlassian.net/browse/CHARMTECH-162) cover updating the juju.is (main and SDK) docs

## Links

<!-- Link to all relevant specification, documentation, bug, issue or JIRA card. -->

* [OP046](https://docs.google.com/document/d/13ItH8l5ZSmmv9WqZXpqjV8EifZU5e3zxINiNLubnC68/edit)
* [juju/charm PR](juju/charm#432)
* [ops PR](canonical/operator#1281)
* [pebble PR](canonical/pebble#444)
* [jhack PR](canonical/jhack#172)
* [Scenario PR](canonical/ops-scenario#151)

**Jira card:** [CHARMTECH-109](https://warthogs.atlassian.net/browse/CHARMTECH-109)

[CHARMTECH-160]: https://warthogs.atlassian.net/browse/CHARMTECH-160?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
@tonyandrewmeyer tonyandrewmeyer marked this pull request as ready for review July 17, 2024 00:14
@tonyandrewmeyer
Copy link
Collaborator Author

The merge broke the linting check because I forgot to reorder them. I'll fix that later. The test pass if you use ops main - I assume we would not merge this until that's become 2.15 anyway, so will trigger the tests before that.

@tonyandrewmeyer tonyandrewmeyer requested a review from benhoyt July 17, 2024 05:33
scenario/state.py Outdated Show resolved Hide resolved
Comment on lines +777 to +785
status: pebble.CheckStatus = pebble.CheckStatus.UP
"""Status of the check.

CheckStatus.UP means the check is healthy (the number of failures is less
than the threshold), CheckStatus.DOWN means the check is unhealthy
(the number of failures has reached the threshold).
"""

failures: int = 0
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you have (UP, 0) is better, as it matches the Pebble "defaults". So it seems more explicit and less surprising to me than having it (DOWN, 3) as a default, even if it's a bit more work for tests that use it.

tests/test_e2e/test_pebble.py Outdated Show resolved Hide resolved
@tonyandrewmeyer tonyandrewmeyer requested a review from benhoyt July 18, 2024 09:57
scenario/context.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@benhoyt benhoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good -- thanks. Just one minor comment about the CheckInfo argument name.

scenario/consistency_checker.py Outdated Show resolved Hide resolved
scenario/consistency_checker.py Outdated Show resolved Hide resolved
@tonyandrewmeyer tonyandrewmeyer merged commit 7f8e6f2 into canonical:7.0 Jul 24, 2024
2 checks passed
@tonyandrewmeyer tonyandrewmeyer deleted the add-pebble-check-events branch July 24, 2024 05:29
tonyandrewmeyer added a commit that referenced this pull request Sep 2, 2024
* Add support for Pebble checks.

Also update the support for Pebble notices to be aligned with the 7.x approach.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants