[Fleet] Allow inspecting the raw agent status document to quickly diagnose unhealthy agents #154067

Closed
cmacknz opened this issue Mar 30, 2023 · 5 comments · Fixed by #154826
Labels: QA:Validated (Issue has been validated by QA), Team:Fleet (Team label for Observability Data Collection Fleet team)

Comments

@cmacknz
Member

cmacknz commented Mar 30, 2023

Today the agent reports the detailed status of each component and input to Fleet. We are working to present this in an easy-to-understand way, but until that work is completed, the only way to know why an agent is unhealthy is to collect and analyze the agent logs or run the elastic-agent status --output=json command on the agent host.

For example, when an agent is unhealthy, the user is presented with a "1 or more components in a failed state" message, with no way to know what is actually wrong from the UI alone.

[Screenshot: unhealthy_example]

As a stopgap, we should add a link in the Fleet UI to view the raw agent status reported to Fleet, which includes the detailed state of each component and unit. This will allow advanced users to quickly see what is wrong. The raw status is roughly equivalent to the output of the elastic-agent status --output=json command.

Specifically, we should allow directly and easily viewing the agent status fields from the .fleet-agents index on the Agent Details page. Here is an example for a healthy agent with a log input and a system/metrics input.

          "last_checkin_message": "Running",
          "last_checkin_status": "online",
          "components": [
            {
              "id": "log-default",
              "units": [
                {
                  "id": "log-default-logfile-system-28261bdf-df61-4404-9db9-9f40cdd8f765",
                  "type": "input",
                  "message": "Starting: spawned pid '12814'",
                  "status": "STARTING"
                },
                {
                  "id": "log-default",
                  "type": "output",
                  "message": "Starting: spawned pid '12814'",
                  "status": "STARTING"
                }
              ],
              "type": "log",
              "message": "Healthy: communicating with pid '12814'",
              "status": "HEALTHY"
            },
            {
              "id": "system/metrics-default",
              "units": [
                {
                  "id": "system/metrics-default-system/metrics-system-28261bdf-df61-4404-9db9-9f40cdd8f765",
                  "type": "input",
                  "message": "Starting: spawned pid '12815'",
                  "status": "STARTING"
                },
                {
                  "id": "system/metrics-default",
                  "type": "output",
                  "message": "Starting: spawned pid '12815'",
                  "status": "STARTING"
                }
              ],
              "type": "system/metrics",
              "message": "Healthy: communicating with pid '12815'",
              "status": "HEALTHY"
            }
          ],
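
To illustrate how this document could be used for diagnosis, here is a minimal TypeScript sketch that walks the `components` array and lists every component or unit whose status is not `HEALTHY`. The type names and the `findNotHealthy` function are hypothetical and only mirror the fields visible in the example above; treating `HEALTHY` as the only status that needs no attention is an assumption.

```typescript
// Hypothetical types that mirror only the fields shown in the example document.
interface Unit {
  id: string;
  type: string; // "input" or "output" in the example
  message: string;
  status: string;
}

interface Component {
  id: string;
  type: string;
  message: string;
  status: string;
  units?: Unit[];
}

interface Finding {
  componentId: string;
  unitId?: string; // absent when the finding is about the component itself
  status: string;
  message: string;
}

// Assumption: any status other than `healthy` is worth surfacing.
function findNotHealthy(components: Component[], healthy = "HEALTHY"): Finding[] {
  const findings: Finding[] = [];
  for (const component of components) {
    if (component.status !== healthy) {
      findings.push({
        componentId: component.id,
        status: component.status,
        message: component.message,
      });
    }
    for (const unit of component.units ?? []) {
      if (unit.status !== healthy) {
        findings.push({
          componentId: component.id,
          unitId: unit.id,
          status: unit.status,
          message: unit.message,
        });
      }
    }
  }
  return findings;
}
```

Run against the example above, this sketch would report the four units that are still in the STARTING state, while both components themselves report HEALTHY.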
@cmacknz added the Team:Fleet label (Team label for Observability Data Collection Fleet team) on Mar 30, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@jen-huang
Contributor

jen-huang commented Mar 30, 2023

This is already possible through the hidden debug UI at /app/fleet/_debug:
[Screenshot]

Granted, this UI is just a hard pull of the first 10k .fleet-agents documents and doesn't allow choosing an exact agent, but maybe it is enough for troubleshooting for now?

@cmacknz
Member Author

cmacknz commented Mar 31, 2023

My intent is to make this much easier for users to access, so we wouldn't want it to be hidden (I had no idea /app/fleet/agents existed, for example), and we would want to be able to easily see it for a specific agent.

The document structure here is almost an exact match for the output of the elastic-agent status --output=json command, which users are already forced to rely on. We should just make it easy to see the same thing in the UI.

jillguyonnet added a commit that referenced this issue Apr 21, 2023
## Summary

Make raw agent status discoverable in Fleet UI, under `Agent details`
tab.

Closes #154067

### Screenshots

<img width="1918" alt="Screenshot 2023-04-19 at 12 14 48"
src="https://user-images.githubusercontent.com/23701614/233059955-7f066ad5-39cd-4685-b76b-41bc31ede4e8.png">
<img width="1918" alt="Screenshot 2023-04-19 at 13 04 06"
src="https://user-images.githubusercontent.com/23701614/233059973-3f1b507f-d0bf-48ab-929b-c567f9814377.png">

### UX checklist

- [ ] Action link title (`View agent JSON`)
- [ ] Flyout title (`{agentName} agent details`)
- [ ] Download button
- [ ] Download button label (`Download JSON`)
- [ ] Downloaded file name (`{agentName}-agent-details.json`)

### Testing steps

1. Run Kibana in dev on this branch.
2. In Fleet, click on an agent to get to the agent details page.
3. There should be a new `View agent JSON` item in the `Actions` menu.
Click it.
4. A new flyout should open with the agent details in JSON format.
Clicking outside of the flyout or on the `Close` button should close the
flyout.
5. The `Download JSON` button should download the JSON correctly.
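
The download behaviour described in the UX checklist and testing steps could be sketched roughly as below. This is not the code from this change: `buildAgentJsonDownload` and `JsonDownload` are hypothetical names, and the two-space pretty-printing and the MIME type are assumptions. Only the `{agentName}-agent-details.json` file name pattern comes from the checklist above.

```typescript
// Hypothetical description of a file to offer for download.
interface JsonDownload {
  fileName: string;
  contents: string;
  mimeType: string;
}

// Builds the file name and body for the "Download JSON" button.
// The file name pattern follows the UX checklist; everything else is assumed.
function buildAgentJsonDownload(
  agentName: string,
  agent: Record<string, unknown>
): JsonDownload {
  return {
    fileName: `${agentName}-agent-details.json`,
    // Pretty-print so the downloaded file is readable in a text editor.
    contents: JSON.stringify(agent, null, 2),
    mimeType: "application/json",
  };
}
```

Keeping this step as a pure function means the file name and contents can be unit tested without a browser; the UI layer would only need to hand the result to whatever download mechanism it uses.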

### Checklist

- [x] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ]
[Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html)
was added for features that require explanation or tutorials
- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more
about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures
(run axe in browser:
[FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/),
[Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] This renders correctly on smaller devices using a responsive
layout. (You can test this [in your
browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser
compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
@jen-huang added the QA:Ready for Testing label (Code is merged and ready for QA to validate) on Apr 21, 2023
@amolnater-qasource

Hi Team,

We have created 03 test cases for this feature under the Fleet test suite at links:

Please let us know if we are missing any scenarios that should be covered here.

Thanks!

@amolnater-qasource

Hi Team,

We have executed a test run for this feature on the latest 8.8.0 BC3 Kibana cloud environment at link:

Status:
PASS: 03

Build details:
VERSION: 8.8.0
BUILD: 62994
COMMIT: 85b22d3

Hence we are marking this feature as QA:Validated.
Thanks
