[Fleet] Allow inspecting the raw agent status document to quickly diagnose unhealthy agents #154067

cmacknz · 2023-03-30T15:04:07Z

Today the agent reports the detailed status of each component and input to Fleet. We are working to present this in an easy-to-understand way, but until that work is completed, the only way to know why the agent is unhealthy is to collect and analyze the agent logs or run the `elastic-agent status --output=json` command on the agent host.

For example, when an agent is unhealthy, the user is presented with a "1 or more components in a failed state" message, with no way to know what is actually wrong from the UI alone.

As a stopgap, we should add a link in the Fleet UI to view the raw agent status reported to Fleet, which includes the detailed state of each component and unit. This will let advanced users quickly see what is wrong. The document is roughly equivalent to the output of the `elastic-agent status --output=json` command.

Specifically, we should allow directly and easily viewing the agent status fields from the `.fleet-agents` index on the Agent Details page. Here is an example for a healthy agent implementing a log and system/metrics input:

          "last_checkin_message": "Running",
          "last_checkin_status": "online",
          "components": [
            {
              "id": "log-default",
              "units": [
                {
                  "id": "log-default-logfile-system-28261bdf-df61-4404-9db9-9f40cdd8f765",
                  "type": "input",
                  "message": "Starting: spawned pid '12814'",
                  "status": "STARTING"
                },
                {
                  "id": "log-default",
                  "type": "output",
                  "message": "Starting: spawned pid '12814'",
                  "status": "STARTING"
                }
              ],
              "type": "log",
              "message": "Healthy: communicating with pid '12814'",
              "status": "HEALTHY"
            },
            {
              "id": "system/metrics-default",
              "units": [
                {
                  "id": "system/metrics-default-system/metrics-system-28261bdf-df61-4404-9db9-9f40cdd8f765",
                  "type": "input",
                  "message": "Starting: spawned pid '12815'",
                  "status": "STARTING"
                },
                {
                  "id": "system/metrics-default",
                  "type": "output",
                  "message": "Starting: spawned pid '12815'",
                  "status": "STARTING"
                }
              ],
              "type": "system/metrics",
              "message": "Healthy: communicating with pid '12815'",
              "status": "HEALTHY"
            }
          ],
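
As a rough illustration of how such a document could be scanned for trouble, here is a minimal Python sketch. It assumes only the field names visible in the example above (`components`, `units`, `id`, `type`, `status`, `message`). The function name `non_healthy_entries` and the `ok_statuses` parameter are hypothetical, and the sample is a trimmed copy of the example (one component, wrapped in braces so it parses as JSON). Note that in the example the units report `STARTING` while their components are `HEALTHY`, so the sketch lists those units unless the accepted set is widened.

```python
import json

# Trimmed copy of the example above (first component only), wrapped in
# braces so that it is a complete JSON document.
SAMPLE = """
{
  "last_checkin_message": "Running",
  "last_checkin_status": "online",
  "components": [
    {
      "id": "log-default",
      "units": [
        {
          "id": "log-default-logfile-system-28261bdf-df61-4404-9db9-9f40cdd8f765",
          "type": "input",
          "message": "Starting: spawned pid '12814'",
          "status": "STARTING"
        },
        {
          "id": "log-default",
          "type": "output",
          "message": "Starting: spawned pid '12814'",
          "status": "STARTING"
        }
      ],
      "type": "log",
      "message": "Healthy: communicating with pid '12814'",
      "status": "HEALTHY"
    }
  ]
}
"""


def non_healthy_entries(doc, ok_statuses=("HEALTHY",)):
    """Return (kind, id, status, message) for each component or unit
    whose status is not in ok_statuses (hypothetical helper)."""
    rows = []
    for component in doc.get("components", []):
        if component.get("status") not in ok_statuses:
            rows.append(("component", component.get("id"),
                         component.get("status"), component.get("message")))
        for unit in component.get("units", []):
            if unit.get("status") not in ok_statuses:
                rows.append((f"{unit.get('type')} unit", unit.get("id"),
                             unit.get("status"), unit.get("message")))
    return rows


doc = json.loads(SAMPLE)
for kind, ident, status, message in non_healthy_entries(doc):
    print(f"{kind} {ident}: {status} ({message})")
```

Passing `ok_statuses=("HEALTHY", "STARTING")` would treat the sample as fully healthy; which statuses deserve attention is a judgment call that this sketch leaves to the caller.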


elasticmachine · 2023-03-30T15:04:12Z

Pinging @elastic/fleet (Team:Fleet)

jen-huang · 2023-03-30T22:01:14Z

This is already possible through the hidden debug UI at /app/fleet/_debug:

Granted, this UI is just a hard pull of the first 10k .fleet-agents and doesn't allow choosing an exact agent, but maybe it is enough for troubleshooting for now?

cmacknz · 2023-03-31T12:42:55Z

My intent is to make this much easier for users to access, so we wouldn't want it to be hidden (I had no idea /app/fleet/agents existed, for example), and we would want to be able to easily see it for a specific agent.

The document structure here is almost an exact match for the output of the `elastic-agent status --output=json` command, which users are already forced to rely on. We should just make it easy to see the same thing in the UI.
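
For comparison with what Fleet stores, the CLI output mentioned above can be captured programmatically. The following is a small Python sketch, not an official client: the helper name `read_agent_status` and the injectable `run` parameter are assumptions made for illustration, and nothing is assumed about the keys of the returned document. The exit code is deliberately not checked, since this sketch makes no assumption about how the command exits for an unhealthy agent.

```python
import json
import subprocess


def read_agent_status(run=subprocess.run):
    """Run `elastic-agent status --output=json` and return the parsed JSON.

    `run` is injectable so the function can be exercised without an
    agent installed (hypothetical design choice for this sketch).
    """
    result = run(
        ["elastic-agent", "status", "--output=json"],
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Usage on an agent host (not executed here):
#   status = read_agent_status()
#   print(sorted(status))  # list the top-level keys for a quick look
```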

## Summary

Make raw agent status discoverable in Fleet UI, under `Agent details` tab.

Closes #154067

### Screenshots

<img width="1918" alt="Screenshot 2023-04-19 at 12 14 48" src="https://user-images.githubusercontent.com/23701614/233059955-7f066ad5-39cd-4685-b76b-41bc31ede4e8.png">
<img width="1918" alt="Screenshot 2023-04-19 at 13 04 06" src="https://user-images.githubusercontent.com/23701614/233059973-3f1b507f-d0bf-48ab-929b-c567f9814377.png">

### UX checklist

- [ ] Action link title (`View agent JSON`)
- [ ] Flyout title (`{agentName} agent details`)
- [ ] Download button
- [ ] Download button label (`Download JSON`)
- [ ] Downloaded file name (`{agentName}-agent-details.json`)

### Testing steps

1. Run Kibana in dev on this branch.
2. In Fleet, click on an agent to get to the agent details page.
3. There should be a new `View agent JSON` item in the `Actions` menu. Click it.
4. A new flyout should open with the agent details in JSON format. Clicking outside of the flyout or on the `Close` button should close the flyout.
5. The `Download JSON` button should download the JSON correctly.

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
- [ ] Any UI touched in this PR is usable by keyboard only (learn more about [keyboard accessibility](https://webaim.org/techniques/keyboard/))
- [ ] Any UI touched in this PR does not create any new axe failures (run axe in browser: [FF](https://addons.mozilla.org/en-US/firefox/addon/axe-devtools/), [Chrome](https://chrome.google.com/webstore/detail/axe-web-accessibility-tes/lhdoppojpmngadmnindnejefpokejbdd?hl=en-US))
- [ ] This renders correctly on smaller devices using a responsive layout. (You can test this [in your browser](https://www.browserstack.com/guide/responsive-testing-on-local-server))
- [ ] This was checked for [cross-browser compatibility](https://www.elastic.co/support/matrix#matrix_browsers)

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
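
To make the file-name convention from the UX checklist concrete, here is a short Python sketch. It is not the Kibana implementation; the helper name `agent_json_download`, the sample agent name `my-agent`, and the two-space indentation are all assumptions made for illustration. Only the `{agentName}-agent-details.json` pattern comes from the checklist above.

```python
import json


def agent_json_download(agent_name, agent_doc):
    """Build the download file name and the JSON text for an agent document.

    Hypothetical helper: follows the `{agentName}-agent-details.json`
    pattern; the indentation level is an arbitrary choice.
    """
    filename = f"{agent_name}-agent-details.json"
    body = json.dumps(agent_doc, indent=2)
    return filename, body


# "my-agent" and the tiny document are placeholder values.
filename, body = agent_json_download("my-agent", {"last_checkin_status": "online"})
print(filename)  # my-agent-agent-details.json
```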

amolnater-qasource · 2023-05-04T04:55:27Z

Hi Team,

We have created 3 test cases for this feature under the Fleet test suite at the following links:

Validate that the View agent JSON option is available under the Actions button.
Validate that the JSON file keeps getting updated as per updates in the agent's status.
Validate that the user is able to download the JSON file while the agent is under any status.

Please let us know if we are missing any scenario to be covered here.

Thanks!

amolnater-qasource · 2023-05-11T11:12:35Z

Hi Team,

We have executed a test run for this feature on the latest 8.8.0 BC3 Kibana cloud environment at the following link:

Inspecting the raw agent status

Status:
PASS: 03

Build details:
VERSION: 8.8.0
BUILD: 62994
COMMIT: 85b22d3

Hence we are marking this feature as QA:Validated.
Thanks

cmacknz added the Team:Fleet (Team label for Observability Data Collection Fleet team) label Mar 30, 2023

kpollich assigned jillguyonnet Apr 4, 2023

jillguyonnet mentioned this issue Apr 12, 2023

[Fleet] Add raw status to Agent details UI #154826

Merged

12 tasks

jillguyonnet closed this as completed in #154826 Apr 21, 2023

jen-huang added the QA:Ready for Testing (Code is merged and ready for QA to validate) label Apr 21, 2023

amolnater-qasource added the QA:Validated (Issue has been validated by QA) label and removed the QA:Ready for Testing (Code is merged and ready for QA to validate) label May 11, 2023

amolnater-qasource mentioned this issue May 12, 2023

[Enhancement][Fleet]: Copy to clipboard button can be added to the View agent JSON flyout. #157473

Closed
