[Ingest-Management]: Agent with Online, Offline status also appears under "Show Inactive" filter results. #73237

Closed
ghost opened this issue Jul 27, 2020 · 20 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Fleet Fleet team's agent central management project Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@ghost

ghost commented Jul 27, 2020

Kibana version:
Kibana: 7.9 BC4

Elasticsearch version:
Elasticsearch: 7.9 BC4

Agent version:
Agent: 7.9 BC4

Browser version:
Windows 10, Chrome

Original install method (e.g. download page, yum, from source, etc.):
From 7.9 BC4

Description
[Ingest-Management]: Agent with Online, Offline status also appears under "Show Inactive" filter results.

Preconditions

  1. A Kibana 7.9 BC4 cloud environment should be available.
  2. An agent should be enrolled in the environment.

Steps to Reproduce

  1. Open the Kibana 7.9 BC4 cloud environment in a browser, then click the Ingest Manager>Fleet tab.
  2. Click the "Action>Un-enroll" option next to the enrolled agent and un-enroll it.
  3. Notice that the agent moves to the Inactive state.
  4. Re-enroll the agent and notice that it now appears in the agent list with Online status.
  5. Click the Show Inactive filter and observe that the agent with Online status also appears under the Inactive filter.

Test data
N/A

Impacted Test case id
N/A

Actual Result
[Ingest-Management]: Agent with Online, Offline status also appears under "Show Inactive" filter results.

Expected Result
Agent with Online, Offline status should not appear under "Show Inactive" filter results.

What's working
N/A

What's not working
Agent with Offline status also appears under "Show Inactive" filter results.

Screenshot

ShowInactive Filter

Logs
N/A

@ghost
Author

ghost commented Jul 27, 2020

@rahulgupta-qasource : Please review and assign the issue.

@ghost ghost self-assigned this Jul 27, 2020
@ghost ghost changed the title from [Ingest-Management]: Agent with Online status also appears under "Show Inactive" filter results. to [Ingest-Management]: Agent with Online, Offline status also appears under "Show Inactive" filter results. Jul 27, 2020
@ghost ghost added bug Fixes for quality problems that affect the customer experience impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. Team:Fleet Team label for Observability Data Collection Fleet team labels Jul 27, 2020
@elasticmachine
Contributor

Pinging @elastic/ingest-management (Team:Ingest Management)

@ghost ghost added the Feature:Fleet Fleet team's agent central management project label Jul 27, 2020
@ghost

ghost commented Jul 27, 2020

Reviewed and assigned to @EricDavisX

@ghost ghost assigned EricDavisX Jul 27, 2020
@EricDavisX
Contributor

@nchaulet can you comment here please?

Nicolas, I suppose the inactive Agent should have eventually disappeared unless there was some problem? If the Agent was active and successful prior, and the host reachable, is there any known bug / reason from the BC4 build as to why it wouldn't have worked? And if no known bug, what details do you need from QA to triage it?

The Agent logs would help for a start I suppose... anything else? @kamalpreetpahwa-qasource can you post those to start? And tell us what OS version the host was?

@EricDavisX
Contributor

I want to call out that we shall track the basic workflow here, and that the specific concern from #73236 needs to be resolved as well: the "Un-enroll option is still available for an agent which is already un-enrolled and displayed under Show Inactive filter."

@nchaulet
Member

nchaulet commented Jul 27, 2020

@EricDavisX Currently the "Show inactive" flag shows inactive agents plus the other agents that match your search.

@ph
Contributor

ph commented Jul 27, 2020

@EricDavisX I think the behavior is correct; by default we don't show inactive agents that match your query?

@ph
Contributor

ph commented Jul 27, 2020

@nchaulet @EricDavisX Being able to unenroll an already unenrolled agent is a bug though; I think we can fix it in 7.10.

@nchaulet
Member

nchaulet commented Jul 27, 2020

Yes, being able to unenroll an already unenrolled agent is a bug. I can do a PR to remove it from the UI: #73348

@EricDavisX
Contributor

I understand more context after reading it a few times. I understand this last part now, and I'm fine with 7.10. My main concern was if there was an Agent that didn't successfully un-enroll... that would be a bug of some sort. @rahulgupta-qasource please do re-test that tonight and let us know in a separate ticket if you have any Agent scenario where a valid, running Agent does not successfully un-enroll and disappear after around 2 minutes.

@ghost
Author

ghost commented Jul 28, 2020

Hi @EricDavisX

We have validated the "Un-enroll scenario" on 7.9 BC 4 cloud environment on Host OS Windows 10_x64 and observed the following:

Observation:

  1. We observed that the agent takes at least approximately 1 minute to un-enroll and then moves to "Inactive" status.

Screenshot:
73237

Logs:
full log file:
logs.txt

possibly pertinent log lines with errors:
{"log.level":"error","@timestamp":"2020-07-28T12:36:25.588+0530","log.origin":{"file.name":"application/fleet_gateway.go","file.line":159},"message":"failed to dispatch actions, error: acknowledge 0 actions '[]' for elastic-agent '7d3361c8-a95a-4f36-ad4c-b75d46c5057a' failed: fail to ack to fleet: Post \"https://514e239afb6c49ef950602aa11986b0a.us-central1.gcp.foundit.no:443/api/ingest_manager/fleet/agents/7d3361c8-a95a-4f36-ad4c-b75d46c5057a/acks?\": context canceled","ecs.version":"1.5.0"}

  2. When we re-enrolled the agent, it enrolled successfully with Online status; however, the online agent is also displayed in the Inactive filter results.

Activity Logs:

Timestamp | Type | Subtype | Message
Jul 28, 2020, 12:38:46 PM | State | Running | Application: metricbeat--7.9.0--36643631373035623733363936343635[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to RUNNING: Running
Jul 28, 2020, 12:38:44 PM | State | Running | Application: filebeat--7.9.0--36643631373035623733363936343635[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to RUNNING: Running
Jul 28, 2020, 12:38:44 PM | State | Running | Application: metricbeat--7.9.0[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to DEGRADED: 1 error: 1 error: Error creating runner from config: 1 error: metricset 'system/load' not found
Jul 28, 2020, 12:38:42 PM | Action result | Acknowledged | Action '00768ccd-e238-4d73-96f3-d48bf2193907' of type 'CONFIG_CHANGE' acknowledged.
Jul 28, 2020, 12:38:40 PM | State | Starting | Application: metricbeat--7.9.0--36643631373035623733363936343635[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to STARTING: Starting
Jul 28, 2020, 12:38:40 PM | State | Starting | Application: filebeat--7.9.0--36643631373035623733363936343635[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to STARTING: Starting
Jul 28, 2020, 12:38:39 PM | State | Starting | Application: metricbeat--7.9.0[13c62794-3db0-432c-8005-477eba1ee0af]: State changed to STARTING: Starting

Please let us know if anything else is required.

Thanks
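
For anyone triaging a full agent log like the one attached above, here is a minimal sketch of pulling out only the error-level entries. It assumes the file holds one JSON object per line with a "log.level" field, as in the quoted line. The function name `errorEntries` and the sample info line are hypothetical, and the error message in the sample is shortened from the one quoted above.

```typescript
// Shape of the fields used here; real log lines carry more keys.
interface LogEntry {
  "log.level"?: string;
  "@timestamp"?: string;
  message?: string;
  [key: string]: unknown;
}

// Returns the parsed entries whose "log.level" is "error".
// Blank lines and lines that are not valid JSON are skipped.
function errorEntries(logText: string): LogEntry[] {
  const out: LogEntry[] = [];
  for (const line of logText.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    try {
      const parsed: unknown = JSON.parse(trimmed);
      if (
        typeof parsed === "object" &&
        parsed !== null &&
        (parsed as LogEntry)["log.level"] === "error"
      ) {
        out.push(parsed as LogEntry);
      }
    } catch {
      // Not a JSON line (for example a stack trace fragment); ignore it.
    }
  }
  return out;
}

// Sample input: the first line is hypothetical, the second is shortened
// from the error quoted in this comment.
const sample = [
  '{"log.level":"info","@timestamp":"2020-07-28T12:36:20.000+0530","message":"hypothetical info line"}',
  '{"log.level":"error","@timestamp":"2020-07-28T12:36:25.588+0530","message":"failed to dispatch actions"}',
  "not json",
].join("\n");

for (const entry of errorEntries(sample)) {
  console.log(entry["@timestamp"], entry.message);
}
```

Skipping unparseable lines rather than throwing keeps the sketch usable on logs that mix JSON entries with plain text.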

@EricDavisX
Contributor

@nchaulet it looks to me like with the BC4 build there is a case where the Agent doesn't fully unenroll (after a minute it sits in 'inactive') - what would we need to investigate that more?

To confirm, the subsequent actions were done after an indeterminate state (we can put an item into the readme for users and testers to manually 'force unenroll' if / when the Agent gets to that state before carrying on more tests on the same host). @kamalpreetpahwa-qasource FYI - a best practice for testing ^. Once we get to a weird state, we need to heavily consider whether the subsequent actions reflect a valid state of the system.

@EricDavisX EricDavisX assigned nchaulet and unassigned EricDavisX Jul 29, 2020
@nchaulet
Member

@EricDavisX

it looks to me like with the BC4 build there is a case where the Agent doesn't fully unenroll (after a minute it sits in 'inactive') - what would we need to investigate that more?

Do you have agent logs?

@EricDavisX
Contributor

I do not. @kamalpreetpahwa-qasource if you see this reproduced with BC5 can you pass logs in please?

@EricDavisX EricDavisX assigned ghost Jul 31, 2020
@ghost
Author

ghost commented Aug 3, 2020

Hi @EricDavisX ,

We have validated this ticket on 7.9 BC 5 cloud environment with agent 7.9. Please find below our observation and attached logs.

Observation:

We observed that the agent still moves to the "Inactive" state when we simply un-enroll it.

Screenshot:

73237_3rdAug

Agent Activity Logs

  • Under Activity logs, Un-enrolled action is successfully Acknowledged as shown in below logs.
Timestamp | Type | Subtype | Message
Aug 3, 2020, 1:29:03 PM | Action result | Acknowledged | Action 'c5032868-e6e6-4bcc-a123-276891e7530d' of type 'UNENROLL' acknowledged.
Aug 3, 2020, 1:25:48 PM | State | Running | Application: endpoint-security--7.9.0[a7ec9b97-d7a0-4f10-b52e-3858fc7b1c5f]: State changed to RUNNING:
Aug 3, 2020, 1:25:40 PM | State | Running | Application: endpoint-security--7.9.0[a7ec9b97-d7a0-4f10-b52e-3858fc7b1c5f]: State changed to RUNNING: Protecting with policy {7e498150-d55e-11ea-8d7f-a79775c3a7bb}
 
  • elastic-agent-json Logs File generated Under "C:\Users\zeus\Desktop\elastic-agent\data\logs" is given below.

elastic-agent-json.txt

Please let us know if there is any other way to get the logs for this scenario.

Thanks,

@EricDavisX
Contributor

I'm getting reports of this from others in the QA team, too. I think it will highly depend on the state of the host and the steps taken on the host with regard to setup when you run the Agent and run the powershell install service script (if you did). Can you list out (sorry if it's a repeat) the specific steps you did in this regard?
for ex:

  • unzipped the BC6 zip to a clean vm to folder 'xyz' (only asking for clean so we know the steps)
  • browsed to xyz in admin powershell
  • ran the enroll command (specifics please) with a '--staging' param included
  • ran the powershell install service...
  • then anything in between here and when you first clicked unenroll in the UI.

The logs for the Agent that Nicolas is asking for are the ones in \data\logs
like in this screenshot:
Screen Shot 2020-08-03 at 4 54 02 PM

  • you can grab and zip up all of those logs you see and attach

@ghost
Author

ghost commented Aug 4, 2020

Hi @EricDavisX

Thanks for sharing the information on logs location and detailed steps.

In our previous comment, we had already attached the same elastic-agent-json.txt log file that appears under the data/logs folder after clicking the Un-enroll button.
Today, we re-validated the issue with the steps mentioned above and attached the logs below:

  1. Unzipped the 7.9 BC6 agent and navigated to the location where the elastic-agent exe is present.
  2. Ran the enroll command with '--staging' and noticed that an error appears.
  3. Then ran the enrollment command without the '--staging' flag, and it worked.
  4. Ran the install command (PS1).
  5. Then un-enrolled the agent and found the logs at the data/logs location.

Error with the "--staging" flag:
73237_steps

Please refer to the latest zip folder below. It contains the logs that appear under the data/logs folder after clicking the Un-enroll button next to the agent, at which point the agent moves to the inactive state.
Logs:
73237_Elastic-Agent-Logs.zip

Screenshot:
73237_4thAug

Please let me know if any other information is required from my end.

Thanks

@EricDavisX EricDavisX unassigned ghost Aug 5, 2020
@EricDavisX EricDavisX removed the impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. label Aug 5, 2020
@EricDavisX
Contributor

Hi @kamalpreetpahwa-qasource the --staging flag needs a value passed to it; otherwise it is expected to error. This is a good exploratory test, but not one we need to repeat or document. I do not think that the failed enrollment invalidates the rest of the scenario.

Can you post specifics of what version of Windows 10 this is? Service pack and security patch info, etc? Please also cite its name for me to find if needed in the vSphere cluster.

@nchaulet I'm removing Kamal and Rahul; it's assigned just on your side now, if you can take a look at the logs and find anything out further.

If we need to borrow the vm or kibana environment I think Kamal could likely post us creds in chat so we could access it, and I can help access the vm specifically if it is on the Endgame vSphere side, as I expect it is.

@EricDavisX EricDavisX assigned ghost Aug 25, 2020
@EricDavisX
Contributor

@rahulgupta-qasource can you help follow up on what info we were asking?

@ph ph unassigned nchaulet Oct 19, 2020
@jen-huang
Contributor

I'm closing this issue due to lack of activity. From the conversation in this ticket, I don't think there is a UI issue: the "Show inactive" filter adds inactive agents to the list of agents shown in the existing table state. Re-enrolling a previously unenrolled agent counts as a "new" agent, so the same host can show up as both inactive and active.

If there are further issues found with unenrollment, a new issue can be opened.
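
The filter behavior described in this thread can be sketched roughly as follows. This is an illustration only, not Kibana's actual Fleet code: the `Agent` shape, the `visibleAgents` function, and the agent ids are all hypothetical. It assumes that "Show inactive" widens the table to include inactive agents rather than restricting it to them, and that a re-enrolled host gets a new agent record.

```typescript
// Hypothetical types for illustration; not Kibana's Fleet types.
type AgentStatus = "online" | "offline" | "inactive";

interface Agent {
  id: string;
  host: string;
  status: AgentStatus;
}

// Returns the rows the table would show. With showInactive = false,
// inactive agents are hidden. With showInactive = true, inactive agents
// are added to the matching rows instead of replacing them.
function visibleAgents(
  agents: Agent[],
  matchesSearch: (agent: Agent) => boolean,
  showInactive: boolean
): Agent[] {
  return agents.filter(
    (agent) => matchesSearch(agent) && (showInactive || agent.status !== "inactive")
  );
}

// One host, unenrolled and then re-enrolled: two separate agent records.
const agents: Agent[] = [
  { id: "old-enrollment", host: "win10-vm", status: "inactive" },
  { id: "new-enrollment", host: "win10-vm", status: "online" },
];

const matchAll = () => true;

console.log(visibleAgents(agents, matchAll, false).map((a) => a.id));
// [ 'new-enrollment' ]
console.log(visibleAgents(agents, matchAll, true).map((a) => a.id));
// [ 'old-enrollment', 'new-enrollment' ]
```

Under these assumptions, an Online agent appearing while "Show inactive" is switched on is the expected result, since the toggle only stops hiding the old, inactive record.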


5 participants