[Ingest-Management]: "Enable elastic security agent" page appears instead of the host under the "Administrator>Host" tab when the user force-unenrolls the agent and then re-enrolls it from the Fleet tab. #73272

ghost · 2020-07-27T13:38:33Z

Kibana version:
Kibana: 7.9 BC4

Elasticsearch version:
Elasticsearch: 7.9 BC4

Agent version:
Agent: 7.9 BC4

Browser version:
Windows 10, Chrome

Original install method (e.g. download page, yum, from source, etc.):
From 7.9 BC4

Description
[Ingest-Management]: "Enable elastic security agent" page appears instead of the host under the "Administrator>Host" tab when the user force-unenrolls the agent and then re-enrolls it from the Fleet tab.

Preconditions

Kibana 7.9 BC4 cloud environment should be available.
Agent should be enrolled under the Fleet tab from the Endpoint Security app. [Note that the default config is now integrated with Endpoint Security.]

Steps to Reproduce

1. Open the Kibana 7.9 BC4 cloud environment in a browser, then click the Ingest Manager>Fleet tab.
2. Click the "Action>Unenroll" option next to the enrolled agent, then force-unenroll it.
3. Notice that the agent moves to the Inactive state.
4. Re-enroll the same agent with the default config by running the enrollment string (containing the token) from the Fleet section.
5. Observe that the agent is enrolled successfully under the Fleet tab.
6. Navigate to the "Endpoint security>Administrator>Host" tab.
7. Observe that the "Enable elastic security agent" page appears instead of the host under the "Administrator>Host" tab.

Test data
N/A

Impacted Test case id
N/A

Actual Result
The "Enable elastic security agent" page appears instead of the host under the "Administrator>Host" tab when the user force-unenrolls the agent and then re-enrolls it from the Fleet tab.

Expected Result
The host should appear with Online status under the "Administrator>Host" tab when the user force-unenrolls the agent and then re-enrolls it from the Fleet tab.

What's working
N/A

What's not working
N/A

Screenshot

Logs
N/A

ghost · 2020-07-27T13:43:39Z

Please review the defect @rahulgupta-qasource

elasticmachine · 2020-07-27T13:52:47Z

Pinging @elastic/ingest-management (Team:Ingest Management)

ghost · 2020-07-27T13:53:20Z

Reviewed and assigned to @EricDavisX

EricDavisX · 2020-08-05T17:53:23Z

I'm sorry this sat idle for so many days - can you re-test on BC 6 (not BC 7), please? Specific fixes for unenrolling went into BC5 and BC6 that I hope help with this. If it still reproduces, please provide the browser dev console output so we can see which calls are made and whether any returned errors or strange responses.

EricDavisX · 2020-08-10T21:09:33Z

@rahulgupta-qasource can you take the re-test on this if you have time?

EricDavisX · 2020-08-10T21:13:09Z

Please reassign this to me when the action is back on my side. I'm also removing the impact:high label; this feels more moderate to me, as long as Endpoint still ends up installed.

Reviewing further, I think @kevinlog should look at the screenshots; this might be on the Security App workflow side. Can you take a look, please?

kevinlog · 2020-08-10T21:33:54Z

@rahulgupta-qasource this most likely happens when the Endpoint hasn't sent any documents to ES. Can you verify that the Endpoint is successfully stood up and communicating with ES in this scenario? I'll run through the scenario myself as well.

FYI @EricDavisX

kevinlog · 2020-08-10T22:03:08Z

@EricDavisX @rahulgupta-qasource here's my test:

BC8 Stack and Agent/Endpoint

Initial Enroll, Agent and Endpoint stand up as expected

Agent

Agent logs

Endpoint

Unenroll and then forcefully unenroll:

Agent inactive

Endpoint gone (shows onboarding screen - potentially confusing, but expected for now):

When trying to re-enroll, I noticed I was unable to because the config yml is still in use:

I think this is because the Endpoint is still running (I assume because the forced unenroll didn't fully send all the correct messages).

What has your experience been when re-enrolling after a forced un-enroll? Have you seen this case?

EricDavisX · 2020-08-10T22:33:20Z

I think it may genuinely take a full minute or more for Endpoint to finish uninstalling and deleting files, so your results may be expected. It would help if we had an estimate of the time between the unenroll and the re-enroll attempt. But I think this is good evidence that it's working.

kevinlog · 2020-08-10T22:36:09Z

@EricDavisX thanks for the insight.

I went ahead and manually uninstalled the Endpoint, re-enrolled the Agent and everything is back up and running again.

kevinlog · 2020-08-10T22:53:33Z

@EricDavisX @rahulgupta-qasource after running through unenroll + force unenroll again, I'm seeing that the Endpoint is not stopped even after 15 or so minutes. I'm not sure what the expected behavior is here.

Note that I just ran the Agent from the command line; I didn't install the service on Windows. I'm not sure if that makes a difference for force unenroll.

FYI @ph @ruflin @blakerouse

ruflin · 2020-08-11T10:27:49Z

@kevinlog Do you have by chance any log files from Agent / Endpoint to see what is happening there?

kevinlog · 2020-08-11T12:12:05Z

@ruflin here are the Endpoint logs.
endpoint-000000.log

Here are the Agent logs (I zipped the entire folder)
logs.zip

I don't see anything in the Endpoint logs indicating that it received a "stop" or similar, although I'm not quite sure what that would look like. FYI @ferullo

kevinlog · 2020-08-11T12:15:07Z

@EricDavisX @ruflin sorry to spam you - but as I was collecting the logs above, the Endpoint did finally stop running, but it took about 30 min after I force unenrolled the Agent. So it seems like it is working, it just takes significantly longer than when you unenroll normally.

EricDavisX · 2020-08-11T14:07:04Z

Endpoint stopping after 30 mins seems like an Endpoint-side feature: it hadn't heard from Agent in 30 minutes, so it shut itself down. With the logs, we can hopefully track what Agent did and didn't send before that, so we know where there may be an Agent/Endpoint integration bug.

ferullo · 2020-08-11T16:16:52Z

Endpoint stopping after 30 mins seems like an Endpoint side feature, that it hadn't heard from Agent in 30 mins so it shut itself down

Endpoint does not have this feature.

@gogochan can you help with any Endpoint coordination needed for this?

ruflin · 2020-08-12T07:10:44Z

@michalpristas @blakerouse Would be great to get your eyes on this when you are back (both are out at the moment).
@gogochan Let us know what you find.

gogochan · 2020-08-12T17:12:41Z

It seems Endpoint is not able to write documents to Elasticsearch, as @kevinlog described. I see a 401 in the Endpoint log:

{"@timestamp":"2020-08-12T16:00:14.578506194Z","agent":{"id":"01375897-7434-42ed-b071-edde0d00199b","type":"endpoint"},"ecs":{"version":"1.5.0"},"log":{"level":"error","origin":{"file":{"line":243,"name":"Client.cpp"}}},"message":"Client.cpp:243 HTTP Status Code (401): {\"error\":{\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]},\"reason\":\"missing authentication credentials for REST request [/_cluster/health]\",\"root_cause\":[{\"header\":{\"WWW-Authenticate\":[\"Bearer realm=\\\"security\\\"\",\"ApiKey\",\"Basic realm=\\\"security\\\" charset=\\\"UTF-8\\\"\"]},\"reason\":\"missing authentication credentials for REST request [/_cluster/health]\",\"type\":\"security_exception\"}],\"type\":\"security_exception\"},\"status\":401}","process":{"pid":95533,"thread":{"id":95538}}}
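To check a collected Endpoint log for this symptom, a minimal Python sketch could scan it for 401 responses. It assumes, based only on the sample line above, that the log is newline-delimited JSON and that failed requests carry an `HTTP Status Code (NNN)` fragment in the `message` field. The function name `auth_failures` is hypothetical; this is not an Elastic-provided tool.

```python
import json
import re
from typing import Iterable, Iterator

# Matches the "HTTP Status Code (NNN)" fragment seen in the Endpoint log line above.
STATUS_RE = re.compile(r"HTTP Status Code \((\d{3})\)")


def auth_failures(lines: Iterable[str]) -> Iterator[dict]:
    """Yield log records whose message reports an HTTP 401 response.

    Assumes one JSON object per line (NDJSON), as in the sample above.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        match = STATUS_RE.search(str(record.get("message", "")))
        if match and match.group(1) == "401":
            yield record


# Hypothetical usage against the attached log file:
# with open("endpoint-000000.log", encoding="utf-8") as fh:
#     for rec in auth_failures(fh):
#         print(rec["@timestamp"], rec["message"][:80])
```

401s that keep appearing after re-enrollment point at rejected credentials rather than at the Hosts page itself.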

When a user clicks Unenroll and then Force unenroll, the Agent remains running along with Elastic Endpoint and Beats.

If a user comes back to the machine and re-enrolls the Agent, I suppose this process terminates the Agent from the previous enrollment, but it leaves Elastic Endpoint untouched.

I think this is where we have a potential problem. The token Endpoint received from the previous Agent is no longer valid, it needs to be reloaded.

gogochan · 2020-08-12T19:58:34Z

Further investigation shows that upon force unenroll, the API token becomes invalid and Endpoint cannot create the necessary index in Elasticsearch.

We observed that Elastic Agent didn't send the new API token to Elastic Endpoint even after re-enrollment, leaving Elastic Endpoint with the old, invalid API token.

A workaround is to trigger a revision number change by modifying the configuration from Fleet.
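One hypothesis consistent with that workaround is that the config is only re-delivered to Endpoint when its revision number changes, so a new API key issued at re-enrollment under an unchanged revision never arrives. The toy Python model below illustrates that hypothesis only; `PolicyUpdate`, `ToyAgentEndpointLink` and both push methods are hypothetical names, not Elastic Agent code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicyUpdate:
    revision: int  # config revision number, as shown in Fleet
    api_key: str   # Elasticsearch credential handed to Endpoint


class ToyAgentEndpointLink:
    """Hypothetical stand-in for the Agent -> Endpoint config handoff."""

    def __init__(self) -> None:
        self.applied: Optional[PolicyUpdate] = None

    def push_if_revision_changed(self, update: PolicyUpdate) -> bool:
        # Hypothesised behaviour: skip the push when the revision is unchanged,
        # even if the credential inside the config is different.
        if self.applied is not None and self.applied.revision == update.revision:
            return False
        self.applied = update
        return True

    def push_if_content_changed(self, update: PolicyUpdate) -> bool:
        # Alternative: compare the whole payload, so a new API key is always delivered.
        if self.applied == update:
            return False
        self.applied = update
        return True


link = ToyAgentEndpointLink()
link.push_if_revision_changed(PolicyUpdate(1, "key-A"))  # initial enrollment
# Force unenroll invalidates key-A; re-enrolling issues key-B under the same revision.
link.push_if_revision_changed(PolicyUpdate(1, "key-B"))  # skipped: Endpoint keeps key-A
# Editing the config in Fleet bumps the revision, so key-B finally reaches Endpoint.
link.push_if_revision_changed(PolicyUpdate(2, "key-B"))
```

Under this model, comparing the full payload (the second method) rather than the revision alone would avoid the stale key without a manual config edit.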

EricDavisX · 2020-08-14T20:41:38Z

I'm so pleased the team persisted and we found the bug to fix! Excellent work folks.

@kamalpreetpahwa-qasource @rahulgupta-qasource I think we should add some new content to the regression suite; there is actually a much larger matrix of state changes to cover than I realized.

I'd like to review with @gogochan, @kevinlog and @blakerouse what we have automated and what we need to cover better manually until we have more automation around this. The 'timing' of when the user unenrolls and then possibly clicks force-unenroll 'too quickly' is challenging, as we don't have much insight into it. To start, I'd like some help drawing a nicer state diagram to track which test cases exist. Something like the below (but better):

Test content that covers a few scenarios, all starting from a known working, happy Endpoint/Agent state:

1. Setting up Agent/Endpoint, restarting Agent (without re-enrolling), and validating.
2. Setting up Agent/Endpoint, unenrolling Agent, re-enrolling it with the same config, starting it, and validating.
3. Setting up Agent/Endpoint, unenrolling Agent, re-enrolling it with a new config in the same folder, and starting it.
4. Setting up Agent/Endpoint, unenrolling Agent, re-enrolling it with a new config in a new folder, and starting it.
5. Repeating 2-4 with a 'force unenroll' directly after the (standard) unenroll call.
6. Repeating 1-5 for Windows, Linux and macOS.

Do we think this warrants testing on all the major OS types, or is the logic abstracted from the OS so that it's overkill? If we can prove it with code knowledge, we can save a lot of time in future testing and automation.
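The scenario list above is a small cross product, and enumerating it makes the count explicit. A minimal Python sketch with hypothetical names (`BASE_SCENARIOS`, `MatrixEntry`, `build_matrix`): scenarios 1-4 plus the three forced-unenroll repeats of 2-4 give 7 variants per OS, and 3 operating systems give 21 entries, matching the 21 test cases (7 per OS) reported later in this thread.

```python
from itertools import product
from typing import List, NamedTuple

OPERATING_SYSTEMS = ["Windows", "Linux", "macOS"]

# Scenarios 1-4 from the list above; all start from a healthy Agent/Endpoint state.
BASE_SCENARIOS = {
    1: "restart Agent without re-enrolling",
    2: "unenroll, re-enroll with the same config",
    3: "unenroll, re-enroll with a new config in the same folder",
    4: "unenroll, re-enroll with a new config in a new folder",
}


class MatrixEntry(NamedTuple):
    os_name: str
    scenario: int
    description: str
    force_unenroll: bool  # scenario 5: force unenroll right after the standard unenroll


def build_matrix() -> List[MatrixEntry]:
    variants = [(n, d, False) for n, d in BASE_SCENARIOS.items()]
    # Scenario 5 repeats 2-4 with a force unenroll directly after the unenroll call.
    variants += [(n, d, True) for n, d in BASE_SCENARIOS.items() if n in (2, 3, 4)]
    # Scenario 6: repeat everything on each operating system.
    return [
        MatrixEntry(os_name, n, d, force)
        for os_name, (n, d, force) in product(OPERATING_SYSTEMS, variants)
    ]


print(len(build_matrix()))  # 21
```

If the OS question is answered by code inspection, dropping `OPERATING_SYSTEMS` to a single entry shrinks the matrix to 7.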

ghost · 2020-08-17T13:48:26Z

Hi @EricDavisX

Thank you for sharing the feedback.

We have validated this ticket and the scenarios mentioned above on Windows 10, a Linux 'CentOS 7' VM and macOS Mojave 10.14.1 in the Kibana BC9 cloud environment, and found it fixed.

We executed the following steps to validate the ticket:

1. Navigate to the Security app and enroll a Fleet Agent with 'Default config'. Note that the 'Elastic Endpoint Security' integration is now included in 'Default config'.
2. Navigate to the Ingest Manager->Fleet tab and unenroll the agent.
3. Force-unenroll the agent and observe that it moves to the inactive state.
4. Re-enroll the same agent with 'Default config' by running the enrollment string from the Fleet section.
5. After the agent re-enrolls successfully under the Fleet tab, wait for some time (say 15-20 minutes) to let Endpoint send documents to ES (as per @kevinlog's comment on #73272).
6. Navigate to the "Security->Administration" tab.

Observation:
The host is displayed with Online status on the "Security->Administration" tab after unenrolling, force-unenrolling and re-enrolling the agent with the Elastic Endpoint Security integration.

Screenshot:

Moreover, we created 21 test cases for the scenarios mentioned above (7 each for Windows, Linux and Mac), and they passed in the "Agent status on Unenroll, Force unenroll, Re-enroll and restarting" test run.

Hence, we are closing this bug.

ghost · 2020-08-19T12:36:03Z

Bug Conversion:

21 test cases (7 each for Windows, Linux and macOS) already exist for this ticket in the following sections:
Windows: https://elastic.testrail.io/index.php?/suites/view/27&group_by=cases:section_id&group_id=4700&group_order=asc
Linux: https://elastic.testrail.io/index.php?/suites/view/27&group_by=cases:section_id&group_id=4701&group_order=asc
macOS: https://elastic.testrail.io/index.php?/suites/view/27&group_by=cases:section_id&group_id=4702&group_order=asc

ghost self-assigned this Jul 27, 2020

ghost added the bug, impact:medium and Team:Fleet labels Jul 27, 2020

ghost added the Feature:Fleet label Jul 27, 2020

ghost added the impact:high label Jul 27, 2020

ghost assigned EricDavisX Jul 27, 2020

ghost removed the impact:medium label Jul 27, 2020

ph assigned ph and unassigned ph Jul 27, 2020

ghost mentioned this issue Jul 28, 2020

[Ingest Manager] Allow to force unenroll from the UI #72386

Merged

EricDavisX removed their assignment Aug 10, 2020

EricDavisX removed the impact:high label Aug 10, 2020

ghost removed their assignment Aug 14, 2020

ghost closed this as completed Aug 17, 2020

This issue was closed.
