[BUG] Fleet not coming back after reboot #23

peasead · 2023-04-03T16:40:19Z

Describe the bug
When rebooting the VM that the Fleet container is running on, the Fleet server isn't coming back online. It is unclear what the actual problem is right now.

./elastic-container.sh status
NAME                IMAGE                                                 COMMAND                  SERVICE             CREATED             STATUS                    PORTS
ecp-elasticsearch   docker.elastic.co/elasticsearch/elasticsearch:8.7.0   "/bin/tini -- /usr/l…"   elasticsearch       3 days ago          Up 17 minutes (healthy)   0.0.0.0:9200->9200/tcp, 9300/tcp
ecp-fleet-server    docker.elastic.co/beats/elastic-agent:8.7.0           "/usr/bin/tini -- /u…"   fleet-server        3 days ago          Up 17 minutes             0.0.0.0:8220->8220/tcp
ecp-kibana          docker.elastic.co/kibana/kibana:8.7.0                 "/bin/tini -- /usr/l…"   kibana              3 days ago          Up 17 minutes (healthy)   0.0.0.0:5601->5601/tcp

curl -vvv -k https://localhost:8220
*   Trying 127.0.0.1:8220...
* Connected to localhost (127.0.0.1) port 8220 (#0)
* ALPN: offers h2
* ALPN: offers http/1.1
* [CONN-0-0][CF-SSL] (304) (OUT), TLS handshake, Client hello (1):
* LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:8220
* Closing connection 0
curl: (35) LibreSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:8220

To Reproduce
Steps to reproduce the behavior:

Deploy the ECP with ./elastic-container.sh start
Once completed, reboot the VM that the containers are running on
Restart the ECP containers with ./elastic-container.sh restart
Fleet is unavailable in Kibana and the error above is present

Expected behavior
Fleet should be operational after a reboot.

Screenshots

Desktop (please complete the following information):

OS: macOS Ventura 13.3, also reported on Ubuntu
Browser: Chrome, but the error is also evident with cURL, so not a browser issue
Version: 8.7.0

Additional context
Going to try a downgrade to 8.6 and then 8.5 to see if this is an 8.7 thing or something else.


peasead · 2023-04-03T18:26:03Z

I think the issue occurs when you reboot the host VM without stopping the containers first; something then goes wrong with TLS.

Test:

Installed using version 8.6.2 (likely the same result for 8.7.0 based on reporting)
Ran ./elastic-container.sh start, verified everything worked
Ran ./elastic-container.sh stop, rebooted the host
When the host came back up, ran ./elastic-container.sh restart, verified everything worked
Rebooted the host without running ./elastic-container.sh stop first
Fleet server has the TLS errors
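
To tell the failure seen here apart from a container that is simply not listening, the curl exit code can be checked after each restart. This is a minimal sketch, assuming Fleet Server is published on https://localhost:8220 as in the output above; the function names `classify_curl_exit` and `check_fleet` are hypothetical and not part of elastic-container.sh.

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit code to a short diagnosis.
classify_curl_exit() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "connect-failed" ;;        # curl exit 7: failed to connect to host
    35) echo "tls-handshake-failed" ;;  # curl exit 35: SSL connect error, as in this issue
    *)  echo "other-error-$1" ;;
  esac
}

# Probe Fleet Server and print the diagnosis.
# -k skips certificate verification, matching the curl command used in this thread.
check_fleet() {
  url="${1:-https://localhost:8220}"   # assumed default, taken from the report above
  rc=0
  curl -k -s -o /dev/null --max-time 10 "$url" || rc=$?
  classify_curl_exit "$rc"
}
```

Running `check_fleet` after step 4 and again after step 6 should make the difference between the two reboot paths easy to record.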

peasead · 2023-04-04T15:19:42Z

As a workaround, you can run ./elastic-container.sh stop before rebooting the host that is running the stack.

We're working on a better solution, but this workaround should help while we work this out.
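
The workaround can be wrapped in a small function so the stop step is not forgotten. This is only a sketch: `safe_reboot`, `ECP_SCRIPT`, and `REBOOT_CMD` are hypothetical names, and `sudo reboot` is an assumption about how the host is rebooted.

```shell
#!/bin/sh
# Hypothetical wrapper around the workaround: stop the stack, then reboot.
ECP_SCRIPT="${ECP_SCRIPT:-./elastic-container.sh}"  # path to the project script
REBOOT_CMD="${REBOOT_CMD:-sudo reboot}"             # assumed reboot command

safe_reboot() {
  # Only reboot if the stack stopped cleanly.
  if "$ECP_SCRIPT" stop; then
    $REBOOT_CMD
  else
    echo "stack did not stop cleanly; not rebooting" >&2
    return 1
  fi
}
```

Source the file and call `safe_reboot` instead of rebooting directly. Nothing runs until the function is called.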

rasta-mouse · 2023-04-05T08:42:27Z

I rolled back to d85df01 for 8.6.0 and got the same issue there as well.

ubuntu@elk:~$ docker ps
CONTAINER ID   IMAGE                                                 COMMAND                  CREATED        STATUS                   PORTS                                                 NAMES
42827144627a   docker.elastic.co/beats/elastic-agent:8.6.0           "/usr/bin/tini -- /u…"   17 hours ago   Up 7 minutes             0.0.0.0:8220->8220/tcp, :::8220->8220/tcp             ecp-fleet-server
2e6a2519746a   docker.elastic.co/kibana/kibana:8.6.0                 "/bin/tini -- /usr/l…"   17 hours ago   Up 6 minutes (healthy)   0.0.0.0:5601->5601/tcp, :::5601->5601/tcp             ecp-kibana
aadcb438ea15   docker.elastic.co/elasticsearch/elasticsearch:8.6.0   "/bin/tini -- /usr/l…"   17 hours ago   Up 7 minutes (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp   ecp-elasticsearch

ubuntu@elk:~$ curl -vvv -k https://localhost:8220
*   Trying 127.0.0.1:8220...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8220 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:8220 
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to localhost:8220

rasta-mouse · 2023-04-05T09:37:44Z

Commit 7f61c3a for 8.5.0 seems to work fine.

peasead · 2023-04-05T11:15:54Z

Thanks for testing the other versions and raising this issue, Rasta.

Our suspicion is that there is a state file somewhere that isn't being gracefully managed when you reboot the host without stopping the stack first.

We're unsure if this is an ECP issue or an issue with the Fleet container.

Next steps will be to spin up an Elastic Stack using the default config from Elastic, test the scenarios, and either make an adjustment to how the state file is managed (if that's the case) or file a bug with the Fleet team.

We'll try to do these tests this weekend. Until then, while not ideal, if you run the stop command before rebooting, that seems to prevent this issue.

peasead · 2023-06-12T23:30:58Z

This is possibly an upstream issue. It has been recreated and is being tracked.

elastic/fleet-server#2431

peasead · 2023-07-07T12:57:11Z

Looks like this will be fixed in 8.9.

elastic/fleet-server#2431 (comment)

rasta-mouse · 2023-07-10T13:10:18Z

Awesome. I'm happy to close this issue if you are?

peasead · 2023-07-10T14:12:01Z

I'd like to keep it open so I remember to test and bump to 8.9 😅

peasead · 2023-07-26T20:23:01Z

Verified with 8.9.0 in the project that the Fleet server comes back online. Bumping version in main.
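
For anyone still pinned to an older stack, a quick version check can decide whether the stop-before-reboot workaround is still needed. This is a sketch under the assumption, taken from this thread, that the fix first shipped in 8.9.0; it does not account for any backports, and `has_reboot_fix` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: succeeds if the given stack version is 8.9 or newer.
# Only the major and minor components are compared.
has_reboot_fix() {
  major=${1%%.*}      # text before the first dot
  rest=${1#*.}        # text after the first dot
  minor=${rest%%.*}   # text before the next dot
  [ "$major" -gt 8 ] || { [ "$major" -eq 8 ] && [ "$minor" -ge 9 ]; }
}
```

For example, `has_reboot_fix 8.7.0 || echo "run ./elastic-container.sh stop before rebooting"` prints the reminder on the affected versions reported above.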

peasead added the bug label Apr 3, 2023

peasead self-assigned this Apr 3, 2023

peasead mentioned this issue Jul 26, 2023

Update to 8.9.0 #30

Merged

peasead closed this as completed in #30 Jul 26, 2023

fish-not-phish mentioned this issue May 30, 2024

[BUG] #43

Open
