Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2: 8.6.0-SNAPSHOT not starting in ESS #1731

Closed
simitt opened this issue Nov 16, 2022 · 27 comments · Fixed by #1773
Closed

V2: 8.6.0-SNAPSHOT not starting in ESS #1731

simitt opened this issue Nov 16, 2022 · 27 comments · Fixed by #1773
Assignees
Labels
blocker bug Something isn't working Team:Elastic-Agent Label for the Agent team v8.6.0

Comments

@simitt
Copy link
Contributor

simitt commented Nov 16, 2022

Version: 8.6.0-SNAPSHOT (Nov 16th 2022)

Problem: The integrations server does not start and therefore neither Fleet Server nor APM Server are available. The logs indicate that Elastic Agent tries to connect to localhost.

Logs: Failed to connect to backoff(elasticsearch(http://localhost:9200)): Get "http://localhost:9200": dial tcp [::1]:9200: connect: cannot assign requested address

Reproduce: Spin up a 8.6.0-SNAPSHOT deployment in the CFT on ESS

@simitt simitt added bug Something isn't working v8.6.0 labels Nov 16, 2022
@liza-mae
Copy link

Adding slack threads https://elastic.slack.com/archives/C01G6A31JMD/p1668556010267939 and https://elastic.slack.com/archives/C0JFN9HJL/p1668594188294829

This should be considered a blocker for 8.6.0.

@cmacknz cmacknz added Team:Elastic-Agent Label for the Agent team blocker labels Nov 16, 2022
@cmacknz cmacknz changed the title 8.6.0-SNAPSHOT not starting in ESS V2: 8.6.0-SNAPSHOT not starting in ESS Nov 16, 2022
@cmacknz
Copy link
Member

cmacknz commented Nov 16, 2022

The slack thread mentions this is happening in the 8.5 snapshot as well, which does not contain agent V2. Have we confirmed this? If so it will change what we need to investigate.

@cmacknz
Copy link
Member

cmacknz commented Nov 16, 2022

Possibly this was caused by the packaged versions in the ESS release not being in sync, I am still trying to follow all of the context on this to confirm.

@liza-mae
Copy link

@cmacknz I saw the issue as part of 8.6.0 but someone else said they had the issue on 8.5.1 however I did not see that. Another thread follows a release sync fix for 8.6.0, at least from my side, I would have to wait for a new 8.6 snapshot to confirm if the release sync indeed fixes it or if something else is also broken.

@aleksmaus
Copy link
Member

aleksmaus commented Nov 18, 2022

I poked around on that cloud deployment today: https://admin.staging.foundit.no/deployments/54481a66f7994042aca321c07461871f
The logs are not great. Didn't find anything pointing to the fleet server.
There were few messages from metricbeat and filebeat "Failed to connect to backoff(elasticsearch(http://localhost:9200/)", which is not clear if this is the source of the problem.

I pulled down the same docker image that was used by that deployment locally, was able to enroll the agent with the fleet server against 8.6.0-SNAPSHOT cloud deployment.

The agent was healthy:
Screen Shot 2022-11-17 at 6 58 48 PM

All the beats processes seemed to be running ok:
Screen Shot 2022-11-17 at 7 03 28 PM

How is the agent with the fleet-server is started/set up in the cloud?
Is it possible to get access (ssh?) to the instance in the cloud that fails and poke around, maybe grep the actual logs, try to start the agent, see what exactly happens?
Is it possible to spin up the cloud deployment locally to try to reproduce the issue?

@cmacknz
Copy link
Member

cmacknz commented Nov 18, 2022

How is the agent with the fleet-server is started/set up in the cloud?
Is it possible to get access (ssh?) to the instance in the cloud that fails and poke around, maybe grep the actual logs, try to start the agent, see what exactly happens?
Is it possible to spin up the cloud deployment locally to try to reproduce the issue?

@AlexP-Elastic is there someone on the cloud side who can help us debug this? It seems to only be reproducible on cloud but not locally.

@simitt
Copy link
Contributor Author

simitt commented Nov 18, 2022

Already shared internal docs with @aleksmaus about how to ssh into testing environments or spin up an ECE deployment for testing.
I just spinned up a new deployment - Integrations Server is still not starting up, but the above mentioned Elastic Agent error is not logged anymore. There was nothing in the logs that I could make sense of, but leaving that to the Elastic Agent and Fleet Server experts. What was surprising to me was that there were metricbeat and filebeat logs collected and showing up, don't remember seeing this in the past.

@aleksmaus
Copy link
Member

I don't have the ssh access to the cloud env yet, still waiting.

So, I tried to reproduce the issue with ecetest environment (thanks for @simitt for all the help working through caveats of making it running)
https://github.com/elastic/observability-dev/blob/main/environments/ecetest/README.md

Kibana now has all kind of restrictions/validations, can't spin up the fleet server with "http" url, can't modify the exiting fleet policy to add APM configuration for example, etc, making it difficult to adjust the policy settings and check the results.

Few observations so far:

  1. The policy doesn't have APM integration out of the box.

Screen Shot 2022-11-19 at 2 58 22 PM

On previous installs the policy had APM out of the box. Did anything change in that regard?

The fleet server agent is shown as healthy in kibana:
Screen Shot 2022-11-19 at 3 02 12 PM

The components state reporting "Configuring" initially

          "components": [
            {
              "id": "fleet-server-es-containerhost",
              "units": [
                {
                  "id": "fleet-server-es-containerhost-fleet-server-fleet_server-elastic-cloud-fleet-server",
                  "type": "input",
                  "message": "Re-configuring",
                  "status": "configuring"
                },
                {
                  "id": "fleet-server-es-containerhost",
                  "type": "output",
                  "message": "Re-configuring",
                  "status": "configuring"
                }
              ],
              "type": "fleet-server",
              "message": "Healthy: communicating with pid '133'",
              "status": "healthy"
            }
          ],
          "last_checkin_message": "Running",
          "last_checkin_status": "online",
          "last_checkin": "2022-11-19T20:31:57Z"

After few minutes it all is reported as healthy:

          "components": [
            {
              "id": "fleet-server-es-containerhost",
              "units": [
                {
                  "id": "fleet-server-es-containerhost",
                  "type": "output",
                  "message": "Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud",
                  "status": "healthy"
                },
                {
                  "id": "fleet-server-es-containerhost-fleet-server-fleet_server-elastic-cloud-fleet-server",
                  "type": "input",
                  "message": "Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud",
                  "status": "healthy"
                }
              ],
              "type": "fleet-server",
              "message": "Healthy: communicating with pid '133'",
              "status": "healthy"
            }
          ],
          "last_checkin_message": "Running",
          "last_checkin_status": "online",
          "last_checkin": "2022-11-19T20:36:50Z"

But the fleet and APM is shown as unhealthy in the deployment admin web UI and the docker container is getting killed after some period of this.
How the health of fleet and APM is determined in the cloud?

  1. The monitoring is enabled by default, even during the fleet server bootstrap. Is this intentional?
    It looks like this is the initial config the agent uses for the fleet bootstrap
    https://github.com/elastic/elastic-agent/blob/main/internal/pkg/agent/application/fleet_server_bootstrap.go#L34
    With enabled monitoring the default output there is set to "localhost:9200" and this is seems to be the cause of the logs where the filebeat and metricbeat are failing to reach "localhost:9200"

In since all the monitoring settings are missing it uses the default settings that all are set to "true".

  1. The fleet agent is running fleet-server in the docker container after initial startup
UID        PID  PPID  PGID   SID  C STIME TTY          TIME CMD
elastic+     1     0     1     1  0 19:48 ?        00:00:00 /usr/bin/tini -- /app/apm.sh
elastic+     6     1     6     1  0 19:48 ?        00:00:00 /bin/bash /app/apm.sh
elastic+    14     6     6     1  2 19:48 ?        00:00:02 /usr/share/elastic-agent/elastic-agent container
elastic+   132    14     6     1  0 19:48 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/components/fleet-server --agent-mode -E logging.to_stderr=true -E path.data=/app/elastic-agent/data/run/fleet-server-es-containerhost      

Changing logging level is propagated to the agent and causes the agent relaunch with enroll

elastic+     1     0     1     1  0 19:48 ?        00:00:00 /usr/bin/tini -- /app/apm.sh
elastic+     6     1     6     1  0 19:48 ?        00:00:00 /bin/bash /app/apm.sh
elastic+    14     6     6     1  1 19:48 ?        00:00:02 elastic-agent container
elastic+   142     0   142   142  0 19:50 pts/0    00:00:00 /bin/bash
elastic+   162    14     6     1  3 19:51 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/elastic-agent enroll -f -c /app/config/elastic-agent.yml --path.home /app/elastic-agent/data --path.config /app/config --path.logs /app/logs --path.downloads /usr/share/elastic-agent/data/elastic-a
elastic+   173   162     6     1 78 19:51 ?        00:00:01 /usr/share/elastic-agent/data/elastic-agent-28b762/elastic-agent run -e -c /app/config/elastic-agent.yml --path.home /app/elastic-agent/data --path.config /app/config --path.logs /app/logs --path.downloads /usr/share/elastic-agent/data/elastic-agen
elastic+   184   173     6     1  6 19:51 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/components/fleet-server --agent-mode -E logging.to_stderr=true -E path.data=/app/elastic-agent/data/run/fleet-server-default
elastic+   194   173     6     1 47 19:51 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/components/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBEAT_GOGC:100} -E metricbeat.config.modules.
elastic+   205   173     6     1 44 19:51 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/components/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBEAT_GOGC:100} -E metricbeat.config.modules.
elastic+   215   173     6     1  0 19:51 ?        00:00:00 /usr/share/elastic-agent/data/elastic-agent-28b762/components/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOGC:100} -E filebeat.config.modules.enable       

Will keep digging, but if someone on the agent team involved in the fleet bootstrap, running with "container" command and monitoring has some ideas, let me know.

@aleksmaus
Copy link
Member

@simitt you mentioned before that the logs location could affect the reported health
They seem to be all over the place, and I'm not familiar what is the expected location of these logs:

elastic-agent@137c44e901a9:~$ ls -la /app/elastic-agent/data/run/fleet-server-es-containerhost/
total 52
drwxrwx--- 2 elastic-agent elastic-agent    88 Nov 20 02:02 .
drwxr-x--- 5 elastic-agent elastic-agent    67 Nov 20 02:02 ..
-rw------- 1 elastic-agent elastic-agent 23102 Nov 20 02:07 fleet-server.log-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent 26454 Nov 20 02:02 fleet-server.log-20221120.ndjson
elastic-agent@137c44e901a9:~$ ls -la /app/logs/
total 88
-rw-r--r-- 1 elastic-agent elastic-agent    65 Nov 20 01:39 boot.log
-rw------- 1 elastic-agent elastic-agent 22221 Nov 20 02:02 elastic-agent-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent 23573 Nov 20 02:02 elastic-agent-20221120.ndjson
-rw-r--r-- 1 elastic-agent elastic-agent 33714 Nov 20 02:02 elastic-agent-startup.log
elastic-agent@137c44e901a9:~$ ls -la /app/elastic-agent/data/logs/
total 1232
drwxr-xr-x 2 elastic-agent elastic-agent   4096 Nov 20 02:02 .
drwxr-xr-x 5 elastic-agent elastic-agent     75 Nov 20 02:02 ..
-rw------- 1 elastic-agent elastic-agent  18770 Nov 20 01:39 beat-metrics-monitoring-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 01:39 beat-metrics-monitoring-20221120-2.ndjson
-rw------- 1 elastic-agent elastic-agent  18772 Nov 20 01:39 beat-metrics-monitoring-20221120-3.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 02:02 beat-metrics-monitoring-20221120-4.ndjson
-rw------- 1 elastic-agent elastic-agent  18774 Nov 20 02:02 beat-metrics-monitoring-20221120-5.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 02:02 beat-metrics-monitoring-20221120-6.ndjson
-rw------- 1 elastic-agent elastic-agent  18771 Nov 20 02:02 beat-metrics-monitoring-20221120-7.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 01:39 beat-metrics-monitoring-20221120.ndjson
-rw------- 1 elastic-agent elastic-agent  23573 Nov 20 02:02 elastic-agent-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent  13145 Nov 20 02:02 elastic-agent-20221120-2.ndjson
-rw------- 1 elastic-agent elastic-agent  22221 Nov 20 02:02 elastic-agent-20221120-3.ndjson
-rw------- 1 elastic-agent elastic-agent  13636 Nov 20 01:39 elastic-agent-20221120.ndjson
-rw------- 1 elastic-agent elastic-agent  96798 Nov 20 01:39 filestream-monitoring-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent    478 Nov 20 01:39 filestream-monitoring-20221120-2.ndjson
-rw------- 1 elastic-agent elastic-agent 152713 Nov 20 01:39 filestream-monitoring-20221120-3.ndjson
-rw------- 1 elastic-agent elastic-agent    478 Nov 20 02:02 filestream-monitoring-20221120-4.ndjson
-rw------- 1 elastic-agent elastic-agent 319992 Nov 20 02:02 filestream-monitoring-20221120-5.ndjson
-rw------- 1 elastic-agent elastic-agent    478 Nov 20 02:02 filestream-monitoring-20221120-6.ndjson
-rw------- 1 elastic-agent elastic-agent 318148 Nov 20 02:02 filestream-monitoring-20221120-7.ndjson
-rw------- 1 elastic-agent elastic-agent    478 Nov 20 01:39 filestream-monitoring-20221120.ndjson
-rw------- 1 elastic-agent elastic-agent  36428 Nov 20 01:39 http-metrics-monitoring-20221120-1.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 01:39 http-metrics-monitoring-20221120-2.ndjson
-rw------- 1 elastic-agent elastic-agent  35451 Nov 20 01:39 http-metrics-monitoring-20221120-3.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 02:02 http-metrics-monitoring-20221120-4.ndjson
-rw------- 1 elastic-agent elastic-agent  36073 Nov 20 02:02 http-metrics-monitoring-20221120-5.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 02:02 http-metrics-monitoring-20221120-6.ndjson
-rw------- 1 elastic-agent elastic-agent  35471 Nov 20 02:02 http-metrics-monitoring-20221120-7.ndjson
-rw------- 1 elastic-agent elastic-agent    482 Nov 20 01:39 http-metrics-monitoring-20221120.ndjson

@aleksmaus
Copy link
Member

Some more pieces of observation.
Fleet server is healthy in the ecetest envrionment

elastic-agent@bf0fd713226f:~$ curl -v -k https://localhost:8220/api/status
{"name":"fleet-server","status":"healthy"}

The fleet-server has a proper config it seems:

{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","new":{"Fleet":{"Agent":{"ID":"211ab162-8a4c-4906-9b82-5018d14c8f13","Version":"8.6.0","Logging":{"Level":"info"}},"Host":{"ID":"","Name":""}},"Output":{"Elasticsearch":{"Protocol":"http","Hosts":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"]

The debug log after restart is healthy:

{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.439Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","cfg":{"NumCounters":500000,"MaxCost":52428800,"ActionTTL":300000000000,"EnrollKeyTTL":60000000000,"ArtifactTTL":86400000000000,"APIKeyTTL":900000000000,"APIKeyJitter":300000000000},"@timestamp":"2022-11-20T15:13:08.439Z","message":"Setting cache config options"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"starting","@timestamp":"2022-11-20T15:13:08.442Z","message":"Starting"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.442Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","new":{"Fleet":{"Agent":{"ID":"211ab162-8a4c-4906-9b82-5018d14c8f13","Version":"8.6.0","Logging":{"Level":"info"}},"Host":{"ID":"","Name":""}},"Output":{"Elasticsearch":{"Protocol":"http","Hosts":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"],"Path":"","Headers":{"X-Elastic-App-Auth":"eyJhbGciOiJSUzI1NiJ9.eyJpc3MiOiIyNGVlYTllZGFiNDY0NTM4YTgyNzFmM2RmZDJjZGYxNiIsInN1YiI6IjhmMjg3ZWRmNmNkNTQxZjZiODU3ZDYwY2FjZjk5Y2E1IiwiYXVkIjoiMjRlZWE5ZWRhYjQ2NDUzOGE4MjcxZjNkZmQyY2RmMTYiLCJpYXQiOjE2Njg3OTg2ODcsImtpbmQiOiJhcG0ifQ.rlcxYIs7vNrF6ZrrYvLsF0PtY7IVBYkqxF4C52ToxX09fHLWE3DlZ3qNilG9Vf40kmF9J9nDMQk5YOoGjzTddPBMRn8paXdZBpI7qubDzLnTa6eVLJZnHe4SkhDkDDhgdFQaN8GdWDiyxv3gzYtlk0EhQY2_gMj4wo5iLZ4ZP7v9MhKRLk7H3Un5QkuanCy1XaUXlYTiNomLWivdQoGx4TFV4KjCFhO9VtvJ1olOqCVNawGCNJA-bDuXBA7DfkNqueiPBhxIOrI0xIVJCIKDAtpuOevbFS5FlpMsbCGV3d3DJ7JafwCIEn1PBXTySxJgIPk7xkAvA2LAGsJKlW2PeQ"},"APIKey":"","ServiceToken":"[redacted]","ProxyURL":"","ProxyDisable":false,"ProxyHeaders":{},"TLS":{"Enabled":null,"VerificationMode":"full","Versions":null,"CipherSuites":null,"CAs":["/app/config/certs/internal_tls_ca.crt"],"Certificate":{"Certificate":"","Key":"","Passphrase":""},"CurveTypes":null,"Renegotiation":"never","CASha256":null,"CATrustedFingerprint":""},"MaxRetries":3,"MaxConnPerHost":128,"Timeout":90000000000,"MaxContentLength":104857600},"Extra":null},"Inputs":[{"Type":"","Policy":{"ID":""},"Server":{"Host":"0.0.0.0","Port":8220,"InternalPort":8221,"TLS":{"Enabled":null,"VerificationMode":"full","Versions":null,"CipherSuites":null,"CAs":null,"Certificate":{"Certificate":"/app/config/certs/node.crt","Key":"[redacted]","Passphrase":""},"CurveTypes":null,"ClientAuth":0,"CASha256":null},"Timeouts":{"Read":60000000000,"Write":600000000000,"Idle":30000000000,"ReadHeader":5000000000,"CheckinTimestamp":30000000000,"CheckinLongPoll":300000000000,"CheckinJitter":30000000000},"Profiler":{"Enabled":false,"Bind":"localhost:6060"},"CompressionLevel":1,"CompressionThresh":1024,"Limits":{"MaxAgents":0,"PolicyThrottle":500000,"MaxHeaderByteSize":8192,"MaxConnections":52428800,"CheckinLimit":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"ArtifactLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"EnrollLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"AckLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"StatusLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0}},"Runtime":{"GCPercent":20},"Bulk":{"FlushInterval":250000000,"FlushThresholdCount":2048,"FlushThresholdSize":1048576,"FlushMaxPending":8},"GC":{"ScheduleInterval":3600000000000,"CleanupAfterExpiredInterval":"30d"},"Instrumentation":{"TLS":{"SkipVerify":false,"ServerCertificate":"","ServerCA":""},"Environment":"","APIKey":"","SecretToken":"","Hosts":null,"Enabled":false,"GlobalLabels":"","TransactionSampleRate":""}},"Cache":{"NumCounters":0,"MaxCost":0,"ActionTTL":0,"EnrollKeyTTL":0,"ArtifactTTL":0,"APIKeyTTL":0,"APIKeyJitter":0},"Monitor":{"FetchSize":0,"PollTimeout":0}}],"Logging":{"Level":"info","ToStderr":false,"ToFiles":true,"Pretty":false,"Files":null},"HTTP":{"Enabled":false,"Host":"localhost","Port":5066,"User":"","SecurityDescriptor":""}},"@timestamp":"2022-11-20T15:13:08.443Z","message":"initial server configuration"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.443Z","message":"starting server on configuration change"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","old":20,"new":20,"@timestamp":"2022-11-20T15:13:08.443Z","message":"SetGCPercent"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"],"cluster.maxConnsPersHost":128,"@timestamp":"2022-11-20T15:13:08.445Z","message":"init es"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"],"cluster.maxConnsPersHost":128,"cluster.name":"24eea9edab464538a8271f3dfd2cdf16","cluster.uuid":"IMVjbLwgTuia7gG7n2nVcQ","cluster.version":"8.6.0-SNAPSHOT","@timestamp":"2022-11-20T15:13:08.45Z","message":"elasticsearch cluster info"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet_version":"8.6.0-SNAPSHOT","@timestamp":"2022-11-20T15:13:08.45Z","message":"check version compatibility with elasticsearch"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.451Z","message":"Bulker started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","opts":{"flushInterval":250,"flushThresholdCnt":2048,"flushThresholdSz":1048576,"maxPending":8,"blockQueueSz":32,"apikeyMaxParallel":120,"apikeyMaxReqSize":104857600},"@timestamp":"2022-11-20T15:13:08.451Z","message":"Run bulker with options"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","elasticsearch_version":"8.6.0","@timestamp":"2022-11-20T15:13:08.452Z","message":"fetched elasticsearch version"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet_version":"8.6.0-SNAPSHOT","elasticsearch_version":"8.6.0","@timestamp":"2022-11-20T15:13:08.452Z","message":"Elasticsearch compatibility check successful"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.452Z","message":"Migrations started"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.452Z","message":"applying migration to v7.15"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.migration.name":"AgentMetadata","fleet.migration.es.took":0,"fleet.migration.es.timed_out":false,"fleet.migration.updated":0,"fleet.migration.deleted":0,"fleet.migration.batches":0,"fleet.migration.version_conflicts":0,"fleet.migration.noops":0,"fleet.migration.retries.bulk":0,"fleet.migration.retries.search":0,"fleet.migration.total.duration":2.050908,"fleet.migration.total.count":0,"@timestamp":"2022-11-20T15:13:08.454Z","message":"migration AgentMetadata done"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.454Z","message":"applying migration to v8.5.0"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.migration.name":"AgentOutputs","fleet.migration.es.took":0,"fleet.migration.es.timed_out":false,"fleet.migration.updated":0,"fleet.migration.deleted":0,"fleet.migration.batches":0,"fleet.migration.version_conflicts":0,"fleet.migration.noops":0,"fleet.migration.retries.bulk":0,"fleet.migration.retries.search":0,"fleet.migration.total.duration":1.995305,"fleet.migration.total.count":0,"@timestamp":"2022-11-20T15:13:08.456Z","message":"migration AgentOutputs done"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.456Z","message":"Migrations exited"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"],"cluster.maxConnsPersHost":128,"@timestamp":"2022-11-20T15:13:08.457Z","message":"init es"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.459Z","message":"Elasticsearch GC started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","cluster.addr":["24eea9edab464538a8271f3dfd2cdf16.containerhost:9244"],"cluster.maxConnsPersHost":128,"cluster.name":"24eea9edab464538a8271f3dfd2cdf16","cluster.uuid":"IMVjbLwgTuia7gG7n2nVcQ","cluster.version":"8.6.0-SNAPSHOT","@timestamp":"2022-11-20T15:13:08.462Z","message":"elasticsearch cluster info"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.462Z","message":"Coordinator policy monitor started"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:13:08.463Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Http server started"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Policy index monitor started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","@timestamp":"2022-11-20T15:13:08.463Z","message":"Starting index monitor"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Policy monitor started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","throttle":0.5,"@timestamp":"2022-11-20T15:13:08.463Z","message":"run policy monitor"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Policy self monitor started"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Revision monitor started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","index":".fleet-actions","ctx":"index monitor","@timestamp":"2022-11-20T15:13:08.463Z","message":"Starting index monitor"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","bind":"0.0.0.0:8220","rdTimeout":60000,"wrTimeout":600000,"@timestamp":"2022-11-20T15:13:08.463Z","message":"server listening"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Revision dispatcher started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","limits":{"MaxAgents":0,"PolicyThrottle":500000,"MaxHeaderByteSize":8192,"MaxConnections":52428800,"CheckinLimit":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"ArtifactLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"EnrollLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"AckLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"StatusLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0}},"@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server creating new limiter"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.463Z","message":"Bulk checkin started"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","method":"GET","path":"/api/status","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","method":"POST","path":"/api/fleet/agents/:id","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","method":"POST","path":"/api/fleet/agents/:id/checkin","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","method":"POST","path":"/api/fleet/agents/:id/acks","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","method":"GET","path":"/api/fleet/artifacts/:id/:sha2","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"0.0.0.0:8220","@timestamp":"2022-11-20T15:13:08.463Z","message":"fleet-server routes set up"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","hardConnLimit":52428800,"@timestamp":"2022-11-20T15:13:08.463Z","message":"server hard connection limiter installed"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.464Z","message":"Listening on 0.0.0.0:8220"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","bind":"localhost:8221","rdTimeout":60000,"wrTimeout":600000,"@timestamp":"2022-11-20T15:13:08.464Z","message":"server listening"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","limits":{"MaxAgents":0,"PolicyThrottle":500000,"MaxHeaderByteSize":8192,"MaxConnections":52428800,"CheckinLimit":{"Interval":1000000,"Burst":1000,"Max":0,"MaxBody":1048576},"ArtifactLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0},"EnrollLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":524288},"AckLimit":{"Interval":10000000,"Burst":100,"Max":50,"MaxBody":2097152},"StatusLimit":{"Interval":5000000,"Burst":25,"Max":50,"MaxBody":0}},"@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server creating new limiter"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","method":"GET","path":"/api/status","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","method":"POST","path":"/api/fleet/agents/:id","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","method":"POST","path":"/api/fleet/agents/:id/checkin","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","method":"POST","path":"/api/fleet/agents/:id/acks","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","method":"GET","path":"/api/fleet/artifacts/:id/:sha2","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server route added"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","addr":"localhost:8221","@timestamp":"2022-11-20T15:13:08.464Z","message":"fleet-server routes set up"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","hardConnLimit":52428800,"@timestamp":"2022-11-20T15:13:08.464Z","message":"server hard connection limiter installed"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:08.465Z","message":"Listening on localhost:8221"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","index":".fleet-actions","ctx":"index monitor","checkpoints":[10],"@timestamp":"2022-11-20T15:13:08.468Z","message":"updated checkpoint"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","index":".fleet-policies","ctx":"index monitor","checkpoints":[11],"@timestamp":"2022-11-20T15:13:08.468Z","message":"updated checkpoint"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"healthy","@timestamp":"2022-11-20T15:13:08.72Z","message":"Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","http.request.id":"01GJASKD833DVNKHK3FYETRF18","url.full":"/api/fleet/agents/211ab162-8a4c-4906-9b82-5018d14c8f13/checkin","http.version":"1.1","http.request.method":"POST","fleet.apikey.id":"wiyZlYQBrKGcNQmgo1so","client.address":"127.0.0.1:49296","tls.established":true,"client.ip":"127.0.0.1","client.port":49296,"tls.version":"1.3","tls.cipher":"TLS_AES_128_GCM_SHA256","tls.resumed":false,"tls.client.server_name":"instance-0000000012","@timestamp":"2022-11-20T15:13:09.383Z","message":"HTTP start"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","id":"wiyZlYQBrKGcNQmgo1so","http.request.id":"01GJASKD833DVNKHK3FYETRF18","event.duration":1401946,"userName":"elastic/fleet-server","roles":[],"enabled":true,"meta":{"_elastic_service_account":true},"fleet.apikey.cache_hit":false,"@timestamp":"2022-11-20T15:13:09.384Z","message":"ApiKey authenticated"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.424Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.425Z","message":"Server configuration update"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"configuring","@timestamp":"2022-11-20T15:13:09.425Z","message":"Re-configuring"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:09.425Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKD833DVNKHK3FYETRF18","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","req.Components":[{"id":"fleet-server-es-containerhost","type":"fleet-server","status":"healthy","message":"Healthy: communicating with pid '317'","units":[{"id":"fleet-server-es-containerhost-fleet-server-fleet_server-elastic-cloud-fleet-server","type":"input","status":"healthy","message":"Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud"},{"id":"fleet-server-es-containerhost","type":"output","status":"healthy","message":"Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud"}]}],"@timestamp":"2022-11-20T15:13:09.481Z","message":"applying new components data"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":0,"coord":0,"@timestamp":"2022-11-20T15:13:09.481Z","message":"subscribed to policy monitor"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.policy.id":"policy-elastic-agent-on-cloud","@timestamp":"2022-11-20T15:13:09.481Z","message":"force load on unknown policyId"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKD833DVNKHK3FYETRF18","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","status":"online","seqNo":"-1","setupDuration":97.926416,"jitter":24235.999806,"pollDuration":275666.073778,"bodyCount":1180,"@timestamp":"2022-11-20T15:13:09.481Z","message":"checkin start long poll"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":3,"coord":1,"fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","@timestamp":"2022-11-20T15:13:09.735Z","message":"scheduled pendingQ on policy revision"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":3,"coord":1,"oldRev":0,"oldCoord":0,"nQueued":1,"fleet.policy.id":"policy-elastic-agent-on-cloud","@timestamp":"2022-11-20T15:13:09.735Z","message":"New revision of policy received and added to the queue"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.policy.id":"fleet-server-policy","rev":3,"coord":1,"fleet.policy.id":"fleet-server-policy","@timestamp":"2022-11-20T15:13:09.735Z","message":"New policy found on update and added"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":0,"coord":0,"@timestamp":"2022-11-20T15:13:09.736Z","message":"dispatch"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKD833DVNKHK3FYETRF18","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","fleet.ctx":"processPolicy","fleet.policyRevision":3,"fleet.policyCoordinator":1,"fleet.policy.id":"policy-elastic-agent-on-cloud","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.output.name":"es-containerhost","@timestamp":"2022-11-20T15:13:10.007Z","message":"preparing elasticsearch output"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKD833DVNKHK3FYETRF18","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","fleet.ctx":"processPolicy","fleet.policyRevision":3,"fleet.policyCoordinator":1,"fleet.policy.id":"policy-elastic-agent-on-cloud","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.output.name":"es-containerhost","@timestamp":"2022-11-20T15:13:10.007Z","message":"policy output permissions are the same"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKD833DVNKHK3FYETRF18","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","ackToken":"","createdAt":"2022-11-18T19:12:45.069Z","id":"policy:policy-elastic-agent-on-cloud:3:1","type":"POLICY_CHANGE","inputType":"","timeout":0,"@timestamp":"2022-11-20T15:13:10.007Z","message":"Action delivered to agent on checkin"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":0,"coord":0,"@timestamp":"2022-11-20T15:13:10.027Z","message":"unsubscribe"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","http.request.id":"01GJASKD833DVNKHK3FYETRF18","url.full":"/api/fleet/agents/211ab162-8a4c-4906-9b82-5018d14c8f13/checkin","http.version":"1.1","http.request.method":"POST","fleet.apikey.id":"wiyZlYQBrKGcNQmgo1so","client.address":"127.0.0.1:49296","tls.established":true,"http.request.body.bytes":1180,"http.response.body.bytes":691,"http.response.status_code":200,"event.duration":644380987,"@timestamp":"2022-11-20T15:13:10.027Z","message":"200 HTTP Request"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","url.full":"/api/fleet/agents/211ab162-8a4c-4906-9b82-5018d14c8f13/acks","http.version":"1.1","http.request.method":"POST","fleet.apikey.id":"wiyZlYQBrKGcNQmgo1so","client.address":"127.0.0.1:49296","tls.established":true,"client.ip":"127.0.0.1","client.port":49296,"tls.version":"1.3","tls.cipher":"TLS_AES_128_GCM_SHA256","tls.resumed":false,"tls.client.server_name":"instance-0000000012","@timestamp":"2022-11-20T15:13:10.043Z","message":"HTTP start"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","id":"wiyZlYQBrKGcNQmgo1so","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","event.duration":11025,"fleet.apikey.cache_hit":true,"@timestamp":"2022-11-20T15:13:10.043Z","message":"ApiKey authenticated"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.061Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.062Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.062Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.062Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.063Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.063Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.063Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.063Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.063Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.064Z","message":"Server configuration update"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"configuring","@timestamp":"2022-11-20T15:13:10.064Z","message":"Re-configuring"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.064Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.064Z","message":"Server configuration update"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"configuring","@timestamp":"2022-11-20T15:13:10.064Z","message":"Re-configuring"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:10.064Z","message":"No applicable limit for 0 agents, using default."}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","nEvents":1,"actionType":"ACTION_RESULT","actionSubType":"ACKNOWLEDGED","actionId":"policy:policy-elastic-agent-on-cloud:3:1","agentId":"211ab162-8a4c-4906-9b82-5018d14c8f13","timestamp":"2022-11-20T15:13:10.04296+00:00","n":0,"@timestamp":"2022-11-20T15:13:10.297Z","message":"ack event"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","nEvents":1,"agent.policyId":"policy-elastic-agent-on-cloud","agent.revisionIdx":0,"agent.coordinatorIdx":0,"rev.policyId":"policy-elastic-agent-on-cloud","rev.revisionIdx":3,"rev.coordinatorIdx":1,"@timestamp":"2022-11-20T15:13:10.297Z","message":"ack policy revision"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","nEvents":1,"fleet.policy.id":"policy-elastic-agent-on-cloud","policyRevision":3,"policyCoordinator":1,"@timestamp":"2022-11-20T15:13:10.56Z","message":"ack policy"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","http.request.id":"01GJASKDWVJXRCRG2P3EXCMR9W","url.full":"/api/fleet/agents/211ab162-8a4c-4906-9b82-5018d14c8f13/acks","http.version":"1.1","http.request.method":"POST","fleet.apikey.id":"wiyZlYQBrKGcNQmgo1so","client.address":"127.0.0.1:49296","tls.established":true,"http.request.body.bytes":313,"http.response.body.bytes":57,"http.response.status_code":200,"event.duration":517406082,"@timestamp":"2022-11-20T15:13:10.56Z","message":"200 HTTP Request"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","http.request.id":"01GJASKFARX0MQ6541RYHCMT66","url.full":"/api/fleet/agents/211ab162-8a4c-4906-9b82-5018d14c8f13/checkin","http.version":"1.1","http.request.method":"POST","fleet.apikey.id":"wiyZlYQBrKGcNQmgo1so","client.address":"127.0.0.1:49296","tls.established":true,"client.ip":"127.0.0.1","client.port":49296,"tls.version":"1.3","tls.cipher":"TLS_AES_128_GCM_SHA256","tls.resumed":false,"tls.client.server_name":"instance-0000000012","@timestamp":"2022-11-20T15:13:11.512Z","message":"HTTP start"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","id":"wiyZlYQBrKGcNQmgo1so","http.request.id":"01GJASKFARX0MQ6541RYHCMT66","event.duration":9989,"fleet.apikey.cache_hit":true,"@timestamp":"2022-11-20T15:13:11.512Z","message":"ApiKey authenticated"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKFARX0MQ6541RYHCMT66","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","req.Components":[{"id":"fleet-server-es-containerhost","type":"fleet-server","status":"healthy","message":"Healthy: communicating with pid '317'","units":[{"id":"fleet-server-es-containerhost-fleet-server-fleet_server-elastic-cloud-fleet-server","type":"input","status":"configuring","message":"Re-configuring"},{"id":"fleet-server-es-containerhost","type":"output","status":"configuring","message":"Re-configuring"}]}],"@timestamp":"2022-11-20T15:13:11.765Z","message":"applying new components data"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy agent monitor","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","fleet.policy.id":"policy-elastic-agent-on-cloud","rev":3,"coord":1,"@timestamp":"2022-11-20T15:13:11.765Z","message":"subscribed to policy monitor"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","fleet.agent.id":"211ab162-8a4c-4906-9b82-5018d14c8f13","http.request.id":"01GJASKFARX0MQ6541RYHCMT66","fleet.access.apikey.id":"wiyZlYQBrKGcNQmgo1so","status":"online","seqNo":"-1","setupDuration":253.119333,"jitter":25273.373017,"pollDuration":274473.50765,"bodyCount":1060,"@timestamp":"2022-11-20T15:13:11.765Z","message":"checkin start long poll"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","state":"healthy","@timestamp":"2022-11-20T15:13:13.977Z","message":"Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","@timestamp":"2022-11-20T15:13:13.977Z","message":"Policy self monitor exited"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","schedule":"fleet actions cleanup","interval":3600000,"@timestamp":"2022-11-20T15:13:18.16Z","message":"started"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"fleet actions cleanup","interval":"now-30d","@timestamp":"2022-11-20T15:13:18.161Z","message":"delete expired actions"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"fleet actions cleanup","interval":"now-30d","count":0,"@timestamp":"2022-11-20T15:13:18.163Z","message":"deleted expired actions"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","schedule":"fleet actions cleanup","@timestamp":"2022-11-20T15:13:18.163Z","message":"finished"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:13:29.48Z","message":"ensuring leadership of policies"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"policy-elastic-agent-on-cloud","count":0,"coordinator":"v0","@timestamp":"2022-11-20T15:13:31.008Z","message":"Starting policy coordinator"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"fleet-server-policy","count":0,"coordinator":"v0","@timestamp":"2022-11-20T15:13:31.008Z","message":"Starting policy coordinator"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"policy-elastic-agent-on-cloud","unenroller_uuid":"e9bc5fad-3dbd-46a1-807a-5234cbfa4db9","unenroll_timeout":86400000,"@timestamp":"2022-11-20T15:13:31.008Z","message":"setting unenrollment timeout"}
{"log.level":"info","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","policy_id":"policy-elastic-agent-on-cloud","unenroller_uuid":"e9bc5fad-3dbd-46a1-807a-5234cbfa4db9","checkInterval":60000,"unenrollTimeout":86400000,"@timestamp":"2022-11-20T15:13:31.008Z","message":"giving a grace period to Elastic Agent before enforcing unenrollTimeout monitor"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:13:51.008Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:14:12.549Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:14:34.079Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:14:55.61Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:15:17.143Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:15:38.692Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:16:00.23Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:16:21.761Z","message":"ensuring leadership of policies"}
{"log.level":"debug","ecs.version":"1.6.0","service.name":"fleet-server","ctx":"policy leader manager","@timestamp":"2022-11-20T15:16:43.289Z","message":"ensuring leadership of policies"}

To be continued on Monday.

@axw
Copy link
Member

axw commented Nov 21, 2022

But the fleet and APM is shown as unhealthy in the deployment admin web UI and the docker container is getting killed after some period of this.
How the health of fleet and APM is determined in the cloud?

I noticed while updating APM Server to support V2 that the fleet-server /api/status endpoint went from reporting "HEALTHY" to "healthy" -- this broke our docker-compose healthcheck command. I don't know where to look in cloud for the health checking, but maybe this is related.

@simitt
Copy link
Contributor Author

simitt commented Nov 21, 2022

The policy doesn't have APM integration out of the box.

Should that be fixed with #1745?

But the fleet and APM is shown as unhealthy in the deployment admin web UI and the docker container is getting killed after some period of this.
How the health of fleet and APM is determined in the cloud?

In the past, it was important that the Elastic Agent would not shut down if there was an issue with Fleet Server, while a healthy APM Server was running. This was important to not destabilize GA versions of APM Server while EA and Fleet Server were in the early stage (see related elastic/beats#28209). Afair, the healthcheck used to be based on the response when calling /processes rather than the api/status. @gcpagano or @AlexP-Elastic can you help refresh our memories how the healthcheck is carried out?

Since then, the Elastic Agent team has worked on a new health status design for the EA. The Integrations Server on cloud was introduced, and EA & Fleet Server are mature. I am unfortunately not in the picture of the current state of the cloud healthcheck and what might have been changed.

@simitt you mentioned before that the logs location could affect the reported health

I don't think it will impact the health, but it should be tested that monitoring still works (both - the operational, cloud regional metrics collection and the customer facing monitoring when enabled).
The paths from which the metricbeats are collecting are defined in the cloud code and might need to be updated when the structure changed.

@aleksmaus
Copy link
Member

I noticed while updating APM Server to support V2 that the fleet-server /api/status endpoint went from reporting "HEALTHY" to "healthy" -- this broke our docker-compose healthcheck command. I don't know where to look in cloud for the health checking, but maybe this is related.

This is an interesting clue. Indeed the health status is now all lower case "healthy", when before in 8.5 in was "HEALTHY". @simitt does it make a difference when the health is checked?

@simitt
Copy link
Contributor Author

simitt commented Nov 21, 2022

I honestly don't know, but as mentioned above (our comments just overlapped) I am not certain api/status is really used for the health state; I thought it was but digging out this older issue it looks like /processes was used.

@aleksmaus
Copy link
Member

Looks like the /processes handler was removed in V2
There is only /stats handler. @blakerouse could you shed the light on this change?

@simitt
Copy link
Contributor Author

simitt commented Nov 21, 2022

When /processes was removed, then at least the metrics collection for monitoring won't be working as expected.

@cmacknz
Copy link
Member

cmacknz commented Nov 21, 2022

It looks like we may have to update all of the cloud monitoring configurations for 8.6 to have it work with the /processes path removed. The easiest path will be to restore it, I suspect we removed it not knowing that it was widely used internally.

I don't see the /processes path documented, but removing it can be considered a breaking change we shouldn't have made.

@AlexP-Elastic
Copy link

AlexP-Elastic commented Nov 21, 2022

We definitely use /processes to detect managed mode, so it need to continue to support this logic:

(I think this is the reason for it dying)

  private def checkForManagedApmServerMode(): Future[ApmInstanceMode] = {
    val request = {
      val r = httpBundle.request
      r.withUri(r.uri.withPath(Uri.Path("/processes")))
    }
    val futureResult = httpBundle.client.httpRequest(request) map { result =>
      import result.response
      traceRequestResponse(response, request, response.status.isSuccess)
      if (response.status.isSuccess) {
        val responseBody = result.entity.asString
        val tryParse = Try(mapper.readRequiredObjectNode(responseBody))
        tryParse match {
          case Failure(exception) => {
            logger.warn(s"Invalid JSON returned from fleet processes API. Received [$responseBody]", exception)
            None
          }
          case Success(node) => {
            node
              .path("processes")
              .elements()
              .asScala
              .find(n => n.path("id").asText().startsWith("apm-server"))
              .map(_ => ApmInstanceMode.Managed)
          }
        }
      } else None
    }

    futureResult
      .recover {
        case e =>
          logger.warn(s"Managed APM check request to [${request.uri}] failed", e)
          None
      }
      .map(mode => mode.getOrElse(ApmInstanceMode.Standalone))
  }

We also use it for metrics collection

    val fleetConfigNode = yamlMapper.readValue(
      s"""
         |hosts:
         |  - "${getAgentMonitoringConnectionString}"
         |metricsets:
         |  - json
         |module: http
         |namespace: $processName
         |path: /processes/$processName$pathSuffix

@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2022

Re-opening this until we can confirm we have actually solved the problem.

One thing I am noticing is the ID keys are different between 8.5 and 8.6 and cloud depends on them being stable:

This is 8.5 with IDs "filebeat-default-monitoring" and "metricbeat-default-monitoring" and "metricbeat-default".

{"processes":[{"id":"filebeat-default-monitoring","pid":"72533","binary":"filebeat","source":{"kind":"internal","outputs":["default"]}},{"id":"metricbeat-default-monitoring","pid":"72534","binary":"metricbeat","source":{"kind":"internal","outputs":["default"]}},{"id":"metricbeat-default","pid":"72474","binary":"metricbeat","source":{"kind":"configured","outputs":["default"]}}]}

This is 8.6 right now with IDs "beat/metrics-monitoring", "system/metrics-default", "http/metrics-monitoring" and "filestream-monitoring".

{"processes":[{"id":"beat/metrics-monitoring","pid":"metricbeat","binary":"86000","source":{"kind":"internal","outputs":["monitoring"]}},{"id":"system/metrics-default","pid":"metricbeat","binary":"85997","source":{"kind":"configured","outputs":["default"]}},{"id":"http/metrics-monitoring","pid":"metricbeat","binary":"85998","source":{"kind":"internal","outputs":["monitoring"]}},{"id":"filestream-monitoring","pid":"filebeat","binary":"85999","source":{"kind":"internal","outputs":["monitoring"]}}]}

@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2022

Including fleet server gives me {"id":"fleet-server-default","pid":"fleet-server","binary":"15271","source":{"kind":"configured","outputs":["default"]}} on 8.6 and the same {"processes":[{"id":"fleet-server-default","pid":"16589","binary":"fleet-server","source":{"kind":"configured","outputs":["default"]}}]} on 8.5 so we might be good here.

@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2022

Kicked off a new 8.6 snapshot build to confirm we have fixed this. I will request a new BC once I confirm this fixes the problem, ideally I want to avoid requesting multiple new BCs if we need to iterate on the fix.

@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2022

Snapshot complete, I can see fleet server started and integrations server is healthy.

@cmacknz cmacknz closed this as completed Nov 23, 2022
@cmacknz
Copy link
Member

cmacknz commented Nov 23, 2022

Screen Shot 2022-11-23 at 6 08 13 PM

@axw
Copy link
Member

axw commented Nov 24, 2022

@cmacknz did you observe the APM Server being able to receive requests?

I've just created a deployment with 8.6.0-45476311-SNAPSHOT, and although Integrations Server claims to be healthy, the APM endpoint is greyed out in the console and the proxy is rejecting requests.

image

The logs indicate that APM Server is running. Our (local, Docker-based) system tests are all passing, so it seems likely that this is related to orchestration - perhaps some residual monitoring/healthcheck issues?

@cmacknz
Copy link
Member

cmacknz commented Nov 24, 2022

No, I selfishly only checked fleet-server. If this is related to the same problem, likely it is the key we are using for the processes endpoint differing from what the cloud monitoring expects.

My day is over, but if you can get the output of the agent's /processes endpoint with apm-server running it will help confirm if this is the case. It should be the same locally as the it is in cloud.

Per #1773 (comment) it needs to have the ID apm-server and if it doesn't this might be the root cause.

The fix would be to adjust the name to be what cloud expects in the processes handler, which should be trivial. That code is here:

func processesHandler(coord *coordinator.Coordinator) func(http.ResponseWriter, *http.Request) error {

If you can confirm the key being different is the problem just create a PR, that will be the fastest way to fix this. If the problem is something else we have more investigating to do. It is possible the APM health is tied into the metrics reporting we haven't fixed yet.

@axw
Copy link
Member

axw commented Nov 24, 2022

Running locally, I had to mount an elastic-agent.yml into the container with

agent.monitoring:
  http:
    enabled: true

to expose the /processes endpoint. Is it expected/intentional that the metrics server is always running, but you have to opt into the /processes endpoint?

Anyway, the results are:

{
  "processes": [
    {
      "id": "http/metrics-monitoring",
      "pid": "metricbeat",
      "binary": "63",
      "source": {
        "kind": "internal",
        "outputs": [
          "monitoring"
        ]
      }
    },
    {
      "id": "filestream-monitoring",
      "pid": "filebeat",
      "binary": "79",
      "source": {
        "kind": "internal",
        "outputs": [
          "monitoring"
        ]
      }
    },
    {
      "id": "beat/metrics-monitoring",
      "pid": "metricbeat",
      "binary": "104",
      "source": {
        "kind": "internal",
        "outputs": [
          "monitoring"
        ]
      }
    },
    {
      "id": "apm-default",
      "pid": "apm-server",
      "binary": "47",
      "source": {
        "kind": "configured",
        "outputs": [
          "default"
        ]
      }
    }
  ]
}

@axw
Copy link
Member

axw commented Nov 24, 2022

If you can confirm the key being different is the problem just create a PR, that will be the fastest way to fix this. If the problem is something else we have more investigating to do. It is possible the APM health is tied into the metrics reporting we haven't fixed yet.

I don't have enough context to know what the right fix is here in order to open a PR. I've opened https://github.com/elastic/ingest-dev/issues/1418

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker bug Something isn't working Team:Elastic-Agent Label for the Agent team v8.6.0
Projects
None yet
6 participants