Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2: mage package will use old, incompatible Beat versions without warning #1709

Closed
cmacknz opened this issue Nov 10, 2022 · 8 comments
Closed
Assignees
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team V2-Architecture v8.6.0

Comments

@cmacknz
Copy link
Member

cmacknz commented Nov 10, 2022

To reproduce, build the elastic-agent docker container from the head of main and run it. You will immediately see error messages about components failing:

DEV=true SNAPSHOT=true PLATFORMS=linux/arm64 PACKAGES=docker mage package
docker run docker.elastic.co/beats/elastic-agent-complete:8.6.0-SNAPSHOT

You will logs like:

Exiting: error loading config file: stat metricbeat.yml: no such file or directory
{"log.level":"info","@timestamp":"2022-11-10T22:10:59.283Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Failed","message":"Failed: pid '192' exited with code '1'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Failed","message":"Failed: pid '192' exited with code '1'"}],"output":{"id":"http/metrics-monitoring","state":"Failed","message":"Failed: pid '192' exited with code '1'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-10T22:11:08.942Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Starting","message":"Starting: spawned pid '201'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Starting","message":"Starting: spawned pid '201'"}],"output":{"id":"filestream-monitoring","state":"Starting","message":"Starting: spawned pid '201'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-10T22:11:08.993Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"system/metrics-default","state":"Starting","message":"Starting: spawned pid '209'","inputs":[{"id":"system/metrics-default-system/metrics","state":"Starting","message":"Starting: spawned pid '209'"}],"output":{"id":"system/metrics-default","state":"Starting","message":"Starting: spawned pid '209'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.044Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"beat/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '219'","inputs":[{"id":"beat/metrics-monitoring-metrics-monitoring-beats","state":"Starting","message":"Starting: spawned pid '219'"}],"output":{"id":"beat/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '219'"}},"ecs.version":"1.6.0"}
Exiting: error loading config file: stat filebeat.yml: no such file or directory
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.067Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Failed","message":"Failed: pid '201' exited with code '1'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Failed","message":"Failed: pid '201' exited with code '1'"}],"output":{"id":"filestream-monitoring","state":"Failed","message":"Failed: pid '201' exited with code '1'"}},"ecs.version":"1.6.0"}
Exiting: error loading config file: stat metricbeat.yml: no such file or directory
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.135Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"system/metrics-default","state":"Failed","message":"Failed: pid '209' exited with code '1'","inputs":[{"id":"system/metrics-default-system/metrics","state":"Failed","message":"Failed: pid '209' exited with code '1'"}],"output":{"id":"system/metrics-default","state":"Failed","message":"Failed: pid '209' exited with code '1'"}},"ecs.version":"1.6.0"}
Exiting: error loading config file: stat metricbeat.yml: no such file or directory
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.175Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"beat/metrics-monitoring","state":"Failed","message":"Failed: pid '219' exited with code '1'","inputs":[{"id":"beat/metrics-monitoring-metrics-monitoring-beats","state":"Failed","message":"Failed: pid '219' exited with code '1'"}],"output":{"id":"beat/metrics-monitoring","state":"Failed","message":"Failed: pid '219' exited with code '1'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.289Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '227'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Starting","message":"Starting: spawned pid '227'"}],"output":{"id":"http/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '227'"}},"ecs.version":"1.6.0"}
Exiting: error loading config file: stat metricbeat.yml: no such file or directory
{"log.level":"info","@timestamp":"2022-11-10T22:11:09.410Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Failed","message":"Failed: pid '227' exited with code '1'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Failed","message":"Failed: pid '227' exited with code '1'"}],"output":{"id":"http/metrics-monitoring","state":"Failed","message":"Failed: pid '227' exited with code '1'"}},"ecs.version":"1.6.0"}

Exec-ing into the container to check status reports every component as failed:

elastic-agent@f38bb03309f1:~$ elastic-agent status
State: DEGRADED
Message: 1 or more components/units in a failed state
Components:
  * system/metrics  (FAILED)
                    Failed: pid '568' exited with code '1'
  * filestream      (FAILED)
                    Failed: pid '561' exited with code '1'
  * beat/metrics    (FAILED)
                    Failed: pid '579' exited with code '1'
  * http/metrics    (FAILED)
                    Failed: pid '580' exited with code '1'

I can see that the metricbeat.yml file reported as missing is located in the components directory:

elastic-agent@f38bb03309f1:~$ pwd
/usr/share/elastic-agent
elastic-agent@f38bb03309f1:~$ ls -la data/elastic-agent-e7419d/components/
total 930180
drwxrwx---  1 root root      4096 Nov 10 20:29 .
drwxrwx---  1 root root      4096 Nov 10 20:30 ..
-rw-rw----  1 root root        41 Nov 10 20:28 .build_hash.txt
-rw-rw----  1 root root     13675 Nov 10 20:28 LICENSE.txt
-rw-rw----  1 root root   2566303 Nov 10 20:28 NOTICE.txt
-rw-rw----  1 root root       840 Nov 10 20:28 README.md
drwxrwx---  2 root root      4096 Nov 10 20:28 certs
-rw-r--r--  1 root root      1303 Nov 10 20:28 checksum.yml
-rw-r--r--  1 root root    389399 Nov 10 20:28 fields.yml
-rwxr-xr-x  1 root root 178477751 Nov 10 20:28 filebeat
-rw-r--r--  1 root root    174363 Nov 10 20:28 filebeat.reference.yml
-rw-r--r--  1 root root      4117 Nov 10 20:28 filebeat.spec.yml
-rw-r--r--  1 root root      8622 Nov 10 20:28 filebeat.yml
-rwxr-xr-x  1 root root 161152969 Nov 10 20:28 heartbeat
-rw-r--r--  1 root root     67937 Nov 10 20:28 heartbeat.reference.yml
-rw-r--r--  1 root root      1057 Nov 10 20:28 heartbeat.spec.yml
-rw-r--r--  1 root root      7276 Nov 10 20:28 heartbeat.yml
drwxrwx---  4 root root      4096 Nov 10 20:28 kibana
-rwxr-xr-x  1 root root 239101161 Nov 10 20:28 metricbeat
-rw-r--r--  1 root root    103498 Nov 10 20:28 metricbeat.reference.yml
-rw-r--r--  1 root root      4403 Nov 10 20:28 metricbeat.spec.yml
-rw-r--r--  1 root root      6899 Nov 10 20:28 metricbeat.yml
drwxrwx--- 84 root root      4096 Nov 10 20:29 module
drwxrwx---  2 root root      4096 Nov 10 20:29 modules.d
drwxrwx---  2 root root      4096 Nov 10 20:29 monitors.d
-rw-rw----  1 root root   5526616 Nov 10 20:28 osquery-extension.ext
-rwxr-xr-x  1 root root 146923101 Nov 10 20:28 osquerybeat
-rw-r--r--  1 root root     43600 Nov 10 20:28 osquerybeat.reference.yml
-rw-r--r--  1 root root       584 Nov 10 20:28 osquerybeat.spec.yml
-rw-r--r--  1 root root      6504 Nov 10 20:28 osquerybeat.yml
-rw-rw----  1 root root 217818752 Nov 10 20:28 osqueryd

Edit: the restart problem is a separate bug #1713

@kevinlog
Copy link

I tried to run some tests with the new Agent and Endpoint and got stuck at trying to stand up Fleet Server. It looks like this ticket is related. I'll open a new ticket if you think otherwise.

Here is the Fleet Server output that I'm getting, let me know if any other info would help.

{"log.level":"info","@timestamp":"2022-11-11T16:14:08.593Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '49'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Starting","message":"Starting: spawned pid '49'"}],"output":{"id":"http/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '49'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.593Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"beat/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '39'","inputs":[{"id":"beat/metrics-monitoring-metrics-monitoring-beats","state":"Starting","message":"Starting: spawned pid '39'"}],"output":{"id":"beat/metrics-monitoring","state":"Starting","message":"Starting: spawned pid '39'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.595Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"beat/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '39'","inputs":[{"id":"beat/metrics-monitoring-metrics-monitoring-beats","state":"Healthy","message":"beat reloaded"}],"output":{"id":"beat/metrics-monitoring","state":"Configuring","message":"reloading output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.595Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"beat/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '39'","inputs":[{"id":"beat/metrics-monitoring-metrics-monitoring-beats","state":"Healthy","message":"beat reloaded"}],"output":{"id":"beat/metrics-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.595Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '49'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Starting","message":"Starting"}],"output":{"id":"http/metrics-monitoring","state":"Configuring","message":"reloading output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.595Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '49'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Starting","message":"Starting"}],"output":{"id":"http/metrics-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.595Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '49'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Configuring","message":"found reloader for 'input'"}],"output":{"id":"http/metrics-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.602Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"http/metrics-monitoring","state":"Healthy","message":"Healthy: communicating with pid '49'","inputs":[{"id":"http/metrics-monitoring-metrics-monitoring-agent","state":"Healthy","message":"beat reloaded"}],"output":{"id":"http/metrics-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.694Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Healthy","message":"Healthy: communicating with pid '60'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Starting","message":"Starting: spawned pid '60'"}],"output":{"id":"filestream-monitoring","state":"Starting","message":"Starting: spawned pid '60'"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.695Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Healthy","message":"Healthy: communicating with pid '60'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Configuring","message":"found reloader for 'input'"}],"output":{"id":"filestream-monitoring","state":"Configuring","message":"reloading output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.696Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Healthy","message":"Healthy: communicating with pid '60'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Configuring","message":"found reloader for 'input'"}],"output":{"id":"filestream-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:08.699Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":325},"message":"Existing component state changed","component":{"id":"filestream-monitoring","state":"Healthy","message":"Healthy: communicating with pid '60'","inputs":[{"id":"filestream-monitoring-logs-monitoring-agent","state":"Healthy","message":"beat reloaded"}],"output":{"id":"filestream-monitoring","state":"Healthy","message":"reloaded output component"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:14:09.498Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":794},"message":"Fleet Server - Failed: execution of component prevented: cannot be writeable by group or other","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:16:07.488Z","log.origin":{"file.name":"cmd/run.go","file.line":227},"message":"Shutting down Elastic Agent and sending last events...","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:16:07.589Z","log.origin":{"file.name":"cmd/run.go","file.line":233},"message":"Shutting down completed.","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-11-11T16:16:07.589Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":72},"message":"Stats endpoint (127.0.0.1:6791) finished: accept tcp 127.0.0.1:6791: use of closed network connection","ecs.version":"1.6.0"}
Error: fleet-server failed: context canceled
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html
Error: enrollment failed: exit status 1
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.6/fleet-troubleshooting.html

@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

Thanks, looks like the permissions are not set up correctly based on the symptoms and this similar report: https://github.com/elastic/ingest-dev/issues/1325#issuecomment-1311761547

Hopefully quick to fix.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

Splitting the fleet-server issue off into #1712, may be a different root cause.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

The Exiting: error loading config file: stat metricbeat.yml: no such file or directory does not occur is the most recent snapshot build. It only occurs with a local build of the agent.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

My local build is using beats packages from 25 days ago, for me is this commit elastic/beats@e6c2ed9. This explains why it doesn't work.

@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

I think the local elastic agent build is downloading old binaries, reason TBD.

Edit: it's picking up the contents of of my local beats checkout which was last locally packaged a while ago. We need to have everyone building agent locally know to purge these versions or download the beats.

@cmacknz cmacknz assigned cmacknz and unassigned michalpristas Nov 11, 2022
@cmacknz
Copy link
Member Author

cmacknz commented Nov 11, 2022

Build with EXTERNAL=true or repackage your local checkout of beats (after the V2 merge) to fix this, I am going to add a lot more logging to the packaging step and make it obvious where the agent is pulling beats from.

@cmacknz cmacknz changed the title V2: Elastic agent container starts with all components in a failed state V2: mage package will use old, incompatible Beat versions without warning Nov 12, 2022
@cmacknz
Copy link
Member Author

cmacknz commented Nov 21, 2022

Fixed as part of #1716

@cmacknz cmacknz closed this as completed Nov 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Team:Elastic-Agent Label for the Agent team V2-Architecture v8.6.0
Projects
None yet
Development

No branches or pull requests

3 participants