Enable Fleet Server #279

mtojek · 2021-03-08T15:00:27Z

This PR adjusts the Elastic stack created by elastic-package to support Fleet Server.

Issue: #278

mtojek · 2021-03-08T15:01:23Z

Right now I can't enroll the agent to any policy (using 7.13.0):

➜  elastic-package git:(278-support-fleet-server) ✗ docker logs 47b5c520a12f -f
Performing setup of Fleet in Kibana
The Elastic Agent is currently in BETA and should not be used in production

2021-03-08T14:53:43.418Z	INFO	application/enroll_cmd.go:287	Generating self-signed certificate for Fleet Server
2021-03-08T14:53:43.787Z	INFO	application/enroll_cmd.go:435	Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2021-03-08T14:53:44.793Z	INFO	application/enroll_cmd.go:513	waiting for Elastic Agent to start Fleet Server
2021-03-08T14:53:45.795Z	INFO	application/enroll_cmd.go:531	Fleet Server - Starting
2021-03-08T14:53:47.428Z	WARN	application/enroll_cmd.go:331	Remote server is not ready to accept connections, will retry in a moment.
2021-03-08T14:54:47.360Z	INFO	application/enroll_cmd.go:338	Retrying to enroll...
2021-03-08T14:54:47.363Z	WARN	application/enroll_cmd.go:331	Remote server is not ready to accept connections, will retry in a moment.

Keep in mind that I'd like to support 7.12.0 if possible (I expect new environment variables to be ignored).

elasticmachine · 2021-03-08T15:14:20Z

💚 Build Succeeded

Build stats

Build Cause: Pull request #279 updated
Start Time: 2021-04-07T08:24:17.760+0000
Duration: 23 min 45 sec
Commit: 0f53750

Test stats 🧪

Test	Results
Failed	0
Passed	316
Skipped	1
Total	317

nchaulet · 2021-03-09T00:12:44Z

@mtojek I was able to have it running on 8.0 by adding - "FLEET_SERVER_INSECURE_HTTP=1"

mtojek · 2021-03-09T18:39:14Z

@nchaulet I added the suggested env and now I'm getting this error from Agent:

2021-03-09T18:37:09.755Z	ERROR	application/fleet_gateway.go:185	failed to dispatch actions, error: fail to generate program configuration: expecting Dict and received *transpiler.Key for '0'
2021-03-09T18:37:09.755Z	WARN	status/reporter.go:233	Elastic Agent status changed to: 'degraded'
2021-03-09T18:37:09.755Z	INFO	status/reporter.go:233	Elastic Agent status changed to: 'online'

Package revision in the agent's structure is "null".

mtojek · 2021-03-23T22:05:23Z

internal/install/static_snapshot_yml.go

    - "FLEET_INSECURE=1"
+    - "FLEET_SERVER_ENABLE=1"


@blakerouse @ruflin @nchaulet I watched the observability demo session and @blakerouse's presentation (good job!) about running the agent with fleet server in a container and I'm confused about available configuration options. Recently I've removed FLEET_URL and KIBANA_HOST from global envs, but I saw @blakerouse used them. Unfortunately with current blockers it's hard to determine the correct set.

Shall I ask you to review both kibana.config.yml and snapshot.yml and recommend the best configuration for 7.13?

@mtojek FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue. As for the KIBANA_HOST that is required if you want the container to perform the setup of Kibana working with Fleet, which when running with FLEET_SERVER_ENABLED is required, unless you are running at some hook on the Kibana container instead.

> FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue.

Would you mind linking this issue for tracking purposes?

> As for the KIBANA_HOST that is required if you want the container to perform the setup of Kibana working with Fleet, which when running with FLEET_SERVER_ENABLED is required, unless you are running at some hook on the Kibana container instead.

In this setup we have a long-running Elastic cluster (Elasticsearch, Kibana, Package Registry) and two agents (so far):

1. A separate agent container, which is used for system tests (reassigning policies).
2. An agent container in the Kubernetes cluster.

This gives us two agent containers, each with its own Fleet Server inside. Can Kibana handle it without any problems?

Two Fleet Servers should just work. If not, please let us know.

ruflin · 2021-03-24T10:25:13Z

internal/install/static_kibana_config_yml.go

@@ -17,7 +17,8 @@ xpack.fleet.enabled: true
 xpack.fleet.registryUrl: "http://package-registry:8080"
 xpack.fleet.agents.enabled: true
 xpack.fleet.agents.elasticsearch.host: "http://elasticsearch:9200"
-xpack.fleet.agents.kibana.host: "http://kibana:5601"
+xpack.fleet.agents.fleetServerEnabled: true
+xpack.fleet.agents.kibana.host: "http://localhost:8220"


This will change as part of elastic/beats#24713 and elastic/kibana#94364

Thank you for reviewing this. What is the recommendation though? Should I wait until these PRs are merged?

I would like to switch elastic-package over to fleet-server as soon as possible to get early testing. At the same time, I do not want to impact the integrations team with potential bugs / changes. The above changes can only land if a dependency from endpoint is also merged at the same time. I think an easy trick on our side will be to just have both config options in place already, so things will keep working. @nchaulet is this assumption correct?

Yes, that's correct: we can have both options as soon as my PR for fleet server hosts is merged (hopefully soon :) ) elastic/kibana#94364
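
A sketch of what keeping both options side by side in kibana.config.yml might look like. The `xpack.fleet.agents.fleet_server.hosts` key is an assumption about the setting that elastic/kibana#94364 introduces, and both URLs are illustrative; check the merged PR before relying on either.

```yaml
# Sketch only, not the committed configuration.
xpack.fleet.agents.enabled: true
xpack.fleet.agents.elasticsearch.host: "http://elasticsearch:9200"
# Existing option, kept so that builds which still read it keep working.
xpack.fleet.agents.kibana.host: "http://kibana:5601"
# Assumed name of the new option (see elastic/kibana#94364); the URL is illustrative.
xpack.fleet.agents.fleet_server.hosts: ["http://fleet-server:8220"]
```

With both keys present, a Kibana build that only understands one of them should still find the value it needs, which seems to be the point of the trick described above.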

mtojek · 2021-03-25T14:04:33Z

Status update:

I had a 1-1 with @ruflin regarding the configuration. I learnt much about the Fleet Server package, which seems to be the missing part for the system test runner to handle. I will adjust its implementation.

I will also use the environment variables specified in https://github.com/elastic/beats/blob/master/x-pack/elastic-agent/pkg/agent/cmd/container.go#L61

ruflin · 2021-03-29T08:32:55Z

internal/install/static_snapshot_yml.go

-    - "FLEET_SETUP=1"
-    - "FLEET_URL=http://kibana:5601"
-    - "KIBANA_HOST=http://kibana:5601"
+    - "FLEET_URL=http://fleet-server:8220"


I wonder how in this scenario the elastic-agent will get the right enrollment token. We might still have to read it from Kibana.

Would prefer the usage of https here as well. But you will still need the FLEET_INSECURE=1 so it sets ssl.verification_mode: none.

mtojek · 2021-03-29T11:21:33Z

/test

mtojek · 2021-03-29T12:49:45Z

Status update:

I increased the interval between healthchecks and it skips the problematic gap when the Fleet Server is not available, but the flakiness still persists.
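
For reference, a longer gap between healthchecks can be expressed in the compose file roughly as follows. This is a minimal sketch: `interval`, `timeout` and `retries` are standard Compose healthcheck keys, but the values are illustrative assumptions, not the ones committed in this PR.

```yaml
# Sketch only: the numbers below are hypothetical, not taken from this PR.
fleet-server:
  image: ${ELASTIC_AGENT_IMAGE_REF}
  healthcheck:
    # Probe proposed in this PR: succeed once Fleet Server reports HEALTHY.
    test: "curl -f http://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"
    interval: 10s  # hypothetical: wait longer between probes
    timeout: 5s    # hypothetical: give up on a single probe after 5 seconds
    retries: 12    # hypothetical: tolerate a slow Fleet Server bootstrap
```

A longer interval only hides the window in which Fleet Server is unavailable; it does not address the remaining flakiness.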

It seems that you can't assign the custom policy to the agent. There is no error in logs, but the policy revision is null. We're using this call to verify if the policy revision has been assigned.

EDIT:

I also spotted that the Fleet Server reports errors (status offline) in Kibana:

14:43:36.582 elastic_agent [elastic_agent][error] Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "http://fleet-server:8220/api/fleet/agents/3853ba23-2f6d-4a5f-a7e3-cd9acebe7c9a/checkin?": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
14:50:15.741 elastic_agent [elastic_agent][error] Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "http://fleet-server:8220/api/fleet/agents/3853ba23-2f6d-4a5f-a7e3-cd9acebe7c9a/checkin?": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

mtojek · 2021-03-29T13:21:47Z

/test

mtojek · 2021-03-29T14:57:37Z

/test

mtojek · 2021-03-31T07:37:04Z

jenkins run the tests please

mtojek · 2021-03-31T07:59:43Z

jenkins run the tests please

mtojek · 2021-03-31T08:23:41Z

Passed: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-279/32/pipeline
Failed: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-279/31/pipeline/

I suspect that there is some flakiness around, maybe related to healthchecks.

mtojek · 2021-04-06T14:27:53Z

It looks like the PR is ready for review: no issues spotted, and the CI reported a green status. Would you mind looking at it one more time to check that I haven't missed anything?

ruflin

I did a quick test on 7.13 and all works as expected. I must confess the setup took a bit longer than I expected, but this is due much more to the setup on the fleet-server / Fleet end than to this scripting, and we need to make improvements there.

blakerouse · 2021-04-06T18:27:43Z

internal/install/static_snapshot_yml.go

+    - "FLEET_SERVER_INSECURE_HTTP=1"
+    - "KIBANA_FLEET_SETUP=1"
+    - "KIBANA_FLEET_HOST=http://kibana:5601"
+    - "FLEET_SERVER_HOST=0.0.0.0"


You should not need this anymore. By default Elastic Agent will start Fleet Server with it bound to 0.0.0.0.

~~Something doesn't work (connectivity issue), I will try to revert this one.~~ ~~Seems to be correct.~~

Unfortunately this one is also required, otherwise the fleet server is not reachable anymore. Maybe something hasn't been backported here?

blakerouse · 2021-04-06T18:28:10Z

internal/install/static_snapshot_yml.go

+    hostname: docker-fleet-server
+    environment:
+    - "FLEET_SERVER_ENABLE=1"
+    - "FLEET_SERVER_INSECURE_HTTP=1"


I would rather see you run it without this flag. Why run it insecurely?

For debugging purposes we can sniff network traffic and see requests/responses. It's not a production setup.

blakerouse · 2021-04-06T18:28:38Z

internal/install/static_snapshot_yml.go

    image: ${ELASTIC_AGENT_IMAGE_REF}
    depends_on:
      elasticsearch:
        condition: service_healthy
      kibana:
        condition: service_healthy
+    healthcheck:
+      test: "curl -f http://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"


Can add --insecure to the curl command and change to https if you remove the FLEET_SERVER_INSECURE_HTTP below.
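
A sketch of that suggestion, assuming FLEET_SERVER_INSECURE_HTTP is removed so that Fleet Server serves HTTPS with its self-signed certificate. Only the scheme and the `--insecure` flag differ from the healthcheck in the diff; nothing here was tested as part of this PR.

```yaml
# Sketch only: assumes Fleet Server listens on HTTPS with a self-signed certificate.
healthcheck:
  # --insecure makes curl skip certificate verification, which a self-signed
  # certificate would otherwise fail.
  test: "curl --insecure -f https://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"
```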

blakerouse · 2021-04-06T18:30:00Z

internal/install/static_snapshot_yml.go

-    - "FLEET_SETUP=1"
-    - "FLEET_URL=http://kibana:5601"
-    - "KIBANA_HOST=http://kibana:5601"
+    - "FLEET_URL=http://fleet-server:8220"


Would prefer the usage of https here as well. But you will still need the FLEET_INSECURE=1 so it sets ssl.verification_mode: none.
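
In compose terms, the suggestion amounts to something like the fragment below. It is a sketch that shows only the two variables named in this comment; any other variables the agent container needs are left out, and the HTTPS endpoint assumes Fleet Server is no longer started with FLEET_SERVER_INSECURE_HTTP.

```yaml
# Sketch only: other environment variables of the agent service are omitted.
environment:
# Talk to Fleet Server over HTTPS instead of plain HTTP.
- "FLEET_URL=https://fleet-server:8220"
# Still needed so the agent sets ssl.verification_mode: none and accepts
# the self-signed certificate.
- "FLEET_INSECURE=1"
```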

blakerouse · 2021-04-06T18:31:11Z

internal/install/static_snapshot_yml.go

+    image: ${ELASTIC_AGENT_IMAGE_REF}
+    depends_on:
+      fleet-server:
+        condition: service_healthy
    healthcheck:
      test: "sh -c 'grep \"Agent is starting\" -r . --include=elastic-agent-json.log'"


The status command landed, so it would be better to run that instead of this type of check.

test: "./elastic-agent status"

Should be enough, as it returns exit code 0 when the agent is healthy.

~~Fixed~~

Unfortunately it fails with:

bash-4.2$ ./elastic-agent status
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"

I opened an issue for this: elastic/beats#24956

ruflin · 2021-04-07T08:52:59Z

@mtojek I would not block this PR on all the changes. We have a working version which can be used and we can always iterate on top of it. I think it is more important to start using fleet-server for all the testing instead of having all the perfect params.

Enable Fleet Server

ae3f99b

mtojek self-assigned this Mar 8, 2021

mtojek and others added 3 commits March 9, 2021 09:35

Add FLEET_SERVER_INSECURE_HTTP

3054a28

Merge branch 'master' into 278-support-fleet-server

95008aa

Merge branch 'master' into 278-support-fleet-server

abfc3c2

Merge branch 'master' into 278-support-fleet-server

8474486

This was referenced Mar 9, 2021

Elastic Agent: expecting Dict and received *transpiler.Key for '0' elastic/beats#24453

Closed

Elastic-Agent: failed: could not decode the response, raw response: no matching action elastic/beats#24467

Closed

mtojek and others added 4 commits March 10, 2021 18:46

Merge branch 'master' into 278-support-fleet-server

3d38884

Merge branch 'master' into 278-support-fleet-server

1010dba

Fix: connect to the Fleet Server

9e82f14

Merge branch 'master' into 278-support-fleet-server

909673e

mtojek mentioned this pull request Mar 23, 2021

[Fleet] Agents can't pick up the policy? elastic/kibana#95179

Closed

mtojek commented Mar 23, 2021

ruflin reviewed Mar 24, 2021

Merge branch 'master' into 278-support-fleet-server

d6e4f8e

ruflin reviewed Mar 24, 2021

mtojek and others added 4 commits March 25, 2021 09:15

More vars

b1bd3f7

Merge branch 'master' into 278-support-fleet-server

9cf2ede

Use vars defined in beats/container.go

5b2311e

WIP

3e2b85f

mtojek and others added 4 commits March 26, 2021 12:41

Merge branch 'master' into 278-support-fleet-server

c1fa538

Revert

e069db0

Try with two agent instances

f1d5135

Clean variables

5088df3

ruflin reviewed Mar 29, 2021

ruflin reviewed Mar 29, 2021

Try: increase healthcheck interval

69ecf9b

mtojek mentioned this pull request Mar 29, 2021

[Fleet] HandlerUnknown type: UNKNOWN (original type: INTERNAL_POLICY_REASSIGN)' received elastic/beats#24725

Closed

Merge branch 'master' into 278-support-fleet-server

fc1037e

Dump fleet-server logs

58c47e0

mtojek added 2 commits April 6, 2021 15:32

Merge branch 'master' into 278-support-fleet-server

a62148b

Fix: bad merge

c46a537

mtojek marked this pull request as ready for review April 6, 2021 14:26

mtojek requested review from ruflin, blakerouse, nchaulet and ycombinator April 6, 2021 14:26

ruflin approved these changes Apr 6, 2021

blakerouse reviewed Apr 6, 2021

mtojek and others added 5 commits April 7, 2021 09:38

Merge branch 'master' into 278-support-fleet-server

649e2f1

Latest fixes

50ece51

Revert FLEET_SERVER_HOST

fc412cb

Fix

9ee732c

FLEET_SERVER_HOST is required

0f53750

mtojek merged commit f171846 into elastic:master Apr 7, 2021
