Enable Fleet Server #279

Merged
merged 30 commits into from
Apr 7, 2021

Conversation

Contributor

mtojek commented Mar 8, 2021

This PR adjusts the Elastic stack created by elastic-package to support Fleet Server.

Issue: #278

@mtojek mtojek self-assigned this Mar 8, 2021
Contributor Author

mtojek commented Mar 8, 2021

Right now I can't enroll the agent to any policy (using 7.13.0):

➜  elastic-package git:(278-support-fleet-server) ✗ docker logs 47b5c520a12f -f
Performing setup of Fleet in Kibana
The Elastic Agent is currently in BETA and should not be used in production

2021-03-08T14:53:43.418Z	INFO	application/enroll_cmd.go:287	Generating self-signed certificate for Fleet Server
2021-03-08T14:53:43.787Z	INFO	application/enroll_cmd.go:435	Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.
2021-03-08T14:53:44.793Z	INFO	application/enroll_cmd.go:513	waiting for Elastic Agent to start Fleet Server
2021-03-08T14:53:45.795Z	INFO	application/enroll_cmd.go:531	Fleet Server - Starting
2021-03-08T14:53:47.428Z	WARN	application/enroll_cmd.go:331	Remote server is not ready to accept connections, will retry in a moment.
2021-03-08T14:54:47.360Z	INFO	application/enroll_cmd.go:338	Retrying to enroll...
2021-03-08T14:54:47.363Z	WARN	application/enroll_cmd.go:331	Remote server is not ready to accept connections, will retry in a moment.

Keep in mind that I'd like to support 7.12.0 if possible (I expect new environment variables to be ignored).

Collaborator

elasticmachine commented Mar 8, 2021

💚 Build Succeeded

Build stats

  • Build Cause: Pull request #279 updated

  • Start Time: 2021-04-07T08:24:17.760+0000

  • Duration: 23 min 45 sec

  • Commit: 0f53750

Test stats 🧪

Test Results
Failed 0
Passed 316
Skipped 1
Total 317

Member

nchaulet commented Mar 9, 2021

@mtojek I was able to get it running on 8.0 by adding - "FLEET_SERVER_INSECURE_HTTP=1"

Contributor Author

mtojek commented Mar 9, 2021

@nchaulet I added the suggested env and now I'm getting this error from Agent:

2021-03-09T18:37:09.755Z	ERROR	application/fleet_gateway.go:185	failed to dispatch actions, error: fail to generate program configuration: expecting Dict and received *transpiler.Key for '0'
2021-03-09T18:37:09.755Z	WARN	status/reporter.go:233	Elastic Agent status changed to: 'degraded'
2021-03-09T18:37:09.755Z	INFO	status/reporter.go:233	Elastic Agent status changed to: 'online'

Package revision in the agent's structure is "null".

- "FLEET_INSECURE=1"
- "FLEET_SERVER_ENABLE=1"
Contributor Author

@blakerouse @ruflin @nchaulet I watched the observability demo session and @blakerouse's presentation (good job!) about running the agent with Fleet Server in a container, and I'm confused about the available configuration options. I recently removed FLEET_URL and KIBANA_HOST from the global envs, but I saw that @blakerouse used them. Unfortunately, with the current blockers it's hard to determine the correct set.

Could I ask you to review both kibana.config.yml and snapshot.yml and recommend the best configuration for 7.13?

@mtojek FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue. As for KIBANA_HOST, it is required if you want the container to perform the Fleet setup in Kibana. That setup is needed when running with FLEET_SERVER_ENABLE, unless you run it from some hook on the Kibana container instead.

Contributor Author

FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue.

Would you mind linking this issue for tracking purposes?

As for KIBANA_HOST, it is required if you want the container to perform the Fleet setup in Kibana. That setup is needed when running with FLEET_SERVER_ENABLE, unless you run it from some hook on the Kibana container instead.

In this setup we have a long-running Elastic cluster (Elasticsearch, Kibana, Package Registry) and two agents (so far):

  1. A separate agent container, which is used for system tests (reassigning policies).
  2. An agent container in the Kubernetes cluster.

This gives us two agent containers, each with its own Fleet Server inside. Can Kibana handle that without any problems?

Member

Two fleet-server instances should just work. If not, please let us know.

@@ -17,7 +17,8 @@ xpack.fleet.enabled: true
xpack.fleet.registryUrl: "http://package-registry:8080"
xpack.fleet.agents.enabled: true
xpack.fleet.agents.elasticsearch.host: "http://elasticsearch:9200"
xpack.fleet.agents.kibana.host: "http://kibana:5601"
xpack.fleet.agents.fleetServerEnabled: true
xpack.fleet.agents.kibana.host: "http://localhost:8220"
Member

This will change as part of elastic/beats#24713 and elastic/kibana#94364

Contributor Author

Thank you for reviewing this. What is the recommendation though? Should I wait until these PRs are merged?

Member

I would like to switch elastic-package over to fleet-server as soon as possible to get early testing. At the same time, I do not want to impact the integrations team with potential bugs or changes. The above changes can only land if a dependency from endpoint is merged at the same time. I think an easy trick on our side will be to just have both config options in place already, so things keep working. @nchaulet is this assumption correct?

Member

Yes, that's correct. We can have both options as soon as my PR for Fleet Server hosts is merged (hopefully soon :) ): elastic/kibana#94364

Contributor Author

mtojek commented Mar 25, 2021

Status update:

I had a 1-1 with @ruflin regarding the configuration. I learned a lot about the Fleet Server package, which seems to be the missing piece that the system test runner needs to handle. I will adjust its implementation.

I will also use the environment variables specified in: https://github.com/elastic/beats/blob/master/x-pack/elastic-agent/pkg/agent/cmd/container.go#L61

- "FLEET_SETUP=1"
- "FLEET_URL=http://kibana:5601"
- "KIBANA_HOST=http://kibana:5601"
- "FLEET_URL=http://fleet-server:8220"
Member

I wonder how in this scenario the elastic-agent will get the right enrollment token. We might still have to read it from Kibana.

Would prefer the usage of https here as well. But you will still need the FLEET_INSECURE=1 so it sets ssl.verification_mode: none.

Contributor Author

mtojek commented Mar 29, 2021

/test

Contributor Author

mtojek commented Mar 29, 2021

Status update:

I increased the interval between healthchecks and it skips the problematic gap when the Fleet Server is not available, but the flakiness still persists.

It seems that you can't assign the custom policy to the agent. There is no error in logs, but the policy revision is null. We're using this call to verify if the policy revision has been assigned.

Screenshot 2021-03-29 at 14 55 20
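The call linked in the comment above is not reproduced here. As a rough sketch of what such a check could look like, the helper below tests whether an agent document carries a numeric (non-null) policy revision. The endpoint path /api/fleet/agents/<id>, the policy_revision field name, the default URL, and the credentials are all assumptions for illustration, not taken from the PR.

```shell
# Succeeds when the JSON on stdin carries a numeric policy_revision,
# i.e. the field is present and not null.
has_policy_revision() {
  grep -Eq '"policy_revision"[[:space:]]*:[[:space:]]*[0-9]+'
}

# Fetch one agent from Kibana and test its policy revision.
# KIBANA_URL, KIBANA_USER, KIBANA_PASSWORD and the agent ID argument
# are placeholders (assumed, not from the PR).
agent_policy_assigned() {
  curl -sf -u "${KIBANA_USER:-elastic}:${KIBANA_PASSWORD:-changeme}" \
    "${KIBANA_URL:-http://127.0.0.1:5601}/api/fleet/agents/$1" | has_policy_revision
}
```

A test runner could poll agent_policy_assigned with the agent ID until it succeeds or a timeout expires; a revision that stays null would then surface as a timeout rather than a silent pass.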

EDIT:

I also spotted that the Fleet Server reports errors (status offline) in Kibana:

14:43:36.582
elastic_agent
[elastic_agent][error] Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "http://fleet-server:8220/api/fleet/agents/3853ba23-2f6d-4a5f-a7e3-cd9acebe7c9a/checkin?": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
14:50:15.741
elastic_agent
[elastic_agent][error] Could not communicate with Checking API will retry, error: fail to checkin to fleet: Post "http://fleet-server:8220/api/fleet/agents/3853ba23-2f6d-4a5f-a7e3-cd9acebe7c9a/checkin?": net/http: request canceled (Client.Timeout exceeded while awaiting headers)

Contributor Author

mtojek commented Mar 29, 2021

/test

Contributor Author

mtojek commented Mar 29, 2021

/test

Contributor Author

mtojek commented Mar 31, 2021

jenkins run the tests please

Contributor Author

mtojek commented Mar 31, 2021

jenkins run the tests please

Contributor Author

mtojek commented Mar 31, 2021

@mtojek mtojek marked this pull request as ready for review April 6, 2021 14:26
Contributor Author

mtojek commented Apr 6, 2021

It looks like the PR is ready for review: no issues spotted, and the CI reported a green status. Would you mind looking at it one more time to check whether I've missed anything?

Member

ruflin left a comment

I did a quick test on 7.13 and everything works as expected. I must confess the setup took a bit longer than I expected, but that is due much more to the setup on the fleet-server / Fleet end than to this scripting, and we need to make improvements there.

- "FLEET_SERVER_INSECURE_HTTP=1"
- "KIBANA_FLEET_SETUP=1"
- "KIBANA_FLEET_HOST=http://kibana:5601"
- "FLEET_SERVER_HOST=0.0.0.0"

You should not need this anymore. By default Elastic Agent will start Fleet Server with it bound to 0.0.0.0.

Contributor Author

Fixed

Contributor Author

mtojek Apr 7, 2021

Something doesn't work (connectivity issue), I will try to revert this one. Seems to be correct.

Unfortunately this one is also required, otherwise the fleet server is not reachable anymore. Maybe something hasn't been backported here?

hostname: docker-fleet-server
environment:
- "FLEET_SERVER_ENABLE=1"
- "FLEET_SERVER_INSECURE_HTTP=1"

I would rather see you run it without this flag. Why run it insecurely?

Contributor Author

For debugging purposes we can sniff network traffic and see requests/responses. It's not a production setup.

image: ${ELASTIC_AGENT_IMAGE_REF}
depends_on:
  elasticsearch:
    condition: service_healthy
  kibana:
    condition: service_healthy
healthcheck:
  test: "curl -f http://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"

You can add --insecure to the curl command and change it to https if you remove the FLEET_SERVER_INSECURE_HTTP below.
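A minimal sketch of the probe this suggestion describes, assuming Fleet Server serves /api/status over TLS on port 8220 with the self-signed certificate mentioned in the bootstrap logs. The function names, the FLEET_SERVER_URL variable, and its default are hypothetical.

```shell
# Succeeds when the status payload read from stdin contains HEALTHY.
# grep -q suppresses output, so no redirection is needed.
is_healthy() {
  grep -q 'HEALTHY'
}

# Probe Fleet Server over https. --insecure skips certificate
# verification, which a self-signed certificate would otherwise fail.
# FLEET_SERVER_URL is a placeholder (assumed, not from the PR).
check_fleet_server() {
  curl -sf --insecure "${FLEET_SERVER_URL:-https://127.0.0.1:8220}/api/status" | is_healthy
}
```

The exit status of the pipeline is that of grep, so the check fails both when curl returns nothing and when the reported status is anything other than HEALTHY.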

image: ${ELASTIC_AGENT_IMAGE_REF}
depends_on:
  fleet-server:
    condition: service_healthy
healthcheck:
  test: "sh -c 'grep \"Agent is starting\" -r . --include=elastic-agent-json.log'"

The status command landed, so it would be better to run that instead of this type of check.

test: "./elastic-agent status"

Should be enough, as it returns exit code 0 when the agent is healthy.
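Because the suggested check relies only on the exit code, it can also be wrapped in a small retry loop outside of compose, for instance in a test script. The helper below is a generic sketch; wait_for and its argument order are hypothetical and not part of elastic-package or Elastic Agent.

```shell
# wait_for ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it exits with status 0, at most ATTEMPTS times,
# sleeping DELAY seconds between tries. Returns 1 if CMD never succeeds.
# Note: POSIX sh has no local variables, so these names are global.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_for 30 2 ./elastic-agent status would try the status command up to 30 times, two seconds apart, and succeed as soon as it returns exit code 0.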

Contributor Author

mtojek Apr 7, 2021

Fixed

Unfortunately it fails with:

bash-4.2$ ./elastic-agent status
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"

I opened an issue for this: elastic/beats#24956

Member

ruflin commented Apr 7, 2021

@mtojek I would not block this PR on all the changes. We have a working version that can be used, and we can always iterate on top of it. I think it is more important to start using fleet-server for all the testing than to have all the params perfect.

@mtojek mtojek merged commit f171846 into elastic:master Apr 7, 2021