Enable Fleet Server #279

Conversation
Right now I can't enroll the agent to any policy (using 7.13.0):
Keep in mind that I'd like to support 7.12.0 if possible (I expect new environment variables to be ignored).
💚 Build Succeeded
@mtojek I was able to have it running on 8.0 by adding
@nchaulet I added the suggested env and now I'm getting this error from Agent:
Package revision in the agent's structure is "null".
- "FLEET_INSECURE=1" | ||
- "FLEET_SERVER_ENABLE=1" |
@blakerouse @ruflin @nchaulet I watched the observability demo session and @blakerouse's presentation (good job!) about running the agent with Fleet Server in a container, and I'm confused about the available configuration options. Recently I removed FLEET_URL and KIBANA_HOST from the global envs, but I saw @blakerouse used them. Unfortunately, with the current blockers it's hard to determine the correct set.

Could I ask you to review both kibana.config.yml and snapshot.yml and recommend the best configuration for 7.13?
@mtojek FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue. As for KIBANA_HOST, that is required if you want the container to perform the Kibana setup for Fleet, which is required when running with FLEET_SERVER_ENABLED, unless you run that setup from some hook on the Kibana container instead.
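To make the combination concrete, here is a minimal sketch of the agent service environment under discussion (a sketch only: the values are placeholders, not the final configuration from this PR):

environment:
  - "FLEET_SERVER_ENABLE=1"               # run Fleet Server inside this agent container
  - "KIBANA_HOST=http://kibana:5601"      # lets the container perform the Kibana Fleet setup
  - "FLEET_URL=http://fleet-server:8220"  # still required for now, per the issue mentioned above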
FLEET_URL should not be required, but at the moment there is an issue that requires it. I am working to solve this issue.
Would you mind linking this issue for tracking purposes?
As for KIBANA_HOST, that is required if you want the container to perform the Kibana setup for Fleet, which is required when running with FLEET_SERVER_ENABLED, unless you run that setup from some hook on the Kibana container instead.
In this setup we have a long-running Elastic cluster (Elasticsearch, Kibana, Package Registry) and two agents (so far), outlined in the sketch after this list:
- A separate agent container used for system tests (reassigning policies).
- An agent container in the Kubernetes cluster.

This gives us two agent containers, each with its own Fleet Server inside. Can Kibana handle that without any problems?
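A rough Compose-style outline of that topology (service names here are illustrative, and the Kubernetes-side agent is drawn as a plain service purely for the sake of the sketch):

services:
  elasticsearch: ...       # long-running stack
  kibana: ...
  package-registry: ...
  test-agent: ...          # agent + Fleet Server, used for system tests
  k8s-agent: ...           # agent + Fleet Server, running in the Kubernetes cluster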
Two Fleet Servers should just work. If not, please let us know.
@@ -17,7 +17,8 @@ xpack.fleet.enabled: true
 xpack.fleet.registryUrl: "http://package-registry:8080"
 xpack.fleet.agents.enabled: true
 xpack.fleet.agents.elasticsearch.host: "http://elasticsearch:9200"
-xpack.fleet.agents.kibana.host: "http://kibana:5601"
+xpack.fleet.agents.fleetServerEnabled: true
+xpack.fleet.agents.kibana.host: "http://localhost:8220"
This will change as part of elastic/beats#24713 and elastic/kibana#94364
Thank you for reviewing this. What is the recommendation though? Should I wait until these PRs are merged?
I would like to switch elastic-package over to fleet-server as soon as possible to get early testing. At the same time, I do not want to impact the integrations team with potential bugs / changes. The above changes can only land if a dependency from Endpoint is also merged at the same time. I think an easy trick on our side will be to just have both config options in already, so things will keep working. @nchaulet is this assumption correct?
Yes, that's correct. We can have both options as soon as my PR for Fleet Server hosts is merged (hopefully soon :)): elastic/kibana#94364
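For illustration, "both options" would mean temporarily keeping something like the following in kibana.config.yml (the name of the new Fleet Server hosts setting is assumed from elastic/kibana#94364 and may differ):

xpack.fleet.agents.kibana.host: "http://kibana:5601"
# assumed new setting introduced by elastic/kibana#94364:
xpack.fleet.agents.fleet_server.hosts: ["http://fleet-server:8220"]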
Status update: I had a 1:1 with @ruflin regarding the configuration. I learned a lot about the Fleet Server package, which seems to be the missing piece for the system test runner to handle. I will adjust its implementation. I will also use the environment variables specified in https://github.com/elastic/beats/blob/master/x-pack/elastic-agent/pkg/agent/cmd/container.go#L61
- "FLEET_SETUP=1" | ||
- "FLEET_URL=http://kibana:5601" | ||
- "KIBANA_HOST=http://kibana:5601" | ||
- "FLEET_URL=http://fleet-server:8220" |
I wonder how in this scenario the elastic-agent will get the right enrollment token. We might still have to read it from Kibana.
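For reference, reading it from Kibana could look roughly like this (a hypothetical sketch: the enrollment API keys endpoint, the response fields, and the elastic:changeme credentials are assumptions about this era of the stack, not something this PR prescribes):

# list enrollment API keys and pick the first one
curl -s -u elastic:changeme "http://kibana:5601/api/fleet/enrollment-api-keys" | jq -r '.list[0].id'
# fetch the decoded key for a given id
curl -s -u elastic:changeme "http://kibana:5601/api/fleet/enrollment-api-keys/<id>" | jq -r '.item.api_key'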
Would prefer the usage of https here as well. But you will still need the FLEET_INSECURE=1 so it sets ssl.verification_mode: none.
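In other words, the suggestion amounts to something like this (a sketch using the service name from this compose file):

- "FLEET_URL=https://fleet-server:8220"
- "FLEET_INSECURE=1"   # makes the agent set ssl.verification_mode: none for the Fleet connection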
/test
Status update: I increased the interval between healthchecks and it now skips the problematic gap when the Fleet Server is not yet available, but the flakiness still persists. It seems that you can't assign the custom policy to the agent. There is no error in the logs, but the policy revision is null. We're using this call to verify that the policy revision has been assigned. EDIT: I also spotted that the Fleet Server reports errors (status offline) in Kibana:
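For context, the verification boils down to a check along these lines (a hypothetical sketch: the agents endpoint, the response fields, and the credentials are assumptions about the Fleet API of this era):

# list agents and inspect the assigned policy revision
curl -s -u elastic:changeme "http://kibana:5601/api/fleet/agents" | jq '.list[] | {policy_id, policy_revision}'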
/test

/test
jenkins run the tests please

jenkins run the tests please
Passed: https://beats-ci.elastic.co/blue/organizations/jenkins/Ingest-manager%2Felastic-package/detail/PR-279/32/pipeline

I suspect that there is some flakiness here, maybe related to healthchecks.
It looks like the PR is ready for review: no issues spotted, and the CI reported a green status. Would you mind looking at it one more time to check whether I've missed anything?
I did a quick test on 7.13 and everything works as expected. I must confess the setup took a bit longer than I expected, but that is due much more to the setup on the fleet-server / Fleet end than to this scripting, and we need to make improvements there.
- "FLEET_SERVER_INSECURE_HTTP=1" | ||
- "KIBANA_FLEET_SETUP=1" | ||
- "KIBANA_FLEET_HOST=http://kibana:5601" | ||
- "FLEET_SERVER_HOST=0.0.0.0" |
You should not need this anymore. By default, Elastic Agent will start Fleet Server bound to 0.0.0.0.
Fixed
Something doesn't work (a connectivity issue); I will try to revert this one. It seems to be correct after all.
Unfortunately, this one is also required; otherwise the Fleet Server is not reachable anymore. Maybe something hasn't been backported here?
hostname: docker-fleet-server
environment:
  - "FLEET_SERVER_ENABLE=1"
  - "FLEET_SERVER_INSECURE_HTTP=1"
I would rather see you run it without this flag. Why run it insecurely?
For debugging purposes we can sniff network traffic and see requests/responses. It's not a production setup.
image: ${ELASTIC_AGENT_IMAGE_REF}
depends_on:
  elasticsearch:
    condition: service_healthy
  kibana:
    condition: service_healthy
healthcheck:
  test: "curl -f http://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"
Can add --insecure to the curl command and change to https if you remove the FLEET_SERVER_INSECURE_HTTP below.
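Applied to the healthcheck above, that suggestion would read roughly like this (a sketch under the reviewer's assumptions):

healthcheck:
  test: "curl -f --insecure https://127.0.0.1:8220/api/status | grep HEALTHY 2>&1 >/dev/null"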
- "FLEET_SETUP=1" | ||
- "FLEET_URL=http://kibana:5601" | ||
- "KIBANA_HOST=http://kibana:5601" | ||
- "FLEET_URL=http://fleet-server:8220" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would prefer the usage of https
here as well. But you will still need the FLEET_INSECURE=1
so it sets ssl.verification_mode: none
.
image: ${ELASTIC_AGENT_IMAGE_REF}
depends_on:
  fleet-server:
    condition: service_healthy
healthcheck:
  test: "sh -c 'grep \"Agent is starting\" -r . --include=elastic-agent-json.log'"
The status command landed, so it would be better to run that instead of this type of check:
test: "./elastic-agent status"
It should be enough, as it returns exit code 0 when the agent is healthy.
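As a full healthcheck block, that would look like this (interval and retries are illustrative values, not part of the suggestion):

healthcheck:
  test: "./elastic-agent status"   # exit code 0 when the agent is healthy
  interval: 10s
  retries: 12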
Fixed
Unfortunately it fails with:
bash-4.2$ ./elastic-agent status
Error: failed to communicate with Elastic Agent daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
I opened an issue for this: elastic/beats#24956
@mtojek I would not block this PR on all these changes. We have a working version which can be used, and we can always iterate on top of it. I think it is more important to start using fleet-server for all the testing than to have all the perfect params.
This PR adjusts the Elastic stack created by elastic-package to support Fleet Server.
Issue: #278