Agent v2 #458

Merged
merged 7 commits into from
Nov 9, 2022

Contributor

@olegsu olegsu commented Oct 20, 2022

What does this PR do?
Based on @eyalkraft's work. This will align cloudbeat with the new agent v2 architecture.

State: OK 👍


Changes

Elastic-Agent V2 has a few changes which impact us. I will try to list here all the changes that I found during this work.
The most official document can be found here

Filesystem

The file system structure has changed:

  • The logs directory moved to /usr/share/elastic-agent/data/elastic-agent-{AGENT_SHA}/logs/
  • The download directory no longer exists
  • The install directory no longer exists
  • All the binaries are in /usr/share/elastic-agent/data/elastic-agent-{AGENT_SHA}/components/
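
A minimal sketch of the new layout, for anyone scripting against it. The helper names are made up for illustration, and the hash in main is a placeholder rather than a real build hash; only the directory structure comes from the list above.

```go
package main

import (
	"fmt"
	"path"
)

// dataDir returns the versioned data directory under the agent root.
// Hypothetical helper: only the layout itself comes from the PR description.
func dataDir(root, sha string) string {
	return path.Join(root, "data", "elastic-agent-"+sha)
}

// componentsDir is where all component binaries live in V2.
func componentsDir(root, sha string) string {
	return path.Join(dataDir(root, sha), "components")
}

// logsDir is the new location of the logs directory.
func logsDir(root, sha string) string {
	return path.Join(dataDir(root, sha), "logs")
}

func main() {
	root := "/usr/share/elastic-agent"
	sha := "abc123" // placeholder, not a real build hash
	fmt.Println(componentsDir(root, sha))
	fmt.Println(logsDir(root, sha))
}
```

The path package is used instead of path/filepath because these are container paths and should keep forward slashes on any host.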
API

Cloudbeat no longer receives a list of inputs that each contain an array of streams. Instead, it receives a single stream at a time.
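
To make the shape change concrete, here is a rough sketch of my reading of it. The types Input, Stream and Unit are hypothetical stand-ins and not the agent's real protocol types; the point is only that a list of inputs with nested streams becomes one unit per stream.

```go
package main

import "fmt"

// Stream and Input are simplified, hypothetical stand-ins for policy data.
type Stream struct {
	ID      string
	Dataset string
}

type Input struct {
	Type    string
	Streams []Stream
}

// Unit models what a component would receive under the new API as described
// above: one input type together with a single stream.
type Unit struct {
	InputType string
	Stream    Stream
}

// splitIntoUnits flattens inputs with nested streams into one unit per stream.
func splitIntoUnits(inputs []Input) []Unit {
	var units []Unit
	for _, in := range inputs {
		for _, s := range in.Streams {
			units = append(units, Unit{InputType: in.Type, Stream: s})
		}
	}
	return units
}

func main() {
	inputs := []Input{{
		Type:    "cloudbeat/cis_k8s",
		Streams: []Stream{{ID: "findings", Dataset: "cloud_security_posture.findings"}},
	}}
	for _, u := range splitIntoUnits(inputs) {
		fmt.Printf("%s -> %s\n", u.InputType, u.Stream.Dataset)
	}
}
```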

Elastic-Agent Behavior

When Elastic-Agent is compiled locally, the cloudbeat binary is not included. To copy it in, we need to build both (https://github.com/elastic/security-team/blob/main/docs/cloud-security-posture-team/Onboarding/deploy-agent-cloudbeat-on-eks.mdx).
After the image is deployed and the assets are copied, the process still won't start (even if the integration is connected). This is due to component registration on startup in the Agent (#458 (comment)). A quick agent restart works around this.
Reference - #458 (comment)

Run locally
  1. Check out the feature-arch-v2 branch in both the Elastic-Agent and Beats repositories
  2. Build the agent image (from the agent dir): DEV=true SNAPSHOT=true PLATFORMS=linux/arm64 PACKAGES=docker mage package
  3. Build the cloudbeat binary (from the cloudbeat dir): DEV=true PLATFORMS=linux/arm64 SNAPSHOT=true mage -v package
  4. Set up the environment
    4.1 From the Kibana UI, add a new agent
    4.2 Download the agent manifests for Kubernetes
    4.3 Add the "Kubernetes Security Posture Management" integration to the policy
  5. Set up a local kind cluster and load the agent image into it
    5.1 Update the manifests with the new agent image
  6. Deploy
  7. Run ./scripts/remote_replace_cloudbeat.sh to copy cloudbeat and restart the agent

@mergify mergify bot assigned olegsu Oct 20, 2022

mergify bot commented Oct 20, 2022

This pull request does not have a backport label. Could you fix it @olegsu? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit
    NOTE: backport-skip has been added to this pull request.

@olegsu olegsu force-pushed the agent-v2 branch 2 times, most recently from 3120de9 to 49e8a62 Compare October 20, 2022 14:24
@fearful-symmetry

So, it looks like you're missing the config transformation callback, like what we have here: https://github.com/elastic/beats/blob/feature-arch-v2/x-pack/metricbeat/cmd/agent.go

Trying to figure out the cloudbeat setup, will see if I can push to this PR.

@fearful-symmetry

So, a few things:

As mentioned earlier, you'll need a custom config transform like this, which should go in cmd/root.go:

func cloudbeatCfg(rawIn *proto.UnitExpectedConfig, agentInfo *client.AgentInfo) ([]*reload.ConfigWithMeta, error) {
	modules, err := management.CreateInputsFromStreams(rawIn, "logs", agentInfo)
	if err != nil {
		return nil, fmt.Errorf("error creating input list from raw expected config: %w", err)
	}

	// format for the reloadable list needed by the cm.Reload() method
	configList, err := management.CreateReloadConfigFromInputs(modules)
	if err != nil {
		return nil, fmt.Errorf("error creating reloader config: %w", err)
	}

	return configList, nil
}

func init() {
	management.ConfigTransform.SetTransform(cloudbeatCfg)
}

Theoretically, that should be the only missing piece to get this running.
I would push to this PR, but I don't normally use k8s or anything so I'm struggling to test this.

since it seems that the filesystem structure changed, I am not sure where to kubectl cp... the new binary)

Under the root elastic-agent directory, it's data/elastic-agent-[HASH]/components

I imagine the issue might be fixed by properly installing the binary. If we're still running into issues, can you post the logs from elastic-agent during the run?

@olegsu
Contributor Author

olegsu commented Oct 24, 2022

Thank you @fearful-symmetry,

I have tried to add the configTransform part you mentioned with no success.

The Elastic Agent logs say input not supported when I add the KSPM integration.

{"log.level":"info","@timestamp":"2022-10-24T12:18:35.534Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":312},"message":"New component created","component":{"id":"cloudbeat/cis_k8s-default","state":"Failed","message":"input not supported","inputs":[{"id":"cloudbeat/cis_k8s-default-a9c55992-8d6a-4fba-a8d6-fce9530739e6","state":"Failed","message":"input not supported"}],"output":{"id":"cloudbeat/cis_k8s-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}
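
For picking the failed component out of lines like the one above, a small sketch. The struct covers only the fields visible in this log line, and failedComponent is an illustrative helper, not something that exists in elastic-agent.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// unitState holds the per-input fields seen in the log line above.
type unitState struct {
	ID      string `json:"id"`
	State   string `json:"state"`
	Message string `json:"message"`
}

// component mirrors the "component" object; other log fields are ignored.
type component struct {
	ID      string      `json:"id"`
	State   string      `json:"state"`
	Message string      `json:"message"`
	Inputs  []unitState `json:"inputs"`
}

type logEntry struct {
	Message   string    `json:"message"`
	Component component `json:"component"`
}

// failedComponent decodes one JSON log line and reports the component ID,
// its message, and whether its state is "Failed".
func failedComponent(line string) (string, string, bool, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return "", "", false, err
	}
	c := e.Component
	return c.ID, c.Message, c.State == "Failed", nil
}

func main() {
	// Trimmed copy of the log line above.
	sample := `{"log.level":"info","message":"New component created","component":{"id":"cloudbeat/cis_k8s-default","state":"Failed","message":"input not supported","inputs":[{"id":"cloudbeat/cis_k8s-default-a9c55992-8d6a-4fba-a8d6-fce9530739e6","state":"Failed","message":"input not supported"}]},"ecs.version":"1.6.0"}`
	id, reason, failed, err := failedComponent(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(id, failed, reason)
}
```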

Elastic-Agent was compiled on my machine. This is the output of elastic-agent version (running in a local kind cluster):

root@kind-control-plane:/usr/share/elastic-agent# elastic-agent version
Binary: 8.6.0-SNAPSHOT (build: ec83c2c7aa6bad91c850e2436ddffed4b7f21420 at 2022-10-24 11:53:07 +0000 UTC)
Daemon: <failed to communicate>
could not get version. failed to communicate with running daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
Use --binary-only flag to skip trying to retrieve version from running daemon

@olegsu olegsu force-pushed the agent-v2 branch 2 times, most recently from 31e07d6 to 1301328 Compare October 24, 2022 12:34
@fearful-symmetry

"state":"Failed","message":"input not supported"

That's odd, and I think it's an issue with how the policy is getting handled by elastic-agent.

Looking at the KSPM integration, it returns an input value of cloudbeat/cis_k8s

The specfile for cloudbeat is here: https://github.com/elastic/elastic-agent/blob/feature-arch-v2/specs/cloudbeat.spec.yml

And I don't see any aliases listed for the inputs, which most other beats seem to have. @michalpristas is there a reason why cloudbeat doesn't have any alternate inputs listed in the specfile I linked to?

@fearful-symmetry

could not get version. failed to communicate with running daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
Use --binary-only flag to skip trying to retrieve version from running daemon

I assume the issue here is that elastic-agent isn't actually running? That or there's some container-specific issue with how we're reaching out to the RPC socket.

@fearful-symmetry

Alright, because everyone dealing with this is on opposite sides of the planet, I've gone ahead and put in a PR to fix what I think is the issue: elastic/elastic-agent#1596

Don't have any cloudbeat experience, so input from other folks would probably be appreciated with that PR.

@cmacknz
Member

cmacknz commented Oct 24, 2022

Thanks, yes if the agent is complaining it doesn't recognize the input type it's because none of the spec files known to the agent declare support for that input type. In this case the cloudbeat spec file needs to be updated in the agent's v2 branch.
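
As a rough mental model of that lookup (this is not the agent's actual loader, and the spec type here is a hypothetical simplification that ignores aliases and platforms): an input type resolves only if some spec lists it.

```go
package main

import (
	"errors"
	"fmt"
)

// spec is a simplified, hypothetical model of a component spec file:
// a binary plus the input types it declares.
type spec struct {
	Binary string
	Inputs []string
}

// resolve returns the binary whose spec declares inputType, or an error
// when no known spec declares it.
func resolve(specs []spec, inputType string) (string, error) {
	for _, s := range specs {
		for _, in := range s.Inputs {
			if in == inputType {
				return s.Binary, nil
			}
		}
	}
	return "", errors.New("input not supported")
}

func main() {
	specs := []spec{{Binary: "cloudbeat", Inputs: []string{"cloudbeat"}}}

	// The policy asks for cloudbeat/cis_k8s, which this spec does not list.
	_, err := resolve(specs, "cloudbeat/cis_k8s")
	fmt.Println(err)

	// Once the spec declares the input type, the lookup succeeds.
	specs[0].Inputs = append(specs[0].Inputs, "cloudbeat/cis_k8s")
	bin, err := resolve(specs, "cloudbeat/cis_k8s")
	fmt.Println(bin, err)
}
```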

@olegsu
Contributor Author

olegsu commented Oct 25, 2022

I assume the issue here is that elastic-agent isn't actually running? That or there's some container-specific issue with how we're reaching out to the RPC socket.

Elastic Agent is running in the kind cluster.

root@kind-control-plane:/usr/share/elastic-agent# elastic-agent status
State: DEGRADED
Message: 1 or more components/units in a failed state
Components:
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported

Alright, because everyone dealing with this is on opposite sides of the planet, I've gone ahead and put in a PR to fix what I think is the issue: elastic/elastic-agent#1596

Don't have any cloudbeat experience, so input from other folks would probably be appreciated with that PR.

Thank you, I will try to recompile and run again, in both Kind and EKS.

@olegsu
Contributor Author

olegsu commented Oct 25, 2022

I pulled the latest changes, compiled again, and tried both the EKS cluster and the kind cluster.
Both tests show the same output for elastic-agent status:

root@ip-172-31-8-150:/usr/share/elastic-agent# elastic-agent status
State: DEGRADED
Message: 1 or more components/units in a failed state
Components:
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
  *   (FAILED)
      input not supported
root@ip-172-31-8-150:/usr/share/elastic-agent# elastic-agent version
Binary: 8.6.0-SNAPSHOT (build: 96e071e16f49194ab1c6a01a7e88707986afbad2 at 2022-10-25 09:39:46 +0000 UTC)
Daemon: <failed to communicate>
could not get version. failed to communicate with running daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"
Use --binary-only flag to skip trying to retrieve version from running daemon

And the logs:

{"log.level":"info","@timestamp":"2022-10-25T10:46:41.672Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":322},"message":"New component created","component":{"id":"cloudbeat/cis_eks-default","state":"Failed","message":"input not supported","inputs":[{"id":"cloudbeat/cis_eks-default-fe3be4a3-32af-40d1-ba65-b1a1e649c7a1","state":"Failed","message":"input not supported"}],"output":{"id":"cloudbeat/cis_eks-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-10-25T10:46:41.975Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":322},"message":"New component created","component":{"id":"logfile-default","state":"Failed","message":"input not supported","inputs":[{"id":"logfile-default-logfile-system-449c9421-29fe-4a93-8f29-2497d7e7aef5","state":"Failed","message":"input not supported"}],"output":{"id":"logfile-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-10-25T10:46:42.168Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":322},"message":"New component created","component":{"id":"winlog-default","state":"Failed","message":"input not supported","inputs":[{"id":"winlog-default-winlog-system-449c9421-29fe-4a93-8f29-2497d7e7aef5","state":"Failed","message":"input not supported"}],"output":{"id":"winlog-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}

Could it be that I am compiling it the wrong way? (I followed the README.)

@oren-zohar
Collaborator

It seems like we are appending default to our data stream name here:

return indexPrefix + "-" + namespace

so our output is not exactly cloudbeat/cis_k8s but cloudbeat/cis_k8s-default. I'm not sure it's related, though, since the fix was about the configuration (which is the input). Do we also need to configure the output explicitly? @cmacknz @fearful-symmetry

@olegsu I think in any case it's worth testing cloudbeat without the default suffix.

@fearful-symmetry

Yah, this isn't an issue with cloudbeat I don't think, I'm 90% sure this is something happening between the integration and elastic-agent.

This is really interesting:

{"log.level":"info","@timestamp":"2022-10-25T10:46:42.168Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":322},"message":"New component created","component":{"id":"winlog-default","state":"Failed","message":"input not supported","inputs":[{"id":"winlog-default-winlog-system-449c9421-29fe-4a93-8f29-2497d7e7aef5","state":"Failed","message":"input not supported"}],"output":{"id":"winlog-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}

That seems to indicate that the failure is in logfile-default, which has been around for a while and has probably been touched multiple times. Either that error is a misnomer, or there's a deeper error with the config.

@fearful-symmetry

@olegsu precisely what platform is this running on? What OS/arch/cloud environment/etc

@fearful-symmetry

Also @olegsu can I see the full integration config? In the fleet UI, if you go to the page for the agent config, there should be a button called Actions > View Config.

@olegsu
Contributor Author

olegsu commented Oct 26, 2022

@olegsu precisely what platform is this running on? What OS/arch/cloud environment/etc

I tried running it in two ways. In both cases the Elastic Stack is deployed in the cloud on an 8.6 snapshot version:

  1. MacOS, M1, local environment
  2. Linux, AMD, EKS cluster (two m5.large nodes)

Also @olegsu can I see the full integration config? In the fleet UI, if you go to the page for the agent config, there should be a button called Actions > View Config.

The configuration
id: ccd0ab20-5070-11ed-8602-11c37f297830
revision: 16
outputs:
  default:
    type: elasticsearch
    hosts:
      - REDACTED
output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - logs-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.auditbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.endpoint_security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-default
          privileges:
            - auto_configure
            - create_doc
    _elastic_agent_checks:
      cluster:
        - monitor
    e2150a54-25fd-49f6-8d95-0079bacd934d:
      indices:
        - names:
            - logs-system.auth-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.syslog-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.application-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.security-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.system-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.cpu-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.diskio-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.filesystem-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.fsstat-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.load-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.memory-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.network-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process.summary-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.socket_summary-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.uptime-default
          privileges:
            - auto_configure
            - create_doc
    a72d4c42-1bb0-4b94-9985-6ed6d5b106a4:
      indices:
        - names:
            - logs-cloud_security_posture.findings-default
          privileges:
            - auto_configure
            - create_doc
agent:
  download:
    source_uri: 'https://artifacts.elastic.co/downloads/'
  monitoring:
    enabled: true
    use_output: default
    namespace: default
    logs: true
    metrics: true
inputs:
  - id: logfile-system-e2150a54-25fd-49f6-8d95-0079bacd934d
    name: system-1
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: default
    package_policy_id: e2150a54-25fd-49f6-8d95-0079bacd934d
    streams:
      - id: logfile-system.auth-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.auth
          type: logs
        ignore_older: 72h
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        exclude_files:
          - .gz$
        multiline:
          pattern: ^\s
          match: after
        tags:
          - system-auth
        processors:
          - add_locale: null
      - id: logfile-system.syslog-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.syslog
          type: logs
        paths:
          - /var/log/messages*
          - /var/log/syslog*
        exclude_files:
          - .gz$
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
        ignore_older: 72h
  - id: winlog-system-e2150a54-25fd-49f6-8d95-0079bacd934d
    name: system-1
    revision: 1
    type: winlog
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: default
    package_policy_id: e2150a54-25fd-49f6-8d95-0079bacd934d
    streams:
      - id: winlog-system.application-e2150a54-25fd-49f6-8d95-0079bacd934d
        name: Application
        data_stream:
          dataset: system.application
          type: logs
        condition: '${host.platform} == ''windows'''
        ignore_older: 72h
      - id: winlog-system.security-e2150a54-25fd-49f6-8d95-0079bacd934d
        name: Security
        data_stream:
          dataset: system.security
          type: logs
        condition: '${host.platform} == ''windows'''
        ignore_older: 72h
      - id: winlog-system.system-e2150a54-25fd-49f6-8d95-0079bacd934d
        name: System
        data_stream:
          dataset: system.system
          type: logs
        condition: '${host.platform} == ''windows'''
        ignore_older: 72h
  - id: system/metrics-system-e2150a54-25fd-49f6-8d95-0079bacd934d
    name: system-1
    revision: 1
    type: system/metrics
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: default
    package_policy_id: e2150a54-25fd-49f6-8d95-0079bacd934d
    streams:
      - id: system/metrics-system.cpu-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.cpu
          type: metrics
        metricsets:
          - cpu
        cpu.metrics:
          - percentages
          - normalized_percentages
        period: 10s
      - id: system/metrics-system.diskio-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.diskio
          type: metrics
        metricsets:
          - diskio
        diskio.include_devices: null
        period: 10s
      - id: system/metrics-system.filesystem-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.filesystem
          type: metrics
        metricsets:
          - filesystem
        period: 1m
        processors:
          - drop_event.when.regexp:
              system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.fsstat-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.fsstat
          type: metrics
        metricsets:
          - fsstat
        period: 1m
        processors:
          - drop_event.when.regexp:
              system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.load-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.load
          type: metrics
        metricsets:
          - load
        condition: '${host.platform} != ''windows'''
        period: 10s
      - id: system/metrics-system.memory-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.memory
          type: metrics
        metricsets:
          - memory
        period: 10s
      - id: system/metrics-system.network-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.network
          type: metrics
        metricsets:
          - network
        period: 10s
        network.interfaces: null
      - id: system/metrics-system.process-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.process
          type: metrics
        metricsets:
          - process
        period: 10s
        process.include_top_n.by_cpu: 5
        process.include_top_n.by_memory: 5
        process.cmdline.cache.enabled: true
        process.cgroups.enabled: false
        process.include_cpu_ticks: false
        processes:
          - .*
      - id: >-
          system/metrics-system.process.summary-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.process.summary
          type: metrics
        metricsets:
          - process_summary
        period: 10s
      - id: >-
          system/metrics-system.socket_summary-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.socket_summary
          type: metrics
        metricsets:
          - socket_summary
        period: 10s
      - id: system/metrics-system.uptime-e2150a54-25fd-49f6-8d95-0079bacd934d
        data_stream:
          dataset: system.uptime
          type: metrics
        metricsets:
          - uptime
        period: 10s
  - id: a72d4c42-1bb0-4b94-9985-6ed6d5b106a4
    name: cloud_security_posture-3
    revision: 2
    type: cloudbeat/cis_k8s
    use_output: default
    meta:
      package:
        name: cloud_security_posture
        version: 1.0.3
    data_stream:
      namespace: default
    package_policy_id: a72d4c42-1bb0-4b94-9985-6ed6d5b106a4
    streams:
      - id: >-
          cloudbeat/cis_k8s-cloud_security_posture.findings-a72d4c42-1bb0-4b94-9985-6ed6d5b106a4
        name: Findings
        data_stream:
          dataset: cloud_security_posture.findings
          type: logs
        processors:
          - add_cluster_id: null
        fetchers:
          - name: kube-api
          - name: process
            processes:
              kube-apiserver: null
              kubelet:
                config-file-arguments:
                  - config
              kube-scheduler: null
              etcd: null
              kube-controller: null
            directory: /hostfs
          - name: file-system
            patterns:
              - /hostfs/etc/kubernetes/scheduler.conf
              - /hostfs/etc/kubernetes/controller-manager.conf
              - /hostfs/etc/kubernetes/admin.conf
              - /hostfs/etc/kubernetes/kubelet.conf
              - /hostfs/etc/kubernetes/manifests/etcd.yaml
              - /hostfs/etc/kubernetes/manifests/kube-apiserver.yaml
              - /hostfs/etc/kubernetes/manifests/kube-controller-manager.yaml
              - /hostfs/etc/kubernetes/manifests/kube-scheduler.yaml
              - /hostfs/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
              - /hostfs/etc/kubernetes/pki/*
              - /hostfs/var/lib/kubelet/config.yaml
              - /hostfs/var/lib/etcd
              - /hostfs/etc/kubernetes/pki
        runtime_cfg:
          activated_rules:
            cis_k8s:
              - cis_1_2_18
              - cis_1_2_20
              - cis_1_2_19
              - cis_1_2_16
              - cis_1_2_32
              - cis_1_1_15
              - cis_4_2_6
              - cis_4_1_10
              - cis_1_1_12
              - cis_1_1_3
              - cis_5_2_8
              - cis_5_2_5
              - cis_1_2_15
              - cis_1_2_29
              - cis_1_1_14
              - cis_5_1_5
              - cis_4_2_2
              - cis_1_1_2
              - cis_1_2_24
              - cis_4_1_5
              - cis_1_1_11
              - cis_1_2_14
              - cis_5_2_4
              - cis_1_1_20
              - cis_2_3
              - cis_4_2_12
              - cis_1_2_25
              - cis_1_4_2
              - cis_4_2_4
              - cis_1_2_27
              - cis_1_1_18
              - cis_4_2_8
              - cis_2_6
              - cis_1_2_5
              - cis_1_1_21
              - cis_1_4_1
              - cis_4_1_9
              - cis_1_1_1
              - cis_1_1_5
              - cis_2_1
              - cis_1_2_2
              - cis_1_1_16
              - cis_5_1_6
              - cis_2_4
              - cis_4_1_6
              - cis_5_2_9
              - cis_1_1_17
              - cis_1_1_19
              - cis_1_2_7
              - cis_1_1_6
              - cis_1_3_5
              - cis_4_1_2
              - cis_5_2_3
              - cis_1_2_21
              - cis_1_2_4
              - cis_5_1_3
              - cis_4_2_9
              - cis_4_1_1
              - cis_1_2_12
              - cis_1_2_6
              - cis_1_2_13
              - cis_1_2_26
              - cis_4_2_1
              - cis_4_2_3
              - cis_2_2
              - cis_1_3_6
              - cis_2_5
              - cis_5_2_7
              - cis_5_2_6
              - cis_1_2_23
              - cis_1_1_8
              - cis_5_2_2
              - cis_5_2_10
              - cis_1_2_10
              - cis_1_2_8
              - cis_1_2_11
              - cis_4_2_11
              - cis_4_2_5
              - cis_4_2_10
              - cis_4_2_7
              - cis_1_2_17
              - cis_1_3_4
              - cis_1_2_28
              - cis_1_2_22
              - cis_1_2_9
              - cis_1_3_2
              - cis_1_3_7
              - cis_1_3_3
              - cis_1_1_4
              - cis_1_1_7
              - cis_4_2_13
              - cis_1_1_13
fleet:
  hosts:
    - REDACTED

@fearful-symmetry

fearful-symmetry commented Oct 26, 2022

So, I did a little bit of testing, and there are a few ways to reproduce the input not supported error:

  • remove the specfile completely
  • remove the input name from the specfile
  • remove the specfile and the associated binary
  • remove the entire components/ directory

Looking at the original instructions for reproducing this at the top of the PR, I have a strong suspicion that the specfile isn't actually there.

However, based on the error messages posted above, I'm guessing we're somehow missing even more specfiles:

  • "input not supported","inputs":[{"id":"logfile-default-logfile-system-449c9421-29fe-4a93-8f29-2497d7e7aef5"
  • "input not supported","inputs":[{"id":"winlog-default-winlog-system-449c9421-29fe-4a93-8f29-2497d7e7aef5"

This is...kind of baffling. I'm guessing there's something very wrong with the whole install. @olegsu can I see the contents of the components directory you're trying to run from? It's at data/elastic-agent-[HASH]/components in the working directory of elastic-agent. Also, can I see the full CLI args that elastic-agent is running with?

If you see a components directory with a bunch of BEAT and BEAT.spec.yml files, you might be able to get this running by copying both the cloudbeat binary and the cloudbeat.spec.yml file that lives inside the specs/ directory of the elastic-agent repo into the components/ directory. If you can't find a components directory, then something deeper is wrong.
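
A quick way to run that check, sketched under the assumption that each component shows up as a NAME binary plus a NAME.spec.yml file in the components directory. The helper checkComponents is illustrative only; pass the real components path as the first argument.

```go
package main

import (
	"fmt"
	"os"
)

// checkComponents reports, for each wanted component, whether its binary
// and its NAME.spec.yml file are both present among the given file names.
func checkComponents(names []string, wanted []string) []string {
	present := make(map[string]bool, len(names))
	for _, n := range names {
		present[n] = true
	}
	var problems []string
	for _, w := range wanted {
		if !present[w] {
			problems = append(problems, w+": binary missing")
		}
		if !present[w+".spec.yml"] {
			problems = append(problems, w+": spec file missing")
		}
	}
	return problems
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: checkcomponents <path to components directory>")
		return
	}
	entries, err := os.ReadDir(os.Args[1])
	if err != nil {
		fmt.Println("cannot read components directory:", err)
		return
	}
	var names []string
	for _, e := range entries {
		names = append(names, e.Name())
	}
	problems := checkComponents(names, []string{"cloudbeat"})
	if len(problems) == 0 {
		fmt.Println("cloudbeat binary and spec file are both present")
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```

Note that this only checks the files exist; it says nothing about whether the agent actually loads the spec.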

@fearful-symmetry

Alright, the more I look at the build code, I'm not even sure how "non-packaged" components like cloudbeat are supposed to get their specfile, since they're not getting packaged with the binary as far as I can tell. In the real world, would elastic-agent download them along with the binary?

@michalpristas / @blakerouse might see something obvious here, but I assume the issue is that something is wrong with the components directory, or elastic-agent's data paths.

@fearful-symmetry

Alright, I can reproduce this with the docker.elastic.co/beats/elastic-agent-complete image that I built. It's not seeing the spec files, even though they're getting built into the container image:

{"log.level":"info","@timestamp":"2022-10-26T23:00:38.755Z","log.origin":{"file.name":"coordinator/coordinator.go","file.line":322},"message":"New component created","component":{"id":"system/metrics-default","state":"Failed","message":"input not supported","inputs":[{"id":"system/metrics-default-system/metrics","state":"Failed","message":"input not supported"}],"output":{"id":"system/metrics-default","state":"Failed","message":"input not supported"}},"ecs.version":"1.6.0"}
elastic-agent@5fb75d1ff1cd:~$ ls data/elastic-agent-fc3eba/components/
LICENSE.txt  README.md  checksum.yml  filebeat                filebeat.spec.yml  heartbeat                heartbeat.spec.yml  kibana      metricbeat.reference.yml  metricbeat.yml  modules.d   osquery-extension.ext  osquerybeat.reference.yml  osquerybeat.yml
NOTICE.txt   certs      fields.yml    filebeat.reference.yml  filebeat.yml       heartbeat.reference.yml  heartbeat.yml       metricbeat  metricbeat.spec.yml       module          monitors.d  osquerybeat            osquerybeat.spec.yml       osqueryd

Perhaps there's some issue with how we're setting the data directory under ./elastic-agent container, or something more subtle.

@olegsu
Contributor Author

olegsu commented Oct 27, 2022

> Alright, the more I look at the build code, I'm not even sure how "non-packaged" components like cloudbeat are supposed to get their specfile, since they're not getting packaged with the binary as far as I can tell. In the real world, would elastic-agent download them along with the binary?

When running the image that is shipped with the release, it seems that cloudbeat is embedded and not downloaded. @oren-zohar, am I right?

Comparing the filesystem contents of the two builds:
docker.elastic.co/beats/elastic-agent:8.5.0 vs feature-arch-v2

8.5

This is the filesystem content of Elastic-Agent, taken from the image docker.elastic.co/beats/elastic-agent:8.5.0

elastic-agent@f018d455f7f8:~/data/elastic-agent-0e4f48/install$ ls -la
total 40
drwxr-xr-x 10 elastic-agent elastic-agent 4096 Oct 27 04:51 .
drwxrwx---  1 root          root          4096 Oct 27 04:51 ..
drwxr-xr-x  2 elastic-agent elastic-agent 4096 Oct 26 06:22 apm-server-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  2 elastic-agent elastic-agent 4096 Oct 27 04:51 cloudbeat-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  2 elastic-agent elastic-agent 4096 Oct 27 04:51 endpoint-security-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  5 elastic-agent elastic-agent 4096 Oct 27 04:51 filebeat-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  2 elastic-agent elastic-agent 4096 Oct 26 05:53 fleet-server-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  4 elastic-agent elastic-agent 4096 Oct 27 04:51 heartbeat-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  5 elastic-agent elastic-agent 4096 Oct 27 04:51 metricbeat-8.5.0-SNAPSHOT-linux-arm64
drwxr-xr-x  3 elastic-agent elastic-agent 4096 Oct 27 04:51 osquerybeat-8.5.0-SNAPSHOT-linux-arm64

feature-arch-v2

root@kind-control-plane:/usr/share/elastic-agent/data/elastic-agent-96e071/components# ls -la
total 932856
drwxrwx---  1 root root      4096 Oct 25 06:29 .
drwxrwx---  1 root root      4096 Oct 25 06:31 ..
-rw-rw----  1 root root        41 Oct 25 06:28 .build_hash.txt
-rw-rw----  1 root root     13675 Oct 25 06:28 LICENSE.txt
-rw-rw----  1 root root   2566303 Oct 25 06:28 NOTICE.txt
-rw-rw----  1 root root       840 Oct 25 06:28 README.md
drwxrwx---  2 root root      4096 Oct 25 06:28 certs
-rw-r--r--  1 root root      1303 Oct 25 06:28 checksum.yml
-rw-r--r--  1 root root    389399 Oct 25 06:28 fields.yml
-rwxr-xr-x  1 root root 179136435 Oct 25 06:28 filebeat
-rw-r--r--  1 root root    174363 Oct 25 06:28 filebeat.reference.yml
-rw-r--r--  1 root root      3743 Oct 25 06:28 filebeat.spec.yml
-rw-r--r--  1 root root      8622 Oct 25 06:28 filebeat.yml
-rwxr-xr-x  1 root root 161979497 Oct 25 06:28 heartbeat
-rw-r--r--  1 root root     67937 Oct 25 06:28 heartbeat.reference.yml
-rw-r--r--  1 root root      1057 Oct 25 06:28 heartbeat.spec.yml
-rw-r--r--  1 root root      7276 Oct 25 06:28 heartbeat.yml
drwxrwx---  4 root root      4096 Oct 25 06:29 kibana
-rwxr-xr-x  1 root root 239604625 Oct 25 06:28 metricbeat
-rw-r--r--  1 root root    103498 Oct 25 06:28 metricbeat.reference.yml
-rw-r--r--  1 root root      3998 Oct 25 06:28 metricbeat.spec.yml
-rw-r--r--  1 root root      6899 Oct 25 06:28 metricbeat.yml
drwxrwx--- 84 root root      4096 Oct 25 06:29 module
drwxrwx---  2 root root      4096 Oct 25 06:29 modules.d
drwxrwx---  2 root root      4096 Oct 25 06:29 monitors.d
-rw-rw----  1 root root   5526616 Oct 25 06:28 osquery-extension.ext
-rwxr-xr-x  1 root root 147685279 Oct 25 06:28 osquerybeat
-rw-r--r--  1 root root     43600 Oct 25 06:28 osquerybeat.reference.yml
-rw-r--r--  1 root root       584 Oct 25 06:28 osquerybeat.spec.yml
-rw-r--r--  1 root root      6504 Oct 25 06:28 osquerybeat.yml
-rw-rw----  1 root root 217818752 Oct 25 06:28 osqueryd

> Alright, I can reproduce this with the docker.elastic.co/beats/elastic-agent-complete image that I built. It's not seeing the spec files, even though they're getting built into the container image

I also tried downloading cloudbeat.spec.yml from https://github.com/elastic/elastic-agent/tree/feature-arch-v2/specs into the components directory, with no success.

@oren-zohar
Collaborator

> When running the image that is shipped with the release it seems that the cloudbeat is embedded and not downloaded. @oren-zohar am I, right?

To my knowledge, yes that should be the behavior

@cmacknz
Member

cmacknz commented Nov 3, 2022

> I would like to know if there is a way to have the cloudbeat binary to be part of the final image ( so I don't need also the copy it afterward). Maybe we can trigger a job to build snapshot-like image?

Adding cloudbeat to the AGENT_DROP_PATH before the agent mage package job is run should include it. The default path is here, with the rest of the packaging logic if you are curious to read it.

If this doesn't work we can fix it. The only other alternative is updating the agent packaging step to know how to fetch cloudbeat specifically the way it does for binaries from the main Beats repository. I don't think we want to special case binaries like this in the long term. Really the agent build system is due for a redesign to make this whole process easier.

@cmacknz
Member

cmacknz commented Nov 3, 2022

@olegsu the fix for the input not found bug has now been merged to the agent feature-arch-v2 branch: elastic/elastic-agent#1653 if you want to retest it.

We are planning to merge these changes to main next week so that agent v2 is available in the 8.6 snapshot images. We'll send an email once we decide on a specific date.

@olegsu
Contributor Author

olegsu commented Nov 6, 2022

Thank you, @cmacknz. I will try running it and post an update.

  • Test local kind cluster - OK
  • Test remote EKS cluster
  • Test compilation with AGENT_DROP_PATH - Not working as expected

Update
AGENT_DROP_PATH is not working as I would expect.
I tried setting it to the directory where the cloudbeat binaries were built, and got an error saying it cannot find the other beats.

logs
Error: failed building elastic-agent type=docker for platform=linux/arm64: failed to copy from ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz to build/package/elastic-agent-complete/elastic-agent-linux-arm64.docker/docker-build/beat/data/cloud_downloads/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: copy failed: cannot stat source file ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: stat ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: no such file or directory
failed building elastic-agent type=docker for platform=linux/arm64: failed to copy from ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz to build/package/elastic-agent-ubi8/elastic-agent-linux-arm64.docker/docker-build/beat/data/cloud_downloads/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: copy failed: cannot stat source file ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: stat ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: no such file or directory
failed building elastic-agent type=docker for platform=linux/arm64: failed to copy from ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz to build/package/elastic-agent/elastic-agent-linux-arm64.docker/docker-build/beat/data/cloud_downloads/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: copy failed: cannot stat source file ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: stat ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/filebeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: no such file or directory
failed building elastic-agent type=docker for platform=linux/arm64: failed to copy from ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/metricbeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz to build/package/elastic-agent-cloud/elastic-agent-linux-arm64.docker/docker-build/beat/data/cloud_downloads/metricbeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: copy failed: cannot stat source file ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/metricbeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: stat ./build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz/metricbeat-8.6.0-SNAPSHOT-linux-arm64.tar.gz: no such file or directory

So I tried another thing: building the cloudbeat assets and then copying them into the expected directory (build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz). The problem is that this directory is ephemeral, and mage package deletes it at the end. So before running the command I need to mkdir -p .... and then copy all my binaries.

This process worked and all the binaries were copied to the final image and cloudbeat started as expected.
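
The workaround above can be sketched as a shell helper that recreates the drop directory and copies the cloudbeat artifacts into it before each packaging run. The function name `stage_drop` is hypothetical, and the source directory in the usage comment is a placeholder; the drop directory path (a directory that happens to be named linux-arm64.tar.gz) is the one shown in the error output above.

```bash
# stage_drop SRC DROP: recreate the drop directory (mage package removes it
# at the end of each run) and copy every regular file from SRC into it.
# Hypothetical helper for illustration only.
stage_drop() {
  src=$1
  drop=$2
  mkdir -p "$drop" || return 1
  for f in "$src"/*; do
    [ -f "$f" ] || continue
    cp "$f" "$drop"/ || return 1
  done
}

# Example (run from the elastic-agent checkout; the first path is a placeholder):
#   stage_drop /path/to/cloudbeat/artifacts \
#     build/distributions/elastic-agent-drop/archives/linux-arm64.tar.gz
#   DEV=true SNAPSHOT=true PLATFORMS=linux/arm64 PACKAGES=docker mage package
```

Because the directory is deleted after packaging, the staging step has to be repeated before every mage package run.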

I think this can be overcome in one of the following ways:

  1. Stop the directory deletion (adding KEEP_DROP might work as well)
  2. Update the AGENT_DROP_PATH variable to have additional binaries
  3. Add another variable to do that.

@fearful-symmetry

That's odd, none of the latest changes include changes to the build system, so I'm not sure what would suddenly cause the failed building elastic-agent errors. Gonna try to reproduce...

@fearful-symmetry

So, the behavior of AGENT_DROP_PATH is a little weird, and when it's set, the agent magefile will skip building the rest of the binaries, which is where I assume the metricbeat/filebeat build errors are coming from. I wonder if you can create the elastic-agent-drop path under build/distributions beforehand, place the cloudbeat files there, then run mage package without the environment variable. That....might work?

@olegsu
Contributor Author

olegsu commented Nov 8, 2022

> So, the behavior of AGENT_DROP_PATH is a little weird, and when it's set, the agent magefile will skip building the rest of the binaries, which is where I assume the metricbeat/filebeat build errors are coming from. I wonder if you can create the elastic-agent-drop path under build/distributions beforehand, place the cloudbeat files there, then run mage package without the environment variable. That....might work?

Thank you for the answer, @fearful-symmetry.
That is what I did to verify.
TBH, I don't think it is a good idea to leave this to the end user (us, in this case), as it adds an additional level of complexity to the day-to-day development flow.
If we could add something like a KEEP_DROP variable so the directory is not deleted at the end of mage package, that would be better. An even better solution would be to add ADDITIONAL_DROP, or some other way to keep the same drop path while also pointing at another directory, so that mage package copies from ADDITIONAL_DROP into the current drop directory.
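
To make the two proposals concrete, here is a rough sketch of the semantics being suggested. Neither KEEP_DROP nor ADDITIONAL_DROP exists in the elastic-agent magefile as far as this thread shows; both names, and the helper functions `merge_additional_drop` and `cleanup_drop`, are purely illustrative.

```bash
# KEEP_DROP and ADDITIONAL_DROP are proposals from this thread, not existing
# elastic-agent build options. This only illustrates the suggested behavior.

# merge_additional_drop DROP: if ADDITIONAL_DROP is set, copy its files into
# the regular drop directory, adding them to whatever is already there
# (same-named files are overwritten).
merge_additional_drop() {
  drop=$1
  [ -n "${ADDITIONAL_DROP:-}" ] || return 0
  mkdir -p "$drop" || return 1
  for f in "$ADDITIONAL_DROP"/*; do
    [ -f "$f" ] || continue
    cp "$f" "$drop"/ || return 1
  done
}

# cleanup_drop DROP: delete the drop directory unless KEEP_DROP=true.
cleanup_drop() {
  drop=$1
  [ "${KEEP_DROP:-false}" = "true" ] && return 0
  rm -rf "$drop"
}
```

With the first option the standard beats would still be fetched as usual and cloudbeat would be added on top; with the second, a manually staged drop directory would survive between runs.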

@mergify

mergify bot commented Nov 8, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b agent-v2 upstream/agent-v2
git merge upstream/main
git push upstream agent-v2

@cmacknz
Member

cmacknz commented Nov 8, 2022

I agree that it would be nice to have a separate flag that treats the contents of AGENT_DROP_PATH as additional binaries to be bundled with the standard set, rather than expecting it to be an alternative source for all binaries.

I think there is an existing workaround for this in the current cloudbeat magefile that will pack filebeat, metricbeat, heartbeat, and osquerybeat in addition to cloudbeat:

cloudbeat/magefile.go

Lines 271 to 276 in b84e3bd

packedBeats := []string{"filebeat", "heartbeat", "metricbeat", "osquerybeat"}
ctx := context.Background()
for _, beat := range packedBeats {
	for _, reqPackage := range requiredPackages {
		newVersion, packageName := getPackageName(beat, version, reqPackage)
		err := fetchBinaryFromArtifactsApi(ctx, packageName, beat, newVersion, dropPath)

The behaviour here didn't change in v2, but I do agree the agent build and packaging system could be improved regardless.

@olegsu olegsu force-pushed the agent-v2 branch 3 times, most recently from 1db972d to 697c6bf Compare November 9, 2022 10:28
Collaborator

@oren-zohar oren-zohar left a comment

Looks great 🚀

cp_to_pod $POD $LOCAL_DIR/cloudbeat $DEST
cp_to_pod $POD $LOCAL_DIR/cloudbeat.yml $DEST/cloudbeat.yml

# Start with COPY_BUNDLE=true to move also the opa bundle to the agent
Collaborator

💯

ROOT=/usr/share/elastic-agent/data/elastic-agent-$SHA
DEST=$ROOT/components
cp_to_pod $POD $LOCAL_DIR/cloudbeat $DEST
cp_to_pod $POD $LOCAL_DIR/cloudbeat.yml $DEST/cloudbeat.yml
Collaborator

not sure we'll always want to copy cloudbeat.yml, wdyt?

Contributor Author

Right, this is not the behavior we have today.

@olegsu
Contributor Author

olegsu commented Nov 9, 2022

@cmacknz Thank you!
I noticed that the elastic-agent restart command is not working anymore.
I planned to use it in https://github.com/elastic/cloudbeat/pull/458/files#diff-7f741b97dde46efe3b4de1e8fdfce0cc21f3ed5f767dd6ae9aab5f1ffb0341da to restart the agent with the new components.
The command now exits with an error (it worked previously, on @fearful-symmetry's PR):

Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /usr/share/elastic-agent/data/tmp/elastic-agent-control.sock: connect: no such file or directory"

So I changed it to 1e3bdad. This is not a blocker, but it might be worth looking into.
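
The error says the control socket path does not exist. A small guard like the following can confirm that before attempting a restart. It is a diagnostic sketch only: `control_socket_ready` is a hypothetical helper, and it assumes the restart command talks to the running daemon over that socket, which is what the gRPC error text suggests.

```bash
# control_socket_ready SOCK: succeed only if SOCK exists and is a Unix socket.
# Hypothetical helper; it does not fix the underlying problem.
control_socket_ready() {
  [ -S "$1" ]
}

# Example, using the path from the error message above:
#   sock=/usr/share/elastic-agent/data/tmp/elastic-agent-control.sock
#   control_socket_ready "$sock" && elastic-agent restart
```

If the check fails while the agent is clearly running, the socket is probably being created somewhere else or not at all.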

@olegsu olegsu force-pushed the agent-v2 branch 2 times, most recently from cd313be to f7de335 Compare November 9, 2022 13:47
@mergify

mergify bot commented Nov 9, 2022

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b agent-v2 upstream/agent-v2
git merge upstream/main
git push upstream agent-v2

@cmacknz
Member

cmacknz commented Nov 9, 2022

> I noticed that the elastic-agent restart command is not working anymore.

Assuming this is reproducible, this is a bug. All of the elastic-agent commands should keep working. I'll try to quickly reproduce this and then open an issue to get it fixed.

@olegsu olegsu merged commit 675c6cf into elastic:main Nov 9, 2022
@cmacknz
Member

cmacknz commented Nov 10, 2022

Confirmed restart is broken, along with a few other things: elastic/elastic-agent#1709


7 participants