Autodiscover provider for Nomad #14954

Merged: 91 commits, Jan 7, 2021

jorgelbg
Contributor

@jorgelbg jorgelbg commented Dec 5, 2019

At trivago we run an internal cloud using Nomad from HashiCorp. Our logging solution is based on ELK, and we use Filebeat to ship the logs from our client nodes into Kafka, from where they are later ingested into Elasticsearch using Logstash. Previously we used an input that looked for new jobs in a defined path, but the logs lacked a lot of context/metadata from the job definition/allocation.

This PR adds a new discover module (architecture based on the Kubernetes module). With this new provider, it is possible to start new harvesters by looking at the jobs allocated on each node. We currently run filebeat as a system job on each node and each filebeat instance is responsible for enriching and shipping the local logs.

Example of the configuration for the new provider:

filebeat.autodiscover:
  providers:
    - type: nomad
      host: {{ env "node.unique.name" }}
      hints.enabled: true
      hints.default_config:
        type: log
        paths:
          - /appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stderr.[0-9]*
          - /appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stdout.[0-9]*
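
To make the `${data.meta.uuid}` placeholders concrete, here is a minimal Python sketch of how such a path template could be expanded for one discovered allocation. It is not Filebeat's implementation: the `expand` helper, the nested event layout, and the sample UUID are all assumptions made for illustration.

```python
import re

def expand(template, event):
    """Replace ${dotted.key} placeholders with values from a nested dict."""
    def lookup(match):
        value = event
        for part in match.group(1).split("."):
            value = value[part]  # raises KeyError if the event lacks the field
        return str(value)
    return re.sub(r"\$\{([\w.]+)\}", lookup, template)

# Hypothetical autodiscover event for one allocation (the UUID is made up).
event = {"data": {"meta": {"uuid": "3f2c1a9e-0000-4000-8000-abcdefabcdef"}}}

templates = [
    "/appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stderr.[0-9]*",
    "/appdata/nomad/alloc/${data.meta.uuid}/alloc/logs/*stdout.[0-9]*",
]
paths = [expand(t, event) for t in templates]
for path in paths:
    print(path)
```

Each allocation would yield its own pair of glob patterns, which is what lets one provider configuration start a separate harvester per allocation.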

By using the autodiscover module it is possible to define custom processors using the meta stanza on the Nomad job (similar to how it is defined using labels on Kubernetes). For instance:

task "nginx-web" {
    driver = "docker"

    meta {
        task-key = "custom-meta"
        "co.elastic.logs/processors.dissect.tokenizer" = "%{ip} - %{user} [%{local_time}] \"%{request}\" %{status} %{bytes_sent} \"%{referer}\" \"%{user_agent}\""
    }
}

This example defines a custom dissect tokenizer for the logs of this specific task. It adds a dissect field with content similar to:

"dissect": {
    "bytes_sent": "7231",
    "referer": "http://nginx-web.prod.trivago.com/",
    "user_agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36",
    "ip": "10.2.10.138",
    "user": "-",
    "local_time": "15/Nov/2019:09:04:04 +0000",
    "request": "GET / HTTP/1.1",
    "status": "200"
}
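
For readers unfamiliar with dissect, the following Python sketch shows roughly how the tokenizer above splits an access-log line into those keys. It is a simplified regex-based approximation, not the dissect processor itself (which supports modifiers and does not use regular expressions), and the sample log line is reconstructed from the field values shown above rather than taken from a real log.

```python
import re

def dissect(tokenizer, line):
    """Approximate a dissect tokenizer: literals must match, %{key} captures."""
    # re.split with a capture group alternates: literal, key, literal, key, ...
    parts = re.split(r"%\{(\w+)\}", tokenizer)
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.*?)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(pattern, line)
    if match is None:
        raise ValueError("line does not match tokenizer")
    return match.groupdict()

TOKENIZER = (
    '%{ip} - %{user} [%{local_time}] "%{request}" %{status} '
    '%{bytes_sent} "%{referer}" "%{user_agent}"'
)

# Sample line reconstructed from the example output (an assumption).
LINE = (
    '10.2.10.138 - - [15/Nov/2019:09:04:04 +0000] "GET / HTTP/1.1" 200 7231 '
    '"http://nginx-web.prod.trivago.com/" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/78.0.3904.97 Safari/537.36"'
)

fields = dissect(TOKENIZER, LINE)
print(fields["status"], fields["bytes_sent"])
```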

By default the following fields are added from the Nomad job/allocation:

  • job
  • namespace
  • status
  • type (job type: system/service/batch)
  • task.* (information about the task and custom metadata defined in the job/group/task using the meta stanza)
  • datacenters
  • region

The PR also includes an add_nomad_metadata processor that matches events to specific allocations and adds the metadata.
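
As a rough mental model of what "matching events to allocations" can mean, the Python sketch below pulls an allocation ID out of the log file path and merges the metadata for that allocation into the event. This is only an illustration under assumptions: the `nomad` target key, the in-memory `ALLOCATIONS` index, the sample UUID, and the path-based matching strategy are hypothetical and are not taken from the processor's code.

```python
import re

# Hypothetical metadata index keyed by allocation ID (values are made up).
ALLOCATIONS = {
    "3f2c1a9e-0000-4000-8000-abcdefabcdef": {
        "job": "nginx-web",
        "namespace": "default",
        "status": "running",
        "type": "service",
        "datacenters": ["dc1"],
        "region": "global",
    },
}

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

def add_nomad_metadata(event, allocations):
    """Return a copy of the event enriched with metadata, if an allocation matches."""
    path = event.get("log", {}).get("file", {}).get("path", "")
    match = UUID_RE.search(path)
    if match and match.group(0) in allocations:
        return {**event, "nomad": allocations[match.group(0)]}
    return event  # no matching allocation: leave the event untouched

event = {
    "message": "GET / HTTP/1.1",
    "log": {"file": {"path": "/appdata/nomad/alloc/"
                             "3f2c1a9e-0000-4000-8000-abcdefabcdef"
                             "/alloc/logs/nginx-web.stdout.0"}},
}
enriched = add_nomad_metadata(event, ALLOCATIONS)
print(enriched["nomad"]["job"])
```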

We've been running this in our production clusters for a few weeks now.

TODO:

  • Metricbeat support for extracting stats from the Nomad allocation
  • Documentation (it is a WIP and I will add a new commit with the documentation)
  • CHANGELOG
  • Fields reference

How to test locally

  • Start a local development agent (nomad agent -dev).
  • Start filebeat with a configuration like this one:
    filebeat.autodiscover:
      providers:
      - type: nomad
        templates:
          - config:
              - type: log
                paths:
                  - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stderr.[0-9]*
                  - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stdout.[0-9]*
    
  • Run some service in nomad, for example:
    job "consul" {
      datacenters = ["dc1"]
    
      group "server" {
        task "consul-dev" {
          driver = "raw_exec"
    
          config {
            command = "consul"
            args = [
              "agent", "-dev",
            ]
          }
          artifact {
            source = "https://releases.hashicorp.com/consul/1.9.0/consul_1.9.0_linux_amd64.zip"
          }
        }
      }
    }
    
  • Check that logs are collected for this service and they include the nomad metadata.
  • Repeat with a configuration like this one for hints-based autodiscover:
    filebeat.autodiscover:
      providers:
      - type: nomad
        hints.enabled: true
        hints.default_config:
          type: log
          paths:
            - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stderr.[0-9]*
            - /tmp/NomadClient*/${data.nomad.allocation.id}/alloc/logs/*stdout.[0-9]*
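
When working through the steps above, it can help to check which files the path patterns are expected to pick up once the allocation ID has been substituted. The Python sketch below matches patterns against paths one segment at a time, so that `*` does not cross a `/`. The `glob_match` helper, the allocation ID, and the file names are assumptions for illustration, and Filebeat's own glob handling may differ in details.

```python
from fnmatch import fnmatchcase

def glob_match(pattern, path):
    """Match a glob against a path segment by segment ('*' never spans '/')."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    return len(pattern_parts) == len(path_parts) and all(
        fnmatchcase(segment, glob)
        for glob, segment in zip(pattern_parts, path_parts)
    )

ALLOC_ID = "3f2c1a9e-0000-4000-8000-abcdefabcdef"  # made-up allocation ID
STDOUT = f"/tmp/NomadClient*/{ALLOC_ID}/alloc/logs/*stdout.[0-9]*"
STDERR = f"/tmp/NomadClient*/{ALLOC_ID}/alloc/logs/*stderr.[0-9]*"

# Hypothetical files under a dev agent's data directory.
base = f"/tmp/NomadClient123456/{ALLOC_ID}/alloc/logs"
for name in ["consul-dev.stdout.0", "consul-dev.stderr.0", "consul-dev.stdout.fifo"]:
    path = f"{base}/{name}"
    print(name, glob_match(STDOUT, path), glob_match(STDERR, path))
```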
    

jorgelbg and others added 30 commits August 1, 2019 17:09
Add a test for the basic hint features
Add tests for the emission of events
Add tests for the emission of events
…llocations

Add local constants to track the possible status of the allocations
On older Nomad versions (0.8.4) the `NodeName` attribute of the allocation is empty. This means that sometimes we cannot assign a proper `host` to the event. As a workaround we use the `NodeID` to get the name from the actual client node.
This is a workaround for Nomad v0.8 that doesn't provide the NodeName directly in the allocation object. We use the NodeID to fetch it from the API.
- WIP emit only one task metadata event.
- Rename the matchers/indexers of the add_nomad_metadata processor to match the Nomad lingo.
- Rename `meta.meta` to `meta.tasks` and fix the tests.
- Add the main import to the nomad provider in the cmd tool.
WIP patch that skips unchanged allocations and avoids triggering new harvesters for allocations that were previously discovered.
- Rename `uuid` field to `alloc_id`.
- Use WatchOptions.RefreshInterval (SyncInterval on the config) for the sync interval of the watcher
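
The commits above about tracking allocation status and skipping unchanged allocations describe a periodic sync that only emits events when something changed. A minimal Python sketch of that idea follows. It is not the watcher's actual code: the `sync` function, the event tuples, and the use of a `modify_index` field to detect changes are assumptions for illustration.

```python
RUNNING = "running"  # assumed status constant for a live allocation

def sync(known, allocations):
    """Compare the latest allocation list with the known state.

    known: dict of alloc_id -> last seen modify_index (mutated in place).
    allocations: list of dicts with "id", "status" and "modify_index".
    Returns a list of (action, alloc_id) tuples for changed allocations only.
    """
    events = []
    seen = set()
    for alloc in allocations:
        alloc_id = alloc["id"]
        seen.add(alloc_id)
        if alloc["status"] != RUNNING:
            # Allocation is no longer running: stop it if we were tracking it.
            if known.pop(alloc_id, None) is not None:
                events.append(("stop", alloc_id))
            continue
        if alloc_id not in known:
            events.append(("start", alloc_id))
        elif known[alloc_id] != alloc["modify_index"]:
            events.append(("update", alloc_id))
        # Unchanged allocations fall through without emitting anything.
        known[alloc_id] = alloc["modify_index"]
    # Allocations that disappeared from the listing are stopped as well.
    for alloc_id in list(known):
        if alloc_id not in seen:
            del known[alloc_id]
            events.append(("stop", alloc_id))
    return events

known = {}
first = sync(known, [{"id": "a", "status": "running", "modify_index": 1}])
second = sync(known, [{"id": "a", "status": "running", "modify_index": 1}])
third = sync(known, [{"id": "a", "status": "complete", "modify_index": 2}])
print(first, second, third)
```

Running the same listing twice produces no second event, which is the behavior that prevents duplicate harvesters for allocations that were already discovered.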
@jsoriano jsoriano added the v7.12.0 and needs_backport labels Jan 5, 2021
@jsoriano
Member

jsoriano commented Jan 5, 2021

jenkins run the tests please

Member

@jsoriano jsoriano left a comment

Thanks @jorgelbg!

@jorgelbg
Contributor Author

jorgelbg commented Jan 5, 2021

@jsoriano I cherry-picked the changes from your branch, but I'm ok if we merge from any side. Thanks for taking care of fixing those issues (especially the Windows test). Changes look great! I was planning on jumping into this after the holidays but glad that you were faster!

@jsoriano
Member

jsoriano commented Jan 6, 2021

@jorgelbg could you please update the branch with master? Failing tests were fixed yesterday.

@jsoriano
Member

jsoriano commented Jan 6, 2021

jenkins run the tests please

@jsoriano jsoriano merged commit 24397d8 into elastic:master Jan 7, 2021
@jsoriano
Member

jsoriano commented Jan 7, 2021

Merged, thanks a lot @jorgelbg!

jsoriano pushed a commit to jsoriano/beats that referenced this pull request Jan 7, 2021
Initial features to support logs collection from applications deployed in Nomad.

Add a new `nomad` autodiscover provider (based on the Kubernetes provider).
With this new provider, it is possible to start new harvesters by looking
at the jobs allocated on each node. With this, filebeat can be run as a
system job on each node and each filebeat instance is responsible for
enriching and shipping the local logs.
This autodiscover provider supports hints-based autodiscover.

Add a new `add_nomad_metadata` processor that matches events to specific
allocations and adds the metadata.

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
(cherry picked from commit 24397d8)
@jsoriano jsoriano added the test-plan label and removed the needs_backport label Jan 7, 2021
@sorantis
Contributor

sorantis commented Jan 7, 2021

Thank you all for working on this!

jsoriano added a commit that referenced this pull request Jan 7, 2021
Initial features to support logs collection from applications deployed in Nomad.

Add a new `nomad` autodiscover provider (based on the Kubernetes provider).
With this new provider, it is possible to start new harvesters by looking
at the jobs allocated on each node. With this, filebeat can be run as a
system job on each node and each filebeat instance is responsible for
enriching and shipping the local logs.
This autodiscover provider supports hints-based autodiscover.

Add a new `add_nomad_metadata` processor that matches events to specific
allocations and adds the metadata.

(cherry picked from commit 24397d8)

Co-authored-by: Jaime Soriano Pastor <jaime.soriano@elastic.co>
Co-authored-by: Jorge Luis Betancourt <jorge-luis.betancourt@trivago.com>
@andresrc andresrc added the test-plan-added label Feb 15, 2021
@zube zube bot removed the [zube]: Done label Apr 7, 2021
Labels
autodiscovery, review, Team:Integrations, Team:Platforms, test-plan, test-plan-added, v7.12.0
10 participants