Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libbeat: override /sys/fs/cgroup paths in Docker #22879

Merged
merged 2 commits into from
Dec 10, 2020

Conversation

axw
Copy link
Member

@axw axw commented Dec 3, 2020

What does this PR do?

This PR adds an undocumented environment variable, LIBBEAT_MONITORING_CGROUPS_HIERARCHY_OVERRIDE, which can be used to override where cgroup metrics are found. This follows the approach taken by Elasticsearch (elastic/elasticsearch#22757), Kibana, and Logstash. The environment variable is set (to "/") in the Docker images.

Why is it important?

Under Docker, the paths in /proc/<pid>/cgroup do not have corresponding paths under /sys/fs/cgroup/<subsystem>. Instead, we should assume a root cgroup and read directly from the files in the subsystem directories.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
    - [ ] I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
    - [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  • Build the Docker image for x-pack/filebeat: PLATFORMS=linux/amd64 TYPES=docker mage package)
  • Load the Docker image: docker image load < build/distributions/filebeat-8.0.0-linux-amd64.docker.tar.gz
  • Run a filebeat container under Docker with monitoring enabled, and with a CPU quota defined (I set it to 20000, for example).
  • Wait for metrics to be reported, and check that the cgroup metrics are non-zero:
GET /.monitoring-beats-7-2020.12.03/_search
{
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ],
  "_source": "beats_stats.metrics.beat.cgroup"
}
        "_index" : ".monitoring-beats-7-2020.12.03",
        "_id" : "cz-vJnYBjbWE_R3aj_hz",
        "_score" : null,
        "_source" : {
          "beats_stats" : {
            "metrics" : {
              "beat" : {
                "cgroup" : {
                  "memory" : {
                    "mem" : {
                      "usage" : {
                        "bytes" : 29429760
                      },
                      "limit" : {
                        "bytes" : 9223372036854771712
                      }
                    },
                    "id" : "e62f3fa5f16f05d6edc680e2303ccfb6a323599db6f7dfd2b02640390d1b8319"
                  },
                  "cpu" : {
                    "cfs" : {
                      "period" : {
                        "us" : 100000
                      },
                      "quota" : {
                        "us" : 20000
                      }
                    },
                    "stats" : {
                      "periods" : 254,
                      "throttled" : {
                        "ns" : 9362053609,
                        "periods" : 40
                      }
                    },
                    "id" : "e62f3fa5f16f05d6edc680e2303ccfb6a323599db6f7dfd2b02640390d1b8319"
                  },
                  "cpuacct" : {
                    "total" : {
                      "ns" : 1476308159
                    },
                    "id" : "e62f3fa5f16f05d6edc680e2303ccfb6a323599db6f7dfd2b02640390d1b8319"
                  }
                }
              }
            }
          }
        },

Related issues

Fixes #22844

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 3, 2020
@axw
Copy link
Member Author

axw commented Dec 3, 2020

This is ready for review, but I've marked it Draft as I haven't created a new release for gosigar yet. If the PR is approved in principle, I'll go ahead and tidy it up.

@axw axw added the Team:Services (Deprecated) Label for the former Integrations-Services team label Dec 3, 2020
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 3, 2020
@elasticmachine
Copy link
Collaborator

elasticmachine commented Dec 3, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #22879 updated

  • Start Time: 2020-12-08T00:55:03.569+0000

  • Duration: 109 min 54 sec

Test stats 🧪

Test Results
Failed 0
Passed 17337
Skipped 1373
Total 18710

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 17337
Skipped 1373
Total 18710

@fearful-symmetry
Copy link
Contributor

So we're basically passing this along to cgroup.NewReaderOptions ? In theory this looks fine--We'll definitely want to document this somewhere.

@axw
Copy link
Member Author

axw commented Dec 4, 2020

Thanks for taking a look @fearful-symmetry! I'll clean it up now and make it ready for a final review.

So we're basically passing this along to cgroup.NewReaderOptions ?

Yes, it's just passed straight along and the interesting bits are in gosigar/cgroup.

In theory this looks fine--We'll definitely want to document this somewhere.

Do you mean internal/dev docs, or end-user docs? Either way, do you have an idea of where in particular? This "feature" is really a Docker-specific hack, and is hard-coded in the Dockerfile. It should just work without ever requiring user intervention. It's intentionally undocumented in Elasticsearch (elastic/elasticsearch#22757), so I hadn't planned to document it either.

When running in Docker, make sure metrics are read
from `/sys/fs/cgroup/<subsystem>`, ignoring any paths
in `/proc/self/cgroup`.

Fixes elastic#22844
@axw
Copy link
Member Author

axw commented Dec 4, 2020

I have added tests that prove my fix is effective or that my feature works

I'll need some guidance here please. I think the only sensible way to test this is with an end-to-end test that runs a Docker image generated from the Dockerfile.tmpl template. Do we have any tests like this?

@axw axw marked this pull request as ready for review December 4, 2020 06:46
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@fearful-symmetry
Copy link
Contributor

Do you mean internal/dev docs, or end-user docs? Either way, do you have an idea of where in particular?

I'm guessing ./libbeat/docs/shared-env-vars.asciidoc

I'll need some guidance here please. I think the only sensible way to test this is with an end-to-end test that runs a Docker image generated from the Dockerfile.tmpl template. Do we have any tests like this?

This is too tiny for a unit test and I'm not sure if we have any end-to-end testing that would work here. We might have to skip this.

@axw
Copy link
Member Author

axw commented Dec 7, 2020

Do you mean internal/dev docs, or end-user docs? Either way, do you have an idea of where in particular?

I'm guessing ./libbeat/docs/shared-env-vars.asciidoc

What would be the purpose of documenting it there? This environment variable is an internal implementation detail, and not something users should have to know or care about.

I'll need some guidance here please. I think the only sensible way to test this is with an end-to-end test that runs a Docker image generated from the Dockerfile.tmpl template. Do we have any tests like this?

This is too tiny for a unit test and I'm not sure if we have any end-to-end testing that would work here. We might have to skip this.

OK. I've opened #22948 to address this.

@fearful-symmetry
Copy link
Contributor

What would be the purpose of documenting it there? This environment variable is an internal implementation detail, and not something users should have to know or care about.

Yah, that's a good point. I suppose the comments in dev-tools/packaging/templates/docker/Dockerfile.tmpl should be fine.

@fearful-symmetry
Copy link
Contributor

Also, NOTICE.txt is out of date, you'll want to run make notice

@axw
Copy link
Member Author

axw commented Dec 8, 2020

Also, NOTICE.txt is out of date, you'll want to run make notice

Oops! Thanks, updated.

@fearful-symmetry
Copy link
Contributor

Uh, I don't think the remaining CI errors are related?

@axw
Copy link
Member Author

axw commented Dec 8, 2020

Yup, looks like it's a known issue: elastic/e2e-testing#535

@axw
Copy link
Member Author

axw commented Dec 10, 2020

Thanks for the review @fearful-symmetry !

@axw axw merged commit 7e1520f into elastic:master Dec 10, 2020
@axw axw deleted the cgroups-override branch December 10, 2020 01:14
axw added a commit to axw/beats that referenced this pull request Dec 10, 2020
* libbeat: override /sys/fs/cgroup paths in Docker

When running in Docker, make sure metrics are read
from `/sys/fs/cgroup/<subsystem>`, ignoring any paths
in `/proc/self/cgroup`.

Fixes elastic#22844

(cherry picked from commit 7e1520f)
axw added a commit that referenced this pull request Dec 16, 2020
* libbeat: override /sys/fs/cgroup paths in Docker

When running in Docker, make sure metrics are read
from `/sys/fs/cgroup/<subsystem>`, ignoring any paths
in `/proc/self/cgroup`.

Fixes #22844

(cherry picked from commit 7e1520f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

libbeat monitoring cgroup metrics are all zero under Docker
3 participants