
WIP: FR #84 Include containerd-specific labels to data coming from powercaprapl sensor #109

Merged: 62 commits into main on Sep 25, 2021

Conversation

bpetit
Contributor

@bpetit bpetit commented May 9, 2021

No description provided.

@bpetit bpetit linked an issue May 9, 2021 that may be closed by this pull request
@pierreozoux
Contributor

pierreozoux commented May 25, 2021

Hi!

I must have done something wrong :)

In the container:

scaphandre -v
scaphandre 0.3.0

In the compose file:

  scaphandre:
    image: hubblo/scaphandre:build-PR_84-docker-labels
    volumes:
      - type: bind
        source: /proc
        target: /proc
      - type: bind
        source: /var/run/docker.sock
        target: /var/run/docker.sock
        read_only: true
      - type: bind
        source: /sys/class/powercap
        target: /sys/class/powercap
    command: ["prometheus", "--containers"]

The logs when it starts:

   logs scaphandre
Attaching to metrics-collection_scaphandre_1
scaphandre_1        | Scaphandre prometheus exporter
scaphandre_1        | Sending ⚡ metrics
scaphandre_1        | Press CTRL-C to stop scaphandre

But when I try to curl, it is not happy:

curl localhost:8080

No error message, it just hangs...

I think I have the right image, because with that parameter on the normal image it gives me an error, and here it doesn't.
I had a quick look at the code and there doesn't seem to be much logging, so I don't think I can increase the verbosity :P

Let me know if I can do anything else :)

@bpetit
Contributor Author

bpetit commented May 26, 2021

Hi!

I've tried to reproduce, but the container does answer me :/
By the way, you need to query localhost:8080/metrics to get the metrics; the root endpoint will only give you a warning message.
Regarding the hanging query, I'm not sure what it could be about. Could you try directly from the container, with docker exec?

@bpetit
Contributor Author

bpetit commented May 27, 2021

I think I've identified the problem @pierreozoux runs into. The rs-docker crate we use for this feature uses tokio as an async runtime. The prometheus exporter itself uses actix. I guess some conflict happens, as I get tokio log messages when I reproduce the issue on Pierre's machine (using the prometheus exporter). I imagine I can't reproduce it on mine because this is not deterministic and may depend on lower-level configuration of the system (not sure, but strongly suspected).

I think we should either get rid of tokio in rs-docker (rs-docker needs a refresh anyway) or of actix in the prometheus exporter. I've also heard about bollard (https://github.com/fussybeaver/bollard), which seems to have more contributors, but it uses tokio too. So maybe the solution is to get rid of actix (I was thinking about moving the prometheus exporter to fully synchronous code plus a thread anyway).

Do you have any thoughts about that @rossf7 @PierreRust @uggla ?

@bpetit bpetit self-assigned this May 27, 2021
@uggla
Collaborator

uggla commented May 28, 2021

Do you have any thoughts about that @rossf7 @PierreRust @uggla ?

@bpetit, my 2 cents,

I don't think that's the direction history is going. I mean, all web frameworks have struggled to move to async (actix; rocket seems to use async now; warp; etc. → https://github.com/flosse/rust-web-framework-comparison) because it gives better performance. Even though we clearly don't need that performance for Scaphandre's web server, I'm afraid it will be difficult to find a web framework that does not use async, and we might end up with something not well supported in the future.
Writing our own sync HTTP(S) web server is possible too, but that's probably not a good idea from a security standpoint.
Also, we will see more and more tools/libraries using async for IO. Sometimes it is needed, sometimes it is just hype, so not always a good reason, but that's the underlying trend.

So I would rather bet on bumping tokio to the latest version, making all tokio consumers use that same dependency (not mixing tokio versions), and then checking whether that fixes the bug. Of course, that's not so easy if the bug is flaky and can't be reproduced 100% of the time.

@rossf7
Contributor

rossf7 commented May 28, 2021

Hi @bpetit, I agree with @uggla on this. I think replacing actix with the same version of tokio is a good approach.

I had the same issue when looking at the Kubernetes integration. https://github.com/clux/kube-rs is the most popular library and it uses tokio.

There is also https://github.com/ynqa/kubernetes-rust, which isn't async, but its last commit was 2 years ago. I tried to get it working but wasn't able to, although that is probably because I'm new to Rust.

@bpetit
Contributor Author

bpetit commented May 29, 2021

Hi! Thanks for your views on this. I'll give tokio/hyper a shot for the prometheus exporter. Let's see.

@bpetit
Contributor Author

bpetit commented Jun 1, 2021

It seems to work. I'll run some more tests and then jump back on the Docker integration if it's satisfying.

Collaborator

@uggla uggla left a comment


Here is a review with comments. Hoping it will help a little.

error!("server error: {}", e);
}
} else {
panic!("{} is not a valid TCP port number", port);
Collaborator


Maybe change the error message to: "is not a valid TCP port number or is already bound".

Contributor Author


Yep!

Contributor Author

@bpetit bpetit Jun 3, 2021


Actually I don't think that's the right message, as this branch is only triggered when we can't parse the port parameter as a u16. If the port number is valid but can't be reserved, we will most likely get an error from Server::bind instead.
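For illustration only (a hypothetical helper, not the PR's code), the two failure modes end up in different places:

fn check_port(port: &str) -> u16 {
    // Only a value that cannot be parsed as a u16 reaches the panic below;
    // a valid port that is already taken would fail later, at bind time,
    // as discussed above.
    match port.parse::<u16>() {
        Ok(p) => p,
        Err(_) => panic!("{} is not a valid TCP port number", port),
    }
}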

{
info!(
"{}: Refresh topology",
Utc::now().format("%Y-%m-%dT%H:%M:%S")
Collaborator

@uggla uggla Jun 1, 2021


FYI, I will make a PR to change the logging stuff. The way it is done today is not really good.

Contributor Author


What is the part that worries you?

Cargo.toml Outdated
time = "0.2.25"
colored = "2.0.0"
chrono = "0.4.19"
rs-docker = { version = "0.0.58", optional = true }
Collaborator

@uggla uggla Jun 1, 2021


You may use cargo tree to find the dependencies.
Maybe it brings in older deps?
From crates.io: (screenshot of rs-docker's dependency list)

Contributor Author


Good one, thanks! Yes, it does. Actually I think I'll use my fork instead, https://github.com/bpetit/rs-docker/, as rs-docker seems to be unmaintained. I'll then update the dependencies so we are even. WDYT?

};
let context = Arc::new(power_metrics);
let make_svc = make_service_fn(move |_| {
let ctx = context.clone();
Collaborator


Here I would use shadowing to keep the same name, let context = context.clone(). I think it is easier to follow.
Same for sfx below.
And do you really need to clone twice, here and in the async block below? I'm not sure, but maybe it is only required in the async block.

Contributor Author


I had errors I couldn't resolve if I didn't clone twice. Maybe I did it wrong. I'll try to give you some details (next week most probably) so we can look into it and see if it should be done differently.
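For the record, a minimal self-contained sketch of the pattern under discussion (assuming hyper 0.14 and tokio; the shared state here is a stand-in for the real context, not Scaphandre's actual type). Both closures can be invoked many times, so each level captures its own clone of the Arc:

use std::convert::Infallible;
use std::net::SocketAddr;
use std::sync::Arc;
use hyper::service::{make_service_fn, service_fn};
use hyper::{Body, Request, Response, Server};

#[tokio::main]
async fn main() {
    let context = Arc::new(String::from("metrics state")); // stand-in for the real context
    let make_svc = make_service_fn(move |_conn| {
        // First clone: one handle per incoming connection, since this closure
        // may run many times and cannot give away ownership of `context`.
        let context = context.clone();
        async move {
            Ok::<_, Infallible>(service_fn(move |_req: Request<Body>| {
                // Second clone: the per-request closure also runs many times,
                // so it hands a fresh handle to each request future.
                let context = context.clone();
                async move { Ok::<_, Infallible>(Response::new(Body::from((*context).clone()))) }
            }))
        }
    });
    let addr = SocketAddr::from(([127, 0, 0, 1], 8080));
    if let Err(e) = Server::bind(&addr).serve(make_svc).await {
        eprintln!("server error: {}", e);
    }
}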

@bpetit
Contributor Author

bpetit commented Jun 3, 2021

Thanks a lot for the review! I'm pretty busy over the next few days, but I should be able to integrate your suggestions and build a working version of the prometheus exporter on tokio, with container labels (plus the Docker extra labels), by Wednesday the 9th. 🤞

@bpetit
Contributor Author

bpetit commented Jun 14, 2021

The prometheus exporter with tokio seems to work fine. However, I'm not comfortable having a library that requires async just to gather data from the Docker socket locally. It's fine in a pull mode like prometheus, especially if we have a tokio runtime for the server itself, in the same version. But I don't think it's fine to require exporters like JSON, CLI, or any simple exporter to pull in an async runtime to be able to get extra information about containers. rs-docker and bollard both require async. I'm forking rs-docker into a minimalistic, read-only and synchronous version. I guess it's enough for what we need here. We could then upgrade to something fancier if needed afterwards. cc @uggla @rossf7
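To illustrate the idea (a minimal sketch only, not the actual fork): reading from the Docker socket synchronously only needs a plain HTTP request over the unix socket, with no async runtime at all.

use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn list_containers_raw() -> std::io::Result<String> {
    // Docker's engine API answers plain HTTP on the unix socket.
    let mut stream = UnixStream::connect("/var/run/docker.sock")?;
    stream.write_all(
        b"GET /containers/json HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n",
    )?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    // `response` still contains the HTTP headers followed by the JSON body;
    // a real client would check the status line and deserialize the body.
    Ok(response)
}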

Contributor

@rossf7 rossf7 left a comment


Hi @bpetit
I ran into a couple of problems in my testing.

The first was an error in the logs for listing the pods.

isahc::handler: request completed with error: the server certificate could not be validated

It seems to be because the k8s client is connecting to http://localhost:6443. If I changed this to the host in my kubeconfig, it was fine.

The second problem was with the k8s regex. My cluster was using a different format.

@@ -22,9 +29,15 @@ impl ProcessTracker {
/// let tracker = ProcessTracker::new(5);
/// ```
pub fn new(max_records_per_process: u16) -> ProcessTracker {
let regex_cgroup_docker = Regex::new(r"^/docker/.*$").unwrap();
let regex_cgroup_kubernetes = Regex::new(r"^/kubepods.slice/.*$").unwrap();
Contributor


On the cluster I was testing with, the cgroup file has a different format. Could the regex be more generic to support both formats?

#/proc/193876/cgroup
1:name=systemd:/kubepods/burstable/pod7a8cbc91-66e9-4303-88df-513f77240233/acd77757d49868ead1f706f901271e737594d0e11cec86d4bfa4de45a0512938

The cluster was Kubernetes v1.20.2, installed using kubeadm on Ubuntu 20.10 with Docker 20.10.2.

@bpetit
Contributor Author

bpetit commented Sep 13, 2021

I guess we need a flag or an env var to set the Kubernetes API URI?

I'll extend the regexp, thanks for the feedback!
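As a rough sketch of what a more flexible pattern could look like (illustrative only, not necessarily the exact expression that landed in the PR), something permissive like the following accepts both the systemd driver form (/kubepods.slice/…) and the cgroupfs form reported above:

use regex::Regex; // same crate as in the snippet quoted earlier

fn main() {
    let regex_cgroup_kubernetes = Regex::new(r"^/kubepods.*$").unwrap();
    // systemd cgroup driver
    assert!(regex_cgroup_kubernetes.is_match("/kubepods.slice/kubepods-burstable.slice/"));
    // cgroupfs cgroup driver (format reported by @rossf7)
    assert!(regex_cgroup_kubernetes.is_match(
        "/kubepods/burstable/pod7a8cbc91-66e9-4303-88df-513f77240233/acd77757d49868ead1f706f901271e737594d0e11cec86d4bfa4de45a0512938"
    ));
}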

@rossf7
Contributor

rossf7 commented Sep 14, 2021

I guess we need a flag or an env var to set the Kubernetes API URI?

The env vars KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT will be set for the container when it's running as a pod.

What about checking for those, and if they are not set, falling back to a flag that can be set manually?
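A hedged sketch of that fallback logic (hypothetical function and flag name, and a placeholder default, mirroring the localhost:6443 address mentioned earlier):

use std::env;

// Prefer the in-cluster env vars; otherwise fall back to a user-supplied URI.
// `flag_uri` and the localhost default are placeholders for illustration.
fn kubernetes_api_uri(flag_uri: Option<String>) -> String {
    match (
        env::var("KUBERNETES_SERVICE_HOST"),
        env::var("KUBERNETES_SERVICE_PORT"),
    ) {
        (Ok(host), Ok(port)) => format!("https://{}:{}", host, port),
        _ => flag_uri.unwrap_or_else(|| String::from("http://localhost:6443")),
    }
}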

@bpetit
Contributor Author

bpetit commented Sep 16, 2021

Hi @rossf7!

I've added the gathering of the Kubernetes env vars. If those vars are present, they are used first to determine the server URI.

I also made the regexp more flexible.

I'd like to hear your thoughts and test results :)

@rossf7
Contributor

rossf7 commented Sep 17, 2021

@bpetit Many thanks for the changes :) I'll retest and report back, most likely tomorrow.

Contributor

@rossf7 rossf7 left a comment


@bpetit My testing went well. I just needed to make some small changes for the Helm chart and to adjust for my cluster.

(screenshot of test results, 2021-09-18)

I think this is really close now. 💚 🚀

Comment on lines 227 to 231
.unwrap()
.strip_prefix("docker-")
.unwrap()
.strip_suffix(".scope")
.unwrap();
Contributor


Suggested change
.unwrap()
.strip_prefix("docker-")
.unwrap()
.strip_suffix(".scope")
.unwrap();
.unwrap();

I had to remove this. Otherwise there was a crash. Here is an example from my cluster.

/kubepods/burstable/podb55b6901e3073a2abf41783540cb7b36/f60b363dd1d5fa5939a879804b3b96836e130f57ca1ae4442da5c368accf751b

I have a bad feeling this varies by container runtime and we might need to support multiple formats.

Dumb question, but could this be a function so we can handle multiple formats?

Contributor Author


You're right, it seems to vary a lot from one setup to another. I think having a function is a good idea.
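Something along these lines could work (a sketch with a hypothetical name, handling both the systemd-style docker-<id>.scope component and the bare id seen with the cgroupfs driver):

// Hypothetical helper: extract the container id from the last cgroup path
// component without unwrapping, so an unexpected format returns the raw
// component instead of crashing.
fn container_id_from_cgroup_path(path: &str) -> Option<String> {
    let last = path.split('/').last()?;
    // systemd cgroup driver: ".../docker-<id>.scope"
    if let Some(id) = last
        .strip_prefix("docker-")
        .and_then(|s| s.strip_suffix(".scope"))
    {
        return Some(id.to_string());
    }
    // cgroupfs driver: the last component is already the bare container id
    Some(last.to_string())
}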

Contributor

@rossf7 rossf7 Sep 18, 2021


I think it may be because I'm using the cgroupfs driver. The recommended driver is systemd, but my cluster is a temporary one on Equinix Metal and I didn't configure it 🤦‍♂️

Next time I'll use systemd and see if that changes things.

https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cgroup-drivers

Contributor Author


I've tried to make this part not mandatory. Could you try the latest version of the code in your environment?

thanks 🙏🏽

Contributor


Thanks for the changes. I've retested with the systemd and cgroupfs drivers and both were fine.

bpetit and others added 12 commits September 18, 2021 15:26
Co-authored-by: Ross Fairbanks <rossf7@users.noreply.github.com>
Co-authored-by: Ross Fairbanks <rossf7@users.noreply.github.com>
Co-authored-by: Ross Fairbanks <rossf7@users.noreply.github.com>
Co-authored-by: Ross Fairbanks <rossf7@users.noreply.github.com>
…ub.com:hubblo-org/scaphandre into feature/#84-include-containerd-specific-labels
Contributor

@rossf7 rossf7 left a comment


LGTM

@bpetit bpetit merged commit c901389 into main Sep 25, 2021

Successfully merging this pull request may close these issues.

Include containerd-specific labels to data coming from powercaprapl sensor
4 participants