This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

*: Data collection from FCOS machines #31

Open
wants to merge 20 commits into
base: main
from

Conversation

zonggen
Member

@zonggen zonggen commented Sep 18, 2019

Source of truth for each part of data collected:

  • Minimal:
    • platform: read_file /proc/cmdline
    • current_os_version: read_output rpm-ostree status --json
    • original_os_version: read_file /.coreos-aleph-version.json
    • instance_type: read_file /run/metadata/afterburn
  • Full:
    • hardware:
      • disk: read_output lsblk --fs --json
      • cpu: read_output lscpu --json
      • memory: read_output lsmem --json
    • network: read_output nmcli device show
    • container_runtime:
      • podman:
        • check whether it is running with pgrep podman
        • count running containers with pgrep conmon | wc -l
      • docker:
        • check whether it is running with pgrep dockerd
        • count running containers with pgrep containerd-shim | wc -l
      • cri-o:
        • check whether it is running with pgrep crio
        • count running containers with pgrep crictl | wc -l
      • system-nspawn:
        • check whether it is running and count containers with pgrep systemd-nspawn
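
The list above names two kinds of sources, `read_file` and `read_output`. A minimal sketch of what such helpers could look like with only the standard library follows; the function names are taken from the list, but their signatures and error handling here are assumptions, not the PR's actual code.

```rust
use std::fs;
use std::io;
use std::process::Command;

/// Reads a whole file into a string, e.g. `/proc/cmdline`.
/// (Hypothetical signature; the PR's helper may differ.)
fn read_file(path: &str) -> io::Result<String> {
    fs::read_to_string(path)
}

/// Runs a command and returns its stdout as a string.
/// A non-zero exit status is reported as an error in this sketch.
fn read_output(program: &str, args: &[&str]) -> io::Result<String> {
    let output = Command::new(program).args(args).output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} exited with {}", program, output.status),
        ));
    }
    // Lossy conversion keeps the sketch simple if the output is not valid UTF-8.
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

Under these assumptions, `read_output("lsblk", &["--fs", "--json"])` would return the JSON text that the hardware collector then parses.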

Signed-off-by: Allen Bai, abai@redhat.com

@zonggen zonggen changed the title src/identity: Add a module that collects platform id [WIP] src/identity: Add a module that collects platform id Sep 18, 2019
@zonggen
Member Author

zonggen commented Sep 18, 2019

Currently, main.rs prints the collected Ignition platform ID to the console. This will be updated later so that the ID is bundled with the other data and sent to the remote endpoint.

@zonggen
Member Author

zonggen commented Sep 18, 2019

Also, this PR will be updated to resolve #30 (comment)

@zonggen zonggen changed the title [WIP] src/identity: Add a module that collects platform id [WIP] *: Metrics collection from FCOS machines Sep 18, 2019
src/main.rs Outdated Show resolved Hide resolved
src/identity/platform.rs Outdated Show resolved Hide resolved
src/identity/mod.rs Outdated Show resolved Hide resolved
src/identity/mod.rs Outdated Show resolved Hide resolved
Allen Bai added 2 commits September 19, 2019 10:14
Collects the Ignition platform ID from the `ignition.platform.id=`
variable in /proc/cmdline and stores the collected data inside the
`identity::Identity` struct.

Also adds two levels of data collection, minimum and full,
selected by the passed TOML configuration.

Signed-off-by: Allen Bai, abai@redhat.com
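
A minimal sketch of the platform ID extraction described in this commit message, assuming the kernel command line has already been read into a string. The function name `parse_platform_id` is hypothetical, and returning the first non-empty match is a choice made for this example only.

```rust
/// Kernel argument that carries the Ignition platform ID.
const PLATFORM_KEY: &str = "ignition.platform.id=";

/// Returns the value of `ignition.platform.id=` from a kernel command line,
/// or `None` if the argument is absent or has an empty value.
fn parse_platform_id(cmdline: &str) -> Option<String> {
    cmdline
        .split_whitespace()
        // Keep only arguments that start with the key, minus the key itself.
        .filter_map(|arg| arg.strip_prefix(PLATFORM_KEY))
        // Take the first non-empty value (an assumption of this sketch).
        .find(|value| !value.is_empty())
        .map(str::to_string)
}
```

For a command line such as `BOOT_IMAGE=/vmlinuz ignition.platform.id=aws rw`, the function yields `Some("aws")`.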
Changes the use of term "metrics", deduplicates match pattern and
uses maplit::hashmap! macro.
@zonggen
Member Author

zonggen commented Sep 19, 2019

UPDATE:

  • Added the module os_release.rs under identity to extract the OS version from /etc/os-release

Note: Most of the code is the same as in platform.rs, so towards the end of this PR the shared code should be de-duplicated into a new utils helper module.
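
Extracting a value from /etc/os-release comes down to a key lookup over `KEY=value` lines, which is also the kind of logic a shared helper could hold. The sketch below is an illustration only: the name `os_release_value` is hypothetical, and which key the module actually reads (for example `VERSION_ID`) is an assumption.

```rust
/// Looks up `key` in os-release style content (`KEY=value` lines)
/// and strips optional surrounding quotes from the value.
fn os_release_value(contents: &str, key: &str) -> Option<String> {
    contents.lines().find_map(|line| {
        let line = line.trim();
        // Comment lines carry no data.
        if line.starts_with('#') {
            return None;
        }
        // Lines without '=' are skipped by the `?`.
        let (k, v) = line.split_once('=')?;
        if k != key {
            return None;
        }
        Some(v.trim_matches(|c| c == '"' || c == '\'').to_string())
    })
}
```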

Picks up CoreOS aleph version information from `/.coreos-aleph-version.json`.

Signed-off-by: Allen Bai, abai@redhat.com
Member

@jlebon jlebon left a comment

Looks sane from a quick scan!

src/identity/mod.rs Outdated Show resolved Hide resolved
src/identity/os_release.rs Outdated Show resolved Hide resolved
@zonggen zonggen force-pushed the collect-metrics branch 3 times, most recently from 97f3a1d to b3a806b Compare October 9, 2019 20:09
Allen Bai added 3 commits October 15, 2019 11:28
src/full: add hardware data collection

Signed-off-by: Allen Bai <abai@redhat.com>
Signed-off-by: Allen Bai <abai@redhat.com>
Signed-off-by: Allen Bai <abai@redhat.com>
@zonggen
Member Author

zonggen commented Oct 15, 2019

STATUS:
Travis CI is currently failing because it uses the Rust Docker image, which is based on Debian: that image does not ship lsmem, and its lscpu does not support the --json option.
To test in a more production-like environment, i.e. one closer to FCOS, it might be better to switch to UBI (https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image). I will update the PR with changes to see if that works.

EDIT:

  • Updated .travis.yml to build a container from the Dockerfile and test the code inside the UBI container

@zonggen zonggen force-pushed the collect-metrics branch 2 times, most recently from 96eb3d1 to 01b0a44 Compare October 16, 2019 17:21
.travis.yml: test inside ubi container instead of ubuntu vm

Allows "yes"/"no" values for the `removable` field of `lsmem --json`.
The field type can differ between implementations, although on an
FCOS machine it will be true/false. This change is mainly to
accommodate the testing environment.

Signed-off-by: Allen Bai <abai@redhat.com>
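
One way to picture the change above is a type that accepts both shapes of the `removable` field and normalizes them. This is a standard-library-only sketch; the names `Removable` and `as_bool` are hypothetical and not taken from the PR. If the project deserializes with serde (an assumption), an untagged enum of this shape is a common way to accept either a boolean or a string.

```rust
/// The two shapes the `removable` field may take:
/// a JSON boolean, or a "yes"/"no" string.
#[derive(Debug, PartialEq)]
enum Removable {
    Bool(bool),
    Text(String),
}

impl Removable {
    /// Normalizes either shape to a bool; unrecognized strings give `None`.
    fn as_bool(&self) -> Option<bool> {
        match self {
            Removable::Bool(b) => Some(*b),
            Removable::Text(s) => match s.as_str() {
                "yes" => Some(true),
                "no" => Some(false),
                _ => None,
            },
        }
    }
}
```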
@zonggen zonggen changed the title [WIP] *: Metrics collection from FCOS machines [WIP] *: Data collection from FCOS machines Oct 16, 2019
Allen Bai added 2 commits October 16, 2019 17:20
Signed-off-by: Allen Bai <abai@redhat.com>

agent/full: only collect hw info on bare metals

Signed-off-by: Allen Bai <abai@redhat.com>

full/network: collect network info with `nmcli`

Collects network data in the format `[key[\s]+value\n]+`
(run `nmcli device show` to see an example) and parses it
into key:value pairs stored in a HashMap.

Signed-off-by: Allen Bai <abai@redhat.com>
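
A rough sketch of the key/value parsing described in this commit message. The function name `parse_key_value_lines` is hypothetical, and stripping a trailing `:` from the key is an assumption made for this example.

```rust
use std::collections::HashMap;

/// Parses `key<whitespace>value` lines into a map.
/// Lines without a value (including blank lines) are skipped.
fn parse_key_value_lines(text: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        // Split at the first whitespace character: the left part is the key,
        // the remainder (after trimming the padding) is the value.
        if let Some((key, value)) = line.split_once(char::is_whitespace) {
            let value = value.trim();
            if !value.is_empty() {
                map.insert(key.trim_end_matches(':').to_string(), value.to_string());
            }
        }
    }
    map
}
```

Because a plain map keeps one value per key, output that lists several devices would overwrite repeated keys; a real collector would likely need one map per device.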
Signed-off-by: Allen Bai <abai@redhat.com>
@zonggen
Member Author

zonggen commented Oct 21, 2019

STATUS:

  • Since the collecting service is meant to run in the background and users should not expect visible error output, the program should not crash or return an error when one of the commands exits non-zero. Instead, it should suppress the error and collect whatever is available at the moment.
  • Could use a log file to dump the error message
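
The behavior proposed in the two points above could be sketched as follows: run every collector, keep the successes, and set the failure messages aside for a log. The `Collector` alias and `collect_available` are hypothetical names introduced for this illustration.

```rust
use std::collections::HashMap;

/// A named collector: a label plus a function that may fail.
type Collector = (&'static str, fn() -> Result<String, String>);

/// Runs every collector, keeps the successful results, and returns the
/// failure messages separately instead of aborting on the first error.
fn collect_available(collectors: &[Collector]) -> (HashMap<String, String>, Vec<String>) {
    let mut data = HashMap::new();
    let mut errors = Vec::new();
    for (name, collect) in collectors {
        match collect() {
            Ok(value) => {
                data.insert(name.to_string(), value);
            }
            // Remember the failure so it can be written to a log file later.
            Err(err) => errors.push(format!("{}: {}", name, err)),
        }
    }
    (data, errors)
}
```

The caller can then send `data` as usual and append `errors` to a log file, so a missing tool such as `lsmem` never stops the report.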

tests: add tests to print minimum and full agent with test config file

Signed-off-by: Allen Bai <abai@redhat.com>
Signed-off-by: Allen Bai <abai@redhat.com>
@zonggen
Member Author

zonggen commented Oct 22, 2019

STATUS:

  • Client-side data collection is done, though the send_data() function currently does nothing other than print the collected data in JSON format. This needs to be updated once we have an actual endpoint.
  • Persistent state, so that pinger can send data periodically at a fixed interval, will be added in another PR to reduce the size of this one

Overall, ready for review!

@zonggen zonggen changed the title [WIP] *: Data collection from FCOS machines *: Data collection from FCOS machines Oct 22, 2019
Dockerfile: install dependency, openssl-devel

Signed-off-by: Allen Bai <abai@redhat.com>
Since reporting will happen daily or monthly, we keep track of
the time intervals between reports. Two threads are spawned for
tracking daily and monthly reporting. The daily thread checks the
timestamp every 12 hours and sleeps in between. The monthly
thread checks every 15 days and sleeps in between.

Signed-off-by: Allen Bai <abai@redhat.com>
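
The check-then-sleep pattern described in this commit message might look roughly like the sketch below. All names (`DAY`, `is_report_due`, `report_loop`) are hypothetical, and treating a clock that moved backwards as "not due" is an assumption of this example.

```rust
use std::thread;
use std::time::{Duration, SystemTime};

/// Length of one day, used as the daily reporting interval in this sketch.
const DAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Returns true when at least `interval` has elapsed since `last_report`.
/// If the clock moved backwards, the report is treated as not due.
fn is_report_due(last_report: SystemTime, now: SystemTime, interval: Duration) -> bool {
    match now.duration_since(last_report) {
        Ok(elapsed) => elapsed >= interval,
        Err(_) => false,
    }
}

/// Checks the timestamp every `check_every`, sleeping in between to avoid
/// busy waiting, and calls `send` whenever a report is due. Never returns.
fn report_loop(interval: Duration, check_every: Duration, mut last_report: SystemTime, send: fn()) {
    loop {
        let now = SystemTime::now();
        if is_report_due(last_report, now, interval) {
            send();
            last_report = now;
        }
        thread::sleep(check_every);
    }
}
```

Under these assumptions, the daily thread would run `report_loop(DAY, DAY / 2, ...)` inside `thread::spawn`, and the monthly thread would use a longer interval with a 15-day check period.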
@zonggen
Member Author

zonggen commented Oct 24, 2019

STATUS:

  • Built on top of "systemd: add persistent state and report with different intervals" (#35). To create a daemon process, the main function forks once and the parent process terminates immediately. The child process then spawns two threads to track the time for the daily and monthly reports.
  • The two threads sleep between checks to avoid busy waiting, and check whether a new report is due every 12 hours and every 15 days for the daily and monthly reports respectively.

@zonggen
Member Author

zonggen commented Oct 28, 2019

cc @lucab @bgilbert

…time

Previously we used the individual container runtimes to extract container
information, e.g. calling `podman container ls` to count running containers.
This created two problems:
  - podman failed when trying to create `/.config/containers`, since the root
    dir is read-only and the dynamic user does not have its own home dir
  - successful calls do not make sense either, since counting the containers
    running under the dynamic user is not useful
Hence, switch to counting containers by calling `pgrep` and counting the
running processes of each container runtime.

Signed-off-by: Allen Bai <abai@redhat.com>
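
A minimal sketch of the `pgrep`-based counting this commit switches to, roughly equivalent to `pgrep <name> | wc -l`. The function names are hypothetical. The sketch relies on `pgrep` printing one PID per line and exiting with status 1 (with empty output) when nothing matches.

```rust
use std::io;
use std::process::Command;

/// Counts the PIDs in pgrep-style output: one PID per non-empty line.
fn count_pids(pgrep_stdout: &str) -> usize {
    pgrep_stdout
        .lines()
        .filter(|line| !line.trim().is_empty())
        .count()
}

/// Runs `pgrep <name>` and counts the matching processes.
/// No match means empty stdout, which counts as zero rather than an error.
fn count_processes(name: &str) -> io::Result<usize> {
    let output = Command::new("pgrep").arg(name).output()?;
    Ok(count_pids(&String::from_utf8_lossy(&output.stdout)))
}
```

With this, `count_processes("conmon")` would stand in for the podman container count, and a result greater than zero for `count_processes("dockerd")` would indicate that Docker is running.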
Previously used `StateDirectory=` to persist pinger-specific
data across boots. It turns out that systemd does not allow
sharing a state directory across different dynamic users. An
example would be:

```
$ systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh
$ systemd-run --pty --property=DynamicUser=yes --property=StateDirectory=wuff /bin/sh
```

Reference: http://0pointer.net/blog/dynamic-users-with-systemd.html
Signed-off-by: Allen Bai <abai@redhat.com>
@rfairley rfairley self-requested a review June 4, 2020 20:57
@ashcrow
Member

ashcrow commented Mar 18, 2021

Is this something we want to continue pursuing?

@bgilbert
Contributor

Maybe not; xref coreos/fedora-coreos-tracker#770.

5 participants