Make `flux logs` more lenient #3945

makkes · 2023-06-01T14:00:09Z

UX changes:

- Only print an error when a pod doesn't have a matching container, instead of exiting early.
- Return a non-zero status code when no pod is found at all.

Details:

In certain situations there might be 3rd-party pods running in the
Flux namespace that cause the command to fail to stream logs, e.g.
when they have multiple containers but none of them is called
`manager` (the container name all Flux-maintained pods use). An example
of such a situation is Flux installed through the 3rd-party Flux
extension on AKS.

The `logs` command is now more forgiving and merely logs an error in
these situations instead of bailing out completely. It still returns a
non-zero exit code.

For parallel log streaming with `-f`, the code is now a little more
complex: errors are written to stderr concurrently with all other logs,
which are written to stdout. That's what `asyncCopy` is for.

Before:

```
$ flux logs
✗ container manager is not valid for pod nginx-797bf9d444-vrhdr
$ echo $?
1
```

After:

```
$ flux logs
[...]
2023-06-01T08:05:07.860Z info Kustomization/flux-system.flux-system - Discarding event, no alerts found for the involved object
2023-06-01T08:15:08.581Z info Kustomization/flux-system.flux-system - Discarding event, no alerts found for the involved object
container manager is not valid for pod nginx-797bf9d444-tbc8m
container manager is not valid for pod nginx2-7b4b6d8d54-stdrk
2023-06-02T12:11:17.309Z info GitRepository/flux-system.flux-system - Discarding event, no alerts found for the involved object
2023-06-02T12:11:18.811Z info Kustomization/flux-system.flux-system - Discarding event, no alerts found for the involved object
[...]
✗ failed to collect logs from all Flux pods
$ echo $?
1
```

Tests

I changed how the `logs` command is tested and split the tests into unit and e2e tests. The tests that run the actual command are now part of the e2e suite, so they run against a real cluster. This lets us test more complex scenarios, and it is also necessary because the command now fails if no pod can be found to print logs from.

refs #3944

hiddeco · 2023-06-01T14:08:12Z

@makkes can you provide an example of the output? Mostly trying to see how the \n looks from a UX perspective.

makkes · 2023-06-01T16:04:12Z

> @makkes can you provide an example of the output? Mostly trying to see how the \n looks from a UX perspective.

done

stefanprodan · 2023-06-01T16:08:31Z

Can we please accommodate flux-aio. It’s a single pod, with a container per controller, the container name matches the controller name.

makkes · 2023-06-01T17:20:00Z

> Can we please accommodate flux-aio. It’s a single pod, with a container per controller, the container name matches the controller name.

One way we could solve this is to print the logs of all containers of a matching pod. In #2844, though, @hiddeco deliberately chose to show only the logs from the `manager` container. I'm not sure whether listing all containers' logs was ever on the table and, if it was, why it was discarded.

stefanprodan · 2023-06-01T17:33:03Z

We can’t print the logs of arbitrary containers, as we’d end up with Envoy/Linkerd/nginx sidecars if people run Flux in a service mesh for mTLS. What I suggest is to look, in addition to `manager`, for the controller names, which are already declared in flux2 manifestgen.

hiddeco · 2023-06-01T19:50:16Z

Shouldn't we first put the project under the Flux umbrella before facilitating changes for an experimental and semi-official project?

stefanprodan · 2023-06-02T05:44:19Z

> Shouldn't we first put the project under the Flux umbrella before facilitating changes for an experimental and semi-official project?

OK, fair point. I can document why `flux logs` is the only command that doesn’t work with aio.

Regarding Azure, I could say the same thing: they shouldn’t label something `part-of: flux` if it’s not under the Flux org. They could use some other value, like `azure-flux`, and that would make the current `logs` command work.

hiddeco

If we fail to recognize any Deployment and/or Pod at all, I would expect this to still return an error (and a non-zero exit code). At present, however, on a freshly bootstrapped cluster (with neither a flux-system namespace nor any components installed), it returns:

```
$ flux logs
$ echo $?
0
```

makkes · 2023-06-02T08:03:25Z

Ok, let's settle on a way forward. The options we have:

1. Leave the code as it is and ask 3rd-party components such as the Azure Flux extension to remove the `app.kubernetes.io/part-of` label or at least change its value. This will leave `flux logs` unusable with flux-aio (which technically is not (yet) a part of Flux).
2. Make the command more lenient as proposed in this PR, making it compatible with the Azure Flux extension.

If we opt for option 2, then we still need to decide whether we want to add extra code to handle flux-aio. I'm on the fence about this because I don't know how many users actually use flux-aio at this point. Given it's not part of the Flux project, I'd personally prefer to handle flux-aio only once it has been made part of the Flux project (or at least fluxcd-community).

hiddeco · 2023-06-02T08:38:25Z

As I understand it, both Stefan and I agree that it should be option 2, but without taking flux-aio into account at this moment. Because we can't fully control whether people and/or projects follow this, the command should still be lenient.

However, I would still in some way like to see #3945 (review) covered.

makkes · 2023-06-02T09:03:22Z

> As I understand it, both Stefan and I agree that it should be option 2. But without taking flux-aio into account at this moment. Because we don't have full control over people and/or projects not doing this, it should still be lenient.

@stefanprodan said

> Regarding Azure, I could say the same thing, they shouldn’t label something as part-of: flux that’s not under Flux org. They could use some other value like azure-flux and that would make the current log command work.

That's why I brought option 1 up to begin with.

hiddeco · 2023-06-02T09:11:51Z

I think failing as hard as we do on misconfigurations, and thereby breaking the primary feature the command is meant to offer, is bad from a UX perspective in any case. That makes this change unarguably better than what is offered now.

makkes · 2023-06-02T09:21:28Z

Ok good. I'll address the remaining comment and ask for another round of reviews.

hiddeco

Much better overall UX, thanks a lot @makkes 💯

cmd/flux/logs.go

Signed-off-by: Max Jonas Werner <mail@makk.es>

makkes requested review from somtochiama and hiddeco June 1, 2023 14:00

makkes self-assigned this Jun 1, 2023

makkes added the area/UX label Jun 1, 2023

makkes force-pushed the lenient-logs-cmd branch from 1d0a089 to ec6ef63 June 1, 2023 14:06

hiddeco reviewed Jun 2, 2023

makkes force-pushed the lenient-logs-cmd branch from ec6ef63 to c8ec5a7 June 2, 2023 12:25

makkes requested a review from hiddeco June 2, 2023 12:32

makkes force-pushed the lenient-logs-cmd branch 2 times, most recently from 7735e7a to 926a4ce June 2, 2023 16:25

hiddeco approved these changes Jun 5, 2023

makkes force-pushed the lenient-logs-cmd branch 2 times, most recently from 6b8afc9 to cbdd71e June 5, 2023 08:07

makkes merged commit a3f2b1d into main Jun 5, 2023

makkes deleted the lenient-logs-cmd branch June 5, 2023 08:19
