Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CFP-25694: Move cilium/cilium-cli code into cilium/cilium repository #9

Merged
merged 5 commits into from
Aug 16, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
169 changes: 169 additions & 0 deletions cilium/CFP-25694-move-cilium-cli-to-cilium-repo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,169 @@
# CFP-25694: Move cilium/cilium-cli code into cilium/cilium repository

**Code Name: Mega Ketchup**

**SIG: [sig-cilium-cli]**

**Begin Design Discussion:** 2023-05-25

**Cilium Release:** 1.15

**Authors:** Michi Mutsuzaki `<michi@isovalent.com>`
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved

**Status:** Implementable

## Summary

Move [cilium/cilium-cli] code into [cilium/cilium] repository.

## Motivation

The main motivation for moving [cilium/cilium-cli] code into [cilium/cilium]
repository is to minimize the feature development overhead for adding new tests
joestringer marked this conversation as resolved.
Show resolved Hide resolved
to `cilium connectivity test`.

Currently, adding a new test to `cilium connectivity test` requires the following
steps:

- Open a pull request against [cilium/cilium-cli] repo to add a new test, and
wait for the CI build to finish.
- Modify GitHub workflows files to pick up the CI version of `cilium-cli` in
your pull request against [cilium/cilium] repo.
- Once the new tests passes, undo the change in the workflow files.
- Get these pull requests reveiewed and merged.
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved

Even after all these steps, the new test does not become a part of the regular
CI until the [sig-cilium-cli] team releases a new version of `cilium-cli` that
includes the test, and Renovate updates GitHub wokflow files to pick up the new
`cilium-cli` version. This has been a major source of frustration for some Cilium
contributors.
Comment on lines +35 to +39
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cilium-CLI for PRs uses a helm chart and docker images to deploy Cilium CLI into the cluster, then runs it from there right? Looking at this.

Could we model the cilium/cilium CI like this instead, and autopublish CLI images for every PR or every main commit... then provide a way for cilium/cilium PRs to point to those CLI dev images?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back, I guess there's two cases, right? There's:

  1. Developing new e2e tests
  2. Updating Cilium to use the latest CLI

For (1), it's helpful to be able to point arbitrary cilium/cilium PRs towards arbitrary cilium/cilium-cli versions, so that you can co-develop the tests and functionality. Some background here, I would hope that 80%+ of the coverage is already provided by unit tests and module tests in cilium/cilium, so the test changes in cilium-CLI are for the remaining e2e & integration testing. Those tests should be able to autodetect whatever functionality they need in the cluster so that they can be skipped when running against older Cilium versions that don't have the requisite support. While we do need to integrate such tests into cilium/cilium CLI, I am not sure I necessarily see the need for this to be tightly coupled with the implementation that goes into cilium/cilium.

For (2), the existing process to release the new version and rely on renovate seems reasonable enough to me. If we think it needs to happen faster, then someone can post the PR directly instead of renovate, or otherwise we'll get to it when we get to it (and take advantage of the automation that removes some of the manual work there).

Copy link
Member

@joestringer joestringer Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My recollection of the origins of Cilium CLI in general was that it's intended to focus on the user experience in order to simplify installing, observing, maintaining live Cilium clusters. Tools like cilium connectivity test were conceived as a sanity check for user environments so we can quickly see whether a prod environment is healthy or not.

Over time, given that this framework provided mechanisms to develop new tests without the baggage of ginkgo and jenkins, I think it became an attractive place for Cilium developers to expand testing. Now we're at several minutes to assess the cluster sanity, and it's doing a lot of advanced feature testing. We're talking about reconfiguring the target environment to perform some of this testing, and starting to hide such "extra test" options behind special flags to avoid impacting prod environments.

I have a bit of concern that over time we're losing sight of the original UX-focused goals of Cilium-CLI and turning it into the Cilium developer CI tool instead. On the way on that path, we'll simplify some Cilium developer workflows (ideal) but also potentially raise the barrier to entry for cilium-cli contribution (not ideal).

Should the tests for advanced Cilium functionality exist in cilium/cilium or cilium/cilium-cli? Well, probably cilium/cilium. Does that mean that all of cilium-cli should then go into cilium/cilium? I'm not so sure. I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e. Even if we do need e2e tests for some of the new functionality, is the user-facing tool the right place to implement a lot of that advanced testing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we model the cilium/cilium CI like this instead, and autopublish CLI images for every PR or every main commit... then provide a way for cilium/cilium PRs to point to those CLI dev images?

we already do that, sort of. you can specify a commit SHA of cilium-cli here: https://github.com/cilium/cilium/blob/6d4b2f7092d634f67c6192c85143051c49846245/.github/workflows/conformance-aks.yaml#L74. you just need to edit these workflow files. i guess some people find this step more annoying than others.

I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e. Even if we do need e2e tests for some of the new functionality, is the user-facing tool the right place to implement a lot of that advanced testing?

i think this is the key question. cilium connectivity test unintentionally became "the" place to add e2e tests. maybe that's where things started going wrong 🔥 cc @brb.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gandro had another idea: use cilium-cli's Hook interface https://github.com/cilium/cilium-cli/blob/main/internal/cli/cmd/hooks.go. this might make more sense if we want to keep cilium-cli relatively lean, and at the same time let cilium developers use the same cilium connectivity test framework to add advanced e2e tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then provide a way for cilium/cilium PRs to point to those CLI dev images?

I added this - cilium/cilium#25228. However, bunch of workflows # pull_request: {} need to be uncommented though to make changes effective.

Copy link
Member

@brb brb Jun 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cilium connectivity test unintentionally became "the" place to add e2e tests.

We started adding more tests to the CLI connectivity suite, because otherwise we would have reinvented the CLI connectivity suite for e2e tests (as part of the Hyper Jump project). There was an attempt to introduce the K8s E2E testing fw, but we quickly realized that it's an tremendous effort, and the CLI already provides all the required constructs. Also, the CLI connectivity suite was executed by different GH workflows running in different environments. So adding a new test meant that it will be executed e.g., on EKS.

To address the long runtime concern - we will add a flag to prevent from inclusion more advanced tests which take time. However, I find it useful that users have means to do an extensive testing in their clusters.

I'd hope that most testing can be done in unit or module / cell level tests directly in the cilium/cilium tree, rather than being e2e.

Unfortunately, it's not possible for many features which involve interactions between many subsystems. I doubt that having Egress GW tests in a form of unit / control-plane would give us enough confidence.

In the end I am fine with any approach re cilium-cli (i.e., merge or having a Hook), as longs as test files live in cilium/cilium, and thus get executed for each PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at least once a week cilium/cilium would be synced with the most recent CLI revision

Unfortunately, everything which gets tested after a merge usually involves a few folks spending their cycles instead of a single PR owner.


## Goals

* Move [cilium/cilium-cli] code into [cilium/cilium] main branch under `cli/`
directory. This enables Cilium contributors to implement new features with
tests in a single pull request against [cilium/cilium] repository.
* Establish a backporting process for changes inside `cli/` directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of the backporting process?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are more details in "Impact: Backporting to stable branches" section. let me know if it makes sense 👀

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed the since in the middle there. I think the goal in that case is "Continue to test older Cilium versions against new versions of the CLI". The extra backporting overhead is a logical consequence of the proposal.

* Avoid disrupting the cilium-cli development and release processes as much as
possible.

## Non-Goals

* Rename the name of the `cilium-cli` binary from `cilium` to something else.
* Release `cilium-cli` as a part of Cilium releases.

## Proposal

This proposal consists of three steps to move [cilium/cilium-cli] code into
[cilium/cilium]:

### 1. Move [cilium/cilium-cli] code to [cilium/cilium] under `cli/` directory

I'm assuming that you want to preserve the Git history of [cilium/cilium-cli]. See
[michi-covalent/cilium] on how the code migration might look like. Squashing
these commits into a single commit is another option.

### 2. Make [cilium/cilium-cli] code a part of `github.com/cilium/cilium` Go module

Merge [cilium/cilium-cli] code into `github.com/cilium/cilium` Go module by
removing `cli/go.mod`. This allows [cilium/cilium-cli] code to depend on
[cilium/cilium] code in the same tree.

This is **not** to suggest `cilium-cli` to be tied to a specific `cilium`
version. The goal of this proposal is to make it easy for contributors to add
tests against the latest Cilium code. Continue to maintain the backward
compatibility of `cilium-cli` with respect to `cilium` versions.
joestringer marked this conversation as resolved.
Show resolved Hide resolved

### 3. Release `cilium-cli` from [cilium/cilium-cli] repository

We do not need to change the release process / cadence to accomplish the goals
defined in this CFP. Continue releasing `cilium-cli` from [cilium/cilium-cli] at
its own cadence using its own versioning that is independent of [cilium/cilium]
versioning as you have been doing in [cilium/cilium-cli] repository.

See [michi-covalent/cilium-cli] as an example of how [cilium/cilium-cli] repo
might look like after the code migration. Basically it contains:
Comment on lines +84 to +85
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option could just be a git mirror for the cli/ directory, something like how https://github.com/libbpf/libbpf mirrors in from the Linux kernel tree.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah @tklauser also suggested something similar.

it's a bit more complicated for Go modules because we need to modify all the import paths from github.com/cilium/cilium/cli back to github.com/cilium/cilium-cli.


- [`go.mod`](https://github.com/michi-covalent/cilium-cli/blob/865cac4f148ce88cd04d99f8ecfe61a0ae4f645f/go.mod)
- A simple [`main.go`](https://github.com/michi-covalent/cilium-cli/blob/865cac4f148ce88cd04d99f8ecfe61a0ae4f645f/main.go)
that calls [`NewCiliumCommand`](https://github.com/cilium/cilium-cli/blob/44ae1874fae4544c0db34dac89c11e37365b76ef/cli/cmd.go#L27)
- Release [tags](https://github.com/cilium/cilium-cli/tags) and [artifacts](https://github.com/cilium/cilium-cli/releases)

## Impacts / Key Questions

### Impact: Backporting to stable branches

Starting in `v1.15`, you need to backport bug fixes related to
`cilium connectivity test` command from `main` branch to affected stable
branches since these stable branches use their own copy of `cli/` to run
`cilium connectivity test`. Apply the same [backporting process] as the rest of
the Cilium codebase.

### Impact: Module name change

This proposal impacts users who depend on [cilium/cilium-cli] as a library.

- The module path needs to be updated from `github.com/cilium/cilium-cli` to
`github.com/cilium/cilium/cli`.
- Instead of depending on a version of `github.com/cilium/cilium-cli` (`v0.14.5`
for example), you need to inspect `go.mod` of a specific `github.com/cilium/cilium-cli`
version, and transitively find the `github.com/cilium/cilium/cli` version to
depend on.
- Alternatively, depend on a particular `github.com/cilium/cilium` version without
inspecting `go.mod` file in `github.com/cilium/cilium-cli`.

If you must retain the Go module name, you need to copy the `cilium-cli` code
back to [cilium/cilium-cli] repo and modify all the import paths back to
`github.com/cilium/cilium-cli` for each release..
See https://github.com/cilium/design-cfps/pull/9#discussion_r1211996796 for details.

### Key Question: How do you run CI for CLI? More specifically, how do you ensure that changes to Cilium CLI remain compatible with older released versions of Cilium?

Here is the list of GitHub workflows that currently run for each pull requests in [cilium/cilium-cli]
using the latest version of Cilium:

- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/aks-byocni.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/eks-tunnel.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/eks.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/externalworkloads.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/gke.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/kind.yaml
- https://github.com/cilium/cilium-cli/blob/main/.github/workflows/multicluster.yaml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which version of Cilium CLI will these tests use?

Note there may also be security considerations particularly for cloud credential cases. I believe we have mitigations in place for this threat model already in cilium-cli which we may need to import.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests build cilium-cli from the tree.


[cilium/cilium] has a similar coverage using CI images instead of released images,
so it's probably an overkill to port all of these workflows to [cilium/cilium]. My
proposal is to move https://github.com/cilium/cilium-cli/blob/main/.github/workflows/kind.yaml
to [cilium/cilium], and run this workflow for pull requests that modify files
under `cli/` directory. This ensures that any changes under `cli/` directory get
tested against a released version of Cilium.

## Alternative approach: Extend connectivity test from cilium/cilium repo
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at cilium/cilium-cli#1683 as an example of extending cilium connectivity test, i'm not sure how well this approach works in practice. maybe the hook interface needs to be extended to support stuff like:

  • add new flags to cilium connectivity test command
  • extend the setup logic


Sebastian Wicki suggests another approach to extend `cilium connectivity test`
command from [cilium/cilium] repo using cilium-cli's [Hook interface]. This
approach involves two steps:

### Step 1: Implement the Hook interface in cilium/cilium repo.

See [this commit](https://github.com/cilium/cilium/commit/26526a1890de6e05ca58d37b5a2da5822a3f22f0)
as an example for how to implement the [Hook interface]. This enables you to
extend `cilium connectivity test` without having to migrate the entire [cilium/cilium-cli]
code base into [cilium/cilium] repo. Then, build a local version of cilium-cli
with the additional hook during the CI.

### Step 2: Import the hook back in cilium/cilium-cli repo.

michi-covalent marked this conversation as resolved.
Show resolved Hide resolved
michi-covalent marked this conversation as resolved.
Show resolved Hide resolved
Import the hook from the previous step in [cilium/cilium-cli] repo so that
additional tests in [cilium/cilium] repo get included in upstream cilium-cli
releases. See [this commit](https://github.com/cilium/cilium-cli/commit/7b474e61be358fb0b9f2cd6fd075ba843b5d78f5)
as an example.

[sig-cilium-cli]: https://github.com/orgs/cilium/teams/sig-cilium-cli
[cilium/cilium]: https://github.com/cilium/cilium
[cilium/cilium-cli]: https://github.com/cilium/cilium-cli
[michi-covalent/cilium]: https://github.com/michi-covalent/cilium/pull/169
[michi-covalent/cilium-cli]: https://github.com/michi-covalent/cilium-cli/tree/cli-test
[backporting process]: https://docs.cilium.io/en/stable/contributing/release/backports/
[kubernetes/kubernetes staging directory]: https://github.com/kubernetes/kubernetes/tree/master/staging/
[apimachinery]: https://github.com/kubernetes/apimachinery
[Hook interface]: https://github.com/cilium/cilium-cli/blob/4a6cb76243704f96dfc02ff312b57e4d0ced0d84/internal/cli/cmd/hooks.go#L13-L17