
Add script for migrating from other CNI to Antrea #5677

Merged: 1 commit into antrea-io:main, Jan 16, 2024


@hjiajing (Contributor) commented Nov 8, 2023

A new image, "antrea-migrator", is responsible for restarting all Pods on each Node. It runs as a DaemonSet that kills Pod sandboxes so that the Pods are restarted in-place. In this way, a cluster can be migrated from its old CNI to Antrea.
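To illustrate which Pods the migrator targets, here is a minimal Go sketch of the selection step only. It assumes a hypothetical, simplified `Sandbox` type; the real migrator works against the container runtime on each Node, which is not modeled here. As the later review discussion clarifies, only non-hostNetwork Pods need restarting, because hostNetwork Pods were never attached to a CNI network.

```go
package main

import "fmt"

// Sandbox is a hypothetical, simplified view of a Pod sandbox on a Node.
// It only carries the fields needed for the selection logic below.
type Sandbox struct {
	ID          string
	PodName     string
	HostNetwork bool
}

// sandboxesToRestart returns the IDs of sandboxes whose Pods use the CNI
// network. hostNetwork Pods share the Node's network namespace, so the old
// CNI never configured them and restarting them would change nothing.
func sandboxesToRestart(sandboxes []Sandbox) []string {
	var ids []string
	for _, s := range sandboxes {
		if s.HostNetwork {
			continue // skip Pods that are not on the CNI network
		}
		ids = append(ids, s.ID)
	}
	return ids
}

func main() {
	sandboxes := []Sandbox{
		{ID: "a1", PodName: "coredns-abc", HostNetwork: false},
		{ID: "b2", PodName: "kube-proxy-xyz", HostNetwork: true},
		{ID: "c3", PodName: "migrate-example-6d6b97f96b-jpflg", HostNetwork: false},
	}
	fmt.Println(sandboxesToRestart(sandboxes)) // prints [a1 c3]
}
```

The Pod names and sandbox IDs in `main` are made up for the example.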

@hjiajing hjiajing force-pushed the antrea-migration branch 4 times, most recently from 294746a to 18d73e3 Compare November 8, 2023 06:29
@luolanzone luolanzone added the action/release-note Indicates a PR that should be included in release notes. label Nov 8, 2023
@luolanzone luolanzone added this to the Antrea v1.15 release milestone Nov 8, 2023
@luolanzone (Contributor) commented:

Please update the title and comment: s/scrtip/script/
Link this PR to the issue. You should add more description to this PR to clarify what this tool does.

@hjiajing hjiajing linked an issue Nov 8, 2023 that may be closed by this pull request
@hjiajing (Contributor Author) commented Nov 8, 2023

> Please update the title and comment: s/scrtip/script/ Link this PR to the issue. You should add more description to this PR to clarify what this tool does.

Sure. I will add more descriptions for the antctl migrate tool.

@hjiajing hjiajing changed the title [WIP] Add scritp for migrating from other CNI to Antrea [WIP] Add script for migrating from other CNI to Antrea Nov 8, 2023
@hjiajing hjiajing force-pushed the antrea-migration branch 9 times, most recently from fbc3a98 to 6356dbe Compare November 9, 2023 08:30
@hjiajing hjiajing changed the title [WIP] Add script for migrating from other CNI to Antrea Add script for migrating from other CNI to Antrea Nov 10, 2023
@hjiajing hjiajing force-pushed the antrea-migration branch 2 times, most recently from ccd7f28 to b288390 Compare November 12, 2023 04:43
build/images/scripts/restart_sandbox (outdated, resolved)
build/yamls/antrea-migrator.yml (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
pkg/antctl/raw/migrate/convertnetworkpolicy/command.go (outdated, resolved)
pkg/antctl/raw/migrate/convertnetworkpolicy/command.go (outdated, resolved)
pkg/antctl/raw/migrate/convertnetworkpolicy/command.go (outdated, resolved)

if len(rule) != 0 {
antreaRules = append(antreaRules,
v1beta1.Rule{Action: migrate.ToPtr(v1beta1.RuleActionReject)})
Contributor

I don't get this: why is a Reject rule added?

Contributor Author

In Calico NetworkPolicy, once an NP is applied to some Pods, their ingress and egress traffic is rejected except for what the Allow rules in that NP permit, so I added a default Reject rule.

pkg/antctl/raw/migrate/convertnetworkpolicy/command.go (outdated, resolved)
@@ -1,4 +1,4 @@
-// Copyright 2021 Antrea Authors
+// Copyright 2023 Antrea Authors
Contributor

Why is this changed?

Contributor Author

I reformatted this file accidentally, which sorted its imports. After that, the make code-gen command edits this file every time, even if I undo the formatting. Maybe I could edit the year manually.

Contributor

Not sure what kind of reformat you are referring to, but I feel it's better to avoid this change if there is no code change to the CRD.

pkg/antctl/raw/migrate/convertnetworkpolicy/command.go (outdated, resolved)
@hjiajing hjiajing force-pushed the antrea-migration branch 4 times, most recently from 904508a to 370b89c Compare November 15, 2023 06:27
build/images/scripts/migrate_cni (outdated, resolved)
build/images/scripts/migrate_cni (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)

The migration process is divided into three steps:

1. Clean up old CNI.
Contributor

the old CNI

Contributor Author

Added "the".

Contributor

You probably missed this one.

Contributor Author

Sorry, thanks for the reminder. Addressed.

docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
@hjiajing hjiajing force-pushed the antrea-migration branch 2 times, most recently from 7d5bc4e to 266531a Compare January 5, 2024 02:15
Comment on lines 13 to 30
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create
- apiGroups:
  - ""
  resources:
  - pods/exec
  verbs:
  - create
- apiGroups:
  - "apps/v1"
  resources:
  - daemonsets
  verbs:
  - create
Member

Why does it require these permissions?

Contributor Author

Removed these rules.

build/images/Dockerfile.build.migrator (outdated, resolved)
build/images/Dockerfile.build.migrator (outdated, resolved)
build/images/scripts/migrate_cni (outdated, resolved)
build/images/scripts/migrate_cni (outdated, resolved)
Comment on lines 47 to 49
After Antrea is installed in the cluster, the next step is to restart all
Pods in the cluster in-place by the following command. This step will create
a DaemonSet in the cluster, which will restart all Pods in the cluster in-place.
@tnqn (Member) commented Jan 8, 2024

After Antrea is up and running, you can now deploy the Antrea migrator by the following command. The migrator runs as a DaemonSet, antrea-migrator, in the cluster, which will restart all non-hostNetwork Pods in the cluster in-place and perform necessary network resource cleanup.

Contributor Author

Done.

Comment on lines 56 to 62
network management and IPAM from the old CNI. In order to avoid the Pods
being rescheduled, we restart all Pods in-place by deploying a DaemonSet
named `antrea-migrator`, which will run a Pod on each Node. The
`antrea-migrator` Pod will stop all containerd tasks of Pods with CNI network on
each Node, and the containerd tasks will be restarted by the containerd
service. In this way, all Pods in the cluster will not be rescheduled but will
be restarted in-place.
Member

Suggested change
-network management and IPAM from the old CNI. In order to avoid the Pods
-being rescheduled, we restart all Pods in-place by deploying a DaemonSet
-named `antrea-migrator`, which will run a Pod on each Node. The
-`antrea-migrator` Pod will stop all containerd tasks of Pods with CNI network on
-each Node, and the containerd tasks will be restarted by the containerd
-service. In this way, all Pods in the cluster will not be rescheduled but will
-be restarted in-place.
+network management and IPAM from the old CNI. In order to avoid the Pods
+being rescheduled and minimize service downtime, the migrator restarts
+all non-hostNetwork Pods in-place by restarting their sandbox container.
+Therefore, it's expected to see these Pods' `RESTARTS` being increased
+by 1 like below:

Contributor Author

Done.

docs/migrate-to-antrea.md (resolved)
@@ -107,6 +107,8 @@ export IMG_TAG=$VERSION
export IMG_NAME=projects.registry.vmware.com/antrea/antrea-ubuntu
./hack/generate-standard-manifests.sh --mode release --out "$OUTPUT_DIR"

sed "s|antrea\/antrea-migrator:latest|$(echo $IMG_NAME | sed "s/ubuntu/migrator/"):$VERSION|g" ./build/yamls/antrea-migrator.yml > "$OUTPUT_DIR"/antrea-migrator.yml
Member

ditto

Contributor Author

Removed this line since we do not need to add it to the release assets.

a DaemonSet in the cluster, which will restart all Pods in the cluster in-place.

```bash
$ kubectl apply -f https://github.com/antrea-io/antrea/releases/download/v1.15.0/antrea-migrator.yml
```
Member

Suggested change
-$ kubectl apply -f https://github.com/antrea-io/antrea/releases/download/v1.15.0/antrea-migrator.yml
+$ kubectl apply -f https://github.com/antrea-io/antrea/main/build/yamls/antrea-migrator.yml

Like antrea-aks-node-init.yml, this resource is version-generic and case-specific, so I think we don't need to add it to the release assets.

Contributor Author

Changed to main/build/yamls.

@luolanzone (Contributor) left a comment:

A few nits

@@ -0,0 +1,33 @@
# Copyright 2023 Antrea Authors
Contributor

2024

Contributor Author

Changed to 2024.

build/images/scripts/migrate_cni (outdated, resolved)

After Antrea is installed in the cluster, the next step is to restart all
Pods in the cluster in-place by the following command. This step will create
a DaemonSet in the cluster, which will restart all Pods in the cluster in-place.
Contributor

Maybe highlight that antrea-migrator is supported since v1.15.0.

Contributor Author

Done. A note about the version has been added at the beginning of this doc.

@@ -107,6 +107,8 @@ export IMG_TAG=$VERSION
export IMG_NAME=projects.registry.vmware.com/antrea/antrea-ubuntu
./hack/generate-standard-manifests.sh --mode release --out "$OUTPUT_DIR"

sed "s|antrea\/antrea-migrator:latest|$(echo $IMG_NAME | sed "s/ubuntu/migrator/"):$VERSION|g" ./build/yamls/antrea-migrator.yml > "$OUTPUT_DIR"/antrea-migrator.yml
Contributor

Could you use "projects.registry.vmware.com/antrea/antrea-migrator:$VERSION" directly to replace antrea-migrator:latest? There is an ongoing change, #5794, to split the agent and controller images. It may be better to have your own image string in case there is a conflict.

Contributor Author

Removed this line since we do not need to add it to the release assets.

@tnqn (Member) left a comment:

Overall LGTM

build/yamls/antrea-migrator.yml (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (outdated, resolved)
```
migrate-example-6d6b97f96b-jpflg 1/1 Running 1 (23s ago) 2m5s 10.10.1.5 test-worker <none> <none>
```

When we meet the condition that all Pods of antrea-migrator are in `Running`
Contributor

antrea-migrator -> `antrea-migrator`

Please change all occurrences.

Contributor Author

Fixed, as well as the other occurrences.

The reason for restarting all Pods is that Antrea needs to take over the
network management and IPAM from the old CNI. In order to avoid the Pods
being rescheduled and minimize service downtime, the migrator restarts
all non-hostNetwork Pods in-place by restarting their sandbox container.
Contributor

container -> containers

Contributor Author

Fixed.

```
migrate-example-6d6b97f96b-jpflg 1/1 Running 1 (23s ago) 2m5s 10.10.1.5 test-worker <none> <none>
```

When we meet the condition that all Pods of antrea-migrator are in `Running`
Contributor

"When the antrea-migrator Pods on all Nodes are"?

Contributor Author

Addressed.


When we meet the condition that all Pods of antrea-migrator are in `Running`
state, the migration process is completed. You can then remove the antrea-migrator
DaemonSet safely by the following command:
Contributor

by -> with

Contributor Author

Addressed.

@luolanzone (Contributor) left a comment:

Two nits.
BTW, your DCO check failed; please check whether your sign-off info is missing.

docs/migrate-to-antrea.md (outdated, resolved)
docs/migrate-to-antrea.md (resolved)
@hjiajing hjiajing force-pushed the antrea-migration branch 2 times, most recently from 9c8fd50 to 19c95cd Compare January 10, 2024 03:21
docs/migrate-to-antrea.md (outdated, resolved)

@tnqn previously approved these changes Jan 11, 2024

@tnqn (Member) left a comment:

LGTM

@luolanzone previously approved these changes Jan 11, 2024

@luolanzone (Contributor) left a comment:

LGTM, one nit.

docs/migrate-to-antrea.md (outdated, resolved)

@luolanzone previously approved these changes Jan 12, 2024
@tnqn (Member) commented Jan 12, 2024

@jianjuns @antoninbas let me know if you will take another look, thanks.

@jianjuns (Contributor) commented:

I have no extra comment.

@antoninbas (Contributor) left a comment:

A couple of nits, otherwise LGTM.

Comment on lines 58 to 60
command:
- "sleep"
- "infinity"
Contributor

nit: it's nicer to default to the pause command in your container image.
The pause command handles signals well, unlike sleep. Deleting Pods where a container uses sleep infinity can take 30s (the default grace period, IIRC).
See https://github.com/antrea-io/image-utils/blob/25f87067c80ab22c0b8574e1bb13cd859f669707/images/toolbox/Dockerfile#L60-L62

Contributor Author

/pause has been added to the image, and the command has been removed from the YAML file.

Comment on lines 6 to 7
NOTE: The following is a reference list of CNIs and versions we have already
verified the migration process. CNIs and versions that are not listed here
Contributor

The following is a reference list of CNIs and versions for which we have verified the migration process.

Contributor Author

Done.

network management and IPAM from the old CNI. In order to avoid the Pods
being rescheduled and minimize service downtime, the migrator restarts
all non-hostNetwork Pods in-place by restarting their sandbox containers.
Therefore, it's expected to see these Pods' `RESTARTS` being increased
Contributor

it's expected to see the RESTARTS count for these Pods being increased

Contributor Author

Done.

Add a YAML file antrea-migrator.yml to migrate clusters
with other CNIs to Antrea. It will restart all Pods in-place.

A new image "antrea-migrator" is responsible for restarting
all Pods on each Node. It is a DaemonSet that tries to kill
sandboxes to restart the Pods in-place.

Signed-off-by: hjiajing <hjiajing@vmware.com>
@antoninbas (Contributor) commented:

/skip-all

@antoninbas antoninbas merged commit e469bd9 into antrea-io:main Jan 16, 2024
48 of 54 checks passed
Labels
action/release-note Indicates a PR that should be included in release notes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A proposal for CNI migration from Calico to Antrea
6 participants