Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support deploying one FRR container in Kind network #6488

Merged
merged 1 commit into from
Jul 22, 2024

Conversation

hongliangl
Copy link
Contributor

@hongliangl hongliangl commented Jun 26, 2024

This commit introduces a new parameter --deploy-external-frr
in ci/kind/kind-setup.sh, enabling to deploy one FRR container in
the Kind cluster network. This serves as the foundation for running
BGPPolicy e2e tests.

ci/kind/kind-setup.sh Outdated Show resolved Hide resolved
ci/kind/test-e2e-kind.sh Outdated Show resolved Hide resolved
ci/kind/test-e2e-kind.sh Outdated Show resolved Hide resolved
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How will we ensure that normal Pods used for testing (toolbox, agnhost, nginx, ...) are not scheduled on this extra Node? I guess it will never be Ready and so Pods won't be scheduled there unless they have the appropriate toleration?

Unless I am mistaken, with this change we effectively go from a 3-Node K8s cluster for testing to a 4-Node K8s cluster? This is not an insignificant change as the Github runners have limited resources and an extra Node may put a strain on these resources. Would like @tnqn to review this PR as well.

ci/kind/test-e2e-kind.sh Outdated Show resolved Hide resolved
@hongliangl
Copy link
Contributor Author

hongliangl commented Jun 28, 2024

How will we ensure that normal Pods used for testing (toolbox, agnhost, nginx, ...) are not scheduled on this extra Node? I guess it will never be Ready and so Pods won't be scheduled there unless they have the appropriate toleration?

Unless I am mistaken, with this change we effectively go from a 3-Node K8s cluster for testing to a 4-Node K8s cluster? This is not an insignificant change as the Github runners have limited resources and an extra Node may put a strain on these resources. Would like @tnqn to review this PR as well.

I think we should only enable this option in the test that enables all features. I am planning to use it to test BGP only (install a host network FRR Pod on the Node without Antrea to mock remote BGP router).

@tnqn
Copy link
Member

tnqn commented Jun 28, 2024

How will we ensure that normal Pods used for testing (toolbox, agnhost, nginx, ...) are not scheduled on this extra Node? I guess it will never be Ready and so Pods won't be scheduled there unless they have the appropriate toleration?
Unless I am mistaken, with this change we effectively go from a 3-Node K8s cluster for testing to a 4-Node K8s cluster? This is not an insignificant change as the Github runners have limited resources and an extra Node may put a strain on these resources. Would like @tnqn to review this PR as well.

I think we should only enable this option in the test that enables all features. I am planning to use it to test BGP only (install a host network FRR Pod on the Node without Antrea to mock remote BGP router).

I have the same concern as @antoninbas. In my understanding the test just needs a BGP router, right? Why it needs a NotReady worker Node to run it, instead of just running the router directly via docker run like the existing "external-server"? we could even add the functionaliy to the existing container. The current approach that sets up another node introduces more code to deal with Node configuration, Pod affinity, overhead to run kubelet, containerd on that node, and the topology may look confusing as NotReady Node normally means something wrong.

@hongliangl
Copy link
Contributor Author

hongliangl commented Jun 28, 2024

How will we ensure that normal Pods used for testing (toolbox, agnhost, nginx, ...) are not scheduled on this extra Node? I guess it will never be Ready and so Pods won't be scheduled there unless they have the appropriate toleration?
Unless I am mistaken, with this change we effectively go from a 3-Node K8s cluster for testing to a 4-Node K8s cluster? This is not an insignificant change as the Github runners have limited resources and an extra Node may put a strain on these resources. Would like @tnqn to review this PR as well.

I think we should only enable this option in the test that enables all features. I am planning to use it to test BGP only (install a host network FRR Pod on the Node without Antrea to mock remote BGP router).

I have the same concern as @antoninbas. In my understanding the test just needs a BGP router, right? Why it needs a NotReady worker Node to run it, instead of just running the router directly via docker run like the existing "external-server"? we could even add the functionaliy to the existing container. The current approach that sets up another node introduces more code to deal with Node configuration, Pod affinity, overhead to run kubelet, containerd on that node, and the topology may look confusing as NotReady Node normally means something wrong.

Got that. If so, I will try to enhance the current external server to mock BGP server. I think we need to add some framework code to operate the external server, for example:

  • Install FRR
  • start FRR BGP
  • Add BGP peer, and the peer is the k8s Nodes selected by the test BGPPolicy.
  • Run wget/curl to verify the advertised routes from k8s cluster.

Maybe more details should be considered. How do you think about this proposal? @tnqn @antoninbas

@tnqn
Copy link
Member

tnqn commented Jun 28, 2024

Got that. If so, I will try to enhance the current external server to mock BGP server. I think we need to add some framework code to operate the external server, for example:

  • Install FRR
  • start FRR BGP
  • Add BGP peer, and the peer is the k8s Nodes selected by the test BGPPolicy.
  • Run wget/curl to verify the advertised routes from k8s cluster.

Maybe more details should be considered. How do you think about this proposal? @tnqn @antoninbas

Not sure what you mean by framework code. In my mind you just need an image for 1, customize docker run command to start BGP for 2 and 3, figure out a way to validate 4 in tests.

@hongliangl
Copy link
Contributor Author

Got that. If so, I will try to enhance the current external server to mock BGP server. I think we need to add some framework code to operate the external server, for example:

  • Install FRR
  • start FRR BGP
  • Add BGP peer, and the peer is the k8s Nodes selected by the test BGPPolicy.
  • Run wget/curl to verify the advertised routes from k8s cluster.

Maybe more details should be considered. How do you think about this proposal? @tnqn @antoninbas

Not sure what you mean by framework code. In my mind you just need an image for 1, customize docker run command to start BGP for 2 and 3, figure out a way to validate 4 in tests.

I didn't make it clear. I just meant that I may need to add some fundamental functions because I'm not sure if we have some existing convenient functions to configure the external server you added. @tnqn

@tnqn
Copy link
Member

tnqn commented Jun 28, 2024

I didn't make it clear. I just meant that I may need to add some fundamental functions because I'm not sure if we have some existing convenient functions to configure the external server you added. @tnqn

I think there is no problem to have another parameter to deploy a new bgp router (not based on the external server), given they don't have many things in common, and there isn't much code to bring up the external server.

@hongliangl hongliangl force-pushed the 20240625-exclude-node-in-ci branch from 3c275ca to 0ad4c1d Compare July 1, 2024 01:03
@hongliangl hongliangl changed the title Support deploying an additional Node in Kind without Antrea Support deploying an FRR container in Kind network Jul 1, 2024
@hongliangl hongliangl force-pushed the 20240625-exclude-node-in-ci branch 2 times, most recently from 0fb4110 to 9557b49 Compare July 1, 2024 06:51
Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall

ci/kind/kind-setup.sh Show resolved Hide resolved
XinShuYang
XinShuYang previously approved these changes Jul 3, 2024
Copy link
Contributor

@XinShuYang XinShuYang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ci/kind/kind-setup.sh Outdated Show resolved Hide resolved
if $bgp_policy; then
external_frr_cid=$(docker ps -f name="^antrea-external-frr" --format '{{.ID}}')
external_frr_ips=$(docker inspect $external_frr_cid -f '{{.NetworkSettings.Networks.kind.IPAddress}},{{.NetworkSettings.Networks.kind.GlobalIPv6Address}}')
EXTRA_ARGS="$EXTRA_ARGS --external-frr-cid $external_frr_cid --external-frr-ips $external_frr_ips"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we need to pass external_frr_cid to test code? I think this would cause the test can only run in kind cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we could run some commands in the FRR container by passing the cid to test code.

I think this would cause the test can only run in kind cluster.

A question: is the external agnhost container only supported in Kind cluster? I thought that it was the same for the external frr container.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we could run some commands in the FRR container by passing the cid to test code.

Is there a public API to access the FRR application, like "/clientip" of agnhost container?

A question: is the external agnhost container only supported in Kind cluster? I thought that it was the same for the external frr container.

The test code doesn't make any assumption on how the external agnhost server is deployed so technically you can specify a VM IP or a public IP. It only needs the server's IP.
By passing the container ID to test code, I assume you would then use it to run some validation, then the code can be only ran in a kind cluster, and can't work with real BGP router.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a public API to access the FRR application, like "/clientip" of agnhost container?

I'm afraid we cannot configure it through network. I found that FRR application doesn't provide any API based TCP/IP. It can be configured through Unix Socket API locally.

By passing the container ID to test code, I assume you would then use it to run some validation, then the code can be only ran in a kind cluster, and can't work with real BGP router.

I use the container ID to configure the FRR application using the command like docker exec -it <container ID> <command>.

Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, please replace an FRR to 'a FRR' in your PR title and description.

test/e2e/framework.go Outdated Show resolved Hide resolved
This commit introduces a new parameter `--deploy-external-frr`
in `ci/kind/kind-setup.sh`, enabling to deploy one FRR container in
the Kind cluster network. This serves as the foundation for running
BGPPolicy e2e tests.

Signed-off-by: Hongliang Liu <lhongliang@vmware.com>
@hongliangl hongliangl changed the title Support deploying an FRR container in Kind network Support deploying one FRR container in Kind network Jul 16, 2024
@hongliangl
Copy link
Contributor Author

@tnqn @luolanzone Do we have any more comments about this? If not, could we merge this PR? Thanks

Copy link
Contributor

@luolanzone luolanzone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the efforts, no more comments from my side, @tnqn could you take another look?

Copy link
Member

@tnqn tnqn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tnqn
Copy link
Member

tnqn commented Jul 22, 2024

/skip-all

1 similar comment
@tnqn
Copy link
Member

tnqn commented Jul 22, 2024

/skip-all

@tnqn tnqn merged commit 943d9b3 into antrea-io:main Jul 22, 2024
52 of 58 checks passed
@hongliangl hongliangl deleted the 20240625-exclude-node-in-ci branch July 22, 2024 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants