Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NetworkPolicyRecommendation Controller implementations #114

Merged
merged 1 commit into from
Oct 11, 2022

Conversation

yanjunz97
Copy link
Contributor

@yanjunz97 yanjunz97 commented Sep 20, 2022

This PR implements the controller workflow for NetworkPolicyRecommendation
and adds RecommendedNetworkPolicy CRD to store the results.

To run the NetworkPolicyRecommendation, please follow the steps below:

  1. Deploy ClickHouse, Spark Operator, and Theia Manager.
  2. Apply the following yaml.
apiVersion: crd.theia.antrea.io/v1alpha1
kind: NetworkPolicyRecommendation
metadata:
  name: pr-test
spec:
  jobType: "initial"
  policyType: "anp-deny-applied"
  executorInstances: 1
  driverCoreRequest: 200m
  driverMemory: 512M
  executorCoreRequest: 200m
  executorMemory: 512M
  excludeLabels: true
  toServices: true

You can observe the Spark Application starts, runs and stops. You could run the following command to get the details about the NetworkPolicyRecommendation anytime.

kubectl get npr pr-test -n flow-visibility -o json

The output would look like the following. When the job is finished, you will also be able to see the recommendedNetworkPolicy in the NetworkPolicyRecommendation status.

{
    "apiVersion": "crd.theia.antrea.io/v1alpha1",
    "kind": "NetworkPolicyRecommendation",
    ...
    "status": {
        "completedStages": 4,
        "sparkApplication": "97186a4d-5681-470d-82d6-e890ba539c34",
        "state": "RUNNING",
        "totalStages": 5,
    }
}

After deleting the NetworkPolicyRecommendation pr-test, the corresponding RecommendedNetworkPolicy CR and the row in ClickHouse database will be deleted at the same time.

Signed-off-by: Yanjun Zhou zhouya@vmware.com

@codecov-commenter
Copy link

codecov-commenter commented Sep 20, 2022

Codecov Report

Merging #114 (a746f9d) into main (8a3826a) will increase coverage by 17.74%.
The diff coverage is 73.90%.

@@             Coverage Diff             @@
##             main     #114       +/-   ##
===========================================
+ Coverage   30.65%   48.39%   +17.74%     
===========================================
  Files          11       15        +4     
  Lines        1442     2461     +1019     
  Branches        0       41       +41     
===========================================
+ Hits          442     1191      +749     
- Misses        978     1194      +216     
- Partials       22       76       +54     
Flag Coverage Δ
unit-tests 45.65% <73.90%> (+15.00%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/controller/networkpolicyrecommendation/util.go 67.06% <67.06%> (ø)
...ntroller/networkpolicyrecommendation/controller.go 75.78% <77.63%> (ø)
...om-plugins/grafana-chord-plugin/src/ChordPanel.tsx 71.58% <0.00%> (ø)
...-plugins/grafana-sankey-plugin/src/SankeyPanel.tsx 89.13% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@yanjunz97 yanjunz97 marked this pull request as draft September 20, 2022 23:04
return err
}

func (c *NPRecommendationController) startSparkApplication(npReco *crdv1alpha1.NetworkPolicyRecommendation) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we abstract these validation logic to some querrier-like interface so it can be shared with API server REST handler? This way we can move the error reporting at REST CREATE time. Otherwise a success response from API server could end up in invalid state due to user input error, which looks confusing. cc @yuntanghsu

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As Yanjun pointed out offline, as the similar checks are also done on CLI, and no other component / user is expected to be directly calling the API server at the moment, validation at REST handler doesn't provide much benefit at the moment. probably can skip this at the moment until when UI is to be added and we can reevaluate.

Copy link
Contributor

@yuntanghsu yuntanghsu Oct 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the user directly create a NPR CR to start the SparkApplication? If no, maybe we can validate the input only on CLI side.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not suggest user to do this. But we do not prevent user from doing this. I think the validation here could be a protection in case this happens.

pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
for i := 0; i < defaultWorkers; i++ {
go wait.Until(c.worker, time.Second, stopCh)
}
<-stopCh
}

func (c *NPRecommendationController) deletionworker() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need some GC logic to recycle stale resources (rnp and db entry) in case npr is removed during manager downtime, or manager is restarted without finishing the cleanup job. this may be addressed in a future pr.

Copy link
Contributor Author

@yanjunz97 yanjunz97 Oct 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of GC sounds great to me. As we may remove rnp in next release, maybe it will be a periodic job checking db to remove an entry with an id not in the current npr list?
Besides, I'm also thinking about whether to do the similar GC for the stale SparkApplication. But I'm a little bit worry about whether this will delete some jobs not created by the manager in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one way around is we introduce additional metadata (annotation / labels) for the ones created by Theia. This way we won't bother with other unrelated SparkApplications. Periodical is good but at least this should be triggered once per mgr init.

}

func getPolicyRecommendationResult(client kubernetes.Interface, id string, namespace string) (recommendedNetworkPolicy *crdv1alpha1.RecommendedNetworkPolicy, err error) {
connect, err := setupClickHouseConnection(client, namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we cache / resue connection within the controller instead of reestablishing connection each time db check is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated, thanks!

pkg/controller/networkpolicyrecommendation/controller.go Outdated Show resolved Hide resolved
Copy link
Contributor

@wsquan171 wsquan171 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm. will check again after rebase and approve.

recommendedNetworkPolicy := obj.(*crdv1alpha1.RecommendedNetworkPolicy)
err = c.crdClient.CrdV1alpha1().RecommendedNetworkPolicies(namespace).Delete(context.TODO(), recommendedNetworkPolicy.Name, metav1.DeleteOptions{})
if err != nil {
aggregatedErrorMsg += fmt.Sprintf("failed to delete the RecommendedNetworkPolicy %s, error: %v\n", recommendedNetworkPolicy.Name, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically we don't want to have multi-line error messages. could collect failed name and err in 2 lists and call fmt when returning error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense to me. Updated, thanks!

@yanjunz97 yanjunz97 force-pushed the manager-controller branch 2 times, most recently from 8ddd85e to c255d55 Compare October 10, 2022 16:21
Signed-off-by: Yanjun Zhou <zhouya@vmware.com>
Copy link
Contributor

@yuntanghsu yuntanghsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@wsquan171 wsquan171 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yanjunz97 yanjunz97 merged commit 995bf04 into antrea-io:main Oct 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants