Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add NetworkPolicy Recommendation on Snowflake Backend #137

Merged
merged 5 commits into from
Dec 6, 2022

Conversation

dreamtalen
Copy link
Contributor

In this commit, we add the NetworkPolicy Recommendation Application on the Snowflake backend. NetworkPolicy Recommendation is implemented as Snowflake UDFs and could be running on Snowflake warehouses.

We add two commands in theia-sf CLI tool: create-udfs and policy-recommendation. create-udfs is used to upload and create UDFs on Snowflake database then user can call policy-recommendation command to run NetworkPolicy Recommendation application.

NetworkPolicy Recommendation UDFs are written in Python and stored under udf/ directory.

Signed-off-by: Yongming Ding dyongming@vmware.com

@dreamtalen dreamtalen added this to the Theia v0.4 release milestone Oct 28, 2022
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some initial high-level comments, let me know what you think:

  • kubernetes.zip should not be checked-in. Can we version it (i.e., add the K8s version to the file name) and upload it to antrea.io?
  • I am wondering if we really need the "create-udfs" command here. I think it will be easier if everything can be configured by the "onboard" command at first.
  • I think the user should not have to provide --udf-dir and I think that the CLI binary should also be completely self-contained (i.e., not assume that the Github repository has been cloned). As a result, I believe that we should embed the UDF code into the antrea-sf binary (using go:embed). At the end of "onboarding", the CLI tool will upload the code to the Snowflake stage and create the function object.

@dreamtalen
Copy link
Contributor Author

some initial high-level comments, let me know what you think:

  • kubernetes.zip should not be checked-in. Can we version it (i.e., add the K8s version to the file name) and upload it to antrea.io?
  • I am wondering if we really need the "create-udfs" command here. I think it will be easier if everything can be configured by the "onboard" command at first.
  • I think the user should not have to provide --udf-dir and I think that the CLI binary should also be completely self-contained (i.e., not assume that the Github repository has been cloned). As a result, I believe that we should embed the UDF code into the antrea-sf binary (using go:embed). At the end of "onboarding", the CLI tool will upload the code to the Snowflake stage and create the function object.

Thanks Antonin.
1: sure.
2 & 3: sounds good to me, including creating UDFs in onboard could save a step for users. Just wondering if create-udfs command is still valuable, if users want to create or modify udfs by themselves? Or we probably tell users to run onboard again in that case.

@antoninbas
Copy link
Contributor

antoninbas commented Oct 28, 2022

Just wondering if create-udfs command is still valuable, if users want to create or modify udfs by themselves? Or we probably tell users to run onboard again in that case.

If they modify a UDF (or if we modify a UDF across releases), the idea is to run onboard again (the antrea-sf binary will have changed at that point).
Custom UDFs would be a pretty advanced use case, so I would expect users who want to create their own UDFs to be able to use the Snowflake API and do everything themselves. That being said, we can consider introducing the create-udfs command later on if it is appropriate. Re-building and running onboard again should also work for that case as you pointed out.
I want to keep things as simple as possible for the "standard" use case: for each new Theia release, users can download antrea-sf and run antrea-sf onboard again to upgrade to the latest stack and functionalities, with no breakage or data loss.

snowflake/README.md Outdated Show resolved Hide resolved
snowflake/cmd/policyRecommendation.go Outdated Show resolved Hide resolved
snowflake/cmd/policyRecommendation.go Outdated Show resolved Hide resolved
snowflake/cmd/policyRecommendation.go Outdated Show resolved Hide resolved
snowflake/cmd/policyRecommendation.go Show resolved Hide resolved
Comment on lines 238 to 241
// stackName, stateBackendURL, secretsProviderURL, region, workdir are not provided here
// because we only uses snowflake client in this command.
mgr := infra.NewManager(logger, "", "", "", "", warehouseName, "", verbose)
rows, err := mgr.RunUdf(ctx, query, databaseName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think you should find a way to avoid using the infra manager here. Does RunUdf really need to be a method of the manager? You could use the Snowflake client directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, I was thinking reuseing RunUdf function in the future for other udf commands. Moved it under infra/udfs package now.

)

// Download a file from the given url to the current directory
func DownloadFile(url string, filename string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function should use a Context object. Maybe do some unification with

func downloadAndUntar(ctx context.Context, logger logr.Logger, url string, dir string) error {
. You could improve this function and use it as part of the downloadAndUntar implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, merged it into downloadAndUntar function

"time"
)

func ParseTimestamp(t string, now time.Time, defaultT ...time.Time) (string, error) {
Copy link
Contributor

@antoninbas antoninbas Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at usage of this function, I am not seeing defaultT being used anywhere. IMO, using a variadic function for this (passing an optional default value to the function) is also a bit strange. Something like this makes more sense IMO:

func ParseTimestamp(t string, now time.Time) (string, error) {
        return ParseTimestampWithDefault(t, now, now)
}

func ParseTimestampWithDefault(t string, now time.Time, defaultT time.Time) (string, error) {
        // ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out, I was copying this function from our gitlab repo. Remove the defaultT argument for now since we don't have any usecases.

@@ -0,0 +1,79 @@
// Copyright 2022 Antrea Authors.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure this needs to be under pkg/infra at all. How about pkg/udfs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, moved it there.


// Embed the udfs directory here because go:embed doesn't support embeding in subpackages

//go:embed udf/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe we could rename the directory to udfs/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed


func main() {
infra.UdfFs = udfFs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd still avoid doing something like this. It can be confusing to change the value of a global variable which is in a different package. Maybe check out what I did for database migrations as a reference: https://github.com/antrea-io/theia/tree/main/snowflake/database

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Antonin, I created udfs.go in a separate package like database migrations and also found some utils function like WriteMigrationsToDisk could be shared here, changed it to WriteEmbedDirToDisk and moved to utils package.

"github.com/go-logr/logr"
)

func DownloadAndUntar(ctx context.Context, logger logr.Logger, url string, dir string, filename string, untar bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably better to have 2 different functions: Download and DownloadAndUntar, and have the second one call the first one for its implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make senses, addressed.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small comments, otherwise LGTM

Comment on lines 133 to 139
if !result {
_, err := c.db.ExecContext(multi_statement_context, query)
return nil, err
} else {
rows, err := c.db.QueryContext(multi_statement_context, query)
return rows, err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you think we could have 2 different functions for the 2 different cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, changed.


func (c *client) UseSchema(ctx context.Context, name string) error {
query := fmt.Sprintf("USE SCHEMA %s", name)
c.logger.Info("Snowflake query", "query", query)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the verbosity values for the logs don't seem consistent (see V(2) above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

@@ -5,6 +5,7 @@ all: bin

.PHONY: bin
bin:
make -C udfs/udfs/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it doesn't make sense to have this in the bin target
You should define a new target and add it as a requirement for the all target so that it is built by default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks for pointing out.

@@ -37,6 +36,8 @@ import (

"antrea.io/theia/snowflake/database"
sf "antrea.io/theia/snowflake/pkg/snowflake"
utils "antrea.io/theia/snowflake/pkg/utils"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: renaming the package doesn't seem necessary here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

// See the License for the specific language governing permissions and
// limitations under the License.

package utils
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Quan pointed out in the past that utils was too generic for a package name. So I wonder if we could move these functions to pkg/utils/file/file.go (package can then be imported as fileutils).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, changed.

return nil
}

func WriteEmbedDirToDisk(ctx context.Context, logger logr.Logger, fsys fs.FS, embedPath string, dest string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given that there is nothing specific to embed in this function implementation, I suggest you rename it to writeFSDirToDisk and rename embedPath to fsysPath.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, addressed.

In this commit, we add the NetworkPolicy Recommendation Application on
the Snowflake backend. NetworkPolicy Recommendation is implemented as
Snowflake UDFs and could be running on Snowflake warehouses. Users could
start it using command policy-recommendation in theia-sf CLI tool.

NetworkPolicy Recommendation UDFs are written in Python and stored under
udf/ directory.

Signed-off-by: Yongming Ding <dyongming@vmware.com>
Signed-off-by: Yongming Ding <dyongming@vmware.com>
Signed-off-by: Yongming Ding <dyongming@vmware.com>
Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when you are done with code changes, please make sure you run the whole thing end-to-end before we merge

snowflake/pkg/snowflake/snowflake.go Outdated Show resolved Hide resolved
"time"
)

func ParseTimestamp(t string, now time.Time) (string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could add some table-driven unit tests for this function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good, added.

Signed-off-by: Yongming Ding <dyongming@vmware.com>
@dreamtalen
Copy link
Contributor Author

when you are done with code changes, please make sure you run the whole thing end-to-end before we merge

Sure, I just tried e2e with Antrea Flow exporter & Aggregator locally.

Copy link
Contributor

@antoninbas antoninbas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 275 to 278
policyRecommendationCmd.Flags().Int("isolationMethod", 1, `Network isolation preference. Currently we have 3 options:
1: Recommending allow ANP/ACNP policies, with default deny rules only on Pods which have an allow rule applied
2: Recommending allow ANP/ACNP policies, with default deny rules for whole cluster
3: Recommending allow K8s NetworkPolicies only`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we use String instead of Int for isolationMethod to make it more readable as we do for policy-type in ClickHouse?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, addressed.

Comment on lines 113 to 115
fmt.Fprintf(&queryBuilder, `AND
flowEndSeconds >= '%s'
`, endTime)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be wrong, but could you just check if this should be flowEndSeconds < '%s'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You're right

@@ -488,3 +416,118 @@ func (m *Manager) Offboard(ctx context.Context) error {
_, err := m.run(ctx, true)
return err
}

func createUdfs(ctx context.Context, logger logr.Logger, databaseName string, warehouseName string, workdir string) error {
logger.Info("creating UDFs")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious about whether we have a convention on whether to capital the first letter for logger.Info? I see both in code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer capitaling the first letter, let me fix that.

Signed-off-by: Yongming Ding <dyongming@vmware.com>
Copy link
Contributor

@yanjunz97 yanjunz97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dreamtalen
Copy link
Contributor Author

/theia-test-e2e

@dreamtalen dreamtalen merged commit af9d0fe into antrea-io:main Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants