UPSTREAM: <carry>: Added support for TLS to MLMD GRPC Server #683

Open · wants to merge 9 commits into base: main
Conversation

@hbelmiro (Contributor) commented Aug 9, 2024

The issue resolved by this Pull Request:

Resolves https://issues.redhat.com/browse/RHOAIENG-4971

This PR depends on:

Description of the changes:

Added support for TLS to MLMD GRPC Server

Testing instructions

There are two scenarios to test:

TLS Enabled
  1. Deploy the following DSPA

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
    kind: DataSciencePipelinesApplication
    metadata:
      name: dspa
    spec:
      dspVersion: v2
      podToPodTLS: true
      apiServer:
        image: "quay.io/opendatahub/ds-pipelines-api-server:pr-72"
        argoDriverImage: "quay.io/opendatahub/ds-pipelines-driver:pr-72"
        argoLauncherImage: "quay.io/opendatahub/ds-pipelines-launcher:pr-72"
        enableSamplePipeline: true
      persistenceAgent:
        image: "quay.io/opendatahub/ds-pipelines-persistenceagent:pr-72"
      scheduledWorkflow:
        image: "quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-72"
      mlmd:  
        deploy: true  # Optional component
        grpc:
          image: "quay.io/opendatahub/mlmd-grpc-server:latest"
        envoy:
          image: "registry.redhat.io/openshift-service-mesh/proxyv2-rhel8:2.3.9-2"
      mlpipelineUI:
        deploy: true  # Optional component 
        image: "quay.io/opendatahub/ds-pipelines-frontend:pr-72"
      objectStorage:
        minio:
          deploy: true
          image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'
  2. Run the sample pipeline

  3. The run must complete successfully

TLS Disabled
  1. Deploy the following DSPA

    apiVersion: datasciencepipelinesapplications.opendatahub.io/v1alpha1
    kind: DataSciencePipelinesApplication
    metadata:
      name: dspa
    spec:
      dspVersion: v2
      podToPodTLS: false
      apiServer:
        image: "quay.io/opendatahub/ds-pipelines-api-server:pr-72"
        argoDriverImage: "quay.io/opendatahub/ds-pipelines-driver:pr-72"
        argoLauncherImage: "quay.io/opendatahub/ds-pipelines-launcher:pr-72"
        enableSamplePipeline: true
      persistenceAgent:
        image: "quay.io/opendatahub/ds-pipelines-persistenceagent:pr-72"
      scheduledWorkflow:
        image: "quay.io/opendatahub/ds-pipelines-scheduledworkflow:pr-72"
      mlmd:  
        deploy: true  # Optional component
        grpc:
          image: "quay.io/opendatahub/mlmd-grpc-server:latest"
        envoy:
          image: "registry.redhat.io/openshift-service-mesh/proxyv2-rhel8:2.3.9-2"
      mlpipelineUI:
        deploy: true  # Optional component 
        image: "quay.io/opendatahub/ds-pipelines-frontend:pr-72"
      objectStorage:
        minio:
          deploy: true
          image: 'quay.io/opendatahub/minio:RELEASE.2019-08-14T20-37-41Z-license-compliance'
  2. Run the sample pipeline

  3. The run must complete successfully

Checklist

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work.

@dsp-developers (Contributor)

A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-683
An OCP cluster where you are logged in as cluster admin is required.

To use this image run the following:

cd $(mktemp -d)
git clone git@github.com:opendatahub-io/data-science-pipelines-operator.git
cd data-science-pipelines-operator/
git fetch origin pull/683/head
git checkout -b pullrequest 85a6ca9ebb69f28e05780de54509b0d3855cb533
oc new-project opendatahub
make deploy IMG="quay.io/opendatahub/data-science-pipelines-operator:pr-683"

More instructions here on how to deploy and test a Data Science Pipelines Application.

Signed-off-by: hbelmiro <helber.belmiro@gmail.com>
@opendatahub-io opendatahub-io deleted a comment from dsp-developers Aug 30, 2024
@hbelmiro hbelmiro changed the title RHOAIENG-4971 WIP UPSTREAM: <carry>: Added support for TLS to MLMD GRPC Server Aug 30, 2024
@dsp-developers (Contributor)

Change to PR detected. A new PR build was completed.
A new image has been built to help with testing out this PR: quay.io/opendatahub/data-science-pipelines-operator:pr-683

@hbelmiro hbelmiro marked this pull request as ready for review August 30, 2024 19:27
@VaniHaripriya (Contributor) commented Sep 4, 2024

/verified
/lgtm
Deployed DSPO and verified the two scenarios mentioned in the testing instructions. Created pipeline runs and they completed successfully.

Comment on lines +345 to +348
    // +kubebuilder:validation:Optional
    CertificateContents string `json:"certificateContents,omitempty"`
    // +kubebuilder:validation:Optional
    PrivateKeyContents string `json:"privateKeyContents,omitempty"`
@gregsheremeta (Contributor) commented Sep 6, 2024

I don't love that the user needs to paste these in here. For apiserver, we are able to just use the paths to the OpenShift-provided certs, like so:

    name: ds-pipeline-api-server
    command: ['/bin/apiserver']
    args:
      - --config=/config
      - -logtostderr=true
      {{ if .APIServer.EnableSamplePipeline }}
      - --sampleconfig=/config/sample_config.json
      {{ end }}
      {{ if .PodToPodTLS }}
      - --tlsCertPath=/etc/tls/private/tls.crt
      - --tlsCertKeyPath=/etc/tls/private/tls.key
      {{ end }}

Could we not do something similar here? Could we not read those file contents instead of having them be manually pasted?

Contributor Author:

This is automatically set by the operator and is not for users to use (technically they can). I had to add these fields to be able to set the contents in the template.
Regarding setting the contents, I also don't like it, but the MLMD config requires that.

Contributor:

re: having to set the contents -- sure, no way around that.

I had to add these fields to be able to set the contents in the template.

Ok, I understand the reason. I really don't like adding fields to the API for this though. Hmm. How sure are we that there's no other way to feed things to the manifestival templating? That doesn't sound right to me.

@gregsheremeta (Contributor) commented Sep 6, 2024

Yeah, I think you could just add two strings roughly here (next to `CustomKfpLauncherConfigMapData string`), then populate them, and then use those to feed the templating engine, and avoid changing the DSPA API.

Comment on lines 918 to 926
    func (p *DSPAParams) getMlmdGrpcCertificatesSecret(ctx context.Context, client client.Client) (v1.Secret, error) {
        secretName := types.NamespacedName{
            Namespace: p.Namespace,
            Name:      "ds-pipeline-metadata-grpc-tls-certs-" + p.Name,
        }
        secret := v1.Secret{}
        err := client.Get(ctx, secretName, &secret)
        return secret, err
    }
Contributor:

this can be a static helper method like we do for configmaps

Contributor Author:

Done.

Comment on lines 906 to 907
    if apierrs.IsNotFound(err) && ignoreSecretNotFound {
        logger.Info("Ignoring secret not found due to DSPO_MLMD_IGNORE_AUTO_GENERATED_SECRET_NOT_FOUND_ERROR = true")
Contributor:

As a project admin, I think I might find this log message confusing.

Maybe something more user-friendly, like "Did not detect an MLMD GRPC Service-CA provisioned TLS kubernetes secret, continuing (DSPA is configured to ignore)."

Contributor:

As a broader point, I actually think we might want to report an error here instead, because we enter here when params.PodToPodTLS == true; in that case we expect the service-ca provided TLS secret to exist, and if it doesn't, something has gone wrong.


    ssl_config {
      server_cert: "{{.MLMD.GRPC.CertificateContents}}"
      server_key: "{{.MLMD.GRPC.PrivateKeyContents}}"
      client_verify: false
    }
Contributor:

To confirm, this is for mutual TLS, right? i.e. the server also verifying client TLS certs? I'm thinking maybe we should drop a comment here mentioning that, wdyt? In proto files, comments can be added via //.

Contributor Author:

Done.

@@ -893,3 +898,29 @@ func (p *DSPAParams) ExtractParams(ctx context.Context, dsp *dspa.DataSciencePip

return nil
}

func (p *DSPAParams) LoadMlmdCertificates(ctx context.Context, client client.Client, logger logr.Logger) error {
Contributor:

nit: can we keep ExtractParams() at the bottom?

Contributor Author:

Done.

    DBConnectionTimeoutConfigName              = "DSPO.HealthCheck.Database.ConnectionTimeout"
    RequeueTimeConfigName                      = "DSPO.RequeueTime"
    ApiServerIncludeOwnerReferenceConfigName   = "DSPO.ApiServer.IncludeOwnerReference"
    MlmdIgnoreAutoGeneratedSecretNotFoundError = "DSPO.Mlmd.IgnoreAutoGeneratedSecretNotFoundError"
@HumairAK (Contributor) commented Sep 6, 2024

can you elaborate on why this is needed? when do we want to ignore the secret?

Contributor Author:

We'll ignore the secret in tests. Since tests don't run on OpenShift, the secret is never created.
Should I add a comment?

Contributor:

I feel like this might be a hack to get around our existing tests. ATM I'm thinking we should just not be doing PodToPodTLS testing in anything that is not a live OpenShift environment. See: https://issues.redhat.com/browse/RHOAIENG-11646. Can we just set this to false, and not require the creation of such secrets to begin with in this mode?

What I'd rather see is that we add OpenShift CI, with integration tests for when podToPodTLS == true.
Since this feature inherently requires a working live OpenShift environment, the tests for it should be integration tests in nature.

Contributor Author:

We have tests that have TLS enabled, for example

@openshift-ci openshift-ci bot removed the lgtm label Sep 6, 2024
Comment on lines +62 to +72
    err := r.ApplyDir(dsp, params, mlmdTemplatesDir+"/"+mlmdGrpcService)
    if err != nil {
        return err
    }

    if params.PodToPodTLS {
        err = params.LoadMlmdCertificates(ctx, r.Client, r.Log)
        if err != nil {
            return err
        }
    }
Contributor:

Interesting. Okay, so I think you're doing this because we need the TLS cert/key pair generated to submit into the protofile. One issue I see with this is that it could result in potential failures initially, because service-ca creates these secrets asynchronously and may not create them in time before we attempt to fetch them.

I'm wondering if it might be better to do this in stages:

  • first reconcile and create the service
  • ensure dspo triggers a reconcile when the service-ca tls secret is created
  • in the new reconcile, detect that we are podtopodtls == true && secretCreated == true, then proceed with r.ApplyDir(mlmddir)

Interested to hear thoughts, @gregsheremeta / @hbelmiro

Contributor Author:

@HumairAK we would wait for the new reconcile indefinitely and never know if the secret was never created.
What about trying to LoadMlmdCertificates and repeat until a timeout (maybe 30s?)? It would be simpler and we can log an error if the certificates are not found.

openshift-ci bot commented Sep 6, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from humairak. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
