[backend] metadata-grpc-deployment cannot connect to mysql #8224

Closed
jielou opened this issue Aug 31, 2022 · 13 comments

jielou commented Aug 31, 2022

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Followed the standalone deployment doc, but ran `kubectl create -k platform-agnostic` instead.
  • KFP version:
    1.5
  • KFP SDK version:
    NA
  • EKS cluster k8s version: 1.19

Steps to reproduce

I followed the guide to deploy standalone Kubeflow Pipelines. However, the metadata-grpc-deployment pod always crashes, and its logs show:

F ml_metadata/metadata_store/metadata_store_server_main.cc:220] Non-OK-status: status status: Internal: mysql_real_connect failed: errno: 2005, error: Unknown MySQL server host 'mysql' (-3)MetadataStore cannot be created with the given connection config.

The mysql pod is running fine. Its log shows:

mysql: ready for connections.

The metadata-writer, ml-pipeline, and ml-pipeline-persistenceagent pods are also in CrashLoopBackOff or show 0/1 ready.
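
MySQL client error 2005 means the hostname could not be resolved, so a running mysql pod is not enough: the `mysql` Service and cluster DNS also have to work. The following is a minimal sketch of the checks, not an official procedure. It assumes the default `kubeflow` namespace and a Service named `mysql` (the host name from the log). The `dns-test` pod name and the `busybox:1.36` image are arbitrary choices, not something from this issue.

```shell
# Sketch: check whether the "mysql" hostname can resolve inside the cluster.
# The namespace default and the Service name are assumptions.
check_mysql_dns() {
  ns="${1:-kubeflow}"
  # 1. The Service that provides the "mysql" DNS name must exist.
  kubectl get service mysql -n "$ns" || return 1
  # 2. The Service needs endpoints, i.e. a Ready pod behind it.
  kubectl get endpoints mysql -n "$ns" || return 1
  # 3. Resolve the name from a throwaway pod in the same namespace.
  kubectl run dns-test -n "$ns" --rm -i --restart=Never \
    --image=busybox:1.36 -- nslookup mysql
}

# Usage: check_mysql_dns kubeflow
```

If step 1 or 2 fails, the problem is in the KFP manifests or the mysql pod labels. If only step 3 fails, cluster DNS (for example CoreDNS on EKS) is the more likely suspect.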

Expected result

All pods are running and ready.

Materials and Reference


Impacted by this bug? Give it a 👍.

Member

gkcalat commented Sep 1, 2022

Hi @jielou!
Could you please try a newer version of the KFP standalone deployment (e.g. 1.8.5) and let us know if you still encounter the problem?

Author

jielou commented Oct 3, 2022

@gkcalat Hi. I tried installing 1.8.5, but metadata-grpc-deployment still fails. The logs show "MetadataStore cannot be created with the given connection config." I installed the platform-agnostic manifests on AWS EKS. Can someone help? Thanks.

More details:

  • The Pipelines UI showed "failed to retrieve list of pipelines".
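
Since several pods are affected, it can help to list only the ones that are not fully ready before reading individual logs. A minimal sketch, assuming the default `kubeflow` namespace and the default column layout of `kubectl get pods` (the function name is made up for illustration):

```shell
# Sketch: print pods whose READY count is below the desired count.
# Parses the default "kubectl get pods" columns: NAME READY STATUS RESTARTS AGE.
not_ready_pods() {
  ns="${1:-kubeflow}"
  kubectl get pods -n "$ns" --no-headers | awk '{
    split($2, ready, "/")            # READY looks like "0/1"
    if (ready[1] != ready[2]) print $1, $2, $3
  }'
}

# Usage: not_ready_pods kubeflow
# Then read the log of a crash-looping container, for example:
#   kubectl logs deployment/metadata-grpc-deployment -n kubeflow --previous
```

Pods that have run to completion also show up in this list, because their READY column reads `0/1`.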

Member

gkcalat commented Oct 3, 2022

Hi @jielou

Did you follow these instructions? I was able to deploy KFP 1.8.5 on GCP. There are also AWS instructions.

/CC @surajkota, as this might be an AWS-specific issue.

FYI, here are instructions for standalone installation using kustomize.

Author

jielou commented Oct 3, 2022

@gkcalat Thanks for the reply. I followed the instructions in the first link, using the platform-agnostic manifests on an existing EKS cluster. I haven't tried the AWS instructions because I don't want to use S3 and RDS for now.

Member

gkcalat commented Oct 3, 2022

While we wait for @surajkota or someone else from AWS, can you share exactly what you ran to deploy KFP?

Besides, you can try deploying KFP 1.8.5:

export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/platform-agnostic?ref=$PIPELINE_VERSION"
kubectl wait pods -l application-crd-id=kubeflow-pipelines -n kubeflow --for condition=Ready --timeout=1800s
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

Then try accessing the Kubeflow Pipelines UI in your browser: http://localhost:8080/. If you are working on a remote machine over ssh, connect with local port forwarding, ssh -L 9902:localhost:8080 <remote hostname>, and then access the UI locally via http://localhost:9902/.

Author

jielou commented Oct 3, 2022

Sure. I cloned the pipelines repo, checked out the 1.8.5 tag, and then followed the instructions in the README:

KFP_ENV=platform-agnostic
kubectl apply -k cluster-scoped-resources/
kubectl wait crd/applications.app.k8s.io --for condition=established --timeout=60s
kubectl apply -k "env/${KFP_ENV}/"
kubectl wait pods -l application-crd-id=kubeflow-pipelines -n kubeflow --for condition=Ready --timeout=1800s
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80

I will try the instructions you sent me, but I think they do the same thing. Thanks.

Author

jielou commented Oct 3, 2022

I got a new error in the deployment. It happens both with the instructions you shared and with my old deployment method. The cache server now fails to launch:

  Warning  FailedMount  3m41s                kubelet            Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[kubeflow-pipelines-cache-token-88wc7 webhook-tls-certs]: timed out waiting for the condition
  Warning  FailedMount  83s (x3 over 8m17s)  kubelet            Unable to attach or mount volumes: unmounted volumes=[webhook-tls-certs], unattached volumes=[webhook-tls-certs kubeflow-pipelines-cache-token-88wc7]: timed out waiting for the condition
  Warning  FailedMount  4s (x13 over 10m)    kubelet            MountVolume.SetUp failed for volume "webhook-tls-certs" : secret "webhook-server-tls" not found

It probably worked before because I had not cleaned up resources after installing 1.5. This time I cleaned up the cluster before installing 1.8.5, and it failed.
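
The FailedMount events mean the cache server pod is waiting for a Secret named `webhook-server-tls` that does not exist in its namespace. A minimal sketch to confirm that and to look at the component expected to create it. The deployment name `cache-deployer-deployment` is an assumption about the platform-agnostic manifests and may differ in your install, and the function name is made up for illustration:

```shell
# Sketch: check for the TLS secret the cache server mounts.
check_webhook_secret() {
  ns="${1:-kubeflow}"
  if kubectl get secret webhook-server-tls -n "$ns" >/dev/null 2>&1; then
    echo "secret webhook-server-tls exists in $ns"
  else
    echo "secret webhook-server-tls is missing in $ns"
    # Assumed name of the deployment that generates the secret.
    kubectl logs deployment/cache-deployer-deployment -n "$ns" --tail=50
    return 1
  fi
}

# Usage: check_webhook_secret kubeflow
```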

Contributor

surajkota commented Oct 4, 2022

Hi @jielou, do you want to install Kubeflow Pipelines standalone, or are you interested in trying out the full Kubeflow?

Please clean up your existing installation and follow the option that matches your choice:

If you are looking to install Kubeflow Pipelines 1.8.5 standalone on AWS, you would need to:

  1. Install cert-manager
  2. Use the cert-manager overlays under https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/cert-manager, like so:
export PIPELINE_VERSION=1.8.5
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/dev?ref=$PIPELINE_VERSION"
kubectl wait pods -l application-crd-id=kubeflow-pipelines -n kubeflow --for condition=Ready --timeout=1800s
kubectl port-forward -n kubeflow svc/ml-pipeline-ui 8080:80
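
Step 1 above gives no commands. One common way to install cert-manager is to apply its static release manifest and wait for its deployments before applying the KFP overlays. A minimal sketch: the version is deliberately a required argument because this thread does not say which one to use, and the function name is made up for illustration.

```shell
# Sketch: install cert-manager from its static release manifest and wait for it.
install_cert_manager() {
  version="${1:-}"
  if [ -z "$version" ]; then
    echo "usage: install_cert_manager <version tag, e.g. vX.Y.Z>" >&2
    return 2
  fi
  kubectl apply -f \
    "https://github.com/cert-manager/cert-manager/releases/download/${version}/cert-manager.yaml" \
    || return 1
  # Wait until every deployment in the cert-manager namespace is available.
  kubectl wait deployment --all -n cert-manager \
    --for=condition=Available --timeout=300s
}

# Usage: install_cert_manager <version compatible with your Kubernetes version>
```

Check the cert-manager release notes for the versions that support your cluster's Kubernetes version before picking one.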

We need to update the documentation here: https://www.kubeflow.org/docs/components/pipelines/v1/installation/standalone-deployment/ and remove the outdated README content.

Author

jielou commented Oct 5, 2022

@surajkota Thanks for the instructions. I want to install Kubeflow Pipelines 1.8.5 standalone on AWS. Which version of cert-manager would you recommend? I used the latest one but saw this error when installing Kubeflow Pipelines:

unable to recognize "env/cert-manager/dev": no matches for kind "Certificate" in version "cert-manager.io/v1"
unable to recognize "env/cert-manager/dev": no matches for kind "Issuer" in version "cert-manager.io/v1"
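
A "no matches for kind" error means the API server was not serving `Certificate` and `Issuer` under `cert-manager.io/v1` when the manifests were applied: either the cert-manager CRDs are not installed (or not yet established), or the installed cert-manager does not serve `v1`. A minimal sketch for checking this, with a made-up function name:

```shell
# Sketch: verify that the cert-manager CRDs used by the KFP overlay exist.
check_cert_manager_api() {
  missing=0
  for crd in certificates.cert-manager.io issuers.cert-manager.io; do
    if ! kubectl get crd "$crd" >/dev/null 2>&1; then
      echo "CRD $crd is not installed"
      missing=1
    fi
  done
  # List the cert-manager kinds the API server currently serves.
  kubectl api-resources --api-group=cert-manager.io
  return "$missing"
}

# Usage: check_cert_manager_api && echo "cert-manager CRDs found"
```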

Contributor

surajkota commented Oct 7, 2022

@jielou What EKS version are you on?

github-actions bot commented Mar 2, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label Mar 2, 2024
@rimolive
Member

I'm closing this issue as it has been open for more than a year. You can reopen it if the issue persists.

/close

@rimolive: Closing this issue.

In response to this:

I'm closing this issue as it has been open for more than a year. You can reopen it if the issue persists.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
