Kubeflow 1.7 Terraform Deployment Not Working on K8s 1.25 #609

Closed
AlexandreBrown opened this issue Mar 7, 2023 · 16 comments · Fixed by #644
Labels
bug Something isn't working

Comments

@AlexandreBrown
Contributor

Describe the bug
I ran into a timeout issue during the installation of Kubeflow 1.6.1 using the AWS RDS S3 Cognito Terraform make deploy command.
I ran the make delete command to delete the created resources, since the install was only partial.
Running the make delete command did not delete the resources.
Failed install logs tail:

module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10m30s elapsed]
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the
│ result of this plan may not represent all of the changes requested by the
│ current configuration.
│ 
│ The -target option is not for routine use, and is provided only for
│ exceptional situations such as recovering from errors or mistakes, or when
│ Terraform specifically suggests to use it as part of an error message.
╵
╷
│ Warning: Applied changes may be incomplete
│ 
│ The plan was created with the -target option in effect, so some changes
│ requested in the configuration may have been ignored and the output values
│ may not be fully updated. Run the following command to verify that no other
│ changes are pending:
│     terraform plan
│ 	
│ Note that the -target option is not suitable for routine use, and is
│ provided only for exceptional situations such as recovering from errors or
│ mistakes, or when Terraform specifically suggests to use it as part of an
│ error message.
╵
╷
│ Warning: Redundant empty provider block
│ 
│   on cognito-rds-s3-components/main.tf line 1:
│    1: provider "aws" {
│ 
│ Earlier versions of Terraform used empty provider blocks ("proxy provider
│ configurations") for child modules to declare their need to be passed a
│ provider configuration by their callers. That approach was ambiguous and is
│ now deprecated.
│ 
│ If you control this module, you can migrate to the new declaration syntax
│ by removing all of the empty provider "aws" blocks and then adding or
│ updating an entry like the following to the required_providers block of
│ module.kubeflow_components:
│     aws = {
│       source = "hashicorp/aws"
│       configuration_aliases = [
│         aws.aws,
│         aws.virginia,
│       ]
│     }
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Warning: Experimental feature "module_variable_optional_attrs" is active
│ 
│   on .terraform/modules/eks_blueprints_kubernetes_addons.ondat/locals.tf line 2, in terraform:
│    2:   experiments = [module_variable_optional_attrs]
│ 
│ Experimental features are subject to breaking changes in future minor or
│ patch releases, based on feedback.
│ 
│ If you have feedback on the design of this feature, please open a GitHub
│ issue to discuss it.
│ 
│ (and 7 more similar warnings elsewhere)
╵
╷
│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
│ 
│   with module.kubeflow_components.module.kubeflow_secrets_manager_irsa.kubernetes_service_account_v1.irsa[0],
│   on .terraform/modules/kubeflow_components.kubeflow_secrets_manager_irsa/modules/irsa/main.tf line 16, in resource "kubernetes_service_account_v1" "irsa":
│   16: resource "kubernetes_service_account_v1" "irsa" {
│ 
│ Starting from version 1.24.0 Kubernetes does not automatically generate a
│ token for service accounts, in this case, "default_secret_name" will be
│ empty
╵
╷
│ Warning: Helm release "kubeflow-pipelines" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_pipelines.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
╷
│ Error: timed out waiting for the condition
│ 
│   with module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_pipelines.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
Makefile:30: recipe for target 'deploy-kubeflow-components' failed
make: *** [deploy-kubeflow-components] Error 1

Make delete relevant logs:

Destroy complete! Resources: 0 destroyed.

(I can provide the full log on request, but it's pretty long and repetitive, with the same output as the one above.)

Steps To Reproduce
  1. Create deploy.Dockerfile:

FROM ubuntu:18.04

ARG KUBEFLOW_RELEASE_VERSION
ARG AWS_RELEASE_VERSION

WORKDIR /tmp/

RUN apt update \
    && apt install --yes \
        git \
        curl \
        unzip \
        tar \
        make \
        sudo \
        vim \
        wget \
    && git clone https://github.com/awslabs/kubeflow-manifests.git \
    && cd kubeflow-manifests \
    && git checkout ${AWS_RELEASE_VERSION} \
    && git clone --branch ${KUBEFLOW_RELEASE_VERSION} https://github.com/kubeflow/manifests.git upstream \
    && make install-tools


WORKDIR /tmp/kubeflow-manifests/deployments/cognito-rds-s3/terraform

# Disable automatic subdomain creation since our root domain is not on AWS Route53
ARG CREATE_SUBDOMAIN="false"

ARG CLUSTER_NAME

ARG CLUSTER_REGION

ARG EKS_VERSION

# Name of an existing Route53 root domain (e.g. example.com)
ARG ROOT_DOMAIN
# Name of the subdomain to create (e.g. platform.example.com)
ARG SUBDOMAIN

ARG USER_POOL_NAME

ARG USE_RDS="true"

ARG USE_S3="true"

ARG USE_COGNITO="true"

ARG LOAD_BALANCER_SCHEME=internet-facing

ARG NOTEBOOK_ENABLE_CULLING=true

ARG NOTEBOOK_CULL_IDLE_TIMEOUT_SECONDS=120

ARG NOTEBOOK_IDLENESS_CHECK_PERIOD=10

ARG SECRET_RECOVERY_WINDOW_IN_DAYS=30

ARG NODE_INSTANCE_TYPE

ARG AWS_ACCESS_KEY_ID
ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}

ARG AWS_SECRET_ACCESS_KEY
ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}

ARG AWS_DEFAULT_REGION
ENV AWS_DEFAULT_REGION=${AWS_DEFAULT_REGION}

# MinIO credentials are passed in via --build-arg and must be declared here to be substituted below
ARG MINIO_AWS_ACCESS_KEY_ID
ARG MINIO_AWS_SECRET_ACCESS_KEY

RUN    echo "minio_aws_access_key_id=\"${MINIO_AWS_ACCESS_KEY_ID}\"" >> sample.auto.tfvars \
    && echo "minio_aws_secret_access_key=\"${MINIO_AWS_SECRET_ACCESS_KEY}\"" >> sample.auto.tfvars \
    && echo "create_subdomain=\"${CREATE_SUBDOMAIN}\"" >> sample.auto.tfvars \
    && echo "cluster_name=\"${CLUSTER_NAME}\"" >> sample.auto.tfvars \
    && echo "cluster_region=\"${CLUSTER_REGION}\"" >> sample.auto.tfvars \
    && echo "eks_version=\"${EKS_VERSION}\"" >> sample.auto.tfvars \
    && echo "generate_db_password=\"true\"" >> sample.auto.tfvars \
    && echo "aws_route53_root_zone_name=\"${ROOT_DOMAIN}\"" >> sample.auto.tfvars \
    && echo "aws_route53_subdomain_zone_name=\"${SUBDOMAIN}\"" >> sample.auto.tfvars \
    && echo "cognito_user_pool_name=\"${USER_POOL_NAME}\"" >> sample.auto.tfvars \
    && echo "use_rds=\"${USE_RDS}\"" >> sample.auto.tfvars \
    && echo "use_s3=\"${USE_S3}\"" >> sample.auto.tfvars \
    && echo "use_cognito=\"${USE_COGNITO}\"" >> sample.auto.tfvars \
    && echo "load_balancer_scheme=\"${LOAD_BALANCER_SCHEME}\"" >> sample.auto.tfvars \
    && echo "notebook_enable_culling=\"${NOTEBOOK_ENABLE_CULLING}\"" >> sample.auto.tfvars \
    && echo "notebook_cull_idle_time=\"${NOTEBOOK_CULL_IDLE_TIMEOUT_SECONDS}\"" >> sample.auto.tfvars \
    && echo "notebook_idleness_check_period=\"${NOTEBOOK_IDLENESS_CHECK_PERIOD}\"" >> sample.auto.tfvars \
    && echo "secret_recovery_window_in_days=\"${SECRET_RECOVERY_WINDOW_IN_DAYS}\"" >> sample.auto.tfvars \
    && echo "node_instance_type=\"${NODE_INSTANCE_TYPE}\"" >> sample.auto.tfvars \
    && terraform init \
    && terraform plan

CMD ["make", "deploy"]
  2. Build the image:
docker build \
    --build-arg AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
    --build-arg AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
    --build-arg AWS_DEFAULT_REGION=$AWS_DEFAULT_REGION \
    --build-arg KUBEFLOW_RELEASE_VERSION=$KUBEFLOW_RELEASE_VERSION \
    --build-arg AWS_RELEASE_VERSION=$AWS_RELEASE_VERSION \
    --build-arg MINIO_AWS_ACCESS_KEY_ID=$MINIO_AWS_ACCESS_KEY_ID \
    --build-arg MINIO_AWS_SECRET_ACCESS_KEY=$MINIO_AWS_SECRET_ACCESS_KEY \
    --build-arg CLUSTER_NAME=$CLUSTER_NAME \
    --build-arg CLUSTER_REGION=$CLUSTER_REGION \
    --build-arg EKS_VERSION=$EKS_VERSION \
    --build-arg ROOT_DOMAIN=$ROOT_DOMAIN \
    --build-arg SUBDOMAIN=$SUBDOMAIN \
    --build-arg USER_POOL_NAME=$USER_POOL_NAME \
    --build-arg NODE_INSTANCE_TYPE=$NODE_INSTANCE_TYPE \
    -t kf-deployment \
    . \
    -f deploy.Dockerfile
  3. Run the deployment Docker image:
docker run --rm kf-deployment
  4. Observe the timeout error.
  5. Try deleting the resources using docker run --rm kf-deployment make delete (or change the CMD part of the Dockerfile, rebuild, and re-run if you prefer).
  6. Observe that no error is shown, but no resources are actually deleted.
  7. To confirm this, try redeploying Kubeflow; you'll get creating IAM Role (kf-test-cluster-role): EntityAlreadyExists: Role with name kf-test-cluster-role already exists. (kf-test is my cluster name). A quick check is shown below.
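A quick way to confirm the leftover role (assuming the AWS CLI is configured for the same account; the role name below is just the one from my error message):

# Check whether the role that make delete should have removed still exists;
# a successful response means it was left behind.
aws iam get-role --role-name kf-test-cluster-role \
    --query 'Role.{Name:RoleName,Created:CreateDate}'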

Expected behavior
I expected no timeout, but timeouts can happen; it would be great if we could customize the timeout period. I don't mind waiting an hour for the install, I just want it to finish.
As for make delete, it should delete all the resources that were created.
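For context, the knob that governs this wait is the Terraform Helm provider's timeout argument on helm_release (it defaults to 300 seconds; the failure above shows roughly 10m30s of waiting before giving up). Whether the kubeflow-manifests modules expose it as a variable is an assumption on my part, so the snippet below is only a sketch of the provider-level setting, not the repo's actual configuration:

# Sketch only: the real resource lives inside the wrapped helm-addon module.
# timeout controls how many seconds Terraform waits for the release to become ready.
resource "helm_release" "addon" {
  name      = "kubeflow-pipelines"
  chart     = "./charts/kubeflow-pipelines" # hypothetical path
  namespace = "kubeflow"

  wait    = true
  timeout = 3600 # seconds; provider default is 300
}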

Environment

  • Kubernetes version 1.24
  • Using EKS (yes/no), if so version? 1.24
  • Kubeflow version 1.6.1
  • AWS build number v1.6.1-aws-b1.0.1
  • AWS service targeted (S3, RDS, etc.) RDS S3 Cognito
@AlexandreBrown AlexandreBrown added the bug Something isn't working label Mar 7, 2023
@surajkota
Contributor

This is unexpected behavior. On a quick search, I don't see any references to where this role is created. I'm pretty sure I tried deleting and recreating a long time back. This should be fixed.

On a side note, what do you think of the experience with the Terraform deployment method? Does it reduce the heavy lifting on your side and help make the installation configurable, declarative, and maintainable?

@AlexandreBrown
Contributor Author

AlexandreBrown commented Mar 13, 2023

@surajkota OK, I tested using the latest AWS main + the fix from this PR, with KUBEFLOW_RELEASE_VERSION=v1.7.0-rc.2, and I am still facing the timeout issue when using Terraform.
Notice how the timeout happens at the same step each time.
I'm wondering if this PR from @ryansteakley would fix the issue.
The timeout always happens after 10m30s of waiting for module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]
Logs:

module.kubeflow_components.module.kubeflow_istio_resources.module.helm_addon.helm_release.addon[0]: Creation complete after 2s [id=kubeflow-istio-resources]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Creating...
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [1m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [2m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [3m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [4m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [5m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [6m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [7m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [8m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m30s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m40s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [9m50s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10m0s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10m10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10m20s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10m30s elapsed]
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the
│ result of this plan may not represent all of the changes requested by the
│ current configuration.
│ 
│ The -target option is not for routine use, and is provided only for
│ exceptional situations such as recovering from errors or mistakes, or when
│ Terraform specifically suggests to use it as part of an error message.
╵
╷
│ Warning: Applied changes may be incomplete
│ 
│ The plan was created with the -target option in effect, so some changes
│ requested in the configuration may have been ignored and the output values
│ may not be fully updated. Run the following command to verify that no other
│ changes are pending:
│     terraform plan
│ 	
│ Note that the -target option is not suitable for routine use, and is
│ provided only for exceptional situations such as recovering from errors or
│ mistakes, or when Terraform specifically suggests to use it as part of an
│ error message.
╵
╷
│ Warning: Redundant empty provider block
│ 
│   on cognito-rds-s3-components/main.tf line 1:
│    1: provider "aws" {
│ 
│ Earlier versions of Terraform used empty provider blocks ("proxy provider
│ configurations") for child modules to declare their need to be passed a
│ provider configuration by their callers. That approach was ambiguous and is
│ now deprecated.
│ 
│ If you control this module, you can migrate to the new declaration syntax
│ by removing all of the empty provider "aws" blocks and then adding or
│ updating an entry like the following to the required_providers block of
│ module.kubeflow_components:
│     aws = {
│       source = "hashicorp/aws"
│       configuration_aliases = [
│         aws.aws,
│         aws.virginia,
│       ]
│     }
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Warning: Experimental feature "module_variable_optional_attrs" is active
│ 
│   on .terraform/modules/eks_blueprints_kubernetes_addons.ondat/locals.tf line 2, in terraform:
│    2:   experiments = [module_variable_optional_attrs]
│ 
│ Experimental features are subject to breaking changes in future minor or
│ patch releases, based on feedback.
│ 
│ If you have feedback on the design of this feature, please open a GitHub
│ issue to discuss it.
│ 
│ (and 7 more similar warnings elsewhere)
╵
╷
│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
│ 
│   with module.kubeflow_components.module.kubeflow_secrets_manager_irsa.kubernetes_service_account_v1.irsa[0],
│   on .terraform/modules/kubeflow_components.kubeflow_secrets_manager_irsa/modules/irsa/main.tf line 16, in resource "kubernetes_service_account_v1" "irsa":
│   16: resource "kubernetes_service_account_v1" "irsa" {
│ 
│ Starting from version 1.24.0 Kubernetes does not automatically generate a
│ token for service accounts, in this case, "default_secret_name" will be
│ empty
╵
╷
│ Warning: Helm release "kubeflow-pipelines" was created but has a failed status. Use the `helm` command to investigate the error, correct it, then run Terraform again.
│ 
│   with module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_pipelines.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
╷
│ Error: timed out waiting for the condition
│ 
│   with module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_pipelines.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
make: *** [deploy-kubeflow-components] Error 1
Makefile:30: recipe for target 'deploy-kubeflow-components' failed

As for the role, it created the role automatically again when I ran the deployment using Terraform :
(two screenshots showing the recreated IAM role)

Therefore I'm afraid that make delete would not work again...

On a side note, what do you think of the experience with the Terraform deployment method? Does it reduce the heavy lifting on your side and help make the installation configurable, declarative, and maintainable?

I have not been able to get an install working with Terraform yet, but the deployment feels much nicer compared to Kustomize.
I have very little experience with Terraform, but I find it simpler to understand; the variable substitution principle applied via sample.auto.tfvars is easy to follow.
Kustomize, on the other hand, requires creating overlays, and I find it can get quite messy (I had to create a lot of overlays for our current Kubeflow 1.4.1 setup and it was not the smoothest experience).

I think the Terraform deployment option will probably be easier to adapt to our needs; we'll be able to make PRs to add missing variables to sample.auto.tfvars and eventually get everything we need (if anything is missing).
I'll be able to get a better idea of the Terraform deployment once I get it to work, though.

@surajkota
Contributor

We need to check the pods related to pipelines. The pipelines installation is failing; do you see any errors?

cc @ryansteakley @jsitu777, any idea why this might be happening? Is the pipelines Helm chart for RDS-S3 up to date for 1.7?

@AlexandreBrown
Contributor Author

AlexandreBrown commented Mar 15, 2023

@surajkota @ryansteakley @jsitu777 Just tested using the latest main (new commits were added since last week's test):

export KUBEFLOW_RELEASE_VERSION=v1.7.0-rc.2
export AWS_RELEASE_VERSION=b3d7174598539b60a673b3a55e50115a18a8c651

Now I get a different error :

module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Creating...
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [10s elapsed]
module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0]: Still creating... [20s elapsed]
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the
│ result of this plan may not represent all of the changes requested by the
│ current configuration.
│ 
│ The -target option is not for routine use, and is provided only for
│ exceptional situations such as recovering from errors or mistakes, or when
│ Terraform specifically suggests to use it as part of an error message.
╵
╷
│ Warning: Applied changes may be incomplete
│ 
│ The plan was created with the -target option in effect, so some changes
│ requested in the configuration may have been ignored and the output values
│ may not be fully updated. Run the following command to verify that no other
│ changes are pending:
│     terraform plan
│ 	
│ Note that the -target option is not suitable for routine use, and is
│ provided only for exceptional situations such as recovering from errors or
│ mistakes, or when Terraform specifically suggests to use it as part of an
│ error message.
╵
╷
│ Warning: Redundant empty provider block
│ 
│   on cognito-rds-s3-components/main.tf line 1:
│    1: provider "aws" {
│ 
│ Earlier versions of Terraform used empty provider blocks ("proxy provider
│ configurations") for child modules to declare their need to be passed a
│ provider configuration by their callers. That approach was ambiguous and is
│ now deprecated.
│ 
│ If you control this module, you can migrate to the new declaration syntax
│ by removing all of the empty provider "aws" blocks and then adding or
│ updating an entry like the following to the required_providers block of
│ module.kubeflow_components:
│     aws = {
│       source = "hashicorp/aws"
│       configuration_aliases = [
│         aws.aws,
│         aws.virginia,
│       ]
│     }
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Warning: Experimental feature "module_variable_optional_attrs" is active
│ 
│   on .terraform/modules/eks_blueprints_kubernetes_addons.ondat/locals.tf line 2, in terraform:
│    2:   experiments = [module_variable_optional_attrs]
│ 
│ Experimental features are subject to breaking changes in future minor or
│ patch releases, based on feedback.
│ 
│ If you have feedback on the design of this feature, please open a GitHub
│ issue to discuss it.
│ 
│ (and 7 more similar warnings elsewhere)
╵
╷
│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
│ 
│   with module.kubeflow_components.module.kubeflow_secrets_manager_irsa.kubernetes_service_account_v1.irsa[0],
│   on .terraform/modules/kubeflow_components.kubeflow_secrets_manager_irsa/modules/irsa/main.tf line 16, in resource "kubernetes_service_account_v1" "irsa":
│   16: resource "kubernetes_service_account_v1" "irsa" {
│ 
│ Starting from version 1.24.0 Kubernetes does not automatically generate a
│ token for service accounts, in this case, "default_secret_name" will be
│ empty
╵
╷
│ Error: rendered manifests contain a resource that already exists. Unable to continue with install: Secret "mlpipeline-minio-artifact" in namespace "kubeflow" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "kubeflow-pipelines"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"
│ 
│   with module.kubeflow_components.module.kubeflow_pipelines.module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_pipelines.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
make: *** [deploy-kubeflow-components] Error 1
Makefile:30: recipe for target 'deploy-kubeflow-components' failed

Also, all pods were running when checking with kubectl get pods --all-namespaces
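For reference, the error message itself lists the ownership metadata Helm expects, so one common workaround (a sketch only, not necessarily the fix adopted in this repo) is to label and annotate the pre-existing Secret so the kubeflow-pipelines release can adopt it before re-running the deploy:

# Add the metadata the "invalid ownership metadata" error asks for,
# so Helm can adopt the existing Secret instead of refusing to install.
kubectl -n kubeflow label secret mlpipeline-minio-artifact \
    app.kubernetes.io/managed-by=Helm --overwrite
kubectl -n kubeflow annotate secret mlpipeline-minio-artifact \
    meta.helm.sh/release-name=kubeflow-pipelines \
    meta.helm.sh/release-namespace=default --overwrite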

@AlexandreBrown AlexandreBrown changed the title Timeout error during install & Make delete does not delete resource for failed install Kubeflow 1.7 Terraform Deployment Not Working Mar 15, 2023
@ryansteakley
Contributor

ryansteakley commented Mar 15, 2023

@AlexandreBrown Which deployment option with Terraform was this? Can you try running the Terraform deployment off of the PR I have open?

@AlexandreBrown
Contributor Author

AlexandreBrown commented Mar 15, 2023

@ryansteakley Using the Terraform RDS-S3-Cognito deployment option.
I tried running it off of your PR, and the ml-pipeline pod is in CrashLoopBackOff (namespace kubeflow).
kubectl logs ml-pipeline-58f47b6559-j2km9 -n kubeflow
Result :

I0315 20:59:56.238879       7 client_manager.go:160] Initializing client manager
I0315 20:59:56.238948       7 config.go:57] Config DBConfig.ExtraParams not specified, skipping
F0315 20:59:56.438072       7 client_manager.go:412] Failed to check if Minio bucket exists. Error: Access Denied.

Note that the IAM user for my test has full administrator access so this is a bit unexpected to me.
Here is the full kubectl get pods --all-namespaces output:

user@desktop:~/Documents/AI/kf-testing$ kubectl get pods --all-namespaces
NAMESPACE          NAME                                                        READY   STATUS             RESTARTS        AGE
ack-system         ack-sagemaker-controller-sagemaker-chart-86b89cd9fc-jjmtc   1/1     Running            0               24m
cert-manager       cert-manager-574b669cb4-lwmgr                               1/1     Running            0               26m
cert-manager       cert-manager-cainjector-78647849dc-qc6ml                    1/1     Running            0               26m
cert-manager       cert-manager-webhook-5fdfc97cbd-qmqqb                       1/1     Running            0               26m
istio-system       aws-authservice-6c565674b6-57cs2                            1/1     Running            0               11m
istio-system       cluster-local-gateway-6955b67f54-v7wqt                      1/1     Running            0               10m
istio-system       istio-ingressgateway-67f7b5f88d-bh8tr                       1/1     Running            0               24m
istio-system       istiod-56f7cf9bd6-5blgp                                     1/1     Running            0               24m
knative-eventing   eventing-controller-c6f5fd6cd-zpwtw                         1/1     Running            0               10m
knative-eventing   eventing-webhook-79cd6767-5xhtv                             1/1     Running            0               10m
knative-serving    activator-67849589d6-tgx47                                  2/2     Running            0               10m
knative-serving    autoscaler-6dbcdd95c7-dxrxs                                 2/2     Running            0               10m
knative-serving    controller-b9b8855b8-wcxw9                                  2/2     Running            0               10m
knative-serving    domain-mapping-75cc6d667f-86jhz                             2/2     Running            0               10m
knative-serving    domainmapping-webhook-6dfb78c944-dpc65                      2/2     Running            0               10m
knative-serving    net-istio-controller-5fcd96d76f-qbl7m                       2/2     Running            0               10m
knative-serving    net-istio-webhook-7ff9fdf999-p6f5z                          2/2     Running            0               10m
knative-serving    webhook-69cc5b9849-vrjnt                                    2/2     Running            0               10m
kube-system        aws-load-balancer-controller-74965c94c8-ptqf8               1/1     Running            0               26m
kube-system        aws-load-balancer-controller-74965c94c8-qxsrl               1/1     Running            0               26m
kube-system        aws-node-4wc68                                              1/1     Running            0               25m
kube-system        aws-node-j5kkl                                              1/1     Running            2 (26m ago)     26m
kube-system        aws-node-ldr2d                                              1/1     Running            0               25m
kube-system        aws-node-qdmnk                                              1/1     Running            0               25m
kube-system        aws-node-rc7qc                                              1/1     Running            0               25m
kube-system        coredns-8fd4db68f-6n78z                                     1/1     Running            0               31m
kube-system        coredns-8fd4db68f-94r95                                     1/1     Running            0               31m
kube-system        csi-secrets-store-provider-aws-5zzsm                        1/1     Running            0               26m
kube-system        csi-secrets-store-provider-aws-j6hhq                        1/1     Running            0               26m
kube-system        csi-secrets-store-provider-aws-jwgd6                        1/1     Running            0               26m
kube-system        csi-secrets-store-provider-aws-tmgtv                        1/1     Running            0               26m
kube-system        csi-secrets-store-provider-aws-xqppj                        1/1     Running            0               26m
kube-system        ebs-csi-controller-5d676f9b7f-kq846                         6/6     Running            0               26m
kube-system        ebs-csi-controller-5d676f9b7f-qwpsj                         6/6     Running            0               26m
kube-system        ebs-csi-node-czc8r                                          3/3     Running            0               26m
kube-system        ebs-csi-node-k9fjf                                          3/3     Running            0               26m
kube-system        ebs-csi-node-sbnrh                                          3/3     Running            0               26m
kube-system        ebs-csi-node-xdln5                                          3/3     Running            0               26m
kube-system        ebs-csi-node-zp8n8                                          3/3     Running            0               26m
kube-system        efs-csi-controller-5b696cc468-6k9xf                         3/3     Running            0               26m
kube-system        efs-csi-controller-5b696cc468-xjcn2                         3/3     Running            0               26m
kube-system        efs-csi-node-b4z84                                          3/3     Running            0               26m
kube-system        efs-csi-node-ccwgn                                          3/3     Running            0               26m
kube-system        efs-csi-node-djbj9                                          3/3     Running            0               26m
kube-system        efs-csi-node-shmwq                                          3/3     Running            0               26m
kube-system        efs-csi-node-v8qnm                                          3/3     Running            0               26m
kube-system        fsx-csi-controller-54c494d75f-2ncfx                         4/4     Running            0               26m
kube-system        fsx-csi-controller-54c494d75f-6zggr                         4/4     Running            0               26m
kube-system        fsx-csi-node-b9dv7                                          3/3     Running            0               26m
kube-system        fsx-csi-node-g72xn                                          3/3     Running            0               26m
kube-system        fsx-csi-node-j8npr                                          3/3     Running            0               26m
kube-system        fsx-csi-node-v56b4                                          3/3     Running            0               26m
kube-system        fsx-csi-node-w46ng                                          3/3     Running            0               26m
kube-system        kube-proxy-86vls                                            1/1     Running            0               27m
kube-system        kube-proxy-8kmvl                                            1/1     Running            0               27m
kube-system        kube-proxy-hwwnr                                            1/1     Running            0               27m
kube-system        kube-proxy-z5kps                                            1/1     Running            0               27m
kube-system        kube-proxy-zgwjx                                            1/1     Running            0               27m
kube-system        secrets-store-csi-driver-2rknn                              3/3     Running            0               25m
kube-system        secrets-store-csi-driver-655pc                              3/3     Running            0               25m
kube-system        secrets-store-csi-driver-kmk86                              3/3     Running            0               25m
kube-system        secrets-store-csi-driver-wsxvb                              3/3     Running            0               25m
kube-system        secrets-store-csi-driver-xjxcd                              3/3     Running            0               25m
kubeflow           aws-secrets-sync-fbc567b76-zrsjs                            2/2     Running            0               10m
kubeflow           cache-server-585f7bc798-65wsg                               2/2     Running            0               9m5s
kubeflow           kubeflow-pipelines-profile-controller-598f7567bc-tm22r      1/1     Running            0               9m5s
kubeflow           metacontroller-0                                            1/1     Running            0               9m5s
kubeflow           metadata-envoy-deployment-76769c7b7c-f68zk                  1/1     Running            0               9m5s
kubeflow           metadata-grpc-deployment-784b8b5fb4-9d7d4                   2/2     Running            1 (8m59s ago)   9m5s
kubeflow           metadata-writer-6d556b4c64-dc4xj                            2/2     Running            0               9m5s
kubeflow           ml-pipeline-58f47b6559-cb4pt                                1/2     CrashLoopBackOff   4 (25s ago)     116s
kubeflow           ml-pipeline-persistenceagent-7494b78d4b-rdhds               2/2     Running            0               9m5s
kubeflow           ml-pipeline-scheduledworkflow-65dc7497dc-b8vsr              2/2     Running            0               9m5s
kubeflow           ml-pipeline-ui-5bb55fd567-5c79k                             2/2     Running            0               9m5s
kubeflow           ml-pipeline-viewer-crd-866f8d9fb6-kxcw7                     2/2     Running            1 (8m54s ago)   9m5s
kubeflow           ml-pipeline-visualizationserver-7c8df85995-wxkmk            2/2     Running            0               9m4s
kubeflow           workflow-controller-6547f784cd-bdw8s                        2/2     Running            1 (9m1s ago)    9m4s

@AlexandreBrown
Contributor Author

AlexandreBrown commented Mar 16, 2023

@ryansteakley I tested using your branch with your fix (new commit).
Now all pods are running, but the install fails at the end when installing the AWS telemetry component:

module.kubeflow_components.module.kubeflow_aws_telemetry[0].module.helm_addon.helm_release.addon[0]: Creating...
╷
│ Warning: Resource targeting is in effect
│ 
│ You are creating a plan with the -target option, which means that the
│ result of this plan may not represent all of the changes requested by the
│ current configuration.
│ 
│ The -target option is not for routine use, and is provided only for
│ exceptional situations such as recovering from errors or mistakes, or when
│ Terraform specifically suggests to use it as part of an error message.
╵
╷
│ Warning: Applied changes may be incomplete
│ 
│ The plan was created with the -target option in effect, so some changes
│ requested in the configuration may have been ignored and the output values
│ may not be fully updated. Run the following command to verify that no other
│ changes are pending:
│     terraform plan
│ 	
│ Note that the -target option is not suitable for routine use, and is
│ provided only for exceptional situations such as recovering from errors or
│ mistakes, or when Terraform specifically suggests to use it as part of an
│ error message.
╵
╷
│ Warning: Redundant empty provider block
│ 
│   on cognito-rds-s3-components/main.tf line 1:
│    1: provider "aws" {
│ 
│ Earlier versions of Terraform used empty provider blocks ("proxy provider
│ configurations") for child modules to declare their need to be passed a
│ provider configuration by their callers. That approach was ambiguous and is
│ now deprecated.
│ 
│ If you control this module, you can migrate to the new declaration syntax
│ by removing all of the empty provider "aws" blocks and then adding or
│ updating an entry like the following to the required_providers block of
│ module.kubeflow_components:
│     aws = {
│       source = "hashicorp/aws"
│       configuration_aliases = [
│         aws.aws,
│         aws.virginia,
│       ]
│     }
│ 
│ (and one more similar warning elsewhere)
╵
╷
│ Warning: Experimental feature "module_variable_optional_attrs" is active
│ 
│   on .terraform/modules/eks_blueprints_kubernetes_addons.ondat/locals.tf line 2, in terraform:
│    2:   experiments = [module_variable_optional_attrs]
│ 
│ Experimental features are subject to breaking changes in future minor or
│ patch releases, based on feedback.
│ 
│ If you have feedback on the design of this feature, please open a GitHub
│ issue to discuss it.
│ 
│ (and 7 more similar warnings elsewhere)
╵
╷
│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
│ 
│   with module.kubeflow_components.module.kubeflow_secrets_manager_irsa.kubernetes_service_account_v1.irsa[0],
│   on .terraform/modules/kubeflow_components.kubeflow_secrets_manager_irsa/modules/irsa/main.tf line 16, in resource "kubernetes_service_account_v1" "irsa":
│   16: resource "kubernetes_service_account_v1" "irsa" {
│ 
│ Starting from version 1.24.0 Kubernetes does not automatically generate a
│ token for service accounts, in this case, "default_secret_name" will be
│ empty
╵
╷
│ Error: unable to build kubernetes objects from release manifest: resource mapping not found for name: "aws-kubeflow-telemetry" namespace: "kubeflow" from "": no matches for kind "CronJob" in version "batch/v1beta1"
│ ensure CRDs are installed first
│ 
│   with module.kubeflow_components.module.kubeflow_aws_telemetry[0].module.helm_addon.helm_release.addon[0],
│   on .terraform/modules/kubeflow_components.kubeflow_aws_telemetry.helm_addon/modules/kubernetes-addons/helm-addon/main.tf line 1, in resource "helm_release" "addon":
│    1: resource "helm_release" "addon" {
│ 
╵
make: *** [deploy-kubeflow-components] Error 1
Makefile:30: recipe for target 'deploy-kubeflow-components' failed

I think the issue is that for Kubernetes 1.25 the apiVersion for the CronJob resource should be batch/v1 instead of batch/v1beta1.
Source: Official Kubernetes documentation
Here is the line to modify :
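For illustration, the change amounts to:

# Removed in Kubernetes 1.25:
apiVersion: batch/v1beta1
kind: CronJob
# GA since Kubernetes 1.21 and required on 1.25:
apiVersion: batch/v1
kind: CronJob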

@surajkota
Copy link
Contributor

surajkota commented Mar 16, 2023

That's right, we are testing on 1.24. We need to update this for 1.25; there might be more such things for 1.25.

@AlexandreBrown
Copy link
Contributor Author

AlexandreBrown commented Mar 16, 2023

@surajkota I see.
Since Kubeflow 1.7 is said to be compatible with K8s 1.25, perhaps the testing should be done primarily on 1.25.
On my side I am testing on 1.25, so once the issue mentioned above is fixed I'll be able to report back if further issues arise on 1.25.

@AlexandreBrown AlexandreBrown changed the title Kubeflow 1.7 Terraform Deployment Not Working Kubeflow 1.7 Terraform Deployment Not Working on K8s 1.25 Mar 16, 2023
@AlexandreBrown
Copy link
Contributor Author

AlexandreBrown commented Mar 30, 2023

@surajkota @ryansteakley @jsitu777 Just tested the fix from my branch 62c84cb.
It fixed the install for K8s 1.25: all pods are running and the install succeeds using Terraform and the Kubeflow v1.7.0 branch.
However, I get an issue when trying to run a Kubeflow Pipeline (using KFP v2).
I created a profile following the doc: https://awslabs.github.io/kubeflow-manifests/docs/component-guides/profiles/
Running a KFP v1 pipeline from the samples ([Tutorial] Data passing in python components) worked, but running a simple KFP v2 pipeline did not.
Pipeline :
kfp sdk version: kfp==2.0.0b13

from kfp import dsl
import kfp

@dsl.component
def add(a: float, b: float) -> float:
    '''Calculates sum of two arguments'''
    return a + b


@dsl.pipeline(
    name='Addition pipeline',
    description='An example pipeline that performs addition calculations.')
def add_pipeline(
    a: float = 1.0,
    b: float = 7.0,
):
    first_add_task = add(a=a, b=4.0)
    second_add_task = add(a=first_add_task.output, b=b)

from kfp import Client

# This is the "Domain" in your cookies. eg: kubeflow.<platform.example.com>
kubeflow_gateway_endpoint="THE VALUE HERE"

alb_session_cookie0="THE VALUE HERE"
alb_session_cookie1="THE VALUE HERE"

namespace="alex"

client = Client(host=f"https://{kubeflow_gateway_endpoint}/pipeline", cookies=f"AWSELBAuthSessionCookie-0={alb_session_cookie0};AWSELBAuthSessionCookie-1={alb_session_cookie1}", namespace=namespace)

client.create_run_from_pipeline_func(
    add_pipeline, arguments={
        'a': 7.0,
        'b': 8.0
    })
kubectl logs addition-pipeline-tm2tl-478962274 -n alex

Output

time="2023-03-30T01:23:35.396Z" level=info msg="capturing logs" argo=true
time="2023-03-30T01:23:35.418Z" level=info msg="capturing logs" argo=true
I0330 01:23:35.430078      29 cache.go:139] Cannot detect ml-pipeline in the same namespace, default to ml-pipeline.kubeflow:8887 as KFP endpoint.
I0330 01:23:35.430089      29 cache.go:116] Connecting to cache endpoint ml-pipeline.kubeflow:8887
F0330 01:23:35.538427      29 main.go:49] failed to execute component: Failed to open bucket "mlpipeline": Failed to get minio credential: Failed to get MinIO credential from secret name="mlpipeline-minio-artifact" namespace="alex": does not have 'accesskey' key
Error: exit status 1
Error: exit status 1
kubectl describe secret mlpipeline-minio-artifact -n alex

Output

Name:         mlpipeline-minio-artifact
Namespace:    alex
Labels:       controller-uid=5012a9df-3d4e-460c-9b5e-85546e8da6d2
Annotations:  metacontroller.k8s.io/last-applied-configuration:
                {"apiVersion":"v1","data":{"accesskey":"","secretkey":""},"kind":"Secret","metadata":{"labels":{"controller-uid":"5012a9df-3d4e-460c-9b5e-...

Type:  Opaque

Data
====
accesskey:  0 bytes
secretkey:  0 bytes

Could this be because the Kubeflow Pipelines version on main is still 2.0.0a6 and not 2.0.0a7?
If so, then I suppose this PR will fix it.
Let me know what you think; I can re-test once #626 is merged if you think it's related.

@ryansteakley
Contributor

@AlexandreBrown Don't expect v2 to work even with the latest version. kfp v2 has different source code; after looking into it, they grab credentials differently from v1 and have hard-coded it to use static credentials from that secret: https://github.com/kubeflow/pipelines/blob/d2db64bebbd214e55c5ccde38dc1c7c7cab27dda/backend/src/v2/objectstore/object_store.go#L54
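For anyone following along, the earlier "does not have 'accesskey' key" failure matches this: the v2 launcher reads static credentials from the mlpipeline-minio-artifact Secret in the profile namespace, which the profile controller created with empty values. Purely as an illustration of what a populated Secret would look like (not the documented AWS procedure, and which credentials are appropriate depends on your setup):

# Illustration only: give the Secret non-empty accesskey/secretkey entries.
# Replace the placeholders with credentials that can actually reach the bucket.
kubectl -n alex create secret generic mlpipeline-minio-artifact \
    --from-literal=accesskey="<ACCESS_KEY_ID>" \
    --from-literal=secretkey="<SECRET_ACCESS_KEY>" \
    --dry-run=client -o yaml | kubectl apply -f -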

@AlexandreBrown
Contributor Author

@ryansteakley thanks for the feedback.

Do you think we can fix this easily?

Our team wants to use v2 after we upgrade Kubeflow.

@ryansteakley
Contributor

I don't think it is a heavily complex change; however, we would need to get the PR merged in the upstream pipelines repo, which would take time.

@surajkota
Contributor

Based on the code, it looks like the current static credentials mechanism will still work. Can you try using that for this release?

We plan to evaluate both RDS and S3 support for KFP v2 in the next release, when they plan to release the beta.

@AlexandreBrown
Contributor Author

AlexandreBrown commented Mar 30, 2023

@surajkota @ryansteakley OK, we'll try using the static credentials for this release then.

Would that mean we cannot use the Kubeflow IAM profile plugin when creating a profile? What's the procedure for static credentials? Do we use the script I made back then, or does the Terraform deployment option handle it? https://awslabs.github.io/kubeflow-manifests/docs/component-guides/notebooks/#set-up-secrets-access

Also, regarding the change I tested (changing the API version from batch/v1beta1 to batch/v1), would this change be backward compatible with 1.24? Sure, it will fix it for us since we plan on using K8s 1.25, but will it affect 1.24 users? Just asking before opening a PR.

@surajkota
Contributor

surajkota commented Mar 30, 2023

batch api version change

Yes, it will be backward compatible. +1 to change it.

Would that mean we cannot use the Kubeflow IAM profile plugin when creating a profile?

You can still use it; it's just that pipelines will not use those credentials, it will still use the IAM user credentials.

What's the procedure for static credentials?

There should be an argument for it. We haven't made the doc changes; some of the PRs are still in review, though.
@ryansteakley

surajkota pushed a commit that referenced this issue Mar 31, 2023
- v1beta was changed to v1 in Kubernetes 1.25

**Which issue is resolved by this Pull Request:**
Resolves #609

**Description of your changes:**


**Testing:**
- [ ] Unit tests pass
- [ ] e2e tests pass
- Details about new tests (If this PR adds a new feature)
- Details about any manual tests performed

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
ryansteakley pushed a commit to ryansteakley/kubeflow-manifests that referenced this issue Apr 14, 2023
- v1beta was changed to v1 in Kubernetes 1.25

**Which issue is resolved by this Pull Request:**
Resolves awslabs#609

**Description of your changes:**


**Testing:**
- [ ] Unit tests pass
- [ ] e2e tests pass
- Details about new tests (If this PR adds a new feature)
- Details about any manual tests performed

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.