e2e-wine test failed with kfp run in error state #38
Comments
The error message suggests that either |
@DnPlas as mentioned in the issue, it's the preprocess step, probably when passing the data from the step before (download) |
@DnPlas this happens on any step that passes the output of one step as the input to the next.

Versions

In my case I have the following versions:
Microk8s: 1.25/stable
Juju applications -> https://pastebin.ubuntu.com/p/MWRPjpkP3P/

Scenarios

Test A - Fails on "preprocess_task" step

Rendered YAML file -> https://pastebin.ubuntu.com/p/XkWrDvbgGX/

@dsl.pipeline(
    name="e2e_wine_pipeline",
    description="WINE pipeline",
)
def wine_pipeline(url):
    web_downloader_task = web_downloader_op(url=url)
    preprocess_task = preprocess_op(file=web_downloader_task.outputs['data'])
    train_task = (training_op(file=preprocess_task.outputs['output'])
        .add_env_variable(V1EnvVar(name='MLFLOW_TRACKING_URI', value='http://mlflow-server.kubeflow.svc.cluster.local:5000'))
        .add_env_variable(V1EnvVar(name='MLFLOW_S3_ENDPOINT_URL', value='http://minio.kubeflow.svc.cluster.local:9000'))
        .add_env_variable(V1EnvVar(name='PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', value='python'))
        # https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.extensions.html#kfp.onprem.use_k8s_secret
        .apply(use_k8s_secret(secret_name='mlpipeline-minio-artifact', k8s_secret_key_to_env={
            'accesskey': 'AWS_ACCESS_KEY_ID',
            'secretkey': 'AWS_SECRET_ACCESS_KEY',
        })))
    deploy_task = deploy_op(model_uri=train_task.output)

Test B - Fails on "train_task" step

@dsl.pipeline(
    name="e2e_wine_pipeline",
    description="WINE pipeline",
)
def wine_pipeline(url):
    # web_downloader_task = web_downloader_op(url=url)
    preprocess_task = preprocess_op(file=url)
    train_task = (training_op(file=preprocess_task.outputs['output'])
        .add_env_variable(V1EnvVar(name='MLFLOW_TRACKING_URI', value='http://mlflow-server.kubeflow.svc.cluster.local:5000'))
        .add_env_variable(V1EnvVar(name='MLFLOW_S3_ENDPOINT_URL', value='http://minio.kubeflow.svc.cluster.local:9000'))
        .add_env_variable(V1EnvVar(name='PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION', value='python'))
        # https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.extensions.html#kfp.onprem.use_k8s_secret
        .apply(use_k8s_secret(secret_name='mlpipeline-minio-artifact', k8s_secret_key_to_env={
            'accesskey': 'AWS_ACCESS_KEY_ID',
            'secretkey': 'AWS_SECRET_ACCESS_KEY',
        })))
    deploy_task = deploy_op(model_uri=train_task.output)

In my case MinIO seems okay, so I am guessing the bug is around Argo.

Logs
|
I see that the error is in the initContainer of the run pod
the
@gustavosr98 you can use this ^ as a temporary workaround. |
Thanks @NohaIhab! Please keep me updated on any other bug report that would track this or the final patch on the OCI image we would need to provide for the customer |
hi @gustavosr98 We've patched the rock and re-published the charm with the new rock to |
Awesome, thanks @NohaIhab! By the way, we should definitely add a prominent note to the README of canonical/kubeflow-examples saying that it is no longer maintained, with a pointer to this repo. I spent quite some time trying to make the e2e-wine sample work on the older repo, while here it worked perfectly on the first run. Big thanks for this repo 🚀! |
This commit defaults the executor image to argoproj/argoexec:v3.3.10 to bump the version and avoid canonical/charmed-kubeflow-uats#38.
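As a rough illustration of what "defaulting the executor image" can look like, here is a minimal sketch of a fallback lookup. The config key `executor-image` and the function name are hypothetical and not taken from the charm's code; only the image tag comes from the commit message above.

```python
# Image tag taken from the commit message referenced in this thread.
DEFAULT_EXECUTOR_IMAGE = "argoproj/argoexec:v3.3.10"


def resolve_executor_image(config: dict) -> str:
    """Return the configured executor image, or the default when unset.

    The "executor-image" key is an assumption made for this sketch.
    """
    image = (config.get("executor-image") or "").strip()
    return image or DEFAULT_EXECUTOR_IMAGE


if __name__ == "__main__":
    print(resolve_executor_image({}))
    print(resolve_executor_image({"executor-image": "example/argoexec:custom"}))
```

Treating an empty or whitespace-only value as "unset" keeps a blank override from producing a pod with no executor image at all.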
@gustavosr98 good point! We can follow up on that. Yes, the goal of this repo is to have a place where we, the CKF team, define our tests in a way that you can also use them, since they are notebooks. I'll go ahead and close this issue for now, since updating the Argo Exec image solved it. Afterwards we'll work on converting that image back to a ROCK.
channel: 1.7/edge
e2e-wine notebook test fails with AssertionError: KFP run in Error state.
The Preprocess step in the pipeline fails with the logs:
This step is in Error state with this message: Error (exit code 1): tar (child): gzip: Cannot exec: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
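The "gzip: Cannot exec: No such file or directory" line indicates that tar tried to spawn a gzip binary and could not find it. A minimal sketch for checking which of the required tools are missing from the PATH is shown below. It assumes Python is available in the environment being inspected, which may not hold for a minimal executor image, and the helper name `missing_tools` is made up for this example.

```python
import shutil


def missing_tools(tools=("tar", "gzip")):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing binaries:", ", ".join(missing))
    else:
        print("tar and gzip are both available")
```

If gzip shows up as missing, any step that packs or unpacks a gzip-compressed artifact with tar would be expected to fail in the same way as the log above.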