[bitnami/mlflow] support google-cloud-storage #67246

Closed

dhrp commented May 25, 2024

Description of the change

Updates the Bitnami MLFlow image to contain the google-cloud-storage pip module.

Benefits

This allows MLFlow to use its built-in support for storing and retrieving artifacts in Google Cloud Storage. This matters most when running the MLFlow tracking server with Google Cloud Storage as the artifact store, but it is also useful when using the MLFlow container in client mode.
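As a rough way to check whether a given image or environment already has the module, the sketch below tests whether `google.cloud.storage` (the import name provided by the google-cloud-storage pip package) can be located. The helper name `has_module` is made up for illustration and is not part of MLFlow or the Bitnami image.

```python
import importlib.util


def has_module(dotted_name: str) -> bool:
    """Return True if `dotted_name` can be located by the import system.

    `has_module` is a hypothetical helper for this example only.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (for example `google.cloud`) is missing.
        return False


if __name__ == "__main__":
    name = "google.cloud.storage"
    status = "available" if has_module(name) else "missing"
    print(f"{name}: {status}")
```

Running this with the Python interpreter inside the container should report whether the package is present, assuming the container's default interpreter is the one MLFlow uses.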

Possible drawbacks

None that I know of, though I don't know whether adding a pip install at the end is the approach Bitnami prefers. An alternative would be to add it to the mlflow stacksmith tarball, but AFAIK I cannot contribute to that. I will leave that to the maintainers.

Applicable issues

Add support to the container: fixes #65108
Add support for Google Cloud Storage in the MLFlow chart: relates to bitnami/charts#22720

Additional information

I'm happy to change the approach if someone points me to how.

Signed-off-by: Thatcher Peskens <thatcher@t2studio.nl>
github-actions bot added the mlflow and triage (Triage is needed) labels May 25, 2024
github-actions bot requested a review from javsalgar May 25, 2024 07:22
javsalgar added the in-progress and verify (Execute verification workflow for these changes) labels May 27, 2024
github-actions bot removed the triage (Triage is needed) label May 27, 2024
github-actions bot removed the request for review from javsalgar May 27, 2024 10:56
github-actions bot requested a review from juan131 May 27, 2024 10:56

dhrp (Author) commented May 28, 2024

I'm chasing one weird issue: in some server configurations, the mlflow client fails to download the artifact directly from Google storage with a permission error, even though the server can access the artifacts just fine and upload also works.
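One thing that may be worth checking here (it is not confirmed as the cause anywhere in this thread) is the scheme of the run's artifact URI. As far as I know, a `gs://` URI means the client talks to Google Cloud Storage itself and therefore needs its own credentials, while an `mlflow-artifacts:/` URI is proxied through the tracking server. The sketch below only inspects the scheme; the function name and example URIs are made up for illustration.

```python
from urllib.parse import urlparse


def client_needs_gcs_credentials(artifact_uri: str) -> bool:
    """Return True if the URI points straight at Google Cloud Storage.

    Hypothetical helper: it assumes that `gs://` URIs are read by the client
    directly, so the client (not just the server) needs GCS permissions.
    """
    return urlparse(artifact_uri).scheme == "gs"


if __name__ == "__main__":
    # Example URIs are placeholders, not taken from the thread.
    for uri in ("gs://my-bucket/0/abc/artifacts", "mlflow-artifacts:/0/abc/artifacts"):
        print(uri, "->", client_needs_gcs_credentials(uri))
```

If the function returns True for a run's artifact URI, a permission error on download would be consistent with the client lacking credentials that the server has.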

juan131 (Contributor) commented May 30, 2024

Hi @dhrp

Thanks so much for this contribution!

We're currently evaluating the impact of including this Python module, which seems to increase the image size by 16MB. Before including it, we need to decide whether it's widely used: we want the image to ship only the most important modules and ask users to extend the image with their own modules for less common use cases.

In case we decide to accept it, please note it won't be included in the image using the "pip install" directive you proposed, but as part of the mlflow-2.13.0-0-linux-${OS_ARCH}-debian-12.tar.gz tarball added below:

We'll keep you updated about any decision we take.

dhrp (Author) commented May 30, 2024

Hi @juan131, ok; thanks for your message.

At least two other people have commented on my issue on the MLFlow Helm chart that they would like this feature, and it has 4 👍. See: bitnami/charts#22720

Also: is there any way to see or contribute to how these tarballs are created?

juan131 (Contributor) commented May 31, 2024

Hi @dhrp

Thanks for the insights, I'll share with the team.

> Also: is there any way to see or contribute to how these tarballs are created?

I'm afraid the compilation recipes we use to build Bitnami assets are internal. We may consider moving them to a public repo since there's nothing to hide in them.

This Pull Request has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thank you for your contribution.

github-actions bot added the stale (15 days without activity) label Jun 16, 2024

juan131 (Contributor) commented Jun 21, 2024

not stale

dhrp (Author) commented Jun 25, 2024

Can we move this forward? I personally think the number of 👍 (now 6) on the MLflow chart issue that depends on this change is enough to justify it, if it's only a 16MB size increase. As I see it, storing models on durable storage really is a primary feature of MLFlow.

juan131 (Contributor) commented Jun 28, 2024

Hi @dhrp

I'm glad to confirm we got the "green light" to include this module by default in the image. I'm applying the required changes right now and I'll ping you once we have released a new container image version that includes it.

juan131 (Contributor) commented Jun 28, 2024

Hi @dhrp

Please give it a try using the image tag 2.14.1-debian-12-r1

github-actions bot commented Jul 4, 2024

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Pull Request. Do not hesitate to reopen it later if necessary.

github-actions bot added the solved label Jul 4, 2024

juan131 (Contributor) commented Jul 15, 2024

I'm closing this PR given that we included the missing pip module in the 2.14.1-debian-12-r1 revision. Please reopen it if you require further assistance.

dhrp (Author) commented Jul 21, 2024

Hi @juan131, somehow I didn't see this until now! Thanks so much for moving this forward. Very happy about it!

Labels
mlflow, solved, stale (15 days without activity), verify (Execute verification workflow for these changes)
Development

Successfully merging this pull request may close these issues.

[bitnami/mlflow] MLFlow missing package: google-cloud-storage
3 participants