Minio support for ModelDB #870

Open
Atharex opened this issue Jun 25, 2020 · 24 comments
Labels: backend (Related to modeldb backend)

Comments

@Atharex
Contributor

Atharex commented Jun 25, 2020

Can the S3 storage adapter support a Minio backend?

@conradoverta added the backend label Jun 25, 2020
@conradoverta
Contributor

Hi, @Atharex!

Currently the artifacts go directly to S3 via signed URLs. To my knowledge, Minio supports such calls, so it should work out of the box, but we have never tested against it. Are you getting some specific error? Maybe we can help figure out what's going on.

@Atharex
Contributor Author

Atharex commented Jun 26, 2020

Hi, @conradoverta!

Probably there are not many changes needed for it. Could be I'm missing something in the configuration or there is no capability yet to specify a custom endpoint in the S3 configuration (like a local Minio installation).

I've got the S3 artifact store type in my config.yaml configured like this:

artifactStoreConfig:
  artifactStoreType: S3
  S3:
    cloudAccessKey: {{ minio_access_key }}
    cloudSecretKey: {{ minio_secret_key }}
    cloudBucketName: {{ modeldb_minio_bucket }}

And I get the following error:
error: The AWS Access Key Id you provided does not exist in our records. (Service: Amazon S3; Status Code: 403; Error Code: InvalidAccessKeyId; Request ID: 83474DE39F314335; S3 Extended Request ID: wq4SSxhJMqBpyR+TgtoK3TCRLXylajG+x7iuCuOoOS8RP6XJIU5UI1WzViU9u8WR06qb054PWn8=)

So it seems that ModelDB tries to use those credentials to save the data into AWS, instead of my local Minio installation.
Is there a way to configure the endpoint for the S3 calls?

@conradoverta
Contributor

Oh, that is a fair point. I don't think we have any configuration for the custom endpoint. It should be easy to add a configuration and pass it around, but we don't have a Minio setup currently to test.

Would you be willing to contribute a PR with that new configuration? We'd be happy to point you to useful information for this. Otherwise, I need to discuss with the team and put this in one of our coming sprints.

@Atharex
Contributor Author

Atharex commented Jun 26, 2020

OK, I guess I could give it a try :)

Send me the information you have and I'll see what I can do.

@ravishetye

@Atharex: I believe modifying https://github.com/VertaAI/modeldb/blob/master/backend/src/main/java/ai/verta/modeldb/artifactStore/storageservice/S3Service.java#L34-L51 should get you unblocked. If it doesn't, it would help if you could share a few more lines from the stack trace.

@Atharex
Contributor Author

Atharex commented Jun 27, 2020

@ravishetye @conradoverta

I started from the code you pointed me to and got a working example up and running for my Minio installation. I was able to log datasets into Minio successfully with it. I have also opened a pull request (#889) with my proposed changes.

The changes also support setting the config:

      artifactStoreType: S3
      S3:
        cloudAccessKey: {{ minio_access }}
        cloudSecretKey: {{ minio_secret }}
        cloudBucketName: {{ modeldb_minio_bucket }}
        minioEndpoint: {{ minio_endpoint }}
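
Editorial sketch: the thread does not show how PR #889 reads the new field, so the following is only a minimal illustration of the idea, assuming the S3 section of the config has been loaded into a plain dictionary. The function name `resolve_s3_endpoint` and the constant `AWS_DEFAULT_ENDPOINT` are hypothetical and are not part of ModelDB.

```python
# Hypothetical helper, not ModelDB code: choose the endpoint for S3 calls.
AWS_DEFAULT_ENDPOINT = "https://s3.amazonaws.com"


def resolve_s3_endpoint(s3_config: dict) -> str:
    """Return the custom endpoint if one is configured, else the AWS default."""
    endpoint = s3_config.get("minioEndpoint")
    if endpoint:
        # Drop a trailing slash so later path joins do not produce "//".
        return endpoint.rstrip("/")
    return AWS_DEFAULT_ENDPOINT


minio_config = {
    "cloudBucketName": "modeldb-bucket",
    "minioEndpoint": "http://minio-storage.minio.svc.cluster.local:9000/",
}
print(resolve_s3_endpoint(minio_config))
```

Without a `minioEndpoint` entry the helper falls back to AWS, which matches the behaviour reported earlier in the thread, where the Minio credentials were sent to AWS and rejected.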

@conradoverta
Contributor

Awesome! That was fast =) We'll take a look tomorrow.

@ravishetye

Thanks @Atharex for the request and the fix. Could you close the ticket if things are functional for you?

@Atharex
Contributor Author

Atharex commented Jul 2, 2020

My pleasure @ravishetye :)

I would rather keep this ticket open, as the support is not yet 100% complete (changes are still needed in the DB artifact storage path). You can show me where the changes should be made, but I cannot guarantee I will have time for another pull request in the near future :/

@Atharex
Contributor Author

Atharex commented Sep 24, 2020

@ravishetye I got some time to take another look at this. Can someone from your side point me to the code that creates the frontend links?

@Atharex
Contributor Author

Atharex commented Oct 24, 2020

@ravishetye I see you guys are doing loads of refactoring on the codebase. I presume you are planning for a new release, where Minio support will already be completed by someone from your side?

@conradoverta
Contributor

Hi, @Atharex! Could you clarify what you mean by links? I might be missing something here.

@Atharex
Contributor Author

Atharex commented Nov 2, 2020

Might have been misled... I thought the DB stores direct links to the artifacts, which the frontend uses for downloads.
I've tried a build directly from the master branch now to try and debug my problem.

I install ModelDB with this config:

    artifactStoreConfig:
      artifactStoreType: S3
      S3:
        cloudAccessKey: [my-access-key]
        cloudSecretKey: [my-secret-key]
        cloudBucketName: modeldb-bucket
        minioEndpoint: http://minio-storage.minio.svc.cluster.local:9000

Then I followed this example: https://github.com/VertaAI/modeldb/blob/master/client/workflows/demos/census-end-to-end-local-data-example.ipynb

This is my postgres DB output when I tried your latest modeldb version
(initially thought the column artifacts stores the full S3 signed URLs of the artifacts).

select * from artifact;

10 |             4 | ExperimentRunEntity | artifacts  | json               | model_api.json   |                                      | 0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json     | f         |      
         | 75f807db-c6fc-462d-9438-e39f0b0d7ee0 |            | s3://modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json     | 7763c9d7-be7c-4b36-be09-3c1a40e68537 | t
  4 |             4 | ExperimentRunEntity | artifacts  | zip                | custom_modules   |                                      | 5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | f         |      
         | c2d12f87-2529-45be-bb8b-84828b4f35d1 |            | s3://modeldb-bucket/5f95561f29a9f81f637fa50237d3729542b45c76ac47018b56dbfb16b277b37c/custom_modules.zip | 61a7315e-d608-4e42-aab2-a207954fdb6f | t
...

The URL request (seen in the network analyzer of the browser) when I click on the download artifact button in the ModelDB web UI seems correct:
GET http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20201102T103803Z&X-Amz-SignedHeaders=host&X-Amz-Expires=299&X-Amz-Credential=[my-credential]/20201102/us-east-1/s3/aws4_request&X-Amz-Signature=8e1ce4a94757d3d9d4a40be37629cca4a791c882125e78a75483bc0ce3224b33
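
Editorial sketch: when debugging a download like this one, it can help to take the signed URL apart and check which host it points to and how long it stays valid. The snippet below uses only the Python standard library; the function name `describe_signed_url` is hypothetical, and the example URL is a shortened stand-in for the request above, with the credential and signature parameters omitted.

```python
from urllib.parse import parse_qs, urlsplit


def describe_signed_url(url: str) -> dict:
    """Extract the parts of a presigned URL that matter for debugging."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        # Lifetime of the link in seconds.
        "expires_seconds": int(query["X-Amz-Expires"][0]),
        # Headers covered by the signature, separated by ";" in the URL.
        "signed_headers": query["X-Amz-SignedHeaders"][0].split(";"),
    }


example = (
    "http://minio-storage.minio.svc.cluster.local:9000/modeldb-bucket/model_api.json"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-SignedHeaders=host&X-Amz-Expires=299"
)
info = describe_signed_url(example)
print(info["host"], info["port"], info["expires_seconds"], info["signed_headers"])
```

The host reported here is the name the browser has to resolve, so a cluster-internal name in this field is a hint that the link may only work from inside the cluster.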

When I look up my local Minio instance, I see the artifacts correctly stored there and I can download them directly:
[my-minio-url]/modeldb-bucket/0c212b8fcd36072a29fb2e91e34a28e17a6504f28ec7fb2e9f54a83656c196d6/model_api.json

Even "docker exec-ing" into the backend container and fetching the artifact links from there works. But somehow when I try to download that same file from the web UI I get an error message:

b1edc8f80de6c050e00debb2e3b401f15bec77650351f433923f61a85490a34c/custom_modules.zip
Error in downloading file: Something went wrong!

The webapp log seems fine...

/api/v1/modeldb/experiment-run/getUrlForArtifact
Requesting /api/v1/modeldb/experiment-run/getUrlForArtifact
Returning 200 OK; 433b sent

Also the modeldb-backend logs don't look suspicious

{"thread":"grpc-default-executor-6","level":"INFO","loggerName":"ai.verta.modeldb.ModelDBAuthInterceptor","message":"methodName: ai.verta.modeldb.ExperimentRunService/getUrlForArtifact","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":195000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}
{"thread":"grpc-default-executor-6","level":"DEBUG","loggerName":"ai.verta.modeldb.experimentRun.ExperimentRunDAORdbImpl","message":"Got ProjectId by ExperimentRunId ","endOfBatch":false,"loggerFqcn":"org.apache.logging.log4j.spi.AbstractLogger","instant":{"epochSecond":1604316775,"nanoOfSecond":215000000},"threadId":455,"threadPriority":5,"hostName":"modeldb-backend-0","kubernetes.podIP":""}

But now I'm out of ideas how to further investigate...
Where can I get more debug information?
Why would only the frontend get problems downloading the artifact, when all other approaches work?

@conradoverta
Contributor

Is http://minio-storage.minio.svc.cluster.local:9000 the same as [my-minio-url]?

My current suspicion is that you have different DNS resolution for things running in the cluster than when you access from your other machine. What happens is that the webapp tries to fetch the URL http://minio-storage.minio.svc.cluster.local:9000/... since that's the URL that ModelDB is aware of.

Could you verify if you can resolve that hostname? You can usually do dig minio-storage.minio.svc.cluster.local or ping minio-storage.minio.svc.cluster.local, depending on your setup.
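
Editorial sketch: if `dig` or `ping` is not available, the same check can be approximated with the Python standard library. The helper name `can_resolve` is hypothetical. To be meaningful, the check has to run on the machine where the browser runs, not inside the cluster.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve the hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        # Raised when name resolution fails, e.g. for a cluster-internal
        # name looked up from outside the cluster.
        return False


print(can_resolve("minio-storage.minio.svc.cluster.local"))
```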

@Atharex
Contributor Author

Atharex commented Nov 2, 2020

No, [my-minio-url] is not http://minio-storage.minio.svc.cluster.local:9000
That is the URL to the web UI of my minio instance, which is reachable outside of my kubernetes cluster.

That external URL should not be used by ModelDB at all, though, since all of its traffic happens inside the Kubernetes cluster, where it has access to the http://minio-storage.minio.svc.cluster.local:9000 service (I presume this config at installation time is used by both backend and frontend services).
Also, as I mentioned, if I go into the model-backend container and download the generated URL of the artifact, it works fine, and DNS resolution inside that container with nslookup minio-storage.minio.svc.cluster.local also works correctly.

@conradoverta
Contributor

The problem here seems to be that ModelDB and your browser are seeing different hostnames for the same system. So when ModelDB asks minio for the link to the artifact, the link comes back with ModelDB's hostname perspective. When the backend sends to the webapp, the webapp tries to make the request and it fails because it's a different name.

Would you mind configuring ModelDB to use the same hostname you use internally?

@Atharex
Contributor Author

Atharex commented Nov 3, 2020

Aha, I see your point!

I thought the GET request I see in the traffic analyzer happened on the web app side (the web app transfers the file from the artifact storage and then lets me download that cached copy), but it actually gives me a direct link to the storage using its internally resolved DNS address:
http://minio-storage.minio.svc.cluster.local:9000

where on the user side I want the externally defined DNS address:
https://minio.my-own-domain.net

Got confused because deleting an artifact did not throw an error (later realized it's because the webapp invokes its REST API to perform the step,
e.g. /api/v1/modeldb/experiment-run/deleteArtifact {"id":"8c248b70-f001-452e-8ed0-9d3616eb4e81","key":"model_api.json"}).

With this it deletes the entry from ModelDB, but leaves the artifact in MinIO intact (guess that is so by design also with other artifact stores? Or should the delete also happen inside the store?)

I guess some URL rewriting would need to take place to correctly resolve address handling on the web UI for this particular use-case (an external storage service, which has both an internal (cluster) and external (ingress) DNS name). Maybe an optional "AlternativeStoreURL" parameter supplied in the ModelDB configuration file to rewrite the generated links on the webapp side?

Just a thought... Not sure how other projects handle similar situations. Configuring ModelDB to the external name might not be easy, as there is a port in the internal service name and I would not be able to CNAME an external entry onto an internal address with a port, if I reconfigured my internal kubernetes DNS resolver.
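
Editorial sketch: the URL rewriting proposed above could look roughly like the following, which swaps the scheme and host of a generated link for an externally reachable base. The function name `rewrite_store_url` is hypothetical, and "AlternativeStoreURL" exists only as a proposal in this thread. One caveat: the signed URL shown earlier lists `X-Amz-SignedHeaders=host`, so the host is covered by the signature, and a link rewritten after signing would likely be rejected by the store. A real implementation would probably need to sign against the external endpoint instead of rewriting afterwards.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_store_url(signed_url: str, alternative_base: str) -> str:
    """Replace scheme and host of signed_url with those of alternative_base.

    Path, query string and fragment are kept unchanged. Any path prefix in
    alternative_base is ignored in this sketch.
    """
    original = urlsplit(signed_url)
    base = urlsplit(alternative_base)
    return urlunsplit(
        (base.scheme, base.netloc, original.path, original.query, original.fragment)
    )


internal = (
    "http://minio-storage.minio.svc.cluster.local:9000"
    "/modeldb-bucket/model_api.json?X-Amz-Expires=299"
)
print(rewrite_store_url(internal, "https://minio.my-own-domain.net"))
```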

@conradoverta
Contributor

We use the direct link because it's usually much faster (since their services are built for big downloads and uploads). Adding an alternative base URL makes sense to me, as it would simplify the process. Usually we handle this by adding the CNAME entries in the right place, but that might be a high barrier to use.

If we pointed you to the right places for the change, would you be willing to contribute a PR with support for this feature? It would be greatly appreciated!

@Atharex
Contributor Author

Atharex commented Nov 4, 2020

Sure, I'd go for it! This feature would help me out nicely.

@conradoverta
Contributor

Great!

@ad-47 @ravishetye could you share some pointers on how we could add a config field AlternativeStoreURL that would replace the base url for artifacts? The context is that the user browser and ModelDB need to see different hostnames for the minio endpoint.

@ravishetye

@Atharex Would setting the minio endpoint to https://minio.my-own-domain.net work and not require more code change?

@Atharex
Contributor Author

Atharex commented Nov 6, 2020

Sadly no. There is a port in my service name and I cannot get DNS to resolve https://minio.my-own-domain.net to the internal address http://minio-storage.minio.svc.cluster.local:9000. The ingress controller also does not let me rewrite response URLs (only request URLs), so having this as an optional configuration step would be the easiest way to solve the problem.

@conradoverta
Contributor

The challenge that Ravi correctly pointed out when I discussed this with him is that MDB would always use that alternative URL, even if the client was running inside the cluster. Would that be an issue for you?

@samru-rai

samru-rai commented Nov 26, 2020

Would be cool if ModelDB team created an example for Minio so future users can just refer to the example

4 participants