[Discussion] Improve Azure e2e/integration testing #497

Closed
thovoll opened this issue Nov 12, 2021 · 6 comments · Fixed by #761
Labels
enhancement New feature or request help wanted Extra attention is needed storage/azure Azure Blob storage related

Comments

@thovoll
Contributor

thovoll commented Nov 12, 2021

Description

Currently, there is an ignored test for Azure Storage that relies on @rtyler's Azure Resources.

I propose the following incremental improvements:

  1. short term: switch to requiring developers to manually run the test using their own Azure resources
  2. medium/long term: introduce a mock testing framework similar to what is used in the Azure Rust SDK

A bit more info on 2. The way it works is that integration tests can be run in record mode, where the tests call real Azure resources and capture the API responses (which are then committed to the repo). In CI, the integration tests are run in replay mode where no real Azure resources are involved and the HTTP calls get intercepted and the stored API responses are returned. More info is here.
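
For illustration only, here is a minimal sketch of the record/replay idea. The `Transport` trait and the types below are hypothetical stand-ins to show the concept, not the actual mock_transport_framework API:

```rust
use std::collections::HashMap;

/// Hypothetical transport abstraction: tests would talk to Azure through this trait.
trait Transport {
    fn send(&mut self, request: &str) -> String;
}

/// Record mode: forward each request to the real service and remember the response.
struct RecordingTransport<T: Transport> {
    inner: T,
    recorded: HashMap<String, String>,
}

impl<T: Transport> Transport for RecordingTransport<T> {
    fn send(&mut self, request: &str) -> String {
        let response = self.inner.send(request);
        // The captured responses would be serialized and committed to the repo.
        self.recorded.insert(request.to_string(), response.clone());
        response
    }
}

/// Replay mode: no network calls; answers come from the committed recordings.
struct ReplayTransport {
    recorded: HashMap<String, String>,
}

impl Transport for ReplayTransport {
    fn send(&mut self, request: &str) -> String {
        self.recorded
            .get(request)
            .cloned()
            .expect("no recorded response for this request")
    }
}
```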

This approach is different from AWS S3 where I understand the local simulator is workable. For Azure, this isn't currently feasible since Azurite doesn't yet support ADLS Gen2. As an alternative to the mock approach described above, we could wait until Azurite does support ADLS Gen2.

A bit more on 1. If we need static files to be present in the storage account, we can commit them to the repo along with a script that uploads those files as part of local dev setup. We'd also have to make sure all the endpoint and auth info is read from environment variables.
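
As a rough sketch of what 1. could look like: a test gated behind `#[ignore]` that reads its configuration from environment variables and is run manually with `cargo test -- --ignored` against the developer's own Azure resources. The variable names below are placeholders, not the crate's actual configuration keys:

```rust
use std::env;

/// Run manually with `cargo test -- --ignored` against your own Azure resources.
#[test]
#[ignore]
fn read_simple_table_from_azure() {
    // Placeholder variable names; the real test would read whatever the
    // Azure storage backend expects for endpoint and auth configuration.
    let account = env::var("AZURE_STORAGE_ACCOUNT_NAME")
        .expect("set AZURE_STORAGE_ACCOUNT_NAME to run this test");
    let key = env::var("AZURE_STORAGE_ACCOUNT_KEY")
        .expect("set AZURE_STORAGE_ACCOUNT_KEY to run this test");

    // ... construct the object store / open the Delta table using `account` and `key` ...
    assert!(!account.is_empty() && !key.is_empty());
}
```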

Related Issue(s)
#486 (comment)

@thovoll thovoll added the enhancement New feature or request label Nov 12, 2021
@houqp
Member

houqp commented Nov 15, 2021

Thanks @thovoll , I agree with both points. The replay-based mocking method looks quite interesting 👍

@houqp houqp added help wanted Extra attention is needed storage/azure Azure Blob storage related labels Nov 15, 2021
@roeap
Collaborator

roeap commented Dec 23, 2021

I had a deeper look into the mock_transport_framework for the azure crates and have to say I am quite the fan :). Seeing how the azure crates are migrating to the pipeline architecture, we are going to have to adopt the new APIs anyhow, and the testing framework just comes with it for free.

The price we have to pay, however, is that we need to refactor the whole object store implementation :D, but I think it's worth the effort. Since the question of authorization has been raised (#527, #499), that could happen as well in the migration. Trying to figure out authorization in the new datalake client, however, I got a bit confused. To me it seems - and I might be totally off - that since the storage core crate is not yet migrated, there are actually two authorization mechanisms happening at once: whatever the underlying storage account client uses, as well as the authorization policy, which right now only accepts a bearer token.

My personal feeling is that the best way forward is to try and advance the migration in the azure crates to a point where storage core and datalake are using the pipeline architecture. As for auto-refreshing credentials, right now it seems that that API does not allow us to customize the pipeline policy - I guess that would have to change? For the python clients I know that there is auto-refreshing built into the SDKs, maybe such a policy is something for the azure SDKs as well?

Otherwise I believe adapting the current mechanics for auto-refresh (mutex etc.) to a custom policy is quite straightforward, and will even be much simpler, since we no longer have to replace the entire client and just have to refresh the actual token within the policy. Also, looking at the latest code, specifically in the cosmos crate, I believe the lifetime of a returned stream is no longer tied to the client, which (hopefully) makes for a nicer implementation of the list_* calls.
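
For illustration only, a minimal sketch of the mutex-guarded token refresh described above, written against a hypothetical `TokenCredential` trait rather than the actual azure_core policy API:

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the SDK's credential abstraction.
trait TokenCredential {
    fn get_token(&self) -> (String, Instant); // (token, expiry)
}

/// A policy-like wrapper that refreshes the bearer token only when it is
/// about to expire, instead of replacing the whole client.
struct AutoRefreshingToken<C: TokenCredential> {
    credential: C,
    cached: Mutex<Option<(String, Instant)>>,
}

impl<C: TokenCredential> AutoRefreshingToken<C> {
    fn new(credential: C) -> Self {
        Self { credential, cached: Mutex::new(None) }
    }

    /// Returns a valid token, refreshing it if the cached one is stale.
    fn token(&self) -> String {
        let mut guard = self.cached.lock().unwrap();
        let needs_refresh = match &*guard {
            Some((_, expiry)) => *expiry <= Instant::now() + Duration::from_secs(60),
            None => true,
        };
        if needs_refresh {
            *guard = Some(self.credential.get_token());
        }
        guard.as_ref().unwrap().0.clone()
    }
}
```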

@thovoll, did you look into some of this already? I'd be happy to pick up some work, or if desired try and contribute some of these changes in the azure repo.

@thovoll
Contributor Author

thovoll commented Dec 31, 2021

Thanks @roeap, you are correct on ALL counts and I agree with the direction. Would be great to get your help on this.

I'm restarting work soon and will finish #499 as discussed in #489, taking the shortest path to getting writes to work - I want to start testing the basic write path. Once that is in place, let's proceed as you outlined above.

@thovoll
Contributor Author

thovoll commented Jan 3, 2022

On the matter of two authorization methods being used at the moment, I think the next step should be Azure/azure-sdk-for-rust#490.

After that, the other parts of storage and data_lake can evolve independently. The next step for data_lake would probably be Azure/azure-sdk-for-rust#426.

And finally Azure/azure-sdk-for-rust#543. At that point we are in a pretty good place from a delta-rs perspective.

@roeap
Collaborator

roeap commented Jan 4, 2022

@thovoll, Happy to support where I can! Very much looking forward to having this integration :).

For some personal experiments I needed a Kusto client and recently refactored it to be compatible with the pipelines architecture. As part of this there is an auth policy based on token credentials, using essentially the same auto-refresh as in this crate. Could this already be a starting point?

I think the link above could at least address Azure/azure-sdk-for-rust#426; the auto-refresh works as well, but requires tokio as a dependency right now. From what I understand the azure crates are so far meant to be independent of a specific runtime. In any case, more elaborate features leveraging refresh tokens etc. are not part of this.

As for Azure/azure-sdk-for-rust#490, I could give this a try. My assumption is that migrating to the pipeline policy already gets us most of the way. Details are probably better discussed over in the Azure repo?

@thovoll
Contributor Author

thovoll commented Jan 5, 2022

Thanks @roeap, yes let's talk details over in the Azure repo :)

@roeap roeap mentioned this issue Aug 22, 2022