Replace azure sdk with custom implementation #2509
Conversation
I intend to review this later today or tomorrow, very exciting! Re: OAuth, I've already added an OAuth implementation to this crate for GCP; it would be nice if we could use that. I'd be happy to help out with that if you like?
After chasing a bug with signing via access key for a while, the implementation was quite straightforward, and I could lean on most of the logic from the AWS implementation. Generally, authorization needs to be cleaned up. Aside from getting rid of the OAuth crate, we have some fallible operations in the … Also I tried to activate the … Implementations of the …
So far I have only worked with OAuth via higher-level APIs, so help is highly welcome :)
OK, I'll try to find some time to have a play over the weekend.
Turns out getting an OAuth token, especially with a client secret, is not all that complex... we have a working implementation now. That is to say, it is working; how we go about it can likely be improved quite a bit. The important part though, the …
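To make the client-secret flow concrete, here is a minimal std-only sketch of the form body such a token request carries. The function name and values are illustrative, not the PR's actual code; Azure AD's v2.0 token endpoint (`https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token`) expects an `application/x-www-form-urlencoded` body with these fields:

```rust
// Hypothetical helper (not the PR's code): build the form body for a
// client-credentials token request against Azure AD's v2.0 endpoint.
fn client_credentials_body(client_id: &str, client_secret: &str, scope: &str) -> String {
    // Note: real code must percent-encode each value before joining;
    // the caller here passes an already-encoded scope.
    format!(
        "client_id={}&client_secret={}&scope={}&grant_type=client_credentials",
        client_id, client_secret, scope
    )
}

fn main() {
    let body = client_credentials_body(
        "my-client-id",
        "my-client-secret",
        "https%3A%2F%2Fstorage.azure.com%2F.default",
    );
    println!("{}", body);
}
```

POSTing that body and deserializing the JSON response's `access_token` field is essentially all the flow requires for the client-secret case.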
This is looking really nice, fantastic work, mostly just some minor nits.
I want to validate this against Azure and give it another pass (as it is rather large 😆). Unfortunately I am out for the next two days, but I will get back to this on my return. I definitely intend for this to make the next object_store release.
Hello, I'm building a lib, reqsign, to help sign API requests to different services; maybe you'd be interested in taking a look? So far, reqsign has supported the following services:
A quick example:

```rust
use reqsign::services::azure::storage::Signer;
use reqwest::{Client, Request, Url};
use anyhow::Result;

#[tokio::main]
async fn main() -> Result<()> {
    // Signer will load region and credentials from the environment by default.
    let signer = Signer::builder()
        .account_name("account_name")
        .account_key("YWNjb3VudF9rZXkK")
        .build()?;

    // Construct the request
    let url = Url::parse("https://test.blob.core.windows.net/testbucket/testblob")?;
    let mut req = Request::new(http::Method::GET, url);

    // Sign the request with Signer
    signer.sign(&mut req)?;

    // Send the already signed request
    let resp = Client::new().execute(req).await?;
    println!("resp got status: {}", resp.status());
    Ok(())
}
```
Hi @Xuanwo, very cool idea. That being said, I think I would prefer to keep this logic in-tree to keep the dependency burden low; we've been stung repeatedly by this in the past. The implementation here has no dependencies beyond ring, which is already a dependency of rustls.
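For context on what the in-tree shared-key signing involves, here is an illustrative std-only sketch (not the PR's actual code) of the canonicalized-resource portion of Azure's shared-key string-to-sign: `/{account}/{path}` followed by the query parameters, lower-cased, sorted by key, one per line as `key:value`. The final signature is an HMAC-SHA256 over the full string-to-sign, which ring's `hmac` module can compute without any further dependencies.

```rust
// Illustrative sketch of Azure's shared-key canonicalized resource;
// function name and shape are assumptions, not the crate's API.
fn canonicalized_resource(account: &str, path: &str, mut query: Vec<(String, String)>) -> String {
    let mut out = format!("/{}/{}", account, path);
    // Query parameters are appended lower-cased, sorted by key,
    // one per line as `key:value`.
    query.sort();
    for (key, value) in query {
        out.push('\n');
        out.push_str(&key.to_lowercase());
        out.push(':');
        out.push_str(&value);
    }
    out
}

fn main() {
    let resource = canonicalized_resource(
        "myaccount",
        "mycontainer/myblob",
        vec![("comp".to_string(), "list".to_string())],
    );
    println!("{}", resource);
}
```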
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Testing this out now.
So I've spent a decent length of time trying to get this to work. I couldn't get token auth to work; I was just getting 403 errors, but removing the …
Yes, I will look into it and make sure the OAuth example is working... I did run it before, but then it likely regressed. Also the infinite hang is a known issue; this is why the …
I have the service principal auth working again - turns out the custom date parsing did make a difference after all. From what I had read, though, your comment was correct in that RFC 2822 and RFC 1123 should be equivalent when it comes to parsing dates. As for the hangs, locally it works, but I never seem to have actually tested it against an account. Will keep on digging :)
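The RFC 1123 / RFC 2822 overlap mentioned above comes down to the zone field: HTTP dates (RFC 1123) spell it as "GMT", while strict RFC 2822 parsers may only accept a numeric offset. A small illustrative normalizer (an assumption for exposition, not the crate's actual parsing code) makes the same instant parseable either way:

```rust
// Hypothetical helper: rewrite an RFC 1123 HTTP-date's "GMT" suffix
// into the numeric "+0000" offset an RFC 2822 parser expects.
fn http_date_to_rfc2822(date: &str) -> String {
    match date.strip_suffix(" GMT") {
        Some(rest) => format!("{} +0000", rest),
        None => date.to_string(),
    }
}

fn main() {
    println!("{}", http_date_to_rfc2822("Sun, 06 Nov 1994 08:49:37 GMT"));
}
```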
@tustvold, the latest version I pushed worked for me, including service principal auth and multipart, against an actual service. However, the test against the service ran much longer than I expected, well over 1 min. Also, while playing around with chunk sizes etc., I got some intermittent failures of individual parts. Not sure, but we may also have to check for a max part size; right now I think we just enforce a min.
Awesome, I'm wrapping up for today but will check it out tomorrow.
The logic for SAS does not appear to be correct; in particular, it seems to need some sort of signature setup - https://docs.microsoft.com/en-us/rest/api/storageservices/service-sas-examples. It seems to be similar to S3's concept of a pre-signed URL, which makes me think it might not be a good fit for this crate anyway? Perhaps we could remove this functionality for now and get this in without it? Everything else appears to be working now 💪
Thanks! W.r.t. SAS keys: the main way I know this to be used is that the actual, already signed query pairs are shared with the consumer, i.e. it usually looks like … I am happy to remove this; we do, however, have an ask in delta-rs to support this kind of auth mechanism.
The part that isn't clear to me: if the path is encoded in the SAS token, which I think it is as part of the signature, how could you use this with an API such as ObjectStore, which addresses multiple paths? To be clear, trying to use the SAS support as currently implemented just returns invalid signature errors, although it is possible I am doing something wrong.
SAS tokens can be issued for a specific blob or at the container level, in which case they are valid for all requests to that container. They can also be scoped to a single blob/path, in which case there is not much use for them in this crate. The way I tested it was to use the Azure Storage Explorer to generate a query string; this comes with multiple values. It would be up to the user to split the total query string provided by e.g. Storage Explorer into the key-value pairs; this is then what the builder expects. Providing this as separate pairs rather than a single string is something I adopted from the SDK - and a little bit of laziness about making sure the string is formatted correctly :) If you want, I can set up a short-lived example and share the query pairs here...
I tried this and it didn't work... 🙁 Let me double-check I'm not doing something stupid.
Or I somehow broke it along the way, much like what happened with the SP support :). I'll make sure again...
Apologies, it was PEBKAC: I needed to URL-parse the string instead of splitting it manually - I was percent-encoding the signature by accident 😅 Edit: There is something interesting going on with it returning an error that a file already exists, which is fun though...
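The pitfall above can be sketched in a few lines of std-only Rust (the helper name is hypothetical, not the crate's API): the SAS query string's values, notably `sig`, are already percent-encoded, so the string must be split into pairs verbatim; decoding and re-encoding (or encoding again) turns e.g. `%2F` into `%252F` and invalidates the signature.

```rust
// Hypothetical helper: split a SAS query string into key/value pairs
// WITHOUT touching the percent-encoding of either half.
fn split_sas_query(query: &str) -> Vec<(String, String)> {
    query
        .trim_start_matches('?')
        .split('&')
        .filter_map(|pair| {
            let (key, value) = pair.split_once('=')?;
            // Keep both halves exactly as provided.
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

fn main() {
    let pairs = split_sas_query("?sv=2021-08-06&sig=ab%2Fcd%3D");
    for (key, value) in &pairs {
        println!("{}={}", key, value);
    }
}
```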
object_store/src/azure/client.rs
Outdated
```rust
) -> Result<()> {
    let credential = self.get_credential().await?;
    let url = self.config.path_url(to);
    let source = self.config.path_url(from);
```
> The read operation on a source blob in the same storage account can be authorized via shared key. Beginning with version 2017-11-09, you can also use Azure Active Directory (Azure AD) to authorize the read operation on the source blob. However, if the source is a blob in another storage account, the source blob must be public, or access to it must be authorized via a shared access signature. If the source blob is public, no authorization is required to perform the copy operation.
>
> When the source object is a file in Azure Files, the source URL uses the following format. Note that the URL must include a valid SAS token for the file.

From https://docs.microsoft.com/en-us/rest/api/storageservices/copy-blob#request-headers

TL;DR - we must include the SAS credentials in the copy source if we are using SAS credentials.
Makes sense :D - will take care of this.
object_store/src/azure/client.rs
Outdated
```rust
let query = pairs
    .iter()
    .map(|pair| format!("{}={}", pair.0, pair.1))
    .join("&");
source = format!("{}?{}", source, query);
```
I think this has issues with escaping; perhaps we could use `query_pairs_mut`.
Wanted to avoid parsing a URL, but yes, better safe than sorry :)
Oh, I thought `path_url` was a URL 😅
I think you should be able to just percent-encode the inputs.
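"Just percent encode the inputs" can be sketched std-only as below: a minimal RFC 3986 unreserved-set encoder applied to each value before joining, rather than formatting raw pairs into the query string. The helper names are illustrative; the crate could equally use `url::Url::query_pairs_mut` to get the same escaping.

```rust
// Hypothetical sketch: percent-encode everything outside the RFC 3986
// unreserved set before placing a value into a query string.
fn percent_encode(value: &str) -> String {
    value
        .bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                (b as char).to_string()
            }
            other => format!("%{:02X}", other),
        })
        .collect()
}

// Append raw (un-encoded) pairs to a copy-source URL, encoding values.
fn append_query(source: &str, pairs: &[(&str, &str)]) -> String {
    let query: Vec<String> = pairs
        .iter()
        .map(|(key, value)| format!("{}={}", key, percent_encode(value)))
        .collect();
    format!("{}?{}", source, query.join("&"))
}

fn main() {
    println!("{}", append_query("https://example.com/src", &[("sig", "ab/cd=")]));
}
```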
I can confirm that without this change SAS credentials don't work correctly
I will clean up and push the code that I have working
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
```diff
@@ -713,7 +713,7 @@ mod tests {
     let location = Path::from("test_dir/test_upload_file.txt");

     // Can write to storage
-    let data = get_vec_of_bytes(5_000_000, 10);
+    let data = get_vec_of_bytes(5_000, 10);
```
I dropped this as the tests are otherwise incredibly slow
I think this is ready to go now; there is some further cleanup possible, but let's do that as follow-up PRs. I will merge this once the CI finishes. Fantastic work on this 👍
Agreed! There is still some work to be done, but finally the SDKs are gone 🎉. There are two things I was hoping to look at in follow-ups. One is managed identity auth and certificate support for Azure, but lots of that should almost already work with the GCS and S3 things in the crate. The other topic is around the URL parsing and ObjectStoreProvider we discussed some time ago. I experimented in other crates, and maybe have something as a starting point for continuing the discussion...
Benchmark runs are scheduled for baseline = f11fc1f and contender = d36f072. d36f072 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
🎉 |
Which issue does this PR close?
Closes #2176
Rationale for this change
See #2176
What changes are included in this PR?
Replaces the Azure SDK with a custom implementation based on reqwest. So far this is a rough draft, and it surely needs cleanup and some more work on the auth part. I tried to make the AWS and Azure implementations look as comparable as possible.
I also pulled in a new dependency on the oauth2 crate. I will evaluate a bit more when cleaning up auth, but my feeling was that implementing the OAuth flows manually could be another significant piece of work. Any feedback is highly welcome.
cc @tustvold @alamb
Are there any user-facing changes?
Not that I'm aware of, but there is a possibility