Support WASB scheme in ADLSFileIO #11504
LGTM, I just wonder about the suffix appended by default to `account`.
```diff
@@ -93,7 +93,7 @@ public void applyClientConfiguration(String account, DataLakeFileSystemClientBui
   if (connectionString != null && !connectionString.isEmpty()) {
     builder.endpoint(connectionString);
   } else {
-    builder.endpoint("https://" + account);
+    builder.endpoint("https://" + account + ".dfs.core.windows.net");
```
Should we really append `.dfs.core.windows.net` by default? Currently users include the suffix in `account`, so we should clearly state that the suffix is now appended by default.
I believe the `storageAccount` field of `ADLSLocation` should store only the storage account name. Since the ADLS APIs are accessed via `dfs.core.windows.net` by default, I think it's appropriate to append that domain to the storage account name here as the default. Users who want a different hostname can set the `adls.connection-string` property.
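A minimal Java sketch of the defaulting rule described in this comment, assuming `storageAccount` holds only the bare account name. The class and method names are hypothetical, and this is not the PR's code: the diff passes the result straight to `builder.endpoint(...)` rather than returning a string. The Javadoc also illustrates the kind of documentation suggested in the next comment.

```java
/** Hypothetical sketch of the endpoint defaulting discussed in the review; not Iceberg's actual code. */
class EndpointSketch {
  // Default ADLS Gen2 (DFS) domain suffix, as appended in the diff.
  static final String DEFAULT_DFS_SUFFIX = ".dfs.core.windows.net";

  /**
   * Returns the endpoint to use for a storage account.
   *
   * <p>If a connection string (e.g. the value of adls.connection-string) is set, it wins.
   * Otherwise the DFS domain is appended to the bare account name.
   *
   * @param storageAccount the storage account name only (e.g. "myaccount"), without any domain
   * @param connectionString optional override; null or empty means "use the default"
   */
  static String endpointFor(String storageAccount, String connectionString) {
    if (connectionString != null && !connectionString.isEmpty()) {
      return connectionString;
    }
    return "https://" + storageAccount + DEFAULT_DFS_SUFFIX;
  }

  public static void main(String[] args) {
    System.out.println(endpointFor("myaccount", null)); // https://myaccount.dfs.core.windows.net
    System.out.println(endpointFor("myaccount", "https://custom.example")); // https://custom.example
  }
}
```

Because the suffix is appended unconditionally, a caller that still passes `myaccount.dfs.core.windows.net` as the account would get a doubled domain, which is exactly why the account name has to be stripped of its domain before it reaches this point.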
I think this is the right thing to do; I would just add a Javadoc to this method, though.
This deviates from the `abfs[s]` scheme defined by Hadoop, where the domain is specified in the URI (docs). I feel we should check whether the account already ends with the domain and append it only if it doesn't, to support that.
I see, you are stripping off the domain below, so this isn't an issue. You can ignore the above.
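The concern goes away if location parsing keeps only the subdomain of the host. A minimal sketch, assuming the account name is everything before the first `.` in the host; the class and method names are hypothetical and this is not the PR's actual implementation. Under this assumption, a `wasb` host (`*.blob.core.windows.net`) and an `abfs` host (`*.dfs.core.windows.net`) for the same account reduce to the same name, and the endpoint can then be rebuilt with the DFS suffix.

```java
/** Hypothetical helper: extract the storage account name from an endpoint host. */
final class AccountNameSketch {
  private AccountNameSketch() {}

  /**
   * Returns the text before the first '.', e.g. "myaccount" for
   * "myaccount.dfs.core.windows.net" or "myaccount.blob.core.windows.net".
   * A host without a '.' is assumed to already be a bare account name.
   */
  static String accountName(String host) {
    int dot = host.indexOf('.');
    return dot < 0 ? host : host.substring(0, dot);
  }

  public static void main(String[] args) {
    System.out.println(accountName("myaccount.dfs.core.windows.net")); // myaccount
    System.out.println(accountName("myaccount.blob.core.windows.net")); // myaccount
    System.out.println(accountName("myaccount")); // myaccount
  }
}
```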
@bryanck @danielcweeks Please take a look; I think this could be another candidate for 1.7.1 as well.
Thanks @mrcnc. Yeah, I think supporting WASB is worth it even though Azure has deprecated it, and I think the way this PR handles it is correct.
Thanks for the PR, @mrcnc! And thanks for the reviews, @RussellSpitzer @amogh-jahagirdar @jbonofre!
This is a second attempt to resolve #10127. Unlike the first attempt, it avoids using `java.net.URI` for parsing, which was found to be problematic after the first attempt was merged. Additionally, I've refactored to minimize the number of lines changed and to clarify the usage of the `storageAccount` variable, which previously stored the endpoint host and now stores only the storage account name (the subdomain).
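One way to parse locations without `java.net.URI`, sketched under assumptions: a single regular expression over `scheme://[container@]host[/path]` that accepts `abfs`, `abfss`, `wasb`, and `wasbs`, then keeps only the subdomain of the host as the storage account. This is not the PR's code; the pattern, class, and field names are hypothetical, and query strings, percent-encoding, and other edge cases are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of parsing ADLS/WASB locations with a regex instead of java.net.URI. */
final class LocationSketch {
  // scheme://[container@]host[/path]; the container and path parts are optional.
  private static final Pattern LOCATION =
      Pattern.compile("^(abfss?|wasbs?)://(?:([^/@]+)@)?([^/]+)(/.*)?$");

  final String scheme;
  final String container; // null when the location has no "container@" part
  final String storageAccount; // account name only, domain stripped
  final String path; // path without the leading '/', or "" if absent

  LocationSketch(String location) {
    Matcher m = LOCATION.matcher(location);
    if (!m.matches()) {
      throw new IllegalArgumentException("Invalid ADLS location: " + location);
    }
    this.scheme = m.group(1);
    this.container = m.group(2);
    String host = m.group(3);
    int dot = host.indexOf('.');
    // Keep only the subdomain so blob and dfs hosts map to the same account name.
    this.storageAccount = dot < 0 ? host : host.substring(0, dot);
    String rawPath = m.group(4);
    this.path = rawPath == null ? "" : rawPath.substring(1);
  }

  public static void main(String[] args) {
    LocationSketch loc =
        new LocationSketch("wasbs://data@myaccount.blob.core.windows.net/warehouse/db/t1");
    // prints: wasbs data myaccount warehouse/db/t1
    System.out.println(loc.scheme + " " + loc.container + " " + loc.storageAccount + " " + loc.path);
  }
}
```

Parsing the string directly keeps the host and path exactly as written, at the cost of validating less than `java.net.URI` would.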