After reviewing the concerns raised in #11344 about using `java.net.URI` for parsing in ADLSLocation, I constructed an example of a location that does not parse correctly. It also fails in the current implementation, so this PR adds a test and a fix for the parsing code. Additionally, it removes test cases that are invalid because they don't test valid ABFS syntax.

Motivation
The main reason to avoid using `java.net.URI` is that it parses according to RFC 2396, but object storage providers do not strictly follow this specification. Specifically, in standard URI syntax, the question mark (`?`) separates the path component from the query component. However, Azure Blob Storage allows question marks in blob/file names, making these names incompatible with the RFC 2396 URI specification.

Another important point is that Azure Storage is accessed via HTTP APIs, so the `abfs` and `wasb` location syntaxes serve as identifiers for blobs that are accessed through HTTP URLs. This is the motivation behind removing the tests that included query and fragment components, since those components would only be used in the HTTP URLs and not in the ABFS URI-like syntax.
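
To illustrate the parsing problem described above, here is a minimal sketch. The account, container, and file names are made up, and `rawPath` is a hypothetical helper written for this example; it is not the code in ADLSLocation. It only shows how `java.net.URI` splits a location at the question mark, compared with taking everything after the authority verbatim.

```java
import java.net.URI;

class UriQuestionMarkDemo {
    // Path component as java.net.URI sees it (everything before the first '?').
    public static String uriPath(String location) {
        return URI.create(location).getPath();
    }

    // Query component as java.net.URI sees it (everything after the first '?').
    public static String uriQuery(String location) {
        return URI.create(location).getQuery();
    }

    // Hypothetical alternative: treat everything after "scheme://authority/"
    // as the blob path, without interpreting '?' as a query separator.
    public static String rawPath(String location) {
        int schemeEnd = location.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Missing scheme: " + location);
        }
        int pathStart = location.indexOf('/', schemeEnd + 3);
        return pathStart < 0 ? "" : location.substring(pathStart + 1);
    }

    public static void main(String[] args) {
        // Made-up location whose file name contains a question mark.
        String location =
            "abfs://container@account.dfs.core.windows.net/dir/file?name.parquet";

        System.out.println("URI path:  " + uriPath(location));
        System.out.println("URI query: " + uriQuery(location));
        System.out.println("Raw path:  " + rawPath(location));
    }
}
```

With `java.net.URI`, the part of the file name after the question mark ends up in the query component, so the path no longer identifies the blob. The string-based approach keeps the name intact.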