-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Make namespace separator configurable #10877
base: main
Are you sure you want to change the base?
Conversation
ffb244e
to
a4313e4
Compare
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
a4313e4
to
c599527
Compare
06fbde2
to
92b8b8b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this approach 👍
open-api/rest-catalog-open-api.yaml
Outdated
@@ -222,7 +222,8 @@ paths: | |||
description: | |||
An optional namespace, underneath which to list namespaces. | |||
If not provided or empty, all top-level namespaces should be listed. | |||
If parent is a multipart namespace, the parts must be separated by the unit separator (`0x1F`) byte. | |||
If parent is a multipart namespace, the parts must be separated by the namespace separator as | |||
indicated via the /config override `rest-namespace-separator`, which defaults to the unit separator (`0x1F`) byte. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach will break old client in environments where the server is able to accept the non-printable separator, but chooses to use an alternative separator.
So, the upgrade path for users would be to first upgrade clients, then use new servers. As for me, it can be a significant burden on users. Also, this will require that new clients be API-compatible with older servers during the transition period... WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't break old clients and there's even a test that makes sure old clients send %1F
while the server chose %2E
. See TestRestUtil#encodeAsOldClientAndDecodeAsNewServer()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add that to the spec. I do not think we can assume that all REST server implementation reuse Iceberg java code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands now "must" in line 226 disallows clients to use the old separator.
f0c8ad2
to
1b00cd5
Compare
open-api/rest-catalog-open-api.yaml
Outdated
To be compatible with older clients, servers have to use `0x1F` as a fallback even when advertising a different | ||
namespace separator to clients. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the update @nastra . Sorry to be nit-picky, but I believe this text is still not clear enough. I'd suggest To be compatible with older clients, servers should use both the advertised separator and 0x1F as valid separators when parsing namespaces.
(feel free to edit)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that it is not clear what is "fallback" in this case. I propose to treat both old and new separators equally (always respect both, even when intermixed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thx - LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I would disagree that a server should respect both intermixed. Only a single namespace separator should be used during encoding so the server should respect both during decoding, but not intermixed (a namespace shouldn't be encoded using %1F
and %2E
at the same time for example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does the server know what (single) separator is effective for a particular request?
1b00cd5
to
4b5ed11
Compare
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
!Strings.isNullOrEmpty(namespaceSeparator), "Invalid namespace separator: null or empty"); | ||
String[] levels; | ||
|
||
// for backwards compatibility |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary if the client sends a specific separator
in query params, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be still necessary to be compatible with an old client that doesn't send send the query param and doesn't respect the new rest-namespace-separator
that the server sends. There's also a specific test that verifies this scenario in TestRestUtil.encodeAsOldClientAndDecodeAsNewServer()
@@ -73,6 +73,7 @@ | |||
/** Adaptor class to translate REST requests into {@link Catalog} API calls. */ | |||
public class RESTCatalogAdapter implements RESTClient { | |||
private static final Splitter SLASH = Splitter.on('/'); | |||
private static final String NAMESPACE_SEPARATOR = "%2E"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this not .
? It's not a character that needs to be escaped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CatalogTests#testNamespaceWithDot
would fail when using .
here, so in this case it needs to be the UTF-8 encoded string.
@@ -665,7 +670,7 @@ public static void configureResponseFromException( | |||
} | |||
|
|||
private static Namespace namespaceFromPathVars(Map<String, String> pathVars) { | |||
return RESTUtil.decodeNamespace(pathVars.get("namespace")); | |||
return RESTUtil.decodeNamespace(pathVars.get("namespace"), NAMESPACE_SEPARATOR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that @jackye1995 suggested sending the separator from the client each time. Is that not what we want to do here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes and that is being handled in #10905. However, as a first step we need to make the namespace separator configurable (regardless of whether it's configurable from the server or the client), which is being handled in this PR. Making it controllable from the client and send a query param is handled after this PR in #10905 (and requires a vote on the spec change)
@Test | ||
public void testRoundTripUrlEncodeDecodeNamespace() { | ||
@ParameterizedTest | ||
@ValueSource(strings = {"%1F", "%2D", "%2E"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't escaping be handled automatically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but the test also uses the non-UTF-8 strings within the namespace, so we need to use the UTF-8 encoded string. I've added some non-UTF-8 encoded strings to the parameters list to indicate that such strings can be used as well (as long as they aren't allowed in the namespace name itself)
4b5ed11
to
8929e42
Compare
8929e42
to
b663845
Compare
@@ -194,15 +192,34 @@ public static String decodeString(String encoded) { | |||
* @return UTF-8 encoded string representing the namespace, suitable for use as a URL parameter | |||
*/ | |||
public static String encodeNamespace(Namespace ns) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks dangerous now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please elaborate what you mean by "dangerous" here? Nothing really changed for callers of this method (other than the fact that a Joiner
is created every time this method is called)
@@ -215,8 +232,32 @@ public static String encodeNamespace(Namespace ns) { | |||
* @return a namespace | |||
*/ | |||
public static Namespace decodeNamespace(String encodedNs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function looks dangerous now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please elaborate what you mean by "dangerous" here? Nothing really changed for callers of this method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mixing this and the new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still not clear to my why that would be "dangerous". There is no behavioral change for existing callers of this method. Can you please add a concrete example/explanation to justify the "dangerous" part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Users can mix both - causing "confusion".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that argument is actually true. Users won't be using this code as this code is used on the client and the server but not by users in the classical sense of a user that uses Iceberg. For engine/catalog implementers the javadoc states that one should have used encodeNamespace(Namespace namespace)
if you use decodeNamespace(String encodedNs)
.
Using words like dangerous
and confusing
without clear examples and justifications isn't helpful in a code review
* <p>See also {@link #encodeNamespace} for generating correctly formatted URLs. | ||
* | ||
* @param encodedNamespace a namespace to decode | ||
* @param separator The namespace separator to use for decoding. This should be the same separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should? not "must"?
// use legacy splitter for backwards compatibility in case an old clients encoded the namespace | ||
// with %1F | ||
Splitter splitter = | ||
encodedNamespace.contains(NAMESPACE_ESCAPED_SEPARATOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this trigger if separator
is not `\u001f"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate please what you mean exactly here?
* <p>See also {@link #encodeNamespace} for generating correctly formatted URLs. | ||
* | ||
* @param encodedNamespace a namespace to decode | ||
* @param separator The namespace separator to use for decoding. This should be the same separator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this URL-encoded or not the URL-encoded? Allowing both is dangerous, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are not enforcing this to be UTF-8 encoded. A user/server could also pass a non-UTF-8 encoded separator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no validation then and users can specify everything?
No definition & verification of whether this is a single (1 byte) UTF-8 character.
No definition & verification of whether this character is URL-encoded or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope of this PR is to make the existing namespace separator configurable and let the server communicate to clients which one should be used. Users can't specify anything as it would always be overriden with what the server sends.
@jackye1995 could you take a look at this PR please? It would be great to get this in, so that we can can continue with #10904 / #10905 (where we pass the namespace separator via a query param to the server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have concerns about this PR. Some of the concerns have been proactively resolved with "this is copy-paste".
This PR and the mentioned follow-ups change the REST spec. It seems to be agreed on, that all specification changes require a code-change vote on the dev mailing list. |
I'm not sure whether this comes from a misunderstanding of the scope of the PR. The scope of the PR is to make the namespace separator configurable and let the server specify its preferred namespace separator while still maintaining full backward compability with the previous separator ( |
b663845
to
1ac820b
Compare
1ac820b
to
fdee148
Compare
@nastra, can you please update the description field to explain why this change is necessary? |
fdee148
to
a2fd341
Compare
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
The REST spec currently uses
%1F
as the UTF-8 encoded namespace separator for multi-part namespaces.This causes issues, since it's a control character and the Servlet spec can reject such characters.
This PR makes the hard-coded namespace separator configurable by giving servers an option to send an optional namespace separator instead of
%1F
. The configuration part is entirely optional for REST server implementers and there's no behavioral change for existing installations.fixes ##10338.