Core: Make namespace separator configurable #10877

nastra · 2024-08-05T10:42:06Z

The REST spec currently uses %1F (the percent-encoded unit separator, 0x1F) as the namespace separator for multi-part namespaces.
This causes issues, since 0x1F is a control character and the Servlet spec allows containers to reject such characters.

This PR makes the hard-coded namespace separator configurable by giving servers the option to advertise a different namespace separator instead of %1F. The configuration is entirely optional for REST server implementers, and there is no behavioral change for existing installations.
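A minimal sketch of the mechanism in plain Java (10+), assuming the server advertises its separator through the `rest-namespace-separator` /config override from the spec change in this PR and that clients fall back to %1F when no override is sent. `SeparatorConfigSketch`, `resolveSeparator`, and this list-based `encodeNamespace` are hypothetical names for illustration, not Iceberg's RESTUtil API.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SeparatorConfigSketch {
  // Config key from the spec change in this PR; the default keeps today's %1F behavior.
  static final String SEPARATOR_PROPERTY = "rest-namespace-separator";
  static final String DEFAULT_SEPARATOR = "%1F";

  // Hypothetical helper: use the separator the server advertised in its /config overrides,
  // or fall back to %1F when the server does not send one (existing installations).
  static String resolveSeparator(Map<String, String> configOverrides) {
    return configOverrides.getOrDefault(SEPARATOR_PROPERTY, DEFAULT_SEPARATOR);
  }

  // Hypothetical helper: URL-encode each namespace level, then join the levels with the
  // (already URL-encoded) separator.
  static String encodeNamespace(List<String> levels, String separator) {
    return levels.stream()
        .map(level -> URLEncoder.encode(level, StandardCharsets.UTF_8))
        .collect(Collectors.joining(separator));
  }

  public static void main(String[] args) {
    List<String> namespace = List.of("accounting", "tax");

    // Server sends no override: the legacy separator is used.
    System.out.println(encodeNamespace(namespace, resolveSeparator(Map.of())));
    // prints accounting%1Ftax

    // Server advertises %2E instead.
    String advertised = resolveSeparator(Map.of(SEPARATOR_PROPERTY, "%2E"));
    System.out.println(encodeNamespace(namespace, advertised));
    // prints accounting%2Etax
  }
}
```

Because the default is %1F, a server that never sends the override produces exactly the same URLs as before.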

Fixes #10338.

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java

Fokko

I like this approach 👍

dimas-b · 2024-08-08T12:57:51Z

open-api/rest-catalog-open-api.yaml

@@ -222,7 +222,8 @@ paths:
          description:
            An optional namespace, underneath which to list namespaces.
            If not provided or empty, all top-level namespaces should be listed.
-            If parent is a multipart namespace, the parts must be separated by the unit separator (`0x1F`) byte.
+            If parent is a multipart namespace, the parts must be separated by the namespace separator as
+            indicated via the /config override `rest-namespace-separator`, which defaults to the unit separator (`0x1F`) byte.


This approach will break old clients in environments where the server is able to accept the non-printable separator but chooses to use an alternative separator.

So the upgrade path for users would be to first upgrade clients, then deploy new servers. In my view, this can be a significant burden on users. It would also require new clients to be API-compatible with older servers during the transition period... WDYT?

This won't break old clients, and there's even a test that verifies that old clients can still send %1F while the server has chosen %2E. See TestRESTUtil#encodeAsOldClientAndDecodeAsNewServer().

Please add that to the spec. I do not think we can assume that all REST server implementations reuse Iceberg Java code.

As it stands now, the "must" on line 226 prevents clients from using the old separator.

dimas-b · 2024-08-08T14:50:06Z

open-api/rest-catalog-open-api.yaml

+            To be compatible with older clients, servers have to use `0x1F` as a fallback even when advertising a different
+            namespace separator to clients.


Thanks for the update @nastra. Sorry to be nit-picky, but I believe this text is still not clear enough. I'd suggest: "To be compatible with older clients, servers should use both the advertised separator and 0x1F as valid separators when parsing namespaces." (feel free to edit)

My point is that it is not clear what "fallback" means in this case. I propose treating both old and new separators equally (always respect both, even when intermixed).

thx - LGTM 👍

Actually, I would disagree that a server should respect both when intermixed. Only a single namespace separator should be used during encoding, so the server should respect both during decoding, but not intermixed (for example, a namespace shouldn't be encoded using %1F and %2E at the same time).

How does the server know what (single) separator is effective for a particular request?

core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

rdblue · 2024-08-08T16:33:23Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

+        !Strings.isNullOrEmpty(namespaceSeparator), "Invalid namespace separator: null or empty");
+    String[] levels;
+
+    // for backwards compatibility


This isn't necessary if the client sends a specific separator in query params, right?

It would still be necessary in order to stay compatible with an old client that doesn't send the query param and doesn't respect the new rest-namespace-separator that the server sends. There's also a specific test that verifies this scenario: TestRESTUtil.encodeAsOldClientAndDecodeAsNewServer().
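The backward-compatibility branch described above can be sketched in plain Java (10+). This is a hedged sketch modeled on the `encodedNamespace.contains(NAMESPACE_ESCAPED_SEPARATOR)` check in this PR's RESTUtil diff; `CompatDecodeSketch` and its `decodeNamespace(String, String)` are hypothetical and return a `List<String>` instead of an Iceberg `Namespace`. Exactly one separator is chosen per value, so %1F and the advertised separator are never mixed.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CompatDecodeSketch {
  static final String LEGACY_SEPARATOR = "%1F";

  // Hypothetical helper: if the value contains the legacy %1F separator, assume an old client
  // encoded it and split on %1F only; otherwise split on the separator this server advertises.
  static List<String> decodeNamespace(String encoded, String advertisedSeparator) {
    String separator =
        encoded.contains(LEGACY_SEPARATOR) ? LEGACY_SEPARATOR : advertisedSeparator;
    // Pattern.quote treats the separator literally; -1 keeps trailing empty levels.
    return Arrays.stream(encoded.split(Pattern.quote(separator), -1))
        .map(level -> URLDecoder.decode(level, StandardCharsets.UTF_8))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Old client: still sends %1F although the server advertises %2E.
    System.out.println(decodeNamespace("accounting%1Ftax", "%2E")); // [accounting, tax]
    // New client: honors the advertised %2E.
    System.out.println(decodeNamespace("accounting%2Etax", "%2E")); // [accounting, tax]
  }
}
```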

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java

rdblue · 2024-08-08T16:46:37Z

core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java

@@ -73,6 +73,7 @@
 /** Adaptor class to translate REST requests into {@link Catalog} API calls. */
 public class RESTCatalogAdapter implements RESTClient {
  private static final Splitter SLASH = Splitter.on('/');
+  private static final String NAMESPACE_SEPARATOR = "%2E";


Why is this not "."? It's not a character that needs to be escaped.

CatalogTests#testNamespaceWithDot would fail when using "." here, so in this case it needs to be the encoded string (%2E).
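To illustrate why a raw "." can't serve as the separator, here is a minimal sketch (Java 10+); `DotSeparatorSketch` with its `encode`/`decode` helpers and the sample levels are hypothetical. `URLEncoder` leaves "." unescaped, so a level that itself contains a dot becomes indistinguishable from a separator, whereas "%2E" never appears inside an encoded level because "%" is escaped as %25. This is presumably the kind of case CatalogTests#testNamespaceWithDot exercises.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DotSeparatorSketch {
  // URL-encode each level and join with the given separator.
  static String encode(List<String> levels, String separator) {
    return levels.stream()
        .map(level -> URLEncoder.encode(level, StandardCharsets.UTF_8))
        .collect(Collectors.joining(separator));
  }

  // Split on the literal separator and URL-decode each level.
  static List<String> decode(String encoded, String separator) {
    return Arrays.stream(encoded.split(Pattern.quote(separator), -1))
        .map(level -> URLDecoder.decode(level, StandardCharsets.UTF_8))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> namespace = List.of("a.b", "c"); // first level contains a dot

    // Raw "." separator: "a.b.c" splits into three levels, which is wrong.
    System.out.println(decode(encode(namespace, "."), ".")); // [a, b, c]

    // Encoded "%2E" separator: "a.b%2Ec" splits back into the original two levels.
    System.out.println(decode(encode(namespace, "%2E"), "%2E")); // [a.b, c]
  }
}
```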

rdblue · 2024-08-08T16:47:22Z

core/src/test/java/org/apache/iceberg/rest/RESTCatalogAdapter.java

@@ -665,7 +670,7 @@ public static void configureResponseFromException(
  }

  private static Namespace namespaceFromPathVars(Map<String, String> pathVars) {
-    return RESTUtil.decodeNamespace(pathVars.get("namespace"));
+    return RESTUtil.decodeNamespace(pathVars.get("namespace"), NAMESPACE_SEPARATOR);


I thought that @jackye1995 suggested sending the separator from the client each time. Is that not what we want to do here?

Yes, and that is being handled in #10905. However, as a first step we need to make the namespace separator configurable (regardless of whether it's configured by the server or the client), which is what this PR does. Making it controllable from the client by sending a query param is handled after this PR in #10905 (and requires a vote on the spec change).

rdblue · 2024-08-08T16:50:35Z

core/src/test/java/org/apache/iceberg/rest/TestRESTUtil.java

-  @Test
-  public void testRoundTripUrlEncodeDecodeNamespace() {
+  @ParameterizedTest
+  @ValueSource(strings = {"%1F", "%2D", "%2E"})


Shouldn't escaping be handled automatically?

Yes, but the test also uses the non-UTF-8 encoded strings within the namespace names, so the separator needs to be the UTF-8 encoded string. I've added some non-UTF-8 encoded strings to the parameter list to indicate that such strings can be used as well (as long as they aren't allowed in the namespace name itself).
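The parameterized round-trip test can be approximated in plain Java (10+) as below. This is a sketch, not the actual TestRESTUtil code: `RoundTripSketch`, its helpers, and the sample namespaces are hypothetical; only the three separator values come from the @ValueSource above. Level contents are escaped automatically by `URLEncoder`, so even a level whose literal text is "%2E" survives; only the separator itself has to be supplied in encoded form.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class RoundTripSketch {
  static String encode(List<String> levels, String separator) {
    return levels.stream()
        .map(level -> URLEncoder.encode(level, StandardCharsets.UTF_8))
        .collect(Collectors.joining(separator));
  }

  static List<String> decode(String encoded, String separator) {
    return Arrays.stream(encoded.split(Pattern.quote(separator), -1))
        .map(level -> URLDecoder.decode(level, StandardCharsets.UTF_8))
        .collect(Collectors.toList());
  }

  // True if every sample namespace survives encode followed by decode with this separator.
  static boolean roundTrips(String separator) {
    List<List<String>> namespaces =
        List.of(
            List.of("db"),
            List.of("a.b", "c-d"), // '.' and '-' are left unescaped by URLEncoder
            List.of("50%", "%2E")); // '%' is escaped to %25, so "%2E" becomes "%252E"
    return namespaces.stream()
        .allMatch(ns -> decode(encode(ns, separator), separator).equals(ns));
  }

  public static void main(String[] args) {
    // Same separator values as the @ValueSource in the parameterized test.
    for (String separator : List.of("%1F", "%2D", "%2E")) {
      System.out.println(separator + " round-trips: " + roundTrips(separator));
    }
  }
}
```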

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

snazy · 2024-08-09T11:01:42Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

@@ -194,15 +192,34 @@ public static String decodeString(String encoded) {
   * @return UTF-8 encoded string representing the namespace, suitable for use as a URL parameter
   */
  public static String encodeNamespace(Namespace ns) {


This function looks dangerous now.

Can you please elaborate on what you mean by "dangerous" here? Nothing really changed for callers of this method (other than the fact that a Joiner is created every time this method is called).

snazy · 2024-08-09T11:01:57Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

@@ -215,8 +232,32 @@ public static String encodeNamespace(Namespace ns) {
   * @return a namespace
   */
  public static Namespace decodeNamespace(String encodedNs) {


This function looks dangerous now.

Can you please elaborate on what you mean by "dangerous" here? Nothing really changed for callers of this method.

Mixing this and the new function.

It's still not clear to me why that would be "dangerous". There is no behavioral change for existing callers of this method. Can you please add a concrete example or explanation to justify the "dangerous" part?

Users can mix both - causing "confusion".

I don't think that argument actually holds. Users won't be using this code, as it is used by the client and the server, not by users in the classical sense of someone who uses Iceberg. For engine/catalog implementers, the javadoc states that a value passed to decodeNamespace(String encodedNs) should have been produced by encodeNamespace(Namespace namespace).
Using words like "dangerous" and "confusing" without clear examples and justifications isn't helpful in a code review.

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

snazy · 2024-08-09T11:02:50Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

+   * <p>See also {@link #encodeNamespace} for generating correctly formatted URLs.
+   *
+   * @param encodedNamespace a namespace to decode
+   * @param separator The namespace separator to use for decoding. This should be the same separator


"should"? Not "must"?

snazy · 2024-08-09T11:04:26Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

+    // use legacy splitter for backwards compatibility in case an old clients encoded the namespace
+    // with %1F
+    Splitter splitter =
+        encodedNamespace.contains(NAMESPACE_ESCAPED_SEPARATOR)


How can this trigger if separator is not `\u001f`?

Can you please elaborate on what exactly you mean here?

snazy · 2024-08-09T11:15:03Z

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

+   * <p>See also {@link #encodeNamespace} for generating correctly formatted URLs.
+   *
+   * @param encodedNamespace a namespace to decode
+   * @param separator The namespace separator to use for decoding. This should be the same separator


Is this URL-encoded or not? Allowing both is dangerous, right?

We are not enforcing this to be UTF-8 encoded. A user or server could also pass a non-UTF-8 encoded separator.

So there's no validation, and users can specify anything?
No definition or verification of whether this is a single (1-byte) UTF-8 character.
No definition or verification of whether this character is URL-encoded or not.

The scope of this PR is to make the existing namespace separator configurable and let the server communicate to clients which one should be used. Users can't specify anything, as it would always be overridden by what the server sends.

nastra · 2024-08-13T06:34:32Z

@jackye1995 could you take a look at this PR please? It would be great to get this in, so that we can continue with #10904 / #10905 (where we pass the namespace separator via a query param to the server).

snazy

I still have concerns about this PR. Some of the concerns have been proactively resolved with "this is copy-paste".

snazy · 2024-08-13T08:34:49Z

This PR and the mentioned follow-ups change the REST spec. It seems to have been agreed that all specification changes require a code-change vote on the dev mailing list.

nastra · 2024-08-13T10:28:59Z

I still have concerns about this PR. Some of the concerns have been proactively resolved with "this is copy-paste".

I'm not sure whether this comes from a misunderstanding of the scope of the PR. The scope of the PR is to make the namespace separator configurable and let the server specify its preferred namespace separator while still maintaining full backward compatibility with the previous separator (%1F).

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

open-api/rest-catalog-open-api.yaml

cwsteinbach · 2024-10-11T17:17:59Z

@nastra, can you please update the description field to explain why this change is necessary?

github-actions · 2024-11-14T00:15:02Z

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

github-actions bot added the core label Aug 5, 2024

nastra force-pushed the configurable-namespace-separator branch 3 times, most recently from ffb244e to a4313e4 August 5, 2024 12:35

nastra closed this Aug 5, 2024

nastra reopened this Aug 5, 2024

amogh-jahagirdar reviewed Aug 5, 2024

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java (outdated)

core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java (outdated)

nastra force-pushed the configurable-namespace-separator branch from a4313e4 to c599527 August 6, 2024 08:59

github-actions bot added the OPENAPI label Aug 6, 2024

nastra force-pushed the configurable-namespace-separator branch 2 times, most recently from 06fbde2 to 92b8b8b August 6, 2024 10:05

nastra requested a review from amogh-jahagirdar August 6, 2024 14:31

amogh-jahagirdar approved these changes Aug 7, 2024

Fokko approved these changes Aug 8, 2024

dimas-b reviewed Aug 8, 2024

nastra force-pushed the configurable-namespace-separator branch 2 times, most recently from f0c8ad2 to 1b00cd5 August 8, 2024 14:49

dimas-b reviewed Aug 8, 2024

nastra force-pushed the configurable-namespace-separator branch from 1b00cd5 to 4b5ed11 August 8, 2024 14:52

dimas-b approved these changes Aug 8, 2024

dimas-b mentioned this pull request Aug 8, 2024

OpenAPI: Add query param to control namespace separator #10904

Closed

rdblue reviewed Aug 8, 2024

core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java (outdated)

rdblue reviewed Aug 8, 2024

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java (outdated)

rdblue reviewed Aug 8, 2024

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java (outdated)

rdblue reviewed Aug 8, 2024

core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java

rdblue reviewed Aug 8, 2024

core/src/main/java/org/apache/iceberg/rest/ResourcePaths.java (outdated)

rdblue reviewed Aug 8, 2024

nastra force-pushed the configurable-namespace-separator branch from 4b5ed11 to 8929e42 August 9, 2024 06:29

nastra mentioned this pull request Aug 9, 2024

Core: Pass namespace separator via query param #10905

Closed

nastra force-pushed the configurable-namespace-separator branch from 8929e42 to b663845 August 9, 2024 07:12

snazy suggested changes Aug 9, 2024

snazy suggested changes Aug 13, 2024

dimas-b reviewed Aug 16, 2024

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

jackye1995 reviewed Aug 16, 2024

core/src/main/java/org/apache/iceberg/rest/RESTUtil.java

nastra force-pushed the configurable-namespace-separator branch from b663845 to 1ac820b August 21, 2024 08:45

jackye1995 approved these changes Aug 28, 2024

nastra force-pushed the configurable-namespace-separator branch from 1ac820b to fdee148 September 11, 2024 11:15

nastra mentioned this pull request Sep 11, 2024

OpenAPI: Use %2E as namespace separator instead of %1F #10839

Closed

sungwy mentioned this pull request Sep 18, 2024

Make REST Catalog Namespace Separator Configurable apache/iceberg-python#1183

Open

cwsteinbach reviewed Oct 11, 2024

open-api/rest-catalog-open-api.yaml (outdated)

cwsteinbach reviewed Oct 11, 2024

open-api/rest-catalog-open-api.yaml (outdated)

Core: Make namespace separator configurable

a2fd341

nastra force-pushed the configurable-namespace-separator branch from fdee148 to a2fd341 October 14, 2024 09:52

nastra requested a review from cwsteinbach October 14, 2024 09:53

ebyhr mentioned this pull request Nov 4, 2024

Move nested namespace support in REST catalog behind a config trinodb/trino#24016

Merged

singhpk234 mentioned this pull request Nov 12, 2024

Upgrade Iceberg 1.7.0 apache/polaris#442

Open

github-actions bot added the stale label Nov 14, 2024

nastra added not-stale and removed stale labels Nov 14, 2024

nastra mentioned this pull request Nov 19, 2024

Revert "Core: Use encoding/decoding methods for namespaces and deprecate Splitter/Joiner" #11574

Merged
