Running SHOW SCHEMAS IN catalog makes N+1 requests to an iceberg REST catalog #23961

Closed
jhatcher1 opened this issue Oct 29, 2024 · 4 comments · Fixed by #24016

@jhatcher1

Description

Trino version: 463

When running SHOW SCHEMAS IN iceberg, Trino is making one "list namespaces" API request to the REST catalog to get a list of top-level namespaces, and then one "list namespaces" API request per namespace returned in the first request. If I have several thousand namespaces, this results in thousands of API requests being sent to the iceberg REST catalog, and takes a long time to return the results. For our use case, we're planning on having thousands to millions of namespaces, so this would put a lot of load on our REST catalog if we were to run this command.

As an example with 10000 schemas, Trino made ~10000 API requests to the REST catalog and took 54 seconds to return the list of schemas:

trino> SHOW SCHEMAS in "iceberg";
<results...>

Query 20241029_023036_09007_cqnh5, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
54.80 [10K rows, 116KiB] [183 rows/s, 2.12KiB/s]

I think this was introduced in #22916 as a way to support nested namespaces. Since the Iceberg REST API only has a way to list the direct descendants of a given namespace, you have to make "list namespaces" requests recursively to list all of them.
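To make the request count concrete, here is a minimal sketch of the pattern described above. It does not use Trino or Iceberg code: `FakeRestCatalog` and `list_all_recursive` are hypothetical names, and the catalog is an in-memory stand-in that only counts how many "list namespaces" calls it receives.

```python
from typing import Dict, List, Tuple

Namespace = Tuple[str, ...]


class FakeRestCatalog:
    """Hypothetical in-memory stand-in for a REST catalog.

    It only knows the direct children of each namespace and counts
    every "list namespaces" call it serves.
    """

    def __init__(self, children: Dict[Namespace, List[Namespace]]) -> None:
        self._children = children
        self.requests = 0

    def list_namespaces(self, parent: Namespace = ()) -> List[Namespace]:
        self.requests += 1
        return list(self._children.get(parent, []))


def list_all_recursive(catalog: FakeRestCatalog, parent: Namespace = ()) -> List[Namespace]:
    """List every namespace by descending into each one (the N+1 pattern)."""
    result: List[Namespace] = []
    for ns in catalog.list_namespaces(parent):
        result.append(ns)
        # One extra request per namespace, even when it has no children.
        result.extend(list_all_recursive(catalog, ns))
    return result


# 10,000 top-level namespaces, none of them nested.
flat = {(): [(f"ns{i}",) for i in range(10_000)]}

recursive_catalog = FakeRestCatalog(flat)
schemas = list_all_recursive(recursive_catalog)
print(len(schemas), recursive_catalog.requests)  # 10000 10001

top_level_catalog = FakeRestCatalog(flat)
top_level = top_level_catalog.list_namespaces()
print(len(top_level), top_level_catalog.requests)  # 10000 1
```

With N top-level namespaces and no nesting, the recursive walk issues 1 + N requests, while a top-level-only listing issues one.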

I checked what Spark does in this case. If I run SHOW SCHEMAS, it makes a single "list namespaces" request and returns only the top-level namespaces.

spark-sql ()> SHOW SCHEMAS;
<top level results...>
Time taken: 0.223 seconds, Fetched 10025 row(s)

If I want to show nested namespaces, I need to explicitly request the list of nested namespaces in the parent:

spark-sql ()> SHOW SCHEMAS in a;
SHOW SCHEMAS in a
a.a
a.b
a.c
a.d
Time taken: 0.174 seconds, Fetched 4 row(s)
@neerajvash8

I believe I have witnessed this firsthand with the Unity Catalog REST API, where connection requests were rate-limited after some time. Upon

@ebyhr
Member

ebyhr commented Oct 29, 2024

cc: @mayankvadariya

mayankvadariya self-assigned this Oct 29, 2024
@mayankvadariya
Contributor

Effectively, Trino doesn't support the SHOW SCHEMAS in "catalog.schema"; syntax (see https://trino.io/docs/current/sql/show-schemas.html).
Recursively calling listnamespaces seems to be the only way to support nested namespaces.
@ebyhr does it make sense to control nested namespace support via a config, so that we avoid the recursive calls when a catalog is not configured to query nested namespaces?
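
A rough sketch of what such a switch could look like, purely for illustration. The flag name `nested_namespace_enabled`, the `list_schemas` function, and the counting stub are all assumptions made for this example; they are not taken from Trino's code or configuration.

```python
from typing import Callable, Dict, List, Tuple

Namespace = Tuple[str, ...]


def list_schemas(
    list_namespaces: Callable[[Namespace], List[Namespace]],
    nested_namespace_enabled: bool,  # hypothetical config flag
) -> List[str]:
    """Return schema names, descending into children only when the flag is on."""
    top_level = list_namespaces(())
    if not nested_namespace_enabled:
        # Single request: top-level namespaces only.
        return [".".join(ns) for ns in top_level]

    result: List[str] = []
    stack = list(reversed(top_level))
    while stack:
        ns = stack.pop()
        result.append(".".join(ns))
        # One additional request per visited namespace.
        stack.extend(reversed(list_namespaces(ns)))
    return result


def make_counting_catalog(children: Dict[Namespace, List[Namespace]]):
    """Build a stub 'list namespaces' call plus a one-element call counter."""
    calls = [0]

    def list_namespaces(parent: Namespace) -> List[Namespace]:
        calls[0] += 1
        return list(children.get(parent, []))

    return list_namespaces, calls


tree = {
    (): [("a",), ("b",)],
    ("a",): [("a", "a"), ("a", "b")],
}

lister, calls = make_counting_catalog(tree)
flat_names = list_schemas(lister, nested_namespace_enabled=False)
flat_calls = calls[0]
print(flat_names, flat_calls)  # ['a', 'b'] 1

lister, calls = make_counting_catalog(tree)
nested_names = list_schemas(lister, nested_namespace_enabled=True)
nested_calls = calls[0]
print(nested_names, nested_calls)  # ['a', 'a.a', 'a.b', 'b'] 5
```

With the flag off, the listing costs one request regardless of how many namespaces exist; with it on, the cost is one request for the root plus one per namespace visited.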

@ebyhr
Member

ebyhr commented Oct 31, 2024

@mayankvadariya Yes, makes sense to me.
