Excessive metastore invocations when querying Iceberg table #8675

Closed
findepi opened this issue Jul 27, 2021 · 6 comments

@findepi
Member

findepi commented Jul 27, 2021

There was a concern about a second call to getTable in #8151 (comment), because these calls can be expensive.

As

    assertMetastoreInvocations("SELECT * FROM test_select_from",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 12)

shows, we currently make 12 such calls per table when querying Iceberg.

We need to understand why we're making these additional metastore accesses.
If caching turns out to be desirable, #8659 is a WIP that adds it.
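
To illustrate what caching could buy here, below is a minimal sketch of a per-query memoizing wrapper. It is not the code from #8659: the class name `PerQueryTableCache`, the `Function<String, Optional<String>>` stand-in for the metastore lookup, and the use of plain strings for table metadata are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical per-query memoizing wrapper; not Trino's actual interface
// and not the implementation in #8659.
final class PerQueryTableCache
{
    // Stand-in for a metastore lookup: table name -> metadata, if the table exists.
    private final Function<String, Optional<String>> metastoreLookup;
    private final Map<String, Optional<String>> loaded = new ConcurrentHashMap<>();

    PerQueryTableCache(Function<String, Optional<String>> metastoreLookup)
    {
        this.metastoreLookup = metastoreLookup;
    }

    // Only the first request for a given name reaches the metastore.
    // Optional.empty() (table not found) is remembered as well.
    Optional<String> getTable(String tableName)
    {
        return loaded.computeIfAbsent(tableName, metastoreLookup);
    }
}
```

With a wrapper like this, 12 lookups of the same table within one query would reach the metastore once. The instance would presumably need to be scoped to a single query or transaction so that stale metadata is not shared across queries.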

@findepi
Member Author

findepi commented Jul 27, 2021

Out of these 12:

  • 4x io.trino.plugin.iceberg.IcebergMetadata#getSystemTable
  • 1x io.trino.plugin.iceberg.IcebergMetadata#getMaterializedView
  • 1x io.trino.plugin.iceberg.IcebergMetadata#getView
  • 2x IcebergMetadata.getTableHandle (apparently within one call)
  • 4x IcebergMetadata.getTableMetadata (these seem plain redundant)
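
As a sanity check on the breakdown above, here is a small sketch that tallies the reported call sites and confirms they add up to the 12 asserted GET_TABLE invocations. The `InvocationTally` class is hypothetical and only replays the numbers from this comment; it does not instrument the connector.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that tallies metastore calls per call site.
final class InvocationTally
{
    private final Map<String, Integer> counts = new TreeMap<>();

    // Adds `times` invocations attributed to the given call site.
    void record(String caller, int times)
    {
        counts.merge(caller, times, Integer::sum);
    }

    int total()
    {
        return counts.values().stream().mapToInt(Integer::intValue).sum();
    }

    Map<String, Integer> counts()
    {
        return Collections.unmodifiableMap(counts);
    }

    // Replays the per-call-site numbers reported in this comment.
    static InvocationTally fromReportedBreakdown()
    {
        InvocationTally tally = new InvocationTally();
        tally.record("IcebergMetadata#getSystemTable", 4);
        tally.record("IcebergMetadata#getMaterializedView", 1);
        tally.record("IcebergMetadata#getView", 1);
        tally.record("IcebergMetadata#getTableHandle", 2);
        tally.record("IcebergMetadata#getTableMetadata", 4);
        return tally;
    }
}
```

Calling `InvocationTally.fromReportedBreakdown().total()` gives the sum of the five entries, which matches the count in the assertion.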

cc @homar @joshthoward @clemensvonschwerin

@findepi
Member Author

findepi commented Jul 27, 2021

  • 4x IcebergMetadata.getTableMetadata (these seem plain redundant)

#8676

@electrum
Member

Nice analysis. It looks like we will need to change the connector, since it seems hard to avoid all of these.

@findepi
Member Author

findepi commented Jul 28, 2021

  • 4x io.trino.plugin.iceberg.IcebergMetadata#getSystemTable

#8689

@findepi
Member Author

findepi commented Jul 28, 2021

When selecting from a system table, we currently do even more accesses:

    // select from $history
    assertMetastoreInvocations("SELECT * FROM \"test_select_snapshots$history\"",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 17)
                    .build());

    // select from $snapshots
    assertMetastoreInvocations("SELECT * FROM \"test_select_snapshots$snapshots\"",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 19)
                    .build());

    // select from $manifests
    assertMetastoreInvocations("SELECT * FROM \"test_select_snapshots$manifests\"",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 21)
                    .build());

    // select from $partitions
    assertMetastoreInvocations("SELECT * FROM \"test_select_snapshots$partitions\"",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 17)
                    .build());

    // select from $files
    assertMetastoreInvocations("SELECT * FROM \"test_select_snapshots$files\"",
            ImmutableMultiset.builder()
                    .addCopies(GET_TABLE, 24)
                    .build());
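
To put these numbers next to the 12 calls asserted for a plain `SELECT` earlier in this issue, the sketch below computes the difference per system table. It assumes the plain-table count is a fair baseline even though that assertion ran against a different test table; the `SystemTableOverhead` class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper comparing system-table counts with the plain-table count.
final class SystemTableOverhead
{
    // GET_TABLE count asserted for a plain SELECT in the first comment (assumed baseline).
    static final int PLAIN_TABLE_CALLS = 12;

    // GET_TABLE counts from the assertions in this comment, in their original order.
    static Map<String, Integer> reportedCalls()
    {
        Map<String, Integer> calls = new LinkedHashMap<>();
        calls.put("$history", 17);
        calls.put("$snapshots", 19);
        calls.put("$manifests", 21);
        calls.put("$partitions", 17);
        calls.put("$files", 24);
        return calls;
    }

    // Calls made on top of the plain-table baseline, per system table.
    static Map<String, Integer> extraCalls()
    {
        Map<String, Integer> extra = new LinkedHashMap<>();
        reportedCalls().forEach((table, count) -> extra.put(table, count - PLAIN_TABLE_CALLS));
        return extra;
    }
}
```

Under that assumption, the system tables add between 5 (`$history`, `$partitions`) and 12 (`$files`) calls on top of the baseline.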

@findepi
Member Author

findepi commented Jul 28, 2021

When selecting from a system table, we currently do even more accesses

#8692
