Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Iceberg statistics loading after snapshot expiration #17356

Conversation

findepi
Copy link
Member

@findepi findepi commented May 5, 2023

Previously getting stats would fail with NPE:

Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.Snapshot.parentId()" because the return value of "org.apache.iceberg.Table.snapshot(long)" is null
	at io.trino.plugin.iceberg.TableStatisticsReader$2.computeNext(TableStatisticsReader.java:328)
	at io.trino.plugin.iceberg.TableStatisticsReader$2.computeNext(TableStatisticsReader.java:322)
	at com.google.common.collect.AbstractSequentialIterator.next(AbstractSequentialIterator.java:74)
	at io.trino.plugin.iceberg.TableStatisticsReader$1.computeNext(TableStatisticsReader.java:305)
	at io.trino.plugin.iceberg.TableStatisticsReader$1.computeNext(TableStatisticsReader.java:284)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at io.trino.plugin.iceberg.TableStatisticsReader.readNdvs(TableStatisticsReader.java:232)
	at io.trino.plugin.iceberg.TableStatisticsReader.makeTableStatistics(TableStatisticsReader.java:160)
	at io.trino.plugin.iceberg.TableStatisticsReader.getTableStatistics(TableStatisticsReader.java:93)
	at io.trino.plugin.iceberg.IcebergMetadata.lambda$getTableStatistics$77(IcebergMetadata.java:2501)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
	at io.trino.plugin.iceberg.IcebergMetadata.getTableStatistics(IcebergMetadata.java:2481)

Previously getting stats would fail with NPE:

```
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.iceberg.Snapshot.parentId()" because the return value of "org.apache.iceberg.Table.snapshot(long)" is null
	at io.trino.plugin.iceberg.TableStatisticsReader$2.computeNext(TableStatisticsReader.java:328)
	at io.trino.plugin.iceberg.TableStatisticsReader$2.computeNext(TableStatisticsReader.java:322)
	at com.google.common.collect.AbstractSequentialIterator.next(AbstractSequentialIterator.java:74)
	at io.trino.plugin.iceberg.TableStatisticsReader$1.computeNext(TableStatisticsReader.java:305)
	at io.trino.plugin.iceberg.TableStatisticsReader$1.computeNext(TableStatisticsReader.java:284)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:146)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:141)
	at io.trino.plugin.iceberg.TableStatisticsReader.readNdvs(TableStatisticsReader.java:232)
	at io.trino.plugin.iceberg.TableStatisticsReader.makeTableStatistics(TableStatisticsReader.java:160)
	at io.trino.plugin.iceberg.TableStatisticsReader.getTableStatistics(TableStatisticsReader.java:93)
	at io.trino.plugin.iceberg.IcebergMetadata.lambda$getTableStatistics$77(IcebergMetadata.java:2501)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1708)
	at io.trino.plugin.iceberg.IcebergMetadata.getTableStatistics(IcebergMetadata.java:2481)
```
@cla-bot cla-bot bot added the cla-signed label May 5, 2023
@github-actions github-actions bot added the iceberg Iceberg connector label May 5, 2023
@Test
public void testShowStatsAsOf()
{
Session writeSession = withStatsOnWrite(getSession(), false);
Copy link
Contributor

@findinpath findinpath May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: extract "show_stats_as_of" to a variable.

('key', null, 4, 0, null, '1', '5'), -- NDV present, stats "inherited" from previous snapshot
(null, null, null, null, 5, null, null)""");

// Re-analyzing after snapshot expired
Copy link
Contributor

@findinpath findinpath May 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which "snapshot expired" are you referring to here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some snapshot expired

// Snapshot referenced by `previous` had no parent.
return null;
}
return verifyNotNull(snapshot.parentId(), "snapshot.parentId()");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verifyNotNull seem redundant here - given that we check above

if (snapshot.parentId() == null)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, obviously.

it's here for readability & future proofing that we do not return end-of-data on this line accidentally.

otherwise i could just have

return snapshot.parentId();

without preceding if on that value

@findepi findepi requested a review from findinpath May 5, 2023 10:24
@findepi findepi merged commit f202784 into trinodb:master May 5, 2023
@findepi findepi deleted the findepi/fix-iceberg-statistics-loading-after-snapshot-expiration-51be93 branch May 5, 2023 11:49
@github-actions github-actions bot added this to the 417 milestone May 5, 2023
@colebow
Copy link
Member

colebow commented May 10, 2023

Does this need release notes? @findepi

@findepi
Copy link
Member Author

findepi commented May 10, 2023

yes, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

5 participants