scan: fix error when reading an empty table #608

Open
wants to merge 1 commit into
base: main

Conversation

mattheusv
Contributor

Previously, the TableScan struct required a Snapshot to plan files, so for empty tables without a snapshot an error was returned instead of an empty result.

Following the same approach as the Java [0] and Python [1] implementations, this commit changes the snapshot property to accept None values. The plan_files method was also changed to return an empty stream if the snapshot is not present on the PlanContext.

[0] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotScan.java#L119
[1] https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1979

Fixes: #580
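
To illustrate the change described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `Snapshot`, `PlanContext`, and `TableScan` types below are simplified stand-ins written for this example, and a plain iterator stands in for the async stream that the real `plan_files` returns.

```rust
/// Simplified stand-in for a snapshot: only the data files it references.
/// (Hypothetical type for illustration; the real Snapshot is richer.)
struct Snapshot {
    data_files: Vec<String>,
}

/// Simplified stand-in for the plan context, with the snapshot made optional.
struct PlanContext {
    snapshot: Option<Snapshot>,
}

/// Simplified stand-in for the scan object.
struct TableScan {
    plan_context: PlanContext,
}

impl TableScan {
    /// Returns the files to read. An iterator is used here in place of the
    /// async stream so the sketch needs no external crates.
    fn plan_files(&self) -> Box<dyn Iterator<Item = String> + '_> {
        match &self.plan_context.snapshot {
            // No snapshot: there is nothing to plan, so yield an empty
            // result instead of returning an error.
            None => Box::new(std::iter::empty()),
            // Snapshot present: yield the files it references.
            Some(snapshot) => Box::new(snapshot.data_files.iter().cloned()),
        }
    }
}
```

With this shape, a caller iterating over `plan_files()` for a table without a snapshot simply sees zero items.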

@mattheusv
Contributor Author

Just some notes for reviewers:

I'm not 100% sure that this is the best approach to fix this issue. I've just tried to follow the same approach used in the Java and Python implementations, but I don't know if there is a better way to implement it in Rust.

Also, I'm a bit confused about where I should write a test case for this issue.

@mattheusv mattheusv force-pushed the fix-read-empty-table branch 2 times, most recently from 39b8da9 to abcd7f8 Compare September 6, 2024 18:58
@sdd
Contributor

sdd commented Sep 6, 2024

Thanks for the contribution! Do we need to address this inside scan though? Why let someone build a TableScan that will always be useless?

This can be handled instead in the code that invokes table.scan(), without needing to make changes to the scan builder, scan, and context objects just for this edge case.

let scan_builder = table.scan();
// (customize the builder here if required)...

// If the scan cannot be built, fall back to an empty stream.
let Ok(scan) = scan_builder.build() else {
    return Ok(stream::empty().boxed());
};

scan.plan_files().await

@mattheusv
Contributor Author

Hi @sdd, thanks for your review.

I agree that it would be better to fix this edge case with a smaller change, but I'm not sure I understand your suggestion correctly.

Would the idea be to make the callers of TableScanBuilder.build() handle the case where the table doesn't have any data? Currently scan_builder.build() returns a TableScan, and it is TableScan.plan_files that may return stream::empty().boxed(), so I don't know if I'm missing something here. (I'm new to this codebase.)

Just adding another idea: would it make sense to return an error like Error::new(ErrorKind::EmptyTable) when calling TableScanBuilder.build()?

@sdd
Contributor

sdd commented Sep 9, 2024

Just to clarify, not having any snapshots is not necessarily the same as not having any data. If there is no current snapshot then there can't be any data, but someone could delete all data from a table, resulting in there being a snapshot, but no data. The existing code would handle this second case just fine - we only need to handle the issue of no snapshots.
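
The distinction above can be sketched as follows. This is only an illustration: `TableMetadata`, `Snapshot`, `TableState`, and `classify` are hypothetical names invented for this example and are not types from the crate.

```rust
/// Hypothetical stand-in for a snapshot: the data files it references.
struct Snapshot {
    data_files: Vec<String>,
}

/// Hypothetical stand-in for table metadata.
struct TableMetadata {
    current_snapshot: Option<Snapshot>,
}

/// The three situations discussed in this thread.
#[derive(Debug, PartialEq)]
enum TableState {
    /// Table was created but never written to: no snapshot exists.
    NoSnapshot,
    /// A snapshot exists, but it references no data (e.g. after deleting everything).
    SnapshotWithoutData,
    /// A snapshot exists and references this many data files.
    HasData(usize),
}

fn classify(metadata: &TableMetadata) -> TableState {
    match &metadata.current_snapshot {
        None => TableState::NoSnapshot,
        Some(s) if s.data_files.is_empty() => TableState::SnapshotWithoutData,
        Some(s) => TableState::HasData(s.data_files.len()),
    }
}
```

Only the `NoSnapshot` case is the one that needs new handling; the other two already go through the normal scan path.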

@mattheusv
Contributor Author

mattheusv commented Sep 10, 2024

@sdd I've changed the code to return an ErrorKind::TableWithoutSnapshot instead of FeatureUnsupported. With this, the user can differentiate between a table without snapshots and a table without data. WDYT?
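
A rough sketch of how a caller might use such an error kind is shown below. Everything here is an assumption made for illustration: the `ErrorKind`, `Error`, and `TableScan` types are simplified stand-ins, `TableWithoutSnapshot` is only the variant proposed in this comment, and `build_scan` and `plan_or_empty` are hypothetical helpers.

```rust
/// Simplified stand-in for the crate's error kinds.
/// `TableWithoutSnapshot` is the variant proposed in this PR (an assumption here).
#[derive(Debug, PartialEq)]
enum ErrorKind {
    TableWithoutSnapshot,
    Unexpected,
}

/// Simplified stand-in for the crate's error type.
#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: String,
}

/// Simplified stand-in for a built scan: just the planned files.
struct TableScan {
    files: Vec<String>,
}

/// Hypothetical builder step: fails with a dedicated kind when no snapshot exists.
fn build_scan(current_snapshot: Option<Vec<String>>) -> Result<TableScan, Error> {
    match current_snapshot {
        Some(files) => Ok(TableScan { files }),
        None => Err(Error {
            kind: ErrorKind::TableWithoutSnapshot,
            message: "table has no snapshot".to_string(),
        }),
    }
}

/// Caller-side handling: treat "no snapshot" as an empty result,
/// but keep propagating every other error.
fn plan_or_empty(result: Result<TableScan, Error>) -> Result<Vec<String>, Error> {
    match result {
        Ok(scan) => Ok(scan.files),
        Err(e) if e.kind == ErrorKind::TableWithoutSnapshot => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}
```

Matching on the kind lets the caller map only the missing-snapshot case to an empty result, rather than swallowing every build error.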

@sdd
Contributor

sdd commented Sep 11, 2024

We've been very selective when it comes to adding new values to ErrorKind. I'd personally go for Unexpected here - but maybe @liurenjie1024 or @Xuanwo can confirm what would be best.

Development

Successfully merging this pull request may close these issues.

Failed to read empty table
2 participants