Is your feature request related to a problem or challenge?
As of now, you can
create an external table (implemented by ListingTable) that points at a local directory and insert data into it, which creates new files
create an external table (implemented by ListingTable) that points at a local directory with a declared sort order, and DataFusion will take advantage of that order!
Sadly you can not do both together -- insert data into an external table that has had a sort order declared. For example:
$ mkdir output
$ datafusion-cli
DataFusion CLI v29.0.0
❯ create external table output(time timestamp) stored as parquet location 'output' with order (time);
0 rows in set. Query took 0.002 seconds.
❯ insert into output values (now());
This feature is not implemented: Writing to a sorted listing table via insert into is not supported yet. To write to this table in the meantime, register an equivalent table with file_sort_order = vec![]
Describe the solution you'd like
From @devinjdangelo comments in #6569 (comment)
In the case of appending new files to a directory, I think it is as simple as having FileSinkExec require its input be sorted. DataFusion's optimizer should do the rest to ensure the new file is sorted properly.
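To make the idea concrete, here is a minimal sketch of a sink that declares a required input ordering and an optimizer pass that adds a sort only when needed. It is a toy model under stated assumptions: `Plan`, `SortKey`, `asc`, `output_ordering` and `enforce_sink_ordering` are hypothetical names invented for illustration and are not DataFusion's actual types or rules.

```rust
// Hypothetical, simplified model; these types are NOT DataFusion's real API.

/// A sort key: column name plus direction.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    descending: bool,
}

/// Convenience constructor for an ascending key.
fn asc(column: &str) -> SortKey {
    SortKey { column: column.to_string(), descending: false }
}

/// A toy physical plan with only the nodes needed for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A source that emits rows in `ordering` (empty means unordered).
    Source { ordering: Vec<SortKey> },
    /// An explicit sort of the child by `keys`.
    Sort { keys: Vec<SortKey>, input: Box<Plan> },
    /// A sink that writes one new file and requires its input sorted by `required`.
    FileSink { required: Vec<SortKey>, input: Box<Plan> },
}

/// The ordering a plan node produces.
fn output_ordering(plan: &Plan) -> &[SortKey] {
    match plan {
        Plan::Source { ordering } => ordering,
        Plan::Sort { keys, .. } => keys,
        Plan::FileSink { .. } => &[],
    }
}

/// Wrap the sink's input in a Sort only when the input's ordering does not
/// already start with the required keys.
fn enforce_sink_ordering(plan: Plan) -> Plan {
    match plan {
        Plan::FileSink { required, input } => {
            let input = if output_ordering(&input).starts_with(&required) {
                input
            } else {
                Box::new(Plan::Sort { keys: required.clone(), input })
            };
            Plan::FileSink { required, input }
        }
        other => other,
    }
}
```

In this model, a `FileSink` requiring `vec![asc("time")]` over an unordered `Source` comes back with a `Sort` on `time` in between, while a sink whose input is already ordered by `time` is returned unchanged, so the new file is sorted either way.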
In the case of a single file (LOCATION 'foo.parquet' for example), this likely can't be handled efficiently, as doing so would require reading the existing file, merging that with the new data and rewriting the whole file.
Describe alternatives you've considered
Alternatively, we could have a check to see if 1) the table is sorted and 2) the input to FileSinkExec is sorted. If 1) is true but 2) is not, we would need to update the metadata about the table to indicate for subsequent queries it is no longer guaranteed to be sorted.
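The check described above could look roughly like the following. This is only a sketch of the decision logic: `TableMeta` and `record_insert` are hypothetical names, sort orders are reduced to plain column names, and how the real table metadata would be stored or updated is left open.

```rust
// Hypothetical model of the fallback check; none of these names come from DataFusion.

/// Declared sort order of a table as column names; empty means no guarantee.
#[derive(Debug, Clone, PartialEq)]
struct TableMeta {
    sort_order: Vec<String>,
}

/// Record an insert into the table.
/// 1) is the table sorted, and 2) is the sink's input sorted compatibly?
/// If 1) holds but 2) does not, drop the table's ordering guarantee so that
/// subsequent queries no longer rely on it.
fn record_insert(meta: &mut TableMeta, input_order: &[String]) {
    let table_sorted = !meta.sort_order.is_empty();
    let input_sorted = input_order.starts_with(&meta.sort_order);
    if table_sorted && !input_sorted {
        meta.sort_order.clear();
    }
}
```

With a table declared as sorted by `time`, calling `record_insert` with an empty `input_order` clears `sort_order`, while calling it with an input ordered by `time` leaves the declaration intact.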
Additional context
No response