Allow inserts to a sorted ListingTable #7354

Closed
alamb opened this issue Aug 21, 2023 · 1 comment · Fixed by #7743
Labels
enhancement New feature or request

alamb commented Aug 21, 2023

Is your feature request related to a problem or challenge?

As of now, you can

  1. create an external table (implemented by ListingTable) that points at a local directory and insert data into it, which creates new files
  2. create an external table (implemented by ListingTable) that points at a local directory with a declared sort order, and DataFusion will take advantage of that order

Sadly, you cannot do both together: inserting data into an external table that has a declared sort order fails. For example:

$ mkdir output
$ datafusion-cli
DataFusion CLI v29.0.0
❯ create external table output(time timestamp) stored as parquet location 'output' with order (time);
0 rows in set. Query took 0.002 seconds.

❯ insert into output values (now());
This feature is not implemented: Writing to a sorted listing table via insert into is not supported yet. To write to this table in the meantime, register an equivalent table with file_sort_order = vec![]

Describe the solution you'd like

From @devinjdangelo's comments in #6569 (comment):

In the case of appending new files to a directory, I think it is as simple as having FileSinkExec require its input be sorted. DataFusion's optimizer should do the rest to ensure the new file is sorted properly.

The case of a single file (LOCATION 'foo.parquet' for example) likely can't be handled efficiently, as doing so would require reading the existing file, merging that with the new data, and rewriting the whole file.
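
To make the append case concrete, here is a minimal toy model of the idea: a sink node declares that it needs sorted input, and a planning pass adds a sort only when the input is not already known to be sorted. All types and function names below (`Plan`, `enforce_sorting`, `execute`) are hypothetical and exist only for illustration; they are not DataFusion's actual API, and the real optimizer is considerably more general.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.

/// A stand-in for a physical plan node that produces timestamp-like rows.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Leaf that produces rows; `sorted` records whether its output is known to be ordered.
    Source { rows: Vec<i64>, sorted: bool },
    /// Sorts the output of its child.
    Sort(Box<Plan>),
    /// Writes its input to a new file; `requires_sorted` mirrors the table's declared order.
    Sink { input: Box<Plan>, requires_sorted: bool },
}

/// Whether a plan's output is known to be sorted.
fn output_is_sorted(plan: &Plan) -> bool {
    match plan {
        Plan::Source { sorted, .. } => *sorted,
        Plan::Sort(_) => true,
        Plan::Sink { .. } => false, // a sink emits no rows of its own
    }
}

/// Satisfy the sink's ordering requirement, inserting a Sort only when needed.
fn enforce_sorting(plan: Plan) -> Plan {
    match plan {
        Plan::Sink { input, requires_sorted } => {
            let input = if requires_sorted && !output_is_sorted(&input) {
                Box::new(Plan::Sort(input))
            } else {
                input
            };
            Plan::Sink { input, requires_sorted }
        }
        other => other,
    }
}

/// "Run" the plan and return the rows that would land in the new file.
fn execute(plan: &Plan) -> Vec<i64> {
    match plan {
        Plan::Source { rows, .. } => rows.clone(),
        Plan::Sort(child) => {
            let mut rows = execute(child);
            rows.sort();
            rows
        }
        Plan::Sink { input, .. } => execute(input),
    }
}
```

In this sketch, passing a sink over an unsorted source through `enforce_sorting` wraps the source in a `Sort`, so the rows written to the new file come out ordered, while an already-sorted source is left untouched and pays no extra sort.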

Describe alternatives you've considered

Alternatively, we could have a check to see if 1) the table is sorted and 2) the input to FileSinkExec is sorted. If 1) is true but 2) is not, we would need to update the metadata about the table to indicate to subsequent queries that it is no longer guaranteed to be sorted.
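
The bookkeeping for this alternative can be sketched as follows. This is a toy model under stated assumptions: `TableMeta` and `record_insert` are hypothetical names invented for illustration, not DataFusion types, and the question of where such metadata would actually be stored is left open.

```rust
// Toy model only: `TableMeta` and `record_insert` are hypothetical, not DataFusion's API.

/// Minimal table metadata: the declared sort columns.
/// An empty list means the table makes no ordering guarantee.
#[derive(Debug, Clone, PartialEq)]
struct TableMeta {
    sort_order: Vec<String>,
}

/// Update the metadata after an insert.
///
/// If the table declares a sort order (check 1) but the input to the sink
/// was not sorted (check 2), the declared order is cleared so that later
/// queries stop relying on it. Otherwise the metadata is left unchanged.
fn record_insert(meta: &mut TableMeta, input_is_sorted: bool) {
    let table_is_sorted = !meta.sort_order.is_empty();
    if table_is_sorted && !input_is_sorted {
        meta.sort_order.clear();
    }
}
```

The trade-off relative to requiring sorted input is that an insert never pays for a sort, but a single unsorted insert permanently gives up the ordering guarantee for the whole table.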

Additional context

No response

alamb added the enhancement label Aug 21, 2023
@parkma99

Looks interesting to me, I would like to take this issue 😊.
