Support for Reading Excel Tables #204

Open
aersam opened this issue Mar 15, 2024 · 6 comments · May be fixed by #287
Labels: feature request, question (further information is requested)

Comments

@aersam

aersam commented Mar 15, 2024

In my experience, it's usually much safer to load data from an Excel table than from a sheet. It would be nice if one could get the table names per sheet and get the table data as Arrow/pandas, like with the sheets.

lukapeschke added the question and feature request labels Mar 21, 2024
@lukapeschke
Collaborator

Hi @aersam, could you please provide a bit more detail on what you have in mind? And maybe provide a few examples of how that parameter is passed in other libraries (for example pandas)? A file with the issue you are encountering would also be great.

@aersam
Author

aersam commented Mar 21, 2024

Hi there!
Here's a sample file: tables.xlsx
I'd like to pass the table_name instead of sheet_name; table names are unique within an Excel workbook as well:

image
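To illustrate why a table name can stand in for a sheet name, here is a minimal standard-library sketch of how table definitions are stored in an .xlsx file: each table is an XML part carrying a `name` and a `ref` cell range. This is not fastexcel's or calamine's implementation. The `list_tables` and `make_demo_archive` helpers are hypothetical, the demo archive holds a single made-up table part rather than a full workbook, and the sketch assumes table parts sit under `xl/tables/`, which is the usual location but is formally determined by the workbook's relationship files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by table definition parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def list_tables(xlsx_bytes: bytes) -> dict[str, str]:
    """Map each table name to its cell range (e.g. "A1:C4")."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        for part in archive.namelist():
            # Assumption: table parts live under xl/tables/, the usual location.
            if part.startswith("xl/tables/") and part.endswith(".xml"):
                root = ET.fromstring(archive.read(part))
                if root.tag == NS + "table":
                    tables[root.get("name")] = root.get("ref")
    return tables


def make_demo_archive() -> bytes:
    """Build a zip holding only one made-up table part (not a full workbook)."""
    table_xml = (
        '<table xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
        'id="1" name="users" displayName="users" ref="B2:D6"/>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("xl/tables/table1.xml", table_xml)
    return buffer.getvalue()


demo_tables = list_tables(make_demo_archive())
print(demo_tables)
```

Mapping each table back to its sheet would additionally require reading the worksheet relationship files, which this sketch leaves out.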

@lukapeschke
Collaborator

Thanks! This seems to be doable through calamine's APIs (https://docs.rs/calamine/latest/calamine/struct.Xlsx.html#method.table_by_name) but will require quite some work, so I don't know when/if we will ship it... We might prioritize this if a lot of people ask for this feature 🙂

@alexander-beedie
Contributor

> Thanks! This seems to be doable through calamine's APIs (https://docs.rs/calamine/latest/calamine/struct.Xlsx.html#method.table_by_name) but will require quite some work, so I don't know when/if we will ship it... We might prioritize this if a lot of people ask for this feature 🙂

As a minor FYI, we always write "real" Excel table objects from Polars' write_excel method as they are generally considered quite useful, given how freeform Excel can otherwise be ;)

@lukapeschke
Collaborator

@alexander-beedie ah good to know, thanks! How do you handle tables on read? Is there a parameter to read a specific table?

@alexander-beedie
Contributor

alexander-beedie commented May 1, 2024

> @alexander-beedie ah good to know, thanks! How do you handle tables on read? Is there a parameter to read a specific table?

At the moment I don't, heh 😎 However, I was thinking about adding a table_name parameter, as one of our other engines (openpyxl) also supports reading table objects. If we could get an equivalent capability for fastexcel that would be really nice, as it completely eliminates the guessing game of "where does the data start/end" 👌
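As a rough illustration of why a table removes the "where does the data start/end" guessing game: a table definition carries an explicit A1-style range, which converts directly into row and column bounds. The sketch below is plain Python, not an API of fastexcel, Polars, or openpyxl; `column_index` and `parse_ref` are hypothetical helper names, and the range `"B2:D6"` is a made-up example.

```python
import re


def column_index(letters: str) -> int:
    """Convert column letters to a zero-based index ("A" -> 0, "AA" -> 26)."""
    index = 0
    for char in letters.upper():
        index = index * 26 + (ord(char) - ord("A") + 1)
    return index - 1


def parse_ref(ref: str) -> tuple[int, int, int, int]:
    """Return (first_row, first_col, last_row, last_col), zero-based and inclusive."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"unsupported range: {ref!r}")
    first_col, first_row, last_col, last_row = match.groups()
    return (
        int(first_row) - 1,
        column_index(first_col),
        int(last_row) - 1,
        column_index(last_col),
    )


# Made-up table range: the data block starts at B2 and ends at D6.
bounds = parse_ref("B2:D6")
print(bounds)
```

With bounds like these, a reader can slice exactly the table's cells instead of inferring the data region from which cells happen to be non-empty.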

(You may also want an option to skip the final row when reading table objects, as it can contain totals that are not part of the data. Tables also may or may not have a header row, so the existing parameters that control header handling can be respected.)
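The header and totals handling described above could look roughly like the following sketch. It operates on rows that have already been read from a table's range; the function name `table_data_rows` and its `header_row` / `totals_row` parameters are hypothetical and do not correspond to existing fastexcel or Polars options. In the file format itself, a table definition records this information in its `headerRowCount` and `totalsRowCount` attributes.

```python
def table_data_rows(rows, header_row=True, totals_row=False):
    """Split table rows into (header, data), dropping a trailing totals row.

    header_row: whether the first row holds column names.
    totals_row: whether the last row holds totals that are not data.
    """
    rows = list(rows)
    header = rows[0] if header_row and rows else None
    start = 1 if header_row else 0
    end = len(rows) - 1 if totals_row else len(rows)
    return header, rows[start:end]


# Made-up table contents with a header row and a totals row.
raw_rows = [
    ["item", "qty"],
    ["apple", 2],
    ["pear", 3],
    ["Total", 5],
]

header, data = table_data_rows(raw_rows, header_row=True, totals_row=True)
print(header)
print(data)
```

Keeping the two flags independent matters because a table can have a totals row without a header row, and vice versa.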

@PrettyWood PrettyWood added this to the v0.12.0 milestone Jul 1, 2024
@lukapeschke lukapeschke linked a pull request Sep 18, 2024 that will close this issue

4 participants