Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: table support for Excel (XLSX) sheets #282

Merged
merged 19 commits into from
Sep 18, 2024

Conversation

wdoppenberg
Copy link
Contributor

@wdoppenberg wdoppenberg commented Sep 3, 2024

Implemented functionality to read and work with named tables within Excel sheets. Updated .gitignore, added corresponding tests.

One note: I'm using a clone() in the build_table method, please advise on how to get rid of this. Also, please give me feedback on eager vs. lazy loading, I believe some refinement might be necessary there.

Related issue: #204

Implemented functionality to read and work with named tables within Excel sheets, extending support for lazy loading and error handling. Updated `.gitignore`, added corresponding tests, and refined error structures and build processes.
@lukapeschke lukapeschke self-requested a review September 3, 2024 09:42
@lukapeschke
Copy link
Collaborator

Oh wow thanks a lot! I'll review this ASAP. In the meantime, could you please rebase the branch and run make format ? Looks like a lot of code has been re-indented to 8 spaces, and we use 4

Implemented functionality to read and work with named tables within Excel sheets, extending support for lazy loading and error handling. Updated `.gitignore`, added corresponding tests, and refined error structures and build processes.
…ture/extract-tables

# Conflicts:
#	python/fastexcel/_fastexcel.pyi
#	python/tests/test_tables.py
#	src/types/python/excelreader.rs
#	src/types/python/excelsheet/table.rs
#	test.py
@wdoppenberg wdoppenberg changed the title Add table support for Excel (XLSX) sheets feat: table support for Excel (XLSX) sheets Sep 3, 2024
Copy link
Collaborator

@lukapeschke lukapeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work, looks great overall 🚀

Since it's a pretty big change, I'd like @PrettyWood to have a look as well, as I might have missed some parts (he'll be back from vacation next week).

Also, I wonder wether we should introduce a new type for tables: It'd allow us to store some extra metadata, and would allow us to add to_{arrow,polars,pandas} methods as well. As a nice side effect, it would also avoid the clone there currently is without needing changes on the calamine side

Anyway, this has motivated me again to polish the README and add some proper contributor docs 😅

Makefile Outdated Show resolved Hide resolved
python/fastexcel/__init__.py Outdated Show resolved Hide resolved
src/error.rs Outdated Show resolved Hide resolved
Comment on lines 240 to 241
// TODO: Remove clone
ExcelSheetData::from(range.clone()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I believe the best solution would be to submit a change to calamine: Either make the data field pub in or add a pub into_data(self) -> T method to the Table type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened a PR: tafia/calamine#464

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the TODO with "to update once calamine v.... is added"

src/types/python/excelreader.rs Outdated Show resolved Hide resolved
test.py Outdated Show resolved Hide resolved
test.py Outdated Show resolved Hide resolved
test.py Outdated Show resolved Hide resolved
src/types/python/excelreader.rs Outdated Show resolved Hide resolved
@lukapeschke lukapeschke added enhancement New feature or request 🐍 python 🐍 Pull requests that edit Python code 🦀 rust 🦀 Pull requests that edit Rust code 🗒️ calamine 🗒️ Calamine-related issue labels Sep 3, 2024
@dhcsousa
Copy link

dhcsousa commented Sep 6, 2024

Very cool feature, it would also be useful for me!

Copy link
Member

@PrettyWood PrettyWood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks a lot

Comment on lines 240 to 241
// TODO: Remove clone
ExcelSheetData::from(range.clone()),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the TODO with "to update once calamine v.... is added"

@PrettyWood
Copy link
Member

Thanks a lot for your work, looks great overall 🚀

Since it's a pretty big change, I'd like @PrettyWood to have a look as well, as I might have missed some parts (he'll be back from vacation next week).

Also, I wonder wether we should introduce a new type for tables: It'd allow us to store some extra metadata, and would allow us to add to_{arrow,polars,pandas} methods as well. As a nice side effect, it would also avoid the clone there currently is without needing changes on the calamine side

Anyway, this has motivated me again to polish the README and add some proper contributor docs 😅

@lukapeschke Yup I reckon we should differentiate Sheet and Table. I don' know if you want to work on that @wdoppenberg directly in this PR. It would be great but if you don't have the time we can merge this PR like that and work on a refacto after that on our end

@wdoppenberg
Copy link
Contributor Author

Thanks a lot for your work, looks great overall 🚀
Since it's a pretty big change, I'd like @PrettyWood to have a look as well, as I might have missed some parts (he'll be back from vacation next week).
Also, I wonder wether we should introduce a new type for tables: It'd allow us to store some extra metadata, and would allow us to add to_{arrow,polars,pandas} methods as well. As a nice side effect, it would also avoid the clone there currently is without needing changes on the calamine side
Anyway, this has motivated me again to polish the README and add some proper contributor docs 😅

@lukapeschke Yup I reckon we should differentiate Sheet and Table. I don' know if you want to work on that @wdoppenberg directly in this PR. It would be great but if you don't have the time we can merge this PR like that and work on a refacto after that on our end

I don't have time for this this week. So depending on your timelines you might want to merge and open a new branch. If you can wait a week I can spend some time then.

@lukapeschke
Copy link
Collaborator

@wdoppenberg Let's merge this as is then to avoid more merge conflicts, and I'll see what I can do regarding the new Table object 🙂 Could you please give me push rights on your fork ? I'll fix the conflicts and merge the PR afterwards

@wdoppenberg
Copy link
Contributor Author

@wdoppenberg Let's merge this as is then to avoid more merge conflicts, and I'll see what I can do regarding the new Table object 🙂 Could you please give me push rights on your fork ? I'll fix the conflicts and merge the PR afterwards

Should work:

Screenshot 2024-09-17 at 20 02 58

Copy link
Collaborator

@lukapeschke lukapeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks a lot @wdoppenberg 🙏

@lukapeschke lukapeschke merged commit 79a7f17 into ToucanToco:main Sep 18, 2024
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🦀 rust 🦀 Pull requests that edit Rust code enhancement New feature or request ✋ need review ✋ 🐍 python 🐍 Pull requests that edit Python code 🗒️ calamine 🗒️ Calamine-related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants