Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: extend fastexcel support to python >= 3.8 #152

Merged

Conversation

alexander-beedie
Copy link
Contributor

@alexander-beedie alexander-beedie commented Jan 23, 2024

Hoping that a patch to extend Python support is welcome 😉

I was looking to integrate a calamine based Excel parser as one of our supported engines in the Polars read_excel method (I am one of the core devs for that project), but we need to support Python versions >= 3.8. While we could potentially have python_calamine handle earlier versions, I'm reluctant to do so as fastexcel is much better-suited for our use-case and benchmarks significantly faster (producing Arrow data without materialising anything in Python-space; nicely done!)

Took a look at the code to see if it could be extended to cover the earlier versions, and it seems like a straightforward low-impact update; I've validated that everything compiles/runs on the earlier Python versions (testing on x86 Ubuntu Linux for py3.8 and an Apple Silicon Mac for py3.9). Mostly just some minor typing updates and avoidance of the match expression in one of the tests. Put in an extra lint rule to catch the need for a from __future__ import annotations statement at the top of some files (to keep the modern typing syntax).

I can't validate the workflow changes, but with a little luck they'll run ok 🤞

Thanks for the great work, and let me know if you want any changes/modifications, etc?

@lukapeschke lukapeschke added enhancement New feature or request python labels Jan 24, 2024
Copy link
Collaborator

@lukapeschke lukapeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks a lot for your PR!

We wanted to propose fastexcel as an engine to polars as well, but you've been faster 😉

The changes look good to me overall, just a few comments here and there.

Also, have you encountered any difficulties/questions when you looked at the code ? We want to make the docs and README more welcoming to newcomers, so feedback from a first-time contributor would be precious 😄

.github/workflows/CI.yml Outdated Show resolved Hide resolved
.github/workflows/CI.yml Show resolved Hide resolved
README.md Show resolved Hide resolved
pyproject.toml Show resolved Hide resolved
Copy link
Collaborator

@lukapeschke lukapeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alexander-beedie
Copy link
Contributor Author

alexander-beedie commented Jan 24, 2024

Also, have you encountered any difficulties/questions when you looked at the code?

Actually I was pleasantly surprised how easy it was to go from "clone repo" to "everything is compiled and tests are running/passing" 👌 On that front I only have one comment, which is that I didn't immediately expect make dev-install and make prod-install to actually build, heh. Perhaps getting the word "build" in there would make it so even people who aren't fully paying attention (in this case: me!) wouldn't have to look twice - but honestly that's about the only thing I can think of, and it only took a few seconds to read through the Makefile with a bit more care 🤣

Otherwise it was smooth sailing; was up & running almost immediately. As for the source itself, I have only given it a superficial scan so far - mostly just looking to see what features are exposed (eg: is there a "has_headers" flag in case the sheet has no column names, an option to set them if there aren't, etc).

@lukapeschke lukapeschke merged commit 54b882d into ToucanToco:main Jan 24, 2024
10 checks passed
@alexander-beedie alexander-beedie deleted the extend-python-compatibility branch January 24, 2024 15:08
@lukapeschke
Copy link
Collaborator

I didn't immediately expect make dev-install and make prod-install to actually build

Good point, I'll update the make target names :)

Regarding the features you'd like to add, feel free to open issues and @ me on them! We've only exposed what we need internally for now, but things such as sheet headers handling should be pretty easy to add

@lukapeschke
Copy link
Collaborator

@alexander-beedie fastexcel 0.7.0 has been released and wheels for python 3.8 -> 3.12 have been successfully built and published to PyPI 🥳 https://pypi.org/project/fastexcel/0.7.0/#files

@alexander-beedie
Copy link
Contributor Author

feel free to open issues and @ me on them

Fantastic; thanks for the extremely quick turnaround on this! I've already integrated it as a new option for Polars in a local build and it's looking great. Will finish up some unit tests and look to push it out in the next day or two ✌️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants