Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyO3 0.22 #1665

Merged
merged 4 commits into from
Nov 1, 2024
Merged

PyO3 0.22 #1665

merged 4 commits into from
Nov 1, 2024

Conversation

diliop
Copy link
Contributor

@diliop diliop commented Oct 23, 2024

Upgrade from PyO3 0.21 to 0.22

Closes #1639

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 🔥
Just approved the workflows, let's go 🚀
Feel free to ping me, tho I think you are up to a great start!

@yookoala
Copy link

yookoala commented Oct 24, 2024

Sadly ran into numpy compilation issue on Windows (32bit).

Details:
PyO3/rust-numpy#448

Hopefully, the fix is on the way:
PyO3/rust-numpy#463

@yookoala
Copy link

The fix has merged to rust-numpy. Now we need to wait for a new release that includes the merge.

@yookoala
Copy link

@diliop: rust-numpy v0.22.1 has been released and it should have the Win32 compilation issue fixed. Please update to use it and try again.

@diliop
Copy link
Contributor Author

diliop commented Oct 30, 2024

There is no need to change anything in code. Just re-running checks on this PR is sufficient since numpy is already set to 0.22 which includes 0.22.1. Looking a the checks above, you can see the 0.22.1 download here. That said, everything looks green :)

@diliop diliop requested a review from ArthurZucker October 30, 2024 14:50
@yookoala
Copy link

@diliop: I think you probably should add "Closes #1639" to the PR's description so it will automatically close it when merged.

Copy link
Collaborator

@ArthurZucker ArthurZucker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a TON! Let's remove the name and good to go

bindings/python/src/models.rs Outdated Show resolved Hide resolved
@ArthurZucker ArthurZucker merged commit 6ade8c2 into huggingface:main Nov 1, 2024
11 checks passed
@ArthurZucker
Copy link
Collaborator

🎉 thanks for the PR 🤗

@@ -1030,7 +1030,7 @@ impl PyTokenizer {
fn encode_batch(
&self,
py: Python<'_>,
input: Vec<&PyAny>,
input: Bound<'_, PyList>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this broke tokenizers because it only supports PyList now 😓 looking into a fix!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@diliop we need this to be probably PySequence, but I am not sure about the fix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, I just saw this :( Yeah I "assumed" that Vec was only accepting list from Python hence PyList but if you need tuples then PySequence is indeed the way to go. Py* will give you the benefit of the Python type check at almost zero cost as you mentioned in your PR so they should be preferred where possible. That said, there should be tests covering this so I can pick this up over the weekend and make sure anything else I changed is also covered.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool yeah was in a rush to fix this, forgot about the tests, super nice if you want to add them 🤗

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a quick stab at adding tests but from the looks of it I will need to spend a bit more time here to do this right since PySequence is not the right solution after all. The TL;DR here is that the reason why the change I made was not caught by tests is because the test covering this line was turned off (here). Turning the test back on will now fail on parsing ndarray as an additional input type. So encode_batch and encode_batch_fast need to support list, tuple and ndarray. I think I can support all 3 by changing the input arg type to something like Vec<Bound<'_, PyAny>> with some changes in PreTokenizedEncodeInput and TextEncodeInput. I will have some time to work on this over the weekend so hopefully I have fix for this soon.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1679 should be the fix. Still planning to add more tests but this my hope is would be enough to restore the previous functionality.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for diving into it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python 3.13 support
4 participants