Overview of impact / achievements of the CZI grant (round 1) #6

Open
jorisvandenbossche opened this issue Jul 15, 2020 · 1 comment

Comments

@jorisvandenbossche
Collaborator

(using this issue here to dump some draft content)

The funding period has not ended yet, so this is only an interim overview.

Library Maintenance

...

Extension Types

(Not sure whether I should link only to my own PRs, as some related PRs were done outside of CZI-funded time.)

https://github.com/pandas-dev/pandas/pulls?q=is%3Aclosed+is%3Apr+author%3Ajorisvandenbossche+label%3AExtensionArray+
https://github.com/pandas-dev/pandas/pulls?q=is%3Aclosed+is%3Apr+author%3Ajorisvandenbossche+label%3A%22NA+-+MaskedArrays%22

Native String Data Type

New string dtype in pandas (backed by Python objects). This was not strictly done with CZI funding, though.
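A minimal sketch of what the new dtype looks like in use, assuming a pandas version that ships `StringDtype` (1.0 or later). The sample values are made up for illustration:

```python
import pandas as pd

# Request the dedicated string dtype explicitly; without dtype="string"
# this data would historically have been stored with the generic object dtype.
s = pd.Series(["hello", "world", None], dtype="string")

# Missing values are represented by pd.NA and propagate through
# the .str accessor methods.
upper = s.str.upper()

print(s.dtype)
print(upper)
```

The point of the dedicated dtype is that a column is explicitly marked as holding strings, rather than being an object column that happens to contain them.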

Additionally, Maarten Breddels is adding string algorithms to the Apache Arrow library (https://issues.apache.org/jira/browse/ARROW-555 and linked issues). At this point, some kernels have already been implemented (upper and lower, several is_.. predicates). Implementing fast, optimized algorithms in Arrow means they can be used not only by pandas but also by the broader data science ecosystem.

@TomAugspurger
Owner

Thanks. I got a little behind last week. Focusing on this today and tomorrow.
