(using this issue here to dump some draft content)
The funding period has not ended yet, so this is only an interim/provisional overview.
Library Maintenance
...
Extension Types
- `pd.concat` for internal extension dtypes, and also enable external projects to control the behaviour for their ExtensionArray as well. (ENH: general concat with ExtensionArrays through find_common_type pandas-dev/pandas#33607; BUG: Fix concat of frames with extension types (no reindexed columns) pandas-dev/pandas#34339; ENH: concat of nullable int + bool preserves int dtype pandas-dev/pandas#34985; BUG: Fixed concat with reindex and extension types pandas-dev/pandas#33522)
- `factorize`, `sum`, `prod`, `min`, `max` (ENH/PERF: use mask in factorize for nullable dtypes pandas-dev/pandas#33064; PERF: masked ops for reductions (min/max) pandas-dev/pandas#33261; PERF: masked ops for reductions (sum) pandas-dev/pandas#30982)

(not sure if I should link only to my PRs, as some related PRs have also been done not on CZI pay)
https://github.com/pandas-dev/pandas/pulls?q=is%3Aclosed+is%3Apr+author%3Ajorisvandenbossche+label%3AExtensionArray+
https://github.com/pandas-dev/pandas/pulls?q=is%3Aclosed+is%3Apr+author%3Ajorisvandenbossche+label%3A%22NA+-+MaskedArrays%22
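For readers unfamiliar with the nullable dtypes, here is a minimal sketch of the user-facing behaviour that the concat and masked-reduction work relates to. It assumes a reasonably recent pandas release; the variable names and sample values are illustrative only and are not taken from the linked PRs.

```python
import pandas as pd

# Concatenating two nullable integer Series of different widths:
# a common nullable integer dtype is found instead of falling back to object.
small = pd.Series([1, 2], dtype="Int8")
large = pd.Series([300, None], dtype="Int16")
combined = pd.concat([small, large], ignore_index=True)
print(combined.dtype)

# Reductions on a nullable integer Series skip missing values by default.
values = pd.Series([3, 1, None, 3], dtype="Int64")
total = values.sum()
product = values.prod()
lowest = values.min()
highest = values.max()
print(total, product, lowest, highest)

# factorize encodes each distinct value as an integer code;
# missing values get the sentinel code -1 and are left out of the uniques.
codes, uniques = values.factorize()
print(codes, list(uniques))
```

The reductions and `factorize` shown here are the operations named in the PR titles above; the sketch only illustrates their results, not the mask-based implementation itself.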
Native String Data Type
New `string` dtype in pandas (backed by Python objects) (this was not strictly done with CZI though).

Additionally, Maarten Breddels is adding string algorithms to the Apache Arrow library (https://issues.apache.org/jira/browse/ARROW-555 and linked issues). At this point, some kernels have already been implemented (`upper` and `lower`, several `is_..` predicates). Implementing fast, optimized algorithms in Arrow ensures that they can be used not only by pandas, but also by the broader data science ecosystem.
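As a rough illustration of the pandas side of this, the sketch below creates a Series with the `string` dtype and calls the pandas `.str` counterparts of the operations mentioned above (`upper`, `lower`, and one `is...` predicate). It exercises the pandas string accessor, not the Arrow kernels themselves, and the sample data is made up for the example.

```python
import pandas as pd

# A Series using the dedicated string dtype; the missing entry becomes <NA>.
names = pd.Series(["Arrow", "pandas", None], dtype="string")

# Case conversions keep the string dtype and propagate missing values.
upper = names.str.upper()
lower = names.str.lower()

# Predicates return a nullable boolean result, again propagating <NA>.
is_lower = names.str.islower()

print(upper.tolist())
print(lower.tolist())
print(is_lower.tolist())
```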