VOTE: Voting Issue for PDEP-10: Add pyarrow as a required dependency #54106

Closed
Dr-Irv opened this issue Jul 13, 2023 · 21 comments
Labels
Vote Used to track votes issues for PDEPs

Comments

@Dr-Irv
Contributor

Dr-Irv commented Jul 13, 2023

This is the issue where we will track votes for PDEP-10.

Pull request with discussion is here: #52711

Rendered PDEP for easy reading: https://github.com/pandas-dev/pandas/blob/2db0037b10aaa14994b307cbe64ff82b7c1dc260/web/pandas/pdeps/0010-required-pyarrow-dependency.md

Cast your vote in a comment below.

Voting will close in 15 days, i.e., on July 28.

@pandas-dev/pandas-core

@MarcoGorelli
Member

MarcoGorelli commented Jul 13, 2023

+1

(no reason necessary, but nonetheless: after recent discussions and clarifications, I'm sold, mainly because we can supersede object dtype with dedicated string, list, and struct dtypes, which would be a real and immediate benefit to users)

@phofl
Member

phofl commented Jul 13, 2023

+1

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 13, 2023

+1

@rohanjain101

This comment was marked as resolved.

@MarcoGorelli
Member

MarcoGorelli commented Jul 13, 2023

Thanks for voting, but just as a reminder, this is meant to be limited to members of pandas-core. Not an issue, our fault for not having made this clearer 😄

Locking the conversation then; I think only core members will be able to comment after that.

@pandas-dev pandas-dev locked and limited conversation to collaborators Jul 13, 2023
@jbrockmendel
Member

+1

@lithomas1
Member

+1

@bashtage
Contributor

+1

@mroeschke
Member

+1

@WillAyd
Member

WillAyd commented Jul 13, 2023

+1

@noatamir noatamir added the Vote Used to track votes issues for PDEPs label Jul 13, 2023
@lukemanley
Member

+1

@jreback
Contributor

jreback commented Jul 14, 2023

+1

@rhshadrach
Member

+1

@simonjayhawkins
Copy link
Member

+1

@jorisvandenbossche
Copy link
Member

+1

Longer version:
Big +1 on using pyarrow for string data by default.
+0 on "requiring" pyarrow for that, because I personally think we could relatively easily make pyarrow only a "default" dependency (i.e. users get it by default when pip/conda installing pandas) while still allowing users to run pandas without pyarrow installed (as long as we still have alternative data types for the arrow-based ones). That feels like a better first step.
But I know I disagree about the impact on development while I am also not doing much day-to-day maintenance myself, so I'll give more weight to the string feature argument ;)

@alimcmaster1
Member

+1

@attack68
Contributor

-1

I have waited until the close of the vote to post because I did not want to submit a vote that would cause this to not be accepted. This vote has already been won and therefore my addition does not change the outcome.

In the discussion there are a number of references that suggest pyarrow makes life better for (possibly) 99% of users but there were a number of comments made by people who are not able to vote that were not as keen.

PyArrow dependency breaks my own workflow.
I use pandas, and have for many years, within microservice web APIs that perform numpy tensor calculations (usually 2-4 dimensions, each with fewer than 200 elements), primarily in a financial calculation/optimization context. I use pandas for its indexing capability to dynamically arrange and slice these arrays for a better UI display.
I do not use strings and do not need pyarrow for large data, and its addition prevents my microservices provider from building because it takes up too much space: I am already using 80% of my disk space.

I think it is reasonable for this vote to be well accepted but to have on record some objection which I suspect we might all agree reflects some small proportion of users.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 26, 2023

@pandas-dev/pandas-core Voting on this PDEP closes in a couple of days.

@twoertwein
Member

0

I understand the benefits but also the risks. I don't feel comfortable voting +1 or -1 but I want to help reach the minimum quorum (if we haven't already reached it).

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 28, 2023

Final vote tally Yes/Abstain/No: 14/1/1

PDEP-10 is now approved.

@phofl
Member

phofl commented Aug 9, 2023

Closing

@phofl phofl closed this as completed Aug 9, 2023