Fix df.count() behavior to perform count_rows instead #1996

Closed
jaychia opened this issue Mar 8, 2024 · 2 comments · Fixed by #2653
Labels
p1 Important to tackle soon, but preemptable by p0

Comments

@jaychia
Contributor

jaychia commented Mar 8, 2024

Is your feature request related to a problem? Please describe.

When users run df.count(), they often expect df.count_rows() behavior. Instead, df.count() will perform a count aggregation on every column, which is not the intended behavior.

Interestingly, Spark returns an int for the .count() operation, which is perhaps most intuitive.

See Spark behavior: https://saturncloud.io/blog/counting-rows-in-pyspark-dataframes-a-comprehensive-guide/#counting-rows-using-the-count-function

We should discuss the intended behavior for our df.count() operation and implement a fix.
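
To make the two behaviors concrete, here is a minimal sketch in plain Python. It does not use Daft's API: the dict-of-lists `table` and the helpers `count_per_column` and `count_rows_sketch` are hypothetical stand-ins, and the sketch assumes a count aggregation skips nulls (as SQL's `COUNT(col)` does).

```python
# Hypothetical stand-in for a dataframe: a dict of equal-length columns.
table = {
    "a": [1, None, 3],
    "b": ["x", "y", None],
    "c": [None, None, None],
}


def count_per_column(table):
    """Count aggregation on every column (assumed to skip nulls)."""
    return {
        name: sum(value is not None for value in values)
        for name, values in table.items()
    }


def count_rows_sketch(table):
    """Total number of rows as a plain int, regardless of nulls."""
    first_column = next(iter(table.values()), [])
    return len(first_column)


per_column = count_per_column(table)  # one count per column
n_rows = count_rows_sketch(table)     # a single integer
```

Under these assumptions the per-column result varies with the nulls in each column, while the row count is one integer, which is what a user coming from Spark would likely expect from `.count()`.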

@jaychia jaychia added the p1 Important to tackle soon, but preemptable by p0 label Mar 8, 2024
@jaychia jaychia added this to Daft-OSS Mar 8, 2024
@github-project-automation github-project-automation bot moved this to On Deck in Daft-OSS Mar 8, 2024
@jaychia jaychia removed the status in Daft-OSS Jun 28, 2024
@universalmind303
Contributor

universalmind303 commented Aug 14, 2024

Coming from other dataframe libraries, this has tripped me up several times.

+1 to changing the behavior

@samster25
Member

Yeah, it's a common point of confusion. I think it's finally time!
