feat: Generate descriptive statistics for an ibis table #8459
Comments
I would look at implementing this for a few backends. We explicitly chose not to add the quantiles because most of our backends do not support a quantile reduction.
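As a rough illustration of a quantile-free summary, here is a minimal sketch that uses only aggregates nearly every SQL backend provides (COUNT, AVG, MIN, MAX). It runs against SQLite through Python's standard `sqlite3` module purely for convenience. The helper name `describe_numeric` and the table and column names are hypothetical, and this is not how ibis implements anything.

```python
import sqlite3


def describe_numeric(con, table, column):
    """Hypothetical helper: summarize one numeric column without quantiles.

    Table and column names are interpolated directly, so they are assumed
    to come from trusted code rather than user input.
    """
    query = (
        f'SELECT COUNT("{column}"), AVG("{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    )
    count, mean, lowest, highest = con.execute(query).fetchone()
    return {"count": count, "mean": mean, "min": lowest, "max": highest}


# Small in-memory table with one NULL to show that COUNT(column) skips NULLs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE example (numbers INTEGER)")
con.executemany("INSERT INTO example VALUES (?)", [(1,), (2,), (3,), (None,)])

summary = describe_numeric(con, "example", "numbers")
print(summary)  # {'count': 3, 'mean': 2.0, 'min': 1, 'max': 3}
```

Quantiles are left out on purpose: a summary built this way stays portable, and percentiles could be added later only for backends that support them.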
@jitingxu1 It might be helpful if you explore the specifics here. Since Ibis doesn't have anything like an object type, there are open questions that need to be answered to achieve something like describe:
Sorry for being the "well, in R..." but, well, in R we have a neat package called skimr that produces some nice output and does some things that I think are smart. Here's an example:
What I like about it is that it:
However, it does seem to convert complex types like dates into numbers, which is still useful but harder to interpret. It's also not possible to see what it'll do with list columns because
Code you can run to produce the above output
    # install.packages(c("dplyr", "dbplyr", "RSQLite", "skimr"))
    library(dplyr, warn.conflicts = FALSE)
    library(dbplyr)
    nrows <- 1000
    example_df <- data.frame(
      numbers = sample(seq_along(LETTERS), nrows, replace = TRUE),
      letters = sample(LETTERS, nrows, replace = TRUE),
      dates = sample(seq(as.Date("1970-01-01"), as.Date("2024-01-01"), length.out = length(LETTERS)), nrows, replace = TRUE))
    con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
    copy_to(con, example_df)
    tbl(con, "example_df") |>
      skimr::skim()
Example with list columns (duckdb)
This could also be useful with #8369 -- I'd personally love to see a package like
Is your feature request related to a problem?
I was a pandas user, and I used the pandas DataFrame describe() function a lot to get a sense of the data. I found that ibis has the info() function, but it does not return enough information.
Describe the solution you'd like
Option 1:
pandas DataFrame describe()
Analyzes both numeric and object series.
Numeric columns:
For object data (e.g. strings or timestamps), the result’s index will include count, unique, top, and freq. The top is the most common value. The freq is the most common value’s frequency. Timestamps also include the first and last items.
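To make the requested behavior concrete, here is a minimal plain-Python sketch of the semantics described above: numeric columns get count, mean, std, min and max, and everything else gets count, unique, top and freq. The helper `describe_column` is hypothetical, uses only the standard library, skips percentiles and the timestamp first/last handling for brevity, and is not the pandas or ibis implementation.

```python
from collections import Counter
from statistics import mean, stdev


def describe_column(values):
    """Hypothetical helper: describe()-style summary of one column.

    None is treated as a missing value and excluded from every statistic.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {"count": 0}

    if all(isinstance(v, (int, float)) for v in present):
        return {
            "count": len(present),
            "mean": mean(present),
            # Sample standard deviation; undefined for a single value.
            "std": stdev(present) if len(present) > 1 else None,
            "min": min(present),
            "max": max(present),
        }

    # Object-like data: report the most common value and its frequency.
    top, freq = Counter(present).most_common(1)[0]
    return {
        "count": len(present),
        "unique": len(set(present)),
        "top": top,
        "freq": freq,
    }


numbers_summary = describe_column([1, 2, 3, 4, None])
letters_summary = describe_column(["a", "b", "a", None])
print(numbers_summary)
print(letters_summary)  # {'count': 3, 'unique': 2, 'top': 'a', 'freq': 2}
```

Dispatching on the values works here only because the data is in memory. A table-level version would presumably dispatch on each column's declared data type instead.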
What version of ibis are you running?
8.0.0
What backend(s) are you using, if any?
DuckDB