[Feature Request] Print column specification with write_csv #895

Closed
bschneidr opened this issue Oct 10, 2018 · 10 comments
Labels
feature a feature request or enhancement

@bschneidr

When a CSV created by write_csv() is read in another script, it would help if the write_csv() call that produced it had already emitted the matching column specification as a side effect.

Here's one way that could look.

After write_csv() runs, it prints a message with the column specification matching the data types of the data frame that was passed to write_csv().

library(tibble)
library(lubridate)
library(readr)

df <- data_frame(ID = c("01", "42"),
                 Date = ymd(c("2018-10-31", "2010-10-30")))

write_csv(df, path = "Data.csv")

#> To read "Data.csv", you can use the following column specification:
#> cols(
#>      ID = col_character(),
#>      Date = col_date(format = "")
#>      )

This would make it easier to integrate write_csv() into a data-processing pipeline that plays nicely with Git.
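To illustrate how the proposed message would be consumed, here is a minimal sketch of the reading side. It assumes the spec shown above has been pasted into the downstream script; the temporary file path and the base R stand-ins for the lubridate calls are choices made for this sketch, not part of the original request.

```r
library(readr)

# Recreate the example data with base R types only
df <- data.frame(
  ID = c("01", "42"),
  Date = as.Date(c("2018-10-31", "2010-10-30")),
  stringsAsFactors = FALSE
)

# Temporary file used only for this sketch
path <- tempfile(fileext = ".csv")
write_csv(df, path)

# The column specification from the proposed message, pasted verbatim,
# so the reading script no longer depends on type guessing
df2 <- read_csv(
  path,
  col_types = cols(
    ID = col_character(),
    Date = col_date(format = "")
  )
)
```

With the types spelled out, the reading script states its expectations explicitly, which is what makes the output easy to review in version control.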

@jimhester jimhester added the feature a feature request or enhancement label Nov 13, 2018
@jimhester jimhester added this to the backlog milestone Nov 15, 2018
@mpettis

mpettis commented Mar 19, 2019

I'd like to second this request, and add that it would be nice to not just display the column spec but also return it as a first-class object of some sort. I was thinking of something like:

df <- tibble(a=1L, b=1.0, c="a", d=TRUE, e=ymd_hms("2019-03-19T13:15:18Z"), f=ymd("2019-03-19"))

# Proposed functions:
# gen_spec(df)
#> cols(
#>   a = col_integer(),
#>   b = col_double(),
#>   c = col_character(),
#>   d = col_logical(),
#>   e = col_datetime(),
#>   f = col_date()
#> )
#
# gen_spec_short(df)
#> "idclTD"

Thank you for the work and consideration.
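For readers who want the short form without waiting on readr, the idea can be sketched in base R. The function below reuses the proposed name gen_spec_short, but it is a hypothetical helper written for illustration, not a readr function, and it only covers the column classes used in the example above (plus factors); anything else falls back to character.

```r
# Hypothetical helper (not part of readr): map each column's class
# to the one-letter shortcut that readr uses in compact col_types strings.
gen_spec_short <- function(df) {
  to_letter <- function(col) {
    # Date-times and dates are stored as doubles, so test them first
    if (inherits(col, "POSIXct")) return("T")
    if (inherits(col, "Date"))    return("D")
    if (is.factor(col))           return("f")
    if (is.logical(col))          return("l")
    if (is.integer(col))          return("i")
    if (is.double(col))           return("d")
    "c"  # assumption: treat every other class as character
  }
  paste(vapply(df, to_letter, character(1)), collapse = "")
}

df <- data.frame(
  a = 1L, b = 1.0, c = "a", d = TRUE,
  e = as.POSIXct("2019-03-19 13:15:18", tz = "UTC"),
  f = as.Date("2019-03-19"),
  stringsAsFactors = FALSE
)

gen_spec_short(df)
#> [1] "idclTD"
```

The resulting string can be passed as col_types to the read_* functions, since they accept this compact form.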

@mpettis

mpettis commented Mar 19, 2019

I have also asked this question (as to how others have done this) and posted my local solution here: https://stackoverflow.com/q/55249599/1022967

@at062084

at062084 commented Mar 28, 2019

Proposed function / workflow to migrate from base R data frames to readr

  1. prepare migration to readr::read_*
    df <- some base R data frame
    # proposed new function to extract the current col_types
    df.col_types <- spec_extract(df)
    saveRDS(df.col_types, "df.col_types.rds")
    write_delim(df, path="df.csv")
  2. read
    df <- read_delim("df.csv", col_types = readRDS("df.col_types.rds"))

@jimhester
Collaborator

jimhester commented May 3, 2019

So you can now generate a column specification from any data.frame with as.col_spec(df) and also optionally convert it to the concise string representation with as.character(), which should be all you need to do this fairly easily.

Currently I don't think it makes sense to print the spec out by default, but we can revisit it in a separate issue if needed.
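A minimal sketch of the round trip this enables, assuming a readr version that includes the data frame method for as.col_spec() (the development version at the time of this thread). The example data frame and the temporary file path are made up for illustration.

```r
library(readr)

# Example data, invented for this sketch
df <- data.frame(
  x = 1:2,
  y = c(0.5, 1.5),
  z = c("a", "b"),
  stringsAsFactors = FALSE
)

# Full column specification derived from the data frame
spec <- as.col_spec(df)

# Concise string form of the same specification
short <- as.character(spec)

# Write the data, then read it back using the concise spec
path <- tempfile(fileext = ".csv")
write_csv(df, path)
df2 <- read_csv(path, col_types = short)
```

Storing the short string next to the file, or in the reading script itself, gives the same effect as the printed message originally requested, without changing what write_csv() does.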

@bschneidr
Author

This is fantastic. Thank you!
Since the as.col_spec(df) function call is so simple, there seems to be little marginal value in having write_csv() automatically print the column specification as a side effect.

@Dulani

Dulani commented Sep 26, 2019

@jimhester You said:

> So you can now generate a column specification from any data.frame with as.col_type(df) and also optionally convert it to the concise string representation with as.character(), which should be all you need to do this fairly easily.

I see that there is a new as.col_spec(df) function, but I can't find an as.col_type(df) anywhere in the current (1.3.1) version or by searching this repository on GitHub. What am I missing?

What I'd really like to do is what you suggest in your post. Take a data frame or its column specification and automatically generate the concise string representation of the column specification.

In other words, something like this:
as.col_type(mtcars) %>% as.character(). Is this doable using existing functionality in readr, as you suggest?

@jimhester
Collaborator

It was a typo, I meant as.col_spec(df).

@Dulani

Dulani commented Sep 28, 2019

Thanks @jimhester! However, as.col_spec(mtcars) %>% as.character() does not produce a concise string representation. The first part as.col_spec(mtcars) produces the error: Error: col_types must be NULL, a list or a string.

The example in the documentation does convert a concise string representation into a col_spec: as.col_spec("cccnnn"), but I still haven't figured out how to take an existing data frame and generate a concise specification from it. Does that functionality already exist within readr or should it be an "open" feature request?

@jimhester
Collaborator

Yes, it does. You need to use the development version of readr.

as.character(readr::as.col_spec(mtcars))
#> [1] "ddddddddddd"
packageVersion("readr")
#> [1] '1.3.1.9000'

Created on 2019-09-30 by the reprex package (v0.3.0)

@lock

lock bot commented Apr 2, 2020

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Apr 2, 2020