Apologies for (possibly) submitting a support request disguised as an issue report, but I am failing to get the class option to work and can't figure out what is going wrong. Here is the reprex:
I appreciate the concern that one does not want to restore a broken object. One possibility for allowing the storage of different types of data frames would be to have an accept_classes (or simply classes) parameter, which would only allow reading (and writing?) an object if all of its classes were included in the accept_classes list, punting the responsibility to the user. Trying to read/write objects with unlisted class entries would result in an error. A default value of c("tbl_df", "tbl", "data.frame") would allow some very common use cases (i.e. base data frames and tibbles) to be transparently round-tripped. Interaction with class could work by some heuristic about priorities, such as:
- On write, add class data if accept_classes is set
- On read, if accept_classes is set and class data is found when reading a parquet file, use that
- If no class data is found when reading a parquet file, use the value from class
- If one of the two params is null/NA, use the other
- If both are null/NA, error out
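To make the read-side priorities above concrete, here is a minimal sketch in base R. It is only one possible reading of the heuristic: `resolve_read_class()` and its `stored` argument are hypothetical names invented for illustration, not part of nanoparquet, and the defaults are simply the values mentioned in this thread.

```r
# Hypothetical helper, not a nanoparquet function.
# `stored` is the class vector found in the file's metadata (NULL if absent).
resolve_read_class <- function(stored,
                               accept_classes = c("tbl_df", "tbl", "data.frame"),
                               class = c("tbl", "data.frame")) {
  # Both parameters unset: nothing to fall back on, so error out.
  if (is.null(accept_classes) && is.null(class)) {
    stop("both `accept_classes` and `class` are unset")
  }
  # Class data found and accept_classes set: use it, but only if
  # every stored class entry is listed.
  if (!is.null(stored) && !is.null(accept_classes)) {
    unlisted <- setdiff(stored, accept_classes)
    if (length(unlisted) > 0) {
      stop("unlisted class entries: ", paste(unlisted, collapse = ", "))
    }
    return(stored)
  }
  # No usable class data: fall back to `class`, or to `accept_classes`
  # when `class` itself is unset (assumed meaning of "use the other").
  if (!is.null(class)) class else accept_classes
}

resolve_read_class(c("tbl_df", "tbl", "data.frame"))  # stored classes are used
resolve_read_class(NULL)                              # falls back to `class`
```

With these rules, a file carrying an unlisted class such as "sf" would fail to read unless the user explicitly adds that class to accept_classes.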
Would it make sense to reserve ... as the first parameter in parquet_options() to force the use of name-based parameters and prevent location-based usage? It feels like parquet_options() might accumulate a lot of parameters over time. Having the flexibility to group related parameters would help with documentation, but reordering them would break existing location-based usage. Putting ... first would of course break backwards compatibility now, but it would ensure that any later reordering of parameters is backwards compatible. Since it is still early days for nanoparquet, I think it might be a worthwhile trade-off.
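A rough sketch of the dots-first pattern, to show what it would mean for callers. `opts_sketch()` and `other_option` are made-up names for illustration and do not reflect the actual signature of parquet_options(); only the `class` default is taken from this thread.

```r
# Hypothetical options constructor, not the real parquet_options().
# Because `...` comes first, every other argument can only be matched by name.
opts_sketch <- function(..., class = c("tbl", "data.frame"), other_option = TRUE) {
  # Anything that lands in `...` was passed positionally or under an
  # unknown name, so reject it instead of silently ignoring it.
  if (...length() > 0) {
    stop("all options must be passed by name")
  }
  list(class = class, other_option = other_option)
}

opts_sketch(class = "data.frame")   # works: matched by name
# opts_sketch("data.frame")         # errors: positional use is rejected
```

Since arguments placed after `...` are never matched by position, their order in the signature can later change without affecting existing calls.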
I wonder why the default value of class is c("tbl", "data.frame") and not c("tbl_df", "tbl", "data.frame")? It seems to me that tbl is more of an "interface" than a class, since all actual instances of tbls that I have encountered previously also carry a more specific class, such as tbl_df or more complex types (such as those found in dbplyr). And the returned object is structurally a tbl_df, right? If so, it would be useful for any function taking it to be able to use the most specific class information.
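A small base R illustration of why the more specific class matters for S3 dispatch. The `describe()` generic and its methods are hypothetical, and the class vectors are set by hand here rather than produced by nanoparquet or tibble.

```r
# Hypothetical generic with a method for tbl_df but none for tbl.
describe <- function(x) UseMethod("describe")
describe.tbl_df <- function(x) "tbl_df method"
describe.data.frame <- function(x) "data.frame method"

df <- data.frame(x = 1:3)

# Class vector matching the current default discussed above.
class(df) <- c("tbl", "data.frame")
inherits(df, "tbl_df")   # FALSE
describe(df)             # falls through to the data.frame method

# Class vector that a tibble carries.
class(df) <- c("tbl_df", "tbl", "data.frame")
describe(df)             # now reaches the tbl_df method
```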
Originally posted by @torfason in #82