read_csv() incorrectly reads character vectors if all strings begin with "Inf" (e.g. "Inform") #1319
Labels: bug (an unexpected problem or unintended behavior)
Comments
Thank you for opening the issue and for supplying a reproducible example; it is a big help! This should be fixed in the next release of vroom.
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue on May 1, 2022:
# vroom 1.5.7
* Jenny Bryan is now the official maintainer.
* Fix uninitialized bool detected by CRAN's UBSAN check (tidyverse/vroom#386)
* Fix buffer overflow when trying to parse an integer field that is over 64 characters long (tidyverse/readr#1326)
* Fix subset indexing when indexes span a file boundary multiple times (#383)

# vroom 1.5.6
* `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381)
* `vroom(n_max=)` now correctly handles cases when reading from a connection and the file does _not_ end with a newline (tidyverse/readr#1321)
* `vroom()` no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (tidyverse/readr#1313)
* Fix performance issue when materializing subsetted vectors (#378)
* `vroom_format()` now uses the same internal multi-threaded code as `vroom_write()`, improving its performance in most cases (#377)
* `vroom_fwf()` no longer omits the last line if it does _not_ end with a newline (tidyverse/readr#1293)
* Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (tidyverse/readr#1297)
* Files with a header but no contents, or an empty file if `col_names = FALSE`, no longer cause a hang when `progress = TRUE` (tidyverse/readr#1297)
* Commented lines with comments at the end of lines no longer hang R (tidyverse/readr#1309)
* Comment lines containing unpaired quotes are no longer treated as unterminated quotations (tidyverse/readr#1307)
* Values with only an `Inf` or `NaN` prefix but additional data afterwards, like `Inform`, are no longer inappropriately guessed as doubles (tidyverse/readr#1319)
* Time types now support `%h` format to denote hour durations greater than 24, like readr (tidyverse/readr#1312)

# vroom 1.5.5
* `vroom()` now supports files with only carriage return newlines (`\r`). (#360, tidyverse/readr#1236)
* `vroom()` now parses single digit datetimes more consistently as readr has done (tidyverse/readr#1276)
* `vroom()` now parses `Inf` values as doubles (tidyverse/readr#1283)
* `vroom()` now parses `NaN` values as doubles (tidyverse/readr#1277)
* `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports scientific notation (#364)
* `vroom()` now works around specifying a `\n` as the delimiter (#365, tidyverse/dplyr#5977)
* `vroom()` no longer crashes if given a `col_name` and `col_type` both less than the number of columns (tidyverse/readr#1271)
* `vroom()` no longer hangs if given an empty value for `locale(grouping_mark=)` (tidyverse/readr#1241)
* Fix performance regression when guessing with large numbers of rows (tidyverse/readr#1267)
Thank you for the excellent readr package! I think I'm seeing unexpected behavior from read_csv(), though. When reading a file that contains a character column whose strings all begin with "Inf" (e.g. "Inform", "Information"), read_csv() incorrectly reads the column as numeric Inf instead of as strings. The base read.csv() reads it correctly as strings. However, if the column contains at least one string that does not begin with "Inf" (e.g. "Indigo"), read_csv() reads the column correctly as strings. read_csv() also reads the column correctly if the col_types argument specifies it as a character vector, but that requires manual checks and edits.
It seems problematic to have to check every character vector first and then manually specify col_types whenever all the strings happen to begin with "Inf". Is it possible to update read_csv() to handle these kinds of vectors in the same way as read.csv()?
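For readers who want to see why only the all-"Inf" case misfires, here is a minimal, language-neutral sketch in Python of column type guessing. It is not readr's or vroom's actual code; the function names `looks_double_prefix`, `looks_double_strict`, and `guess_column` are hypothetical, and the prefix check is an assumption about the kind of logic that would produce the behavior described above. A column is guessed as double only when every value passes the check, which is why a single "Indigo" is enough to fall back to character.

```python
def looks_double_prefix(s):
    """Assumed buggy check: accepts anything that merely starts with Inf/NaN."""
    t = s[1:] if s.startswith(("+", "-")) else s
    if t.startswith(("Inf", "NaN")):
        return True
    try:
        float(s)
        return True
    except ValueError:
        return False


def looks_double_strict(s):
    """Stricter check: Inf/NaN count only when they are the whole value."""
    t = s[1:] if s.startswith(("+", "-")) else s
    if t in ("Inf", "NaN"):
        return True
    try:
        float(s)
        return True
    except ValueError:
        return False


def guess_column(values, is_double):
    """Guess 'double' only if every value in the column passes the check."""
    if values and all(is_double(v) for v in values):
        return "double"
    return "character"


all_inf = ["Inform", "Information"]
mixed = ["Inform", "Indigo"]

print(guess_column(all_inf, looks_double_prefix))  # misguessed column
print(guess_column(mixed, looks_double_prefix))    # one non-Inf value rescues it
print(guess_column(all_inf, looks_double_strict))  # whole-value match
```

Under these assumptions the prefix check labels the all-"Inf" column as double while the whole-value check keeps it as character. Until a fix is available, specifying the column as character via col_types, as noted above, sidesteps guessing entirely.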
Thanks very much, and apologies if I'm just missing something. (also posted on Stack Overflow: https://stackoverflow.com/questions/69680431/r-readrread-csv-incorrectly-reads-character-vectors-if-all-strings-begin-with)
Created on 2021-10-22 by the reprex package (v2.0.1)