R session crashing when reading a bunch of csv files when one of the files has no rows #1297

gorkang · 2021-09-07T11:46:02Z

This has been hard to reduce to a shareable reprex, but this is as simple as I could get it.

I have 3 csv files. Two of them have 1 or more rows of data, and one has no rows of data (only has column names).

If I try to read all of them at once with read_csv(files) (possible since readr 2.0.0?), the R session crashes. If I read them one by one with map_df(files, read_csv), all is well.

These are the files used in the example: CSV4.zip

  library(readr)
  library(purrr)
  suppressPackageStartupMessages(library(dplyr))
  suppressPackageStartupMessages(library(here))
  
  
  files_giftcards = list.files(here::here("dev/BUG/CSV4/"), full.names = TRUE)
  
  DF12 = read_csv(files_giftcards[1:2], 
                 col_types = 
                   cols(
                     .default = col_character()
                   ))
  
  DF12
#> # A tibble: 1 × 1
#>   id          
#>   <chr>       
#> 1 rwsf7qgy2hsv
  
  DF2 = read_csv(files_giftcards[2], 
                 col_types = 
                   cols(
                     .default = col_character()
                   ))
  
  DF2 
#> # A tibble: 0 × 1
#> # … with 1 variable: id <chr>
  
  DF3 = read_csv(files_giftcards[3], 
                 col_types = 
                   cols(
                     .default = col_character()
                   ))
  
  DF3 
#> # A tibble: 1 × 1
#>   id          
#>   <chr>       
#> 1 rwsf7qgy2hsv
  

# FAILS -------------------------------------------------------------------
  
  DF13 = read_csv(files_giftcards[1:3], 
                 col_types = 
                   cols(
                     .default = col_character()
                   ))
  
  # DF13 # CRASHES R
  
  
  DF13 = read_csv(files_giftcards[1:3])
#> Rows: 2 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): id
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
  
  
# Using map_df makes this work ----------------------------------------------
  
  DF13_2 = map_df(files_giftcards[1:3], read_csv, 
                  col_types = 
                    cols(
                      .default = col_character()
                    ))
  
  DF13_2 # WORKS!
#> # A tibble: 2 × 1
#>   id          
#>   <chr>       
#> 1 rwsf7qgy2hsv
#> 2 rwsf7qgy2hsv

Created on 2021-09-07 by the reprex package (v2.0.1)
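For readers without the attached zip, the setup can be approximated with temporary files. This is only a sketch: the directory, the file names and the file contents below are assumptions modelled on the output shown above, not the actual attachments. It reads each file separately, which is the approach reported to work, and combines the results with `dplyr::bind_rows()`.

```r
library(readr)

# Hypothetical stand-ins for the attached files: two files with one data row
# and one file with only a header line. Names and contents are assumptions.
csv_dir <- file.path(tempdir(), "CSV4")
dir.create(csv_dir, showWarnings = FALSE)
writeLines(c("id", "rwsf7qgy2hsv"), file.path(csv_dir, "file1.csv"))
writeLines("id", file.path(csv_dir, "file2.csv"))
writeLines(c("id", "rwsf7qgy2hsv"), file.path(csv_dir, "file3.csv"))

files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file on its own instead of passing the whole vector to read_csv().
# progress = FALSE is a precaution against the progress-bar hang that is
# reported later in this thread.
per_file <- lapply(
  files,
  read_csv,
  col_types = cols(.default = col_character()),
  progress = FALSE
)

# Stack the per-file tibbles; the header-only file contributes zero rows.
combined <- dplyr::bind_rows(per_file)
combined
```

Passing `files` directly to `read_csv()` is the call that crashed on the versions listed in the session info, so it is deliberately left out of the sketch.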

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.1.1 (2021-08-10)
#>  os       Ubuntu 20.04.3 LTS          
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       Atlantic/Canary             
#>  date     2021-09-07                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  archive       1.1.0   2021-08-05 [1] CRAN (R 4.1.0)
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  bit           4.0.4   2020-08-04 [1] CRAN (R 4.1.0)
#>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.1.0)
#>  cli           3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
#>  crayon        1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
#>  dplyr       * 1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.1.0)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  here        * 1.0.1   2020-12-13 [1] CRAN (R 4.1.0)
#>  highr         0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  hms           1.1.0   2021-05-17 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2   2021-08-25 [1] CRAN (R 4.1.1)
#>  knitr         1.33    2021-04-24 [1] CRAN (R 4.1.0)
#>  lifecycle     1.0.0   2021-02-15 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  pillar        1.6.2   2021-07-29 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr       * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.1.1)
#>  readr       * 2.0.1   2021-08-10 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.10    2021-08-06 [1] CRAN (R 4.1.1)
#>  rprojroot     2.0.2   2020-11-15 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi       1.7.4   2021-08-25 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.5.1   2021-07-13 [1] CRAN (R 4.1.0)
#>  tibble        3.1.4   2021-08-25 [1] CRAN (R 4.1.1)
#>  tidyselect    1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
#>  tzdb          0.1.2   2021-07-20 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  vroom         1.5.4   2021-08-05 [1] CRAN (R 4.1.1)
#>  withr         2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun          0.25    2021-08-06 [1] CRAN (R 4.1.1)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#> 
#> [1] /home/emrys/R/x86_64-pc-linux-gnu-library/4.1
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library


thinkelman-ESA · 2021-09-18T18:28:16Z

I just encountered this same issue. You can download the file that caused the problem for me from here. The problem code was simply readr::read_csv("Example.csv"). I encountered the problem on Windows 10, R 4.1.0, readr 2.0.1.

ShinyFabio · 2021-09-22T14:08:24Z

Hi gorkang, I had a similar problem with empty files where R freezes; see #1305. I solved it by setting the n_max parameter to a number (even a very big number like 10e6).

gorkang · 2021-09-23T08:52:26Z

Thanks ShinyFabio for the suggestion. Sadly, using the n_max parameter with 10e6 does not solve the problem.

Using the files attached in the first message, the R session crashes when doing:

  library(readr)
  suppressPackageStartupMessages(library(dplyr))
  suppressPackageStartupMessages(library(here))
  
  
  files_giftcards = list.files(here::here("dev/BUG/CSV4/"), full.names = TRUE)

  DF13 = read_csv(files_giftcards[1:3], n_max = 10e6,
                 col_types = 
                   cols(
                     .default = col_character()
                   ))
  
  DF13 # CRASHES R

Update: Tried with the dev version 2.0.1.9000 and the problem is still there.

Adafede · 2021-09-27T08:04:54Z

Having a similar issue here, with:

Error: C stack usage 7976356 is too close to the limit

yogat3ch · 2021-10-05T19:25:22Z

A single-line CSV causes readr 2.0.2 (from CRAN) to soft hang the R session too. I replicated the file using write and read it with read_csv, and that worked fine, but reading the original file as is just hangs the session. Not sure if it has something to do with line endings or what?

dominikbach · 2021-10-15T10:05:42Z

I encounter the same issue: R freezes when reading a file with only one column name but no entries with read_csv("emtpytestfile.csv").

This happens on Windows 10 with tidyverse 1.3.1 and R 4.0.5. With the same file and code, it does not happen on macOS 11.5.2 with tidyverse 1.3.1 and R 4.1.1.

It was solved by setting n_max to a finite number.
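A minimal sketch of that workaround, assuming a header-only file with a Windows (CRLF) line ending. The file is created here for illustration and is not the file from the report; `n_max` is the existing `read_csv()` argument, set to a large finite value instead of its default `Inf`.

```r
library(readr)

# Hypothetical header-only file with a CRLF line ending, written as raw bytes
# so that the line ending is the same on every platform.
path <- tempfile(fileext = ".csv")
writeBin(charToRaw("id\r\n"), path)

# Workaround reported above: give n_max a finite value.
empty <- read_csv(
  path,
  n_max = 1e6,
  col_types = cols(.default = col_character())
)
empty
```

Note that an earlier comment in this thread reports that `n_max` did not help for the multi-file crash, so this applies to the single-file hang only.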

yogat3ch · 2021-10-19T14:23:15Z

Interesting, thanks for finding a workaround @dominikbach !

eihwood · 2021-10-25T19:00:41Z

I am encountering the same issue:
OS 11.6 Big Sur
R Version 4.1.1
readr Version: 2.0.2

Reading in 18 csv files, one of which has column headers but no rows of data. R freezes and has to be restarted. This also happens with vroom version 1.5.5.

jimhester · 2021-11-05T18:12:56Z

There were two separate issues here.

The first was an issue with files that contain only one line and use Windows line endings, which interacted with the vroom progress bar and caused a hang in the R process.

The second issue was a crash due to invalid indexing when reading multiple files and one of the input files was empty or had only a header line.

Both issues should now be fixed in the next released version of vroom.
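Code that has to run on both older and newer installations could guard the multi-file call on the installed vroom version. The following is a sketch under assumptions: the cutoff 1.5.6 is taken from the vroom changelog entry that references tidyverse/readr#1297, and `read_csv_files()` is a hypothetical helper name, not part of readr or vroom.

```r
# Hypothetical helper: use the multi-file read only when the installed vroom
# is assumed to contain the fix, otherwise read file by file.
read_csv_files <- function(files, ...) {
  if (utils::packageVersion("vroom") >= "1.5.6") {
    readr::read_csv(files, ...)
  } else {
    # Per-file fallback, equivalent in spirit to the map_df() approach.
    dplyr::bind_rows(lapply(files, readr::read_csv, ...))
  }
}

# Usage with temporary files, one of which has only a header line.
paths <- file.path(tempdir(), c("a.csv", "b.csv", "c.csv"))
writeLines(c("id", "x1"), paths[1])
writeLines("id", paths[2])
writeLines(c("id", "x3"), paths[3])

result <- read_csv_files(
  paths,
  col_types = readr::cols(.default = readr::col_character()),
  progress = FALSE
)
result
```

Explicit character column types are passed so that the header-only file cannot be guessed as a different type than the others in the fallback branch.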

yogat3ch · 2021-11-10T13:09:36Z

Thanks for solving this @jimhester!

# vroom 1.5.7

* Jenny Bryan is now the official maintainer.
* Fix uninitialized bool detected by CRAN's UBSAN check (tidyverse/vroom#386)
* Fix buffer overflow when trying to parse an integer field that is over 64 characters long (tidyverse/readr#1326)
* Fix subset indexing when indexes span a file boundary multiple times (#383)

# vroom 1.5.6

* `vroom(col_select=)` now works if `col_names = FALSE` as intended (#381)
* `vroom(n_max=)` now correctly handles cases when reading from a connection and the file does _not_ end with a newline (tidyverse/readr#1321)
* `vroom()` no longer issues a spurious warning when the parsing needs to be restarted due to the presence of embedded newlines (tidyverse/readr#1313)
* Fix performance issue when materializing subsetted vectors (#378)
* `vroom_format()` now uses the same internal multi-threaded code as `vroom_write()`, improving its performance in most cases (#377)
* `vroom_fwf()` no longer omits the last line if it does _not_ end with a newline (tidyverse/readr#1293)
* Empty files or files with only a header line and no data no longer cause a crash if read with multiple files (tidyverse/readr#1297)
* Files with a header but no contents, or an empty file if `col_names = FALSE`, no longer cause a hang when `progress = TRUE` (tidyverse/readr#1297)
* Commented lines with comments at the end of lines no longer hang R (tidyverse/readr#1309)
* Comment lines containing unpaired quotes are no longer treated as unterminated quotations (tidyverse/readr#1307)
* Values with only an `Inf` or `NaN` prefix but additional data afterwards, like `Inform`, are no longer inappropriately guessed as doubles (tidyverse/readr#1319)
* Time types now support `%h` format to denote hour durations greater than 24, like readr (tidyverse/readr#1312)

# vroom 1.5.5

* `vroom()` now supports files with only carriage return newlines (`\r`). (#360, tidyverse/readr#1236)
* `vroom()` now parses single digit datetimes more consistently as readr has done (tidyverse/readr#1276)
* `vroom()` now parses `Inf` values as doubles (tidyverse/readr#1283)
* `vroom()` now parses `NaN` values as doubles (tidyverse/readr#1277)
* `VROOM_CONNECTION_SIZE` is now parsed as a double, which supports scientific notation (#364)
* `vroom()` now works around specifying a `\n` as the delimiter (#365, tidyverse/dplyr#5977)
* `vroom()` no longer crashes if given a `col_name` and `col_type` both less than the number of columns (tidyverse/readr#1271)
* `vroom()` no longer hangs if given an empty value for `locale(grouping_mark=)` (tidyverse/readr#1241)
* Fix performance regression when guessing with large numbers of rows (tidyverse/readr#1267)

jimhester added the bug (an unexpected problem or unintended behavior) label Sep 20, 2021

jimhester mentioned this issue Sep 22, 2021

Bug after package update in read_delim with empty file #1305

Closed

jimhester mentioned this issue Oct 20, 2021

read_csv causes R to hang #1310

Closed

jimhester closed this as completed in tidyverse/vroom@9688370 Nov 5, 2021

dfv-ms added a commit to dfv-ms/piwikproR that referenced this issue Nov 12, 2021

Require readr 2.1.0, fixes #7, see tidyverse/readr#1297

cca806a

AB-Kent mentioned this issue Dec 9, 2021

Merge seg files bug akoyabio/phenoptrReports#55

Closed
