Performance when reading a large number of smaller files #416

Closed
ghaarsma opened this issue Jun 7, 2016 · 1 comment
Labels: feature (a feature request or enhancement)

Comments

ghaarsma (Contributor) commented Jun 7, 2016

This is not a bug, but we noticed a significant drop in performance between readr 0.1.0 and 0.2.2 when reading a large number of small files. It turns out that default_locale() is quite slow compared to the other, fast readr functions, and by default it is evaluated on every read.

Perhaps the example below could be captured somewhere in the documentation.

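library(readr)
# 1,000 lines of "number,letter" joined into one string; read_lines() treats
# a string containing a newline as literal data rather than as a file path
x <- paste0(paste0(1:1000, ",", rep(letters, length.out = 1000)), collapse = "\n")
# Version 0.1.0
t1 <- system.time(l <- lapply(rep(x, 1000), FUN = read_lines))
#  user  system elapsed
#  0.19    0.03    0.22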

That is fast! Now the same with readr 0.2.2:

# Version 0.2.2
t2 <- system.time(l <- lapply(rep(x, 1000), FUN = read_lines))
#  user  system elapsed
#  8.67   19.06   27.86

That is over 100 times slower. The workaround is to call default_locale() once and pass the result to every read_lines() call through the locale argument:

t3 <- system.time({
  loc <- default_locale()  # build the locale once and reuse it for every call
  l <- lapply(rep(x, 1000), FUN = read_lines, locale = loc)
})
#  user  system elapsed
#  0.17    0.01    0.19

Back to the old readr 0.1.0 performance (perhaps even a hair faster). Nice!

hadley added the "feature" (a feature request or enhancement) and "ready" labels Jun 9, 2016

hadley (Member) commented Jun 9, 2016

@jimhester this should just be a matter of memoising default_locale()
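A minimal sketch of what memoising could look like, written in base R so it runs without readr installed. The helper `memoise_once()` and the stand-in `slow_locale()` are hypothetical names for illustration only; this is not readr's actual fix. The wrapped function does its work on the first call, and every later call returns the cached result, so a default argument such as `locale = default_locale()` would pay the cost only once per session if `default_locale()` were wrapped this way.

```r
# Hypothetical helper (not part of readr): wrap a zero-argument function so
# its body runs only on the first call; later calls return the cached value.
memoise_once <- function(f) {
  cache <- NULL
  computed <- FALSE
  function() {
    if (!computed) {
      cache <<- f()       # store the result in the closure's environment
      computed <<- TRUE
    }
    cache
  }
}

# Hypothetical stand-in for an expensive default_locale(); counts its calls.
calls <- 0
slow_locale <- function() {
  calls <<- calls + 1
  list(tz = "UTC")
}

get_locale <- memoise_once(slow_locale)
for (i in 1:1000) loc <- get_locale()
print(calls)  # [1] 1
```

One trade-off of this design: a cached value will not reflect later changes to whatever the wrapped function depends on (for example, options or system settings), so a real fix would need to decide whether and when the cache can be reset.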

jimhester added a commit to jimhester/readr that referenced this issue Jun 15, 2016
jimhester self-assigned this Jun 15, 2016
lock bot locked and limited conversation to collaborators Sep 25, 2018
3 participants