This repository has been archived by the owner on Sep 9, 2022. It is now read-only.

ft_get on Windows #165

Closed
maelle opened this issue Aug 9, 2018 · 10 comments

@maelle
Contributor

maelle commented Aug 9, 2018

Maybe it is my fault and I wrote a wrong workflow?

library("magrittr")

results <- fulltext::ft_search("black-headed gull") %>%
  fulltext::ft_links() %>%
  fulltext::ft_get() %>%
  fulltext::ft_collect() %>%
  fulltext::ft_chunks("title") %>%
  fulltext::ft_tabularize()
#> Error: 'C:\Users\Maelle\AppData\Local\Cache/R/fulltext/10_1371_journal_pone_0038256.xml' does not exist.

Created on 2018-08-09 by the reprex package (v0.2.0).

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.5.0 (2018-04-23)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  tz       Europe/Paris                
#>  date     2018-08-09
#> Packages -----------------------------------------------------------------
#>  package    * version    date       source                            
#>  aRxiv        0.5.16     2017-04-28 CRAN (R 3.5.1)                    
#>  assertthat   0.2.0      2017-04-11 CRAN (R 3.5.0)                    
#>  backports    1.1.2      2017-12-13 CRAN (R 3.5.0)                    
#>  base       * 3.5.0      2018-04-23 local                             
#>  bibtex       0.4.2      2017-06-30 CRAN (R 3.5.0)                    
#>  bindr        0.1.1      2018-03-13 CRAN (R 3.5.0)                    
#>  bindrcpp     0.2.2      2018-03-29 CRAN (R 3.5.0)                    
#>  colorspace   1.3-2      2016-12-14 CRAN (R 3.5.1)                    
#>  compiler     3.5.0      2018-04-23 local                             
#>  crayon       1.3.4      2017-09-16 CRAN (R 3.5.0)                    
#>  crul         0.6.0      2018-07-10 CRAN (R 3.5.0)                    
#>  curl         3.2        2018-03-28 CRAN (R 3.5.0)                    
#>  datasets   * 3.5.0      2018-04-23 local                             
#>  devtools     1.13.5     2018-02-18 CRAN (R 3.5.0)                    
#>  digest       0.6.15     2018-01-28 CRAN (R 3.5.0)                    
#>  dplyr        0.7.6      2018-06-29 CRAN (R 3.5.1)                    
#>  evaluate     0.10.1     2017-06-24 CRAN (R 3.5.0)                    
#>  fulltext     1.0.1      2018-02-07 CRAN (R 3.5.1)                    
#>  ggplot2      3.0.0      2018-07-03 CRAN (R 3.5.1)                    
#>  glue         1.3.0      2018-07-17 CRAN (R 3.5.0)                    
#>  graphics   * 3.5.0      2018-04-23 local                             
#>  grDevices  * 3.5.0      2018-04-23 local                             
#>  grid         3.5.0      2018-04-23 local                             
#>  gtable       0.2.0      2016-02-26 CRAN (R 3.5.0)                    
#>  hoardr       0.2.0      2017-05-10 CRAN (R 3.5.0)                    
#>  htmltools    0.3.6.9001 2018-06-16 Github (rstudio/htmltools@3aee819)
#>  httpcode     0.2.0      2016-11-14 CRAN (R 3.5.0)                    
#>  httpuv       1.4.4.1    2018-06-18 CRAN (R 3.5.0)                    
#>  httr         1.3.1      2017-08-20 CRAN (R 3.5.0)                    
#>  jsonlite     1.5        2017-06-01 CRAN (R 3.5.0)                    
#>  knitr        1.20       2018-02-20 CRAN (R 3.5.0)                    
#>  later        0.7.3      2018-06-08 CRAN (R 3.5.0)                    
#>  lazyeval     0.2.1      2017-10-29 CRAN (R 3.5.0)                    
#>  lubridate    1.7.4      2018-04-11 CRAN (R 3.5.0)                    
#>  magrittr   * 1.5        2014-11-22 CRAN (R 3.5.0)                    
#>  memoise      1.1.0      2017-04-21 CRAN (R 3.5.0)                    
#>  methods    * 3.5.0      2018-04-23 local                             
#>  microdemic   0.3.0      2018-03-29 CRAN (R 3.5.1)                    
#>  mime         0.5        2016-07-07 CRAN (R 3.5.0)                    
#>  miniUI       0.1.1.1    2018-05-18 CRAN (R 3.5.0)                    
#>  munsell      0.5.0      2018-06-12 CRAN (R 3.5.0)                    
#>  pillar       1.3.0      2018-07-14 CRAN (R 3.5.1)                    
#>  pkgconfig    2.0.1      2017-03-21 CRAN (R 3.5.0)                    
#>  plyr         1.8.4      2016-06-08 CRAN (R 3.5.0)                    
#>  promises     1.0.1      2018-04-13 CRAN (R 3.5.0)                    
#>  purrr        0.2.5      2018-05-29 CRAN (R 3.5.0)                    
#>  R6           2.2.2      2017-06-17 CRAN (R 3.5.0)                    
#>  rappdirs     0.3.1      2016-03-28 CRAN (R 3.5.0)                    
#>  Rcpp         0.12.18    2018-07-23 CRAN (R 3.5.0)                    
#>  rcrossref    0.8.0      2017-12-03 CRAN (R 3.4.3)                    
#>  rentrez      1.2.1      2018-03-05 CRAN (R 3.5.1)                    
#>  reshape2     1.4.3      2017-12-11 CRAN (R 3.5.0)                    
#>  rlang        0.2.1      2018-05-30 CRAN (R 3.5.0)                    
#>  rmarkdown    1.10       2018-06-11 CRAN (R 3.5.0)                    
#>  rplos        0.8.2      2018-07-19 CRAN (R 3.5.1)                    
#>  rprojroot    1.3-2      2018-01-03 CRAN (R 3.4.3)                    
#>  scales       0.5.0      2017-08-24 CRAN (R 3.5.0)                    
#>  shiny        1.1.0      2018-05-17 CRAN (R 3.5.0)                    
#>  solrium      1.0.0      2017-11-02 CRAN (R 3.5.1)                    
#>  stats      * 3.5.0      2018-04-23 local                             
#>  storr        1.2.0      2018-05-31 CRAN (R 3.5.0)                    
#>  stringi      1.2.4      2018-07-23 local                             
#>  stringr      1.3.1      2018-05-10 CRAN (R 3.5.0)                    
#>  tibble       1.4.2      2018-01-22 CRAN (R 3.5.0)                    
#>  tidyselect   0.2.4      2018-02-26 CRAN (R 3.5.0)                    
#>  tools        3.5.0      2018-04-23 local                             
#>  triebeard    0.3.0      2016-08-04 CRAN (R 3.5.0)                    
#>  urltools     1.7.0      2018-01-20 CRAN (R 3.5.0)                    
#>  utils      * 3.5.0      2018-04-23 local                             
#>  whisker      0.3-2      2013-04-28 CRAN (R 3.4.0)                    
#>  withr        2.1.2      2018-03-15 CRAN (R 3.4.4)                    
#>  XML          3.98-1.11  2018-04-16 CRAN (R 3.5.0)                    
#>  xml2         1.2.0      2018-01-24 CRAN (R 3.5.0)                    
#>  xtable       1.8-2      2016-02-05 CRAN (R 3.4.0)                    
#>  yaml         2.1.19     2018-05-01 CRAN (R 3.5.0)

The failure happens at the ft_get step and indeed that folder is empty.

fs::dir_exists("C:\\Users\\Maelle\\AppData\\Local\\Cache/R/fulltext")
#> C:/Users/Maelle/AppData/Local/Cache/R/fulltext 
#>                                           TRUE
fs::dir_ls("C:\\Users\\Maelle\\AppData\\Local\\Cache/R/fulltext")
#> character(0)
fs::dir_ls("C:\\Users\\Maelle\\AppData\\Local\\Cache/R/fulltext_storr")
#> C:/Users/Maelle/AppData/Local/Cache/R/fulltext_storr/config
#> C:/Users/Maelle/AppData/Local/Cache/R/fulltext_storr/data
#> C:/Users/Maelle/AppData/Local/Cache/R/fulltext_storr/keys
#> C:/Users/Maelle/AppData/Local/Cache/R/fulltext_storr/scratch

Created on 2018-08-09 by the reprex package (v0.2.0).

@sckott
Contributor

sckott commented Aug 9, 2018

thanks!

can you try the same thing but removing the ft_links call? hopefully will work then. will investigate what's going on with ft_links
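
For reference, a minimal sketch of the suggested workaround: the pipeline from the report with the ft_links step dropped. It reuses only the calls shown above; the wrapper name `get_titles` is hypothetical, and running it assumes the fulltext package (v1.0.1 in this report) is installed and network access is available.

```r
# Hypothetical wrapper (not part of fulltext): the pipeline from the
# report above, with the ft_links() step removed.
get_titles <- function(query) {
  found     <- fulltext::ft_search(query)               # search for articles
  fetched   <- fulltext::ft_get(found)                  # download full text to the local cache
  collected <- fulltext::ft_collect(fetched)            # read the cached files back in
  chunks    <- fulltext::ft_chunks(collected, "title")  # pull out the title section
  fulltext::ft_tabularize(chunks)                       # convert the chunks to tabular form
}

# Usage (needs network access):
# results <- get_titles("black-headed gull")
```

Intermediate variables are used instead of `%>%` only so each step can be inspected on its own; the piped form in the report is equivalent.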

@maelle
Contributor Author

maelle commented Aug 10, 2018

Yes it works! Now for my own enlightenment (and the enlightenment of readers of my future post), why would I actually use ft_links in this pipeline?

@maelle
Contributor Author

maelle commented Aug 10, 2018

I also wonder why I have both ft_collect and ft_get. I copied the succession of calls from https://github.com/ropensci/fulltext#extract-chunks but I'm a bit puzzled by the verbs.

"ft_collect grabs full text data from a remote storage device. " so where does ft_get put the data? Sorry if this is a dumb question.

@sckott
Contributor

sckott commented Aug 10, 2018

why would I actually use ft_links in this pipeline?

if you for some reason just wanted links, e.g., to write them out to a file to store them, or to use them somewhere else. it should work to go from ft_links to ft_get, but i need to work out some bug i guess

ft_get just stores the files on disk, so only returns to the user metadata and where the files are on disk, so that the user isn't dealing with huge objects until they need to. so ft_collect grabs the parsed text from xml or pdf or plain text
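
A rough base-R sketch of the split described above, for illustration only. `store_article()` and `collect_article()` are hypothetical stand-ins, not fulltext functions, and they say nothing about how ft_get or ft_collect are implemented; they only mimic the idea of storing on disk first and reading the text later.

```r
# Hypothetical helpers, not part of fulltext: they only mimic the
# "store on disk now, read the text later" split described above.
store_article <- function(id, text, cache_dir = tempfile("cache_")) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, paste0(id, ".txt"))
  writeLines(text, path)
  # Return only lightweight metadata: the id and where the file lives.
  list(id = id, path = path)
}

collect_article <- function(stored) {
  # Read the text back from disk only when it is actually needed.
  stored$text <- paste(readLines(stored$path), collapse = "\n")
  stored
}

stored    <- store_article("article_1", "Full text of the article.")
collected <- collect_article(stored)
```

Here `stored` carries only an id and a file path, while `collected` additionally holds the text, which is the reason given above for keeping the two steps separate.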

@maelle
Contributor Author

maelle commented Aug 10, 2018

Ah ok, thank you! I think the "remote" in "ft_collect grabs full text data from a remote storage device. " is puzzling... but maybe only for me!

@sckott
Contributor

sckott commented Aug 10, 2018

ah true. previously i had dreamed of fulltext supporting lots of different storage options, but it was too complicated, so just file paths now. so good idea to change that language.

sckott added this to the v1.1.0 milestone Aug 10, 2018
sckott closed this as completed in 31f27e9 Aug 10, 2018
@sckott
Contributor

sckott commented Aug 10, 2018

hopefully the docs make more sense now 31f27e9

@maelle
Contributor Author

maelle commented Aug 10, 2018

Yes 😀👌

@maelle
Contributor Author

maelle commented Aug 10, 2018

And reg the "too complicated", fulltext already does many complicated things, it looks like magic from the user's perspective 😀

@sckott
Contributor

sckott commented Aug 10, 2018

right about already complicated enough, and that's why i narrowed down article storage to just local files and pushed for suppdata as a sep pkg 😸
