Better approaches to importing of the Zotero database #55

Robinlovelace · 2019-10-16T04:28:40Z

It's frustrating when citr freezes your session, so I thought I'd have a play with the future package. Results seem promising so far, so I thought I'd report back, having alluded several months ago to the potential utility of running the initial bib read in the background. The basic concept is demonstrated in the reprex below. Thoughts welcome!

# exclude things to make reprex faster
exclude = c("My Library", "energy-and-transport")

# no future
tictoc::tic()
b = citr::load_betterbiblatex_bib(encoding = "UTF-8", exclude_betterbiblatex_library = exclude)
#> Importing 'LIDA-leeds'...
#> Importing 'tds'...
plot(1:9)
tictoc::toc()
#> 0.58 sec elapsed
tictoc::tic()
# do some other work
class(b)
#> [1] "BibEntry" "bibentry"
tictoc::toc()
#> 0.002 sec elapsed


# with future
tictoc::tic()
future::plan("multiprocess")
b = future::future(citr::load_betterbiblatex_bib(encoding = "UTF-8", exclude_betterbiblatex_library = exclude))
plot(1:9)

tictoc::toc()
#> 0.085 sec elapsed
tictoc::tic()
# do some other work
b = future::value(b)
#> Importing 'LIDA-leeds'...
#> Importing 'tds'...
class(b)
#> [1] "BibEntry" "bibentry"
tictoc::toc()
#> 0.322 sec elapsed

^{Created on 2019-10-16 by the reprex package (v0.3.0)}
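
A small variation on the idea above, sketched with a stand-in for the slow import: `future::resolved()` reports whether the background read has finished without blocking, so the session only waits in `future::value()` when the result is actually needed. `slow_bib_read()` is a hypothetical placeholder, not a citr function, and `multisession` is used here because newer releases of future deprecate the `"multiprocess"` plan.

```r
library(future)

# Run futures in background R sessions
plan(multisession, workers = 2)

# Hypothetical stand-in for a slow bibliography import (not a citr function)
slow_bib_read <- function() {
  Sys.sleep(1)
  c("key_a", "key_b")
}

bib_future <- future(slow_bib_read())

# Returns TRUE or FALSE immediately, without waiting for the worker
is_ready <- resolved(bib_future)

# ... the session stays responsive for other work here ...

# Blocks only if the background read has not finished yet
bib <- value(bib_future)
length(bib)

# Return to sequential processing and shut down the workers
plan(sequential)
```

An addin could poll `resolved()` when it is opened and show a "still loading" message instead of freezing.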


Robinlovelace · 2019-10-16T04:40:03Z

As a follow-on point, I've just tested out parsing files with the bib2df package and it seems fast.

Timings below on a .bib file with 2000+ entries, FYI.

system.time({b = bib2df::bib2df("allrefs.bib")})
Some BibTeX entries may have been dropped.
            The result could be malformed.
            Review the .bib file and make sure every single entry starts
            with a '@'.
Column `YEAR` contains character strings.
              No coercion to numeric applied.
   user  system elapsed 
  2.098   0.003   2.112 
Warning message:
In bib2df_tidy(bib, separate_names) : NAs introduced by coercion
> nrow(b)
[1] 2755
> system.time({b2 = citr:::read_bib_catch_error("allrefs.bib")})
<simpleError in RefManageR::ReadBib(x, check = FALSE, .Encoding = encoding): argument "encoding" is missing, with no default>
   user  system elapsed 
  0.108   0.000   0.108 
> system.time({b2 = citr:::read_bib_catch_error("~/uaf/allrefs.bib", "UTF-8")})
   user  system elapsed 
  7.179   0.093   7.272

Robinlovelace · 2019-10-16T04:48:02Z

Update: FYI, I think the output from bib2df is not production-ready yet. Just food for thought...

crsh · 2019-10-16T07:38:40Z

Hi Robin, thanks for sharing your results. This is actually one of the top two issues I want to tackle next. This looks promising.

Here are some of my thoughts on this. I think there are two major options here to speed up reading from Zotero:

1. Improve the current approach: possibly speed up reading of the bibliography file exposed by BBT by trying bib2df, and use future or promises to load the database in the background.

   Have you, by chance, looked at promises? They seem to be an alternative to future, but I haven't fully understood the strengths of each approach to decide which way to go on this. bib2df also looks like a promising alternative to RefManageR and bibtex!

2. Search the Zotero database directly by using the BBT CAYW search (see below) and require users to use the pandoc-zotxt Lua filter with their R Markdown document format (e.g., using rmdfiltr). However, if I understand correctly, this would require installation of zotxt, another Zotero plugin.

I haven't tried zotxt and pandoc-zotxt, but if the bibliography export is fast(er than BBT), this could be the easiest and fastest way to address slow loading of the Zotero database. Hence, I'm leaning towards the second option. This would require some testing and some user interface considerations (would this be a separate addin or could it be integrated with the existing one?).
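
For the first option, here is a rough sketch of how promises can sit on top of future rather than replace it: the future does the work in the background, and `promises::then()` registers a callback that runs once the result is available, so nothing has to block on `future::value()`. `slow_bib_read()` is a hypothetical stand-in for the BBT import, and the manual `later::run_now()` loop is only an assumption about what a plain script needs; in RStudio or Shiny the event loop runs on its own.

```r
library(future)
library(promises)

plan(multisession, workers = 2)

# Hypothetical stand-in for reading the BBT export (not a citr function)
slow_bib_read <- function() {
  Sys.sleep(1)
  c("key_a", "key_b")
}

loaded <- NULL

# then() registers a callback instead of blocking on value()
p <- then(
  future(slow_bib_read()),
  onFulfilled = function(bib) {
    loaded <<- bib
    message("Loaded ", length(bib), " entries")
  }
)

# Drive the event loop by hand until the callback has fired,
# giving up after 30 seconds so the loop cannot run forever
deadline <- Sys.time() + 30
while (is.null(loaded) && Sys.time() < deadline) {
  later::run_now(timeoutSecs = 0.1)
}

plan(sequential)
```

The trade-off, as far as this sketch goes: future alone needs the caller to ask for the value, whereas promises pushes the value to a callback, which may fit an addin's event-driven interface better.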

Just to link to the previous issue on background loading of the Zotero database: #36

Robinlovelace · 2019-10-16T08:02:52Z

I've not tried promises. In my experience, bib2df is buggy. All approaches sound good; I'm excited for this new behaviour and happy to test anything you come up with. Many thanks.

crsh · 2019-10-16T13:20:15Z

After playing around with pandoc-zotxt a little I've come to understand that it requires the global pandoc variable PANDOC_STATE, which was introduced in pandoc 2.4. Currently, RStudio is shipping version 2.3.1, so I'll wait until they ship a newer version before starting to implement and test this.
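
A minimal sketch of how the pandoc requirement could be checked at run time, assuming the rmarkdown package is installed. `rmarkdown::pandoc_available()` accepts a minimum version; the variable names and the fallback message are illustrative only and not part of citr.

```r
# Minimum pandoc version that provides PANDOC_STATE, per the comment above
needs_pandoc <- "2.4"

# TRUE only if pandoc is found and is at least the requested version
can_use_zotxt_filter <- rmarkdown::pandoc_available(version = needs_pandoc)

if (can_use_zotxt_filter) {
  message(
    "pandoc ", as.character(rmarkdown::pandoc_version()),
    " is recent enough for the Lua filter"
  )
} else {
  message("pandoc >= ", needs_pandoc, " not found; keep using the BBT export")
}
```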

crsh added the enhancement label Oct 16, 2019

crsh mentioned this issue Oct 16, 2019

Load database silently #36

Closed

crsh changed the title ~~Support for reading zotero file in background~~ Better approaches to importing of the Zotero database Oct 16, 2019

crsh mentioned this issue Nov 21, 2019

Updating local bibliography file #16

Open

crsh mentioned this issue Jan 15, 2020

Limit re-fetches of the Zotero library #58

Open

This was referenced Jan 23, 2020

Use (Better) CSL JSON instead of BibTeX or BibLaTeX #59

Open

Parse from biblatex to CSL references jgm/pandoc-citeproc#435

Closed

crsh mentioned this issue Jul 15, 2020

Encourage use of CSL-JSON crsh/papaja#387

Open

crsh mentioned this issue Oct 15, 2020

citr crashes connecting to Zotero library #74

Open
