Exporting a page-range 'x-y' (single hyphen) from Zotero results in 'x--y' in bib-file. #1602

Closed
florisvdh opened this issue Aug 7, 2020 · 37 comments

@florisvdh

I'm using Zotero 5.0.89 and Better BibTeX 5.2.47.

Report ID: XSEF3MV8-euc

Exporter used: Better BibTeX

Exported item:

[Screenshot]

Copy from the Pages field: 3-10

Expected output:

@article{id4,
  title = {Title},
  author = {Family Name, First Name},
  year = {2020},
  volume = {1},
  pages = {3-10},
  journal = {Journal of Abcde Research},
  number = {2}
}

Actual output:

@article{id4,
  title = {Title},
  author = {Family Name, First Name},
  year = {2020},
  volume = {1},
  pages = {3--10},
  journal = {Journal of Abcde Research},
  number = {2}
}

Note the double dash. This seems to be the problem that was mentioned here.

When this result is imported into Zotero, a long dash is shown in Zotero; see the screenshot below. Copy from the Pages field: 3–10. So the double dash seems to cause unwanted changes.

[Screenshot]

@florisvdh florisvdh added the bug label Aug 7, 2020
@retorquere
Owner

I'm conflicted on what to do with this one. One soft-goal of BBT is to generate bib(la)tex that renders to the same (or as close as possible) bibliography you would get when Zotero renders the bibliography itself; BBT aims to translate the "intent" in terms of how Zotero would understand the item (I hope that sentence is parsable).

When you create a bibliography with Zotero itself for the item in XSEF3MV8-euc (with either 3<n-dash>10 or 3<hyphen>10), it outputs 3<n-dash>10. The way to get 3<n-dash>10 with bib(la)tex is with 3--10. That's the reason for the behavior you see. I can't find any authoritative source on the matter, but all the information I could find on page ranges (e.g. https://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/) points to page ranges needing n-dashes rather than hyphens, and apparently many, but not all, BibTeX styles will even correct a hyphen to an n-dash.

But I do realize this leaves you without an avenue to generate hyphen-separated numbers. What problem are you experiencing with the current behavior? Maybe another fix can be made.
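The export behavior described above (a plain numeric page range is written with `--` so that bib(la)tex renders an n-dash) can be sketched roughly as follows. This is an illustrative Python sketch, not BBT's actual code; the function name `bibtex_pages` and the decision to leave non-numeric values untouched are assumptions made for the example.

```python
import re

# A plain numeric range: digits, a separator, digits. The separator may be
# one to three ASCII hyphens, an en dash (U+2013) or an em dash (U+2014).
_RANGE = re.compile(r"^\s*(\d+)\s*(?:-{1,3}|\u2013|\u2014)\s*(\d+)\s*$")


def bibtex_pages(pages: str) -> str:
    """Return `pages` with a simple numeric range written as `x--y`.

    Hypothetical helper: anything that is not a plain number-separator-number
    range (for example "e1234" or "S12-S15") is returned unchanged.
    """
    match = _RANGE.match(pages)
    if match is None:
        return pages
    first, last = match.groups()
    return f"{first}--{last}"
```

With this sketch, both `bibtex_pages("3-10")` and `bibtex_pages("3\u201310")` give `3--10`, which is the output reported in this issue.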

@florisvdh
Author

florisvdh commented Aug 13, 2020

Thanks for the explanation @retorquere , sorry to keep you waiting. Your answer is clear and helpful. And indeed, it seems like a logical choice. I didn't know about these things and had 'just' assumed something had gone wrong when I noted the difference. Thanks also for the provided links! I also found this reference for CSL: https://docs.citationstyles.org/en/1.0.1/specification.html#range-delimiters

I provide some background which may be of some interest and for later reference.

We have been testing our corporate csl + bst on 1) a bib file in connection to the rmarkdown R package (hence pandoc, using either csl with pandoc or bst with latex (natbib)) and 2) Zotero (csl). This test bib file has some history, involving records probably exported by Mendeley (anyway, double hyphens do occur, see e.g. here) and others by JabRef, which apparently doesn't change hyphens to double ones (e.g. here) after entering a record. [The fact that JabRef doesn't change it probably makes sense as JabRef is a direct frontend to the bib file.] More recently, some BBT-like fixes have been made by hand in the bib file, for JabRef-generated records.

Summarising, when this bib file is imported in Zotero (with BBT) and then re-exported:

  • single hyphens are retained as hyphens in Zotero (as in the first screenshot), while these are doubled in the bib file at export time
  • double hyphens are seen as an n-dash in Zotero (second screenshot), and converted to a double hyphen in the bib file at export time
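The two observations above can be reproduced with a small round-trip sketch. It is a simplified model of the described behavior, not BBT's implementation; `import_pages` and `export_pages` are hypothetical names, and real page fields can contain more than a single range.

```python
EN_DASH = "\u2013"


def import_pages(bib_value: str) -> str:
    """Import side: a double hyphen becomes an en dash, a single hyphen is kept."""
    return bib_value.replace("--", EN_DASH)


def export_pages(zotero_value: str) -> str:
    """Export side: hyphen, double hyphen and en dash are all written as `--`."""
    # Normalize every separator to an en dash first, so that a single hyphen
    # is not doubled twice, then write the en dash as a double hyphen.
    normalized = zotero_value.replace("--", EN_DASH).replace("-", EN_DASH)
    return normalized.replace(EN_DASH, "--")


def round_trip(bib_value: str) -> str:
    """Import a bib pages value into the Zotero form, then export it again."""
    return export_pages(import_pages(bib_value))
```

Under these assumptions a bib value of `495-507` stays `495-507` after import but comes back as `495--507`, while `7--89` is shown with an en dash after import and comes back as `7--89`.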

While this seemed a little weird to me, the implications are indeed consistent. After importing the two records referenced above (one with - and one with --, the latter shown as an n-dash in Zotero), the Zotero-generated bibliography (in the Style Editor, using this csl file) looks like:

Amano T. (2012). Unravelling the dynamics of organisms in a changing world using ecological modelling. Ecological Research 27 (3): 495–507. https://doi.org/10.1007/s11284-012-0928-6.
Article A. & Second A. (2020). A title. Journal of ALLCAPS and Title Case 123 (456): 7–89. https://doi.org/10.1007/s11284-012-0928-6.

I.e., an n-dash both times, so indeed consistent. 👍

Also, the bibliography that R Markdown generates from the bib file (with its mix of - and --), using either the csl or the bst, is made consistent by using an n-dash:

Amano T. (2012). Unravelling the dynamics of organisms in a changing world using ecological modelling. Ecological Research 27 (3): 495–507. https://doi.org/10.1007/s11284-012-0928-6.
Article A. & Second A. (2020). A title. Journal of ALLCAPS and Title Case 123 (456): 7–89. https://doi.org/10.1007/s11284-012-0928-6.

Pinging my colleague @ThierryO who will be interested as well.

So the way BBT approaches this at export seems excellent - it would further straighten the bib file after importing to and re-exporting from Zotero. BTW we also want to make more use of the citr package in R, which makes use of Zotero & BBT, in order to use Zotero with rmarkdown.

Summarizing, I don't see a need for an avenue to generate hyphen-separated numbers from Zotero.

@retorquere
Owner

The reason I retain the single dash on import is that I can always make changes correcting the export, but damage done during import I cannot correct. The import is more conservative than the export. Also, a vanishingly small minority of the items that come into zotero come through my import, so a lot of items have hyphen-separated ranges anyhow.

@bwiernik
Contributor

bwiernik commented Aug 14, 2020

@retorquere We've been discussing the hyphenated numbers problem over on the CSL repos, and we will probably recommend some markup syntax to indicate to not parse numbers, so I would suggest keeping your current behavior for now.

@retorquere
Owner

@florisvdh I take it you are satisfied with the current behavior?

@bwiernik if there's news feel free to hail me here or on the forums.

@florisvdh
Author

Sure @retorquere 👍 , and thanks for helping out.

@bwiernik
Contributor

@florisvdh I would suggest though that you use CSL YAML as a data format if you are working with RMarkdown/pandoc, rather than .bib. BibLaTeX and CSL data are not fully compatible and using .bib with pandoc will often generate incorrect output for non-journal, non-book items (e.g., reports, software). Internally, pandoc converts everything to CSL data.

@retorquere
Owner

That is correct.

@ThierryO

@bwiernik AFAIK we still need .bib data for the RMarkdown > pandoc > latex + natbib output. I like having a single source for different output formats.

@retorquere
Owner

I mean technically you still would have a single source, it would just be zotero. The translation zotero - bib - csl that all pandoc paths go through except for latex + natbib has 2 unnecessary (and not necessarily lossless/reversible) steps in it.

@bwiernik
Contributor

@retorquere I'm not sure, but I think that when natbib is used, pandoc passes the bib file unchanged.

@bwiernik
Contributor

@ThierryO That makes sense, but be advised that if you don't use natbib (e.g., if you use pandoc's built-in citation formatting with latex), using a CSL data format will be better.

@retorquere
Owner

> @retorquere I'm not sure, but I think that when natbib is used, pandoc passes the bib file unchanged.

That's what I had meant to say. The pandoc path for latex + natbib is the one instance where I would say you're better off using the biblatex export. In all other cases you'll get better results using CSL exported from Zotero.

@florisvdh
Author

florisvdh commented Aug 17, 2020

@bwiernik thanks for pointing at the CSL YAML alternative, I wasn't aware of that. BTW for pandoc, is there some preference over CSL JSON? I see at citr there are some issues on the topic.

In our environment, different reference managers are used by different scientists (Endnote, Zotero, Mendeley). Converging on / enforcing one system or one common group library is not really an option.

For maximizing cooperation in R Markdown projects on GitHub, we need to settle on a common textfile where every collaborator can add references to. If that could be CSL YAML/JSON (for pandoc with pandoc-citeproc), in se that seems OK - unless there are usecases where we actually need natbib (@ThierryO?). Cf. https://pandoc.org/MANUAL.html#creating-a-pdf

Currently I see added benefit for our situation in using a bib file, because of the citr RStudio addin. Allowing collaborators to add references to a shared bib file from a local Zotero database (using BBT) is interesting in our usecase. Hence not all references in the file need to be present in the Zotero database (meaning, someone is allowed to add records from elsewhere). Of course some errors can't be excluded when the bib file is changed in this way, and that has to be taken care of manually.

But maybe if citr supports CSL YAML or CSL JSON in the future, we need to reconsider. On the other hand, I saw this comment, which still considers the bib format as the currently most widely supported one.

@bwiernik
Contributor

CSL JSON and CSL YAML are compatible. CSL YAML can accept full Markdown formatting and is more human readable/editable. JSON is currently more supported outside of pandoc.
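For reference, the item from the original report could look roughly like this as CSL JSON. The field names (`container-title`, `issued`, `page`, and so on) come from the CSL-JSON data model and the values are taken from the BibTeX entry at the top of this issue. The exact record that Zotero or BBT would emit may differ, and `to_csl_json` is only a hypothetical helper for this sketch.

```python
import json

# The article from the bug report, expressed with CSL-JSON field names.
# Note that the page range is stored as entered; the CSL processor decides
# how to render the range delimiter.
item = {
    "id": "id4",
    "type": "article-journal",
    "title": "Title",
    "author": [{"family": "Family Name", "given": "First Name"}],
    "issued": {"date-parts": [[2020]]},
    "container-title": "Journal of Abcde Research",
    "volume": "1",
    "issue": "2",
    "page": "3-10",
}


def to_csl_json(items):
    """Serialize a list of CSL items as a CSL-JSON document (a JSON array)."""
    return json.dumps(items, indent=2, ensure_ascii=False)


csl_json = to_csl_json([item])
```

The same mapping written as YAML would carry the same keys and values, which is why the two formats can be treated as interchangeable for this purpose.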

@bwiernik
Contributor

I would recommend using the rbbt package over citr. It works much better with Zotero, especially if your library is large (citr can't even load my Zotero library), and it doesn't require the lossy Zotero-bib-CSL conversion.

You could also just start using RStudio's first-party built in Zotero integration that's currently available in the Daily build. That can output to CSL YAML or JSON, as well as to bib (that's currently not the greatest).

@retorquere
Owner

retorquere commented Aug 17, 2020

> @bwiernik thanks for pointing at the CSL YAML alternative, I wasn't aware of that. BTW for pandoc, is there some preference over CSL JSON? I see at citr there are some issues on the topic.

There are minimal differences in date formatting between CSL JSON and CSL YAML that can be converted to each other without loss. It's mostly an aesthetic preference which one you prefer.

> But maybe if citr supports CSL YAML or CSL JSON in the future, we need to reconsider. On the other hand, I saw this comment, which still considers the bib format as the currently most widely supported one.

Whether that's true depends on what you mean by "supported".

CSL is a standardized format that's easy to produce and consume without ambiguity. Bib(la)tex on the other hand has a million subtle rules about how it should be produced/consumed. No two consumers will interpret a given bibtex exactly the same. So if "supported" includes "will be interpreted roughly the same, for relatively simple cases", then yes, they are supported. If it means "whatever stack I put this through, I'm going to end up with the same rendered bibliography", then absolutely not. CSL gets you that. And the Zotero data model is (probably by design) much closer to CSL than to bib(la)tex, so the conversion from Zotero to CSL requires way less changes to the input data to produce CSL, and it's unambiguous from that point on. Many years into BBT development I still find problems to fix in producing/consuming bib(la)tex.

I can vouch for the quality of the interpretation of BBT (which isn't perfect, because it cannot be given how much more bibtex allows in virtue of having all of TeX at its disposal than CSL), and I'm ready to trust pandoc being on par because it'd be pretty foolish to underestimate jgm (although there used to be edge cases where BBT was better, but I haven't kept track), but in my experience, outside the limited group of pandoc, BBT and of course the actual LaTeX stacks, everything else that consumes or produces bib(la)tex ranges from passable to garbage.

@florisvdh
Author

florisvdh commented Aug 17, 2020

Thanks for sharing your experience and view on this @retorquere. I agree that the broad support for bib is only true in a loose sense. My earlier described case indeed demonstrates that different syntaxes will occur inside the same bib file, if different origins have been at work.

@florisvdh
Author

florisvdh commented Aug 17, 2020

> I would recommend using the rbbt package over citr. It works much better with Zotero, especially if your library is large (citr can't even load my Zotero library), and it doesn't require the lossy Zotero-bib-CSL conversion.

@bwiernik Thanks for sharing, I didn't know the rbbt package! Being able to connect to larger databases is indeed interesting. I looked a bit in the repo, and compared with citr. As explained, for our usecase it is necessary to manage a central, project-specific text file to which collaborators can contribute - each has his/her own tool/database so we need to deal with e.g. user A with Zotero database 1, user B with Zotero database 2, user C with Mendeley database 1 and user D who does some text editing by hand. [I don't want to defend mixed approaches, but they are reality in the scientific community and we still should maximize collaboration (and at least citr provides a way to deal with it).]

  • Then, most importantly: is it possible, for Zotero-users, to use bbt_write_bib(overwrite = FALSE) to add selected records from their personal Zotero database to an existing JSON (or YAML or bib) file? I.e. without touching existing records. In that case, I agree it would be a very useful package to Zotero users in our organization as it would meet the above collaborative requirements.
  • For non-Zotero users, citr::md_cite() offers a way (also through the addin) to search the text file (bib in this case) and insert a citation from the text file. For them (or for Zotero users without that bibliographic item in their database), that is a very interesting feature. Is such functionality provided in rbbt? I may have overlooked, but I didn't see it.

@retorquere
Owner

For the first point: this is not currently possible, but it'd be easy to build it. You can also build it with bibtex, but amending yaml or json is absolutely trivial in comparison.
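To illustrate why amending JSON is comparatively simple, here is a minimal sketch that adds records to an existing CSL-JSON file without touching the records already in it. It is not part of rbbt or BBT; the function name `add_records` and the rule "skip any item whose `id` already exists" are assumptions for the example.

```python
import json
from pathlib import Path


def add_records(path, new_items):
    """Append CSL-JSON items whose `id` is not yet present in the file.

    Existing records are never modified or reordered. Returns the ids that
    were actually added. A missing file is treated as an empty bibliography.
    """
    file = Path(path)
    if file.exists():
        existing = json.loads(file.read_text(encoding="utf-8"))
    else:
        existing = []

    known = {item["id"] for item in existing}
    added = []
    for item in new_items:
        if item["id"] not in known:
            existing.append(item)
            known.add(item["id"])
            added.append(item["id"])

    file.write_text(
        json.dumps(existing, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return added
```

Doing the same for a bib file would first require parsing BibTeX reliably, which is the harder part.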

@bwiernik
Contributor

@florisvdh Check out the RStudio daily build with the new Markdown editor and citation integration. It does that without needing any additional packages.

@florisvdh
Author

@bwiernik sorry for the delay, I needed to find some time to better look into this. Thanks a lot for the hint!

It appears that a nice documentation website is in the making for many new features in RStudio 1.4 (Visual Markdown). It's here: https://rstudio.github.io/visual-markdown-editing (the info on citations & Zotero is under 'technical writing'). And YES, it has it all! 👍 This will indeed fulfil the explained needs.

Some extracts:

> Citations can be drawn from a variety of sources:
>
>   • Document or project level bibliographies.
>   • DOI (Document Object Identifier) references.
>   • Zotero libraries.

> If you insert a citation from a DOI or Zotero library that isn’t already in your bibliography then it will be automatically added to the bibliography.

> R Markdown supports bibliographies in a wide variety of formats including BibTeX and CSL. Add a bibliography to your document using the bibliography YAML metadata field.

I didn't try it yet, but it seems rather convincing already, and some issues have in the meantime been solved in the RStudio github repo. They also provide integration with BBT when using Zotero (e.g. https://github.com/rstudio/rstudio/blob/master/src/cpp/session/modules/zotero/ZoteroBetterBibTeX.cpp).

@florisvdh
Author

I have installed the RStudio daily build and had a quick look.

It's important to add that the 'insert citation' feature (currently?) is only available in the visual editor mode (which is comparable to a word processor interface), not in the plain Rmd mode. I feel this may be a limitation on its own, compared to citr or rbbt - the visual mode is currently enforced in order to use the citation features. Being able to insert citations in plain mode may be worth a feature request at RStudio. Entering visual mode rewraps and reformats existing (valid) markdown lines, which is unwanted if the use of visual editor is just to insert citations.

@retorquere
Owner

I'll give the new RStudio a shot, but maybe this doesn't really connect to the reported issue anymore.

@florisvdh
Author

florisvdh commented Aug 31, 2020

> this doesn't really connect to the reported issue anymore

@retorquere It surely doesn't connect to this issue anymore, you're right. (I don't know where else we should have the ongoing discussion.) On the other hand, maybe it's worth keeping this broader discussion in one place until some well-defined need emerges that can then be raised in the right place (e.g., rbbt, rstudio).

@retorquere
Owner

It's fine to have the discussion here, I just needed to know whether something was expected of me for this.

@florisvdh
Author

FYI. I filed some issues at https://github.com/rstudio/rstudio/issues

@florisvdh
Author

florisvdh commented Sep 4, 2020

Some preliminary conclusions after further tests with RStudio visual markdown editing (VME), citr and rbbt.

@retorquere, @bwiernik can you please have a look at this? Further comments/corrections are most welcome!!

0 means no, 1 means yes.

| Feature | RStudio VME | citr | rbbt |
|---|---|---|---|
| insert-citation addin works in Rmd source mode | 0 | 1 | 1 |
| use of insert-citation tooling leaves existing markdown lines as-is: user is in full control of markdown (needed for optimal version control & keeping diffs small) | 0 (needs further investigation though) | 1 | 1 |
| can insert citation(s) from *.bib file | 1 | 1 | 0 |
| can insert citation(s) from csl-json/csl-yaml file | 1 | 0 | 0 |
| can insert citation(s) from Zotero | 1 | 1 | 1 |
| insert-citation addin has blazing fast access to records of large Zotero database | 1 | 0 | 1 (uses Zotero API) |
| trigger for amending/overwriting biblio file | selection of Zotero-records to insert citations | selection of Zotero-records to insert citations | running bbt_write_bib() = Rmd detection of previously inserted citations from Zotero + writing to file |
| write (selected/detected) Zotero records to *.bib file (with BBT) | 1 (add records) | 1 (add records) | 1 (overwrite file) |
| write (selected/detected) Zotero records to csl-json file (with BBT) | 1 (add records) | 0 | 1 (overwrite file) |
| write (selected/detected) Zotero records to csl-yaml file (with BBT) | 1 (add records) | 0 | 1 (overwrite file) |
| insert (paste) bibliographic entry/entries from Zotero (manual file amending) | 0 | 0 | 1 (bib/json/yaml; uses a 2nd addin) |

(*) due to not being able to use another Zotero profile (to which I switched) from RStudio VME (even after restart, reinstall etc); will file an issue.

Below are some possible (suggested) development paths (discussed with my colleagues @ThierryO and @hansvancalster) which could provide a solution for the aforementioned usecase (#1602 (comment)). I.e., collaborative researchers contributing to the same bibliographic file in a common git repo, each using its own (manual / (semi-)automated) tools to do that, and each adding citations inside Rmd files from RStudio. We are really interested in getting to a solution for one of the following.

  • RStudio VME:
    • provide the insert-citation tool in Rmd source mode
  • citr:
    • extend functionality to also support csl-json and csl-yaml (inserting citations from file; amending the file)
    • when inserting citations from Zotero, connect to Zotero in a really fast way - probably through the Zotero API. (Maybe rbbt code could be reused for this)
  • rbbt:
    • extend functionality to insert citations from bibliographic file (bib/json/yaml) - we need more than Zotero-only use
    • extend functionality to amend existing bibliographic file (bib/json/yaml), not touching existing records (which may not all occur in local Zotero) - actually we need that

We intend to add an issue on these topics at the respective repositories, referring to this table here. I will first await further considerations and ideas from you, which may lead to edits of the above table and suggestions.

@bwiernik
Contributor

bwiernik commented Sep 4, 2020

Important to remember that rbbt is developed as an interface to Zotero/BBT. Its entire design is to use Zotero's citation interface. This is what makes it fast and very lightweight. The workflow is designed to be: insert items from your Zotero library and then generate a bibliography file at the end of writing. The generation of the bibliography file can be automatically integrated into the knitting process. That isn't going to change.

If someone wanted to cite from a bib or YAML file with rbbt, the intended workflow would be to import those files to their Zotero library.
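The "generate a bibliography file at the end of writing" step depends on finding the citation keys that were actually used in the document. A rough sketch of that detection step is shown below. It is not rbbt's implementation: `cited_keys` is a hypothetical name and the regular expression is a simplified approximation of pandoc's citation-key syntax.

```python
import re

# Simplified pandoc citation key: "@" followed by a key that starts with a
# letter, digit or underscore. Internal punctuation is allowed only when it
# is followed by another key character, so a trailing "." or ":" is dropped.
# The lookbehind skips e-mail addresses and "@@".
_CITATION = re.compile(
    r"(?<![\w@])@([A-Za-z0-9_](?:[A-Za-z0-9_]|[:.#$%&\-+?<>~/](?=[A-Za-z0-9_]))*)"
)


def cited_keys(markdown: str) -> list[str]:
    """Return the distinct citation keys in order of first appearance."""
    seen = {}
    for match in _CITATION.finditer(markdown):
        seen.setdefault(match.group(1), None)
    return list(seen)
```

The resulting keys could then be passed to whatever tool exports the matching records, for example to write only the cited items to a CSL JSON file.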

@bwiernik
Contributor

bwiernik commented Sep 4, 2020

I wouldn't expect much more development on citr. Frederick Aust happily described the RStudio development as "effectively making citr obsolete".

@retorquere
Owner

I can't really speak on what rbbt can do; I've collaborated to make it possible by extending the BBT backend, but I've not used it myself.

@florisvdh
Author

Thank you for the advice. We'll have a further look into RStudio first.

BTW just found this nice tutorial on Zotero + BBT + Rmarkdown + rbbt.

@bwiernik
Contributor

bwiernik commented Sep 5, 2020

Be sure to try out the bbt_bib_file() function to automatically generate your CSL JSON file from the items you actually cited.

@florisvdh
Author

@bwiernik I did try bbt_write_bib() (see table); I guess that's the function you mean (bbt_bib_file() does not exist, I think).

@bwiernik
Contributor

bwiernik commented Sep 7, 2020

Oh, right. The name of the function changed from my initial proposal.

@retorquere
Owner

BTW there's still another option when going the bibtex/csl-json route: it is possible to set up an auto-export through the BBT HTTP API.

@github-actions

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 17, 2021