Add support for arXiv entries and non-PDF URLs #97

k4rtik · 2021-08-11T17:27:48Z

No description provided.

gcushen · 2021-08-11T19:46:42Z

The title says "Add support for URLs...", but URL handling is already implemented in the software. What this PR actually does is change the url entry logic so that the link is labelled "URL" rather than "PDF". How do you propose to handle this change for existing users who rely on it for PDFs?

Can you link to where archiveprefix appears in the BibTeX spec?

Also, the PR is failing the checks and would need fixing before it can be merged.

k4rtik · 2021-08-11T22:04:58Z

Hi @gcushen, thanks for taking a look. I looked at your comment only after noticing the build failure and pushing the fix for that.

About the "URL" vs. "PDF" label: I think the current choice to apply the PDF label to any URL, without validating that it points to a PDF, is incorrect. See, for example, a large bibliography database that I maintain at http://ks.cs.uchicago.edu/qpl-bib/. It is generated with the tool bibtex2html, which makes a better decision by choosing the label ".pdf", "http" or "https" depending on the kind of URL it encounters. I am slowly moving that bibliography over to a Hugo-based system running on wowchemy (https://quantumpl.github.io) and noticed these inconsistencies. Would you like that kind of smarter distinction to go along with this PR? (I believe that will also take care of your concern about backward compatibility.)

About the archiveprefix field: it is a non-standard BibTeX field (just like url), most commonly used for arXiv e-prints, as you can see when exporting BibTeX for any paper from their web interface. It is worth supporting in the tool; many large research communities, such as math, physics, and CS, depend on arXiv to provide archival (and open-access) versions of their research. See, for example, https://arxiv.org/abs/1402.4467

gcushen · 2021-08-12T00:06:26Z

Yes, we should check if the URL ends in .pdf (case-insensitive) to maintain backward compatibility for users. Let's keep the labels user friendly though and not label links as tech protocols like HTTP and HTTPS.
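A minimal sketch of the check described above, not the code that was merged: the function name `link_label` is hypothetical, and looking only at the URL path (so a query string after `.pdf` is ignored) is an assumption beyond what the comment specifies.

```python
from urllib.parse import urlparse


def link_label(url: str) -> str:
    """Return "PDF" if the URL path ends in .pdf (any letter case), else "URL".

    Hypothetical helper for illustration; the importer's real function
    names and label strings may differ.
    """
    # Assumption: only the path component is inspected, so that
    # "paper.pdf?download=1" still counts as a PDF link.
    path = urlparse(url.strip()).path
    return "PDF" if path.lower().endswith(".pdf") else "URL"
```

With this rule, existing entries whose url points at a .pdf file keep their "PDF" label, while a link such as https://arxiv.org/abs/1402.4467 gets the friendlier generic "URL" label instead of a protocol name.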

If arxiv.org is generating BibTeX with the non-standard archiveprefix field, then let's support it. The challenge for contributors and maintainers is that we are effectively creating a new (undocumented) BibTeX standard from all these non-standard fields rather than adhering to a clearly defined existing spec...

k4rtik · 2021-08-12T02:36:25Z

Alright, I don't see the option to convert this into a draft PR, but I will try and make the changes and let you know when it's ready for a potential merge.

I am not sure which clearly defined existing spec you are referring to. BibTeX is really old, and even the url field is non-standard. The future is certainly with biblatex, which I hope every major publisher starts supporting, but until then we need to stick with the norms of the major communities.

arXiv is large enough that biblatex provides aliases for new fields that it has introduced for arXiv compatibility, see sec. 3.14.7 Electronic Publishing Information at https://ctan.mirrors.hoobly.com/macros/latex/contrib/biblatex/doc/biblatex.pdf :

There are two aliases which ease the integration of arXiv entries. archiveprefix is treated as an alias for eprinttype; primaryclass is an alias for eprintclass. If hyperlinks are enabled, the eprint identifier will be transformed into a link to arxiv.org.
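As a rough illustration of the alias behaviour quoted above, the sketch below turns an entry's eprint identifier into an arxiv.org link. It is not the code from this PR: `arxiv_url` is a hypothetical name, and the entry is assumed to be a plain dict with lower-cased field names.

```python
from typing import Optional


def arxiv_url(entry: dict) -> Optional[str]:
    """Return an arxiv.org abstract link for an arXiv entry, else None.

    Hypothetical helper; assumes a dict of lower-cased BibTeX fields.
    """
    # archiveprefix is the arXiv-exported alias for biblatex's eprinttype.
    prefix = entry.get("archiveprefix") or entry.get("eprinttype") or ""
    eprint = (entry.get("eprint") or "").strip()
    if prefix.strip().lower() == "arxiv" and eprint:
        return f"https://arxiv.org/abs/{eprint}"
    return None
```

For example, an entry with archiveprefix "arXiv" and eprint "1402.4467" maps to the abstract page linked earlier in this thread, while entries without an eprint field are left alone.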

k4rtik · 2021-08-13T19:29:01Z

Hi @gcushen, I have made the change as you suggested. This PR is now ready for merge.

academic/import_bibtex.py

Add support for URLs and arXiv entries

fa55a20

k4rtik mentioned this pull request Aug 11, 2021

Generate arXiv Links QuantumPL/site#2

Closed

fix indent issue

888ac10

Support url_pdf for backward compatibility

11f6629

gcushen requested changes Aug 15, 2021

Refactor based on feedback

7aba40b

gcushen changed the title ~~Add support for URLs and arXiv entries~~ Add support for arXiv entries and non-PDF URLs Aug 16, 2021

gcushen merged commit 061ef0e into GetRD:main Aug 16, 2021

gcushen added the enhancement New feature or request label Aug 16, 2021
