Add support for arXiv entries and non-PDF URLs #97

Merged
merged 4 commits into from
Aug 16, 2021

k4rtik
Contributor

@k4rtik k4rtik commented Aug 11, 2021

No description provided.

@gcushen
Collaborator

gcushen commented Aug 11, 2021

The title says "Add support for URLs...", but URL handling is already implemented in the software. What your PR actually does is change the url entry logic so that the link is labelled "URL" rather than "PDF". How do you propose to handle this change for existing users who rely on it for PDFs?

Can you link to where archiveprefix appears in the BibTeX spec?

Also, the PR is failing the checks and would need fixing before it can be merged.

@k4rtik
Contributor Author

k4rtik commented Aug 11, 2021

Hi @gcushen, thanks for taking a look. I saw your comment only after I had noticed the build failure and pushed a fix for it.

On the "URL" vs. "PDF" label: I think the current behaviour of applying the PDF label to any URL, without checking that it actually points to a PDF, is incorrect. See, for example, a large bibliography database that I maintain at http://ks.cs.uchicago.edu/qpl-bib/. It is generated with bibtex2html, which makes a better decision by choosing the label ".pdf", "http", or "https" depending on the kind of URL it encounters. I am slowly moving that bibliography over to a Hugo-based system running on wowchemy (https://quantumpl.github.io) and noticed these inconsistencies. Would you like that kind of smarter distinction to go along with this PR? (I believe that would also address your concern about backward compatibility.)

On the archiveprefix field: it is a non-standard BibTeX field (just like url), most commonly used for arXiv e-prints, as you can see when exporting BibTeX for any paper from their web interface. It is worth supporting in the tool, since large research communities such as math, physics, and CS depend on arXiv to provide archival (and open-access) versions of their research. See, for example, https://arxiv.org/abs/1402.4467

@gcushen
Collaborator

gcushen commented Aug 12, 2021

Yes, we should check whether the URL ends in .pdf (case-insensitive) to maintain backward compatibility for users. Let's keep the labels user-friendly, though, and not label links with protocol names like HTTP and HTTPS.

If arxiv.org is generating BibTeX with the non-standard archiveprefix field, then let's support it. The challenge for contributors and maintainers is that we are effectively creating a new (undocumented) BibTeX standard out of all these non-standard fields rather than adhering to a clearly defined existing spec...
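
For illustration, the case-insensitive .pdf check proposed above could be sketched roughly as follows. This is not the code from the PR: the function name link_label is hypothetical, and testing the URL path (so that a trailing query string is ignored) is an assumption that goes slightly beyond "the URL ends in .pdf".

```python
from urllib.parse import urlparse


def link_label(url: str) -> str:
    """Pick a user-facing label for a BibTeX `url` field.

    Hypothetical helper, not the function used in the PR.
    Returns "PDF" when the URL path ends in ".pdf" (case-insensitive),
    which keeps the old label for existing PDF links, and "URL" otherwise.
    """
    # Assumption: look only at the path so "paper.pdf?download=1" still counts.
    path = urlparse(url.strip()).path
    return "PDF" if path.lower().endswith(".pdf") else "URL"
```

With this sketch, a link such as https://arxiv.org/abs/1402.4467 would get the "URL" label, while any link whose path ends in .pdf or .PDF would keep the "PDF" label.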

@k4rtik
Contributor Author

k4rtik commented Aug 12, 2021

Alright, I don't see an option to convert this into a draft PR, but I will try to make the changes and let you know when it's ready for a potential merge.

I am not sure which clearly defined existing spec you are referring to. BibTeX is really old; even the url field is non-standard. The future is certainly with biblatex, which I hope every major publisher starts supporting, but until then we need to stick with the norms of the major communities.

arXiv is large enough that biblatex provides aliases for new fields that it has introduced for arXiv compatibility, see sec. 3.14.7 Electronic Publishing Information at https://ctan.mirrors.hoobly.com/macros/latex/contrib/biblatex/doc/biblatex.pdf :

There are two aliases which ease the integration of arXiv entries. archiveprefix is treated as an alias for eprinttype; primaryclass is an alias for eprintclass. If hyperlinks are enabled, the eprint identifier will be transformed into a link to arxiv.org.
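
As a rough sketch of the alias behaviour quoted above, an importer could treat archiveprefix as a fallback for eprinttype and turn the eprint identifier into an arxiv.org link. The function name arxiv_url and the plain-dict entry shape (lowercase field names) are assumptions for illustration, not the PR's actual code.

```python
from typing import Optional


def arxiv_url(entry: dict) -> Optional[str]:
    """Build an arxiv.org link from a parsed BibTeX entry, if it is an arXiv e-print.

    Hypothetical helper: `entry` is assumed to be a dict keyed by
    lowercase BibTeX field names.
    """
    # biblatex treats `archiveprefix` as an alias for `eprinttype`.
    eprint_type = entry.get("eprinttype") or entry.get("archiveprefix") or ""
    eprint_id = (entry.get("eprint") or "").strip()
    if eprint_type.strip().lower() == "arxiv" and eprint_id:
        return f"https://arxiv.org/abs/{eprint_id}"
    # Not an arXiv entry, or the identifier is missing.
    return None
```

For an entry with archiveprefix set to arXiv and eprint set to 1402.4467, this yields https://arxiv.org/abs/1402.4467; entries without an eprint identifier or with a different archive yield None.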

@k4rtik
Contributor Author

k4rtik commented Aug 13, 2021

Hi @gcushen, I have made the change as you suggested. This PR is now ready for merge.

Review comment on academic/import_bibtex.py (outdated, resolved)
@gcushen changed the title from "Add support for URLs and arXiv entries" to "Add support for arXiv entries and non-PDF URLs" on Aug 16, 2021
@gcushen merged commit 061ef0e into GetRD:main on Aug 16, 2021
@gcushen added the "enhancement" label on Aug 16, 2021