BibTeXML vs. bibteXMP #938
Comments
Refs #898 |
I'm not really following the argument. One can argue about different export formats, but how is it relevant that they are both XML? Isn't the real question whether each is a relevant format in itself? |
Context: The format is used for storing BibTeX data in XML files using the XMP functionality (follow
I am arguing that JabRef uses a proprietary format which is not used elsewhere. Thus, our XMP data cannot be processed by other software. I see the point that the last commit in the current BibTeXML repository is from 2011. Nevertheless, I vote for joining forces. These formats are too similar to go in different directions. I see the following alternatives:
Somehow, the current code seems to use "Dublin Core", which reads well. Maybe that code can just be used and the other serialization using
In case everything is replaced by Dublin Core, one can update PDFBox - see #1096. |
Ah, OK, so bibteXMP is JabRef's own format? Then it clearly makes more sense not to support exporting in that. |
The question would be: How many people actually use the XMP feature? From a quick look at the code you referenced, I saw that it uses RDF tags... :confused: |
The XMP feature is the central tool to distribute PDFs with bibliographic information. I learned it from Adrian Daerr (possibly @adriandaerr?). I am also confused by the code and also had a strange feeling about nesting JabRef's bibtexml into RDF tags. Therefore, I proposed to focus on Dublin Core (see above). |
thanks for inviting me to the discussion! the BibTeXML we developed and implemented (http://dret.net/netdret/publications#wil01e) is a different one than the sourceforge repo. the paper is from 15 years ago, and while we used the language in a later project (http://dret.net/projects/sharef/), the software produced by that project is not really used anywhere, as far as i can tell. i did hand the sources to some people who liked it and wanted to have a bibtex-xml converter, but i don't think anybody ever made their versions public. i think our XML schema was pretty well-designed, but it's something i haven't looked at in quite a while. |
Whichever format you prefer to embed in the PDF, it would be great if it were compatible with PDF/A compliance checks.
If it is a format like BibTeXML that can be exported as XML, it would also be great to have a minimal example of embedding it correctly through LaTeX with |
After dealing with this in #1096 I think the most portable solution would be to drop the JabRef bibteXMP and to encode everything into Dublin core (which we already do on top of our custom serialization). That is, if we do not decide to drop the XMP functionality completely. |
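To make "encode everything into Dublin Core" a bit more concrete, here is a minimal sketch in plain Java. The class name `DublinCoreSketch` and the field mapping are assumptions made for illustration only; they are not JabRef's actual mapping, and the sketch does not touch PDFBox or xmpbox. It only shows which Dublin Core elements a few common BibTeX fields could plausibly land in.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DublinCoreSketch {

    // Hypothetical BibTeX field -> Dublin Core element mapping.
    // This is an assumption for illustration; JabRef's real mapping may differ.
    private static final String[][] FIELD_TO_DC = {
        {"title", "dc:title"},
        {"author", "dc:creator"},
        {"year", "dc:date"},
        {"publisher", "dc:publisher"},
        {"abstract", "dc:description"},
        {"keywords", "dc:subject"},
    };

    // Copies every mapped, non-blank BibTeX field into a Dublin Core map.
    // Fields without a counterpart in FIELD_TO_DC are silently dropped.
    static Map<String, String> toDublinCore(Map<String, String> bibtexFields) {
        Map<String, String> dc = new LinkedHashMap<>();
        for (String[] pair : FIELD_TO_DC) {
            String value = bibtexFields.get(pair[0]);
            if (value != null && !value.trim().isEmpty()) {
                dc.put(pair[1], value.trim());
            }
        }
        return dc;
    }

    public static void main(String[] args) {
        Map<String, String> entry = new LinkedHashMap<>();
        entry.put("title", "An Example Title");
        entry.put("author", "Doe, Jane");
        entry.put("year", "2016");
        entry.put("journal", "Some Journal"); // no mapping in this sketch
        toDublinCore(entry).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Note what the sketch loses: BibTeX fields such as `journal` or `pages` have no obvious Dublin Core element, so a pure Dublin Core encoding either drops them or has to squeeze them into a generic element. That trade-off is part of what is being weighed in this thread.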
Some info about correct storage of xmp inside pdf (to be compatible with pdf/a for example) can be found with samples at http://www.pdflib.com/knowledge-base/xmp-metadata/xmp-in-pdfa/ |
Idea (as discussed with @hummelriegel): Add the BibTeX entries of cited works to the PDF. This is especially useful for a self-written paper. |
Further options include bibtexml and MODS. I think Dublin Core is still the way to go as it is standards-based. We should go in this direction. |
Refs JabRef#6 |
Hi guys. I am not a developer, just another user. I really hope that you maintain the XML feature. This is one of the most important unique features of JabRef, and it keeps me coming back time and time again (after using great reference managers like Bookends). The XML is useful not just for sharing PDF files. Embedding the information into the PDF is very useful for powerful search tools like DEVONthink [Mac], Spotlight [Mac], dtSearch [Windows]. With the embedded data, it is possible to search PDF files by author, title and similar data. In addition, regenerating the JabRef library from the PDF files (in case the library is corrupted or deleted) is possible with the embedded data. I had a couple of cases where my PDF files got dissociated from the reference. I dragged them back. Voilà, I had the whole reference. This is just so great. |
Hi dellu. Thanks for the praise! And no worries, we have no intentions of removing support for this feature. Quite the contrary, we would like to update and improve it. Unfortunately, this has so far failed due to issues in the libraries that we use for this functionality. As a result, I assume that there will be no significant changes here in the near future. |
Interesting link, thanks! Unfortunately, it will not be easy to interact with that tool or the ExifTool. The former is written in C++ and the latter in Perl, whereas JabRef is written in Java. There is always a way around the language differences, but from my point of view we should stick to the Java ecosystem and build a JabRef where everything is closely integrated and without language-related friction. Other developers might have a different opinion, though. |
Together with @snisnisniksonah I am investigating whether we can use Dublin Core. Current steps:
Results:
|
Nice! I think the XMPUtil is not that important since in most cases you can just write the information again to the PDF using Dublin Core and thus overwriting / "converting" the old XMP data. |
Note to self: Do not forget #938 (comment). pdflatex can easily do that: authorarchive. Check the example PDF. |
This fixes #938
- Reading and writing multiple dublinCore entries works: XMPUtilWriter supports multiple metadata entries in dublinCore and a single entry in the PDDocumentInformation. If you want to test the reading of multiple entries, the PDF file JabRef_multipleMetaEntries.pdf contains three metadata entries in DublinCore for testing locally.
- Removed too much code when refactoring the XMPUtil. Non-XMP metadata are also relevant when retrieving org.apache.pdfbox.pdmodel.PDDocumentInformation
- Update pdfbox and fontbox from 1.8.13 to 2.0.8 and migrate from jempbox to xmpbox. See pull #1096.
- Refactor extraction from DublinCoreSchema
- The tests cover the most important use cases, which include reading and writing metadata from PDF files. Both formats, DublinCore and PDMetadata (which is not XMP metadata), are tested.
- Separated XMPUtils into a reader and a writer utility class.
- Add meaningful names in DublinCoreExtractor and use StringUtils.isNullOrEmpty
- Log exception in XMPUtilShared
JabRef 3.2
It seems that JabRef offers a second kind of XML serialization in BibTeX:
IMHO, it is not worth keeping two different XML schemas for an XML serialization of BibTeX. AFAIK, there isn't even a schema for JabRef's XML. Therefore, I propose that we use BibTeXML only and migrate old XMP metadata to the BibTeXML format.
XMP examples can be found at
jabref/src/test/java/net/sf/jabref/logic/xmp/XMPUtilTest.java
Line 139 in fc82796