Add support for application/x-bibtex type #532

koppor · 2020-01-08T23:54:06Z

As a caller, I want to have BibTeX as output (instead of XML). This PR adds that functionality. A caller has to send the HTTP header Accept: application/x-bibtex

The implementation tries to keep as much of the existing code as possible. Hence the processCitation method is modified by adding a parameter.
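As an illustration of what a caller would send, here is a minimal sketch that builds (but does not send) such a request with the JDK's java.net.http API. The host and port, the /api/processCitation path, and the citations form field are taken from the curl call quoted further down in this conversation; the class and method names are hypothetical and not part of GROBID.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class BibTeXRequestSketch {

    // Builds a POST request that asks the service for BibTeX instead of XML.
    // Hypothetical helper: only the Accept value, the endpoint path and the
    // "citations" field name come from this PR conversation.
    static HttpRequest buildCitationRequest(String baseUrl, String rawCitation) {
        String form = "citations=" + URLEncoder.encode(rawCitation, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/processCitation"))
                .header("Accept", "application/x-bibtex")
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildCitationRequest(
                "http://localhost:8070",
                "Graff, Expert. Opin. Ther. Targets (2002) 6(1): 103-113");
        // Print what would be sent; no network call is made here.
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Accept: " + request.headers().firstValue("Accept").orElse(""));
    }
}
```

To actually send it, the request could be passed to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running server.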

coveralls · 2020-01-09T00:01:16Z

Coverage increased (+0.2%) to 38.082% when pulling 0b86f3f on koppor:add-bibtex-accept into 5a325e3 on kermitt2:master.

kermitt2 · 2020-02-29T08:05:29Z

Sorry Oliver for being so slow to review the PR!

No problem adding the BibTeX response type, of course.

Just two questions:

- What is your impression of the current quality of the generated BibTeX? It's something I wrote quickly some 7-8 years ago, and my worry is that it's not comprehensive enough.
- Do we want, for consistency, to also add this response type to the /api/processReferences service, so that all the bibliographical references of a PDF can be extracted into BibTeX? We can do it later, of course.

koppor · 2020-03-06T22:13:30Z

> What is your impression of the current quality of the generated BibTeX? It's something I wrote quickly some 7-8 years ago, and my worry is that it's not comprehensive enough.

It's good enough to integrate it in JabRef.

We have one major issue: The names are not in the format Lastname, Firstname, which the JabRef team thinks is the more consistent way to represent names.

We also see that date parsing is not that easy.

So the following example

Kolb, S., Wirtz G.: Towards Application Portability in Platform as a Service
Proceedings of the 8th IEEE International Symposium on Service-Oriented System Engineering (SOSE), Oxford, United Kingdom, April 7 - 10, 2014.

appears as follows in JabRef:

@Article{KolbApril7102014,
  author    = {S Kolb and G Wirtz},
  year      = {April 7 - 10, 2014},
  address   = {Oxford, United Kingdom},
  booktitle = {Towards Application Portability in Platform as a Service Proceedings of the 8th IEEE International Symposium on Service-Oriented System Engineering (SOSE)},
}

We have no issues in converting ". As soon as we offer JabRef functionality as a library (refs JabRef/jabref#110), we'll come up with a PR updating the BibTeX writing.

Nevertheless, we experienced no issues in parsing the result and integrating it in JabRef.
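For the year field in the example above (April 7 - 10, 2014), one possible clean-up heuristic is to keep only a four-digit year from the raw date string. The sketch below is an assumption about how a consumer could post-process the value; it is not what GROBID or JabRef actually do, and the class and method names are made up for illustration.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YearHeuristicSketch {

    // A four-digit number starting with 1 or 2 that is not part of a longer number.
    private static final Pattern YEAR = Pattern.compile("(?<!\\d)([12]\\d{3})(?!\\d)");

    // Returns the last year-like token of the raw date string, if there is one.
    // Taking the last match is an assumption: free-text dates often end with the year.
    static Optional<String> extractYear(String rawDate) {
        if (rawDate == null) {
            return Optional.empty();
        }
        Matcher matcher = YEAR.matcher(rawDate);
        String last = null;
        while (matcher.find()) {
            last = matcher.group(1);
        }
        return Optional.ofNullable(last);
    }

    public static void main(String[] args) {
        // Raw value taken from the BibTeX entry quoted above.
        System.out.println(extractYear("April 7 - 10, 2014").orElse("(no year found)"));
    }
}
```

For the value from the example this yields 2014; strings without a four-digit year give an empty result, so the caller can decide whether to keep the raw text.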

> Do we want, for consistency, to also add this response type to the /api/processReferences service, so that all the bibliographical references of a PDF can be extracted into BibTeX? We can do it later, of course.

Done. 😅

koppor · 2020-03-06T22:34:05Z

I think my tests are too straightforward. Maybe I should start a REST server and then use http://rest-assured.io/ ^^.

org.grobid.service.process.GrobidRestProcessStringTest > processCitationReturnsBibTeX FAILED

    java.lang.UnsatisfiedLinkError: Native Library /home/travis/build/kermitt2/grobid/grobid-home/lib/lin-64/libwapiti.so already loaded in another classloader

Would it be OK to disable the tests for now to include the functionality in the main branch?

koppor · 2020-03-08T15:47:01Z

I think that, in the long run, org.grobid.core.data.BiblioItem#toTEI(int, int, org.grobid.core.engines.config.GrobidAnalysisConfig) should be used, with the result then converted to BibTeX (maybe using XSLT). Reason: that method does a lot of magic, whereas toBibTeX "just" outputs the core data.

kermitt2 · 2020-03-08T18:44:40Z

Thank you @koppor for making the tests pass!

> I think that, in the long run, org.grobid.core.data.BiblioItem#toTEI(int, int, org.grobid.core.engines.config.GrobidAnalysisConfig) should be used, with the result then converted to BibTeX (maybe using XSLT). Reason: that method does a lot of magic, whereas toBibTeX "just" outputs the core data.

Yes, actually this is the whole idea of using TEI as the unique output format for GROBID: the legion of other "degraded" formats can then simply be derived from the TEI via XSLT. It's probably a bad idea to output BibTeX directly, given how ambiguous and presentation-oriented the format is (that's why I haven't touched the toBibTeX() method for 7-8 years and kept it superficial), but I recognize that BibTeX is very useful for researchers, and I was happy to have BibTeX references when I was still writing research papers.
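To make the "derive it from the TEI via XSLT" idea concrete, here is a minimal sketch using the JDK's javax.xml.transform API. The TEI fragment is a hand-written, heavily simplified biblStruct (not real GROBID output), the stylesheet only handles authors, title and year, and the @misc type and the key "ref" are placeholders. All class, field and method names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TeiToBibTeXSketch {

    // Simplified TEI-like input, written by hand for this example.
    static final String SAMPLE_TEI =
          "<biblStruct xmlns='http://www.tei-c.org/ns/1.0'>"
        + "<analytic>"
        + "<title level='a' type='main'>Towards Application Portability in Platform as a Service</title>"
        + "<author><persName><forename type='first'>S</forename><surname>Kolb</surname></persName></author>"
        + "<author><persName><forename type='first'>G</forename><surname>Wirtz</surname></persName></author>"
        + "</analytic>"
        + "<monogr><imprint><date type='published' when='2014'/></imprint></monogr>"
        + "</biblStruct>";

    // XSLT 1.0 stylesheet with text output; authors are written as "Lastname, Firstname".
    static final String XSLT =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:tei='http://www.tei-c.org/ns/1.0'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/tei:biblStruct'>"
        + "<xsl:text>@misc{ref,&#10;  author = {</xsl:text>"
        + "<xsl:for-each select='tei:analytic/tei:author/tei:persName'>"
        + "<xsl:if test='position() &gt; 1'><xsl:text> and </xsl:text></xsl:if>"
        + "<xsl:value-of select='tei:surname'/>"
        + "<xsl:text>, </xsl:text>"
        + "<xsl:value-of select='tei:forename'/>"
        + "</xsl:for-each>"
        + "<xsl:text>},&#10;  title = {</xsl:text>"
        + "<xsl:value-of select='tei:analytic/tei:title'/>"
        + "<xsl:text>},&#10;  year = {</xsl:text>"
        + "<xsl:value-of select='tei:monogr/tei:imprint/tei:date/@when'/>"
        + "<xsl:text>}&#10;}&#10;</xsl:text>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a TEI string and returns the resulting BibTeX text.
    static String teiToBibTeX(String teiXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(teiXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(teiToBibTeX(SAMPLE_TEI));
    }
}
```

A real stylesheet would additionally have to choose the entry type, generate a citation key and escape characters that are special in BibTeX, which is where the ambiguity of the format mentioned above shows up.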

Probably the last thing to be done is to update the web service documentation, under doc/Grobid-service.md :)

koppor · 2020-03-08T20:45:14Z

I wonder whether MODS should be used as the XML format. Is there an ADR for TEI? ^^

Meanwhile, I also worked on the BibTeX output heuristics.

I also added/enabled basic tests for the URLs provided at "Service checks".

Linted the Markdown file using https://github.com/DavidAnson/markdownlint (the Visual Studio Code plugin: https://github.com/DavidAnson/vscode-markdownlint).

koppor · 2020-03-08T20:49:02Z

See https://github.com/koppor/grobid/blob/add-bibtex-accept/doc/Grobid-service.md#apiprocessreferences for the documentation added. Tried to use monospaced text for parameter names etc.

kermitt2 · 2020-03-08T21:46:42Z

> I wonder whether MODS should be used as XML format.

Well, MODS covers only the bibliographic data, and since GROBID needs to encode a complete text body, it's not a valid choice. For the bibliographic data, TEI is actually more comprehensive too: MODS has almost nothing to encode affiliation information (and there are other small issues with MODS).

The only other valid candidate would be JATS, but it's a bit of a mess, with many flavors and a lot of freedom, and it focuses on articles only. For GROBID, we need to cover full monographs, patents, standards, etc., and TEI is simply more comprehensive. TEI also has a nice customization mechanism to define a non-ambiguous encoding.

end of ADR :D

koppor · 2020-03-08T22:21:32Z

Thank you for the ADR ^^.

I did not make any more changes. Since both Travis and Coveralls are green, I think the update is complete 😇

kermitt2 · 2020-03-08T22:29:38Z

> I did not make any more changes. Since both Travis and Coveralls are green, I think the update is complete

Yes, and thank you so much for the extensive corrections in the Grobid-service.md file!

kermitt2 · 2020-03-08T22:38:37Z

Doing some tests, I found that person.getFirstName() in toBibTeX() can apparently be null:

lopez@work:~/grobid$ curl -X POST -H "Accept: application/x-bibtex" -d "citations=Graff, Expert. Opin. Ther. Targets (2002) 6(1): 103-113" localhost:8070/api/processCitation
@article{-1,
  author = {Graff, null},
  journal = {Expert. Opin. Ther. Targets},
  year = {2002},
  pages = {103--113},
  volume = {6},
  number = {1}
}

grobid-core/src/main/java/org/grobid/core/data/BiblioItem.java
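The "Graff, null" output above comes from concatenating a missing first name. As a minimal sketch of a null-safe way to build the author field in the Lastname, Firstname format, consider the following. It is not the change that was actually made to BiblioItem.java; the Name class is a hypothetical stand-in for GROBID's person data, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class AuthorFieldSketch {

    // Hypothetical stand-in for a parsed person; either part may be missing.
    static final class Name {
        final String firstName;
        final String lastName;

        Name(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    // Joins authors as "Lastname, Firstname and Lastname, Firstname",
    // leaving out any name part that is null or blank.
    static String formatAuthors(List<Name> names) {
        List<String> parts = new ArrayList<>();
        for (Name name : names) {
            boolean hasFirst = name.firstName != null && !name.firstName.isBlank();
            boolean hasLast = name.lastName != null && !name.lastName.isBlank();
            if (hasLast && hasFirst) {
                parts.add(name.lastName.trim() + ", " + name.firstName.trim());
            } else if (hasLast) {
                parts.add(name.lastName.trim());
            } else if (hasFirst) {
                parts.add(name.firstName.trim());
            }
        }
        return String.join(" and ", parts);
    }

    public static void main(String[] args) {
        // The author from the curl example above: last name only.
        System.out.println("  author = {" + formatAuthors(List.of(new Name(null, "Graff"))) + "},");
    }
}
```

With the author from the curl example, the field becomes author = {Graff} rather than author = {Graff, null}.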

koppor · 2020-03-09T05:36:15Z

Fixed. Added a test using your example. Added the example to Grobid-service.md.

Also refined Grobid-service.md with consistent command highlighting.

kermitt2 · 2020-03-09T14:39:49Z

Thanks a lot Oliver for all the improvements you introduced and the feature contribution!

koppor referenced this pull request in NikodemKch/grobid Jan 8, 2020

Change citation processing service to return the result in bibtex format rather that tei

e89810b

koppor requested a review from kermitt2 February 19, 2020 09:57

koppor mentioned this pull request Feb 20, 2020

Ensure correct Accept header for Grobid JabRef/jabref-koppor#406

Closed

Add support for application/x-bibtex type

271904d

koppor added 3 commits March 8, 2020 16:37

Slight code improvements

18d4a31

- Add some exception logging
- Use == for enums

Add Gradle plugin for printing beautiful logs on the console while running tests https://github.com/radarsh/gradle-test-logger-plugin

d023bf8

Refine grobid-service/README.md

02240d3

Add support for BibTeX for analyzing PDFs (/api/processReferences)

86ebdfe

koppor added 3 commits March 8, 2020 20:32

Update heuristics for determing entry type

2c8cf03

Change BibTeX to wrap the values in {}

b13c357

Add new functionality to Grobid-service.md (and copy it into some places into the Java code)

35e740a

kermitt2 requested changes Mar 8, 2020

grobid-core/src/main/java/org/grobid/core/data/BiblioItem.java

koppor added 3 commits March 9, 2020 06:25

Fix null author (and space before author =)

3662d36

Try "console" instead of "bash" in markdown

5dadee0

Add link to PDF.js homepage

0b86f3f

kermitt2 approved these changes Mar 9, 2020

kermitt2 merged commit adeca65 into kermitt2:master Mar 9, 2020

koppor deleted the add-bibtex-accept branch March 9, 2020 19:12

koppor mentioned this pull request Mar 23, 2020

Use GROBID for extraction of metadata from PDFs JabRef/jabref#6158

Closed

kermitt2 mentioned this pull request Aug 13, 2020

BibTeX output in dockerized Grobid server? #165

Closed

koppor mentioned this pull request Jul 21, 2021

Accept application/x-bibtex for processHeaderDocument #800

Merged
