Add option to parse new references from plain text using GROBID service [solving #4826] #5614
Thank you so much for working on this. This is a long awaited feature.
Working on such a huge code base, in a community whose developers have been working on the product for more than five years, is hard. Thank you for taking on the challenge.
We all had hard discussions within the team on user experience, code design, ... Now you get the results of all these discussions in a much shorter time. Please take it as a challenge to level up your coding skills, both in reading existing code and in writing new code.
That said, some general comments:
JabRef 5.0 tries to work with fewer dialogs. Thus, I propose to remove the following dialogs: (Refs MockUp.pdf)
- Remove "Display Success Dialog". Just show a notification. Refs Fix 5555 status popups #5560.
- Remove the step "Display BibEntry". This is done by the JabRef EntryEditor. Just select the found entries. If you find that uncomfortable, add a feature to add them to the group "new". (I assume that this feature does not exist yet. It should be a one-liner.)
- Remove "Display Error" --> a failure to parse should be logged (maybe with a popup). This should be handled automatically in the fetcher architecture (?)
Please adapt the use cases accordingly.
I would add /P50/: There is fetching logic in JabRef, such as org.jabref.logic.importer.SearchBasedParserFetcher (https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java). That should be the interface the functionality builds on, especially List<BibEntry> performSearch(String query) throws FetcherException. All exception handling is then done in the "framework". Example: https://docs.jabref.org/import-export/import-using-online-bibliographic-database/gvk
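To illustrate the point about the fetcher interface, here is a minimal, self-contained sketch. The types BibEntry, FetcherException and SearchBasedFetcher below are simplified stand-ins written for this example, not JabRef's real classes (the real interface has more methods), and searchSafely is a hypothetical helper that plays the role of the "framework". Only the performSearch signature is taken from the comment above.

```java
import java.util.Collections;
import java.util.List;

// Simplified stand-ins for illustration only; JabRef's real classes differ.
class BibEntry {
    final String rawText;

    BibEntry(String rawText) {
        this.rawText = rawText;
    }
}

class FetcherException extends Exception {
    FetcherException(String message) {
        super(message);
    }
}

// Stand-in interface carrying only the method quoted in the review comment.
interface SearchBasedFetcher {
    List<BibEntry> performSearch(String query) throws FetcherException;
}

class FetcherFrameworkSketch {
    // Hypothetical "framework" helper: the one place where FetcherException
    // is caught, so individual fetchers need no error dialogs of their own.
    static List<BibEntry> searchSafely(SearchBasedFetcher fetcher, String query) {
        try {
            return fetcher.performSearch(query);
        } catch (FetcherException e) {
            System.err.println("Fetching failed: " + e.getMessage());
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) {
        // A fetcher that "parses" by wrapping the query in a single entry.
        SearchBasedFetcher working = query -> List.of(new BibEntry(query));
        // A fetcher that simulates an unreachable server.
        SearchBasedFetcher failing = query -> {
            throw new FetcherException("server unreachable");
        };

        System.out.println(searchSafely(working, "Kolb and Wirtz").size());
        System.out.println(searchSafely(failing, "Kolb and Wirtz").size());
    }
}
```

With this split, a new fetcher only reports failure by throwing, and the calling side decides whether that becomes a log entry or a notification.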
Can you add a rendered version of "ProjectPlan.gan" or add a hint to open the "sweng" project? - I am reading it with GitHub at https://github.com/NikodemKch/jabref-1/tree/milestone1/docs/sweng.
There is no need to touch the submodules (just do not commit these changes - or do "git submodule update")
Please update the existing feature:
(Maybe also put a button in the menu bar.)
(Maybe switch the two buttons in the existing implementation - the go-on-button should always be right. And NEVER label "Cancel". Label it "Return to Library" so that the user knows what happens)
Result (opened in the entry editor)
You see, this is not usable. The code at https://github.com/NikodemKch/jabref-1/blob/develop/src/main/java/org/jabref/gui/bibtexextractor/BibtexExtractor.java is wrong. Moreover, it is placed in the wrong module (gui is the wrong package; it should be logic). I think we overlooked it in the review of #4985 and the continuation at #5206.
These dialogues can be a good start.
My recommendation would be to look at org.jabref.logic.importer.fetcher.DBLPFetcher to see how a search-based fetcher is implemented. One team should implement it. Another team should implement test cases.
Side comment: Did you hear about remote mob programming? Maybe, this helps to work on something. We also could do such a session the next days. (Maybe 30 mins).
src/main/java/org/jabref/logic/plaintextparser/ParserPipeline.java
src/test/java/org/jabref/logic/plaintextparser/ParserPipelineTest.java
Note: In #5628 I tried to bring back the old menu item (as it exists in the 4.x versions of JabRef).
Quick statement about the points you mentioned:
Concerning:
That depends on the OS. You can add the correct type and then JavaFX will put them in the right order automatically. (And "Cancel" is pretty OK.)
Please merge #5628 into your branch. Then I will close that PR.
👍 - I would make the checkbox an additional button: "Insert into current library" and "Insert into new library".
Here is the link to the custom GROBID server: https://github.com/NikodemKch/grobid
@koppor do we have the resources to host the GROBID server? I don't think many people have the knowledge to install and set up GROBID on localhost...
@tobiasdiez There are options. Maybe, we "just" limit the request rates, "just" limit the requests per user per hour or simply charge for it.
src/main/java/org/jabref/logic/importer/fetcher/GrobidCitationFetcher.java
We will check options how to host and maybe host some public service in beta.
We finished our work for this week (except unit tests and documentation).
After pressing this button (or the corresponding context menu button) this window appears:
While waiting for the server response, the following dialog is displayed:
After successful parsing, the entries are created and one of them is displayed inside the EntryEditor:
When GROBID fails to parse a String, the following message is displayed:
(This is also the case when some entries are parsed successfully and some are not.)
The user must separate the entries with a double semicolon (;;) (we had no better idea).
The still open to-dos are:
We also integrated our code better into the existing code base. It would be nice to get some feedback on that.
NOTE: The GROBID server does not work properly with Java 13. We used Java 11 for the server, but Java 8 is recommended (as stated here). (The server sometimes threw NullPointerExceptions for no apparent reason.)
Unit tests have now been implemented.
Hey @NikodemKch, @obsluk00, @marcelluethi!
Hey @LinusDietz
So, I do have the server running. It might be a bit slow, but feel free to test it and report your findings to me (potentially via Slack, I've invited @NikodemKch).
What's the current status here? Is this already ready for review, or do you need input on certain parts?
Today we will fix the requested changes and try to fix the Travis pipeline. Then, we will open the PR for review. Sadly, we will probably not find the time to change the GROBID server as requested, as the examination phase is kicking in (first exam on December 20th)...
This PR is now ready for review. Sadly, the provided server (http://grobid.cm.in.tum.de:8070/) does not seem to work properly (I can't access the page even in a browser; it says the server took too long to respond). The feature was tested with a locally hosted server.
Sorry! Indeed the server is running, but the port in question is unreachable from the internet. I'll have a look.
I have resolved all these issues and am re-requesting your review.
Small code nitpicks. Is it possible to fix it soon?
src/main/java/org/jabref/logic/importer/fetcher/GrobidCitationFetcher.java
 * Implements an API to a GROBID server, as described at
 * https://grobid.readthedocs.io/en/latest/Grobid-service/#grobid-web-services
 * <p>
 * Note: Currently a custom GROBID server is used...
Please ensure that you send an Accept header here (can be done in a follow-up pull request)
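As a rough sketch of what sending an Accept header could look like, the example below builds (but does not send) a request with the JDK's java.net.http API. Several things here are assumptions for illustration: the /api/processCitation path and the citations form field follow GROBID's public service documentation and may not match the custom server, application/xml is only a placeholder media type, the localhost base URL is made up, and the actual JabRef code may use a different HTTP helper altogether.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class GrobidRequestSketch {
    // Builds a POST request for a single plain-text citation.
    // Endpoint path, form field name and media type are assumptions.
    static HttpRequest buildCitationRequest(String baseUrl, String plainTextCitation) {
        String form = "citations=" + URLEncoder.encode(plainTextCitation, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/processCitation"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                // Tell the server explicitly which response format is expected.
                .header("Accept", "application/xml")
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildCitationRequest(
                "http://localhost:8070",
                "S Kolb and G Wirtz. Towards Application Portability in Platform as a Service");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().map());
    }
}
```

Stating the expected format explicitly means the client does not depend on whatever the server happens to return by default.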
src/test/java/org/jabref/logic/importer/util/GrobidServiceTest.java
Pasting from https://www.uni-bamberg.de/pi/team/kolb-stefan/
Result:
@Misc{Kolb,
  author = {S Kolb and G Wirtz},
  title = {Towards Application Portability in Platform as a Service},
}
The result is so-so. However, the user experience is OK. The only thing is the text box. Can it be soft-wrapped?
public class GrobidCitationFetcher implements SearchBasedFetcher {

    private static final Logger LOGGER = LoggerFactory.getLogger(GrobidCitationFetcher.class);
    private static final String GROBID_URL = "http://grobid.cm.in.tum.de:8070";
Please change to http://grobid.jabref.org:8070
This address is not working for some reason...
We made the conscious design choice to go without soft-wrapping user input, to improve the user experience when passing multiple entries.
I see the problem. We decided to split entries at line breaks (regex: "[\\r\\n]+").
Soft wrap makes sense, since it (mentally) prevents users from adding more breaks after pasting their references. We thought using line breaks would be most convenient, but we can easily change this to double line breaks or even something else, so what do you think? @koppor
As the example shows, it would indeed be good to change the item separation to two line breaks (so that one empty line is needed between the entries). Could you also add this as a short comment in the prompt text "Please enter the plain text references ..." to make it easier for users to discover the feature? Thanks
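The proposed separation (one empty line between entries) could look roughly like the sketch below. The class and method names are made up for this example, and the regex is one possible choice rather than the one that ended up in the PR. It treats a line break followed by at least one more line break (with optional spaces or tabs in between) as a separator, so a single line break inside a reference no longer splits it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ReferenceSplitterSketch {
    // Two or more consecutive line breaks (\n or \r\n), each optionally
    // followed by spaces or tabs. Bare \r line endings are not handled.
    private static final String BLANK_LINE_SEPARATOR = "(?:\\r?\\n[ \\t]*){2,}";

    static List<String> splitReferences(String input) {
        return Arrays.stream(input.split(BLANK_LINE_SEPARATOR))
                .map(String::trim)
                .filter(reference -> !reference.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String pasted = "S Kolb and G Wirtz.\n"
                + "Towards Application Portability in Platform as a Service.\n"
                + "\n"
                + "A second reference on a single line.";
        // Each reference is printed between brackets to show where it ends.
        for (String reference : splitReferences(pasted)) {
            System.out.println("[" + reference + "]");
        }
    }
}
```

By contrast, the earlier regex "[\\r\\n]+" splits at every line break, so a reference that was wrapped when copied would be cut into several entries.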
Co-Authored-By: obsluk00 <obsluk00@users.noreply.github.com>
Some minor improvements and switch to softwrap Co-Authored-By: obsluk0…
Ready for review again. Also the feature now separates at double empty lines only.
Perfect! Thanks a lot for the quick follow-up!
I think we are good to go. For a follow up, the
Thank you so much for working on this issue and keeping up the work after the exam phase! 🎉
@koppor The address is still pointing to tum and your alias is not working: https://github.com/JabRef/jabref/pull/5614/files#r381558452. Would be nice to fix this before the release.
We also want to thank you very much! It was a great experience for the whole team to participate in your project. It was very helpful that you pointed out so many important details that we didn't see. That helped us improve our skills as software engineers! :-) 🎉
@tobiasdiez Added it to JabRef#406
@koppor good, but can we also fix that before we release this weekend?!
Thank you for your support and feedback during the development of this feature! I got a response to my feature request at GROBID, asking for a service call that can process multiple requests at once: kermitt2/grobid#540 (comment). Indeed, the second use case is not what we are looking for. So what do you think @koppor @tobiasdiez?
This PR should solve #4826.
This JabRef extension is developed as part of a software engineering course.
Even though we write this feature mainly for our university course, we are willing to adjust our feature, so that JabRef can benefit from it.
The feature is now ready for review. It reintroduces the possibility to extract references from plain text using a custom GROBID server. This is implemented via a new SearchBasedFetcher.
One could work some more on the GROBID server (see NikodemKch/grobid@e89810b), but sadly we do not have time for that.