Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to parse new references from plain text using GROBID service [solving #4826] #5614

Merged
merged 9 commits into from
Feb 20, 2020

Conversation

NikodemKch
Copy link
Contributor

@NikodemKch NikodemKch commented Nov 15, 2019

This PR should solve #4826.
This JabRef extension is developed as part of a software engineering course.
Even though we write this feature mainly for our university course, we are willing to adjust our feature, so that JabRef can benefit from it.

The feature is now ready for review. It reintroduces the possibility to extract references from plain text using a custom GROBID server. This is implemented via a new SearchBasedFetcher.

One could work some more on the GROBID server (See : NikodemKch/grobid@e89810b), but sadly we do not have time for that.


  • Change in CHANGELOG.md described
  • Tests created for changes
  • Manually tested changed features in running JabRef
  • Screenshots added in PR description (for bigger UI changes)
  • Ensured that the git commit message is a good one

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for working on this. This is a long awaited feature.

Working on such a huge code base with a community where the developers are working more than five years on the product is hard. Thank you for taking the challenge.

We all had hard discussions within the team on user experience, code design, ... Now you get all these discussions (results) in a shorter time. Please take it as challenge to level-up your coding skills. Both in reading existing code and writing new code.

That said, some general comments:

JabRef 5.0 tries to work with less dialogs. Thus, I propose to remove following dialogs: (Refs MockUp.pdf)

  • Remove "Diplay Success Dialog". Just do a notificaton. Refs Fix 5555 status popups #5560.
  • Remove step "Display BibEntry". This is done by the JabRef EntryEditor. Just select the found entries. If you find that uncomfortable, add a feature to add them to the group "new". (I assume that this feature does not yet exist. Should be a one-liner)
  • Remove "Display Error" --> not able to parse should be logged (maybe popup). Should be automatically handled in the fetcher architecture (?)

Please adapt the use cases accordingly.

I would Add /P50/: There is fetching logic in JabRef. Such as org.jabref.logic.importer.SearchBasedParserFetcher (https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/importer/SearchBasedFetcher.java). That should be the interface the functionality builds on. (Especially List<BibEntry> performSearch(String query) throws FetcherException. One can all exception handling is done in the "framework" Example: https://docs.jabref.org/import-export/import-using-online-bibliographic-database/gvk).

Can you add a rendered version of "ProjectPlan.gan" or add a hint to open the "sweng" project? - I am reading it with GitHub at https://github.com/NikodemKch/jabref-1/tree/milestone1/docs/sweng.

There is no need to touch the submodules (just do not commit these changes - or do "git submodule update")

Please update the existing feature:

grafik

(Maybe also put a button in the menu bar
grafik
)

grafik

(Maybe switch the two buttons in the existing implementation - the go-on-button should always be right. And NEVER label "Cancel". Label it "Return to Library" so that the user knows what happens)

Result (opened in the entry editor)

grafik

You see, this is not usable. The code at https://github.com/NikodemKch/jabref-1/blob/develop/src/main/java/org/jabref/gui/bibtexextractor/BibtexExtractor.java is wrong. Moreover, it is placed in the wrong module (gui is the wrong package; it should be logic). Think, we overlooked it in the review of #4985 - and the continuation at #5206

These dialogues can be a good start.

My recommendation would be to look at org.jabref.logic.importer.fetcher.DBLPFetcher how a searched-based fetcher is implemented. A team should implement it. Another team should implement test cases.

Side comment: Did you hear about remote mob programming? Maybe, this helps to work on something. We also could do such a session the next days. (Maybe 30 mins).

.idea/runConfigurations/JabRef_Main.xml Outdated Show resolved Hide resolved
build.gradle Outdated Show resolved Hide resolved
build.gradle Outdated Show resolved Hide resolved
build.gradle Outdated Show resolved Hide resolved
docs/sweng/Glossary.md Outdated Show resolved Hide resolved
src/main/java/org/jabref/gui/actions/StandardActions.java Outdated Show resolved Hide resolved
src/main/java/org/jabref/logic/net/HttpPostService.java Outdated Show resolved Hide resolved
@koppor
Copy link
Member

koppor commented Nov 18, 2019

Note: In #5628 I tried to bring-back the old menu item (as it exists in the 4.x versions of JabRef)

@zGatsu
Copy link
Contributor

zGatsu commented Nov 18, 2019

Quick statement about the points you mentioned:

  • The documentation you read is not up to date especially in terms of the dialogue-popups. On branch milestone3 we already reused the existing pop-up window and added the entries automatically after parsing. Moreover, since the user might want to add multiple entries at once, we inserted a checkbox, which enables the user to create a new library tab, so that possibly wrong parsed references are not immediately in his (the current) library.
    Screenshot:
    Ceckbox
    (We will change the buttons soon).

  • We changed our mind and do not intend to implement the conversion function (.tei to .bib) by ourselves anymore. (After seeing this method and this method.)
    The reference text is parsed by the grobid server which already has the needed function. We just change the function to return the processed citation in bibtex format rather than tei format. (Here) That's why we need a custom GROBID server. (We will link this server in this PR)

  • We already implemented a checkbox to use a custom grobid server ip( preferences -> advanced tab), if unchecked the default value is selected. Probably we should set that to http://cloud.science-miner.com/grobid/ , but the problem is, that this server seems to be constantly overwhelmed by the number of incoming requests. That's why we hosted the server on localhost for testing purposes. You can find a manual how to start that server here. (It uses gradle.)
    Screenshot:
    grobid

  • We will try to change our package structure, button placing as well as used dependencies according to your suggestions this week. We will also try to update the documentation.

@tobiasdiez
Copy link
Member

Concerning:

(Maybe switch the two buttons in the existing implementation - the go-on-button should always be right. And NEVER label "Cancel". Label it "Return to Library" so that the user knows what happens)

That depends on the OS. You can add the correct type and then JavaFX will put them in the right order automatically. (And "Cancel" is pretty ok)

@koppor
Copy link
Member

koppor commented Nov 18, 2019

Please merge #5628 into your branch. Then I will close that PR.

@koppor
Copy link
Member

koppor commented Nov 18, 2019

👍 - I would make the checkbox an additional button.

"Insert into current library" and "Insert into new library"

NikodemKch pushed a commit to NikodemKch/jabref-1 that referenced this pull request Nov 18, 2019
NikodemKch pushed a commit to NikodemKch/jabref-1 that referenced this pull request Nov 18, 2019
@NikodemKch
Copy link
Contributor Author

Here is the link to the custom GROBID server: https://github.com/NikodemKch/grobid

@tobiasdiez
Copy link
Member

@koppor do we have the resources to host the grobid server? I don't think many people have the knowledge to install and setup gropid on localhost...

@koppor
Copy link
Member

koppor commented Nov 19, 2019

@tobiasdiez There are options. Maybe, we "just" limit the request rates, "just" limit the requests per user per hour or simply charge for it.

@koppor
Copy link
Member

koppor commented Nov 22, 2019

We will check options how to host and maybe host some public service in beta.

@guenesaydin
Copy link
Contributor

We finished our work for this week. (except unit-test and documentation)
We added a new button in the toolbar:
2019-11-22

After pressing this button (or the corresponding context menu button) this window appears:
2019-11-22 (2)

While waiting for the server response, the following dialog is displayed:
2019-11-22 (3)

After successfully parsing, the entries are created and one of them is displayed inside the EntryEditor:
2019-11-22 (5)

When Grobid fails to parse a String, the following message is displayed: (This is also the case, when some entries are parsed successfully and some are not.)
2019-11-22 (7)

The user must separate the entries with double semicolon (;;) (we had no better idea).

The still open to do's are:

  • Write the unit tests
  • Update the documentation
  • Add localization entries
  • Work on the GROBID server

Also we intergrated our code better into the existing code base. It would be nice to get some feedback on that.

NOTE: The GROBID server does not work right with Java 13. We used Java 11 for the server but recommended is Java 8 (As stated here)(The server had some NullPointer exceptions for no reason sometimes).

@obsluk
Copy link
Contributor

obsluk commented Nov 25, 2019

Unit tests have now been implemented.
The documentation was updated to match the current development status and to track changes in our timeline and planning.

@LinusDietz
Copy link
Member

Hey @NikodemKch, @obsluk00, @marcelluethi!
We want to help you by deploying a grobid server. I hope you understand that it is important that the server is maintained by the @JabRef/developers team. I've already set up the default grobid server, but what I read is that you want to customize the server? Can you point me to a repo from which I can deploy you custom server?

@NikodemKch
Copy link
Contributor Author

Hey @LinusDietz
The repo is here.
The only difference to the original server is the following: NikodemKch/grobid@e89810b
We still have to look at the requested changes.
Thank you for deploying the server.

@LinusDietz
Copy link
Member

Hey @LinusDietz
The repo is here.
The only difference to the original server is the following: NikodemKch/grobid@e89810b
We still have to look at the requested changes.
Thank you for deploying the server.

So, I do have the server running. It might be a bit slow, but feel free to test it and report your findings to me. (potentially via Slack, I've invited @NikodemKch )

http://grobid.cm.in.tum.de:8070/

@tobiasdiez
Copy link
Member

What's the current status here? Is this already ready for review, or do you need input on certain parts?

@NikodemKch
Copy link
Contributor Author

Today we will fix the requested changes and try to fix the Travis pipeline. Then, we will open the PR for review. Sadly, we will probably not find the time to change the GROBID server as requested, as the examination phase is kicking in (First exam on December 20th)...

@NikodemKch NikodemKch marked this pull request as ready for review December 6, 2019 09:57
@NikodemKch
Copy link
Contributor Author

NikodemKch commented Dec 6, 2019

This PR is now ready for review. Sadly the provided server (http://grobid.cm.in.tum.de:8070/) seems to not work properly (I cant access the page even in a browser, it says the server took too long to respond). The feature was tested with a locally hosted server.

@NikodemKch NikodemKch changed the title [WIP][solving #4826] Add option to parse new references from plain text using GROBID service [solving #4826] Add option to parse new references from plain text using GROBID service Dec 6, 2019
@LinusDietz
Copy link
Member

LinusDietz commented Dec 6, 2019

Sadly the provided server (http://grobid.cm.in.tum.de:8070/) seems to not work properly (I cant access the page even in a browser, it says the server took too long to respond).

Sorry! Indeed the server is running, but the port in question is unreachable from the internet. I'll have a look.

@NikodemKch
Copy link
Contributor Author

Thanks a lot for the very good revision. I've only a couple of minor remarks left before we can merge. Would be awesome if you could address them in a timely manner so that your PR still makes it into the 5.0 release in two weeks.

@koppor is it possible to create the domain grobid.jabref.org as an alias for @LinusDietz grobid server? In this way we can change the server in the future without changing the url and thus without the need of a new release.

I have resolved all these issues and am re-requesting your review.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small code nitpicks. Is it possible to fix it soon?

* Implements an API to a GROBID server, as described at
* https://grobid.readthedocs.io/en/latest/Grobid-service/#grobid-web-services
* <p>
* Note: Currently a custom GROBID server is used...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ensure that you send an Accept header here (can be done in a follow-up pull request)

@koppor
Copy link
Member

koppor commented Feb 19, 2020

grafik

grafik

Pasting

Kolb, S., Wirtz G.: Towards Application Portability in Platform as a Service
Proceedings of the 8th IEEE International Symposium on Service-Oriented System Engineering (SOSE), Oxford, United Kingdom, April 7 - 10, 2014.

from https://www.uni-bamberg.de/pi/team/kolb-stefan/

grafik

Result:

@Misc{Kolb,
  author = {S Kolb and G Wirtz},
  title  = {Towards Application Portability in Platform as a Service},
}

Result is so, so. However, user experience is OK. The only thing is the textbox. Can it be soft-wrapped?

public class GrobidCitationFetcher implements SearchBasedFetcher {

private static final Logger LOGGER = LoggerFactory.getLogger(GrobidCitationFetcher.class);
private static final String GROBID_URL = "http://grobid.cm.in.tum.de:8070";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change to http://grobid.jabref.org:8070

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This address is not working for some reason...

@obsluk
Copy link
Contributor

obsluk commented Feb 19, 2020

grafik

grafik

Pasting

Kolb, S., Wirtz G.: Towards Application Portability in Platform as a Service
Proceedings of the 8th IEEE International Symposium on Service-Oriented System Engineering (SOSE), Oxford, United Kingdom, April 7 - 10, 2014.

from https://www.uni-bamberg.de/pi/team/kolb-stefan/

grafik

Result:

@Misc{Kolb,
  author = {S Kolb and G Wirtz},
  title  = {Towards Application Portability in Platform as a Service},
}

Result is so, so. However, user experience is OK. The only thing is the textbox. Can it be soft-wrapped?

We made the conscious design choice to go without softwrapping user input to improve quality of user experience when passing multiple entries.
But changing it isn't a problem and we should be able implement softwrapping together with the other requested changes until the weekend.

@NikodemKch
Copy link
Contributor Author

NikodemKch commented Feb 19, 2020

I see the problem. We decided to split entries at line breaks (regex: "[\\r\\n]+")
Without the line break you copied over from the website, the result is quite better:

@Article{KolbJune27July22015,
  author    = {S Kolb and J Lenhard and G Wirtz},
  year      = {June 27 - July 2, 2015},
  address   = {New York, NY, USA},
  booktitle = {Application Migration Effort in the Cloud - The Case of Cloud Platforms Proceedings of the 8th IEEE International Conference on Cloud Computing (CLOUD)},
}

Softwrap makes sense, since it (mentally) prevents the user from adding more breaks after he pasted his references.

We thought using line breaks would be most convenient, but we can easily change this to double line breaks or even something else, so what do you think? @koppor

@tobiasdiez
Copy link
Member

As the example shows it would be indeed good to change the item separation to two lines (so that one empty line needs to be between the entries). Could you also add this as a short comment in the prompt text "Please enter the plain text references ..." to make it easier for users to discover the feature. Thanks

NikodemKch and others added 3 commits February 19, 2020 21:34
Co-Authored-By: obsluk00 <obsluk00@users.noreply.github.com>
Some minor imrovements and switch to softwrap Co-Authored-By: obsluk0…
@NikodemKch
Copy link
Contributor Author

NikodemKch commented Feb 19, 2020

Ready for review again.
See comments in open issues above.

Also the feature now separates at double empty lines only.
And this is displayed to the user. (Since the commit below)

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect! Thanks a lot for the quick follow-up!

@koppor
Copy link
Member

koppor commented Feb 20, 2020

Think, we are good to go.

For a follow up, the Accept header should be set. I created a follow-up issue at JabRef#406.

Thank you so much for working on this issue and keeping up the work after the exam phase! 🎉

@koppor koppor merged commit 5fa1dcf into JabRef:master Feb 20, 2020
@koppor koppor deleted the develop branch February 20, 2020 12:45
@tobiasdiez
Copy link
Member

@koppor The address is still pointing to tum and your alias is not working: https://github.com/JabRef/jabref/pull/5614/files#r381558452. Would be nice to fix this before the release.

@zGatsu
Copy link
Contributor

zGatsu commented Feb 20, 2020

We also want to thank you very much! It was a great experience for the whole team to participate on your project. It was very helpful that you mentioned so many import details that we didn't see. That helped us improving our skills to become a good software engineer! :-) 🎉

@koppor
Copy link
Member

koppor commented Feb 20, 2020

@tobiasdiez Added it to JabRef#406

@tobiasdiez
Copy link
Member

@koppor good, but can we also fix that before we release this weekend?!

@NikodemKch
Copy link
Contributor Author

Thank you for your support and feedback during the development of this feature!

I got response for my feature request at Grobid, asking for a service call that can process multiple requests at once. kermitt2/grobid#540 (comment)

Indeed, the second use case is not what we are looking for.
As I understand, I as caller would have the choice between sending everything in one request with little slower processing, or creating and collecting multiple async. threads but slightly faster processing.

So what do you think @koppor @tobiasdiez ?

Siedlerchr added a commit that referenced this pull request Mar 6, 2020
* upstream/master:
  Add option to parse new references from plain text using GROBID… (#5614)
  update jlink plugin and gradle to 6.2 (#5964)
  Remove background command line window (#5965)
koppor pushed a commit that referenced this pull request Oct 1, 2021
3b00357 Update urad-rs-za-makroekonomske-analize-in-razvoj.csl (#5639)
fcf6625 updated styles (#5638)
0df0633 Update universite-du-quebec-a-montreal-etudes-litteraires-et-semiolog… (#5614)
eaddf8e Human Molecular Genetics (#5635)
0afd7fb Update harvard-cardiff-university.csl (#5623)
f424672 Switch ISQ to Chicago author-date
ccb7184 Update chicago-author-date.csl (#5605)
a408957 SBL: implementation of book reviews (#5613)
fbbe7b3 Update harvard-swinburne-university-of-technology.csl to meet published spec (#5627)
8d7a0d8 Add UNESCO IIEP style (#5631)

git-subtree-dir: buildres/csl/csl-styles
git-subtree-split: 3b00357
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants