Implement more pdf importers #7947
Implemented an Importer that queries Grobid for the metadata of a PDF. The necessary Grobid functionality (retrieving BibTeX for a PDF) is not yet available in Grobid, but we opened a PR that implements it (kermitt2/grobid#800).
It's no longer necessary to set the POST data by bytes as we use JSoup for that.
Grobid cannot predict a citationkey
Co-authored-by: Christoph <siedlerkiller@gmail.com>
Users can perform a PDF import on already imported PDFs to improve the quality of the entry
When importing, first try the importers that can tell whether they are suitable for a certain file format. Some importers only check whether a file is present, not whether it is in the correct format (isRecognizedFormat is always true if an existing file is given). They are used last. The list of importers now reflects that prioritization. It is no longer sorted by importer name. The getter methods getImportFormats and getImportFormatList still sort the list by name for the View.
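The prioritization described in this commit message can be illustrated with a minimal sketch. It is not JabRef's actual code: `SketchImporter`, `firstMatch`, `namesSortedForView`, and the three example importers are hypothetical names made up for illustration. Only the idea is taken from the message above: importers that really check the format come first, importers whose check always succeeds come last, and a name-sorted copy is produced separately for the view.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ImporterPrioritySketch {

    /** Hypothetical stand-in for an importer; JabRef's real Importer class differs. */
    static final class SketchImporter {
        final String name;
        final Predicate<String> recognizer;

        SketchImporter(String name, Predicate<String> recognizer) {
            this.name = name;
            this.recognizer = recognizer;
        }

        boolean isRecognizedFormat(String content) {
            return recognizer.test(content);
        }
    }

    // Priority order, not alphabetical: importers with a real format check
    // come first, the importer that accepts everything comes last.
    static final List<SketchImporter> IMPORTERS = List.of(
            new SketchImporter("RIS", content -> content.startsWith("TY  -")),
            new SketchImporter("BibTeX", content -> content.contains("@")),
            new SketchImporter("CatchAll", content -> true));

    /** Returns the name of the first importer, in priority order, that accepts the content. */
    static Optional<String> firstMatch(String content) {
        return IMPORTERS.stream()
                .filter(importer -> importer.isRecognizedFormat(content))
                .map(importer -> importer.name)
                .findFirst();
    }

    /** Returns a name-sorted copy for display; the priority list stays untouched. */
    static List<String> namesSortedForView() {
        List<String> names = new ArrayList<>();
        for (SketchImporter importer : IMPORTERS) {
            names.add(importer.name);
        }
        names.sort(Comparator.naturalOrder());
        return names;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("TY  - JOUR"));
        System.out.println(firstMatch("plain text"));
        System.out.println(namesSortedForView());
    }
}
```

If the catch-all importer were listed first, it would claim every file and the format-specific importers would never run, which is why the order of the list matters and why sorting by name is confined to the view getters.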
you seem to have an error:
I wouldn't make this PR more complex than it already is. We can probably discuss this at JabCon. Let's just merge it and revisit it after that.
It's just changing a number here. Nothing complex about it. We just need to decide if we want that behaviour or not.
Adapted them to the new Grobid output. Should be fine now.
* upstream/main: (110 commits)
  Extract PushTo names into model (#8005)
  Refactor processCitation in GrobidService to match processPdf (#8003)
  Improved progress indication for fulltext-index operations (#7981)
  Reordered Pdf-Importer priorities (#8001)
  Implement more pdf importers (#7947)
  Adding icon picker for group dialog issue#6142 (#7776)
  Fix possible NPE in exporter with empty charset (#7979)
  Fix icon color (#7994)
  Bump slf4j-api from 2.0.0-alpha2 to 2.0.0-alpha4 (#7991)
  Bump classgraph from 4.8.112 to 4.8.114 (#7990)
  Bump mariadb-java-client from 2.7.3 to 2.7.4 (#7992)
  Bump jsoup from 1.14.1 to 1.14.2 (#7993)
  New yaml issue template (#7983)
  [Bot] Update CSL styles (#7985)
  Reordered items in main table right-click menu (#7952)
  Fulltext Index: Only index local pdf files (#7980)
  Bump WyriHaximus/github-action-wait-for-status from 1.3 to 1.4 (#7973)
  Bump byte-buddy-parent from 1.11.9 to 1.11.12 (#7974)
  Bump classgraph from 4.8.110 to 4.8.112 (#7975)
  Bump checkstyle from 8.45 to 8.45.1 (#7978)
  ...
# Conflicts:
#	src/main/java/module-info.java
This PR aims to implement more PDF importers.
Currently, PDFs can be imported using the PdfContentImporter, which is tailored to some IEEE and Springer formats. We want to add:
The PdfMergeMetadataImporter will be used when users import PDFs into JabRef.
Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)