Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fulltext fetcher for IACR eprints #9651

Merged
merged 8 commits into from
Mar 6, 2023
Merged

Conversation

SECtim
Copy link
Contributor

@SECtim SECtim commented Mar 6, 2023

Adds fulltext fetching for IACR eprint PDFs.

My Java skills are a bit rusty, and while the code works, I'm happy to make additional changes if there are more elegant ways to do things.

I am unsure about whether to add anything to the developer docs on fetchers, as they seem to only cover a subset of all fetchers anyway (so far, the IACR fetcher isn't mentioned at all).

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
    • (not applicable)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.
    • There does not seem to be fetcher-specific documentation for users (regarding fulltext fetching)

@SECtim
Copy link
Contributor Author

SECtim commented Mar 6, 2023

The failing fetcher tests seem unrelated to my changes. Will take a closer look later.

Siedlerchr
Siedlerchr previously approved these changes Mar 6, 2023
Copy link
Member

@Siedlerchr Siedlerchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

codewise lgtm! Thanks for the addition.

if (urlField.isPresent()) {
String descriptiveHtml = getHtml(urlField.get());
String startOfFulltextLink = "<a class=\"btn btn-sm btn-outline-dark\"";
String fulltextLinkAsInHtml = getRequiredValueBetween(startOfFulltextLink, ".pdf", descriptiveHtml);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, one could use JSOUP here https://jsoup.org/ (see e.g. SemanticScholar) to operate directly on the HTML dom elements, but as this getValueBeteween is already used here it's fine

@Siedlerchr
Copy link
Member

@SECtim you can ignore the unrelated fetcher tests. Sometimes they fail just on GitHub if they are executed to often or Github IPs are blocked

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Mar 6, 2023
@Siedlerchr
Copy link
Member

Just did a small simplification of the tests, it's easier to directly compare the Optionals

@Siedlerchr Siedlerchr merged commit 85ab410 into JabRef:main Mar 6, 2023
Siedlerchr added a commit that referenced this pull request Mar 14, 2023
…rg.mariadb.jdbc-mariadb-java-client-3.1.0

* upstream/main: (357 commits)
  Fix syntax
  Add experimental Fetcher for Bibliotheksverbund Bayern with MarcXML parser (#9641)
  Update guidelines-for-setting-up-a-local-workspace.md
  Update guidelines-for-setting-up-a-local-workspace.md
  Bump org.tinylog:slf4j-tinylog from 2.6.0 to 2.6.1 (#9665)
  Bump apple-actions/import-codesign-certs from 1 to 2 (#9662)
  Bump com.puppycrawl.tools:checkstyle from 10.8.0 to 10.8.1 (#9661)
  Bump gittools/actions from 0.9.15 to 0.10.2 (#9663)
  Bump hmarr/auto-approve-action from 3.1.0 to 3.2.0 (#9664)
  Bump io.github.classgraph:classgraph from 4.8.156 to 4.8.157 (#9666)
  Bump org.tinylog:tinylog-api from 2.6.0 to 2.6.1 (#9667)
  Add option to open arks in the browser from an ark identifier (#9601)
  remove "jdk 19 does not work" (#9658)
  Fulltext fetcher for IACR eprints (#9651)
  Observable Preferences S (#9619)
  Issue 9646: Right-click context menu "Attach file from URL" (#9648)
  Improve the INSPIREFetcher in "Update with bibliographic information from the web" (#9645)
  Bump appleboy/ssh-action from 0.1.7 to 0.1.8 (#9653)
  Bump com.fasterxml.jackson.datatype:jackson-datatype-jsr310 (#9656)
  Bump com.puppycrawl.tools:checkstyle from 10.7.0 to 10.8.0 (#9655)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants