Text extraction from PDF to annotation not working #6011

Closed
funnym0nk3y opened this issue Feb 24, 2020 · 5 comments
Labels
status: stale · status: waiting-for-feedback (the submitter or other users need to provide more information about the issue)

Comments

@funnym0nk3y

JabRef version:
JabRef 5.0-beta.438--2020-02-20--5fa1dcf
Windows 10 10.0 amd64
Java 13.0.2

Hi there,

While using JabRef, I found a bug in the annotations feature. I highlighted several passages in my PDF file, and they appeared in the annotations tab as expected. Unfortunately, the text extracted from those annotations does not match the highlighted text: characters are missing throughout the marked area, and passages that were not marked are included. Copying the same text directly from Adobe Reader DC (right-click > Copy text) and pasting it into Notepad works fine.

Example:
While these results are excellent from an academic perspective, scaling up the overall cell device area is crucial for achieving practical utility for hybrid perovskite based thin-film solar cells. In this paper, we present a comprehensive study on the use of temperature-controlled doctor blading technique for the growth of large island, crystalline perovskite thin-films. Specifically, we elucidate the physical conditions such as substrate temperature, solution volume, and blade speed under ambient conditions that control the growth of large area perovskite thin-films with desired island size, thickness, uniformity and crystallinity. Using these doctor-bladed thin-films we fabricated devices of ∼1 cm2 area in air that yielded an average efficiency of 7.32% with negligible hysteresis in the current-voltage scans. Further improvements in

Displayed in JabRef:
ng up the overall rea is crucial for achieving practical utility for hybrid perovskite based thin-film solar cells.
r, we present a comprehensive study on the use of temperature-controlled doctor blading r the growth of large island

When exported via Acrobat Reader:
Specifically, we elucidate the
physical conditions such as substrate temperature, solution volume, and blade speed under ambient
conditions that control the growth of large area perovskite thin-films with desired island size, thickness,
uniformity and crystallinity

Regards,
funnym0nk3y

@Siedlerchr
Member

Could you attach the PDF or give us a link to it? Is this the only PDF where you noticed this, or do you experience it with others as well?

@Siedlerchr added the "status: waiting-for-feedback" label Feb 25, 2020
@funnym0nk3y
Author

I experienced this behavior with several PDFs.
Unfortunately I can't attach the PDF here because of copyright protection. But you can find the files here:
https://pubs.acs.org/doi/10.1021/la803646e
https://www.sciencedirect.com/science/article/pii/S2352940716300038?via%3Dihub
If you don't have access through your institution, I'll try to find some open-access examples.

Besides that, I noticed problems with some characters such as µ and °, with subscripts and superscripts, and with expressions like 4,23 x 10^2. But that is just an annoyance.

@smihael

smihael commented Mar 28, 2020

I can reproduce this with the following test.pdf file in a test.bib library.

I am using:
JabRef 5.0--2020-03-06--2e6f433
Linux 5.3.0-42-generic amd64
Java 13.0.2

PDF was generated using LibreOffice and annotations were added in Okular.

@github-actions
Contributor

This issue will be closed in 7 days due to inactivity 💤 Please provide the requested information if the problem persists.

@funnym0nk3y
Author

@smihael provided an example PDF; please update the status.

@github-actions github-actions bot closed this as completed May 6, 2020
koppor pushed a commit that referenced this issue Jul 1, 2022
3d3573c Update centre-de-recherche-sur-les-civilisations-de-l-asie-orientale.csl (#5988)
5de0fbe Update society-of-biblical-literature-fullnote-bibliography.csl (#5913)
04b6c7a Create revue-internationale-durbanisme.csl (#5974)
4a5bfe2 Update biological-reviews.csl (#6116)
957b2bc Update harvard-cite-them-right-no-et-al.csl (#6115)
e836a6c Update harvard-university-of-bath.csl (#6011)
b4a8dd7 Update and rename harvard-cite-them-right.csl to harvard-cite-them-ri… (#6113)
a198884 Update twentieth-century-music.csl (#6110)
81c1619 Update archaeonautica.csl (#5928)
fc46f1d Bump actions/cache from 2 to 3 (#6112)
fab57ed Bump actions/checkout from 2 to 3 (#6111)
519d594 [don't merge] chore: Included githubactions in the dependabot config (#6109)
a8aa898 Update universidade-estadual-de-alagoas-uneal-abnt.csl (#5915)
6191640 Update isnad-dipnotlu.csl (#5909)
d65a6ac Update isnad-metinici.csl (#5910)
830d337 Update technische-universitat-dresden-linguistik.csl (#6097)
81adc43 Update american-society-for-horticultural-science.csl (#6089)
b767623 Create south-african-law-journal.csl (#6092)
215e1e9 Create journal-of-lithic-studies.csl (#6080)
0740f8c Create eunomia-revista-en-cultura-de-la-legalidad.csl (#6095)
f93c809 Create endocrine-journal.csl (#6086)
3fdeb51 Revert "chore: Set permissions for GitHub actions (#6096)" (#6108)
35ebd1e chore: Set permissions for GitHub actions (#6096)
1cb8758 Create journal-fur-medienlinguistik (#6100)
f4b5f7f Update unified-style-sheet-for-linguistics.csl (#6098)
c3f856a Update advanced-materials.csl (#6103)
d1e7576 Bump diffy from 3.4.0 to 3.4.2 (#6107)
9e5e7ab Fix Dev Dynamics (#6099)
7234520 Add CSL style for the journal Developmental Dynamics (#6093)
ba8db05 Create independent style for vox-sanguinis.csl (#6085)
845dee0 Create meta.csl (#6088)
684bc3a Update universite-du-quebec-a-montreal.csl (#6087)
3602c18 Up-date & re-title pour-reussir/dionne (#6043)
0cc6e82 Fix Mainz Geography
cfc4cec Add DOI and fix printing author names in Population and Économie et statistique (#6079)
14e8b1d Update journal-of-neuroimaging.csl (#6084)
2c0e1f1 Update isnad-dipnotlu.csl (#6081)
02fdb9b Merge pull request #6082 from denismaier/patch-ube-muwi-note
9309378 removes default-locale

git-subtree-dir: buildres/csl/csl-styles
git-subtree-split: 3d3573c