I don't understand how to interpret the result from getTextContent #8096
Comments
I need this information too. I need to know the exact location of text in the PDF document.
See the comments in the https://github.com/mozilla/pdf.js/blob/master/examples/text-only/pdf2svg.js example.
By pageLoaded, it doesn't mean the whole PDF has to be rendered, right? My use case is that I want to get text from a range of pages while I am on the first page of the PDF, for example. Will the example below work for me? Thanks.
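A minimal sketch of reading text from a range of pages without rendering any of them. It assumes `pdfDocument` is the document object that pdf.js resolves from `pdfjsLib.getDocument(...).promise`; the function name `getTextForPageRange` and the shape of the returned objects are made up for illustration. `getPage` and `getTextContent` do not depend on which page is currently displayed.

```javascript
// Sketch: collect the text of pages first..last (1-based, inclusive).
// `pdfDocument` is assumed to be a pdf.js document object, for example the
// value resolved by `pdfjsLib.getDocument(url).promise`.
async function getTextForPageRange(pdfDocument, first, last) {
  const result = [];
  for (let pageNumber = first; pageNumber <= last; pageNumber++) {
    // Loading a page and extracting its text does not call page.render().
    const page = await pdfDocument.getPage(pageNumber);
    const textContent = await page.getTextContent();
    const text = textContent.items
      // Skip any item that carries no string.
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" ");
    result.push({ pageNumber, text });
  }
  return result;
}

// Hypothetical usage:
// const pdf = await pdfjsLib.getDocument("file.pdf").promise;
// const pages = await getTextForPageRange(pdf, 2, 4);
```

Joining the strings with a single space is a simplification; the items carry position data that a real layout-aware extraction would use instead.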
Me too. Have you figured this out? I assume each of these elements in this example What does each of the array elements mean?
Closing since the height and width calculation was wrong. This has been fixed in #10508.
I am using pdf.js to understand the text layout of PDF documents. However, I am having trouble understanding the information returned by getTextContent. Sometimes it appears that I have to scale the height of each item by the vertical scale in the transform, i.e., transform[3]. Other times I don't. I have no idea how to determine when I have to and when I don't. Below are two examples from different PDF documents. In the first, I should not scale the height. In the second, I should. Does anyone know how I can figure this out? (This is also posted on Stack Overflow.)
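One way to sidestep the question is to derive the glyph height from the item's transform rather than from its height field, in the same spirit as the pdf2svg.js example referenced in the comments. The sketch below assumes each item has a six-element transform `[a, b, c, d, e, f]` and that the viewport exposes a similar six-element `transform`; the helper names `multiplyTransforms` and `describeTextItem` are hypothetical, and the matrix product is written out by hand so the snippet runs without pdf.js.

```javascript
// Multiply two 2D affine matrices given as [a, b, c, d, e, f].
// The result applies m2 first, then m1.
function multiplyTransforms(m1, m2) {
  return [
    m1[0] * m2[0] + m1[2] * m2[1],
    m1[1] * m2[0] + m1[3] * m2[1],
    m1[0] * m2[2] + m1[2] * m2[3],
    m1[1] * m2[2] + m1[3] * m2[3],
    m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
    m1[1] * m2[4] + m1[3] * m2[5] + m1[5],
  ];
}

// Sketch: position, font height and rotation of a text item in viewport
// coordinates. `viewportTransform` is assumed to be `viewport.transform`
// from `page.getViewport({ scale })`.
function describeTextItem(item, viewportTransform) {
  const tx = multiplyTransforms(viewportTransform, item.transform);
  return {
    x: tx[4],                              // baseline origin, viewport x
    y: tx[5],                              // baseline origin, viewport y
    fontHeight: Math.hypot(tx[2], tx[3]),  // length of the vertical axis
    angle: Math.atan2(tx[1], tx[0]),       // rotation in radians
  };
}

// Example with made-up numbers: 12-unit text at (100, 700) on a page that
// is 792 units tall, viewed at scale 1 (the y axis is flipped).
const info = describeTextItem(
  { str: "Hello", transform: [12, 0, 0, 12, 100, 700] },
  [1, 0, 0, -1, 0, 792]
);
console.log(info.x, info.y, info.fontHeight);
```

Because the height comes from the matrix itself, this does not depend on whether the reported height has already been scaled, which is the ambiguity the question describes.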