Error on Chinese Characters text fetching!!! any idea to resolve it? #28

marcusau · 2021-03-22T10:36:31Z

Hi, I am using this module to extract text (a mix of English and Chinese) from PDF files, and I get the errors shown below. Here is my code:

from pyxpdf import Document

# pdf_file is the path to the PDF being searched
with open(pdf_file, 'rb') as fp:
    doc = Document(fp)
    for page in doc:
        # Look for the term inside the given search box on each page
        res_box = page.find_text('Cornerstone', search_box=[0, 0, 400, 400], case_sensitive=True)
        if res_box:
            print(page.label, res_box)

Output:

Syntax Error: Unknown character collection 'Adobe-CNS1'
278 (406.8096, 94.85200000000002, 465.46160000000003, 104.47700000000002)
Syntax Error: Unknown character collection 'Adobe-CNS1'
279 (69.6101, 103.50040000000014, 106.93410000000002, 109.62540000000014)
280 (230.7095, 348.65500000000003, 284.4775, 358.28000000000003)
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'
Syntax Error: Unknown character collection 'Adobe-CNS1'

mlove4u · 2021-10-01T06:42:44Z

@marcusau
Try installing this package:

pip install pyxpdf_data

The CJK encoding files are needed.
See: https://www.xpdfreader.com/download.html
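
To confirm that the data package is visible to the interpreter that runs your script, a quick check like the following may help. It is only a sketch using the standard library: `has_module` and `cjk_data_hint` are hypothetical helper names, and it assumes the importable module is named `pyxpdf_data`, the same as the pip package.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def cjk_data_hint(module_name: str = "pyxpdf_data") -> str:
    """Describe whether the (assumed) data module is installed."""
    if has_module(module_name):
        return f"{module_name} is installed"
    # Assumption: the pip package name matches the module name.
    return f"{module_name} not found; try: pip install {module_name}"


print(cjk_data_hint())
```

If this reports that the module is missing even after installing it, the package probably went into a different Python environment than the one running the script.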
