Incorrect parsing of filenames in file specifications #152

davidtr1037 · 2018-06-09T15:43:20Z

There is an issue with parsing the filenames in file specifications.
For example, when I run:

dumppdf.py -E /tmp/ sample.pdf

I get:

Traceback (most recent call last):
  File "./build/scripts-2.7/dumppdf.py", line 275, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "./build/scripts-2.7/dumppdf.py", line 272, in main
    dumpall=dumpall, codec=codec, extractdir=extractdir)
  File "./build/scripts-2.7/dumppdf.py", line 200, in extractembedded
    extract1(obj)
  File "./build/scripts-2.7/dumppdf.py", line 173, in extract1
    filename = os.path.basename(obj['UF'] or obj['F'])
KeyError: 'UF'
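The `or` in the failing line never gets a chance to fall back to `F`: `obj['UF']` is evaluated first and raises as soon as the key is absent. A minimal reproduction, assuming a plain Python dict stands in for the resolved file specification and using a hypothetical spec that has `/F` but no `/UF`:

```python
# A plain dict stands in for the resolved PDF file specification dictionary
# (an assumption for illustration).
filespec = {'F': 'attachment.txt'}  # hypothetical spec with /F but no /UF

try:
    # Subscripting is evaluated first, so the `or` fallback is never reached.
    filename = filespec['UF'] or filespec['F']
except KeyError as exc:
    missing_key = exc.args[0]
    print('KeyError:', missing_key)  # prints "KeyError: UF"
```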

Possible fix:

filename = obj.get('UF') or obj.get('F') or <some_default_string>
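The suggested fix can be sketched as a small helper. This is only an illustration, not the code merged in #338: the function name `get_embedded_filename` and the fallback `"embedded_file"` are hypothetical, since the issue leaves the default string unspecified.

```python
import os

# Hypothetical fallback name; the issue leaves <some_default_string> open.
DEFAULT_FILENAME = "embedded_file"


def get_embedded_filename(filespec, default=DEFAULT_FILENAME):
    """Return a base filename from a file specification dictionary.

    /UF is preferred when present and non-empty, /F is the fallback, and
    `default` is used when neither is available. Using .get() avoids the
    KeyError that filespec['UF'] raises when the key is missing.
    """
    name = filespec.get('UF') or filespec.get('F') or default
    # os.path.basename accepts both str and bytes, so PDF strings that
    # arrive as bytes are passed through unchanged.
    return os.path.basename(name)
```

With this helper, `get_embedded_filename({'F': 'dir/report.txt'})` yields `'report.txt'`, and a specification with neither key yields the fallback instead of crashing.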

The PDF is attached:
7c127eb6889074efbfac63d35ba0b69cfae22d56bfa2755ef0e925b6c032c4b0.pdf


pietermarsman · 2019-11-17T16:11:41Z

Thanks for your issue! I've created #338 to fix this. Do you have more PDFs with embedded images? Could you check whether the code in the PR works for those?

pietermarsman added the type: bug label Oct 13, 2019

pietermarsman mentioned this issue Nov 17, 2019

Fix getting filename when extracting embedded files #338 (Merged)

pietermarsman closed this as completed in #338 Jan 16, 2020
