Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes needed to be able to compile pdfminer.six with Cython #142

Merged
merged 5 commits into from
Jul 14, 2018

Conversation

mawoqiw
Copy link
Contributor

@mawoqiw mawoqiw commented Apr 11, 2018

Per comment in the original project euske/pdfminer#140 compiling PDFMiner with Cython leads to substantial speed improvements. Submitted fixes are necessary to be able to compile pdfminer.six, otherwise compiler fails.

@timb07
Copy link
Contributor

timb07 commented Apr 16, 2018

I'm quite interested in this. I've made some other improvements to speed up processing for certain inputs (#133, #141), but my next target for speeding up is psparser.py, and I haven't found any obvious improvements to make there. However, compiling just that file with Cython does give a small speed increase.

@@ -15,6 +15,8 @@

import six #Python 2+3 compatibility

import unittest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this needed for? Was it included by mistake?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be moved in the main() and/or the main removed entirely and moved to some testing script outside of the main library code

setup.py Outdated
@@ -3,7 +3,7 @@

import pdfminer as package

requires = ['six', 'pycryptodome']
requires = ['six', 'pycryptodome', 'pygame']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, what is pygame needed for?

Copy link
Contributor Author

@mawoqiw mawoqiw Apr 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function from unitest is called here without an import.
pygame is not listed among requirements but is called here and here

So far I haven't found much space for improvement in psparser.py, other than this, either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mawoqiw I think we should not add pygame as a dep... instead it should be removed. When was this introduced?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This main() should be removed entirely and moved to some testing script outside of the main library code and we cannot add pygame as a runtime dep

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right @pombredanne. pygame is indeed not required to use Cython, thanks.
With regards to main() function I would rather deal with it in a different merge request. Do you have any idea what is the purpose of that code?

@mawoqiw
Copy link
Contributor Author

mawoqiw commented Jun 20, 2018

One of the tests is using image.save method from that library. I guess it could be replaced by something else to get rid of that dependency.

@tataganesh tataganesh merged commit d437e5c into pdfminer:master Jul 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants