Investigate using PyPy as default python compiler instead of CPython #1728

Open
david-ragazzi opened this issue Jan 7, 2015 · 10 comments

@david-ragazzi
Contributor

Recently I was reading articles explaining the differences between the many Python implementations (http://www.toptal.com/python/why-are-there-so-many-pythons, for example), and one of these projects caught my interest.

During my research I read several good things about PyPy, which is an alternative implementation of Python built around a JIT (just-in-time) compiler. What really interested me is that it can run Python code much faster than CPython (the reference, default Python implementation) with few changes to existing code (http://pypy.org/compat.html).

Just look at this speed comparison:
http://speed.pypy.org/

According to these benchmarks, the current PyPy version runs code 6.90x faster on average than CPython. @breznak also mentioned on the mailing list that a colleague of his reported that his text processing project ran 10x faster on PyPy than on CPython.

@scottpurdy suggested we could try it with a simple loop that feeds an instance of the CLA classifier. We could then compare speeds for the Python and C++ implementations on both CPython and PyPy.
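A minimal sketch of such a timing loop, assuming the same script is simply run once with `python` and once with `pypy`. `hypothetical_classifier_step` is a placeholder written for illustration, not NuPIC's classifier API; the real Python and C++ classifier calls would replace its body. The synthetic input and the parameter values are also assumptions.

```python
import platform
import timeit


def hypothetical_classifier_step(record):
    """Hypothetical stand-in for one call into the CLA classifier.

    This is NOT the NuPIC API; replace the body with the real call.
    It does some pure-Python work so the loop has something to measure.
    """
    total = 0
    for i, bit in enumerate(record):
        if bit:
            total += i
    return total


def run_loop(records):
    """Feed every record to the (hypothetical) classifier, one at a time."""
    for record in records:
        hypothetical_classifier_step(record)


def benchmark(num_records=1000, width=256, repeat=5):
    """Return the best wall-clock time (seconds) over `repeat` runs of the loop.

    Taking the minimum over several repeats reduces noise and gives a JIT
    (such as PyPy's) time to warm up before the best run.
    """
    # Deterministic synthetic input: each record is a list of boolean "bits".
    records = [[(i * j) % 3 == 0 for j in range(width)] for i in range(num_records)]
    timings = timeit.repeat(lambda: run_loop(records), repeat=repeat, number=1)
    return min(timings)


if __name__ == "__main__":
    best = benchmark()
    print("%s %s: best of 5 = %.4f s" % (
        platform.python_implementation(), platform.python_version(), best))
```

Reporting the best of several repeats rather than a single run is a deliberate choice: it filters out noise and, under PyPy, keeps most of the JIT warm-up out of the headline number.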

Update

Just to avoid confusion: CPython is not Cython. CPython is our well-known default Python implementation (NuPIC uses version 2.7, for example), while Cython (without the P) is a Python-like language that uses static typing, among other features: http://www.reddit.com/r/Python/comments/23kz8o/cpython_vs_pypy_vs_cython/

@rhyolight changed the title "Use PyPy as default python compiler instead of CPython" to "Investigate using PyPy as default python compiler instead of CPython" Jan 7, 2015
@rhyolight
Member

Just to be clear, this is an investigation. Closing this ticket will require some performance benchmarks comparing our current compilation vs PyPy compilation. Once benchmarks are posted, we'll continue discussion on the mailing list and decide the way forward.

I just don't want someone creating a PR that moves the whole repo to PyPy because it won't get merged until we've all discussed and agreed that it's the best thing.

@david-ragazzi
Contributor Author

Ok. Agreed.

@breznak
Member

breznak commented Jan 8, 2015

just a correction, pypy was 10x faster than python, not cython.

anyway, I think this could be an alternative, maybe not the default, so we can experiment, iron out possible incompatibilities, and people can run whatever interpreter they prefer.

@david-ragazzi
Contributor Author

just a correction, pypy was 10x faster than python, not cython.

Hi @breznak, project names in the Python world are really confusing.. hehe.. Actually CPython is our well-known Python implementation (we use 2.7, for example), while Cython (without the P) is a Python-like language that uses static typing, among other features: http://www.reddit.com/r/Python/comments/23kz8o/cpython_vs_pypy_vs_cython/

In other words: pypy was 10x faster than (c)python, not cython.

All this confusion in names is because every project founder wants to use the -ython suffix combined with a few letters!!

Since we're talking about PyPy vs Cython: it seems that PyPy is faster than Cython in several applications, mostly where long-running loops are frequent (https://groups.google.com/forum/#!topic/cython-users/OwAIcJwWH14), which is the case for NuPIC. The magic of PyPy is that its JIT compiles frequently executed code to machine code at runtime. Furthermore, the compiled code is well optimized.
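To see the "compile what is used frequently" behaviour directly, a sketch like the following (an illustrative assumption, not taken from the linked discussion) times successive rounds of the same pure-Python loop. Under CPython the rounds should take roughly equal time; under PyPy the first round is expected to be slower because it also pays for tracing and compilation. The loop body, round count and `n` are arbitrary choices.

```python
import platform
import time


def hot_loop(n):
    """A long-running pure-Python loop: the kind of code a tracing JIT targets."""
    acc = 0
    for i in range(n):
        acc = (acc + i * i) % 1000003
    return acc


def time_rounds(rounds=5, n=200000):
    """Time successive calls of the same hot loop and return the durations.

    On CPython each round should take roughly the same time. On PyPy the
    first round typically includes tracing/compilation, so later rounds are
    expected to be faster once the loop has been JIT-compiled.
    """
    timings = []
    for _ in range(rounds):
        start = time.time()
        hot_loop(n)
        timings.append(time.time() - start)
    return timings


if __name__ == "__main__":
    print(platform.python_implementation())
    for k, t in enumerate(time_rounds(), 1):
        print("round %d: %.4f s" % (k, t))
```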

@breznak
Member

breznak commented Jan 8, 2015

@david-ragazzi they should get more creative with the names in python world 😄
I'm rushing to install pypy and try..

@breznak
Member

breznak commented Jan 8, 2015

this will not be as easy as just switching..
my progress:

  • don't forget to install pypy-dev to avoid "Python.h not found"
  • pypy + numpy trouble: use pip install git+https://bitbucket.org/pypy/numpy.git
  • related to ABI compatible bindings for libffi nupic.core-legacy#305 : /home/mmm/env_pypy/site-packages/numpy/linalg/_umath_linalg.py:18: UserWarning: no cffi linalg functions and no _umath_linalg_capi module, expect problems. warn('no cffi linalg functions and no _umath_linalg_capi module, expect problems.')
  • crash in actual nupic build:
Scanning dependencies of target py_support
[  7%] Building CXX object CMakeFiles/py_support.dir/extensions/py_support/NumpyVector.cpp.o
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:73:35: error: ‘PyArray_UINT64’ was not declared in this scope
 NTA_DEF_NUMPY_DTYPE_TRAIT(size_t, PyArray_UINT64);
                                   ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:65:66: note: in definition of macro ‘NTA_DEF_NUMPY_DTYPE_TRAIT’
 template<> class NumpyDTypeTraits<a> { public: enum { numpyDType=b }; }; \
                                                                  ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:85:41: error: ‘PyArray_INT64’ was not declared in this scope
 NTA_DEF_NUMPY_DTYPE_TRAIT(nupic::Int64, PyArray_INT64);
                                         ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:65:66: note: in definition of macro ‘NTA_DEF_NUMPY_DTYPE_TRAIT’
 template<> class NumpyDTypeTraits<a> { public: enum { numpyDType=b }; }; \
                                                                  ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp: In static member function ‘static void nupic::NumpyArray::init()’:
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:104:26: error: ‘_import_array’ was not declared in this scope
   int rc = _import_array();
                          ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp: In constructor ‘nupic::NumpyArray::NumpyArray(PyObject*, int, int)’:
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:153:70: error: ‘PyArray_Cast’ was not declared in this scope
   PyObject *casted = PyArray_Cast((PyArrayObject *) contiguous, dtype);
                                                                      ^
/home/mmm/nupic/nupic-source/extensions/py_support/NumpyVector.cpp:159:42: error: ‘PyObject’ has no member named ‘nd’
   if((requiredDimension != 0) && (final->nd != requiredDimension))
                                          ^

^^^ this might be related to using the numpy version from git

@breznak
Member

breznak commented Feb 17, 2015

Seems this got stuck, as pypy is not a drop-in replacement for python.
numba might be worth considering: https://github.com/numba/numba

@rhyolight removed this from the 0.4.0 milestone Feb 23, 2015
@david-ragazzi reopened this Jul 31, 2015
@david-ragazzi
Contributor Author

Good news: newer versions of PyPy provide better compatibility with existing Python code bases. So recently I decided to test PyPy, and fortunately I got (very) excellent performance results at a (very) low price: only 1 or 2 lines in SpatialPooler need to be refactored, due to a numpy function not yet ported to PyPy. I haven't tested TemporalMemory with PyPy yet, but I expect that few or no lines will need to be changed.

I'll put together a more complete benchmark soon and present a report with the results and potential drawbacks, so you can check the feasibility.

To reinforce the case for PyPy, look at this 2011 article where PyPy beats even C on a string-formatting benchmark!
http://morepypy.blogspot.com.br/2011/08/pypy-is-faster-than-c-again-string.html

@breznak
Member

breznak commented Aug 11, 2015

I got (very) excellent performance results to a (very) low price: only 1 or 2 lines in SpatialPooler need be refactored due to a numpy function not ported to pypy (yet).

wow @david-ragazzi ! could you share some (performance) results and possibly a branch to try? ;)

@dstromberg

I'm a total nupic newbie, but I have a lot of experience with CPython and some with Pypy and Cython. If someone will outline a script (what to import for the C++ version, what to import for the pure-Python version, what to instantiate and call in a loop), I'll excitedly dive into an actual performance comparison.

BTW, technically Python is a language, CPython is the reference implementation of the Python language, Pypy is a JIT compiled implementation of Python, and Cython is an AOT compiled dialect of Python that can be quite fast if you give it the right types (but can actually be slower than CPython if you're not careful). Often when people say Python, they mean CPython, but that's becoming less and less appropriate, what with Pypy, Micropython, Jython, IronPython, etcetera.

Last I heard, Pypy was able to use nearly all of the official numpy, passing 99.9% of numpy's test suite. However, it uses Pypy's cpyext to call the native code of numpy, which can be a bit slow - so it's only a decent performer for large datastructures like big matrices.
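The distinction between the language and its implementations can also be checked at runtime with the standard library's `platform.python_implementation()`, which would let code keep CPython as the default while still running under PyPy, as suggested earlier in the thread. The `choose_backend` switch below is hypothetical; its labels only illustrate the cpyext trade-off described above.

```python
import platform

# platform.python_implementation() returns a string such as 'CPython',
# 'PyPy', 'Jython' or 'IronPython'.
IMPLEMENTATION = platform.python_implementation()
IS_PYPY = IMPLEMENTATION == "PyPy"


def describe_runtime():
    """Return a short description such as 'CPython 2.7.9'."""
    return "%s %s" % (IMPLEMENTATION, platform.python_version())


def choose_backend():
    """Hypothetical switch: pick a code path per interpreter.

    'pure-python' and 'c-extension' are illustrative labels only. The idea is
    that pure-Python code tends to benefit from PyPy's JIT, while C extensions
    go through the cpyext compatibility layer and can be slower there.
    """
    return "pure-python" if IS_PYPY else "c-extension"


if __name__ == "__main__":
    print(describe_runtime(), "->", choose_backend())
```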
