PyArray conversion speedups and PyArrayFromBuffer #487

JobJob · 2018-04-04T14:29:29Z

This is the last one for now 😄

Speedups for PyArray conversion using Python's Buffer interface. PyArrayFromBuffer could become the default constructor for PyArray, if this looks good.

~15x speedup for PyArray(o::PyObject)
PyArrayFromBuffer(o::PyObject) further 4x faster PyArray creation (~60x faster than master)
setdata! for fast reassignment of data ptr in PyArray when you know the array is the right shape (~500x faster than creating a new PyArray on master - not an apples to apples comparison, but still good to have)
Workhorse is PyArrayInfoFromBuffer(o::PyObject) - a faster way to get PyArray_Info from numpy than extracting values from the __array_interface__
(speculative) ArrayFromBuffer no copy conversion to Julia Array
PyArray_Info parameterised T,N and immutable
PyArray_Info size and stride changed to Tuples (from Vectors)
PyArray_Info data changed from Ptr{Cvoid} to Ptr{T}
moved PyBuffer tests to separate file
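
For readers unfamiliar with the buffer interface: it exposes an object's element format, shape, and strides directly, which is the kind of information PyArrayInfoFromBuffer collects. Below is a minimal Python-side sketch using only the standard library. `BufferInfo` and `buffer_info` are hypothetical names for illustration, not PyCall API.

```python
from array import array
from typing import NamedTuple, Tuple


class BufferInfo(NamedTuple):
    """Immutable summary of a buffer, loosely analogous to PyArray_Info."""
    format: str
    itemsize: int
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    readonly: bool


def buffer_info(obj) -> BufferInfo:
    """Read array metadata through the buffer protocol without copying the data."""
    with memoryview(obj) as mv:
        return BufferInfo(mv.format, mv.itemsize, mv.shape, mv.strides, mv.readonly)


info = buffer_info(array("d", [1.0, 2.0, 3.0]))
print(info)
```

Shape and strides come back as tuples, and the record is immutable, mirroring the PyArray_Info changes listed above.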

In no major rush for review, so please don't feel pressured.

master results:

Mean times
----------
nprand_pyo(2, 2)       : TrialEstimate(17.451 μs)
convert_pyarr(2, 2)    : TrialEstimate(403.306 μs)
nprand_pyo(100, 100)   : TrialEstimate(152.220 μs)
convert_pyarr(100, 100): TrialEstimate(418.158 μs)

This branch results:

Mean times
----------
nprand_pyo(2, 2)               : TrialEstimate(16.688 μs)
convert_pyarr(2, 2)            : TrialEstimate(21.455 μs)
PyArrayInfoFromBuffer(2, 2)    : TrialEstimate(3.519 μs)
convert_pyarrbuf(2, 2)         : TrialEstimate(5.450 μs)
convert_arr(2, 2)              : TrialEstimate(100.200 μs)
convert_arrbuf(2, 2)           : TrialEstimate(10.339 μs)
setdata!(2, 2)                 : TrialEstimate(745.279 ns)
setdata! bufprealloc(2, 2)     : TrialEstimate(469.487 ns)
nprand_pyo(100, 100)           : TrialEstimate(158.003 μs)
convert_pyarr(100, 100)        : TrialEstimate(21.049 μs)
PyArrayInfoFromBuffer(100, 100): TrialEstimate(3.402 μs)
convert_pyarrbuf(100, 100)     : TrialEstimate(5.024 μs)
convert_arr(100, 100)          : TrialEstimate(4.559 ms)
convert_arrbuf(100, 100)       : TrialEstimate(10.037 μs)
setdata!(100, 100)             : TrialEstimate(745.049 ns)
setdata! bufprealloc(100, 100) : TrialEstimate(465.865 ns)
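
As a quick sanity check, the ratios implied by the (2, 2) mean times above can be computed directly. The numbers are copied from the two result listings. The round speedup figures in the description are approximate, so the computed ratios need not match them exactly.

```python
# Mean times for the (2, 2) case, in microseconds, copied from the results above.
master_convert_pyarr = 403.306
branch_convert_pyarr = 21.455
branch_convert_pyarrbuf = 5.450
branch_setdata = 0.745279  # 745.279 ns

speedups = {
    "PyArray(o) vs master": master_convert_pyarr / branch_convert_pyarr,
    "PyArrayFromBuffer vs PyArray(o) on this branch": branch_convert_pyarr / branch_convert_pyarrbuf,
    "PyArrayFromBuffer vs master": master_convert_pyarr / branch_convert_pyarrbuf,
    "setdata! vs PyArray(o) on master": master_convert_pyarr / branch_setdata,
}
for label, ratio in speedups.items():
    print(f"{label}: {ratio:.1f}x")
```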

Benchmark code for master.

using PyCall, BenchmarkTools, DataStructures

results = OrderedDict{String,Any}() 
let 
    np = pyimport("numpy") 
    nprand = np["random"]["rand"] 
    # nparray_pyo(x) = pycall(np["array"], PyObject, x) 
    # pytestarray(sz::Int...) = pycall(np["reshape"], PyObject, nparray_pyo(1:prod(sz)), sz) 

    nprand_pyo(sz...)   = pycall(nprand, PyObject, sz...) 

    for arr_size in [(2,2), (100,100)] 
        pyo_arr = nprand_pyo(arr_size...) 
        results["nprand_pyo$arr_size"] = @benchmark $nprand_pyo($arr_size...) 
        println("nprand_pyo $arr_size:\n"); display(results["nprand_pyo$arr_size"]) 
        println("--------------------------------------------------") 

        results["convert_pyarr$arr_size"] = @benchmark $convert(PyArray, $pyo_arr) 
        println("convert_pyarr $arr_size:\n"); display(results["convert_pyarr$arr_size"]) 
        println("--------------------------------------------------")
    end
end

println("")
println("Mean times")
println("----------")
foreach((r)->println(rpad(r[1],20), ": ", mean(r[2])), results)

stevengj · 2018-04-04T14:42:00Z

Thanks for working on this! My hope for a long time (see #70) has been to ditch numpy.jl completely in favor of something based entirely on PyBuffer. Can you remove numpy.jl completely in this PR?

stevengj · 2018-04-04T14:43:08Z

That is, PyArray (the no-copy conversion type) should just use the buffer API too.

JobJob · 2018-04-04T14:43:16Z

Yeah I think you can remove all the stuff related to Python Array -> PyArray conversion. But this doesn't do anything for julia array -> python array

JobJob · 2018-04-04T14:44:34Z

as in everything related to __array_interface__

stevengj · 2018-04-04T14:45:18Z

Right.... for that we'd need to create an object that implements the buffer API. One concern is that I don't know how well this would work with Python functions expecting a NumPy array.

Okay, it makes sense to replace the Python -> Julia array code (__array_interface__ and PyArray) with buffer code in this PR, but leave the Julia -> Python stuff using NumPy for now and work on that in a separate PR.

JobJob · 2018-04-04T14:49:30Z

The other issue with replacing PyArray(o::PyObject) with PyArrayFromBuffer is if something implements __array_interface__ but not the buffer interface. Not sure which is more supported in the wild? But we could just add a fallback to __array_interface__ if the buffer interface isn't implemented.
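
The fallback idea can be illustrated on the Python side: try the buffer protocol first, and only then look for `__array_interface__`. This is a sketch under assumptions, not PyCall code. `array_source` and `InterfaceOnly` are hypothetical names, and the toy interface dictionary is deliberately incomplete (it has no `data` entry).

```python
def array_source(obj):
    """Return which array-exchange mechanism `obj` offers, preferring the buffer protocol."""
    try:
        memoryview(obj).release()  # raises TypeError if obj has no buffer support
        return "buffer"
    except TypeError:
        pass
    if hasattr(obj, "__array_interface__"):
        return "array_interface"
    return None


class InterfaceOnly:
    """Toy object that exposes __array_interface__ but not the buffer protocol."""
    __array_interface__ = {"shape": (2,), "typestr": "<f8", "version": 3}


print(array_source(bytearray(4)))
print(array_source(InterfaceOnly()))
print(array_source(object()))
```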

stevengj · 2018-04-04T14:56:11Z

> if something implements __array_interface__ but not the buffer interface.

I'm guessing that this never happens in practice.

My preference would be to remove __array_interface__ entirely, and only worry about adding it back as a fallback if someone encounters it in the wild.

JobJob · 2018-04-04T14:58:04Z

> I'm guessing that this never happens in practice.

Ok, let's not support it then unless someone tells us it does 😄

JobJob · 2018-04-08T15:59:47Z

What's the issue with having numpy as a test dependency? Seems that using numpy will allow us to cover more cases, more easily.

I thought I might be able to get by with python's array module, but then I did a little digging and found out that it doesn't support the buffer interface on Python 2.7 🤦. You can get typed byte data with struct.pack, but you can't reshape a memoryview using cast in py2 (you can in Python 3). So, the only way I could see to create a multi-dim array without using numpy in py2.7 is messing around in ctypes.
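
For reference, the Python 3 route mentioned above (typed bytes from `struct.pack`, reshaped with `memoryview.cast`) looks roughly like this. It does not work on Python 2.7, which is the limitation being described.

```python
import struct

values = [float(i) for i in range(6)]
raw = struct.pack("6d", *values)  # typed byte data: six native doubles

# Python 3 only: reinterpret the flat bytes as a 2x3 array of doubles, without copying.
mv = memoryview(raw).cast("d", shape=[2, 3])

print(mv.shape, mv.strides, mv.format)
print(mv.tolist())
```

The resulting memoryview is C-contiguous and read-only, because it is backed by an immutable `bytes` object.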

It could be done, but would take effort, esp. for more exotic arrays (e.g. non-native endian and f_contiguous), and since I'm guessing the majority of the arrays these functions will see in the wild will be coming from numpy anyway (?), could be good to test with it.
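
To make the "exotic" cases concrete: C-contiguous and Fortran-contiguous arrays of the same shape differ only in their byte strides, and non-native endian data differs only in byte order within each element. A small standard-library sketch follows. `contiguous_strides` is a hypothetical helper written for this illustration.

```python
import struct


def contiguous_strides(shape, itemsize, order="C"):
    """Byte strides of a contiguous array with the given shape and element size."""
    if order not in ("C", "F"):
        raise ValueError("order must be 'C' or 'F'")
    # Fortran order: first axis varies fastest. C order: last axis varies fastest.
    dims = shape if order == "F" else shape[::-1]
    strides = []
    step = itemsize
    for n in dims:
        strides.append(step)
        step *= n
    return tuple(strides if order == "F" else strides[::-1])


print(contiguous_strides((2, 3), 8, "C"))  # (24, 8)
print(contiguous_strides((2, 3), 8, "F"))  # (8, 16)

# The same double in little- and big-endian byte order: one is the byte-reverse of the other.
little = struct.pack("<d", 1.0)
big = struct.pack(">d", 1.0)
print(little == big[::-1])  # True
```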

stevengj · 2018-04-08T19:15:11Z

NumPy as a test dependency is fine; we should definitely test with NumPy arrays and operations anyway.

* PyArray_Info parameterised T,N and immutable
* PyArray_Info size and stride changed to Tuples (from Vectors)
* PyArray_Info data changed from Ptr{Cvoid} to Ptr{T}
* `PyArrayInfoFromBuffer(o::PyObject)` faster way to get PyArray_Info from numpy than numpy's __array_interface__
* `PyArrayFromBuffer(o::PyObject)` 4x faster PyArray conversion
* ArrayFromBuffer no copy conversion to Julia Array
* moved PyBuffer tests to separate file

stevengj · 2018-07-25T17:31:18Z

Bump.

JobJob · 2018-08-07T01:28:38Z

Assuming the tests pass I think this is pretty good to go. (they pass for me on mac with py 2/3 julia 0.6.3/0.7-dev-xxx)

There are things in pep-3118 that it doesn't handle, but I think it handles all the cases of arrays (strides and types) that current master does.

Oh the one thing I wanted to ask about is there's a deprecation warning about strides on 0.7 - but should PyArray and PyBuffer implement strides for 0.6?

JobJob · 2018-08-07T01:33:12Z

Umm nightly failures seem unrelated :/

stevengj · 2018-08-17T00:17:51Z

Nightly failures on master should be fixed now if you want to rebase.

codecov-io · 2018-10-22T10:29:26Z

Codecov Report

Merging #487 into master will decrease coverage by 0.08%.
The diff coverage is 44.51%.

@@            Coverage Diff             @@
##           master     #487      +/-   ##
==========================================
- Coverage   53.11%   53.03%   -0.09%     
==========================================
  Files          19       20       +1     
  Lines        1525     1567      +42     
==========================================
+ Hits          810      831      +21     
- Misses        715      736      +21

Impacted Files      Coverage Δ
src/PyCall.jl       59.51% <ø> (ø)
src/numpy.jl        67.24% <100%> (+16.33%)
src/pyarray.jl      39.02% <39.02%> (ø)
src/conversions.jl  52.17% <50%> (-0.61%)
src/pybuffer.jl     51.31% <65.51%> (+13.81%)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e3f2ac...dbc9caa.

JobJob · 2018-10-24T11:34:41Z

test/testpybuffer.jl

+            pyarr = convert(PyArray, ao1)
+            ao2 = arrpyo(11.0:20.0, "d")
+            setdata!(pyarr, ao2)
+            @test all(pyarr[1:10] .== 11.0:20.0)


using PyCall
np = pyimport("numpy")
pyarr = pycall(np["array"], PyArray, [1:10;])
pyarr[1:2]

The above is an error on current master on Julia 1.0.1 🙀

JobJob · 2018-11-02T13:32:08Z

Bumpity bump? @stevengj

JobJob · 2018-11-13T04:58:12Z

Hello?

stevengj · 2018-11-13T13:00:45Z

Thanks for keeping at this!

JobJob · 2018-11-13T13:13:21Z

🎉 🎉 🎉
Thanks to you @stevengj you brilliant beast 😄
Sorry the diff became so big - will try to keep things more manageable in future.

tkf · 2018-11-15T08:27:29Z

There are failures on Julia 0.7 after this was merged: https://travis-ci.org/JuliaPy/PyCall.jl/builds/454473965

(One failure is from nightly but it's terminated due to "No output has been received in the last 10m0s" during installation. So it probably is irrelevant.)

The build before merge was all green: https://travis-ci.org/JuliaPy/PyCall.jl/builds/450430878
So I think the source tree was identical and the tests should pass, unless there are upstream updates. But there are no updates for Julia 0.7.

Also, Travis reports "Job errored" ! rather than "Job failed" x even though there seems to be a segmentation fault during the test:
https://travis-ci.org/JuliaPy/PyCall.jl/jobs/454473973#L756

What is happening?

JobJob · 2018-11-15T08:41:10Z

I think it could be related to 48d730f
I can replicate on 0.7 with master locally, but when I reset to 17dcb37 on the branch of this PR (array-perf on https://github.com/JobJob/PyCall.jl) things are fine - i.e. no segfault at the end of testing

JobJob · 2018-11-15T08:42:02Z

N.b. I didn't merge that commit into the branch of this PR

JobJob · 2018-11-15T08:54:06Z

Simple fix in #615

marius311 · 2019-01-23T22:53:40Z

Apologies, I haven't followed the discussion above, but I did bisect a failure of my code when upgrading to PyCall#master down to this commit. It seems that before this commit,

julia> using PyPlot
julia> subplots(1, 1; squeeze=false)[2]
1×1 Array{PyCall.PyObject,2}:
 PyObject <matplotlib.axes._subplots.AxesSubplot object at 0x7f75f95c7610>

but after

julia> using PyPlot
julia> subplots(1, 1; squeeze=false)[2]
PyObject array([[<matplotlib.axes._subplots.AxesSubplot object at 0x7ff0846ed610>]],
      dtype=object)

Was this intended or not?

In either case, is there a workaround to force conversion to a Julia Matrix in the second case?

stevengj · 2019-01-24T00:03:26Z

Probably not intended. Have you tried convert(Matrix, ans) on the result?

marius311 · 2019-01-24T00:11:54Z

Thanks. Neither of these two attempts on current master seem to work:

julia> convert(Matrix, subplots(1,1;squeeze=false)[2])
ERROR: MethodError: Cannot `convert` an object of type PyObject to an object of type Array{T,2} where T
Closest candidates are:
  convert(::Type{T<:Array}, ::AbstractArray) where T<:Array at array.jl:474
  convert(::Type{T<:AbstractArray}, ::T<:AbstractArray) where T<:AbstractArray at abstractarray.jl:14
  convert(::Type{T<:AbstractArray}, ::LinearAlgebra.Factorization) where T<:AbstractArray at /home/marius/src/julia/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/factorization.jl:46
  ...
Stacktrace:
 [1] top-level scope at none:0

julia> PyArray(subplots(1,1;squeeze=false)[2])
ERROR: KeyError: key "O" not found
Stacktrace:
 [1] getindex at ./dict.jl:478 [inlined]
 [2] array_format(::PyBuffer) at /home/marius/.julia/dev/PyCall/src/pybuffer.jl:261
 [3] PyArray_Info(::PyObject) at /home/marius/.julia/dev/PyCall/src/pyarray.jl:17
 [4] PyArray(::PyObject) at /home/marius/.julia/dev/PyCall/src/pyarray.jl:123
 [5] top-level scope at none:0

But permutedims(getindex.(Ref(_), 1:m, (1:n)')) works fine so I'm happy with that workaround.
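
The KeyError in the trace above appears to come from looking up the buffer format code "O" (the code NumPy uses for object arrays) in a table of supported element types. A Python sketch of such a lookup, with an explicit error for unsupported codes and a native byte-order check, might look like the following. `TYPE_NAMES` and `parse_format` are hypothetical, and the table is an assumed subset of struct-style codes, not PyCall's actual mapping.

```python
import sys

# Assumed subset of struct-style format codes mapped to Julia-like type names.
TYPE_NAMES = {
    "?": "Bool", "b": "Int8", "B": "UInt8", "h": "Int16", "H": "UInt16",
    "i": "Int32", "I": "UInt32", "q": "Int64", "Q": "UInt64",
    "f": "Float32", "d": "Float64",
}
BYTE_ORDER_PREFIXES = "@=<>!"


def parse_format(fmt):
    """Split a simple one-element format string into (is_native_order, type_name)."""
    order = "@"
    if fmt and fmt[0] in BYTE_ORDER_PREFIXES:
        order, fmt = fmt[0], fmt[1:]
    if order in "@=":
        native = True
    elif order == "<":
        native = sys.byteorder == "little"
    else:  # ">" and "!" both mean big-endian
        native = sys.byteorder == "big"
    if fmt not in TYPE_NAMES:
        raise ValueError(f"unsupported buffer format code: {fmt!r}")
    return native, TYPE_NAMES[fmt]


print(parse_format("d"))
try:
    parse_format("O")
except ValueError as err:
    print(err)
```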

JobJob force-pushed the array-perf branch from a25c297 to e84e7c9 April 7, 2018 17:17

JobJob force-pushed the array-perf branch from e65da5f to 8023894 April 29, 2018 17:51

JobJob added 6 commits April 30, 2018 17:05

Add isbuftype and use in pysequence_query

955243c

Increase robustness in parsing buffer format string

7fde0ec

ArrayFromBuffer Improvements

b8a9dcd

ArgumentError for non-native endian and typestrs_native

a0082c5

Move PyArray related things to their own file

d67e19e

rename ArrayFromBuffer to NoCopyArray, and make indexing of NoCopyArray match py indexing for row major arrays too

JobJob force-pushed the array-perf branch from 8023894 to d67e19e April 30, 2018 07:06

JobJob mentioned this pull request Jun 28, 2018

Faster pycall. Adds pycall! #492

Merged

JobJob added 4 commits August 7, 2018 02:01

Merge branch 'master' into array-perf

86e708a

Void to Nothing/Cvoid

984a59f

0.7 deprecations

80fc1d9

put my thang down flip it and reverse it

edcc3ca

JobJob added 2 commits August 30, 2018 18:33

Merge branch 'master' into array-perf

35ad573

Merge branch 'master' into array-perf

de29427

tkf reviewed Oct 20, 2018

src/pybuffer.jl

Clarify pydecref(o::PyBuffer) docstring

dbc9caa

and pass a PyPtr_NULL instead of a C_NULL for Py_buffer.obj when creating a new PyBuffer

stevengj reviewed Oct 23, 2018

src/pyarray.jl

stevengj reviewed Oct 23, 2018

src/pybuffer.jl

stevengj reviewed Oct 23, 2018

src/PyCall.jl

JobJob added 4 commits October 24, 2018 17:34

Merge branch 'master' into array-perf

a596634

Remove exports: setdata!, NoCopyArray, isbuftype

181b785

Test setdata!, improve NoCopyArray docstring

7304622

Add tests for PyArray getindex and fix similar for PyArray

0ea15df

indexing into a PyArray was throwing an ambiguous method error for similar(::PyArray, T, dims::Dims)

JobJob commented Oct 24, 2018

tkf mentioned this pull request Oct 28, 2018

WIP: override get/setproperty and propertynames #517

Merged

Add GC rooting note

99c519c

Merge branch 'master' into array-perf

17dcb37

tkf mentioned this pull request Nov 6, 2018

Numpy structured array unintuitive handling #610

Open

stevengj merged commit e74bf61 into JuliaPy:master Nov 13, 2018

tkf mentioned this pull request Nov 15, 2018

Fix Segfault during finalization #615

Merged

This was referenced Dec 6, 2018

Use native sizes by default when parsing buffer format string #621

Merged

Restore support for non-contiguous PyArrays, fix f_contiguous, simplify copy(::PyArray) #623

Merged
