Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pythonizations for collection subscript #570

Merged
merged 9 commits into from
May 14, 2024

Conversation

m-fila
Copy link
Contributor

@m-fila m-fila commented Mar 18, 2024

BEGINRELEASENOTES

  • Added mechanism to load cppyy pythonizations
  • Added pythonization for bound-check subscript operation in collections

ENDRELEASENOTES

#458 and key4hep/EDM4hep#288 show there is some interest in tuning the cppyy generated bindings. The cppyy has a feature for this called pythonization -> a callback can be registered for all the classes that fulfill some predicate (e.g. derived from podio::CollectionBase).

This PR contains:

  • an abstraction for adding pythonizations (each pythonization implemented in a class derived from podio.pythonizations.utils.Pythonizer)
  • pythonization adding bound check __getitem__ for classes derived from podio::CollectionBase

The mechanism of loading the pythonizations is inspired by ROOT pythonizations with a change that de order of loading of the pythonizations is defined.

To load the pythonizations in a downstream projects, for instance for project utilizing a "edm4hep" namespace

from podio.pythonizations import load_pythonizations

load_pythonizations("edm4hep")

alternatively a selection of pythonizations can be loaded by importing and applying them one by one

The alternative mechanism I tough about:

  • c++ callback pythonization - instead of in python the callback to cppyy could be defined in C++ using the Python API. The downsides are extra dependency for C++ code (Python::Module) and knowing the Python C API is required
  • generating python wrappers with jinja2 - it would be rather cumbersome to add another set jinja2 templates for tuning the python bindings

@tmadlener
Copy link
Collaborator

If I understand correctly this would effectively give an "opt-in" solution for generated EDMs, as the EDM has to explicitly load these pythonizations. For EDM4hep this would not really be a problem, as we have some python glue code already in any case.

For me this looks like a good solution, but we should add a bit of documentation on how to use this in generated EDMs.

@m-fila
Copy link
Contributor Author

m-fila commented Mar 21, 2024

If I understand correctly this would effectively give an "opt-in" solution for generated EDMs, as the EDM has to explicitly load these pythonizations. For EDM4hep this would not really be a problem, as we have some python glue code already in any case.

yes, it's opt-in. load_pythonizations could be added to EDM4hep's __init__.py to load on import

For me this looks like a good solution, but we should add a bit of documentation on how to use this in generated EDMs.

Thanks, I will add it

@jmcarcell
Copy link
Member

jmcarcell commented Mar 25, 2024

Note that this will make indexing significantly slower:
With this PR and the one in EDM4hep:

In [2]: import edm4hep

In [3]: coll = edm4hep.MCParticleCollection()

In [4]: coll.create()
Out[4]: <cppyy.gbl.edm4hep.MutableMCParticle object at 0x55f984f08120>

In [5]: %timeit coll[0]
3.45 µs ± 72.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

and without this PR:

2.46 µs ± 26.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

It should be said somewhere that the current behaviour is you get an object when indexing out of bounds and then it crashes when you try to do something with that object. But I think the case of indexing out of bounds should be rare since typically you run either a for loop without indexes or if it has indexes then it's bounded by len(coll).

@m-fila
Copy link
Contributor Author

m-fila commented Mar 25, 2024

Note that this will make indexing significantly slower

Thanks for pointing this out, I'll check if this can be improved by mapping __getitem__ to at and raising IndexError

@m-fila
Copy link
Contributor Author

m-fila commented Mar 25, 2024

I tried benchmarking on following function:

def bench():
    try:
        coll[0]
    except:
        pass

coll = MCParticleCollection()
bench() # give cppyy time to load lazily
%timeit bench()

coll.create()
bench() # give cppyy time to load lazily
%timeit bench()
Bound check __getitem__ implementation Average time, no-exception branch Average time, exception branch
1. Default cppyy mapping to operator[] 683 ns -
2. ✔️ Originally proposed in this PR, bound-check in python 1.2 µs 779 ns
3. ✔️ Mapped to at 730 ns 30.8 µs
4. ✔️ at wrapped to raise IndexError 826 ns 32.1 µs

The implementations using at seems to be more performant in the usual case. Would it be acceptable to go with option 3. or 4.?

A C++ callback added to the Collection template could potentially achieve the same as at (option 3.) on the no-exception branch and slightly better than python side bound-check (option 2.) on the exception branch. But then extending with other c++ callbacks would be very ugly

The performance of at based implementations is poor if there is an exception, because by design cppyy tries to evaluate all eligible overloads if there was a throw (in this caseat(size_t) and at(size_t) const ). Since cppyy-1.7.0 it's possible to explicitly select between const and non-const overload, but it's newer than what we have

@tmadlener
Copy link
Collaborator

I see there has been quite some discussion (and progress?) here. A few comments / questions from my side in no particular order:

  • Did you by any chance check how the times compare to pure c++? Mainly for comparison, I don't think we learn too much from them other than a 30 % increase on the python side is pretty negligible compared to pure c++.
  • I think the exceptional case should not be the (main) driver for performance considerations. After all you are dealing with an error at that point and not with something "usual".
  • IIUC all of the presented methods are still possible with python callbacks, right?

@m-fila
Copy link
Contributor Author

m-fila commented Mar 27, 2024

Did you by any chance check how the times compare to pure c++? Mainly for comparison, I don't think we learn too much from them other than a 30 % increase on the python side is pretty negligible compared to pure c++.

Pure C++ google benchmark between a single element access with operator[] vs at:

----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
squareBracketsNoError       4.59 ns         4.59 ns    152241304
atNoError                   5.26 ns         5.25 ns    131108856
atError                     1552 ns         1550 ns       452227

where

static void squareBracketsNoError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll[0]);
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}

static void atNoError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll.at(0));
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}

static void atError(benchmark::State& state) {
  edm4hep::MCParticleCollection coll;
  coll.create();
  for (auto _ : state) {
    try {
      benchmark::DoNotOptimize(coll.at(1));
      benchmark::ClobberMemory();
    } catch (const std::out_of_range&) {
    }
  }
}

both this and python benchmark are isolated single access, looping over the collection will probably give different results

IIUC all of the presented methods are still possible with python callbacks, right?

Yes, all the alternatives from that table were python callbacks

I think the exceptional case should not be the (main) driver for performance considerations. After all you are dealing with an error at that point and not with something "usual".

In the last commit I changed the implementation proposed in PR to option 4 from the table. It seems to me as a reasonable compromise

Copy link
Collaborator

@tmadlener tmadlener left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for also checking the c++ times. The orders of magnitude difference between c++ and python looks like expected, I would say.

I would also vote for option 4 here. I just have one minor question below.

python/podio/pythonizations/collection_subscript.py Outdated Show resolved Hide resolved
Copy link
Member

@jmcarcell jmcarcell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This functionality should be in podio. For EDM4hep, I would like to have only what's necessary since there can be some drawbacks for interactive usage

@tmadlener tmadlener linked an issue Apr 9, 2024 that may be closed by this pull request
@tmadlener
Copy link
Collaborator

Can you rebase this onto the latest master to pick up the RNTuple API fixes? That should also make CI pass again.

@m-fila
Copy link
Contributor Author

m-fila commented Apr 25, 2024

Sorry for the delay. Rebased and added documentation. I also split the callback into two functions:

  • conditional to filter classes to by modified
  • modification body doing actual work here

I think previously it was not clear that callback should start with some conditional statement to select what should be modified

doc/python.md Outdated Show resolved Hide resolved
doc/python.md Outdated Show resolved Hide resolved
Comment on lines +49 to +50
with self.assertRaises(IndexError):
_ = collection[20]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails on dev3 due to some overload resolution issue. I would assume it's a ROOT issue that we are hitting here, but I am not sure yet which one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well an exception is thrown, but of a different type than expected:

  Traceback (most recent call last):
    File "/home/runner/work/podio/podio/python/podio/test_CodeGen.py", line 50, in test_bound_check
      _ = collection[20]
    File "/home/runner/work/podio/podio/python/podio/pythonizations/collection_subscript.py", line 22, in get_item
      return self.at(i)
  TypeError: none of the 2 overloaded methods succeeded. Full details:
    nsp::EnergyInNamespace nsp::EnergyInNamespaceCollection::at(size_t index) =>
      out_of_range: deque::_M_range_check: __n (which is 20)>= this->size() (which is 1)
    nsp::MutableEnergyInNamespace nsp::EnergyInNamespaceCollection::at(size_t index) =>
      out_of_range: deque::_M_range_check: __n (which is 20)>= this->size() (which is 1)

According to the docs if there is an exception then cppyy checks the rest of overloads and if they all throw the same type then the exception is propagated, if they all throw different types then TypeError is risen
Here both overloads throws std::out_of_range but somehow it isn't recognized as the same type

The possible solutions I could think of now are:

  • explicitly select overload - here the overloads are const and non-const. Selection between const and non-const was added in cppyy-1.7.0, nightlies have 1.6.2
  • change pythonization to catch either cppyy.gbl.std.out_of_range or TypeError and raise IndexError- I don't like it becasue it's also used for other things, like actual type mismatch, eg. calling with str instead of int
  • allow both types of exception in test - getting an exception is already an improvement, the message says it's out of range. The users get confused why it's TypeError not IndexError

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can be boiled down to a small reproducer that we can submit to ROOT because they probably didn't want to introduce too many "breaking" changes in their update of the cppyy they bundle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened an issue wlav/cppyy#230 as the problems appears also outside of ROOT bundle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also made the ROOT folks aware via root-project/root#15375

doc/python.md Outdated Show resolved Hide resolved
Co-authored-by: Thomas Madlener <thomas.madlener@desy.de>
@tmadlener
Copy link
Collaborator

Looks like all the necessary fixes have landed in upstream root (master and v6.32). IIRC this can be merged now, right?

@m-fila
Copy link
Contributor Author

m-fila commented May 14, 2024

Looks like all the necessary fixes have landed in upstream root (master and v6.32). IIRC this can be merged now, right?

Awesome 😎 Yes, it's ready to go

@hegner
Copy link
Collaborator

hegner commented May 14, 2024

and off we go...

@hegner hegner merged commit a18229b into AIDASoft:master May 14, 2024
18 checks passed
@m-fila m-fila deleted the pythonizations branch May 23, 2024 13:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better out-of-bounds checks for collection access
4 participants