Add pythonizations for collection subscript #570

Merged (9 commits, merged May 14, 2024)
1 change: 1 addition & 0 deletions README.md
@@ -11,6 +11,7 @@
- [Data Model Syntax](./doc/datamodel_syntax.md)
- [Examples](./doc/examples.md)
- [Advanced Topics](./doc/advanced_topics.md)
- [Python Interface](./doc/python.md)
- [Contributing](./doc/contributing.md)

<!-- Browse the API documentation created with Doxygen at -->
1 change: 1 addition & 0 deletions doc/index.rst
@@ -17,5 +17,6 @@ Welcome to PODIO's documentation!
userdata.md
advanced_topics.md
templates.md
python.md
cpp_api/api
py_api/modules
56 changes: 56 additions & 0 deletions doc/python.md
@@ -0,0 +1,56 @@
# Python interface for data models

Podio provides support for a Python interface for the generated data models. The [design choice](design.md) to provide a Python interface that resembles the C++ interface is realized by generating Python bindings from the C++ interface using
[cppyy](https://cppyy.readthedocs.io/en/latest/index.html). To make PyROOT aware of the bindings, the cppyy functionality bundled with ROOT can be used.

Note that cppyy loads the bindings lazily at runtime and presents them to the Python interpreter, rather than writing Python interface files. Consequently, the Python bindings have runtime dependencies on ROOT, cppyy, and the data model's C++ interface.

To load the Python bindings from a generated C++ model dictionary, first make sure the model's library and headers can be found in `LD_LIBRARY_PATH` and `ROOT_INCLUDE_PATH`, respectively, then:

```python
import ROOT

res = ROOT.gSystem.Load('libGeneratedModelDict.so')
if res < 0:
    raise RuntimeError('Failed to load libGeneratedModelDict.so')
```
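When `gSystem.Load` fails, it can help to check whether the library is reachable through `LD_LIBRARY_PATH` at all. The following is a minimal diagnostic sketch in plain Python (no ROOT needed); `find_on_path` is a hypothetical helper, not part of podio or ROOT. It only inspects the directories listed in the environment variable, whereas the dynamic loader also consults other locations (e.g. system directories), so a `None` result does not by itself prove that loading will fail.

```python
import os


def find_on_path(filename, env_var="LD_LIBRARY_PATH"):
    """Hypothetical helper: return the first directory listed in env_var
    that contains filename, or None if no listed directory does."""
    # The variable is an os.pathsep-separated list; skip empty entries
    for directory in os.environ.get(env_var, "").split(os.pathsep):
        if directory and os.path.isfile(os.path.join(directory, filename)):
            return directory
    return None
```

For example, `find_on_path("libGeneratedModelDict.so")` returns the directory providing the dictionary library, or `None` if none of the listed directories contains it.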

For reference usage, see the [Python module of EDM4hep](https://github.com/key4hep/EDM4hep/blob/main/python/edm4hep/__init__.py).

## Pythonizations

Python as a language uses different constructs and conventions than C++, so perfectly fine C++ code translated one-to-one into Python can be clunky by Python's standards. cppyy offers a mechanism called [pythonizations](https://cppyy.readthedocs.io/en/latest/pythonizations.html) to make the resulting bindings more pythonic. Some basic pythonizations are included automatically (for instance, `operator[]` is translated to `__getitem__`), but others can be specified by the user.

Podio comes with its own set of pythonizations useful for the data models generated with it. To apply all the provided pythonizations to a `model_namespace` namespace:

```python
from podio.pythonizations import load_pythonizations

load_pythonizations("model_namespace")
```

If only specific pythonizations should be applied:

```python
from podio.pythonizations import collection_subscript # specific pythonization

collection_subscript.CollectionSubscriptPythonizer.register("model_namespace")
```

### Developing new pythonizations

To be discovered by `load_pythonizations`, a new pythonization should be placed in a module of the `podio.pythonizations` package (modules whose names start with `test_` are skipped) and derive from the abstract class `podio.pythonizations.utils.pythonizer.Pythonizer`.

A pythonization class should implement the following three class methods:

- `priority`: The `load_pythonizations` function applies the pythonizations in increasing order of their `priority`.
- `filter`: A predicate selecting the classes to which the given pythonization should be applied. See the [cppyy documentation](https://cppyy.readthedocs.io/en/latest/pythonizations.html#python-callbacks).
- `modify`: Applies the modifications to the pythonized classes.

### Considerations

The cppyy pythonizations come with some considerations:

- The general cppyy idea of lazily loading only what is needed applies only partially to the pythonizations. For instance, a pythonization modifying `collection[]` will be applied the first time the `collection` class is used, regardless of whether `collection[]` is actually used.
- Each pythonization is applied to all the entities in a namespace and relies on a conditional mechanism (the `filter` method) inside the pythonization to select the entities it modifies. With a large number of pythonizations, the overhead adds up and slows down the usage of any class from a pythonized namespace.
- The cppyy bindings hooking into the C++ routines perform well compared to ordinary Python code, whereas the pythonizations are written in Python and execute at ordinary Python speed.
15 changes: 15 additions & 0 deletions python/podio/pythonizations/__init__.py
@@ -0,0 +1,15 @@
"""cppyy pythonizations for podio"""

from importlib import import_module
from pkgutil import walk_packages
from .utils.pythonizer import Pythonizer


def load_pythonizations(namespace):
    """Register all available pythonizations for a given namespace"""
    module_names = [name for _, name, _ in walk_packages(__path__) if not name.startswith("test_")]
    for module_name in module_names:
        import_module(__name__ + "." + module_name)
    pythonizers = sorted(Pythonizer.__subclasses__(), key=lambda x: x.priority())
    for i in pythonizers:
        i.register(namespace)
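The discovery step relies on the modules being imported first (so that their classes exist) and on `type.__subclasses__()`, which returns only *direct* subclasses. A standalone sketch of the same sort-by-priority logic, using hypothetical classes, shows a consequence: a pythonizer deriving from another pythonizer rather than from `Pythonizer` itself would not be picked up by this loop.

```python
class Base:
    """Hypothetical stand-in for the Pythonizer base class."""

    @classmethod
    def priority(cls):
        raise NotImplementedError


class Late(Base):
    @classmethod
    def priority(cls):
        return 50


class Early(Base):
    @classmethod
    def priority(cls):
        return 10


class Grandchild(Early):
    """Derives from another pythonizer, not directly from Base."""

    @classmethod
    def priority(cls):
        return 5


# Same ordering logic as load_pythonizations: ascending priority
ordered = sorted(Base.__subclasses__(), key=lambda x: x.priority())
names = [c.__name__ for c in ordered]
# __subclasses__() lists only direct subclasses, so Grandchild is missing here
print(names)
```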
26 changes: 26 additions & 0 deletions python/podio/pythonizations/collection_subscript.py
@@ -0,0 +1,26 @@
"""Pythonize subscript operation for collections"""

import cppyy
from .utils.pythonizer import Pythonizer


class CollectionSubscriptPythonizer(Pythonizer):
    """Bound-check __getitem__ for classes derived from podio::CollectionBase"""

    @classmethod
    def priority(cls):
        return 50

    @classmethod
    def filter(cls, class_, name):
        return issubclass(class_, cppyy.gbl.podio.CollectionBase)

    @classmethod
    def modify(cls, class_, name):
        def get_item(self, i):
            try:
                return self.at(i)
            except cppyy.gbl.std.out_of_range:
                raise IndexError("collection index out of range") from None

        class_.__getitem__ = get_item
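The effect of this pythonization can be reproduced in plain Python. In the sketch below, `OutOfRange` and `FakeCollection` are hypothetical stand-ins for `std::out_of_range` and a generated collection. Raising `IndexError` is what Python's sequence protocol expects: for example, a class defining only `__getitem__` can still be iterated, because iteration stops at the first `IndexError`. The `from None` suppresses the chained original exception in the traceback.

```python
class OutOfRange(Exception):
    """Hypothetical stand-in for cppyy.gbl.std.out_of_range."""


class FakeCollection:
    """Hypothetical collection exposing a bounds-checked at(), like the C++ API."""

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def at(self, i):
        if not 0 <= i < len(self._items):
            raise OutOfRange(f"index {i} out of range for size {len(self._items)}")
        return self._items[i]


def get_item(self, i):
    # Same shape as the pythonization: translate the C++-style error into IndexError
    try:
        return self.at(i)
    except OutOfRange:
        raise IndexError("collection index out of range") from None


FakeCollection.__getitem__ = get_item
```

With this in place, `FakeCollection(["a", "b"])[20]` raises `IndexError`, and `list(FakeCollection(["a", "b"]))` yields `["a", "b"]` through the `__getitem__` fallback.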
Empty file.
57 changes: 57 additions & 0 deletions python/podio/pythonizations/utils/pythonizer.py
@@ -0,0 +1,57 @@
"""cppyy pythonizations for podio"""

from abc import ABCMeta, abstractmethod
import cppyy


class Pythonizer(metaclass=ABCMeta):
    """
    Base class to define cppyy pythonization for podio
    """

    @classmethod
    @abstractmethod
    def priority(cls):
        """Order in which the pythonizations are applied

        Returns:
            int: Priority
        """

    @classmethod
    @abstractmethod
    def filter(cls, class_, name):
        """
        Abstract classmethod to filter classes to which the pythonizations should be applied

        Args:
            class_ (type): Class object.
            name (str): Name of the class.

        Returns:
            bool: True if class should be pythonized.
        """

    @classmethod
    @abstractmethod
    def modify(cls, class_, name):
        """Abstract classmethod modifying classes to be pythonized

        Args:
            class_ (type): Class object.
            name (str): Name of the class.
        """

    @classmethod
    def register(cls, namespace):
        """Helper method to apply the pythonization to the given namespace

        Args:
            namespace (str): Namespace to be pythonized
        """

        def pythonization_callback(class_, name):
            if cls.filter(class_, name):
                cls.modify(class_, name)

        cppyy.py.add_pythonization(pythonization_callback, namespace)
18 changes: 18 additions & 0 deletions python/podio/test_CodeGen.py
@@ -4,6 +4,12 @@
import unittest
import ROOT
from ROOT import ExampleMCCollection, MutableExampleMC
from ROOT import nsp
from pythonizations import load_pythonizations # pylint: disable=import-error

# load all available pythonizations to the classes in a namespace
# loading pythonizations changes the state of cppyy backend shared by all the tests in a process
load_pythonizations("nsp")


class ObjectConversionsTest(unittest.TestCase):
@@ -31,3 +37,15 @@ def test_add(self):
        self.assertEqual(len(daughter_particle.parents()), 0)
        daughter_particle.addparents(parent_particle)
        self.assertEqual(len(daughter_particle.parents()), 1)


class CollectionSubscriptTest(unittest.TestCase):
    """Collection subscript test"""

    def test_bound_check(self):
        collection = nsp.EnergyInNamespaceCollection()
        _ = collection.create()
        self.assertEqual(len(collection), 1)
        with self.assertRaises(IndexError):
            _ = collection[20]
Comment on lines +49 to +50
Collaborator

This fails on dev3 due to some overload resolution issue. I would assume it's a ROOT issue that we are hitting here, but I am not sure yet which one.

Contributor Author

Well, an exception is thrown, but of a different type than expected:

  Traceback (most recent call last):
    File "/home/runner/work/podio/podio/python/podio/test_CodeGen.py", line 50, in test_bound_check
      _ = collection[20]
    File "/home/runner/work/podio/podio/python/podio/pythonizations/collection_subscript.py", line 22, in get_item
      return self.at(i)
  TypeError: none of the 2 overloaded methods succeeded. Full details:
    nsp::EnergyInNamespace nsp::EnergyInNamespaceCollection::at(size_t index) =>
      out_of_range: deque::_M_range_check: __n (which is 20)>= this->size() (which is 1)
    nsp::MutableEnergyInNamespace nsp::EnergyInNamespaceCollection::at(size_t index) =>
      out_of_range: deque::_M_range_check: __n (which is 20)>= this->size() (which is 1)

According to the docs, if an exception is thrown, cppyy tries the remaining overloads: if they all throw the same type, that exception is propagated; if they throw different types, a TypeError is raised.
Here both overloads throw std::out_of_range, but somehow it isn't recognized as the same type.
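The rule described above can be modeled in a few lines of plain Python. This is a simplified, hypothetical model for illustration only, not cppyy's actual dispatch code; `call_overloads` and `OutOfRange` are made-up names. It also shows how two exception classes that look identical (same name) but are distinct Python types would lead to a `TypeError` like the one in the traceback.

```python
class OutOfRange(Exception):
    """Hypothetical stand-in for a C++ std::out_of_range."""


def call_overloads(overloads, *args):
    """Simplified model of the described rule: try each overload in turn;
    if all fail with the same exception type, re-raise that exception,
    otherwise raise a TypeError summarizing every failure."""
    errors = []
    for overload in overloads:
        try:
            return overload(*args)
        except Exception as exc:  # remember the failure, try the next overload
            errors.append(exc)
    if len({type(e) for e in errors}) == 1:
        raise errors[0]
    details = "; ".join(f"{type(e).__name__}: {e}" for e in errors)
    raise TypeError(f"none of the {len(errors)} overloaded methods succeeded. {details}")
```

With two overloads raising `OutOfRange`, the `OutOfRange` propagates; with two distinct classes that are both named `out_of_range`, the model raises `TypeError` instead.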

The possible solutions I could think of now are:

  • explicitly select the overload: here the overloads are const and non-const. Selection between const and non-const overloads was added in cppyy-1.7.0, but the nightlies have 1.6.2
  • change the pythonization to catch either cppyy.gbl.std.out_of_range or TypeError and raise IndexError: I don't like it because TypeError is also used for other things, like an actual type mismatch, e.g. calling with a str instead of an int
  • allow both exception types in the test: getting an exception is already an improvement, and the message says it's out of range. But users may be confused about why it's a TypeError and not an IndexError
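For the third option, `unittest`'s `assertRaises` accepts a tuple of exception types. Below is a self-contained sketch of such a relaxed test; `FakeCollection` is a hypothetical stand-in that mimics the `TypeError` seen on dev3, used instead of the real `nsp.EnergyInNamespaceCollection` so the sketch runs without ROOT.

```python
import unittest


class FakeCollection:
    """Hypothetical collection whose out-of-range subscript raises TypeError."""

    def __init__(self, size):
        self._size = size

    def __len__(self):
        return self._size

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise TypeError("none of the 2 overloaded methods succeeded")
        return i


class CollectionSubscriptTest(unittest.TestCase):
    def test_bound_check(self):
        collection = FakeCollection(1)
        self.assertEqual(len(collection), 1)
        # assertRaises accepts a tuple, so either exception type makes the test pass
        with self.assertRaises((IndexError, TypeError)):
            _ = collection[20]
        _ = collection[0]
```

The drawback noted above remains: such a test would also pass for a genuine type mismatch.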

Collaborator

Maybe this can be boiled down to a small reproducer that we can submit to ROOT because they probably didn't want to introduce too many "breaking" changes in their update of the cppyy they bundle.

Contributor Author

Opened an issue, wlav/cppyy#230, as the problem also appears outside of the ROOT bundle.

Collaborator

I also made the ROOT folks aware via root-project/root#15375

        _ = collection[0]