
Deserialization speedup proposal #192

Open
almaslov opened this issue Jan 31, 2023 · 4 comments

Comments

@almaslov

Feature request

Hello!
As already mentioned in other discussions, the Python implementation has performance issues in serialization and deserialization (e.g. here). I focus on deserialization here.
I've tried some fixes on the galactic branch and they look very promising to me: depending on message structure and size, deserialization is about 3-20 times faster — x3 for simple messages, up to x20 for complex types with a high nesting level and array fields.

Feature description

The main idea is to avoid executing pure-Python code during message deserialization. To achieve this, every message class is split into a C extension base class and a pure-Python derived class. The C extension class contains all message fields, so the deserialization method initializes the C structure members without calling any Python methods. The pure-Python class, on the other hand, contains all the dunder methods and assertions.
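To make the split concrete, here is a plain-Python illustration with hypothetical names (in the actual proposal the base class is a compiled C extension type, not a Python class):

```python
class _ExampleBase:
    """Stand-in for the C extension base class: field storage only."""
    __slots__ = ('_value', '_name')


class Example(_ExampleBase):
    """Pure-Python layer: dunder methods and validation only."""
    __slots__ = ()

    def __init__(self, value=0, name=''):
        assert isinstance(value, int)
        assert isinstance(name, str)
        self._value = value
        self._name = name

    def __eq__(self, other):
        return (isinstance(other, Example)
                and self._value == other._value
                and self._name == other._name)

    def __repr__(self):
        return f'Example(value={self._value!r}, name={self._name!r})'


def fast_deserialize(value, name):
    """Mimics what the C converter does: bypass __init__ (like calling
    tp_new directly) and set the fields on the base class storage."""
    msg = Example.__new__(Example)   # no __init__, no assertions
    msg._value = value
    msg._name = name
    return msg
```

In the real proposal the field stores and the `fast_deserialize` path live in C, so deserialization never enters the Python interpreter per field; the sketch only shows the class layout.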

Implementation considerations

Here is an example of how the generated files look for a simple message file Example.msg:

The _example_base.c extension module defines a base class containing members of native types in lines 10-25.

The Python module _example.py is the same as upstream except for the base class (line 62) and a micro-optimization of the assertion in the __init__ method (line 95).

The support extension module _example_s.c now uses a specific subtype of PyObject (line 147), calls __new__ instead of __init__ to avoid pure-Python calls (line 170), directly initializes the C-typed fields of the message (lines 188, 191), and creates nested collections (lines 193-212).

A benchmark for this message which runs the deserialization routine 1 million times gives 28 seconds vs 6.5 seconds.

For a more complex message like TestArrayComplex (from this issue) with 100 nested TestElement elements, the same benchmark gives 57.6 seconds vs 5.5 seconds for 50K deserializations.

I ran the benchmarks with Python 3.8, ROS Galactic, Ubuntu 20.04, on a 2.85 GHz AMD CPU.
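The numbers above come from the author's C-level benchmark. As a rough, self-contained illustration of why skipping `__init__` matters even before any C code is involved, here is a hypothetical `timeit` harness using plain-Python stand-ins (not the generated message classes):

```python
import timeit

class _Base:                     # stand-in for the C extension base class
    __slots__ = ('_a', '_b')

class Msg(_Base):                # stand-in for the generated Python class
    __slots__ = ()

    def __init__(self, a=0, b=0.0):
        assert isinstance(a, int)
        assert isinstance(b, float)
        self._a = a
        self._b = b

def via_init():
    # what a pure-Python converter does: full constructor with assertions
    return Msg(1, 2.0)

def via_new():
    # what the proposed converter does: bypass __init__, set fields directly
    m = Msg.__new__(Msg)
    m._a = 1
    m._b = 2.0
    return m

n = 200_000
print(f'__init__ path: {timeit.timeit(via_init, number=n):.3f}s')
print(f'__new__ path:  {timeit.timeit(via_new, number=n):.3f}s')
```

This only measures the Python-side constructor overhead; the full 3-20x speedup in the proposal also comes from moving field initialization into C.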

Pros:

  • performance improvement
  • the public interface of messages is not changed (as far as I can see)

Cons:

  • binary dependency for basic message operations (creation, reading/writing fields)
  • no more assertions during deserialization (I think they could be added back)
  • initialization logic is duplicated in __init__ and the *_convert_to_py methods
  • increased compile time

I'm interested in whether this solution is worth porting from my local galactic branch to upstream, or whether it violates some ROS architecture principles.

@Voldivh
Contributor

Voldivh commented Feb 2, 2023

Hi @almaslov, thanks for this effort and contribution, and for taking the time to explain your solution here. Would it be possible for you to create a PR so we can review it in more detail and possibly merge a solution upstream?

@almaslov
Author

almaslov commented Feb 7, 2023

I was too optimistic about

public interface of message is not changed

It is indeed unchanged, but it cost me a questionable workaround. There are two conflicting requirements:

  • a message instance should not allow arbitrary fields (no __dict__ should be generated), so using __slots__ is a reasonable solution
  • a message class should list all of its private fields in __slots__ to preserve the public API

The workaround is described in the comment on the metaclass __new__ method.
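A minimal sketch of what such a metaclass workaround might look like (hypothetical names; the actual implementation in the branch may differ): the class is created with no real slot descriptors, so the base class's field storage is not shadowed, and the declared `__slots__` list is restored afterwards purely for introspection.

```python
class MessageMeta(type):
    """Hypothetical sketch of the described workaround.

    The generated Python class must list all private fields in
    __slots__ (tooling introspects it), but the fields are stored
    in the C extension base class. Declaring them again as real
    slots would shadow the base descriptors, so the type is created
    with empty slots and the declared list is re-attached as a
    plain class attribute for the public API.
    """

    def __new__(mcs, name, bases, namespace):
        declared = namespace.get('__slots__', ())
        namespace['__slots__'] = ()          # no real slot descriptors
        cls = super().__new__(mcs, name, bases, namespace)
        cls.__slots__ = list(declared)       # preserved for introspection
        return cls


class _ExampleBase:
    """Stand-in for the C extension base class."""
    __slots__ = ('_x',)


class Example(_ExampleBase, metaclass=MessageMeta):
    __slots__ = ('_x',)          # public API; stripped by the metaclass

    def __init__(self):
        self._x = 0
```

Instances still reject arbitrary attributes (no `__dict__` is created), while `Example.__slots__` reports the full field list.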

In addition to the above, I patched the current implementation instead of creating a parallel one.

@almaslov
Author

almaslov commented Feb 8, 2023

I've tried the same approach for serialization, and unexpectedly the performance boost is quite large there too.
For the same TestArrayComplex message it is 16.5 seconds vs 1.4 seconds for 50K serializations.

Patch is in PR already.

@Voldivh
Contributor

Voldivh commented Feb 8, 2023

@almaslov thanks for pushing this! I'll be taking a look 👍
