Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

binding(Python): Use Python classes for codec messages #559

Closed
HenningHolmDE opened this issue Jul 15, 2024 · 4 comments · Fixed by #568
Closed

binding(Python): Use Python classes for codec messages #559

HenningHolmDE opened this issue Jul 15, 2024 · 4 comments · Fixed by #568

Comments

@HenningHolmDE
Copy link
Contributor

Currently, the codec messages are typeless dictionaries, e.g. {"tag": "a", "body": "Noop"}.

Using classes (e.g. Command, Greeting) for each of these messages would allow making it more clear what type the methods accept and return, even if these classes itself are constructed from dictionaries, e.g. Command({"tag": "a", "body": "Noop"}).

@duesee
Copy link
Owner

duesee commented Jul 15, 2024

I think this is a reasonable next step towards a pythonic API.

@AzideCupric
Copy link

I feel like the current Python side of type hints (pr #568) could still use a little more depth, and the internal structure is still a bit mysterious.

I'm not super familiar with Rust and PyO3, so I read the PyO3 documentation, but I couldn't find a way to export fields from a struct with a lifecycle in Rust to Python via PyO3.

The PyO3 documentation states that #pyclass(get_all, set_all) can make Python access all fields of a Rust struct, but it unfortunately doesn't work on something like:

#[pyclass(name = "Greeting", eq, get_all, set_all)] //Err: `get` and `set` with tuple struct fields require `name`
pub(crate) struct PyGreeting(pub(crate) Greeting<'static>);

So I gave it a go on my own fork, trying to convert the types in imap-types to Pydantic's model on the Python side and rebuild it back to a structure similar to the Rust side from the exported dict. Here are some attempts and tests I made.

@HenningHolmDE
Copy link
Contributor Author

@AzideCupric Thank you for your input!

I feel like the current Python side of type hints (pr #568) could still use a little more depth, and the internal structure is still a bit mysterious.

Could you elaborate a bit more on this? Then we can guide further development in the right direction with respect to developer experience. Currently, we aim at providing some basic way of using the capabilities of imap-codec for Python developers while not doing a reimplementation of the library in Python.

I'm not super familiar with Rust and PyO3, so I read the PyO3 documentation, but I couldn't find a way to export fields from a struct with a lifecycle in Rust to Python via PyO3.

The PyO3 documentation states that #pyclass(get_all, set_all) can make Python access all fields of a Rust struct, but it unfortunately doesn't work on something like:

#[pyclass(name = "Greeting", eq, get_all, set_all)] //Err: `get` and `set` with tuple struct fields require `name`
pub(crate) struct PyGreeting(pub(crate) Greeting<'static>);

Indeed this seems to be one of the largest issues for improving the Python experience for the types. Because imap-types aims on allowing zero-copy implementations, the corresponding types have lifetime parameters (e.g. Greeting<'a>) for allowing the compiler to track the lifetime of the borrowed external data.
One way around this is creating newtype-like wrappers for the pyclass implementation (e.g. struct PyGreeting(Greeting<'static>)), which bind the lifetime to the remainder of the programs execution ('static). However, in this case all functionality to be available from Python has to be re-implemented (or forwarded) manually on these wrappers, because - as you pointed out - the PyO3 macros do not access the implentation of the wrapped type.
This is the reason why I started this implementation on the highest layer for the codecs' interface types (fragments, messages) only.

Going forward, it might be possible to change automatically switch the lifetime parameters of the types in imap-types to 'static when used in conjunction with PyO3, but I did not dig into that any deeper yet.

So I gave it a go on my own fork, trying to convert the types in imap-types to Pydantic's model on the Python side and rebuild it back to a structure similar to the Rust side from the exported dict. Here are some attempts and tests I made.

I really like the idea of using Pydantic to create a verifyable model. However, looking at your implementation for greeting, wouldn't this basically lead to a complete reimplementation of imap-types in Pydantic?
The basic idea would then be to be able to convert between the Pydantic representation and the one using imap-types and reuse imap-codec for encoding/decoding?

@AzideCupric
Copy link

Thank you for your patience.

Could you elaborate a bit more on this? Then we can guide further development in the right direction with respect to developer experience.

I am trying to implement an Email adapter for the NoneBot project. To do this, I am looking for a Python library that can effectively decode and encode IMAP protocol commands and responses. When I found imap-codec, I noticed it was preparing to release Python bindings. I tried using the current binding code but found that the decode method's output is only a dict, lacking clear internal structure and type hints.

For instance, when the server returns a response like:

* 3 EXISTS
* 2 EXPUNGE
A002 OK IDLE terminated

I would like it to parse into Python classes:

# Now: 
{'Data': {'Exists': 3}}
{'Data': {'Expunge': 2}}
{'Status': {'Tagged': {'tag': 'A002', 'body': {'kind': 'Ok', 'code': None, 'text': 'IDLE terminated'}}}}
# Want: 
Data(data=Exist(3))
Data(data=Expunge(2))
Status(status=Tagged(
    tag='A002',
    body=Body(
    	kind="Ok",
        code=None,
        text="IDLE terminated"
    )
))

This approach would allow using match-case to handle different results and benefit from type-checking tools (like pyright) for type inference.

match ResponseCodec.decode(...):
    case Status() as s:
        print(s.body.kind, s.body.text)
    case Data(data=d):
        match d:
            case Exists() as e:
                emit_event("new_email", e.exists)
            case Expunge() as e:
                emit_event("del_email", e.expunge)
            case _:
                ...
    case _:
        ...

While pure dicts can also use match-case:

match ResponseCodec.decode(...):
    case {"Status": s}:
        ...
    case {"Data": d}:
        match d:
            case {"Exists": e}:
                ...
            case {"Expunge": e}:
                ...
            case _:
                ...
    case _:
        ...

This requires manually entering key values, which has the risk of typos/structural errors, and type-checking tools can't infer the type of s in {"Status": s} and other more.

Additionally, using classes makes it easier to construct a command to pass to the encode method, which is more convenient than manually typing the whole dict and allows type-checking tools to provide assistance and checks.

cmd = {'tag': 'a002', 'body': {'Select': {'mailbox': 'Inbox'}}}
# better
cmd = Command(tag=Tag("a002"), body=Select(mailbox=Mailbox.Inbox))
encoded_cmd = CommandCodec.encode(cmd.model_bump())

However, looking at your implementation for greeting, wouldn't this basically lead to a complete reimplementation of imap-types in Pydantic?

Even if PyO3 could convert Rust structs with lifetimes into the correct Python classes, it wouldn't automatically generate the class stubs on the Python side (Python Interface (.pyi) generation and runtime inspection). So, even if #[pyclass] worked, the .pyi files would still need to be manually written. What I am doing now is akin to manually constructing .pyi files, but also supporting serialization and deserialization of the dict output from the decode method. I don't see this as a reimplementation of imap-types in Pydantic because the dict output from the decode method is already the result of imap-codec parsing, with fixed keys and structure.

The basic idea would then be to be able to convert between the Pydantic representation and the one using imap-types and reuse imap-codec for encoding/decoding?

Yes, that's correct. Pydantic only needs to reconstruct the model based on the dict output, without performing complex validation (which is already handled by imap-codec). The model is then exported and passed back to the encode method, ensuring correctness by imap-codec.


I hope this clarifies my suggestions and the reasoning behind them. I would be happy to provide more details or assistance if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants