Arbitrary Python objects #32

Open
adriangb opened this issue Nov 7, 2022 · 4 comments

Comments

@adriangb

adriangb commented Nov 7, 2022

Maybe this is totally off, but would it even be possible to support arbitrary Python objects? For example, if I had a Python function like this:

from typing import Callable, Literal, TypedDict

class Input(TypedDict):
  type: Literal["http"]
  status_code: int
  method: Literal["GET", "PUT"]
  callback: Callable[[], None]

class Output(TypedDict):
  message: str
  callback: Callable[[], None]

def process(inp: Input) -> Output:
  assert inp["type"] == "http"
  status_code = inp["status_code"]
  assert 99 < status_code < 600
  method = inp["method"]
  assert method in ("GET", "PUT")
  msg = f"{method} {status_code}"
  # The callback is passed through untouched.
  return Output(message=msg, callback=inp["callback"])

This crate is super convenient for parsing and validating type, status_code and method, since you can write a declarative serde model for them:

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "UPPERCASE")]
enum Method {
  Get,
  Put,
}

#[derive(Serialize, Deserialize)]
struct HTTPMessage {
  status_code: u32,
  method: Method,
}

#[derive(Serialize, Deserialize)]
#[serde(tag = "type", rename_all = "lowercase")]
enum Message {
  Http(HTTPMessage),
}

But as far as I can tell, there is no way to say "make the callback key a Py<PyAny>". In other words, something like:

#[derive(Serialize, Deserialize)]
struct HTTPMessage {
  status_code: u32,
  method: Method,
  callback: Py<PyAny>,
}

Is that right?

@davidhewitt
Owner

Hello, sorry for the very slow response. You're correct: at the moment this isn't possible.

I wonder if Pythonize and Depythonize traits could solve this problem. They could have blanket implementations for T: Serialize (or Deserialize) and then Python types which aren't serde-compatible could be handled separately. Other than that, I'm not aware of a way that we could hook into serde to achieve this.

(I think this problem is closely related to #1.)
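
The "handled separately" idea can be illustrated from the Python side. The sketch below is not pythonize's API: split_opaque, is_plain and PLAIN_TYPES are hypothetical names. It assumes the caller partitions a dict into serde-compatible data (which could go through depythonize) and opaque objects such as callbacks (which would be handed to Rust as plain Python object references).

```python
from typing import Any, Dict, Tuple

# Hypothetical: the leaf types we assume a serde model can represent.
PLAIN_TYPES = (type(None), bool, int, float, str)


def is_plain(value: Any) -> bool:
    """Return True if value consists only of plain, serde-friendly data."""
    if isinstance(value, PLAIN_TYPES):
        return True
    if isinstance(value, (list, tuple)):
        return all(is_plain(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_plain(v) for k, v in value.items())
    return False


def split_opaque(message: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split top-level keys into (plain data, opaque Python objects)."""
    plain: Dict[str, Any] = {}
    opaque: Dict[str, Any] = {}
    for key, value in message.items():
        (plain if is_plain(value) else opaque)[key] = value
    return plain, opaque


def callback() -> None:
    pass


plain, opaque = split_opaque(
    {"type": "http", "status_code": 200, "method": "GET", "callback": callback}
)
```

Here plain holds only the three validated fields, and opaque holds the callback untouched. This only splits top-level keys; nested opaque values would need a recursive variant.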

@apendleton

To deal with this in a current project, I've been using a proxy class that wraps an object and makes it look like a dict to depythonize. It looks something like:

import collections.abc

class DictProxy(collections.abc.Mapping):
    _inner = None
    _keys = None

    def __init__(self, inner, aliases=None):
        self._aliases = aliases or {}
        self._inner = inner
        self._keys = [k for k in dir(inner) if not k.startswith("_")]
        for alias in self._aliases:
            if alias not in self._keys:
                self._keys.append(alias)

    def __getitem__(self, key):
        if key in self._aliases:
            alias = self._aliases[key]
            if isinstance(alias, str):
                # String alias: read a differently named attribute.
                return getattr(self._inner, alias)
            else:
                # Callable alias: compute the value from the wrapped object.
                return alias(self._inner)
        elif key in self._keys:
            return getattr(self._inner, key)
        else:
            raise KeyError(key)

    def __iter__(self):
        yield from self._keys

    def __len__(self):
        return len(self._keys)

    def __contains__(self, key):
        return key in self._keys

    def keys(self):
        return list(self._keys)
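
A short usage sketch may make the two kinds of alias clearer. The Response class and the alias names are hypothetical, and the proxy is repeated in condensed form only so that the example runs on its own.

```python
import collections.abc


class DictProxy(collections.abc.Mapping):
    """Condensed copy of the proxy above so this example is self-contained."""

    def __init__(self, inner, aliases=None):
        self._aliases = aliases or {}
        self._inner = inner
        self._keys = [k for k in dir(inner) if not k.startswith("_")]
        self._keys += [a for a in self._aliases if a not in self._keys]

    def __getitem__(self, key):
        if key in self._aliases:
            alias = self._aliases[key]
            if isinstance(alias, str):
                return getattr(self._inner, alias)
            return alias(self._inner)
        if key in self._keys:
            return getattr(self._inner, key)
        raise KeyError(key)

    def __iter__(self):
        yield from self._keys

    def __len__(self):
        return len(self._keys)


class Response:  # hypothetical object to wrap
    def __init__(self):
        self.status = 200
        self.verb = "get"


proxy = DictProxy(
    Response(),
    aliases={
        "status_code": "status",             # string alias: rename an attribute
        "method": lambda r: r.verb.upper(),  # callable alias: compute a value
    },
)

# Anything that consumes a Mapping now sees both the raw and aliased keys.
as_dict = dict(proxy)
```

The resulting mapping exposes status and verb as found on the object, plus status_code and method under the names a serde model would expect.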

I only need to go in the Python -> Rust direction, but I think a similar approach could work in the other direction as well. One semi-major frustration at the moment, though: as currently implemented, the deserializer for mappings calls both .keys() and .values() on the incoming map and eagerly evaluates the returned iterators. That means every field on the object gets accessed, including @property fields (which get evaluated), even if those fields aren't needed in the deserialization. At least in my application, this results in some unnecessarily slow or expensive calls, and I haven't yet figured out a way around it.

It doesn't seem like there's any reason it has to work that way, but that's how it works now. I think if object deserialization were to be explicitly supported, some kind of laziness would be important.
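
The difference between the eager and lazy access patterns can be reproduced in pure Python. This is only an illustration of the behaviour described above, not the crate's deserializer; Record, FieldProxy and the calls list are hypothetical names.

```python
import collections.abc

calls = []  # records which properties were actually evaluated


class Record:
    @property
    def name(self):
        calls.append("name")
        return "example"

    @property
    def report(self):
        calls.append("report")  # stands in for a slow computation
        return "big report"


class FieldProxy(collections.abc.Mapping):
    """Minimal mapping view over selected attributes of an object."""

    def __init__(self, inner, fields):
        self._inner = inner
        self._fields = list(fields)

    def __getitem__(self, key):
        if key not in self._fields:
            raise KeyError(key)
        return getattr(self._inner, key)  # the property is evaluated here

    def __iter__(self):
        return iter(self._fields)

    def __len__(self):
        return len(self._fields)


proxy = FieldProxy(Record(), ["name", "report"])

# Eager: draining .values() touches every field, needed or not.
eager_values = list(proxy.values())
eager_calls = list(calls)

# Lazy: looking up only the needed key evaluates only that property.
calls.clear()
needed = proxy["name"]
lazy_calls = list(calls)
```

After the eager pass, both properties have run; after the lazy lookup, only name has. A deserializer that asked for values per requested key would avoid the expensive report call whenever the target struct has no such field.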

@jonathan-s

Given that pythonize doesn't yet support arbitrary Python objects, it is not as powerful as pickle, which means it doesn't have the same security concerns that pickle does. If you do manage to support arbitrary Python objects, it's worth keeping a function that doesn't support the full set of Python objects, as that restriction is a feature in and of itself in terms of keeping security tight.
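
To make the concern concrete: unpickling can call an arbitrary callable chosen by whoever produced the bytes, whereas a plain-data-only function has nothing to call. The sketch below uses a harmless callable (sorted) to show the mechanism; Sneaky and require_plain are hypothetical names, and require_plain is just one possible shape for a restricted entry point.

```python
import pickle


class Sneaky:
    def __reduce__(self):
        # pickle stores "call sorted([3, 1, 2])" instead of the object's state.
        return (sorted, ([3, 1, 2],))


# Loading the bytes runs the stored call; we get its result, not a Sneaky.
restored = pickle.loads(pickle.dumps(Sneaky()))


def require_plain(value):
    """Hypothetical strict check: accept only JSON-like data, reject the rest."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, list):
        return [require_plain(item) for item in value]
    if isinstance(value, dict):
        if not all(isinstance(key, str) for key in value):
            raise TypeError("only string keys are allowed")
        return {key: require_plain(item) for key, item in value.items()}
    raise TypeError(f"unsupported type: {type(value).__name__}")


checked = require_plain({"status_code": 200, "tags": ["a", "b"]})
```

With a real attacker the stored callable would not be sorted, which is why the standard library documentation warns against unpickling untrusted data.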

@Stargateur
Contributor

Stargateur commented Jul 18, 2024

I'm using dill (or pickle) to serialize any Python object. With a little code you can write a module for serde's with attribute, and it works nicely.
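
The Rust-side module is not shown in this thread, so the following is only a Python-side sketch of the idea: turn the opaque object into bytes (here base64 text) so it can travel as an ordinary field. pack and unpack are hypothetical helper names, and the example uses the standard library pickle rather than dill so it runs without extra dependencies. Note that pickle cannot serialize lambdas, which is one reason to reach for dill.

```python
import base64
import functools
import pickle

# A picklable callable: functools.partial over a builtin.
callback = functools.partial(pow, 2)  # callback(n) computes 2 ** n


def pack(obj) -> str:
    """Pickle an object and encode it as ASCII text."""
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")


def unpack(text: str):
    """Reverse of pack. Only call this on data you trust."""
    return pickle.loads(base64.b64decode(text))


# The callback now sits next to the plain fields as an ordinary string.
message = {
    "type": "http",
    "status_code": 200,
    "method": "GET",
    "callback": pack(callback),
}

restored = unpack(message["callback"])
result = restored(10)
```

The trade-off is the one raised earlier in the thread: anything that unpickles these bytes inherits pickle's security concerns.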
