Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add __slots__ to pl.Expr and pl.LazyFrame #13198

Closed
Object905 opened this issue Dec 22, 2023 · 9 comments · Fixed by #13236
Closed

Add __slots__ to pl.Expr and pl.LazyFrame #13198

Object905 opened this issue Dec 22, 2023 · 9 comments · Fixed by #13236
Assignees
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars

Comments

@Object905
Copy link
Contributor

Description

Profiled count of created pl.Expr and pl.LazyFrame in my use case (graphQL api on top of polars).
Found that one of the heaviest queries creates pl.Expr around 3k times and pl.LazyFrame around 1k times.

Tried to "monkey patch" adding slots on them directly in .venv and found that it seems to be working ok.

Will follow up with PR later.

@Object905 Object905 added the enhancement New feature or an improvement of an existing feature label Dec 22, 2023
@ritchie46
Copy link
Member

What will that do?

@mcrumiller
Copy link
Contributor

__slots__ improves the performance of class attribute access, sometimes by quite a bit. I tried playing with this in dataframe/series a while back and ran into issues with namespace registration, since using __slots__ means you can't dynamically add attributes to a class. To get around this, I had to create a separate namespace dict and override __getattr__ to detect namespace retrieval, and in doing so removed the performance benefit of using __slots__.

@Object905
Copy link
Contributor Author

Object905 commented Dec 22, 2023

Well, then registration of namespace would be better handled by using descriptors.

Like this. Will experiment on this once I get to PR.

class Namespace:
    def __get__(self, obj, objtype=None):
        return "FooNamespace"


class Foo:
    __slots__ = ("_bar",)


Foo.namespace = Namespace()

instance = Foo()
print(instance.namespace)  # prints FooNamespace
print(Foo.namespace) # prints FooNamespace too

@Object905
Copy link
Contributor Author

Object905 commented Dec 22, 2023

Having __slots__ only means that you can't add attributes to instances, not class itself . Class object would still have __dict__.

@Object905
Copy link
Contributor Author

NameSpace class that wraps plugins is already is a descriptor. And it seems like it's registered on classes in _create_namespace, and that should not interact with __dict__ of instances in any way.
So, given that it goes to Expr.__dict__ I don't see why you would need to define __getattr__ on Expr.

@mcrumiller Am I missing something here?

@mcrumiller
Copy link
Contributor

@Object905 if it works, go for it! I may have done something incorrect, and this was a while ago so I don't recall the details. Are you talking about adding _expr and _ldf to the __slots__?

@Object905
Copy link
Contributor Author

Yes. Builtin namespaces, like str, list, etc also good candidates.

@stinodego stinodego added the accepted Ready for implementation label Feb 18, 2024
@stinodego stinodego added the python Related to Python Polars label Feb 19, 2024
@knl
Copy link
Contributor

knl commented Mar 1, 2024

@Object905 what do you recommend for adding attributes to instances, given that after this change one can't use setattr?

@mcrumiller
Copy link
Contributor

@knl I was just about to make a feature request for an empty attr dictionary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation enhancement New feature or an improvement of an existing feature python Related to Python Polars
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants