Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue #546

Open
umax opened this issue Nov 22, 2022 · 2 comments
Open

Performance issue #546

umax opened this issue Nov 22, 2022 · 2 comments
Labels

Comments

@umax
Copy link

umax commented Nov 22, 2022

Hey, guys! I use this library in my Google Cloud Run services to work with Google Datastore. Let's say you have a service with 10 requests per second traffic. Your service makes request to Google Datastore to fetch 250 entities. Nothing more. Every object from query result converted to Entity class instance. So, Google Cloud Run monitoring shows that my service make 50% CPU load. If I disable converting query results to Entity, service uses 8% CPU. The problem in this function: https://github.com/talkiq/gcloud-aio/blob/master/datastore/gcloud/aio/datastore/value.py#L31
Any ideas how to fix this performance issue?

@TheKevJames
Copy link
Member

Hi @umax, thanks for reporting this!

Offhand, I don't see any major perf issues with that code; ultimately all the work being done is "parse the response". Unfortunately, that means there probably won't be any quick wins here :( For this specific case, you could try providing a custom subclass for Value which has from_repr overridden to do less parsing, which might help a bit, but it's hard to say how much. If you override all the things with custom subclasses, you might get some improvements, but it'll be a bit of whack-a-mole, I fear.

More generally, the gcloud-aio library was generally built on the idea of converting back-and-forth from code representations to api representations, so this is a pretty fundamental design change. We've discussed before migrating the libraries to support a dual mode, eg. to have a client which handles just the raw api data and then an optional OOP layer on top. I'm thinking that might be the only real way to tackle this sort of thing, as exposing the raw details directly is going to be nearly impossible.

So... yeah, allowing direct API access is really the only path I see here. I'd certainly be open to it, but it's going to be a really big change and is going to need a bunch of design work upfront to see how we'd want to do it across all our libs to maintain a consistent feel. If you (or anyone else!) is interested in helping with this work, we can certainly consider it, but on our side of things we likely don't have OSS maintenance hours to devote to this sort of work in the near future.

@umax
Copy link
Author

umax commented Dec 27, 2022

Hi, @TheKevJames, thanks for reply!
I've made some benchmarks and found that most slowest part of the library inside from_repr method of Value class. I tried to improve it (got only x4-5 times, but it's not enough for me). The problem is that your library converts every raw property to Python type. Usually it's not necessary, because mostly you will analyze only some properties, change it and push in Datastore again. That's why I decided to create my own library (mostly the idea is the same - convert from API representation to Python objects), but with lazy loading entity's properties. It works about ~9-10 times faster and is more strict about mirroring HTTP API structures on Python classes. I tried to patch your library, but it breakes backward compatibility because of lazy loading approach (

Feel free to close this task.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

2 participants