Performance issue with get_many #478

Open
matejsp opened this issue Feb 4, 2023 · 2 comments

Comments

@matejsp
Contributor

matejsp commented Feb 4, 2023

When we replaced python-memcached with pymemcache in production, we noticed increased latencies on our endpoints that use several (4) memcached servers.

After drilling down, the major performance difference turned out to be between the two implementations of get_many:

python-memcached get_many:
https://github.com/linsomniac/python-memcached/blob/master/memcache.py#L1195
send request to first server, send request to second server, send request to third server, ...
wait for response from first server, second server, etc.

pymemcache:
https://github.com/pinterest/pymemcache/blob/master/pymemcache/client/base.py#L1182 and https://github.com/pinterest/pymemcache/blob/master/pymemcache/client/hash.py#L400
send request to first server, wait for response,
send request to second server, wait for response, ...
...

Do you have any idea how to optimise this?

@jogo
Contributor

jogo commented Feb 10, 2023

Great find @matejsp. I suspect the way to optimize this is to refactor the hashing client to follow a pattern similar to python-memcached's.

We haven't hit this issue ourselves as we use mcrouter.

@matejsp
Contributor Author

matejsp commented Feb 6, 2024

Since I referenced this issue in Django, I would like to share some additional benchmarks that I ran using pymemcache and python-memcached inside Django.

We are using memcached (ElastiCache) with an average round trip of 1 ms to each server.

In our case we use 4 memcached servers. When we call get_many, the hash algorithm splits the keys between all 4 servers and get_many is called on each server in turn. Each call takes 1 ms, so we observed a total time of 4 ms (it can also be 1 ms, 2 ms, or 3 ms, depending on the key distribution and how many servers a given get_many hits).
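To illustrate how one get_many ends up as several per-server calls, here is a minimal sketch of grouping keys by server. The md5-modulo hash, the function name `group_keys_by_server`, and the server names are assumptions made up for illustration; they are not pymemcache's actual hashing scheme.

```python
import hashlib
from collections import defaultdict


def group_keys_by_server(keys, servers):
    """Group keys by the server that a simple modulo hash assigns them to.

    Hypothetical stand-in for the hash client's key distribution.
    """
    groups = defaultdict(list)
    for key in keys:
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % len(servers)
        groups[servers[index]].append(key)
    return dict(groups)


# Hypothetical server names and keys, only for the demo.
servers = ["cache-1:11211", "cache-2:11211", "cache-3:11211", "cache-4:11211"]
keys = [f"user:{i}" for i in range(20)]
groups = group_keys_by_server(keys, servers)

# Each non-empty group costs one round trip if the groups are fetched one by one.
print(len(groups), "servers hit")
```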

In python-memcached, when calling 4 servers in a loop, the logic is optimized so that it first sends the request to all 4 servers and only then receives the response from each of them, so the whole get_many call takes 1 ms in total (basically it waits for the slowest server).
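The difference between the two patterns can be put into a simple latency model: waiting for each server before contacting the next one costs roughly the sum of the round trips, while sending to all servers first costs roughly the slowest round trip. This is a simplification that ignores serialization and send time, and the function names are made up for illustration.

```python
def sequential_latency_ms(round_trips_ms):
    """Send, wait, send, wait, ...: the round trips add up."""
    return sum(round_trips_ms)


def pipelined_latency_ms(round_trips_ms):
    """Send to all servers first, then wait: bounded by the slowest server."""
    return max(round_trips_ms, default=0)


# 4 servers, 1 ms round trip each (the numbers from this comment).
rtts = [1.0, 1.0, 1.0, 1.0]
print(sequential_latency_ms(rtts))  # 4.0
print(pipelined_latency_ms(rtts))   # 1.0
```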

I checked the code, and from what I see _fetch_cmd/_store_cmd should be split into sending and receiving logic and somehow lifted up into the hash client, taking connection pooling into account. You would need to get connections for the 4 servers from the pool, call send on each, and then wait to receive from all servers.
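A rough sketch of what the split could look like, written against raw sockets and the memcached text protocol rather than pymemcache's internals. The names `send_get`, `recv_get`, `get_many_pipelined`, and `fake_server` are hypothetical, connection pooling and error handling are left out, and the fake server exists only so the example runs without a real memcached.

```python
import socket
import threading


def send_get(sock, keys):
    """Write a text-protocol `get` for all keys without waiting for the reply."""
    sock.sendall(b"get " + b" ".join(k.encode() for k in keys) + b"\r\n")


def recv_get(sock):
    """Read one complete `get` reply: VALUE blocks terminated by END."""
    reader = sock.makefile("rb")
    result = {}
    while True:
        line = reader.readline().rstrip(b"\r\n")
        if not line:
            raise ConnectionError("connection closed before END")
        if line == b"END":
            return result
        _, key, _flags, size = line.split()
        result[key.decode()] = reader.read(int(size))
        reader.read(2)  # consume the \r\n after the data block


def get_many_pipelined(conns, keys_by_server):
    """Send to every server first, then collect the replies."""
    for server, keys in keys_by_server.items():
        send_get(conns[server], keys)
    result = {}
    for server in keys_by_server:
        result.update(recv_get(conns[server]))
    return result


def fake_server(sock, store):
    """Stand-in for memcached: answers a single `get`, then closes."""
    with sock:
        requested = sock.makefile("rb").readline().split()[1:]
        reply = b""
        for key in requested:
            value = store.get(key.decode())
            if value is not None:
                reply += b"VALUE %s 0 %d\r\n%s\r\n" % (key, len(value), value)
        sock.sendall(reply + b"END\r\n")


# Demo with two fake servers connected through socket pairs.
stores = {"s1": {"a": b"1"}, "s2": {"b": b"22"}}
conns, threads = {}, []
for name, store in stores.items():
    client_end, server_end = socket.socketpair()
    conns[name] = client_end
    thread = threading.Thread(target=fake_server, args=(server_end, store))
    thread.start()
    threads.append(thread)

values = get_many_pipelined(conns, {"s1": ["a", "missing"], "s2": ["b"]})
for thread in threads:
    thread.join()
for conn in conns.values():
    conn.close()
print(values)  # {'a': b'1', 'b': b'22'}
```

In a real client the sockets would come from each server's connection pool and be returned after the receive step, which is the part that needs care in the hash client.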
