A simple, yet not so simple, MySQL connection pool based on PyMySQL.
When using PyMySQL with Python multithreading, we generally run into the following problems:
- A connection created by the main thread can't be shared among sub-threads. Doing so results in an error such as:
  pymysql.err.InternalError: Packet sequence number wrong - got 0 expected 1
- Having every sub-thread create its own connection and close it when the thread ends is workable, but repeatedly establishing connections with MySQL is obviously expensive.
This module solves these problems and offers the following features:
- Simple: just use it, there are no extra learning costs.
- Performance: almost no extra load compared to the original PyMySQL (see the simple benchmark).
- Flexible: pre-create connections or create them only when really needed; a normal pool size and a max pool size for scalability. It all depends on you.
- Thoughtful: connection lifetime and pre_ping mechanisms, in case a broken connection is borrowed from the pool (for example, one closed by the MySQL server due to the wait_timeout setting).
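The lifetime and pre_ping ideas can be illustrated with a small stand-alone sketch. This is not pymysql-pool's own code: the function name is_usable and its parameters are hypothetical, and ping stands for any callable that raises when the connection is dead (a PyMySQL connection's ping() method could play that role).

```python
import time


def is_usable(created_at, lifetime, ping, now=None):
    """Decide whether a pooled connection may be handed to a borrower.

    Hypothetical helper for illustration only:
    created_at -- time.monotonic() value recorded when the connection was opened
    lifetime   -- max age in seconds; 0 means "never expire"
    ping       -- callable that raises if the connection is broken
    """
    now = time.monotonic() if now is None else now

    # Lifetime check: retire connections that have been open for too long.
    if lifetime and now - created_at > lifetime:
        return False

    # Pre-ping check: probe the connection before lending it out, so that a
    # connection closed by the server (e.g. after wait_timeout) is detected.
    try:
        ping()
    except Exception:
        return False
    return True
```

A pool built on such a check would discard the connection and open a fresh one whenever is_usable(...) returns False.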
This module contains two classes:
- Connection class: a subclass of pymysql.connections.Connection. It can be used with or without a connection pool, and it is used in exactly the same way as pymysql. The implementation details of the connection pool are hidden (when used with a connection pool, additional actions are needed to maintain the pool).
- ConnectionPool class: an instance of this class represents the actual connection pool.
When using a connection pool, there are also some aspects beyond the core features that should be considered, such as:
- When getting a connection from the pool: we should deal with the retry_num and retry_interval parameters, in order to give the borrower more chances instead of raising GetConnectionFromPoolError directly.
- When putting a connection back to the pool: if the queries executed without exceptions, the connection can be put back to the pool directly; but if an exception occurred, we have to decide whether the connection should be put back, depending on whether it is still reusable (which depends on the exception type).
Luckily, this module takes care of these complicated details for you automatically.
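To make the two concerns concrete, here is a toy sketch of "retry before giving up" on borrow and "only return reusable connections" on put-back. It is a simplified illustration, not the module's real implementation: ToyPool, PoolExhausted, and the reusable flag are hypothetical names, and the sketch is not thread-safe.

```python
import time
from collections import deque


class PoolExhausted(Exception):
    """Stand-in for GetConnectionFromPoolError in this toy sketch."""


class ToyPool:
    """Illustrative only; a real pool also needs locking for thread safety."""

    def __init__(self, maxsize, factory):
        self._idle = deque()     # connections waiting to be borrowed
        self._total = 0          # idle + borrowed connections
        self._maxsize = maxsize
        self._factory = factory  # callable that builds a new connection

    def get(self, retry_num=3, retry_interval=0.01):
        # One initial attempt plus up to `retry_num` retries.
        for attempt in range(retry_num + 1):
            if self._idle:
                return self._idle.popleft()
            if attempt < retry_num:
                time.sleep(retry_interval)  # give other borrowers time to return
        # Still nothing idle: grow the pool if allowed, otherwise fail.
        if self._total < self._maxsize:
            self._total += 1
            return self._factory()
        raise PoolExhausted("no connection available after retries")

    def put(self, conn, reusable=True):
        if reusable:
            self._idle.append(conn)  # healthy connection goes back to the pool
        else:
            self._total -= 1         # broken connection is dropped, freeing a slot
```

The caller decides the reusable flag here; in practice that decision would come from inspecting the exception type, as described above.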
The module also allows creating more than one connection pool (each with a distinct ConnectionPool.name attribute), so that different pools can be associated with different databases.
pip install pymysql-pool
In the example below we're going to see how it works:
- Create a pool with a base (normal) size of 2 and a max size of 3. pre_create_num=2 means that 2 connections are created in the init phase:

      >>> import pymysqlpool
      >>> pymysqlpool.logger.setLevel('DEBUG')
      >>> config={'host':'xxxx', 'user':'xxx', 'password':'xxx', 'database':'xxx', 'autocommit':True}
      >>> mypool = pymysqlpool.ConnectionPool(size=2, maxsize=3, pre_create_num=2, name='mypool', **config)
      03-08 15:54:50 DEBUG: Create new connection in pool(mypool)
      03-08 15:54:50 DEBUG: Create new connection in pool(mypool)
      >>> mypool.total_num
      2
      >>> con1 = mypool.get_connection()
      12-25 21:38:48 DEBUG: Get connection from pool(mypool)
      >>> con2 = mypool.get_connection()
      12-25 21:38:51 DEBUG: Get connection from pool(mypool)
      >>> mypool.available_num
      0
- Now the pool is empty. If we still try to borrow a connection from it with the default parameters of get_connection(), we will see:

      >>> con3=mypool.get_connection()
      03-08 15:57:32 DEBUG: Retry to get connection from pool(mypool)
      03-08 15:57:32 DEBUG: Retry to get connection from pool(mypool)
      03-08 15:57:32 DEBUG: Retry to get connection from pool(mypool)
      03-08 15:57:33 DEBUG: Create new connection in pool(mypool)

  The messages above show that although the pool is empty, the max size has not been reached, so after several retries a new connection is created (now the max size of the pool is reached).
- Let's try to get another connection from the pool:

      >>> con4=mypool.get_connection()
      03-08 16:29:43 DEBUG: Retry to get connection from pool(mypool)
      03-08 16:29:43 DEBUG: Retry to get connection from pool(mypool)
      03-08 16:29:43 DEBUG: Retry to get connection from pool(mypool)
      Traceback (most recent call last):
        File "/Users/kai/github/pymysql-pool/pymysqlpool.py", line 176, in get_connection
          conn = self._pool.pop()
      IndexError: pop from an empty deque
      ...
      ...
      pymysqlpool.GetConnectionFromPoolError: can't get connection from pool(mypool), retry_interval=0.1(s)

  We can see that after several retries, a GetConnectionFromPoolError exception is finally raised.
- Now let's see the connection's behavior when calling the close() method or using it with the Context Manager Protocol:

      >>> con1.close()
      2017-12-25 21:39:56 DEBUG: Put connection back to pool(mypool)
      >>> with con2:
      ...     with con2.cursor() as cur:
      ...         cur.execute('select 1+1')
      ...
      1
      12-20 22:44:37 DEBUG: Put connection back to pool(mypool)
      >>> mypool.total_num
      3  # as we expect
      >>> mypool.available_num
      2  # as we expect
We can see that the module maintains the pool appropriately when (and only when) we call the close() method or use the Context Manager Protocol of the connection object.
I did a simple benchmark, focusing on the performance impact of the "extra" get and return operations in this module.
The test logic is in simple-benchmark.py; you can check it and run it yourself.
Below are my test results (50000 loops):
# 'pymysql-one-conn' is the best performing scenario, native pymysql, and all queries are done within a single connection
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-one-conn 50000
total 50000 finish within 6.564s.
7616.86 queries per second, avg 0.13 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-one-conn 50000
total 50000 finish within 6.647s.
7522.31 queries per second, avg 0.13 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-one-conn 50000
total 50000 finish within 6.558s.
7623.71 queries per second, avg 0.13 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-one-conn 50000
total 50000 finish within 6.737s.
7421.67 queries per second, avg 0.13 ms per query
# 'pymysql-pool' uses the connection pool (the pool size doesn't matter as long as it is greater than 1, because the test logic runs sequentially in a for loop).
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-pool 50000
total 50000 finish within 6.999s.
7143.77 queries per second, avg 0.14 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-pool 50000
total 50000 finish within 7.066s.
7076.48 queries per second, avg 0.14 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-pool 50000
total 50000 finish within 6.999s.
7143.71 queries per second, avg 0.14 ms per query
➜ pymysql-pool ✗ python3 simple-benchmark.py pymysql-pool 50000
total 50000 finish within 6.968s.
7175.65 queries per second, avg 0.14 ms per query
As we can see, one get plus return operation only takes about 0.01 ms.
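The per-operation figure can be checked from the timings listed above. The short calculation below averages the four runs of each scenario and divides the difference by the number of queries; the variable names are just for illustration.

```python
# Wall-clock totals (seconds) copied from the benchmark runs above.
one_conn = [6.564, 6.647, 6.558, 6.737]  # native pymysql, single connection
pooled = [6.999, 7.066, 6.999, 6.968]    # get + return through the pool
queries = 50000


def mean(values):
    return sum(values) / len(values)


# Extra time per query attributable to one get plus one return, in ms.
overhead_ms = (mean(pooled) - mean(one_conn)) / queries * 1000
print(f"{overhead_ms:.4f} ms per get+return")
```

This comes out slightly under 0.008 ms, consistent with the rounded 0.13 ms and 0.14 ms per-query averages reported by the script.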
- We should always use either the close() method or the Context Manager Protocol of the connection object. Otherwise the pool will soon be exhausted.
- The Context Manager Protocol is preferred. It achieves an effect similar to "multiplexing": a more fine-grained use of the pool, doing more with fewer connections.
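A minimal sketch of this fine-grained pattern with worker threads follows. It relies only on the calls shown in the example above (get_connection(), using the connection as a context manager, and cursor()); the helper names run_query and run_in_threads are hypothetical, and pool is assumed to be a ConnectionPool created as shown earlier.

```python
import threading


def run_query(pool, sql):
    """Borrow a connection only for the duration of one query."""
    conn = pool.get_connection()
    with conn:  # leaving this block puts the connection back to the pool
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()


def run_in_threads(pool, sql, n_threads=4):
    """Run the same query from several threads that share one pool."""
    results = [None] * n_threads

    def work(i):
        # Each thread borrows its own connection instead of sharing one.
        results[i] = run_query(pool, sql)

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the pool from the example, a call such as run_in_threads(mypool, 'select 1+1') would apply. Because each thread holds a connection only while its query runs, more threads than connections can be served, though get_connection() may still raise GetConnectionFromPoolError if too many threads borrow at once and the retries run out.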