
Connection pooling #21

Closed · chase-seibert opened this issue Apr 17, 2013 · 12 comments
@chase-seibert

Connection pooling should use a separate pool API, not be completely embedded inside the happybase.Connection class.

Goal:

When using happybase in the context of a web application, it would be useful to re-use connections between page requests. A connection pooling solution should take a MIN, MAX and IDLE count as parameters, and open connections as needed by the application.

Inspiration:

API:

import happybase

pool = happybase.ConnectionPool(
    'hostname', 
    port=9090, 
    timeout=None, 
    min=0, 
    max=3, 
    idle=1, 
    autoconnect=False,
    compat='0.92', 
    transport='buffered')

# block == wait until a connection is available 
# versus raise an exception
connection = pool.get_connection(
    block=True,
    table_prefix=None, 
    table_prefix_separator='_')

The pool could be instantiated manually per-process in the setup flow of a web server framework. For example, in Django this could be done in settings.py with AUTOCONNECT=False, so that connections are not established until the first call to get_connection().
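
For illustration, a per-process setup might look roughly like the sketch below. It uses the ConnectionPool constructor and get_connection() call proposed above (not an existing happybase API); the hostname and module layout are made up, and the proposal does not yet specify how a connection goes back to the pool.

# settings.py -- sketch only, using the proposed (not yet existing) pool API
import happybase

HBASE_POOL = happybase.ConnectionPool(
    'hbase-thrift.example.com',  # hypothetical Thrift host
    port=9090,
    min=0,
    max=3,
    idle=1,
    autoconnect=False)  # nothing is opened until get_connection() is called

# somewhere in request handling code
def handle_request(request):
    connection = HBASE_POOL.get_connection(block=True)
    # (returning the connection to the pool is not specified yet)
    return connection.table('mytable').row(b'row-key')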

Retries:

If a connection cannot be established, or is terminated (e.g., by a timeout), the pool would attempt to re-establish it after RETRY_MS milliseconds.

Errors:

ConnectionPool could throw an error right away if it can't establish MIN connections immediately. Otherwise, a call to pool.get_connection() will raise various exceptions for things like pool exhaustion (if BLOCK=False), being unable to connect to the Thrift endpoint, etc.
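
To make the retry and error behaviour concrete, calling code might look roughly like the sketch below; the exception name is a placeholder, since the proposal does not name the exceptions yet.

# sketch of retry handling around the proposed get_connection() API
import time

RETRY_MS = 500


class PoolExhaustedError(Exception):
    ''' placeholder for whatever exception get_connection() would raise '''


def get_connection_with_retry(pool, attempts=3):
    for _ in range(attempts):
        try:
            # BLOCK=False: raise instead of waiting when the pool is exhausted
            return pool.get_connection(block=False)
        except PoolExhaustedError:
            time.sleep(RETRY_MS / 1000.0)
    raise PoolExhaustedError('no connection after %d attempts' % attempts)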

Other thoughts:

I can't see how we could support connection pooling between multiple python processes except by implementing a separate process to connect through, similar to pgpool.

@chase-seibert (Author)

Tests would be implemented by mocking out the Connection class so that no actual sockets need to be opened.
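
Something along these lines, for example. This is only a sketch, assuming Python's unittest.mock and the ConnectionPool/get_connection API proposed above:

import unittest
from unittest import mock

import happybase


class ConnectionPoolTest(unittest.TestCase):

    @mock.patch('happybase.Connection')
    def test_lazy_connect(self, mock_connection):
        # no real sockets: every Connection the pool creates is a mock
        pool = happybase.ConnectionPool(
            'localhost', min=0, max=3, autoconnect=False)
        mock_connection.assert_not_called()  # pool should be lazy
        connection = pool.get_connection(block=True)
        self.assertTrue(mock_connection.called)
        connection.tables()  # goes to the mock, not a real Thrift server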

@chase-seibert (Author)

I have a prototype solution working. I'm going to battle-test it in production for a week before I come up with an official patch.

import time
import random
import contextlib
import happybase
from socketpool import ConnectionPool
from socketpool.conn import TcpConnector


class HappybaseConnectionPool(object):
    ''' singleton to share a connection pool per process '''

    pool = None
    _instance = None

    def __new__(cls, *args, **kwargs):
        if not cls._instance:
            # note: object.__new__() does not accept extra arguments
            cls._instance = super(HappybaseConnectionPool, cls).__new__(cls)
        return cls._instance

    def __init__(self, host, **options):
        if not self.pool:
            options['host'] = host
            self.pool = ConnectionPool(
                factory=HappybaseConnector,
                max_size=options.get('max_size', 10),
                options=options,
            )

    def connection(self, **options):
        return self.pool.connection(**options)

    @contextlib.contextmanager
    def table(self, table_name):
        with self.pool.connection() as connector:
            yield connector.table(table_name)


class HappybaseConnector(TcpConnector):

    def __init__(self, host, port, pool=None, **kwargs):
        self.host = host
        self.port = port
        self.connection = happybase.Connection(self.host, self.port)
        self._connected = True
        # use a 'jiggle' value to make sure there is some
        # randomization to expiry, to avoid many conns expiring very
        # closely together.
        self._life = time.time() - random.randint(0, 10)
        self._pool = pool
        self.logging = kwargs.get('logging')

    def is_connected(self):
        if self._connected and self.connection.transport.isOpen():
            try:
                # isOpen is unreliable, actually try to do something
                self.connection.tables()
                return True
            except Exception:
                # any failure means the connection is not usable
                pass
        return False

    def handle_exception(self, exception):
        if self.logging:
            self.logging.error(exception)
        else:
            print(exception)

    def invalidate(self):
        self.connection.close()
        self._connected = False
        self._life = -1

    def open(self):
        pass

    def close(self):
        self.release()

    def __getattr__(self, name):
        if name in ['table', 'tables', 'create_table', 'delete_table',
                'enable_table', 'disable_table', 'is_table_enabled', 'compact_table']:
            return getattr(self.connection, name)
        else:
            raise AttributeError(name)

You use it like this:

pool = HappybaseConnectionPool('localhost', port=9090)
with pool.connection() as connection:
    connection.create_table('foobar', {'cf': {}})  # families dict is required
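
The table() helper from the prototype works the same way; the table name and row key here are just illustrative:

with pool.table('foobar') as table:
    print(table.row(b'row-1'))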

@budlight

Shouldn't this support multiple Thrift servers? pycassa has support for that.

@chase-seibert (Author)

I'm hitting a bunch of Thrift instances behind a load balancer, which I think makes sense to run externally. If we did load balancing in process, it would mean implementing options like round-robin, least connection, etc. Not sure how you would deal with least-connection between various python processes; they would all be keeping their own connection counts, exclusive of each other.

I think it's better left to an external load balancer.

@budlight

Well, long term you could make it aware of region server splits for performance. Netflix has a Cassandra client that does this:

http://techblog.netflix.com/2012/01/announcing-astyanax.html

@wbolster (Member)

I think I agree with Chase. Connection pooling is hard, and it adds quite a bit of complexity. Other solutions like load balancers are actually designed to handle this problem on a network level (instead of a process level).

@wbolster (Member) commented May 2, 2013

I actually had a go at this since it also seems the way to go for multi-threading support. I've pushed my current code to a feature branch, which can be seen here: https://github.com/wbolster/happybase/commits/connection-pool

Copy/paste from the (w-i-p) docs:

Thread-safe connection pool.

A connection pool allows multiple threads to share connections. The
`size` parameter specifies how many connections this pool manages.
The pool is lazy; it opens new connections when requested.

To ensure that connections are actually returned to the pool after
use, connections can only be obtained using Python's context manager
protocol, i.e. the ``with`` statement. Example::

    pool = ConnectionPool(size=3, host='...')
    with pool.connection() as connection:
        print(connection.tables())

When a thread asks for a connection using
:py:meth:`ConnectionPool.connection`, it is granted a lease, during
which the thread has exclusive access to the obtained connection. To
avoid starvation, connections should be returned as quickly as
possible. In practice this means that the amount of code included
inside the ``with`` block should be kept to an absolute minimum.

The connection pool is designed so that any thread can hold at most
one connection at a time. This does not require any coordination
from the application: when a thread holds a connection and asks for
a connection for a second time (e.g. because a called function also
wants to use a connection), the same connection instance it already
holds is returned. Ultimately, once the outer ``with`` block (which
may be in a function up in the call stack) terminates, the
connection is returned to the pool.

Additional keyword arguments are passed unmodified to the
:py:class:`happybase.Connection` constructor, with the exception of
the `autoconnect` argument, since maintaining connections is the
task of the pool.

:param int size: the maximum number of concurrently open connections
:param kwargs: keyword arguments passed to
:py:class:`happybase.Connection`
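
To illustrate the re-entrant behaviour described in those docs, usage would look roughly like this (a sketch against the branch's API; the table name and helper function are made up):

pool = ConnectionPool(size=3, host='localhost')

def lookup(row_key):
    # asking for a connection while the caller already holds one
    # yields the same connection instance, not a second one
    with pool.connection() as connection:
        return connection.table('mytable').row(row_key)

with pool.connection() as connection:
    print(connection.tables())
    print(lookup(b'row-1'))  # reuses the outer connection
# the connection goes back to the pool only when the outer block exits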

What do you think? I'd appreciate comments/flames/feedback!

@chase-seibert (Author)

Looks good to me. Probably makes more sense than including a dependency. It would be cool if there were a way of getting a single pool object without passing it around everywhere. That's what I'm using a singleton for, but I suppose you could always layer that on top of what you have.

wbolster added a commit that referenced this issue May 20, 2013
@wbolster (Member)

Okay, I have landed a Connection Pool implementation in the master branch. Please try it out. Comments on the design and API are most welcome.

See the API docs at https://happybase.readthedocs.org/en/latest/api.html#happybase.ConnectionPool for more information and example usage.

I'm leaving this ticket open since I need to refactor the tutorial/user guide to incorporate some information on the connection pool.

@wbolster (Member)

Fwiw, the feature branch is gone now that this feature has landed on master. I'll need to expand the docs (working on it already) before I consider this issue closed.

I'll also cook up a 0.5 release soonish with this feature and some other unreleased enhancements from the master branch.

@wbolster (Member)

Oh, I forgot to mention that I have (privately) received positive test reports about the connection pool, so I have confidence the current implementation is ready for public release. :-)

@wbolster (Member)

HappyBase 0.5 is out! https://twitter.com/wbolster/status/338034468780662784
