Using a NullPool for external connections by default (#4251)

Currently, even though `get_sqla_engine` calls get memoized, engines are still short-lived since they are attached to a `models.Database` ORM object. All engines created through this method have the scope of a web request. Because the SQLAlchemy objects are short-lived, any related connection pool is also short-lived and mostly useless. I think it's pretty rare that connections get reused within the context of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections open (hundreds). This is probably due to a combination of the current process not garbage collecting connections properly, and perhaps the absence of a connection timeout on the Redshift side of things. It could also be related to the fact that we experience web request timeouts (enforced by gunicorn), and that process-killing may not allow SQLAlchemy to clean up connections as they occur (which this PR may not help fix...).

For all these reasons, it seems like the right thing to do is to use NullPool for external connections (but not for our connection to the metadata db!).

Opening the PR for conversation. Putting this change into our staging environment today to run some tests.
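To make the pooling difference concrete, here is a minimal sketch (not Superset's actual code). `build_engine` and `count_new_connections` are hypothetical helpers written for illustration, and the in-memory SQLite URL is only a stand-in for an external database such as Redshift. It counts how many DBAPI connections get opened across a few checkouts under `NullPool` versus `QueuePool`.

```python
from sqlalchemy import create_engine, event
from sqlalchemy.pool import NullPool, QueuePool


def build_engine(url, nullpool=True):
    """Hypothetical helper that mirrors the `nullpool` flag in the diff."""
    kwargs = {}
    if nullpool:
        # NullPool keeps no idle connections: each checkout opens a new
        # DBAPI connection, and releasing it closes that connection.
        kwargs["poolclass"] = NullPool
    return create_engine(url, **kwargs)


def count_new_connections(poolclass, checkouts=3):
    """Count DBAPI connections opened over `checkouts` connect/release cycles."""
    # "sqlite://" (in-memory SQLite) is an assumption for this demo only.
    engine = create_engine("sqlite://", poolclass=poolclass)
    opened = []

    @event.listens_for(engine, "connect")
    def _on_connect(dbapi_connection, connection_record):
        # The pool fires "connect" only when a brand-new DBAPI connection is made.
        opened.append(dbapi_connection)

    for _ in range(checkouts):
        with engine.connect():
            pass  # leaving the block releases the connection back to the pool

    engine.dispose()
    return len(opened)


if __name__ == "__main__":
    print("NullPool :", count_new_connections(NullPool))
    print("QueuePool:", count_new_connections(QueuePool))
```

With `NullPool`, nothing stays open between checkouts, so a killed or abandoned engine should not leave idle pooled connections behind. The trade-off is paying the connection setup cost on every checkout, which seems acceptable here given how rarely connections get reused within a single request.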
apache · Jan 23, 2018 · 4b11f45 · HUSSTECH · Mar 5, 2018
1 parent 04ae004
commit 4b11f45
Showing 1 changed file with 1 addition and 1 deletion.
diff --git a/superset/models/core.py b/superset/models/core.py
@@ -639,7 +639,7 @@ def get_effective_user(self, url, user_name=None):
 
     @utils.memoized(
         watch=('impersonate_user', 'sqlalchemy_uri_decrypted', 'extra'))
-    def get_sqla_engine(self, schema=None, nullpool=False, user_name=None):
+    def get_sqla_engine(self, schema=None, nullpool=True, user_name=None):
         extra = self.get_extra()
         url = make_url(self.sqlalchemy_uri_decrypted)
         url = self.db_engine_spec.adjust_database_uri(url, schema)