fix: Ensure SQLAlchemy sessions are closed #25031
superset/tasks/cache.py
Outdated
@@ -96,6 +96,7 @@ class DummyStrategy(Strategy):  # pylint: disable=too-few-public-methods
     def get_payloads(self) -> list[dict[str, int]]:
         session = db.create_scoped_session()
         charts = session.query(Slice).all()
+        session.close()
I gather these standalone scoped sessions exist because tasks fall outside the jurisdiction of Flask. If this isn't the case then, to adhere to the KISS principle, we should simply reuse the Flask-SQLAlchemy session.
LGTM. However, at some point we might want to discuss whether to make it possible to not commit in the DAOs or similar methods. I've been writing some custom functionality in our security manager that wants to perform transaction-like operations, and I'd prefer to either have the whole chain of events succeed, or roll back everything. An option could be to add a `commit=True` parameter to these methods, or just remove the explicit commit and assume callers do it (e.g. in the `api.py`) after performing the subtasks.
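A minimal sketch of the `commit=True` idea raised in this comment, assuming a plain SQLAlchemy session and an in-memory SQLite database. The `Note` model and the `create_note` and `create_pair` helpers are hypothetical names used for illustration; they are not Superset's DAOs.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Note(Base):  # hypothetical model, for illustration only
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    body = Column(String, nullable=False)


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
SessionFactory = sessionmaker(bind=engine)


def create_note(session, body, commit=True):
    """DAO-style helper: commits only when the caller asks for it."""
    note = Note(body=body)
    session.add(note)
    if commit:
        session.commit()
    return note


def create_pair(session, first, second):
    """Unit of work: both notes are persisted, or neither is."""
    try:
        create_note(session, first, commit=False)
        create_note(session, second, commit=False)
        session.commit()  # single commit for the whole unit of work
    except Exception:
        session.rollback()  # undo everything staged so far
        raise
```

With `commit=False` the caller owns the transaction boundary, so a failure in the second step also discards the first.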
@villebro I was talking with @michael-s-molina earlier this week and plan to write a SIP about DAOs. The plan is, as you pointed out, that one shouldn't commit within a DAO but rather on a unit of work (typically a Flask request, though in our case it's likely a command). I hope to have the SIP drafted earlier next week which will also mention SQLAlchemy nested transactions, which as the name suggests, allows for one to nest operations (leveraging This removes the need to pass around
@john-bodley this sounds amazing - I really look forward to this reform! ❤️
superset/models/dashboard.py
Outdated
 # copy template dashboard to user
-template = session.query(Dashboard).filter_by(id=int(dashboard_id)).first()
+template = db.session.query(Dashboard).filter_by(id=int(dashboard_id)).first()
Shouldn't we use the `connection` parameter here? According to the event definition:
`connection` – the `Connection` being used to emit INSERT statements for this instance. This provides a handle into the current transaction on the target database specific to this instance.
@michael-s-molina I've reverted the PR back to my initial implementation which leverages the `connection` parameter.
It states that it provides a handle into the current transaction, yet we're likely over-committing, i.e., there should only be one commit per unit of work. I'm not sure what happens if you close the session without committing. Maybe we're using the connection wrongly.
Oh, it seems we should be using SQLAlchemy's `object_session` rather than creating a new one. This should remove the need to commit and explicitly close the session, as this is handled by Flask-SQLAlchemy.
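As a rough illustration of what `object_session` returns, here is a small sketch assuming an in-memory SQLite database and a hypothetical `Note` model. It only demonstrates the lookup itself (and the equivalent `inspect(obj).session` spelling); it does not show use inside a mapper event listener.

```python
from sqlalchemy import Column, Integer, String, create_engine, inspect
from sqlalchemy.orm import declarative_base, object_session, sessionmaker

Base = declarative_base()


class Note(Base):  # hypothetical model, for illustration only
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    body = Column(String, nullable=False)


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

note = Note(body="hello")
before = object_session(note)  # None: the object is not attached yet

session.add(note)
owner = object_session(note)  # the session the object now belongs to

# inspect(obj).session is another way to reach the same session
same_via_inspect = inspect(note).session is owner

session.close()
```

Because the owning session is looked up rather than created, no additional session needs to be committed or closed by the caller.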
Regrettably I was running into this issue and had to revert my last commit. I think we likely will have to revisit how tagging is working at a later date.
@@ -414,13 +415,12 @@ def export_dashboards(  # pylint: disable=too-many-locals
     "native_filter_configuration", []
 )
 for native_filter in native_filter_configuration:
-    session = db.session()
It's unclear why a new session was instantiated (and never closed) here. On line #433 the Flask-SQLAlchemy session is used and thus it seems prudent (and hopefully safe) to use `db.session`.
superset/models/dashboard.py
Outdated
@@ -96,6 +96,7 @@ def copy_dashboard(_mapper: Mapper, connection: Connection, target: Dashboard) -
 )
 session.add(extra_attributes)
 session.commit()
+session.close()
I wonder if we can just reuse `db.session` everywhere so we don't create this arbitrary session.
If we really need to use this session, it's better to wrap it in try/finally and close it in the `finally` block; otherwise an exception thrown in commit will skip `session.close()`.
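The pattern suggested here can be sketched as follows, assuming a standalone SQLAlchemy session bound to an in-memory SQLite database. The `Note` model and the `add_note` helper are hypothetical and stand in for the code in the hunk above.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Note(Base):  # hypothetical model, for illustration only
    __tablename__ = "notes"
    id = Column(Integer, primary_key=True)
    body = Column(String, nullable=False)


engine = create_engine("sqlite://")  # in-memory database (assumption)
Base.metadata.create_all(engine)
SessionFactory = sessionmaker(bind=engine)


def add_note(body):
    """Persist one note using a session this function creates and owns."""
    session = SessionFactory()
    try:
        session.add(Note(body=body))
        session.commit()
    except Exception:
        session.rollback()  # discard the failed transaction
        raise
    finally:
        session.close()  # runs whether or not commit() raised
```

Since `close()` sits in the `finally` block, the connection is handed back to the pool even when `commit()` fails, for example on the `NOT NULL` violation triggered by `add_note(None)`.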
@zephyring that is the ideal solution, using the same session (which is obtainable via `inspect(target).session`); however, per #25031 (comment) this is currently problematic given the way our SQLAlchemy event listener callbacks are configured.
@john-bodley Could you apply the `try/finally` pattern to the places where you're closing the session? I also think we can remove the `session.commit()` on line 91.
BTW @michael-s-molina per our offline conversation you can see here that the session is rolled back before closing.
This is great. We were just investigating this issue that we originally thought was a problem with ssh connections, but later found that the root cause was data dbs hitting the max connections. cc @hughhhh
(cherry picked from commit adaab35)
(cherry picked from commit adaab35) (cherry picked from commit 0f390e401555da4cc8639eb7b3bfa462f23edef7)
SUMMARY
At Airbnb we've been running into the infamous SQLAlchemy QueuePool issue which typically occurs when connections aren't being closed. Thankfully Flask-SQLAlchemy normally handles this for us, per here, when the app context is torn down.
Grokking the code, it seems there are instances (rightly or wrongly) where we create our own sessions which aren't then explicitly closed. This PR ensures that either i) these sessions are now closed, or ii) the Flask-SQLAlchemy session is used.
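To illustrate why unclosed sessions lead to the QueuePool error, here is a small sketch under stated assumptions: an in-memory SQLite engine forced onto `QueuePool` with a deliberately tiny pool (`pool_size=1`, `max_overflow=0`) and a short `pool_timeout`. The `pool_exhausted` helper is hypothetical and exists only for this demonstration.

```python
from sqlalchemy import create_engine, text
from sqlalchemy.exc import TimeoutError as PoolTimeoutError
from sqlalchemy.orm import sessionmaker
from sqlalchemy.pool import QueuePool

# A pool of exactly one connection makes exhaustion easy to trigger.
engine = create_engine(
    "sqlite://",
    poolclass=QueuePool,
    pool_size=1,
    max_overflow=0,
    pool_timeout=0.2,  # seconds to wait for a free connection
)
SessionFactory = sessionmaker(bind=engine)


def pool_exhausted():
    """Return True if a fresh session cannot obtain a connection."""
    probe = SessionFactory()
    try:
        probe.execute(text("SELECT 1"))
        return False
    except PoolTimeoutError:
        return True
    finally:
        probe.close()


leaked = SessionFactory()
leaked.execute(text("SELECT 1"))  # session now holds the only connection

exhausted_while_open = pool_exhausted()  # the unclosed session blocks others

leaked.close()  # hands the connection back to the pool
exhausted_after_close = pool_exhausted()
```

In this sketch the second session times out only while the first one remains open; once `close()` is called the pool has a free connection again.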
TESTING INSTRUCTIONS
CI.