Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: scheduled docs with latest_partition fail to run #1101

Merged
merged 1 commit into from
Dec 14, 2022

Conversation

jczhong84
Copy link
Collaborator

Issue
A user reported that one of their scheduled doc failed to run with status stuck on RUNNING, while manual run each query cell succeeds. (manual run the schedule also fails).

Findings

  • Error message got from the docker container:
    [2022-12-13 18:44:36,616: ERROR/ForkPoolWorker-210079] Task run_data_doc_216961[532b190e-f6c3-4710-8a84- bc6f2701f904] raised unexpected: DetachedInstanceError('Instance <DataCell at 0x7f040851f250> is not bound to a Session; attribute refresh operation cannot proceed')
  • We dont see similar failure from other scheduled data docs
  • The difference from the user's scheduled doc is one of the query has template function latest_partition
  • More investigation shows that, the issue only happens when
    • one of the query has template function latest_partition. AND
    • there is at least one more query cell after the template query

Cause
The issue was introduced since PR-1073. To explain the cause, we have to understand how the session works in Sqlalchemy.

  • There are two states of an ORM object in a session. detached and expired
  • detached means that the object has been fully removed from the session. Any changes made to the object will not be committed to the db anymore.
  • expired means the content of the object has been expired, while the object could be still attached to a session or maybe not.
  • when exits from with DBSession() or with_session, all objects will be detached from the session, but may be not expired
  • calling session.commit will cause all objects of a session to be expired, but not detached.
  • If an object is still attached but expired, when you access its attribute, it will get refreshed.
  • If an object is detached but not expired, you can still access its attribute, but it will not get refreshed.
  • If an object is detached and also expired, you can't access its attribute, which will cause the exception of DetachedInstanceError

Back to issue of the run_datadoc function, the code flow is like below

with DBSession() as session:
  data_doc = datadoc_logic.get_data_doc_by_id(doc_id, session=session)
  query_cells = data_doc.get_query_cells()
  session.commit() # introduced by PR-1073, which cause all query cells expired

  engine_id_0 = query_cells[0].meta["engine"] # Here query_cells[0], detached: False, expired: True
  with DBSession() as s:  # Here the DBSession is introduced because of the query contains `latest_partition`
                          # when render_templated_query. And with DBSession will return the same session 
                          # if the previous session hasn't been closed. 
                          # When it exits, the session will be closed and all objects will be detached.
                          # This is the case even before the PR-1073.
    pass

  engine_id_1 = query_cells[1].meta["engine"] # Here query_cells[1], detached: True, expired: True, 
                                              # which causes `DetachedInstanceError` when accessing `.meta`
  with DBSession() as s:
    pass

Before the PR-1073, although the query cell also has been detached from the session, but as it's not expired either(didn't call commit), so it can still access its attribute, which did cause any issue.

Fix
When nesting db sessions, we should avoid the current session got closed which makes all the objects to be detached.
In our case, we'll pass the current session down to render_templated_query, so that create_get_latest_partition will use this same session when it's already inside with DBSession() context and not close it. If it doesn't have session passed over, it can create its own by with_session.

@@ -162,11 +164,13 @@ def get_latest_partition(
return get_latest_partition


def get_templated_query_env(engine_id: int):
def get_templated_query_env(engine_id: int, session=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just have session without defaulting to None (otherwise it would fail)

@@ -311,7 +315,7 @@ def get_templated_query_variables(variables_provided, jinja_env):


def render_templated_query(
query: str, variables: Dict[str, str], engine_id: int
query: str, variables: Dict[str, str], engine_id: int, session=None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@czgu czgu merged commit 7b12960 into pinterest:master Dec 14, 2022
@jczhong84 jczhong84 deleted the fix/scheduledrun branch April 7, 2023 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants