
IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if th... #775

Closed
jasonthomas opened this issue Jan 18, 2019 · 12 comments

jasonthomas (Member) commented Jan 18, 2019

@mreid messaged me about his queries missing results after execution. Under https://sql.telemetry.mozilla.org/admin/queries/tasks > Queries queue > In Progress, I noticed several tasks that have been stuck in the 'saving_results' state for several hours. The traceback below appears to be related:

https://sentry.prod.mozaws.net/operations/data-stmo-prod/issues/5058836/

IntegrityError: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.IntegrityError) null value in column "data" violates not-null constraint
DETAIL:  Failing row contains (5555339, 1, 24, 27624b7fd311fa2b79a45e75292f19b3, set @run_time_global = (
  select sum(unix_timestamp(job.end_tim..., null, 2.83623, 2019-01-18 10:42:49.723733+00).
 [SQL: 'INSERT INTO query_results (org_id, data_source_id, query_hash, query, data, runtime, retrieved_at) VALUES (%(org_id)s, %(data_source_id)s, %(query_hash)s, %(query)s, %(data)s, %(runtime)s, %(retrieved_at)s) RETURNING query_results.id'] [parameters: {'retrieved_at': datetime.datetime(2019, 1, 18, 10, 42, 49, 723733, tzinfo=<UTC>), 'org_id': 1, 'query_hash': '27624b7fd311fa2b79a45e75292f19b3', 'query': 'set @run_time_global = (\n  select sum(unix_timestamp(job.end_time) - unix_timestamp(job.start_time))\n  from job\n  where job.result in (\'busted\', ... (463 characters truncated) ... e\non job.job_type_id = job_type.id\nwhere job.result in (\'busted\', \'success\', \'testfailed\')\ngroup by job_type_name\norder by runtime_sum desc', 'runtime': 2.8362338542938232, 'data': None, 'data_source_id': 24}] (Background on this error at: http://sqlalche.me/e/gkpj)
(21 additional frame(s) were not displayed)
...
  File "sqlalchemy/engine/base.py", line 1200, in _execute_context
    context)
  File "sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
    exc_info
  File "sqlalchemy/util/compat.py", line 203, in raise_from_cause
    reraise(type(exception), exception, tb=exc_tb, cause=cause)
  File "sqlalchemy/engine/base.py", line 1193, in _execute_context
    context)
  File "sqlalchemy/engine/default.py", line 507, in do_execute
    cursor.execute(statement, parameters)

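The failing INSERT above stores a row with `data=None` into `query_results`, whose `data` column is NOT NULL. The following is a minimal sketch reproducing the same constraint violation with SQLAlchemy; it uses SQLite instead of the production Postgres/psycopg2 stack, and the model is a simplified stand-in for Redash's actual `QueryResult` schema.

```python
# Minimal sketch of the NOT NULL violation: a failed query run that
# hands back data=None is rejected by the query_results table.
from sqlalchemy import Column, Integer, Text, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class QueryResult(Base):
    __tablename__ = "query_results"
    id = Column(Integer, primary_key=True)
    query_hash = Column(Text, nullable=False)
    data = Column(Text, nullable=False)  # NOT NULL, as in the traceback

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

def save_result(session, query_hash, data):
    """Insert a result row; return False if the database rejects it."""
    try:
        session.add(QueryResult(query_hash=query_hash, data=data))
        session.commit()
        return True
    except IntegrityError:
        session.rollback()
        return False
```

With a real result payload the insert succeeds; with `data=None` the commit raises the same `IntegrityError` (NOT NULL constraint on `data`) seen in the Sentry report, which is why the error needs to be caught before the result is stored.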
jasonthomas (Member, Author) commented:

[screenshot: 2019-01-18 running queries]


jezdez commented Jan 28, 2019

@jasonthomas So I'm not sure how to reproduce this locally and need your help digging into the prod logs.

All the tracebacks in Sentry pass through the code that runs the queries and fail when storing the result of the query run. Sadly, I don't see the stack variables of individual stack frames in Sentry, so we'll need to find the logging line emitted just after it ran the query and before it stores the result.

Assuming we can grep the logs (can we?), searching for task=execute_query query_hash=<query hash from a traceback> would work.

E.g. task=execute_query query_hash=382872136cb88fa7a4dac0eb0c7a500a corresponds to https://sentry.prod.mozaws.net/operations/data-stmo-prod/issues/5058836/events/27278696/. Could you please see if you can find that line?

jezdez added a commit that referenced this issue Jan 28, 2019
jasonthomas (Member, Author) commented:

I've communicated the related logs via IRC pm.


jezdez commented Jan 28, 2019

FTR, https://discuss.redash.io/t/athena-querry-not-working/1662 describes this as well, making this an actual query issue.


jezdez commented Jan 28, 2019

The fact that the query runner doesn't handle query errors like this is a bad thing, though.


jezdez commented Jan 28, 2019

And previously reported a while ago here: https://bugzilla.mozilla.org/show_bug.cgi?id=1314621


jezdez commented Jan 28, 2019

And of course there is @fbertsch's getredash#1753, which adds handling of InterruptException (as in our traceback) to the Presto query runner.


jezdez commented Jan 28, 2019

So I suggest actually adding the same additional exception handling to the Athena query runner as well. @washort @jasonthomas @emtwo: anyone have a different idea?
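The proposed fix can be sketched as follows. It assumes the Redash query-runner convention of `run_query` returning a `(data, error)` pair; the class name, `_execute` helper, and the simulated failure are illustrative stand-ins, not the actual Athena runner code.

```python
# Hypothetical sketch: catch query failures inside run_query so the
# caller receives an error string instead of a None data payload
# (which would later violate the NOT NULL constraint on storage).
import logging

logger = logging.getLogger(__name__)

class AthenaQueryRunner:
    def _execute(self, query):
        # Stand-in for the real Athena call; raises on failure.
        raise RuntimeError("simulated Athena failure")

    def run_query(self, query, user=None):
        try:
            data = self._execute(query)
            return data, None
        except Exception as exc:
            # Surface the failure as (None, error) so the result-saving
            # path stores an error rather than inserting data=None.
            logger.exception("Query failed")
            return None, str(exc)
```

With this shape, a failed run yields `(None, "simulated Athena failure")`, and the saving code can report the error instead of attempting the doomed INSERT.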

@rafrombrc rafrombrc added this to the 20 milestone Jan 30, 2019

washort commented Feb 4, 2019

I'm surprised it doesn't already have this handling. Adding it makes sense to me.


jezdez commented Feb 6, 2019

I've opened getredash#3403.


jezdez commented Apr 3, 2019

This has been merged upstream and included in the most recent deploy.

@jezdez jezdez closed this as completed Apr 3, 2019