Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Work around issues with sqlite3 database being locked during updates #108

Merged
merged 2 commits into from
Jan 31, 2018

Conversation

taldcroft
Copy link
Member

This is a workaround for problems like:

2017-07-19 10:01:09,238 Updating Scs107 events from 2017:178:14:01:05 to 2017:200:14:01:04
2017-07-19 10:01:19,455 Deleting 34 LoadSegment events after 2017:180:14:01:04.585
Traceback (most recent call last):
  File "/proj/sot/ska/share/kadi/update_events", line 12, in <module>
    update_events.main()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 221, in main
    update(EventModel, date_stop)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 100, in update
    delete_from_date(EventModel, delete_date, set_update_date=False)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/kadi/update_events.py", line 72, in delete_from_date
    events.delete()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/models/query.py", line 465, in delete
    collector.delete()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/models/deletion.py", line 282, in delete
    sender=model, instance=obj, using=self.using
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/transaction.py", line 305, in __exit__
    connection.commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 168, in commit
    self._commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 136, in _commit
    return self.connection.commit()
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "/proj/sot/ska/arch/x86_64-linux_CentOS-6/lib/python2.7/site-packages/django/db/backends/__init__.py", line 136, in _commit
    return self.connection.commit()
django.db.utils.OperationalError: database is locked

This PR has been tested by running the ska_testr/packages/kadi/test_regression_long.sh test, once via ska_testr in Ska flight and once by hand in the repo for this PR. Outputs were diffed and no diffs were seen.

I have been unable to reproduce the databased locked problem. I tried locally starting the process to update the database over a 30 day period, then in a separate window repeatedly query the events database. No problem, no joy. So most of this code is not tested and just needs to be looked at carefully.

Plan B was to make a local copy of the events database, update that, and then copy back. But even then it would be better to not have the whole update process stop cold. My feeling is to try this and see if it works. We'll still see warnings pop up if the database was locked.

@taldcroft taldcroft requested a review from jeanconn January 24, 2018 18:07
# If processing got here with no exceptions then save the event update
# information to database
update.save()
# If processing got here with no exceptions then save the event update
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this final update.save() should be in the atomic commit block, because it too can fail due to DB locked. That's not good because you have the events being updated without the last date of update being correspondingly updated. (Though I think in normal circumstance it gets cleaned up the next time around with no problems).

@jeanconn
Copy link
Contributor

Ugh. I thought I could reliably lock the database with my guide stats processing; I'll try that again today and see if I can lock a copy.

This PR seems a fine way to at least try again if for some reason the database was locked by reasonable small query or netapp weirdness.

@taldcroft
Copy link
Member Author

@jeanconn
Copy link
Contributor

I think I looked at that when we started using sqlite but ran into issues with pragma support by sqlite version. Will look again.

@jeanconn
Copy link
Contributor

jeanconn commented Jan 26, 2018

On a locked local copy of events.db via

PRAGMA locking_mode = EXCLUSIVE;
BEGIN EXCLUSIVE;

it looks like I'm getting "django.db.utils.OperationalError: database is locked" without seeing any of the

  logger.info('Warning: locked database, waiting {} seconds'.format(delay))

I'd expect from this PR.

@taldcroft
Copy link
Member Author

Grrrr. Will investigate.

@jeanconn
Copy link
Contributor

Not clear to me if the exclusive lock via pragma really replicates what we've hit upon by accident in processing though (if the exclusive lock is also a read lock and maybe the lock we're getting is only a write lock?).

@taldcroft
Copy link
Member Author

From what you said it is raising what appears to be the same OperationalError without it being caught and handled, or else the database is locked is not matching. If you are working on this, can you try putting in some print statements to see what is happening?

@jeanconn
Copy link
Contributor

From the traceback, it looks like it is throwing the error at

update = models.Update.objects.get(name=cls_name)

when trying to do that get, not when trying to save.

@jeanconn
Copy link
Contributor

From looking through our kadi OperationalError cron task emails, it doesn't look to me like that has happened for us in practice (the errors we see are on save or delete), so I think the exclusive pragma lock might just be a bad test for this.

@taldcroft
Copy link
Member Author

taldcroft commented Jan 26, 2018

Ah, so now I understand your comment about read vs write. Agreed that the exclusive lock is no good, though you might wrap that particular get in a try4times and see what it does. (Noting that try4imes does not return the func result, which should actually be changed).

@taldcroft taldcroft merged commit 411166f into master Jan 31, 2018
@taldcroft taldcroft deleted the fix-db-locked branch January 31, 2018 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants