Improve database connection management #145
I ran into the same problem, and it confused me a lot. After reading issues #73 and #60, I wrote a patch like this:
Somehow, it worked! |
In order to figure out how to approach this issue, it probably makes sense to first look at how the standard Django framework deals with database connection management. Keep in mind however that Django typically only needs to perform database operations in response to some request that it received from a user via their browser. This request-response cycle does not exist when dealing with background process managers like APScheduler. Having said that, from what I have been able to gather, Django manages connections in the following way:
So there are essentially two problems to solve:
|
A potential solution has been proposed as part of #148. The approach is rather speculative / experimental and I will merge it into the main branch, and cut a new release, once enough people have tried it out and provided some feedback. |
From my understanding, your analysis is accurate. Thanks a lot for looking at this.
So the way I see it, we do have an analogy for a request-response cycle -- a scheduler loop iteration. In my PR, I used We currently use django-apscheduler with postgres, so full disclosure, I am mostly thinking about this database. As you say there are usually two scenarios for this:
The solution you propose in your PR is to detect connection problems after the fact and then fix the connections. My main objections to this are:
In your PR you wrote
This is true, but is not quite what I propose -- I suggest doing it at the start (and possibly end) of every scheduler iteration. If a connection problem occurs during an iteration, that particular iteration is lost, but the next one will try to recover. This is analogous to a connection problem occurring during a request -- the request itself is done for but the next one will try to reconnect.

BTW, django-apscheduler users are likely to also use the Django DB inside their jobs, and the connection problems will then also happen there (particularly with a thread pool executor etc.). But that is a problem for the executor, not the jobstore, so out of scope for this discussion (although after this issue is resolved we can maybe provide some guidance on this as a note in the docs). |
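The "clean up at the start and end of every iteration" idea can be sketched as a small decorator. This is only an illustration, not the code from either PR: `with_connection_cleanup` and its `cleanup` parameter are hypothetical names. In a Django project one would presumably pass `django.db.close_old_connections`, the function Django itself runs when a request starts and finishes; a stand-in is used here so the sketch runs without Django.

```python
import functools
from typing import Callable


def with_connection_cleanup(cleanup: Callable[[], None]):
    """Return a decorator that runs `cleanup` before and after the wrapped call.

    Hypothetical helper: `cleanup` would typically be
    django.db.close_old_connections in a real project.
    """

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cleanup()  # discard stale or unusable connections before working
            try:
                return func(*args, **kwargs)
            finally:
                cleanup()  # and again afterwards, even if the work raised

        return wrapper

    return decorator


# Stand-in for django.db.close_old_connections (assumption for this sketch).
calls = []


def fake_cleanup():
    calls.append("cleanup")


@with_connection_cleanup(fake_cleanup)
def scheduler_iteration():
    calls.append("work")
    return "done"


result = scheduler_iteration()
```

If the work fails because the connection dropped mid-iteration, that iteration is still lost; the cleanup only gives the next one a chance to start with a usable connection.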
@bluetech, thanks for the feedback! Comments below...
I'm not sure it is safe to make assumptions about how the APScheduler internals work. In fact, I could not even find
It would be nice if APScheduler could provide these types of hooks to allow job store implementers to integrate in a more decoupled fashion. The contributors of the APScheduler project appear to have their heads down working on release 4.0, however, and I'm not sure they will be able to entertain feature requests at this time (although we could ask). I think that there will be many breaking changes in the new release anyway and they might even include a persistent job store in the standard solution (PostgresqlDataStore certainly appears to be a step in that direction). There are also issues with basic event sequencing which make an event-based approach unreliable at this stage (see agronholm/apscheduler#445, for example).

You are right about all of the other tradeoffs: Django will clean up connections at the start and end of every request, and we will basically wait until a problem occurs and then retry with a fresh DB connection. This is not ideal. For now I think the compromise is that this workaround at least gives us a way to help users avoid confusing database connection-related error conditions (most of whom are not used to having to worry about that type of thing as part of typical Django app development). Including a guideline to clean up connections at the start and end of your own jobs is a great idea (perhaps we could provide a utility context manager for that purpose). The downside of all this however is that, for now, we won't strictly honour the spirit of

From my own experience: I have never encountered any connection issues in more than three years of running django-apscheduler in production. But this requires some extra care to configure everything properly. Relying on a connection pooler like pgbouncer simplifies connection management a lot. This workaround is probably for those projects that do not have the ability to run a database connection pooler themselves for whatever reason.
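The "wait until a problem occurs, then retry with a fresh DB connection" trade-off mentioned above can be sketched roughly as follows. This is a simplified illustration, not the merged implementation: `retry_on_connection_error`, `reset`, and `ConnectionLost` are hypothetical names. In a Django project the exception would presumably be something like `django.db.OperationalError`, and `reset` would close the broken connection so that the next query opens a new one.

```python
import functools


class ConnectionLost(Exception):
    """Stand-in for a driver error such as django.db.OperationalError."""


def retry_on_connection_error(reset, exceptions=(ConnectionLost,)):
    """Retry the wrapped call once, after calling `reset`, if it raises
    one of `exceptions`. Hypothetical helper for illustration only."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                reset()  # e.g. close the broken connection
                return func(*args, **kwargs)  # a second failure propagates

        return wrapper

    return decorator


# Simulated database state so the sketch runs without a real database.
events = []
state = {"healthy": False}


def reset_connection():
    events.append("reset")
    state["healthy"] = True


@retry_on_connection_error(reset_connection)
def lookup_jobs():
    if not state["healthy"]:
        events.append("failed")
        raise ConnectionLost("server closed the connection unexpectedly")
    events.append("ok")
    return ["job-1"]


jobs = lookup_jobs()
```

The weakness discussed in this thread is visible here: the first attempt always has to fail before anything is repaired, whereas cleaning up before each unit of work avoids the failure in the common case.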
We will most likely have to revisit everything again when APScheduler 4.0 is released and the underlying architecture changes again. |
Wouldn't |
I have added an This should mimic the Django standard behaviour for HTTP requests (i.e. each job run is handled as if it were a separate 'request'). Of course now the question becomes why we need both
|
PR has been merged in to make it easier for people to try out. Keeping this issue open for further comment and testing. |
I tested the current develop branch (d11fe38) in a local docker environment using postgres, and using I tested as follows:
I made sure that both the job itself fails (so the worker needs to reconnect) and the jobstore fails (so the jobstore needs to reconnect). Both recovered after the DB went back up. As a separate test, I did the same thing with a low |
A few minor comments: The name The README now says
First the phrase "and you are not making use of a connection pooler and persistent connections" is a bit ambiguous - I'm not sure if you meant Either way I think it makes sense to wrap in both cases:
The README says
which is not quite accurate anymore. |
Yes the Changed README to:
I'm not sure about removing the recommendation to restart django-apscheduler (and by implication Django) in tandem with the database. The open issue that is linked to in the Django project seems to indicate that this can still be necessary under the right circumstances. It is also meant to address weird user workflows that people have logged issues about in the past (e.g. taking the database down for backups, or to set the system clock to the past) and then expect everything to keep on working without a glitch. Thanks for the time that you have spent to look into this in such detail @bluetech! If there is nothing else then I will cut a new release in the next day or so. |
Sounds good to me, thanks a lot! |
New functionality has been released as part of v0.6.0. |
There have been various issues posted that are related to 'lost' database connections.
Database connection management is a fairly complex topic, and the official Django documentation contains some great pointers to help you configure connection management for your specific database properly.
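As a rough illustration of the kind of setting those docs cover, connection lifetime is controlled per database via `CONN_MAX_AGE`. The values below are placeholders, not recommendations, and whether persistent connections suit a scheduler process depends on the setup.

```python
# settings.py (fragment); all values are placeholders for illustration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "HOST": "localhost",
        # Seconds a connection may be reused before Django closes it.
        # 0 closes it at the end of each request; None keeps it open
        # indefinitely.
        "CONN_MAX_AGE": 60,
    }
}
```

Django applies this limit when it cleans up connections at request boundaries, which a long-running scheduler process does not have on its own.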
Generally speaking, adding a database connection pooler like pgbouncer to handle database connections for you can also help avoid this class of errors altogether.
For everyone else, we might be able to improve things slightly by implementing our own connection management wrapper.
Taking a closer look at Celery's DjangoWorkerFixup implementation might be a good place to start.