
[#2977] Basic implementation of background jobs via python-rq #3165

Merged
24 commits merged into ckan:master from the 2977-rq-background-tasks branch
Sep 16, 2016

Conversation

torfsen
Contributor

@torfsen torfsen commented Jul 14, 2016

This PR is WIP for #2977, a new implementation of background jobs using python-rq.

(Note that I'm not interested in the bounty)

It is loosely based on #2706 by @rossjones.

The implementation has a minimal feature set on purpose: I'm fine with adding more features, but I wanted to avoid introducing things that nobody needs. So if you feel that this PR is missing an important feature, please speak up.

Likewise, documentation and tests will not be written until there is an agreement regarding the general architecture.

There is now a ckan.lib.jobs module, which most importantly provides the init_queue and enqueue functions. The former is called during CKAN startup and establishes the connection to Redis. I wanted to make sure that we fail early when Redis is not available (and not later, at some random time when a job is enqueued). However, I'm not sure whether ckan/websetup.py is the right place for that -- feedback appreciated!

Once you have enqueued a job using ckan.lib.jobs.enqueue, you can use the job_* API functions to manage the queue:

  • job_list: List all enqueued jobs
  • job_show: Show details for a certain job
  • job_cancel: Cancel a certain enqueued job
  • job_clear: Clear the whole queue

Similar functionality is provided by the paster jobs command; see its help text. In particular, paster jobs worker starts a worker that fetches jobs from the queue and executes them. If you want to play around, you can use paster jobs test to enqueue test jobs (which you can then list, cancel, execute, etc.).
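For illustration, using this from Python looks roughly like the sketch below (exact signatures may still change, and log_message is a made-up task):

```python
import ckan.lib.jobs as jobs
import ckan.plugins.toolkit as toolkit


def log_message(msg):
    # The task itself: any function the worker process can import and call.
    print(msg)


# Enqueue the task; a "paster jobs worker" process will pick it up and run it.
jobs.enqueue(log_message, [u'Hello from the background queue'])

# Inspect the queue via the job_list API action described above.
queued_jobs = toolkit.get_action(u'job_list')({u'ignore_auth': True}, {})
```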

@amercader
Member

You rock @torfsen 🍻

Don't put anything in websetup.py; I don't even know what that file is used for. What about putting the queue init code in environment.load_environment()? That's what gets called first when creating the web app and the CLI. (Right now the queue is not initialised when starting up CKAN, as websetup is not called.)

Are you envisaging Redis being a hard requirement?

@wardi
Contributor

wardi commented Jul 14, 2016

I think we all agreed to accept Redis as a hard requirement when this work is merged

@TkTech
Member

TkTech commented Jul 15, 2016

Yup. It will also allow us to use Flask-Cache / plain Redis as a general cache in core CKAN instead of stuffing everything into Solr.

@torfsen
Contributor Author

torfsen commented Jul 15, 2016

@amercader I've moved the connection test to load_environment so that a warning is shown upon server start when Redis is not available. In addition I've made the queue initialization lazy, see jobs.get_queue.
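The lazy initialization looks roughly like this (an illustrative sketch, not the actual ckan.lib.jobs code; the config option name is a placeholder):

```python
import redis
import rq

from ckan.common import config

_queue = None  # cached queue instance, created on first use


def get_queue():
    u'''Return the background job queue, connecting to Redis on first call.'''
    global _queue
    if _queue is None:
        # Placeholder option name; the final name is still to be decided.
        url = config.get(u'redis_url', u'redis://localhost:6379/0')
        _queue = rq.Queue(connection=redis.StrictRedis.from_url(url))
    return _queue
```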

@torfsen torfsen force-pushed the 2977-rq-background-tasks branch from f68d434 to c6fcb06 on July 15, 2016 at 08:58
@torfsen
Contributor Author

torfsen commented Jul 18, 2016

To Do:

  • Tests for ckan.lib.jobs
  • Tests for paster jobs
  • Tests for API functions
  • Update installation instructions to include Redis
  • Update documentation of background tasks to include instructions for using the new system and for transitioning from Celery.
  • Add deprecation notices to Celery stuff (ckan.lib.celery_app, paster celeryd, resource_status_show)
  • Update default config to include new config options
  • Update documentation with instructions on how to run the worker (should be mentioned in the installation and deployment docs). The supervisor config template also needs updating.
  • Figure out how to deal with ckanext-harvest, which uses the same Redis database connection settings by default but doesn't play nice with others
  • Update contribution guidelines to mention that other uses of the Redis database should prefix their keys with the extension name and the site ID.
  • Use site ID in RQ queue name to allow sharing of Redis databases between CKAN instances
  • Update test-core.ini to default to a different Redis database
  • Document in which CKAN version new functions were added

@torfsen
Contributor Author

torfsen commented Jul 19, 2016

@TkTech You mentioned that you want to use Redis for other parts of CKAN, too. How should we let the user configure the Redis connection?

Since we're going to use 3rd-party code that uses Redis, we probably want to put each distinct usage (background jobs, Flask cache) into a separate Redis database to make sure that they don't interfere with each other. Hence I suggest that we add only a single configuration option, redis_url (similar to solr_url), which contains the Redis URL without specifying a database (e.g. redis://localhost:6379). Our different use cases can then each pick a separate database. Redis offers 16 databases by default, which should be plenty.
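Roughly what I have in mind (a sketch only; the helper and the database numbers are illustrative):

```python
import redis

from ckan.common import config

# Hypothetical database assignments; each use case gets its own database.
JOBS_DB = 0
CACHE_DB = 1


def connect(db):
    # redis_url is the proposed option: host and port only, no database part.
    url = config.get(u'redis_url', u'redis://localhost:6379')
    return redis.StrictRedis.from_url(url, db=db)
```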

@TkTech
Member

TkTech commented Jul 19, 2016

We need to support it all in a single database. People using hosted Redis or odd hosting setups may not be able to use SELECT. Allowing them to specify the connection URI is fine, since the database can be selected in that URI, allowing them to choose to use one or many.

There's no problem allowing all our uses to live in one database so long as everything is prefixed.
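E.g. (keys purely illustrative):

```python
import redis

r = redis.StrictRedis.from_url(u'redis://localhost:6379/0')

# A cache entry and a job-related entry, kept apart within the same database
# only by their key prefixes.
r.set(u'ckan:my-site-id:cache:recent_datasets', u'...')
r.set(u'ckan:my-site-id:jobs:last_run', u'...')
```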

@torfsen
Contributor Author

torfsen commented Jul 19, 2016

@TkTech Good point regarding hosted Redis.

RQ prefixes its keys, so interference should be avoidable. It also seems to empty queues intelligently. However, it seems that the prefix is not configurable, so we cannot add the CKAN site ID. This means that two CKAN instances can share a Redis instance but must use separate databases.

We still have to be cautious, though: For example, ckanext-harvest simply flushes the whole Redis database to empty its queues. This is a trivial example, since we control that code, but we must be careful not to miss something like that in 3rd-party code.

@TkTech
Member

TkTech commented Jul 19, 2016

https://github.com/nvie/rq/blob/766bb600060c85e8a39579541798930bbb508ec8/rq/queue.py#L31

Looks like we can "configure" the prefix this way.

We should definitely update the plugins we control (no good plugin should be wiping the entire keyspace! Always use prefixes in Redis), but for third-party plugins we shouldn't consider that our problem. Document it and assume good behavior.
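Something like this, presumably (an untested sketch, assuming the attribute at the linked line is the queue-key prefix; my-site-id is a placeholder):

```python
import rq


class PrefixedQueue(rq.Queue):
    # Hypothetical subclass: the linked line is a class attribute, so
    # overriding it changes the Redis key prefix used for queue keys.
    redis_queue_namespace_prefix = 'rq:queue:my-site-id:'
```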

@torfsen
Contributor Author

torfsen commented Jul 19, 2016

@TkTech Unfortunately there doesn't seem to be such a setting for the prefix of job keys; that one appears to be hard-coded.

Update: There's a PR for that feature: rq/rq#685.

@torfsen
Contributor Author

torfsen commented Jul 20, 2016

It just occurred to me that we can also simply use the CKAN site ID as the queue name for RQ.

@torfsen
Contributor Author

torfsen commented Jul 20, 2016

Regarding compatibility with ckanext-harvest: See ckan/ckanext-harvest#257 and the corresponding PR ckan/ckanext-harvest#258.

@torfsen
Contributor Author

torfsen commented Jul 21, 2016

Do we need support for multiple queues? #2706 had them, and ckanext-harvest also uses two separate queues (gather and fetch). This PR currently supports only a single queue, but it would be easy to support an arbitrary number of named queues instead. Workers could then listen on all of the queues or on a certain subset.

@rossjones and @amercader, what was the rationale behind the multiple-queue functionality in #2706 and ckanext-harvest, respectively?

@rossjones
Contributor

I had multiple queues in that PR because on data.gov.uk we use different queues for tasks of different priority. That way we can have a busy low-priority queue but still get the small tasks out pretty quickly. FWIW I'd support multiple queues if it wasn't much work, but I wouldn't let it hold you up.

@torfsen torfsen force-pushed the 2977-rq-background-tasks branch 3 times, most recently from 917fe7b to 3f5b46f on July 22, 2016 at 10:24
@torfsen
Contributor Author

torfsen commented Jul 22, 2016

Multiple queues are now supported. By default there is just a single queue (default) which is used whenever no other queue is specified. Queues can be created by simply adding tasks to them.

Queue names are automatically prefixed with the site ID so multiple CKAN instances can now share a Redis database.
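For example (a sketch only; the task and queue name are made up, and the exact way to pass the queue name may differ):

```python
import ckan.lib.jobs as jobs


def send_report(recipient):
    pass  # some long-running work


# Goes to the implicit "default" queue.
jobs.enqueue(send_report, [u'admin@example.org'])

# Goes to a "priority" queue, which is created simply by enqueueing to it.
jobs.enqueue(send_report, [u'admin@example.org'], queue=u'priority')
```

A worker serving only that queue can then be started with something like paster jobs worker priority.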

@torfsen torfsen force-pushed the 2977-rq-background-tasks branch 8 times, most recently from 4472541 to 2aadcb6 on July 27, 2016 at 09:18
Useful for displaying details about a single job. Corresponds to the
job_show API function.
This output doesn't go into the CKAN logs but into a separate worker log
(`/var/log/ckan-worker.log` if the default Supervisor configuration is
used).
In most cases one can simply use `tempfile.NamedTemporaryFile`
instead.
Mention that it's usually best to simply use the `ckan.tests.recorded_logs`
context manager.
The documentation previously talked about sharing databases which could
be misunderstood as the possibility of sharing PostgreSQL databases
between CKAN instances (instead of sharing Redis databases, which was
the intended meaning).
A previous commit for ckan#2977 removed `ckan/config/celery-supervisor.conf`
as part of deprecating the old Celery background task system. However,
the old documentation told people to copy *or link* that file, so
removing it could break existing installations. Hence this commit
restores the file; it should be kept around until support for the Celery
system is removed completely.
@torfsen torfsen force-pushed the 2977-rq-background-tasks branch from 5a21108 to 18e140b on September 12, 2016 at 12:37
@torfsen
Contributor Author

torfsen commented Sep 12, 2016

Thanks for the review, and good idea regarding ckan.plugins.toolkit.enqueue_job, @wardi; that's done.

I've also restored ckan/config/celery-supervisor.conf: I had previously removed it while deprecating the old Celery system. However, the old documentation told people to copy or link the file, so removing it could break existing installations. I've now restored the file; we should keep it around until we remove Celery support completely.
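For example, an extension can now do something like this (a sketch; update_resource is a made-up task, and the exact signature may differ):

```python
import ckan.plugins.toolkit as toolkit


def update_resource(resource_id):
    pass  # long-running work goes here


# Extensions can enqueue via the toolkit instead of importing ckan.lib.jobs.
toolkit.enqueue_job(update_resource, [u'some-resource-id'])
```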

the name of your extension. For example:

* The names of *configuration settings* introduced by your extension should
have the form ``ckan.my_extension.my_config_setting``.
Contributor

The two patterns I've seen for configuration options are

ckanext.my_extension.my_config_setting = ...

and

my_extension.my_config_setting = ...

I chose the latter for my extensions because I find the configuration options get really long with the former.
Anything starting with ckan. should belong to CKAN core.
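For example (illustrative names only):

```python
from ckan.common import config

# Both the option name and the default are made up for illustration.
site_colour = config.get(u'my_extension.site_colour', u'blue')
```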

Contributor Author

I've changed it to my_extension.my_config_setting.

The docs now suggest using `my_extension.my_setting` instead of the
previously suggested `ckan.my_extension.my_setting`.
prefixed names. Use the functions ``add_queue_name_prefix`` and
``remove_queue_name_prefix`` to manage queue name prefixes.

.. versionadded:: 2.6
Contributor

please update these to 2.7

Contributor Author

Fixed.


import logging

from pylons import config
Member

Since #3163 was merged, we should never import config directly from Pylons anymore. Use `from ckan.common import config`.

My fault for not documenting / announcing it

Contributor Author

No problem, fixed.

@amercader
Member

@torfsen This looks really, really good. Thanks for being so thorough with the tests and docs. Apart from the minor comments added, it looks good to me.

Can't wait to try it with ckanext-harvest!

@wardi
Contributor

wardi commented Sep 15, 2016

any objections to merging this now?

@amercader
Member

No, let's get it in!

@wardi wardi merged commit a483968 into ckan:master Sep 16, 2016
@torfsen
Contributor Author

torfsen commented Sep 16, 2016

Yay, great work everybody!

@wardi
Contributor

wardi commented Sep 16, 2016

@torfsen I wrote a tiny stub of an extension that uses this code, https://github.com/wardi/ckanext-geocodejob, but it seems RQ's forking is interfering with CKAN's SQLAlchemy state. My first job works, but jobs after that fail with `DatabaseError: (DatabaseError) SSL error: decryption failed or bad record mac`.

Can we wrap job execution with code that cleanly closes our database connections?
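Something roughly like this, perhaps (just a sketch, assuming the engine is reachable as model.meta.engine; not necessarily the right fix):

```python
import ckan.model as model


def run_job(fn, *args, **kwargs):
    # Run the job, then drop all pooled connections so that the next forked
    # work horse does not reuse a connection inherited from the parent worker.
    try:
        return fn(*args, **kwargs)
    finally:
        model.meta.engine.dispose()
```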

@torfsen
Contributor Author

torfsen commented Sep 19, 2016

@wardi Good catch! I've created a separate issue (#3243) and I'm working on it.
