Celery: use an internal namespace to store build task's data #8874

humitos · 2022-02-01T09:26:28Z

Use a Task.data (readthedocs.projects.task.builds.TaskData object) to store
all the data the task needs to work instead of storing it directly as
attributes on self.

This gives us a simpler way to perform a cleanup before (and/or after)
starting the execution of a new task, and avoids potentially sharing state with
a previously executed task that may not have been able to clean up after itself.
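To illustrate the idea, here is a minimal sketch that does not import Celery. `FakeBuildTask` is a hypothetical stand-in for a task class whose single instance is reused across executions, and the `TaskData` fields shown (`version_pk`, `commands`) are assumptions for the example, not the real attributes used in this PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskData:
    """Hypothetical container; the real TaskData may hold different fields."""

    version_pk: Optional[int] = None
    commands: List[str] = field(default_factory=list)


class FakeBuildTask:
    """Stand-in for a task class: one instance is reused for every run."""

    def before_start(self, version_pk):
        # Replace the whole namespace so nothing leaks from the previous run,
        # even if that run never got a chance to clean up.
        self.data = TaskData()
        self.data.version_pk = version_pk

    def run(self, command):
        self.data.commands.append(command)
        return list(self.data.commands)


task = FakeBuildTask()  # created once and shared by all executions
task.before_start(version_pk=1)
first = task.run("git clone")
task.before_start(version_pk=2)
second = task.run("sphinx-build")
```

Because the reset swaps out `self.data` as a whole, the second run starts with an empty `commands` list regardless of what the first run left behind.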

The only thing we need to keep in mind is that when modifying these Celery
tasks, we always have to store any new value as
self.data.<my-new-attribute> and not directly as self.<my-new-attribute> to
avoid this problem. In the future, we could enforce this at the code level
if we want to prevent this mistake.
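One possible shape for that code-level protection is sketched below. This is an assumption about how it could be done, not something this PR implements: `GuardedTask` and its `_ALLOWED` set are hypothetical names, and a real Celery task sets attributes of its own, so the allow-list would need to be extended before this could be used.

```python
from types import SimpleNamespace


class GuardedTask:
    """Hypothetical task base that rejects attributes set outside `data`."""

    # Only these names may be assigned directly on the task instance.
    _ALLOWED = {"data"}

    def __setattr__(self, name, value):
        if name not in self._ALLOWED:
            raise AttributeError(
                f"Store {name!r} in self.data.{name} instead of self.{name}"
            )
        super().__setattr__(name, value)


task = GuardedTask()
task.data = SimpleNamespace()
task.data.version = "latest"  # allowed: lives inside the namespace

try:
    task.version = "latest"  # rejected: set directly on the task
    blocked = False
except AttributeError:
    blocked = True
```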

See #8815 (comment)
See https://docs.celeryproject.org/en/master/userguide/tasks.html#instantiation

ericholscher · 2022-02-01T16:31:13Z

This looks similar to the implementation we had before, and it's useful to keep it separate 👍

agjohnson

This is a better approach, manipulating __attributes__ probably would have caused a side effect we weren't expecting.

Already discussed, but just archiving here: this does not yet approximate our current pattern, which does use a task-unique instance of TaskStep to encapsulate the task data. We already covered why that pattern isn't usable with the workflow now though. Next up, we will want to turn self.data into a lookup table like self.data[task_id] or similar.
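A rough sketch of what that lookup table could look like, again without importing Celery. `LookupTask` and its method signatures are hypothetical and only mirror the idea of keying the data by `task_id`; how entries are created and discarded in the real task is an assumption here.

```python
from types import SimpleNamespace


class LookupTask:
    """Hypothetical task that keeps one data namespace per task id."""

    def __init__(self):
        self.data = {}

    def before_start(self, task_id):
        # Each execution gets its own entry, keyed by its task id.
        self.data[task_id] = SimpleNamespace(commands=[])

    def run(self, task_id, command):
        self.data[task_id].commands.append(command)

    def after_return(self, task_id):
        # Drop the entry once the execution is over.
        self.data.pop(task_id, None)


task = LookupTask()
task.before_start("task-a")
task.before_start("task-b")
task.run("task-a", "git clone")
task.run("task-b", "sphinx-build")
commands_a = list(task.data["task-a"].commands)
commands_b = list(task.data["task-b"].commands)
task.after_return("task-a")
remaining = sorted(task.data)
```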

humitos requested a review from agjohnson February 1, 2022 09:26

humitos requested a review from a team as a code owner February 1, 2022 09:26

humitos mentioned this pull request Feb 1, 2022

Build process: use Celery handlers #8815

Merged

humitos force-pushed the humitos/celery-handlers-self branch from 3786ee2 to 174985a Compare February 1, 2022 10:21

humitos added 3 commits February 3, 2022 12:12

Merge branch 'humitos/celery-handlers' of github.com:readthedocs/read…

e08c00e

…thedocs.org into humitos/celery-handlers-self

Lint

017df68

Lint

4c89df1

agjohnson approved these changes Feb 7, 2022

Merge branch 'humitos/celery-handlers' of github.com:readthedocs/read…

2491854

…thedocs.org into humitos/celery-handlers-self

humitos merged commit b68730d into humitos/celery-handlers Feb 7, 2022

humitos deleted the humitos/celery-handlers-self branch February 7, 2022 22:50
