Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Celery: use an internal namespace to store build task's data #8874

Merged
merged 5 commits into from
Feb 7, 2022

Conversation

humitos
Copy link
Member

@humitos humitos commented Feb 1, 2022

Use a Task.data (readthedocs.projects.task.builds.TaskData object) to store
all the data the task needs to work instead of storing it directly using
self..

This is to allow us a simpler way to perform a clean before (and/or after)
starting the execution of a new task and avoid potentially sharing state with a
previous task executed that may not be able to perform the cleanup.

The only thing we need to keep in mind is that when modifying these Celery
tasks, we always have to add any new value inside the
self.data.<my-new-attribute> and not directly self.<my-new-attribute> to
avoid this problem. In the future, we could implement this protection at a code
level if we want to avoid this mistake.

See #8815 (comment)
See https://docs.celeryproject.org/en/master/userguide/tasks.html#instantiation

@humitos humitos requested a review from a team as a code owner February 1, 2022 09:26
Use a `Task.data` (`readthedocs.projects.task.builds.TaskData` object) to store
all the data the task needs to work instead of storing it directly using
`self.`.

This is to allow us a simpler way to perform a clean _before_ (and/or _after_)
starting the execution of a new task and avoid potentially sharing state with a
previous task executed that may not be able to perform the cleanup.

The only thing we need to keep in mind is that when modifying these Celery
tasks, we _always_ have to add any new value inside the
`self.data.<my-new-attribute>` and not directly `self.<my-new-attribute>` to
avoid this problem. In the future, we could implement this protection at a code
level if we want to avoid this mistake.

See #8815 (comment)
See https://docs.celeryproject.org/en/master/userguide/tasks.html#instantiation
@humitos humitos force-pushed the humitos/celery-handlers-self branch from 3786ee2 to 174985a Compare February 1, 2022 10:21
@ericholscher
Copy link
Member

This looks like a similar implementation as what was had, and is useful to keep it separate 👍

Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a better approach, manipulating __attributes__ probably would have caused a side effect we weren't expecting.

Already discussed, but just archiving here: this does not yet approximate our current pattern, which does use a task-unique instance of TaskStep to encapsulate the task data. We already covered why that pattern isn't usable with the workflow now though. Next up, we will want to turn self.data into a lookup table like self.data[task_id] or similar.

@humitos humitos merged commit b68730d into humitos/celery-handlers Feb 7, 2022
@humitos humitos deleted the humitos/celery-handlers-self branch February 7, 2022 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants