Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build process: use Celery handlers #8815

Merged
merged 98 commits into from
Feb 7, 2022
Merged

Build process: use Celery handlers #8815

merged 98 commits into from
Feb 7, 2022

Conversation

humitos
Copy link
Member

@humitos humitos commented Jan 17, 2022

I'm writing some notes about the decisions I've made, problems I've had and some other concerns and things I'm noticing while working on this refactor. Hopefully, we can come back to these notes in the future to keep cleaning up our code. These notes are useful for a better understanding of this PR's changes: https://hackmd.io/4TP3_EizRF-nCDrPvf4_ug

I'm happy to have a call and do some "pair review" for this work because it's quite big. However, I think it won't be too hard to communicate and understand the changes if we are talking to each other.

Also note the important commits are those that are prefixed with Build: on their message. So, I suggest taking a look at them mainly.

Closes #3984
Closes #8762
Closes #7627
Closes #7587
Closes #6452
Closes #5672
Closes #4468
It will help to implement #7660

Initial migration/refactor of the build process to make usage of the Celery
handlers (`on_success`, `on_failure`, etc)
@humitos humitos added the PR: work in progress Pull request is not ready for full review label Jan 17, 2022
We use this environment a lot from our tests, which I don't like. However, for
now, I'm keeping the because the change is not small.

We should split unit-test from integration test and reduce the complexity of
executing "real commands" when calling tests in exchange of more mocked data
instead and unit tests for the logic.
Integration tests should not create the environment where to execute the
commands nor passing weird attributes to the task. They should mock the
responses of the API for the task to grab the correct data instead.

This way, we avoid having "extra arguments" on the tasks just for test purposes
and also avoid cheating the normal flow by overriding the environment with
custom attributes. This creates lot of complexity in our code and we endup with
multiple different flows that are not normal and won't ever happen in production
but that we need to maintain just for the tests.

In particular, we should avoid "integration tests" when they are not strictly
required. In this particular case, we could create a unit-test that checks the
logic for `SyncRepositoryMixin.validate_duplicate_reserved_versions` which is a
lot easier to generate the context required and then an "integration test" with
most things mocked (no `git` commands are executed at all) and only checks that
function is being called.
We are always using `CLEAN_AFTER_BUILD=True`, so there is no environment to wipe
anymore. This is a confusing message to send to the user since it has no effect.

I'm also removing the code because it doesn't make sense anymore. If we ever
need it again, we can re-implement it correctly and recover it from here.
We are not caching pip packages anymore.
There are lot of tests not really accurate and very complex without reason. I'd
like to use the same flow in production than in tests instead of
short-circuiting/hacking the process in tests to behave as we want.

We should use mocks to avoid executing commands when running tests instead, but
making sure the "build process" is following exact the same flow in both
environments. If we want to test something in particular outside the flow, we
should create unit tests for those functions/methods.
@humitos
Copy link
Member Author

humitos commented Jan 17, 2022

Yay! Only 85 tests skipped 💪🏼 at this point.

This way we avoid doing an aditional API call to get the version, since we
already have it in memory.
I wasn't able to mock the whole task as I wanted yet, but I follow a similar
convention than the one used for the migrated test. However, I've simplify them
a little.
@agjohnson
Copy link
Contributor

agjohnson commented Jan 18, 2022

Phew! This seems like a lot of work to wrangle! 💪

One thing that would make review easier on the backend team would be splitting up or rebasing some of this work when you're ready. For instance, one commit to split up project/tasks.py into multiple files, one commit to alter the files. This would make it much easier to follow along with what has changed in the logic there.

But tests are isolated nicely, and most of the other refactors make sense logically so far. 👍 It was mostly just the project tasks that were hard to follow here.

There is no need to have a `base.py` and `BuildTaskBase` because it don't share
code with sync repository task.
Skip moving the file if the old artifact path does not exist.
Ideally, we should be able to mock:

* API calls
* Commands run inside the environment
* Git repository
* Read the Docs config file

This commit mocks most of them. The only missing here is the config file, but
there should be code already for that.

These mocks allow us to use the exact same flow than in production (e.g. hitting
the API to get the Version/Build/Project object) and avoid short-circuiting it.
Once the "mocked build" is finished, we can assert all the calls to them.
That way, it executes the `on_success`, `on_failure`, etc.
New test to check that we are updating the build object via the API, we are
sending notifications and more when the build fails.
It checks all the API calls done to the webs, and that the functions we expect
to be called are effectively called.
Move tests from `test_celery.py` to the "new system" and remove the invalid ones.
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the changes that happened since our meeting, looking good still! I won't yet call this a +1 yet, but this is mostly me understanding the changes still.

Things I haven't reviewed well yet, and where I'd like to spend some time in the next few days:

  • Some of the code changes that happened before our meeting, and reviewing the project/build tasks in a bit more depth. I mostly eyeballed the difficult files to understand the logic, but didn't look very deeply at the changes yet. I'm guessing it might help to jump on another call at some point to discuss any remaining questions here, but let me see what questions I come up with before I distract you with this.
  • Skipped tests and test coverage. I'm still wrapping my head around some of the changes still, but I don't see anything really needing changes. Some of the skipped tests might worry me because I don't trust I can help discover issues more than good test coverage would. I do agree with your instinct on some of the skipped tests so far though, so most likely nothing will come of this.

I'm planning on spending more time with this in the next few days.

#
# Once we mock the DockerBuildEnvironment properly, we could also translate the
# new tests from `readthedocs/projects/tests/test_build_tasks.py` to use this
# mocks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I see what you mean with the tests above, but they do technically work the same with a more minimal implementation (and no mocking). I suppose I'm not too worried about the Docker environment test case for this as the coverage wouldn't be vastly improved. The exit code handling is slightly different when inspecting docker exec but it does seem like a lot just to test our processing of the Docker exec API return.

@@ -25,4 +25,51 @@ def create_application():
return application


def register_renamed_tasks(application, renamed_tasks):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to track future work, we'll probably automate pulling TODOs out from this PR after merge.

Suggested change
def register_renamed_tasks(application, renamed_tasks):
# TODO Remove this function when we have fully migrated to the new task paths
def register_renamed_tasks(application, renamed_tasks):

humitos added a commit that referenced this pull request Feb 1, 2022
Use a `Task.data` (`readthedocs.projects.task.builds.TaskData` object) to store
all the data the task needs to work instead of storing it directly using
`self.`.

This is to allow us a simpler way to perform a clean _before_ (and/or _after_)
starting the execution of a new task and avoid potentially sharing state with a
previous task executed that may not be able to perform the cleanup.

The only thing we need to keep in mind is that when modifying these Celery
tasks, we _always_ have to add any new value inside the
`self.data.<my-new-attribute>` and not directly `self.<my-new-attribute>` to
avoid this problem. In the future, we could implement this protection at a code
level if we want to avoid this mistake.

See #8815 (comment)
See https://docs.celeryproject.org/en/master/userguide/tasks.html#instantiation
Use a `Task.data` (`readthedocs.projects.task.builds.TaskData` object) to store
all the data the task needs to work instead of storing it directly using
`self.`.

This is to allow us a simpler way to perform a clean _before_ (and/or _after_)
starting the execution of a new task and avoid potentially sharing state with a
previous task executed that may not be able to perform the cleanup.

The only thing we need to keep in mind is that when modifying these Celery
tasks, we _always_ have to add any new value inside the
`self.data.<my-new-attribute>` and not directly `self.<my-new-attribute>` to
avoid this problem. In the future, we could implement this protection at a code
level if we want to avoid this mistake.

See #8815 (comment)
See https://docs.celeryproject.org/en/master/userguide/tasks.html#instantiation
humitos added a commit to readthedocs/common that referenced this pull request Feb 7, 2022
I wasn't able to solve this problem in
readthedocs/readthedocs.org#8815 and I don't see there
is anything in that PR's code producing a Cyclic Import. The code works properly
and I haven't seen an exception on it.

I support it may be related with a bug in pylint or something similar.
@humitos
Copy link
Member Author

humitos commented Feb 7, 2022

I'm not able to solve the cyclic-import. The module reported with the cyclic import by pylint does not make any sense. So, I'm not sure how to solve this. I spent some time trying to figure it out, but I failed. It seems to be a bug in pylint or something like that 🤷🏼

I opened readthedocs/common#103 to disable the cycle-import check in case we want to follow that path because I ran out of ideas about what it could be and the code is working properly locally and in production (with the instance deployed with this branch).

@agjohnson
Copy link
Contributor

agjohnson commented Feb 7, 2022

pycycle is equally not helpful here 😞

image

(the cyclic import details are missing, this should tell us where the cyclic import is)

Perhaps something more specific to django cyclic imports would help track this down. If we have two tools hinting at cyclic imports, it might be something to address now

@agjohnson
Copy link
Contributor

agjohnson commented Feb 7, 2022

Tuning pycycle a bit more to narrow this down, the error goes away:

(rtfd38) community/ humitos/celery-handlers* 💀  pycycle --here --ignore user_builds,bower_components,docs,media,common,.ropeproject,dockerfiles,node_modules          
Project successfully transformed to AST, checking imports for cycles..
No worries, no cycles here!
If you think some cycle was missed, please open an Issue on Github.
Finished.

🤷❓

Edit: also, noted in chat, pycycle last release was 2017, so not sure if output is really to be trusted here

I wasn't able to find nor solve the problem with the `cyclic-import` error
reported by pylint. So, I tried moving the utils functions used by search tasks
out from `readthedocs.projects.tasks.utils` into
`readthedocs.projects.tasks.search` instead.

This made `prospector` and `pylint` to pass locally without reporting any issue.
Copy link
Contributor

@agjohnson agjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I finally got a chance to spend some more time with this PR. I reviewed work since our meeting and re-reviewed project tasks modules that I previously skimmed.

At a high level, all of the changes make sense and the new celery task pattern is really easy to follow. I can't say that I did a great job reviewing the code at a low level though. There was a lot of code that was moved alongside refactoring.

So, this is a +1, but also a little bit of a yolo +1 as well. I think we are mostly confident at this point that no obvious issues were introduced here, but we can probably expect a PR of this size to introduce a few non-obvious issues as well.

@humitos does have notes accumulated here on next steps. We should turn some into issues to return to later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment