Use bigger fixture tree for distributed tests #4
base: main
Conversation
LGTM, thanks for improving this test! Originally this wasn't only running a local worker; I think the regression to running only a local worker went unnoticed because it was caused by a combination of other changes.
It seems it's actually non-trivial to run the second worker properly, since it requires a running Celery worker and I don't see a fixture for that or anything similar. Or were you using a different way of running it? (Don't spend too much time on this, but if you remember / could find the original way we ran the second worker, that might speed up my research.)
Re this: I think I found the root cause, and we potentially need to check some other places. It hangs on
Originally, I was using this script: https://github.com/iterative/dvcx-server/blob/b232559d773dcee8cadc9f1ac8730c0856b94ff8/clickhouse-db-adapter/scripts/run_with_distributed_workers.py to run the tests with distributed workers, but this has changed a few times since then. Now they are supposed to be run with this fixture: https://github.com/iterative/studio/blob/v2.126.1/backend/clickhouse-db-adapter/tests/conftest.py#L18 but I'm not sure how that is supposed to work with these tests. The fixture was added in this PR: https://github.com/iterative/dvcx-server/pull/332 which also removed the usage of

And thanks for finding that strange Celery behavior! I didn't know that's how Celery worked, and yes, we don't have a test for the case where there are no workers running (or no workers with the correct queue, at least).
Yep, I saw the script ... I thought it was more of a local helper. Good to have more context, thanks. Assuming the fixture works, can we drop the script, or do you also use it locally?
Yep, I don't see an easy fix for this so far. I'll keep looking for a while.
I have been using that script for local debugging, yes. And I don't see an obvious fix for this particular kind of Celery issue either, but I'll think of possible solutions as well.
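As a side note on the hang itself: waiting on a result that no worker will ever produce is the classic failure mode when a task is sent to a queue nothing consumes. Purely as an illustration in stdlib terms (this is an analogy, not the actual Celery code path), a bounded wait turns an indefinite hang into a detectable error:

```python
import queue

# A result "mailbox" that no worker will ever fill -- analogous to
# waiting on a task result when no worker consumes its queue.
results = queue.Queue()

# A bare results.get() would block forever. With a timeout, the
# missing-worker case surfaces as an error we can act on instead:
try:
    results.get(timeout=0.1)
except queue.Empty:
    print("no worker produced a result in time")
```

The same idea applies to any blocking result wait in the shutdown path: a timeout plus explicit handling beats hanging forever.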
Force-pushed from 4272443 to 56b15b8
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@           Coverage Diff           @@
##             main       #4   +/-   ##
=======================================
  Coverage   83.86%   83.86%
=======================================
  Files          91       91
  Lines        9479     9479
  Branches     1855     1855
=======================================
  Hits         7950     7950
  Misses       1211     1211
  Partials      318      318
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
If you ever get back to this, can you please move the tests from DatasetQuery to DataChain?
I'll suggest closing this for now and reopening when you get back to it. It has been open for quite a while now.
A follow-up from https://github.com/iterative/studio/pull/10211

Since the default tree fixture is too small, `UDFDistributor` was smart enough to run a single worker (despite us passing 2 as the UDF workers parameter in tests). I don't know if that was always the case or not, but at the moment it means we are testing some edge case (a single worker running as part of the `LocalWorkerProcess`, I believe) and don't really launch other workers (which take a different code path).

Introducing this bigger fixture reproduced one of the issues seen in production: shutdown getting stuck. I'm looking into this (unless someone has some immediate ideas) and I'll fix the other tests (unless I hit some walls).
Thanks @dtulga for pointing me to these tests.
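For context, the "bigger fixture tree" idea can be sketched roughly like this (the helper and the tree shape here are hypothetical, not the actual fixture from this PR): materialize enough files on disk that the distributor genuinely has work to split across more than one worker, instead of collapsing to a single local one.

```python
import tempfile
from pathlib import Path


def create_tree(root: Path, tree: dict) -> int:
    """Materialize a nested dict of {name: subtree_dict_or_file_contents}
    under `root`. Returns the number of files created."""
    count = 0
    for name, node in tree.items():
        path = root / name
        if isinstance(node, dict):
            path.mkdir()
            count += create_tree(path, node)
        else:
            path.write_text(node)
            count += 1
    return count


# A tree large enough that splitting it across 2+ workers is meaningful
# (10 directories x 10 files each).
BIG_TREE = {
    f"dir{i}": {f"file{j}.txt": f"contents {i}/{j}" for j in range(10)}
    for i in range(10)
}

with tempfile.TemporaryDirectory() as tmp:
    n = create_tree(Path(tmp), BIG_TREE)
    print(n)  # 100
```

In a pytest setup this would typically be wrapped in a fixture that builds the tree into `tmp_path` once per test (or session) and yields the root path to the distributed tests.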