jobs, observability: make SHOW JOBS non-blocking #76690
Comments
Another, somewhat controversial proposal would be to expose weaker isolation. One thing that affects not … Another note is that what pisses people off is stuff not loading. One built-in mechanism we have for pushing locks and permitting reads is the pusher that sits under a rangefeed. We've long talked about leveraging rangefeeds with jobs for various things. If we did that, we'd magically see writing transactions get pushed. That sort of catch-all process seems appealing to me in lieu of some other way to enforce that we don't run long-running transactions against the jobs table. This all comes back to other issues where the jobs data storage encourages bad transactions. If, from an API perspective, we didn't allow any combination of updating a job with other work, we wouldn't see these long-running transactions. Eliminating all the existing use cases is, of course, work.
I don't have a clear vision of how …
Ah, I was under the impression this would be helpful, but you're saying it's not desirable semantics for …
I would love to see an analysis of which long-running transactions we know exist. Yes, there are systemic solutions here, and they may be valuable, but to some extent the root cause is the individual transactions. One possible source of problems might be transactions that work hard to avoid writing to the jobs table until the end (which seems like a good strategy), but then experience restarts and hold the lock across those restarts. If that's an issue, maybe we can add some functionality to invalidate certain locks upon restart. The rows in the jobs table shouldn't be at risk of getting overwritten after a restart.
@ajwerner When a primary index row is locked, does that imply the corresponding secondary index row is also locked? I ask because I wonder whether querying with an index hint would help: the index would act as a materialized view.
Generally, yes, it does. Put differently, if the secondary index contains any columns in any column families which are locked, the corresponding secondary index entry will be locked.
I don’t think any of these ideas (strictly about the observability/SELECT side) has legs, so closing this issue. We’ll continue on other efforts, focused on the core issue of contention: #73133 |
Describe the problem
The SHOW JOBS query doesn't return if a lock is held on any row in the jobs table. Users perceive this as the jobs system itself struggling, which may or may not be correct. In any case, a hanging SHOW JOBS query is a common user complaint and is often escalated as a concern about the overall jobs system, which makes it expensive in terms of support.
To Reproduce
This typically happens when a long-running transaction, such as backup planning, is held open. SHOW JOBS is a table scan, so any open transaction holding a lock on a row in the jobs table will block it.
Expected behavior
The goal of this issue is narrowly to find ways to prevent the SHOW JOBS query from hanging, which is an observability problem.
Some ideas:
- SHOW JOBS could use an AS OF SYSTEM TIME (AOST) of, say, 5 or 10 seconds, to reduce the likelihood of hitting a locked record or open transaction.
- … SHOW JOBS, which never blocks. The trade-off would be some tolerance of staleness.
- … SHOW CHANGEFEED JOBS and create similar sugar (like SHOW BACKUP JOBS). These filters would at least reduce the chance of hitting a locked row and would, for example, keep backups from interfering with observing changefeeds.
  - SHOW CHANGEFEEDS (drop the JOBS keyword).
- Implement SKIP LOCKED (#40476, sql: support FOR {UPDATE,SHARE} {SKIP LOCKED,NOWAIT}) and use it in jobs (#62734, jobs: job adoption can block on intents).
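To make the trade-off between the AOST and SKIP LOCKED ideas concrete, here is a toy Python model of a full-table scan over versioned rows. It is a sketch under stated assumptions, not CockroachDB's implementation: `Row`, `scan`, `WouldBlock`, the job data, and the integer timestamps are all hypothetical. The model assumes a read at timestamp T conflicts with any lock or write intent at or below T and ignores intents above T; it ignores uncertainty intervals, transaction pushing, and lock wait queues.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


class WouldBlock(Exception):
    """Stands in for a scan that would wait on an open transaction's lock."""


@dataclass
class Row:
    key: str
    # Committed (timestamp, value) versions, sorted by ascending timestamp.
    versions: List[Tuple[int, str]] = field(default_factory=list)
    # Timestamp of an unresolved write intent/lock, or None if unlocked.
    intent_ts: Optional[int] = None


def scan(rows: List[Row], read_ts: int, skip_locked: bool = False) -> Dict[str, str]:
    """Hypothetical full-table read of a toy MVCC table as of read_ts.

    A lock at or below read_ts conflicts with the read: with skip_locked
    the row is silently omitted, otherwise the scan raises WouldBlock.
    A lock above read_ts is in the read's future and is ignored.
    """
    result: Dict[str, str] = {}
    for row in rows:
        if row.intent_ts is not None and row.intent_ts <= read_ts:
            if skip_locked:
                continue
            raise WouldBlock(row.key)
        # Pick the newest committed version visible at read_ts.
        timestamps = [ts for ts, _ in row.versions]
        i = bisect_right(timestamps, read_ts)
        if i > 0:
            result[row.key] = row.versions[i - 1][1]
    return result


# Hypothetical jobs table at logical time NOW.
NOW = 100
JOBS = [
    Row("job1", [(10, "running"), (95, "succeeded")]),
    Row("job2", [(20, "running")], intent_ts=98),  # locked 2 ticks ago
    Row("job3", [(30, "paused")], intent_ts=40),   # long-running txn since t=40
]

# scan(JOBS, NOW)                         -> raises WouldBlock("job2")
# scan(JOBS, NOW - 10)                    -> raises WouldBlock("job3")
# scan(JOBS, NOW, skip_locked=True)       -> {"job1": "succeeded"}
# scan(JOBS, NOW - 10, skip_locked=True)  -> {"job1": "running", "job2": "running"}
```

In this model, the AOST read avoids the recent lock on job2 but still blocks on job3, whose lock comes from a long-running transaction, and it reports job1 as running after it has succeeded. SKIP LOCKED never blocks but silently drops locked rows, so the two ideas trade staleness against completeness.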