fix: Simplified RequestQueueV2 implementation #2775
base: master
Conversation
const giveUpLock = async (id?: string, uniqueKey?: string) => {
    if (id === undefined) {
        return;
    }

    try {
        await this.client.deleteRequestLock(id);
    } catch {
        this.log.debug('Failed to delete request lock', { id, uniqueKey });
    }
};

// If we tried to read new forefront requests, but another client appeared in the meantime, we can't be sure we'll only read our requests.
// To retain the correct queue ordering, we rollback this head read.
if (hasPendingForefrontRequests && headData.hadMultipleClients) {
    this.log.debug(`Skipping this read - forefront requests may not be fully consistent`);
    await Promise.all(headData.items.map(({ id, uniqueKey }) => giveUpLock(id, uniqueKey)));
}
@barjin I'm pretty sure this is equivalent to the previous version, but please check it.
/**
 * @inheritDoc
 */
override async isFinished(): Promise<boolean> {
@drobnikj I didn't remove the inheritance from RequestProvider completely, I just overrode this method. I don't think there's anything else in RequestProvider that could cause trouble, but feel free to prove me wrong.
As I wrote, I would remove the original implementation from RequestProvider so as not to confuse future developers.
Nice 💪
I would do some testing myself, but first: what about some unit tests, did you consider adding some? There are none -> https://github.com/apify/crawlee/blob/03951bdba8fb34f6bed00d1b68240ff7cd0bacbf/test/core/storages/request_queue.test.ts
Honestly, we have been dealing with various bugs over time, and we still do not have any tests for these features.
// If we tried to read new forefront requests, but another client appeared in the meantime, we can't be sure we'll only read our requests.
// To retain the correct queue ordering, we rollback this head read.
if (hasPendingForefrontRequests && headData.hadMultipleClients) {
    this.log.debug(`Skipping this read - forefront requests may not be fully consistent`);
    await Promise.all(headData.items.map(async ({ id, uniqueKey }) => giveUpLock(id, uniqueKey)));
}
I was thinking about those scenarios during the last batch of forefront fixes, but I considered them too edge-case-y to handle.
I'm not sure about these changes, though - let's say two clients are using the queue. Won't this cause every forefront request to be locked and then immediately unlocked? headData.hadMultipleClients doesn't, iirc, mean "a new client just appeared" - once it's set, it's never false again, no? 🤔
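The concern above can be sketched as a tiny standalone model (illustrative only, not Crawlee code): because the multiple-clients flag latches to true and never resets, a rollback condition keyed on it would fire on every subsequent forefront head read.

```typescript
// Tiny model of the latching behavior discussed above (illustrative assumption,
// not Crawlee internals). Once hadMultipleClients flips to true, it never
// resets, so the rollback predicate stays true for all later forefront reads.
let hadMultipleClients = false;
const knownClients = new Set<string>();

// Record which client performed a head read; seeing a second client latches the flag.
function onHeadRead(clientKey: string): void {
    knownClients.add(clientKey);
    if (knownClients.size > 1) hadMultipleClients = true;
}

// The rollback condition from the diff above, restated as a predicate.
function shouldRollbackHeadRead(hasPendingForefrontRequests: boolean): boolean {
    return hasPendingForefrontRequests && hadMultipleClients;
}
```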
I considered them too edge case-y to handle them
My point here: I'd be happy not to do this at all. Both clients are using the same queue - as a user, I'd be fine with some request intermingling.
The build did not finish - can you check, @janbuchar?
I can, but only later this week - I have different stuff to finish first.
@drobnikj the unit tests are now passing, so you should be able to build. I'm still working on some e2e tests - if you have any ideas for scenarios to test (e2e, unit, doesn't matter), I'd love to hear them.
Looks good, I did not find any issues, even during testing.
I have a few more comments - can you check, please? @janbuchar
const headData = await this.client.listAndLockHead({
-   limit: Math.min(forefront ? this.assumedForefrontCount : 25, 25),
+   limit: Math.min(hasPendingForefrontRequests ? this.assumedForefrontCount : 25, 25),
    lockSecs: this.requestLockSecs,
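The limit expression in the diff above can be restated as a standalone helper (illustrative only): forefront reads are capped by the assumed forefront count, and every read is capped at the API maximum of 25 items.

```typescript
// Standalone restatement of the head-read limit from the diff above
// (illustrative; the helper name is an assumption, not a Crawlee identifier).
// Forefront reads fetch at most assumedForefrontCount items; all reads are
// capped at the API maximum of 25.
function headReadLimit(hasPendingForefrontRequests: boolean, assumedForefrontCount: number): number {
    return Math.min(hasPendingForefrontRequests ? assumedForefrontCount : 25, 25);
}
```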
I checked where requestLockSecs was set and I think we should consider changing it. The current value is 2x requestHandlerTimeoutSecs plus 5 secs:
this.requestQueue.requestLockSecs = Math.max(this.internalTimeoutMillis / 1000 + 5, 60);
I think we should set it to requestHandlerTimeoutSecs plus some safe buffer.
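A minimal sketch of the two lock durations being compared, assuming (per the comment above) that the internal timeout is roughly double the handler timeout; the function names and the 10-second buffer are illustrative assumptions, not actual Crawlee identifiers.

```typescript
// Current behavior (per the snippet above): internalTimeoutMillis is roughly
// 2x requestHandlerTimeoutSecs, converted to seconds, plus 5 s, with a 60 s floor.
function currentLockSecs(requestHandlerTimeoutSecs: number): number {
    const internalTimeoutMillis = 2 * requestHandlerTimeoutSecs * 1000;
    return Math.max(internalTimeoutMillis / 1000 + 5, 60);
}

// Proposed (per the comment above): the handler timeout plus a safety buffer,
// keeping the same floor. The buffer size is an assumption.
const LOCK_BUFFER_SECS = 10;

function proposedLockSecs(requestHandlerTimeoutSecs: number): number {
    return Math.max(requestHandlerTimeoutSecs + LOCK_BUFFER_SECS, 60);
}
```

With a 60 s handler timeout, the current formula yields a 125 s lock, while the proposed one would hold the lock for only 70 s.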
@@ -361,7 +430,8 @@ export class RequestQueue extends RequestProvider {
I cannot comment on it below, but during code review I see that we are removing locks one by one in _clearPossibleLock, see:

while ((requestId = this.queueHeadIds.removeFirst()) !== null) {

There is a 200 rps rate limit. I would remove the locks in batches of maybe 10 to speed it up.
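The batching suggestion could look something like the sketch below. This is an assumption, not Crawlee code: `deleteLock` stands in for `this.client.deleteRequestLock`, and the batch size of 10 follows the reviewer's suggestion.

```typescript
// Illustrative sketch: delete request locks in batches instead of one by one,
// to stay well under the 200 rps rate limit mentioned above. `deleteLock` is a
// stand-in for the client's delete-request-lock call (an assumption).
async function clearLocksInBatches(
    requestIds: string[],
    deleteLock: (id: string) => Promise<void>,
    batchSize = 10,
): Promise<void> {
    for (let i = 0; i < requestIds.length; i += batchSize) {
        const batch = requestIds.slice(i, i + batchSize);
        // Deletions within a batch run in parallel; batches run sequentially.
        await Promise.all(batch.map((id) => deleteLock(id)));
    }
}
```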
@@ -53,7 +53,8 @@ const RECENTLY_HANDLED_CACHE_SIZE = 1000;
 * @category Sources
 */
export class RequestQueue extends RequestProvider {
-    private _listHeadAndLockPromise: Promise<void> | null = null;
+    private listHeadAndLockPromise: Promise<void> | null = null;
+    private queueHasLockedRequests: boolean | undefined = undefined;

/**
 * Returns `true` if there are any requests in the queue that were enqueued to the forefront.
Can you please update this function regarding the notes in https://github.com/apify/apify-core/issues/19218#issuecomment-2621466723 ?
}

if (this.queueHasLockedRequests !== undefined) {
    return !this.queueHasLockedRequests;
Can we add at least a debug or info log if there are still locked requests? It would mean that there are requests locked by another client.
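A hedged sketch of the suggested logging, extracted as a standalone helper: when the queue is reported unfinished solely because another client still holds locks, emit at least a debug message. The `Logger` interface, the helper name, and the message text are illustrative assumptions, not actual Crawlee internals.

```typescript
// Illustrative helper modeling the isFinished() branch from the diff above,
// with the debug log the reviewer asks for (names are assumptions).
interface Logger {
    debug(message: string): void;
}

function isFinishedFromLockState(
    queueHasLockedRequests: boolean | undefined,
    log: Logger,
): boolean | undefined {
    // Lock state not yet known - the caller must fall back to other checks.
    if (queueHasLockedRequests === undefined) return undefined;
    if (queueHasLockedRequests) {
        // Surfaces that the queue is held open by another client's locks.
        log.debug('The queue still contains requests locked by another client.');
    }
    return !queueHasLockedRequests;
}
```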
@drobnikj thanks! You really went above and beyond for this one. I'll try to wrap this up today/tomorrow.
queueHasLockedRequests to simplify RequestQueue v2 #2767