fix: Simplified RequestQueueV2 implementation #2775

Open
wants to merge 21 commits into
base: master

Conversation

janbuchar
Contributor

@janbuchar janbuchar commented Dec 17, 2024

@janbuchar janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label Dec 17, 2024
@github-actions github-actions bot added this to the 105th sprint - Tooling team milestone Dec 17, 2024
@janbuchar janbuchar marked this pull request as draft December 17, 2024 14:59
Member

@drobnikj drobnikj left a comment

Nice 💪

I will do some testing myself, but first, what about unit tests? Did you consider adding some? There are none: https://github.com/apify/crawlee/blob/03951bdba8fb34f6bed00d1b68240ff7cd0bacbf/test/core/storages/request_queue.test.ts
Honestly, we have been dealing with various bugs over time, and we still do not have any tests for these features.

packages/core/src/storages/request_provider.ts (outdated)
@drobnikj
Member

drobnikj commented Jan 6, 2025

The build did not finish, can you check @janbuchar?
I would like to test it in some Actors.

@janbuchar
Contributor Author

> The build did not finish, can you check @janbuchar? I would like to test it in some Actors.

I can, but only later this week - I have different stuff to finish first.

@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Jan 22, 2025
@janbuchar
Contributor Author

@drobnikj the unit tests are now passing, so you should be able to build. I'm still working on some e2e tests; if you have any ideas for scenarios to test (e2e, unit, doesn't matter), I'd love to hear them.

@drobnikj drobnikj self-requested a review January 30, 2025 10:41
Member

@drobnikj drobnikj left a comment

Looks good; I did not find any issues, even during testing.
I have a few more comments. Can you check them, please, @janbuchar?


  const headData = await this.client.listAndLockHead({
-     limit: Math.min(forefront ? this.assumedForefrontCount : 25, 25),
+     limit: Math.min(hasPendingForefrontRequests ? this.assumedForefrontCount : 25, 25),
      lockSecs: this.requestLockSecs,
Member

I checked where requestLockSecs is set, and I think we should consider changing it.
The current value is 2x requestHandlerTimeoutSecs plus 5 seconds.

this.requestQueue.requestLockSecs = Math.max(this.internalTimeoutMillis / 1000 + 5, 60);

I think we should set it to requestHandlerTimeoutSecs plus some safe buffer.
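
The proposal could be sketched roughly as follows. This is only an illustration: `computeRequestLockSecs` is a hypothetical helper that does not exist in Crawlee, and the 30-second default buffer is an assumption, since the thread does not settle on a value. `currentRequestLockSecs` mirrors the line quoted above, for comparison.

```typescript
// Hypothetical helper, not part of Crawlee: derives the lock duration from the
// request handler timeout plus a buffer. The 30-second default is an assumption.
function computeRequestLockSecs(requestHandlerTimeoutSecs: number, bufferSecs = 30): number {
    if (requestHandlerTimeoutSecs < 0 || bufferSecs < 0) {
        throw new RangeError('Timeout and buffer must be non-negative');
    }
    // Round up so a fractional timeout never yields a lock shorter than the handler run.
    return Math.ceil(requestHandlerTimeoutSecs + bufferSecs);
}

// The current computation as quoted in the comment above, for comparison.
function currentRequestLockSecs(internalTimeoutMillis: number): number {
    return Math.max(internalTimeoutMillis / 1000 + 5, 60);
}
```

With a shorter lock, a request abandoned by a crashed client becomes available to other clients sooner; the buffer has to cover any work that happens between locking the request and marking it handled.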

Contributor Author

Okay, what do you think a safe buffer would be?

@@ -361,7 +430,8 @@ export class RequestQueue extends RequestProvider {

Member

I cannot comment on the code below, but during code review I saw that we remove locks one by one in _clearPossibleLock.
See:

while ((requestId = this.queueHeadIds.removeFirst()) !== null) {

There is a rate limit of 200 requests per second. I would remove the locks in batches, maybe of 10, to speed it up.

Contributor Author

What do you mean? I don't think there is a batch unlock endpoint. Launching those requests in parallel surely won't help against rate limiting either.
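
For reference, the batching idea from this thread could look roughly like the sketch below. It assumes no batch unlock endpoint exists: the hypothetical `unlock` callback stands for whatever per-request lock-release call the client exposes, and `unlockInBatches` is an illustrative name, not Crawlee code. Each lock is still released by its own call, so this only shortens wall-clock time and does not reduce the number of requests counted against the rate limit.

```typescript
// Hypothetical signature: releases the lock on a single request.
type UnlockFn = (requestId: string) => Promise<void>;

// Releases locks in groups of `batchSize`, waiting for each group to settle
// before starting the next one. Returns the IDs whose unlock call failed.
async function unlockInBatches(requestIds: string[], unlock: UnlockFn, batchSize = 10): Promise<string[]> {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
        throw new RangeError('batchSize must be a positive integer');
    }
    const failed: string[] = [];
    for (let i = 0; i < requestIds.length; i += batchSize) {
        const batch = requestIds.slice(i, i + batchSize);
        // Run the calls of one batch concurrently; a failure is recorded, not rethrown.
        const outcomes = await Promise.all(
            batch.map(async (id) => {
                try {
                    await unlock(id);
                    return null;
                } catch (error) {
                    return id;
                }
            }),
        );
        for (const id of outcomes) {
            if (id !== null) failed.push(id);
        }
    }
    return failed;
}
```

Keeping the batch size well below the rate limit bounds the number of concurrent calls, but bursts from several clients can still add up, which is the concern raised in the reply above.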

@@ -53,7 +53,8 @@ const RECENTLY_HANDLED_CACHE_SIZE = 1000;
   * @category Sources
   */
  export class RequestQueue extends RequestProvider {
-     private _listHeadAndLockPromise: Promise<void> | null = null;
+     private listHeadAndLockPromise: Promise<void> | null = null;
+     private queueHasLockedRequests: boolean | undefined = undefined;

      /**
       * Returns `true` if there are any requests in the queue that were enqueued to the forefront.
Member

Can you please update this function according to the notes in
https://github.com/apify/apify-core/issues/19218#issuecomment-2621466723?

Contributor Author

I think I managed to make it work regardless of whether or not there are multiple clients.

@janbuchar
Contributor Author

@drobnikj thanks! You really went above and beyond for this one. I'll try to wrap this up today/tomorrow.

@janbuchar janbuchar marked this pull request as ready for review February 4, 2025 22:49
@janbuchar janbuchar requested review from barjin and drobnikj February 4, 2025 22:49
@janbuchar
Contributor Author

@barjin I gave the forefront handling a makeover. If you could check that out, I'd be super grateful.

@barjin
Contributor

barjin commented Feb 5, 2025

Looking good to me 👍🏽 I remember reversing the forefront array somewhere already (likely memory-storage?), but as long as those tests are passing, this part is IMO good to go.

Labels
t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utilize queueHasLockedRequests to simplify RequestQueue v2
3 participants