
Concurrently crawling #834

Merged 1 commit into elementor:develop on Nov 6, 2021

Conversation

palmiak (Contributor) commented Nov 3, 2021

This is the implementation of the discussion in #762.

It would be great if you could take a second look.

There are some things I'm having problems with:

  • there are three places that throw a warning about lines being too long
  • I missed this part, so it should be re-added (a sketch of one way to fit it into the concurrent crawler follows this list):
        try {
            $response = $this->client->send( $request );
        } catch ( TooManyRedirectsException $e ) {
            WsLog::l( "Too many redirects from $url" );
        }
  • the crawlURL() method is also no longer used, so we should either remove it or deprecate it
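
A minimal sketch of how that catch could be restored once requests go out concurrently, assuming the crawler batches them through Guzzle's Pool (the actual implementation in this PR may differ; the client setup, URL list, and callbacks here are illustrative):

    <?php
    // Illustrative only: concurrent GET requests via GuzzleHttp\Pool, with the
    // TooManyRedirectsException handling restored in the per-request
    // "rejected" callback instead of a blocking try/catch.
    use GuzzleHttp\Client;
    use GuzzleHttp\Pool;
    use GuzzleHttp\Psr7\Request;
    use GuzzleHttp\Exception\TooManyRedirectsException;
    use WP2Static\WsLog; // the plugin's logger, as used in the snippet above

    $client = new Client();
    $urls   = [ 'https://example.com/', 'https://example.com/about/' ]; // placeholder URLs

    $requests = function ( array $urls ) {
        foreach ( $urls as $url ) {
            yield new Request( 'GET', $url );
        }
    };

    $pool = new Pool(
        $client,
        $requests( $urls ),
        [
            // WordPress filter; name as in this PR description
            // (later renamed to 'wp2static_crawl_concurrency').
            'concurrency' => apply_filters( 'wp2static_concurrent_crawl_rate', 1 ),
            'fulfilled'   => function ( $response, $index ) use ( $urls ) {
                // Persist the crawled response for $urls[ $index ] here.
            },
            'rejected'    => function ( $reason, $index ) use ( $urls ) {
                $url = $urls[ $index ];
                if ( $reason instanceof TooManyRedirectsException ) {
                    WsLog::l( "Too many redirects from $url" );
                }
            },
        ]
    );

    // Block until every request in the pool has settled.
    $pool->promise()->wait();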

Currently the concurrency rate can be set with `apply_filters( 'wp2static_concurrent_crawl_rate', 1 )` and defaults to 1.
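
For example (a minimal sketch, assuming a standard WordPress install with the plugin active), a site owner could raise the rate from a theme's functions.php or a small must-use plugin:

    // Illustrative: raise the crawl concurrency via the filter described above.
    // Note: the merged version renamed the filter to 'wp2static_crawl_concurrency'
    // (see the maintainer's comment below).
    add_filter( 'wp2static_concurrent_crawl_rate', function ( $concurrency ) {
        return 8; // crawl up to 8 URLs at a time
    } );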

I'm using the new crawler on production for WordPressowka.pl and wpowls.co.

@john-shaffer force-pushed the concurrently-crawling branch from 0a9f8a9 to 1d5ffe1 on November 6, 2021 at 00:53
Number of concurrent crawls is set via
`apply_filters( 'wp2static_crawl_concurrency', 1 )`
@john-shaffer force-pushed the concurrently-crawling branch from 1d5ffe1 to 27c7782 on November 6, 2021 at 01:29
@john-shaffer merged commit bfb48da into elementor:develop on Nov 6, 2021
john-shaffer (Contributor) commented Nov 6, 2021

Excellent work!

I resolved the issues you mentioned and added a crawlConcurrency option on the advanced options page. I changed the filter name to wp2static_crawl_concurrency.

On some very small VMs (t3a.micro), I'm seeing improvements from ~10.5s to ~7.5s on a small site with crawlConcurrency=64, and from ~8m to ~6m on a larger slow site with crawlConcurrency=8. Higher concurrency values are slightly slower, but not by much.

palmiak (Contributor, Author) commented Nov 6, 2021 via email
