Slow down + retry on HTTP 429 errors #392

Open
benoit74 opened this issue Sep 25, 2023 · 1 comment · May be fixed by #393

benoit74 commented Sep 25, 2023

The crawler should handle HTTP 429 (Too Many Requests) responses more gracefully. It should slow down and retry the affected page instead of skipping it.
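The "retry" half of this request could look roughly like the sketch below. It retries a page while the server answers 429. If the response carries a `Retry-After` header, the sketch waits as long as that header asks. The log below does not show whether radiopaedia.org sends this header, so that part is an assumption. Without the header, the sketch falls back to exponential backoff. `loadWithRetry`, `loadOnce`, `parseRetryAfter` and `backoffDelay` are hypothetical names and are not part of the crawler's code. The `{ status, headers }` response shape is also assumed.

```javascript
// Sketch only: loadOnce is a hypothetical stand-in for the crawler's page-load
// step and must resolve to { status, headers } with lower-case header names.

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry-After is either delay-seconds ("120") or an HTTP-date (RFC 9110).
// Returns the requested delay in milliseconds, or null if absent/unparseable.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null) return null;
  const text = String(value).trim();
  if (/^\d+$/.test(text)) return Number(text) * 1000;
  const when = Date.parse(text);
  return Number.isNaN(when) ? null : Math.max(0, when - now);
}

// Exponential backoff used when the server gives no hint: base * 2^attempt, capped.
function backoffDelay(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Load a URL, waiting and retrying while the server answers 429.
async function loadWithRetry(url, loadOnce, { maxRetries = 5, baseMs = 1000, maxMs = 60000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await loadOnce(url);
    if (response.status !== 429) return response;
    if (attempt >= maxRetries) {
      throw new Error(`Page ${url} still returned 429 after ${maxRetries} retries`);
    }
    // Prefer the server's hint; fall back to exponential backoff.
    const hinted = parseRetryAfter(response.headers?.["retry-after"]);
    await sleep(hinted ?? backoffDelay(attempt, baseMs, maxMs));
  }
}
```

This version blocks the worker while it sleeps. If the server asks for very long waits, a real implementation might instead cap the delay or put the page back in the queue with a "not before" time.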

Below is an example log in which the website asks the crawler to slow down (HTTP 429), but the crawler keeps going at the same pace. Each worker that receives a 429 skips the page and immediately starts the next one.
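Several workers (0, 3, 4 and 5) hit 429 within a few seconds of each other, so retrying inside one worker would probably not reduce the overall request rate. The "slow down" half of the request therefore likely needs state shared across all workers. The sketch below is one such approach, using a hypothetical `SharedThrottle` class that is not part of the crawler. Every 429 doubles the minimum gap between page starts, up to a cap. Each successful load shrinks the gap a little. All parameter names and default values are illustrative assumptions.

```javascript
// Hypothetical crawl-wide throttle shared by every page worker (not an existing
// crawler API). Workers call `await throttle.wait()` before each page load and
// `throttle.onResponse(status)` afterwards.
class SharedThrottle {
  constructor({ minGapMs = 0, maxGapMs = 60000, penaltyMs = 1000, recovery = 0.9 } = {}) {
    this.minGapMs = minGapMs;   // gap between page starts when the site is happy
    this.maxGapMs = maxGapMs;   // never space page starts further apart than this
    this.penaltyMs = penaltyMs; // gap used after the first 429
    this.recovery = recovery;   // multiplier applied to the gap on each success
    this.gapMs = minGapMs;
    this.nextSlot = 0;          // earliest time (ms since epoch) the next load may start
  }

  // Reserve the next start slot. The reservation happens synchronously, so
  // concurrent callers receive staggered slots. Resolves to the delay waited.
  async wait(now = Date.now()) {
    const start = Math.max(now, this.nextSlot);
    this.nextSlot = start + this.gapMs;
    const delay = start - now;
    if (delay > 0) await new Promise((resolve) => setTimeout(resolve, delay));
    return delay;
  }

  // Double the gap on 429; shrink it gradually on success; ignore other errors.
  onResponse(status) {
    if (status === 429) {
      this.gapMs = Math.min(this.maxGapMs, Math.max(this.penaltyMs, this.gapMs * 2));
    } else if (status < 400) {
      this.gapMs = Math.max(this.minGapMs, this.gapMs * this.recovery);
    }
  }
}
```

The throttle paces page starts across the whole crawl. If a site enforces its limit over a longer window, it could be combined with a per-page retry like the one sketched earlier.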

Sample website where this happens after some time (roughly one hour into the crawl): https://radiopaedia.org

Log capture:
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.691Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.692Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":9,"total":410,"pending":6,"failed":1,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:56.143Z\",\"url\":\"https://radiopaedia.org/go-ad-free\",\"added\":\"2023-09-05T00:14:35.344Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.691Z\",\"url\":\"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:58.844Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/go-ad-free returned status code 429","page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:14:59.358Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/go-ad-free returned status code 429","stack":"Error: Page https://radiopaedia.org/go-ad-free returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:14:59.359Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/go-ad-free","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.382Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.383Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":10,"total":410,"pending":6,"failed":2,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.691Z\",\"url\":\"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:14:59.561Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.023Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.024Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429","page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.024Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.027Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/cases/solitary-fibrous-tumor-of-the-dura-4?lang=us","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.054Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.055Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":11,"total":410,"pending":6,"failed":3,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:14:58.308Z\",\"url\":\"https://radiopaedia.org/about\",\"added\":\"2023-09-05T00:14:35.347Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.083Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.083Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/about returned status code 429","page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.084Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/about returned status code 429","stack":"Error: Page https://radiopaedia.org/about returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 4)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.085Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/about","workerid":4}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.103Z","context":"worker","message":"Starting page","details":{"workerid":4,"page":"https://radiopaedia.org/feature_images/previous?lang=us"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.104Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":12,"total":410,"pending":6,"failed":4,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:59.381Z\",\"url\":\"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us\",\"added\":\"2023-09-05T00:14:35.348Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.249Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.264Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.272Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429","page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:00.288Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.289Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/users/jose-roberto-montanez-sauceda?lang=us","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.308Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.310Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":13,"total":437,"pending":6,"failed":5,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:35.414Z\",\"url\":\"https://radiopaedia.org/edits?lang=us\",\"added\":\"2023-09-05T00:14:35.335Z\",\"depth\":1}"]}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:00.319Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.355Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/edits?lang=us"],"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.355Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/edits?lang=us","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.357Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/edits?lang=us","page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.357Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.358Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/edits?lang=us","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.379Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.380Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":14,"total":494,"pending":6,"failed":5,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.054Z\",\"url\":\"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:51.911Z\",\"url\":\"https://radiopaedia.org/?lang=us\",\"added\":\"2023-09-05T00:14:35.340Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.472Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:00.510Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429","page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.069Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.070Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/solitary-fibrous-tumour-of-the-dura?lang=us","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.072Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/?lang=us"],"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.072Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/?lang=us","page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.074Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/?lang=us","page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.076Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.077Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/?lang=us","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.127Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/podcast"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.128Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/articles/playlists-1"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.129Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":16,"total":494,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.129Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":16,"total":494,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:14:48.711Z\",\"url\":\"https://radiopaedia.org/quizzes/all?lang=us\",\"added\":\"2023-09-05T00:14:35.338Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.241Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.302Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.504Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.549Z","context":"general","message":"Invalid Page - URL must start with http:// or https://","details":{"url":"javascript:;","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.678Z","context":"behavior","message":"Running behaviors","details":{"frames":1,"frameUrls":["https://radiopaedia.org/quizzes/all?lang=us"],"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.679Z","context":"behavior","message":"Run Script Started","details":{"frameUrl":"https://radiopaedia.org/quizzes/all?lang=us","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.692Z","context":"behavior","message":"Run Script Finished","details":{"frameUrl":"https://radiopaedia.org/quizzes/all?lang=us","page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.693Z","context":"behavior","message":"Behaviors finished","details":{"finished":1,"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.694Z","context":"pageStatus","message":"Page Finished","details":{"loadState":4,"page":"https://radiopaedia.org/quizzes/all?lang=us","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.732Z","context":"worker","message":"Starting page","details":{"workerid":2,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:01.733Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":17,"total":546,"pending":6,"failed":6,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.378Z\",\"url\":\"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.974Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.975Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429","page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:01.975Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429","stack":"Error: Page https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 0)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:01.975Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/courses/radiopaedia-2023-virtual-conference","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.005Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/impact"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.006Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":18,"total":546,"pending":6,"failed":7,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/podcast\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.127Z\",\"url\":\"https://radiopaedia.org/articles/playlists-1\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.103Z\",\"url\":\"https://radiopaedia.org/feature_images/previous?lang=us\",\"added\":\"2023-09-05T00:14:35.349Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.177Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.270Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.322Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.323Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429","page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.323Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429","stack":"Error: Page https://radiopaedia.org/feature_images/previous?lang=us returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 4)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.325Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/feature_images/previous?lang=us","workerid":4}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/podcast returned status code 429","page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.374Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/podcast returned status code 429","stack":"Error: Page https://radiopaedia.org/podcast returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 5)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.375Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/podcast","workerid":5}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.639Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.642Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/playlists-1 returned status code 429","page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.642Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/playlists-1 returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/playlists-1 returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 1)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.644Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/playlists-1","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.662Z","context":"worker","message":"Starting page","details":{"workerid":4,"page":"https://radiopaedia.org/courses/help-creating-cases"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.676Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":5,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.679Z","context":"worker","message":"Starting page","details":{"workerid":5,"page":"https://radiopaedia.org/courses/help-multiple-choice-questions"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.684Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":5,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.686Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/peer-review-policy"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.687Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":21,"total":546,"pending":6,"failed":10,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:00.308Z\",\"url\":\"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg\",\"added\":\"2023-09-05T00:14:35.350Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.684Z\",\"url\":\"https://radiopaedia.org/peer-review-policy\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429","page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:02.760Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429","stack":"Error: Page https://radiopaedia.org/articles/general-overview-of-radiopaediaorg returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 3)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:02.761Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/articles/general-overview-of-radiopaediaorg","workerid":3}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.797Z","context":"worker","message":"Starting page","details":{"workerid":3,"page":"https://radiopaedia.org/continuing-medical-education-cme"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.798Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":22,"total":546,"pending":6,"failed":11,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.005Z\",\"url\":\"https://radiopaedia.org/impact\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.684Z\",\"url\":\"https://radiopaedia.org/peer-review-policy\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:02.975Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/help-multiple-choice-questions","workerid":5}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.023Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.026Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/courses/help-creating-cases","workerid":4}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:03.050Z","context":"general","message":"Awaiting page load","details":{"page":"https://radiopaedia.org/continuing-medical-education-cme","workerid":3}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/peer-review-policy returned status code 429","page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.781Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/peer-review-policy returned status code 429","stack":"Error: Page https://radiopaedia.org/peer-review-policy returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 1)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.782Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/peer-review-policy","workerid":1}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.799Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.799Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/impact returned status code 429","page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.800Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/impact returned status code 429","stack":"Error: Page https://radiopaedia.org/impact returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 0)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.800Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/impact","workerid":0}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.816Z","context":"worker","message":"Starting page","details":{"workerid":1,"page":"https://radiopaedia.org/editors"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.817Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":546,"pending":5,"failed":13,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.815Z\",\"url\":\"https://radiopaedia.org/editors\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.844Z","context":"worker","message":"Starting page","details":{"workerid":0,"page":"https://radiopaedia.org/radiopaedia-educational-board"}}
{"logLevel":"info","timestamp":"2023-09-05T00:15:04.850Z","context":"crawlStatus","message":"Crawl statistics","details":{"crawled":24,"total":546,"pending":6,"failed":13,"limit":{"max":0,"hit":false},"pendingPages":["{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.815Z\",\"url\":\"https://radiopaedia.org/editors\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.363Z\",\"url\":\"https://radiopaedia.org/courses/help-creating-cases\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:01.732Z\",\"url\":\"https://radiopaedia.org/courses/editing-radiopaedia-articles\",\"added\":\"2023-09-05T00:14:35.351Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:04.843Z\",\"url\":\"https://radiopaedia.org/radiopaedia-educational-board\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.796Z\",\"url\":\"https://radiopaedia.org/continuing-medical-education-cme\",\"added\":\"2023-09-05T00:14:35.353Z\",\"depth\":1}","{\"seedId\":0,\"started\":\"2023-09-05T00:15:02.415Z\",\"url\":\"https://radiopaedia.org/courses/help-multiple-choice-questions\",\"added\":\"2023-09-05T00:14:35.352Z\",\"depth\":1}"]}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.905Z","context":"general","message":"Page Load Error, skipping page","details":{"statusCode":429,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.906Z","context":"general","message":"Page Load Error, skipping page","details":{"msg":"Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429","page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"error","timestamp":"2023-09-05T00:15:04.906Z","context":"worker","message":"Worker Exception","details":{"type":"exception","message":"Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429","stack":"Error: Page https://radiopaedia.org/courses/editing-radiopaedia-articles returned status code 429\n    at Crawler.loadPage (file:///app/crawler.js:1083:17)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async Crawler.default [as driver] (file:///app/defaultDriver.js:3:3)\n    at async Crawler.crawlPage (file:///app/crawler.js:451:5)\n    at async PageWorker.timedCrawlPage (file:///app/util/worker.js:165:7)\n    at async PageWorker.runLoop (file:///app/util/worker.js:206:9)\n    at async PageWorker.run (file:///app/util/worker.js:187:7)\n    at async Promise.allSettled (index 2)\n    at async Crawler.crawl (file:///app/crawler.js:793:5)\n    at async Crawler.run (file:///app/crawler.js:311:7)","page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}
{"logLevel":"warn","timestamp":"2023-09-05T00:15:04.907Z","context":"pageStatus","message":"Page Load Failed","details":{"loadState":1,"page":"https://radiopaedia.org/courses/editing-radiopaedia-articles","workerid":2}}

The crawler could be enhanced by:

  • detecting HTTP 429 errors and, when one occurs, waiting for a configurable amount of time before continuing
  • retrying the same page after an HTTP 429 error (the page exists; the website is only asking us to slow down)
  • honoring the Retry-After HTTP header when present: some websites return it to indicate how long the user agent should wait before sending another request
  • counting consecutive HTTP 429 errors and ending the crawl early when too many are returned in a row (configurable threshold), so the crawler does not keep overwhelming the website
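A minimal sketch of how these four points could fit together, in TypeScript since the crawler is a Node.js application. `RateLimitPolicy`, `RateLimitOptions` and the `Decision` shape are hypothetical names, not part of the crawler's existing code. The sketch assumes the Retry-After value has already been converted to milliseconds, and that one policy instance is shared by all workers, so a 429 pauses every worker instead of only the one that received it (in the logs above, all six workers kept starting new pages while the site was returning 429s).

```typescript
// Hypothetical sketch: these names are illustrative, not the crawler's API.

type Decision =
  | { action: "proceed" }
  | { action: "retry"; delayMs: number }
  | { action: "abort" };

interface RateLimitOptions {
  baseDelayMs: number;       // wait after the first 429 when no Retry-After is sent
  maxDelayMs: number;        // cap on the exponential backoff (not on Retry-After)
  maxConsecutive429: number; // stop the crawl once this many 429s arrive in a row
}

class RateLimitPolicy {
  private opts: RateLimitOptions;
  private consecutive429 = 0;
  // Shared by all workers: no new request should start before this instant.
  private pausedUntilMs = 0;

  constructor(opts: RateLimitOptions) {
    this.opts = opts;
  }

  // Call after every page load. retryAfterMs is the Retry-After header
  // already converted to milliseconds, or null if the server sent none.
  onResponse(status: number, nowMs: number, retryAfterMs: number | null = null): Decision {
    if (status !== 429) {
      this.consecutive429 = 0; // any non-429 response breaks the streak
      return { action: "proceed" };
    }
    this.consecutive429 += 1;
    if (this.consecutive429 >= this.opts.maxConsecutive429) {
      return { action: "abort" };
    }
    // Exponential backoff: base, 2*base, 4*base, ... capped at maxDelayMs.
    const backoffMs = Math.min(
      this.opts.baseDelayMs * 2 ** (this.consecutive429 - 1),
      this.opts.maxDelayMs
    );
    // Prefer the server's own hint when it provided one.
    const delayMs = retryAfterMs ?? backoffMs;
    // Pause every worker, never shortening a pause already in effect.
    this.pausedUntilMs = Math.max(this.pausedUntilMs, nowMs + delayMs);
    return { action: "retry", delayMs };
  }

  // Milliseconds a worker should sleep before starting its next request.
  waitBeforeNextRequest(nowMs: number): number {
    return Math.max(0, this.pausedUntilMs - nowMs);
  }
}
```

A worker would call `waitBeforeNextRequest(Date.now())` and sleep that long before loading its next page, then report the result through `onResponse()`: on `retry` it requeues the same URL, on `abort` the crawl stops. Retry-After is deliberately not capped by `maxDelayMs`, on the assumption that respecting the server's request matters more than a local upper bound.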
@benoit74
Contributor Author

FYI, I finally have a reproduction of #387, and it is much better addressed as described in this issue:

  • Cloudflare responds with an HTTP 429
  • Cloudflare returns a Retry-After header with a reasonable value of 60 seconds, which progressively decreases (59 s, 57 s, ...) because the crawler does not respect it
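Retry-After can carry either a number of seconds (the form Cloudflare sends here) or an HTTP date, so a crawler honoring it needs to handle both. A hedged sketch of a helper that normalizes either form into milliseconds; `parseRetryAfterMs` is a hypothetical name. Since Cloudflare's value counts down as its window elapses, reading it fresh on each 429 presumably yields the remaining wait rather than a fixed 60 s.

```typescript
// Hypothetical helper, not part of the crawler's existing code.
// Converts a Retry-After header value into a wait in milliseconds.
// Returns null when the header is absent or Date.parse cannot read it,
// so the caller can fall back to its own backoff.
function parseRetryAfterMs(value: string | null | undefined, nowMs: number): number | null {
  if (value == null) return null;
  const trimmed = value.trim();

  // Form 1: delay-seconds, a non-negative integer such as "60".
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return null;

  // A date already in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

In a Puppeteer-driven crawler the raw value would presumably come from the response's `headers()` map, whose keys are lowercase (`"retry-after"`); a `null` result means no usable hint was sent.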

I'm working on a PR, so you can assign this issue to me.

@benoit74 benoit74 linked a pull request Sep 25, 2023 that will close this issue
Labels
None yet
Projects
Status: Triage

1 participant