Implement Automatic Memory Management in Playwright for Enhanced Stability in Web Crawling Operations #2308
wojtekKrol started this conversation in General
I am not sure what you are proposing here (if I read things right, you just said you want us to add a magic option to deal with memory leaks), but I can say one thing: we at Apify use Playwright on a daily basis, for scrapes that can take days, sometimes even weeks. So if you have issues with memory, it's probably a memory leak in your code, not something to fix on our end. We'll be happy to help with a complete reproduction of your problem, but don't expect us to add magical toggles like this. There is no "Playwright URL cache" that we just need to clear.
Which package is the feature request for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Feature
Dear Development Team,
I am encountering a significant challenge with our web crawling application that utilizes Playwright. The core issue pertains to excessive memory consumption leading to system instability.
The application's primary function is to perform web crawling in headless mode. It successfully processes a considerable number of websites (approximately 1000-1500) before performance starts to degrade. Past this threshold, the application slows down drastically and eventually freezes. Because it stops responding entirely, recovering has required a complete system reboot.
Detailed Workflow:
A critical observation is the memory usage reported by Playwright. Logs indicate that memory consumption reaches 400%, translating to roughly 800MB. This is a clear indication of resource overutilization, leading to the aforementioned issues.
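A percentage above 100% only makes sense relative to some memory budget. Assuming the log reports usage as a share of a configured memory limit (the exact log line is not quoted here, so this is an interpretation), the two figures imply a budget of roughly 200 MB, meaning the crawler was using about four times the memory it had been allotted:

$$
\text{budget} \approx \frac{800\,\text{MB}}{400\,\%} = \frac{800\,\text{MB}}{4} = 200\,\text{MB}
$$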
The objective of this request is to explore potential strategies to optimize memory usage without restarting the entire application. Any insights or suggestions on memory management techniques, code optimizations, or best practices in this context would be immensely beneficial.
Thank you for your attention to this matter.
Best regards
Motivation
The motivation for requesting this feature is to enhance the stability and efficiency of the application by optimizing memory usage, thereby preventing crashes and ensuring smoother, more reliable web crawling operations.
Ideal solution or implementation, and any additional constraints
Ideal Solution:
The ideal solution involves integrating a feature into PlaywrightCrawler, possibly through its constructor options, that enables automatic memory management. This could be implemented as a flag or an options object that clears caches or frees memory at specified intervals or after a certain number of URLs has been processed.
Implementation:
Automatic Memory Clearing Option: Introduce a parameter like autoMemoryManagement in the PlaywrightCrawler constructor. This parameter could accept an object defining the conditions that trigger a memory cleanup, such as time intervals or the number of URLs processed.
Example:
Alternative solutions or implementations
Dynamic Resource Allocation: The feature should dynamically monitor memory usage and adjust its behavior based on current system resource availability, ensuring optimal performance without manual intervention.
Customizable Settings: Offer customization for different levels of memory cleanup, enabling users to balance between performance and memory usage based on their specific requirements.
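The trigger conditions proposed above (a number of processed URLs, a time interval, and current memory usage) can be expressed as a small, library-agnostic policy check. This is a minimal TypeScript sketch under stated assumptions: CleanupPolicy, BrowserStats, cleanupReasons, and shouldRecycleBrowser are hypothetical names, not Crawlee or Playwright APIs, and the caller is assumed to supply the page count and memory figures. What to do when the check fires (for example, closing the browser and launching a fresh one) is left to the surrounding crawler.

```typescript
// Hypothetical types and function names for illustration only; they are
// not part of Crawlee or Playwright.

/** Conditions that may trigger a cleanup, mirroring the proposal above. */
interface CleanupPolicy {
  maxPagesPerBrowser?: number; // recycle after this many processed URLs
  maxBrowserAgeMs?: number; // recycle after this much wall-clock time
  maxMemoryRatio?: number; // recycle when used / limit reaches this (e.g. 0.9)
}

/** Snapshot of one browser's state, supplied by the caller. */
interface BrowserStats {
  pagesProcessed: number;
  launchedAtMs: number;
  usedMemoryMb: number;
  memoryLimitMb: number;
}

/** Returns every condition that fired; an empty array means "keep the browser". */
function cleanupReasons(
  policy: CleanupPolicy,
  stats: BrowserStats,
  nowMs: number,
): string[] {
  const reasons: string[] = [];
  // Page-count trigger: too many URLs handled by this browser instance.
  if (
    policy.maxPagesPerBrowser !== undefined &&
    stats.pagesProcessed >= policy.maxPagesPerBrowser
  ) {
    reasons.push("page-count");
  }
  // Time trigger: the browser has been alive longer than allowed.
  if (
    policy.maxBrowserAgeMs !== undefined &&
    nowMs - stats.launchedAtMs >= policy.maxBrowserAgeMs
  ) {
    reasons.push("age");
  }
  // Memory trigger: usage relative to the budget; a zero budget is ignored.
  if (
    policy.maxMemoryRatio !== undefined &&
    stats.memoryLimitMb > 0 &&
    stats.usedMemoryMb / stats.memoryLimitMb >= policy.maxMemoryRatio
  ) {
    reasons.push("memory");
  }
  return reasons;
}

/** Convenience wrapper: true if any configured condition fired. */
function shouldRecycleBrowser(
  policy: CleanupPolicy,
  stats: BrowserStats,
  nowMs: number,
): boolean {
  return cleanupReasons(policy, stats, nowMs).length > 0;
}

// Usage: figures mirror the report above, assuming a 200 MB budget.
const exampleReasons = cleanupReasons(
  { maxPagesPerBrowser: 100, maxBrowserAgeMs: 600000, maxMemoryRatio: 0.9 },
  { pagesProcessed: 40, launchedAtMs: 0, usedMemoryMb: 800, memoryLimitMb: 200 },
  60000,
);
console.log(exampleReasons); // only the "memory" condition fires here
```

Returning the list of reasons rather than a bare boolean makes it easy to log why a browser was recycled, and passing nowMs in keeps the check deterministic and testable. Note that Crawlee's browser pool already covers the page-count case on its own: the retireBrowserAfterPageCount option in browserPoolOptions retires a browser after it has opened that many pages (check the BrowserPool documentation for your Crawlee version).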
Other context
No response