When using an AsyncIterator in a for-await-x-of-y loop, I get a substantial memory leak.
I need this when scraping an HTML page that includes the information about the next HTML page to be scraped:
1. Scrape data
2. Evaluate data
3. Scrape next data
The async part is needed because axios is used to obtain the HTML.
Here is a repro that shows memory rising from ~4MB to ~25MB by the end of the script. The memory is not freed until the program terminates.
Data which is chained:
a references b references c
The request of the ScraperBrowser must be completed in order to obtain the next dataset.
To reduce HTTP requests, it would be nice to return the HTML from the Browser and pass it to the Parser.
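The chained case can be sketched with an async generator. This is a minimal illustration under stated assumptions, not the code from the repro: `chainPages`, `fetchChainHtml`, `parseNextUrl`, `scrapeChain` and `collectChainUrls` are hypothetical names, and the in-memory `chainPages` map stands in for the axios request. It only shows the data flow (the fetched HTML is yielded so the Parser does not need a second request) and makes no claim about the memory behaviour reported here.

```javascript
// Hypothetical in-memory pages; in real code the HTML would come from axios.
const chainPages = {
  '/a': '<a rel="next" href="/b">a</a>',
  '/b': '<a rel="next" href="/c">b</a>',
  '/c': '<p>c</p>',
};

// Hypothetical stand-in for the HTTP request made by the ScraperBrowser.
async function fetchChainHtml(url) {
  return chainPages[url];
}

// Hypothetical parser: returns the next URL, or null on the last page.
function parseNextUrl(html) {
  const match = /rel="next" href="([^"]+)"/.exec(html);
  return match ? match[1] : null;
}

// Each request must finish before the next URL is known, so pages are
// fetched strictly one after another.
async function* scrapeChain(startUrl, fetchFn = fetchChainHtml) {
  let url = startUrl;
  while (url !== null) {
    const html = await fetchFn(url);
    yield { url, html }; // pass the HTML on instead of requesting it again
    url = parseNextUrl(html);
  }
}

// Consumes the generator with for-await-of and records the visited URLs.
async function collectChainUrls(startUrl) {
  const visited = [];
  for await (const page of scrapeChain(startUrl)) {
    visited.push(page.url);
  }
  return visited;
}
```

Calling `collectChainUrls('/a')` walks the three hypothetical pages in order.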
Data which has an index:
index references n urls
The request of the ScraperBrowser returns n datasets, which might be paginated. The datasets are not needed to obtain the next ones.
To enable parallelization, it would be nice to return Promises for the HTML requests from the Browser and pass them to the Parser.
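For the indexed case, one possible shape is to start all requests up front and hand the pending Promises to the consumer. The sketch below is an assumption about how that could look, not the issue author's implementation: `indexPages`, `fetchIndexHtml`, `parseIndexUrls`, `requestDatasets` and `scrapeIndex` are hypothetical names, and the in-memory map again stands in for axios.

```javascript
// Hypothetical in-memory pages; in real code the HTML would come from axios.
const indexPages = {
  '/index': '<a href="/item/1"></a><a href="/item/2"></a><a href="/item/3"></a>',
  '/item/1': 'one',
  '/item/2': 'two',
  '/item/3': 'three',
};

// Hypothetical stand-in for the HTTP request made by the ScraperBrowser.
async function fetchIndexHtml(url) {
  return indexPages[url];
}

// Hypothetical index parser: collects every href found on the index page.
function parseIndexUrls(html) {
  return Array.from(html.matchAll(/href="([^"]+)"/g), (m) => m[1]);
}

// Fetches the index, then starts all dataset requests without awaiting them.
// Resolves to an array of pending Promises, one per dataset.
async function requestDatasets(indexUrl, fetchFn = fetchIndexHtml) {
  const indexHtml = await fetchFn(indexUrl);
  return parseIndexUrls(indexHtml).map((url) =>
    fetchFn(url).then((html) => ({ url, html }))
  );
}

// The consumer awaits all Promises together; Promise.all keeps index order.
async function scrapeIndex(indexUrl) {
  const pending = await requestDatasets(indexUrl);
  const pages = await Promise.all(pending);
  return pages.map((page) => page.html);
}
```

Because the datasets do not depend on each other, nothing forces sequential fetching here. With a real HTTP client it would probably be wise to cap the number of concurrent requests, which this sketch does not do.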
🐛 Bugreport
It looks like the `data` of the for-await-x-of-y is dangling in memory. The call stack gets huge as well. In the repro the problem could still be handled, but in my actual code a whole HTML page stays in memory, which is ~250kb per call.
In this screenshot you can see the heap memory on the first iteration compared to the heap memory after the last iteration
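A similar first-versus-last comparison can be approximated in code with Node's `process.memoryUsage()`. The following is a hypothetical sketch, not the original repro: `pagesOfSize` and `measureHeap` are made-up names, and the generator simply yields strings of a chosen size to mimic one HTML page per call.

```javascript
// Hypothetical generator that yields one string per iteration, mimicking
// an HTML page of the given size.
async function* pagesOfSize(count, sizeInBytes) {
  for (let i = 0; i < count; i++) {
    yield 'x'.repeat(sizeInBytes);
  }
}

// Iterates with for-await-of and records heapUsed (in bytes) after the
// first and after the last iteration.
async function measureHeap(count, sizeInBytes) {
  let first = null;
  let last = null;
  let iterations = 0;
  let totalLength = 0;
  for await (const data of pagesOfSize(count, sizeInBytes)) {
    iterations += 1;
    totalLength += data.length;
    const heapUsed = process.memoryUsage().heapUsed;
    if (first === null) first = heapUsed;
    last = heapUsed;
  }
  return { iterations, totalLength, first, last };
}
```

For example, `measureHeap(100, 250 * 1024).then(console.log)` prints both readings. Note that `heapUsed` also counts garbage that has not been collected yet, so a rising number alone does not prove a leak. Running with `node --expose-gc` and calling `global.gc()` before each reading should give more meaningful values.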
The expected workflow would be the following:
I am unsure whether an AsyncIterator is the right choice here to achieve what is needed.
Any help or hint would be appreciated!
See: https://stackoverflow.com/questions/58454833/for-await-x-of-y-using-an-asynciterator-causes-memory-leak