Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Temporary Memory Leak with Async Iterators in For Await X of Y #30298

Closed
ulfgebhardt opened this issue Nov 6, 2019 · 16 comments
Closed

Temporary Memory Leak with Async Iterators in For Await X of Y #30298

ulfgebhardt opened this issue Nov 6, 2019 · 16 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@ulfgebhardt
Copy link

ulfgebhardt commented Nov 6, 2019

  • Version: v12.13.0
  • Platform: Linux 5.3.7-arch1-2-ARCH x86_64 GNU/Linux and also Linux 4.4.0-109-generic #132-Ubuntu x86_64 GNU/Linux
  • Subsystem:

In Short

When using an AsyncIterator the Memory is rising drastically. It drops once the Iteration is done.

The x in `for await (x of y) is not freed till the Iteration is done. Also every Promise awaited inside the for-loop is not freed.

I came to the conclusion that the Garbage Collector cannot catch the contents of Iteration, since the Promises generated by the AsyncIterator will only fully resolve once the Iteration is done.
I think this might be a Bug.

Repro & Background

When using AsyncIterator i have a substential memory leak when used in for-x-of-y

I need this when scraping a HTML-Page which includes the information about the next HTML-Page to be scraped:

  1. Scrap Data
  2. Evaluate Data
  3. Scrape Next Data

The async Part is needed since axios is used to obtain the HTML

Here is a repro, which allows to see the memory rising von ~4MB to ~25MB at the end of the script. The memory is not freed till the program terminates.

const scraper = async ():Promise<void> => {
    let browser = new BrowserTest();
    let parser = new ParserTest();

    for await (const data of browser){
        console.log(await parser.parse(data))
    }
}

class BrowserTest {
    private i: number = 0;

    public async next(): Promise<IteratorResult<string>> {
        this.i += 1;
        return {
            done: this.i > 1000,
            value: 'peter '.repeat(this.i)
        }
    }

    [Symbol.asyncIterator](): AsyncIterator<string> {
        return this;
    }
}

class ParserTest {
    public async parse(data: string): Promise<string[]> {
        return data.split(' ');
    }
}

scraper()

It looks like that the data of the for-await-x-of-y is dangling in memory. The callstack gets huge aswell.

In the repro the Problem could still be handled. But for my actual code a whole HTML-Page stays in memory which is ~250kb each call.

In this screenshot you can see the heap memory on the first iteration compared to the heap memory after the last iteration

Cannot post inline Screenshots yet

The expected workflow would be the following:

  • Obtain Data
  • Process Data
  • Extract Info for the next "Obtain Data"
  • Free all Memory from the last "Obtain Data"
  • Use extracted information to restart the loop with new Data obtained.

I am unsure an AsyncIterator is the right choice here to archive what is needed.

Workaround Repro

As workaround to free the contents of the Parser we encapsulate the Result in a lightweight Container. We then free the contents, so only the Container itself remains in Memory.
The data Object cannot be freed even if you use the same technic to encapsulate it - so it seems to be the case when debugging at least.

const scraper = async ():Promise<void> => {
    let browser = new BrowserTest();
    
    for await (const data of browser){
        let parser = new ParserTest();
        let result = await parser.parse(data);
        console.log(result);
        
        /**
         * This avoids memory leaks, due to a garbage collector bug
         * of async iterators in js
         */
        result.free();
    }
}

class BrowserTest {
    private i: number = 0;
    private value: string = "";

    public async next(): Promise<IteratorResult<string>> {
        this.i += 1;
        this.value = 'peter '.repeat(this.i);
        return {
            done: this.i > 1000,
            value: this.value
        }
    }

    public [Symbol.asyncIterator](): AsyncIterator<string> {
        return this;
    }
}

/**
 * Result class for wrapping the result of the parser.
 */
class Result {
    private result: string[] = [];

    constructor(result: string[]){
        this.setResult(result);
    }

    public setResult(result: string[]) {
        this.result = result;
    }

    public getResult(): string[] {
        return this.result;
    }

    public free(): void {
        delete this.result;
    }
}

class ParserTest {
    public async parse(data: string): Promise<Result>{
        let result = data.split(' ');
        return new Result(result);
    }
}

scraper())

Workaround in actual context

What is not shown in the Repro-Solution is that we also try to free the Result of the Iteration itself. This seems not to have any effect tho(?).

public static async scrape<D,M>(scraper: IScraper<D,M>, callback: (data: DataPackage<Object,Object> | null) => Promise<void>) {
        let browser = scraper.getBrowser();
        let parser = scraper.getParser();

        for await (const parserFragment of browser) {
            const fragment = await parserFragment;
            const json = await parser.parse(fragment);
            await callback(json);
            json.free();
            fragment.free();
        }
    }

See: https://github.com/demokratie-live/scapacra/blob/master/src/Scraper.ts
To test with an actual Application: https://github.com/demokratie-live/scapacra-bt (yarn dev ConferenceWeekDetail)

References

Conclusion

We are not sure if the whole issue is a bug or a feature. Any clarifying comment would be appreciated.

@ulfgebhardt ulfgebhardt changed the title Temporary Memory Leak with Async Iterators in For Await Temporary Memory Leak with Async Iterators in For Await X of Y Nov 6, 2019
@bnoordhuis
Copy link
Member

Can you post JS test cases rather than (what I infer are) TS snippets? Thanks.

@dnalborczyk
Copy link
Contributor

dnalborczyk commented Nov 8, 2019

@ulfgebhardt

when I stripped out the type syntax, and ran on node v12.13.0 on macOS, if you keep running your code from above for longer iterations the garbage collector will eventually do it's job as far as I can tell.

if not, it could be a memory leak caused by TypeScript and the transpilation target you chose.

@lpinca lpinca added the stream Issues and PRs related to the stream subsystem. label Nov 8, 2019
@ulfgebhardt
Copy link
Author

@dnalborczyk can you provide the code with stripped out Types? Then I will do another test without Typescript

@dnalborczyk
Copy link
Contributor

dnalborczyk commented Dec 21, 2019

@ulfgebhardt only changed some variables to run longer and write out memory usage.

let counter = 0;

const scraper = async () => {
  let browser = new BrowserTest();
  let parser = new ParserTest();

  console.log(process.memoryUsage());

  for await (const data of browser) {
    await parser.parse(data);

    if (++counter % 100000 === 0) {
      console.log("counter", counter);
      console.log(process.memoryUsage());
    }

    // console.log(await parser.parse(data));
  }

  setTimeout(() => console.log(process.memoryUsage()), 10000);
};

class BrowserTest {
  constructor() {
    this.i = 0;
  }

  async next() {
    this.i += 1;
    return {
      done: this.i > 10000000,
      value: "peter ".repeat(1000) // 100000000
    };
  }

  [Symbol.asyncIterator]() {
    return this;
  }
}

class ParserTest {
  parse(data) {
    // console.log(data);
    return data.split(" ");
  }
}

scraper();

@Saiv46
Copy link

Saiv46 commented Mar 17, 2020

Same problem I encountered when I tried to benchmark streams:

(node:10224) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 end listeners added to [Serializer]. Use emitter.setMaxListeners() to increase limit
(node:10224) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 finish listeners added to [Serializer]. Use emitter.setMaxListeners() to increase limit
(node:10224) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 error listeners added to [Serializer]. Use emitter.setMaxListeners() to increase limit
(node:10224) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 close listeners added to [Serializer]. Use emitter.setMaxListeners() to increase limit
(node:10224) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 readable listeners added to [Serializer]. Use emitter.setMaxListeners() to increase limit

Code:

export function benchStream (stream, values) {
  return async () => {
    values.forEach(v => stream.write(v))
    for await (const _ of stream) {
      if (_) continue
    }
  }
}

@Legogris
Copy link

This still seems present in 14.3.0. Severe memory leak when using async iterators.

@TomVance
Copy link

Hey all 👋

Currently hit this same issue I believe and wanted to check-in if there has been any progress or thoughts around work around. We're currently processing a large dataset through an async generator and coming up against memory leaks.

Thanks for all the great work
Tom

@ulfgebhardt
Copy link
Author

ulfgebhardt commented Dec 22, 2020

Hey @TomVance ,
this issue is dormant from my side - tho if you have the problem currently what you can do is to follow the suggestion above:
Remove the typescript and see if it still behaves the same way.

Other things you can do are:

  • Determine whos responsible - nodejs, typescript, environment used or ???
  • Provide an easier repro for this (maybe a jsfiddle) - the example of @Saiv46 is pretty good already tho

Doing this may increase the chances of devs fixing it. If its not a problem of the libararies but rather our implementation you may discover a solution you can share here as well.

My solution was a workaround - make a lightweight container and free its contents - the container keeps dangling in memory till the iteration is finished - but since its small it will not cause too much trouble.

@ronag
Copy link
Member

ronag commented Dec 24, 2020

Can somebody confirm whether this is a problem in v15?

@TomVance
Copy link

TomVance commented Dec 24, 2020

@ulfgebhardt @ronag

I cannot 100% confirm if this was now the issue I was seeing. I have since done 2 things:

  1. Bumped the version of TypeORM I was using. After running some checks via node-clinic (https://github.com/clinicjs/node-clinic). It highlighted some potential issues there.

  2. Bumped to NodeV15, unfortunately, I do not have a build in-between so cannot verify which of these two changes had the largest impact here.

However, my hunch is point 1 from another issue threadsI have dived into.

As always many thanks with the great work here all, and a merry xmas :)

@SeanLMcCullough
Copy link

Came across this same issue today with Mongoose queries using for await from an async iterable query. Turns out that because of this bug, just querying the entire set into memory is more efficient than having the memory leak of the for await loop.

@targos
Copy link
Member

targos commented Mar 4, 2022

@nodejs/streams

@ronag
Copy link
Member

ronag commented Mar 4, 2022

@SeanLMcCullough What version of node are you running and do you have a minimal repro?

@benjamingr
Copy link
Member

This is an issue in libraries incorrectly implementing (like in Mongoose's case) async iterability. Please open the issues upstream.

With "raw" streams or async generators I see no memory leak in Node.js master.

@mspensieri
Copy link

This is an issue in libraries incorrectly implementing (like in Mongoose's case) async iterability

@benjamingr do you have examples or documentation of how to implement it "correctly" vs "incorrectly"? What specific patterns should we be avoiding to ensure the async generator is not subject to the memory issues reported in this thread?

@benjamingr
Copy link
Member

@mspensieri basically "prefer async generators over manual implementations which are trickier, be careful around cleanup and always cleanup in finally clauses in those generators, never yield/await in the finally though"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests