Use gc.collect only when needed to avoid slow downs #2483

Merged
lhoestq merged 2 commits into master from only-gc-collect-when-needed on Jun 11, 2021

Conversation

@lhoestq (Member) commented Jun 11, 2021

In 42320a1 we added a call to gc.collect to resolve some issues on Windows (see #2482).

However, calling gc.collect too often causes significant slowdowns (the CI run time doubled).
So I moved the gc.collect call to the one place where it is actually needed: when post-processing a dataset.
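
To illustrate the idea, here is a minimal sketch of calling gc.collect once, at the single step that needs it, instead of on every cheap operation. The names `load_shard`, `post_process`, and `build` are hypothetical and are not taken from the actual change; the `collect` parameter exists only so the number of calls can be observed.

```python
import gc


def load_shard(i):
    # Hypothetical stand-in for cheap work that runs many times.
    # No gc.collect() here: a full collection per call would dominate the run time.
    return [i] * 10


def post_process(shards, collect=gc.collect):
    # Hypothetical stand-in for the one step that needs lingering objects
    # (and any file handles they hold) to be released first.
    collect()
    return sum(len(shard) for shard in shards)


def build(n_shards, collect=gc.collect):
    # Many cheap steps, followed by a single collection inside post_process.
    shards = [load_shard(i) for i in range(n_shards)]
    return post_process(shards, collect=collect)
```

With this layout the cost of a full collection is paid once per build rather than once per shard.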

@lhoestq changed the title from "Use gc.collect only when needed" to "Use gc.collect only when needed to avoid slow downs" on Jun 11, 2021
@lhoestq merged commit 5ba1497 into master on Jun 11, 2021
@lhoestq deleted the only-gc-collect-when-needed branch on June 11, 2021 at 15:31
@albertvillanova (Member) commented

I still think the origin of the issue has to do with tqdm (and not with Arrow): the issue only arises with tqdm 4.50.0 and later, not with earlier versions.

My guess is that a change introduced in tqdm 4.50.0 prevents the iterable from being properly released.
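
As a rough illustration of how a wrapper could keep an iterable alive until gc.collect runs, here is a sketch of the general mechanism in CPython. It is not tqdm's code and does not show what tqdm actually does: `Source` and `CyclicWrapper` are hypothetical classes, and the self-reference is an assumed way to create a reference cycle.

```python
import gc
import weakref


class Source:
    """Hypothetical iterable standing in for an object that holds a resource."""

    def __iter__(self):
        return iter(range(3))


class CyclicWrapper:
    """Hypothetical wrapper (NOT tqdm): keeps the iterable and refers to itself."""

    def __init__(self, iterable):
        self.iterable = iterable
        self.me = self  # reference cycle: reference counting alone cannot free this

    def __iter__(self):
        return iter(self.iterable)


def check_release():
    was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collections from interfering with the check
    try:
        src = Source()
        ref = weakref.ref(src)
        items = list(CyclicWrapper(src))
        del src
        released_before = ref() is None  # wrapper cycle still holds the iterable
        gc.collect()                     # the cycle collector frees the wrapper
        released_after = ref() is None
        return items, released_before, released_after
    finally:
        if was_enabled:
            gc.enable()
```

In CPython the iterable is still alive after the loop ends and is only released once the cycle collector runs, which matches the kind of symptom an explicit gc.collect call works around.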

@aDrIaNo34500

FR
