Closed figures linger in memory #8519

Closed
cknd opened this issue Apr 20, 2017 · 28 comments

@cknd
Contributor

cknd commented Apr 20, 2017

tl;dr: When I repeatedly create a large figure, save it, and close it, memory usage keeps growing.

Over at this discussion about when MPL should trigger garbage collection, @efiring had some lingering doubts about the chosen solution:

It would certainly be good to have a clearer understanding of when, if ever in practice, it would lead to troublesome increases in memory consumption

I ran into such a case today, when my batch job filled up 60G of RAM overnight.

I repeatedly create a large figure, save it, then close it. If I don't manually call gc.collect() after closing each figure, memory consumption levels off at around 10x what an individual figure needs. In my case, with several fairly complex figures, this was enough to fill a big machine.

Since this is not obvious from the docs, I think there should be an official way to go back to more aggressive GC for cases like this where the trade-off discussed at #3045 fails. Maybe close(force_gc=True)?

Code for reproduction

from memory_profiler import profile # https://pypi.python.org/pypi/memory_profiler
from memory_profiler import memory_usage

import matplotlib.pyplot as plt
import numpy as np
import gc

N = 80

@profile
def do_plots():
    fig = plt.figure()
    plt.plot(np.random.rand(50000))
    plt.savefig('/tmp/bla.png')
    plt.close(fig)

def default():
    for k in range(N):
        print(k)
        do_plots()

def manual_gc():
    for k in range(N):
        print(k)
        do_plots()
        gc.collect()


mem_manual_gc = memory_usage((manual_gc, [], {}))
mem_default = memory_usage((default, [], {}))


plt.plot(mem_manual_gc, label='gc.collect() after close')
plt.plot(mem_default, label='default behaviour')
plt.ylabel('MB')
plt.xlabel('time (in s * 0.1)')  # `memory_usage` logs every 100ms
plt.legend()
plt.title('memory usage')
plt.show()

Matplotlib version

  • Operating System: Ubuntu 16.10
  • Matplotlib Version: 2.0.0
  • Python Version: 3.5.2
@tacaswell
Member

The underlying issue is that there are circular references between the figure and the axes.
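To illustrate why such cycles delay cleanup, here is a minimal sketch using stand-in classes. `ToyFigure` and `ToyAxes` are hypothetical names, not Matplotlib's classes; they only mimic the parent/child back-references described above. The automatic collector is switched off during the demo so the result does not depend on when it happens to run.

```python
import gc
import weakref


class ToyAxes:
    """Hypothetical stand-in for an axes: keeps a reference to its parent."""

    def __init__(self, figure):
        self.figure = figure


class ToyFigure:
    """Hypothetical stand-in for a figure: keeps references to its children."""

    def __init__(self):
        self.axes = [ToyAxes(self)]  # figure -> axes -> figure forms a cycle


def cycle_demo():
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep the cyclic collector from running on its own
    try:
        fig = ToyFigure()
        probe = weakref.ref(fig)  # observe the object without keeping it alive
        del fig  # drop the only external reference
        alive_after_del = probe() is not None
        gc.collect()  # the cyclic collector finds and frees the cycle
        alive_after_collect = probe() is not None
    finally:
        if gc_was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_demo())
```

In this toy model the object survives `del` and is only freed once the cyclic collector runs, which is the same pattern as a closed figure lingering until the next collection.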

We could make some of those weak refs (and patch over it with properties), but that would be a pretty big effort to find all of the references back and forth. However, if we did this we would put a bunch of extra work on the user to make sure stuff did not get erroneously garbage collected.
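A rough sketch of the weak-reference idea and its downside, again with hypothetical stand-in classes (`WeakAxes`, `WeakFigure`) rather than Matplotlib's own. The back-reference is stored as a `weakref.ref` and exposed through a property, so no cycle exists; the cost is that the parent disappears as soon as the user stops holding it.

```python
import weakref


class WeakAxes:
    """Hypothetical axes stand-in whose parent link is a weak reference."""

    def __init__(self, figure):
        self._figure_ref = weakref.ref(figure)

    @property
    def figure(self):
        # Returns None once the parent figure has been freed.
        return self._figure_ref()


class WeakFigure:
    """Hypothetical figure stand-in that strongly owns its axes."""

    def __init__(self):
        self.axes = [WeakAxes(self)]


def weak_parent_demo():
    fig = WeakFigure()
    ax = fig.axes[0]
    has_parent_before = ax.figure is fig
    del fig  # no cycle, so CPython's reference counting frees it at once
    has_parent_after = ax.figure is not None
    return has_parent_before, has_parent_after


print(weak_parent_demo())
```

A user who keeps only `ax` would find `ax.figure` gone, which is the kind of erroneous collection the comment warns about. The immediate freeing shown here relies on CPython's reference counting.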

I am not sure that adding a kwarg to close is any clearer to users than having to do the gc themselves.

@cknd
Contributor Author

cknd commented Apr 20, 2017

Right, that does sound like a tangled ball of wool.

I am not sure that adding a kwarg to close is any clearer to users than having to do the gc themselves.

I appreciate the messiness of a kwarg (or rcparam) like that. On the other hand, it would have saved me an afternoon of poking around with memory_profiler if there had been something that signaled "here be dragons, and here's the one official way to hack around them". Should I write a paragraph for the docs?

@tacaswell
Member

A paragraph for the docs would be appreciated. I am not sure where the best place to put it is though. Maybe in the faq section?

@efiring
Member

efiring commented Apr 21, 2017

Maybe this wouldn't help, but: Suppose that when closing a figure we walked through the hierarchy of children, and for each child artist, set its figure reference to None. Would that break the circles?
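The idea can be tried on a toy model. The sketch below uses a hypothetical `Artist` stand-in and a hypothetical `detach` helper; it is not Matplotlib code, and real artists may hold other references (for example through a canvas or callbacks) that this does not cover.

```python
import gc
import weakref


class Artist:
    """Hypothetical artist stand-in with a back-reference and children."""

    def __init__(self, figure=None):
        self.figure = figure
        self.children = []


def detach(artist):
    """Recursively clear back-references so no parent/child cycle remains."""
    for child in artist.children:
        detach(child)
    artist.figure = None


def close_demo():
    gc_was_enabled = gc.isenabled()
    gc.disable()  # make sure only reference counting can free anything
    try:
        fig = Artist()
        axes = Artist(figure=fig)
        axes.children.append(Artist(figure=fig))  # e.g. a line in the axes
        fig.children.append(axes)
        probe = weakref.ref(fig)
        detach(fig)  # what a hypothetical close() could do
        del fig, axes
        return probe() is None  # True if reference counting alone freed it
    finally:
        if gc_was_enabled:
            gc.enable()


print(close_demo())
```

In this model, clearing the `figure` attribute on every child is enough for the whole tree to be freed without a collector pass.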

@WeatherGod
Member

WeatherGod commented Apr 22, 2017 via email

@anntzer
Contributor

anntzer commented Apr 22, 2017

This proposal (which I agree with) is related to #6793 and #6982.

@jeffheaton

What is the best practice if you are looping and generating a large number of images with matplotlib? Is there a way to just force matplotlib to unload any memory that it has? So far, the only way I can do this is to put each call to matplotlib in a separate process, but I would like to think I could generate multiple plots in a single process (or notebook). I've tried various combinations of .clf(), .close(figure), and Python del statements, but it still leaks and eventually crashes.

@cknd
Contributor Author

cknd commented Apr 14, 2019

Is there a way to just force matplotlib to unload any memory that it has?

From what I found back then, it helps to manually trigger garbage collection after closing each figure, with gc.collect() (see the code snippet above). That should find & remove the circular object graph associated with the closed figure.
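A minimal sketch of that workaround as a reusable function. The name `render_png` is hypothetical, and the Agg backend and in-memory buffer are assumptions made so the example runs without a display or a writable path.

```python
import gc
import io

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend for batch work
import matplotlib.pyplot as plt


def render_png(values):
    """Plot `values`, return PNG bytes, then close the figure and collect."""
    fig, ax = plt.subplots()
    try:
        ax.plot(values)
        buffer = io.BytesIO()
        fig.savefig(buffer, format="png")
        return buffer.getvalue()
    finally:
        plt.close(fig)  # drop pyplot's reference to the figure
        gc.collect()    # free the cyclic object graph right away


png = render_png([1, 2, 3])
print(len(png) > 0, plt.get_fignums())
```

Putting the cleanup in `finally` means the figure is closed and collected even if plotting or saving raises.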

@jeffheaton

jeffheaton commented Apr 14, 2019

I am forcing a garbage collect, but no luck. I still run out of memory and crash. So far the only workarounds I have are:

  • Each figure in its own process (this works the best, but is the most complex)
  • Reuse figures as much as possible: pass them around and update subplots and other items inside already allocated objects. This also works well; there are still leaks, but I can usually keep the ship afloat long enough to finish.
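The second workaround might look roughly like the sketch below. `render_all` is a hypothetical helper, and the Agg backend and in-memory buffers are assumptions to keep it self-contained; a batch job would more likely pass file paths to `fig.savefig`.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend for batch work
import matplotlib.pyplot as plt


def render_all(datasets):
    """Render each dataset to PNG bytes while reusing one figure and axes."""
    fig, ax = plt.subplots()
    images = []
    try:
        for values in datasets:
            ax.clear()  # remove the previous plot's artists, keep the axes
            ax.plot(values)
            buffer = io.BytesIO()
            fig.savefig(buffer, format="png")
            images.append(buffer.getvalue())
    finally:
        plt.close(fig)  # a single close at the end of the batch
    return images


images = render_all([[1, 2, 3], [3, 2, 1]])
print(len(images))
```

Only one figure is ever created, so there is only one figure's worth of objects to release at the end.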

@efiring efiring added this to the v3.2.0 milestone Apr 14, 2019
@WeatherGod
Member

WeatherGod commented Apr 15, 2019 via email

@jklymak
Member

jklymak commented Apr 15, 2019

so closing a figure doesn't remove references to it in the pyplot state manager?

@WeatherGod
Member

WeatherGod commented Apr 15, 2019 via email

@efiring efiring added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Apr 15, 2019
@tacaswell tacaswell removed the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Sep 10, 2019
@tacaswell tacaswell modified the milestones: v3.2.0, v3.3.0 Sep 10, 2019
@adeeb10abbas

adeeb10abbas commented Apr 3, 2020

Turn the interactive mode off before running any of your loops. It did the trick for me.
plt.ioff()

@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 7, 2020
bittremieux added a commit to mwang87/MetabolomicsSpectrumResolver that referenced this issue May 26, 2020
Memory leaks due to Matplotlib circular dependencies: matplotlib/matplotlib#8519
@brennmat

What happens if you do not include a legend in your plots? See issue #19345.

@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 27, 2021
@analkumar2

analkumar2 commented Mar 4, 2021

What worked for me was to use fig.savefig() instead of plt.savefig(). And then close the figure with plt.close().
No need for gc.collect() or anything else.

@adeeb10abbas

@analkumar2 OP says they closed it and it still didn't work

@analkumar2

analkumar2 commented Mar 5, 2021

@adeeb10abbas The issue is still open. With Python 3.6.9, matplotlib 3.3.4, and Ubuntu 20.04, if you run the following code:

import matplotlib.pyplot as plt
import numpy as np

for i in range(100000):
	fig,axs = plt.subplots(1,1,figsize=(19.20,10.80))
	axs.plot([1,2,3])
	plt.savefig(f'temp.png')
	# fig.savefig(f'temp.png')
	plt.close('all')

You will run out of RAM very soon. You have to use fig.savefig() instead of plt.savefig() to avoid the memory leak.
And what's 'OP'?

@adeeb10abbas

@analkumar2 I just tested with python 3.8, matplotlib 3.3.4, Ubuntu 20.04 and was not able to reproduce your issue. There was no memory leak. Can you update your python version and see if it works?
OP means original poster/author of the post.

@jklymak
Member

jklymak commented Mar 8, 2021

I'm going to close this, because I don't think every memory leak issue should be put in the same issue. #8519 (comment) is quite different from the original post, and I think it is confusing to conflate them.

Feel free to reopen if the original post is still leaking. However, I couldn't make that run on my machine, so I'm not sure how relevant it is 4 years later.

@jklymak jklymak closed this as completed Mar 8, 2021
@tacaswell
Member

If you are doing batch work setting

matplotlib.use('agg')

may also help. With some versions of Qt, if you never spin the event loop, we end up with many windows which are "closed" but still exist and are waiting for the Qt main loop to spin so they can finish deleting themselves!
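For a batch script, the backend choice and the `plt.ioff()` suggestion from earlier in the thread could be combined as in this sketch. Whether either setting affects a given leak depends on the backend and version in use; the snippet only shows how to apply and verify them.

```python
import matplotlib

matplotlib.use("Agg")  # select the backend before creating any figures
import matplotlib.pyplot as plt

plt.ioff()  # turn interactive mode off, as suggested earlier in the thread

backend = matplotlib.get_backend()
interactive = plt.isinteractive()
print(backend, interactive)
```

With Agg no GUI windows are created, so there is nothing waiting on an event loop to finish closing.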

@analkumar2

Sorry. Even I cannot reproduce this now. I have not updated my system or any of the packages. I should have saved the memory profiler output when I was having the issue.

bittremieux added a commit to mwang87/MetabolomicsSpectrumResolver that referenced this issue Aug 17, 2021
Memory leaks due to Matplotlib circular dependencies: matplotlib/matplotlib#8519
@vishalmhjn

I am facing a similar issue on Ubuntu 18.04, Python 3.8.12, matplotlib 3.4.2. The program consumes all RAM and swap and then crashes. Any ideas? Thanks in advance. Indicative code is below:

for j in df.id.unique():
       ....
       ....
	for plot_times in ['hour']:#['dow', 'hour', 'month', 'day']:
		fig, ax = plt.subplots(3, 1, figsize=(30,8))
		cmap = get_cmap(len(temp[plot_times].unique()))
		col_list = [cmap(i) for i in list(temp[plot_times])]
		for i, attr in enumerate(['a', 'b', 'c']):
			scatter_x = np.array(epoch_seconds)
			scatter_y = np.array(temp[attr])
			group = np.array(temp[plot_times])
			for g in np.sort(np.unique(group)):
				ix = np.where(group == g)
				if plot_times=='hour':
					ax[i].scatter(scatter_x[ix], scatter_y[ix], s=2, color=cmap(g), label = labels[plot_times][g])
				else:
					ax[i].scatter(scatter_x[ix], scatter_y[ix], s=2, color=cmap(g), label = labels[plot_times][g-1])
		ax[1].legend(mode = "expand", ncol = len(labels[plot_times]), prop={'size': 14})
		ax[0].set_ylabel("F")
		ax[1].set_ylabel("O")
		ax[2].set_ylabel("S")
		plt.xlabel("T")
		plt.tight_layout()
		# plt.show()
		fig.savefig("../images/" + str(j) + ".png", dpi=300)
		fig.clf()
		plt.close(fig)
		plt.close('all')

@jklymak
Member

jklymak commented Jan 15, 2022

Please open a new issue with a self-contained minimal reproducible example.

@adeeb10abbas

@vishalmhjn did you try it with plt.ioff()?

@Prajval-1608

I can reproduce the issue with python 3.7.9 and matplotlib 3.5.1 on Windows 10 version 20H2. Observing the task manager, you can clearly see memory building and after nearly 1-1.5 GB of build up the following error shows "Fail to create pixmap with Tk_GetPixmap in TkImgPhotoInstanceSetSize"

@Prajval-1608

It's interesting. I never faced this issue on my old laptop but was surprised when I encountered it on my new laptop. So I went back and checked the matplotlib version on the old laptop, which was 3.3.4. I then tried to reproduce the above error on my new laptop with Windows 10 20H2, Python 3.7.9, and matplotlib 3.3.4, and surprisingly I wasn't able to. It looks like there is some issue with the latest release of matplotlib, so I suggest using matplotlib version 3.3.4 if anyone is facing this issue. Hoping this helps with other issues as well.

@jklymak
Member

jklymak commented Jan 27, 2022

@Prajval-1608 can you please open a new issue with all the relevant details? Perhaps most pertinent would be to also include what backend you are using....

@jklymak
Member

jklymak commented Jan 27, 2022

I'm actually going to lock this to stop the me-too comments on a five-year-old issue. If you think you have a memory leak with a recent matplotlib version that is reproducible, please fill out a new issue with all the relevant information requested. Thanks!

@matplotlib matplotlib locked as resolved and limited conversation to collaborators Jan 27, 2022