
Lots of memory leaks everywhere #1286

Closed
lyklev opened this issue Feb 5, 2018 · 8 comments

Comments

@lyklev

lyklev commented Feb 5, 2018

Version of Singularity:

2.4, master

Expected behavior

Not leak memory.

Actual behavior

Singularity leaks memory everywhere.

Steps to reproduce behavior

Many of the routines called in Singularity allocate memory, often implicitly, and that memory is never freed. This is not a serious problem while the actual singularity executable is short-lived, but it is bad practice and could cause problems if these routines ever become part of a daemon.

Cases include (but are not limited to) the following; a minimal sketch of the pattern follows the list:

  • any call to strdup
  • joinpath (uses malloc)
  • singularity_registry_get
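
As a rough illustration of the leak pattern described above (the `joinpath_example` helper here is hypothetical and only mirrors the behaviour being reported, it is not the actual Singularity code):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: strdup() and malloc()-backed helpers hand ownership
 * of heap memory to the caller, which must free it explicitly. */
static char *joinpath_example(const char *a, const char *b) {
    size_t len = strlen(a) + strlen(b) + 2;   /* '/' separator + NUL */
    char *out = malloc(len);
    if (out == NULL)
        return NULL;
    snprintf(out, len, "%s/%s", a, b);
    return out;                               /* ownership passes to the caller */
}

int main(void) {
    char *copy = strdup("/usr/local/singularity");   /* leaks if never freed */
    char *path = joinpath_example("/var/lib", "singularity");

    printf("%s\n%s\n", copy, path);

    /* In a short-lived CLI the OS reclaims everything at exit, but a
     * long-lived daemon must release each allocation explicitly: */
    free(copy);
    free(path);
    return 0;
}
```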
@bauerm97
Contributor

bauerm97 commented Feb 6, 2018

@lyklev You're absolutely correct that we're a bit lackluster, so to speak, with our C runtime. Right now this isn't such a big issue because, as you pointed out, the runtime is very short-lived and that memory is freed upon termination. If you're up for the task, you're welcome to submit a pull request fixing some of the memory leaks you've found in the code! We'd love the help :)

Specifically about singularity_registry_get: I believe that since the underlying data type of the registry is a hash table, we actually don't allocate memory each time the get function is called. The hash table is just one contiguous chunk of memory, and singularity_registry_get returns a pointer to a location inside that memory where the value of the key is stored. I could be wrong about this, so correct me if I'm wrong. We're using `hsearch_r`.
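
For context, here is an illustrative sketch of how an `hsearch_r`-backed registry can hand back pointers into a pre-allocated table without allocating per lookup (the `registry_set`/`registry_get` names here are hypothetical, not the actual Singularity implementation):

```c
#define _GNU_SOURCE
#include <search.h>
#include <stdio.h>
#include <string.h>

static struct hsearch_data registry;   /* one table, allocated up front */

/* Store a key/value pair; the table itself holds the entry slots. */
static void registry_set(char *key, char *value) {
    ENTRY item = { .key = key, .data = value };
    ENTRY *slot;
    hsearch_r(item, ENTER, &slot, &registry);
}

/* Look up a key; returns a pointer into the existing table entry,
 * so no new memory is allocated per call. */
static char *registry_get(char *key) {
    ENTRY query = { .key = key, .data = NULL };
    ENTRY *found = NULL;
    if (hsearch_r(query, FIND, &found, &registry) == 0)
        return NULL;
    return found->data;
}

int main(void) {
    hcreate_r(32, &registry);
    registry_set("HOME", "/home/user");
    printf("%s\n", registry_get("HOME"));
    hdestroy_r(&registry);
    return 0;
}
```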

@keiranmraine

Looks like #1438 is related to part of this, release 2.4.6, yes?

@dtrudg
Contributor

dtrudg commented Jun 29, 2018

Hi @keiranmraine - as you noted above, there have been patches, e.g. #1438 and #1620 that have been merged, addressing a number of these issues (and I believe some others have been fixed as a side effect of other patches over the past months).

We've heard from some attendees at GCCBOSC this week that you had general concerns about large memory impacts of running tools in Singularity. The leaks noted here are going to be very small - they are in the code that Singularity uses when setting up a container to run. Also, Singularity is, as @bauerm97 mentions above, very short lived and the memory will be recovered on exit.

If you have particular concerns, or an example where a run under Singularity consumes more RAM than native or Docker execution, would you be able to raise an issue for that? We'd love to look at it in depth to see what is going on.

Many thanks!

@keiranmraine

keiranmraine commented Jul 4, 2018

Hi @dctrud - yes, we have seen much larger than anticipated memory footprints when running processes that require multiple CPUs (e.g. 18) and run at ~95% usage for 24 hours with many file operations. We found that upgrading to 2.4.6 had quite a large impact on reducing this overhead.

  • Singularity old: 41784 MB
  • Singularity 2.4.6: 31611 MB

I see that a new release was made within the last week and will look to trial this to see if it improves memory further.

Please be aware that I raised this at GCCBOSC to see if others had seen similar issues, not to bash Singularity. Singularity has been invaluable for our collaborators that can't use docker. Memory differences are possibly more obvious to us as we run the tools both natively and within containers.

Thanks

@dtrudg
Contributor

dtrudg commented Jul 6, 2018

@keiranmraine - thanks for the information. This would be interesting for us to look into properly. Are you able to share the details of what app it is? With those characteristics is it part of the GATK suite? I ask as I used to work in an HPC center where we had significant issues with GATK which were ultimately a result of RHEL6 / CentOS6 transparent huge pages, rather than a Singularity issue.

Are the footprints listed total memory usage on the host, including cached RAM, etc. - or are they the usage of a particular process? Since Singularity has a single container image format, where the container filesystem is mounted from one large file, it isn't uncommon in my experience to see larger indicated RAM usage than with native host OS execution. However, this is just associated with the container image being cached, and the caching doesn't stop applications from using that RAM if necessary.
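
One way to see the distinction being drawn here is to compare a process's own resident set with the host's page cache. A minimal sketch (assuming Linux `/proc`; this is an illustration, not something from this thread):

```c
#include <stdio.h>
#include <string.h>

/* Print one "Key: value" line from a /proc file, e.g. VmRSS from
 * /proc/self/status (memory this process actually holds) vs. Cached from
 * /proc/meminfo (reclaimable page cache, which grows when a large image
 * file is read). */
static void print_field(const char *path, const char *key) {
    char line[256];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, key, strlen(key)) == 0) {
            printf("%s %s", path, line);
            break;
        }
    }
    fclose(fp);
}

int main(void) {
    print_field("/proc/self/status", "VmRSS:");   /* per-process resident set */
    print_field("/proc/meminfo", "Cached:");      /* host-wide page cache */
    return 0;
}
```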

Finally - what version of Singularity is "Singularity old"? It'd be helpful for us to know whether there have been any changes that could have an impact.

@keiranmraine

@dctrud we have seen a larger than expected memory footprint using Singularity with the following Docker images; we do not use GATK.

The problem is data dependent (which is unavoidable); however, it seems to result in much wider variability than native or standard Docker execution.

I've dug back through email to determine the "singularity old" version as 2.4.1.

@keiranmraine

@dctrud we've recently isolated that this was, in most cases, due to LSF being configured incorrectly on our farm. Disabling the following items eliminated the huge differences in memory footprint (as reported by the scheduler):

  • LSF_PROCESS_TRACKING
  • LSF_LINUX_CGROUP_ACCT

@dtrudg
Contributor

dtrudg commented Jul 2, 2019

Closing this as it's been reported to be an LSF issue - not Singularity. Thanks @keiranmraine for the follow up!

@dtrudg dtrudg closed this as completed Jul 2, 2019