
RUMM-610 Hotfix memory usage when doing intensive logging #185

Conversation

@ncreated (Member) commented:

What and why?

🧪 As reported in #178, the data upload memory leak fixes from #181 do not solve the out-of-memory issue under very intensive logging. This PR adds the optimisation necessary to keep the allocations graph flat.

How?

I added this benchmarking code to the Example app to stress-test logging:

```swift
import UIKit
import DatadogObjc

internal class SendLogsFixtureViewController: UIViewController {
    private let objcLogger = DDLogger.builder().build()
    private let queue1 = DispatchQueue(label: "intensive-logging-1")
    private let queue2 = DispatchQueue(label: "intensive-logging-2")
    private let queue3 = DispatchQueue(label: "intensive-logging-3")

    override func viewDidLoad() {
        super.viewDidLoad()

        // Hammer the logger from three background queues, with staggered starts.
        keepLogging(delay: 0.01, on: queue1)
        keepLogging(delay: 0.02, on: queue2)
        keepLogging(delay: 0.03, on: queue3)
    }

    private func keepLogging(delay: TimeInterval, on queue: DispatchQueue) {
        queue.asyncAfter(deadline: .now() + delay) { [weak self] in
            self?.objcLogger.info(createBigMessage())
            // After the staggered start, each queue logs every 10ms.
            self?.keepLogging(delay: 0.01, on: queue)
        }
    }
}

/// Builds a ~64KB message (1,024 repetitions of a 64-character seed).
private func createBigMessage() -> String {
    let seed = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 "
    return String(repeating: seed, count: 1_024)
}
```

Profiling it, I noticed a very high number of system allocations coming from `try purgeFilesDirectoryIfNeeded()` in `FilesOrchestrator`. This method iterates through the list of files in the data directory, producing a lot of OS-internal `_FileCache` and `_NSFastEnumerationEnumerator` allocations (out of our control).

This benchmark produces the following allocations graph, leading to an out-of-memory crash when the iOS process exceeds the 1.8GB limit:

Screenshot 2020-07-16 at 11 48 37

To mitigate this, I tuned `FilesOrchestrator` to call `try purgeFilesDirectoryIfNeeded()` only when necessary - when it knows a new file will be created (instead of on every request for a writable file). This keeps the memory graph flat for the same benchmark:
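For illustration, the change can be sketched as a minimal, self-contained model. All names here are illustrative assumptions, not the actual SDK source - only the purge step and the 4MB threshold come from this PR discussion:

```swift
import Foundation

/// Minimal model of the optimisation in this PR (names are illustrative).
struct OrchestratorSketch {
    /// Size of the file currently being written to (`nil` = no file yet).
    var currentFileSize: UInt64? = nil
    /// 4MB, per the max file size mentioned in this PR discussion.
    let maxFileSize: UInt64 = 4 * 1_024 * 1_024
    /// Counts how often the (expensive) directory purge runs.
    var purgeCount = 0

    mutating func getWritableFile(writeSize: UInt64) {
        if let size = currentFileSize, size + writeSize <= maxFileSize {
            // Hot path: reuse the current file - no directory enumeration.
            currentFileSize = size + writeSize
        } else {
            // A new file is about to be created: this is the only moment when
            // old files may need deleting to respect the size limit, so the
            // purge runs only here instead of on every write.
            purgeFilesDirectoryIfNeeded()
            currentFileSize = writeSize
        }
    }

    mutating func purgeFilesDirectoryIfNeeded() {
        // In the SDK this enumerates the data directory, which is where the
        // OS-internal `_FileCache` / `_NSFastEnumerationEnumerator`
        // allocations come from.
        purgeCount += 1
    }
}
```

Under intensive logging the hot path reuses the current file and never touches the directory, so the enumeration cost is paid only once per file rollover.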

Screenshot 2020-07-16 at 11 45 57

IMHO, by doing this we give the OS enough room to reuse the `_FileCache` machinery in a memory-performant way.

Review checklist

- [ ] Feature or bugfix MUST have appropriate tests (unit, integration)
- [ ] Make sure each commit and the PR mention the Issue number or JIRA reference

@ncreated ncreated self-assigned this Jul 16, 2020
@ncreated ncreated marked this pull request as ready for review July 16, 2020 10:14
@ncreated ncreated requested a review from a team as a code owner July 16, 2020 10:14
@buranmert (Contributor) left a comment:

when you wait idle for a while after peaking at ~1.5GB memory, it goes down to normal, correct?

and does that mean that now, if we log intensively, the SDK may exceed the max file size limit with one single file?

@ncreated ncreated force-pushed the ncreated/RUMM-610-optimize-memory-usage-when-doing-intensive-logging branch from ddf2620 to 499b27d Compare July 16, 2020 16:18
@ncreated (Member, Author) commented:

@buranmert

> when you wait idle for a while after peaking at ~1.5GB memory, it goes down to normal, correct?

Good question. I did a test - if we stress the `_FileCache` only a bit, it goes back to normal. However, I'd expect a gradual slope; instead it releases memory in one significant drop:

Screenshot 2020-07-16 at 18 30 43

When we stress it too much (~>1GB), it results in random crashes and:

```
2020-07-16 18:32:02.729981+0200 Example[57917:40426048] [ServicesDaemonManager] interruptionHandler is called. -[FontServicesDaemonManager connection]_block_invoke
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
Example(57917,0x16d61f000) malloc: can't allocate region securely
Example(57917,0x16d61f000) malloc: *** set a breakpoint in malloc_error_break to debug
```

With the optimisation applied in this PR, everything remains functional even long after the excessive logging ends.

> and does that mean that now, if we log intensively, the SDK may exceed the max file size limit with one single file?

The SDK starts a new file once the most recently used one exceeds 4MB, so it's fine 👌.
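In other words, the reuse condition itself bounds the size of any single file. A sketch of that check (the function name is an assumption; the 4MB figure comes from this discussion):

```swift
/// A file is reused only while it stays under the 4MB limit, so intensive
/// logging rolls over to a new file rather than growing one file without bound.
func canReuseFile(currentSize: UInt64, upcomingWriteSize: UInt64) -> Bool {
    let maxFileSize: UInt64 = 4 * 1_024 * 1_024  // 4MB
    return currentSize + upcomingWriteSize <= maxFileSize
}
```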

@buranmert (Contributor) left a comment:

good 👍 there is no memory leak, as it goes back to normal after a while: I was afraid there was a deeper problem within the filesystem 😅

@hyling commented Jul 17, 2020:

@ncreated If it's not too much trouble, would you mind describing how you tracked it down to `try purgeFilesDirectoryIfNeeded()` 🙏? I'm trying to get better at debugging these types of issues. Which Instruments views did you use? Did you just use the Allocations instrument? How did you connect the code in the background thread to your own code? Instruments doesn't show who started the thread. 🤔
[attached screenshot]

@ncreated (Member, Author) commented:

@hyling I used the Allocations instrument. When running the benchmark snippet, you can see that `_FileCache` is one of the most frequent and memory-consuming allocations on the list. Drilling into its details gives the stack trace; for most of these allocations it is `try purgeFilesDirectoryIfNeeded()`:

Screenshot 2020-07-17 at 13 28 41

ncreated added 2 commits July 17, 2020 14:44
…en-doing-intensive-logging

```
# Conflicts:
#	DatadogSDK.podspec
#	DatadogSDKObjc.podspec
#	Sources/Datadog/Datadog.swift
```
@hyling commented Jul 17, 2020:

> @hyling I used the Allocations instrument. When running the benchmark snippet, you can see that `_FileCache` is one of the most frequent and memory-consuming allocations on the list. Drilling into its details gives the stack trace; for most of these allocations it is `try purgeFilesDirectoryIfNeeded()`:

@ncreated Ah, thanks for the explanation and the screenshot - they helped me realize that the debugging symbols for Datadog didn't get loaded into Instruments. After I fixed that, the Allocations instrument was much more helpful. 😄

@ncreated ncreated merged commit 969c2d4 into master Jul 17, 2020
@ncreated ncreated deleted the ncreated/RUMM-610-optimize-memory-usage-when-doing-intensive-logging branch July 17, 2020 15:25