-
Notifications
You must be signed in to change notification settings - Fork 754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OutOfMemoryError and NegativeArraySizeException in DiskBasedCache.streamToBytes #12
Comments
More about this bug: Almost all of the OOME failures happen while reading the string-string map of headers during initialize. Below is a typical stack trace, from https://code.google.com/p/android/issues/detail?id=230219
There are also a few rare instances where OOME is thrown from this same location in DiskBasedCache, but on behalf of a A bad entry may also manifest as a NegativeArraySizeException (NASE) when trying to get the entry. These exceptions are relatively rare and occur after the header map has been read. They result when the excessively large number of bytes read is subtracted from the file length and then used as an array size. If this value is negative, NASE is thrown. (Theoretically it's possible that a bad entry could go undetected.) A nasty property of these exceptions is that they persist. Once the problem occurs, it will occur every time the cache is initialized. This is why some additional catch-all exception handling is needed: preferably per entry, serving to remove the errant entry if an exception is thrown when reading or writing it. Otherwise, something else is needed in order to prevent these bad cache entries from crashing the app. One developer recommends the use of an uncaught exception handler that will clean the entire cache in the event of an uncaught OOME or NASE. See https://github.com/anyercastillo/volley-crash Let's go back to the original bug report in 2013: https://code.google.com/p/android/issues/detail?id=61406 The reporter attached a bad cache entry to the bug report. The problem occurs after the "Content-Length: 2574" header is serialized. This string pair should be serialized as:
But in this case it was serialized as:
Here's what happens after the "Content-Length" and "2574" strings are read:
Open questions: Why are the string byte array lengths serialized as longs? |
That report is quite old, and seems like it might have been addressed by https://android-review.googlesource.com/#/c/99201/. Are there more recent reports of this happening? |
The change you refer to was a cache-all on write header. It seems to have no impact on the thousands of OOME crashes that are happening now. The first stack trace I referenced in my previous comment is very new, but is essentially identical to the report from 4 years ago. |
OK, fair - but I think that the example entry above might have been affected by that bug. Hard to tell given how old it is. I can't help but wonder, from reading things like http://stackoverflow.com/questions/24095909/volley-out-of-memory-error-weird-allocation-attempt/42196956#42196956, whether this might be to "misuse" of Volley/DiskBasedCache rather than corruption happening out of the control of the app. There are many apps that use Volley and have never seen this issue before, whereas the ones that do see it see it frequently from what I'm gathering. That does seem to suggest a usage pattern that might be causing the problem. It is very certainly the case that DiskBasedCache is not and does not purport to be thread safe (or process safe). If you're trying to write the same directory location in multiple threads or processes that could absolutely cause bad things to happen. Would be good to get more info... something like an example app w/ source that exhibits this problem. But it's at least true that the cache can be cleared by the user at any time, which is something the app doesn't control, and guarding against at least that is reasonable. I just want to have some confidence that we're hitting the real issue here and not papering over a more fundamental cause than random disk corruption. (100 times per day for an app does not sound like disk corruption, at all). |
I wonder whether a combination of a FileLock and some sort of thread lock on the cache directory could root out that kind of thing in a more obvious manner. Note that Volley only accesses the cache in a single thread (CacheDispatcher), so it doesn't have to worry about thread safety. But if you have multiple request queues, or multiple processes, as a few of these reports seem to suggest were the case, then you'll have problems. |
Given that the stack traces I've seen are all similar, and that no code has been added that I believe really addresses this problem, I assume the old repro case still applies.
Once the problem occurs, it recurs every time the cache is initialized. This tends to amplify the problem in the apps where the problem occurs. I saw the suspicion about misuse and this occurred to me as well, but I can't find a concrete case. I think a usage pattern or data pattern is more likely. Perhaps there is a flaw in the DiskBasedCache implementation that is only hit by some apps? For example, the mapping from key to file name is pretty poor. What if two keys map to the same file? Currently, DiskBasedCache has barely any unit tests, and testing for the proper handling of key collision is not one of them.
Every overridden method is synchronized, so I think DBC does intend to be thread safe, and I don't see any obvious flaws in that aspect of its implementation. I have no evidence that any of the apps encountering this problem are using multiple processes. |
http://stackoverflow.com/a/38106070 "The main application process and a 'SyncAdapter' process following the sync adapter pattern. Only the app with two processes would crash. These two processes would independently initialize the cache." mcxiaoke/android-volley#61 "you can check your reqeustQueue is Singleton, I has solved the problem that is OOM caused by context leak with every context init a requestQueue."
Even that were to happen, I don't see how it would lead to corruption in this manner, as opposed to an entry in the cache that had unexpected (but well-formed) data, given the locks you pointed out.
You're right - sorry; it is indeed the case that multiple NetworkDispatcher threads as well as the CacheDispatcher thread are accessing the cache in parallel. However, the only ensured thread safety is for a single Cache instance. If you're creating more than one instance in a process, or more than one instance across multiple processes, and those both point to the same location on disk, then you'll run into issues like this, and that's exactly what both the cases noted above were doing. |
It is not clear that the two comments referred to above had any relation to the reporter's problem. In the first case, another answer was marked as correct. In my experience with this bug, there are not and never were AFAICT multiple processes. However, I agree that multiple processes is one of the easiest root causes to imagine. Other causes I can imagine:
Once the problem has occurred, by means that remain a mystery after 4 years, the exception that results is surprisingly stable. It manifests either as a string byte array size that is too large, or in rarer cases as a negative remaining data size (now fixed). Focusing in on the location of the exception itself, there is one small change that would help rid the cache of bad entries. These lines in the
Should be:
Every time a header is read or an entry is fetched from cache, |
Multiprocess might be rare but mutlithread is certainly not, and I could definitely see bugs where multiple request queues are being created. Agreed that delete() before close() doesn't seem right and should probably be fixed. The other possibilities feel unlikely to me. I think I'm fine with adding a length check there as a sanity measure. And of course, I'm fine with more unit tests. |
In the process of adding more tests to DiskBasedCacheTest, I observe that the NegativeArraySizeException (NASE) handling in the get() method is not actually removing the entry. Apparently this is because the finally handler is closing the file input stream after an attempt is made to delete the underlying file. If the failure in get() does not actually remove the file, a subsequent initialize() should also fail, crashing the app because there is no handler for NASE in the initialize() method. It appears that neither NASE nor the more prevalent OOME are being handled in either get() or initialize(). |
Fixed by PR #52 |
Hi, sorry to polute the thread however I can't see anywhere else to resolve this. I can't seem to find any documentation on how to actually get the update. I have 1.0.0 and had the below error: DiskBasedCache.java line 323 in DiskBasedCache.streamToBytes() |
Please see #11 (comment) for how to use a snapshot build if you'd like to pick up this fix. We haven't done an official release yet as there are a few more fixes I'd like to get in before we do. |
DiskBasedCache may fail repeatedly if a cached header is corrupt. In the event of failure, the most common error is OutOfMemoryError thrown from streamToBytes, caused by trying to allocate a huge number of bytes resulting from the bogus length calculated due to the corruption.
Once a header has been corrupt, the cache initialization will continue to fail, and the errant header will not be removed.
A less common failure was NegativeArraySizeException, but that was recently patched.
DiscBasedCache needs to be updated to guard against corruption.
Related to:
https://code.google.com/p/android/issues/detail?id=230219
mcxiaoke/android-volley#141 (mirror)
http://stackoverflow.com/a/42196956/901597
The text was updated successfully, but these errors were encountered: