Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash in Assertions.checkArgument on Samsung Devices with Android 7.0 #4532

Closed
jschamburger opened this issue Jul 18, 2018 · 44 comments
Closed
Assignees

Comments

@jschamburger
Copy link

Issue description

Hi,
since updating the ExoPlayer from 2.4.4 to 2.8.1, we noticed a significant amount of crashes on Samsung Devices with Android 7.0. Unfortunately, we haven't been able to reproduce the crash ourselves - we are aware of the issue only due to Crashlytics reports.
Therefore, this bug report certainly lacks some information, but maybe you might have an idea (or somebody might have encountered a similar problem) anyway. Even some explanation of possible reasons for this crash might help a lot.

Reproduction steps

From the logs, we know that the crash occurs after stopping a stream (using SimpleExoPlayer.stop()). The Player State becomes idle: onPlayerStateChanged, playWhenReady=true playbackState=1 and the DRM session is released. It seems to happen only for DRM protected streams and it does not happen every time a stream is stopped.

It might be worth noting that we are using a single ExoPlayer instance in several fragments within a ViewPager - users can switch streams by navigating within the ViewPager. However, the crash occurs not only when swiping left/right, but also when the user is stopping the stream in another way (e.g. by sending the app to the background).

Link to test content

I will email the link to dev.exoplayer@gmail.com using a subject in the format "Issue #1234".

Version of ExoPlayer being used

2.8.1 - we will update to 2.8.2 soon, but from the release notes, I don't think that this will fix our problem.

Device(s) and version(s) of Android being used

Android 7.0
only Samsung devices - most frequent devices:

  • Galaxy Tab A 10.1 (42%)
  • SM-J530F (7%)
  • SM-J730F (6%)
  • Galaxy J7 (2016) (4%)

A full bug report captured from the device

As mentioned above, I am unable to create a full bug report. This is the stack trace of the Crashlytics Crashes:
Fatal Exception: java.lang.IllegalArgumentException at com.google.android.exoplayer2.util.Assertions.checkArgument(SourceFile:39) at com.google.android.exoplayer2.upstream.DefaultAllocator.release(SourceFile:121) at com.google.android.exoplayer2.source.SampleQueue.clearAllocationNodes(SourceFile:609) at com.google.android.exoplayer2.source.SampleQueue.reset(SourceFile:111) at com.google.android.exoplayer2.source.SampleQueue.reset(SourceFile:98) at com.google.android.exoplayer2.source.chunk.ChunkSampleStream.onLoaderReleased(SourceFile:324) at com.google.android.exoplayer2.upstream.Loader$ReleaseTask.run(SourceFile:434) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607) at java.lang.Thread.run(Thread.java:762)

I'm sorry for the sparse information here - hopefully you can support me anyway.

@ojw28
Copy link
Contributor

ojw28 commented Jul 18, 2018

That looks strange. It's also strange that we've not seen this reported by anyone else, unless it's somehow also specific to your application. Do you have any custom components that you're injecting into ExoPlayer, or are you using only components provided by the library?

@jschamburger
Copy link
Author

jschamburger commented Jul 19, 2018

Thanks for the quick answer! The fact that you haven't heard this from anyone else certainly points towards an issue in our application.

We are using a custom DrmSessionManager. In our application, rapid zapping between different streams (most of them DRM protected with the same init data) is crucial. Therefore, we have implemented a DrmSessionManager which uses a DefaultDrmSessionManager as delegate. When a DRM session with a specific UUID is aquired for the first time, our DrmSessionManager acquires an additional DRM session (using the respective method in DefaultDrmSessionManager). Additionally, we have a method to release all of these artificial DRM sessions which is called in our Activity's onPause(). Hence, the stream switching is a lot faster since the key request is made only for the start of the first stream because the artificial DRM session is kept open throughout the lifecycle of the app.
I'm aware that this is somewhat experimental, but with most devices (except the ones mentioned above), everything works fine.

Apart from that, we recently also implemented the dummy surface workaround discussed in #3835 (comment). However, we saw the crashes before implementing this workaround, so the issue should not be related to those changes.

@ojw28
Copy link
Contributor

ojw28 commented Jul 19, 2018

Do you have any customizations to the allocator? Are you using the standard DashMediaSource without doing anything special to it, or do you have customizations in the source as well. If so, they would be the most likely culprit.

Could you also check whether the reports you're getting are from official Samsung builds (vs some random ROM that a few users have flashed onto their devices)? Some build fingerprints would probably be helpful. Are the crashes just from these four devices?

@jschamburger
Copy link
Author

jschamburger commented Jul 19, 2018

We have no customizations to the allocator and we are using the standard DashMediaSource:
new DashMediaSource(uri, buildDataSourceFactory(false), new DashManifestParser(), new DefaultDashChunkSource.Factory(mediaDataSourceFactory), DEFAULT_MIN_LOADABLE_RETRY_COUNT, LIVE_PRESENTATION_DELAY_MS, mainHandler, eventLogger);

The crash occurs only on Samsung devices with Android 7.0, but the ones I mentioned above are only the most frequent devices:

Galaxy Tab A 10.1 43%
SM-J530F 8%
SM-J730F 6%
Galaxy J7(2016) 3%
SM-A320FL 1%
SM-J701F 1%
SM-G570F 1%
Everything else seems to be below 1%.

I am quite sure that the crashes are from official builds because

  • according to Fabric, 0% of the devices are rooted (in my experience, most custom ROM users keep root access after flashing the custom ROM)
  • Android 7.0 is the latest official version for the devices I checked
    Of course, this isn't conclusive proof, but I would assume that the devices use official Samsung builds.

Anyway, I will try to get some build fingerprints - this might be helpful for other issues as well. However, it will take some time until the next app version is released to the general public, so I will most likely get build fingerprint information in about 4 weeks.

@ojw28
Copy link
Contributor

ojw28 commented Jul 20, 2018

What do the percentages mean? Percentage of what? Thanks!

@ojw28 ojw28 self-assigned this Jul 22, 2018
@jschamburger
Copy link
Author

Sorry, should have mentioned that.

The percentages refer to the total number of crashes we had. We had ~500 crashes (~300 users) in the last 30 days, of which 100% were on Samsung devices with Android 7.0. Of those 500 crashes, 43% happened on the Galaxy Tab A etc.

@ojw28
Copy link
Contributor

ojw28 commented Jul 23, 2018

Can you derive an (approximate) percentage of playbacks that result in the crash on the affected devices? In particular, do you think this crash occurs for every playback on the affected devices, or sporadically?

You say you've been unable to reproduce the crash yourselves. Is this because you don't have one of the affected devices, or is it the case that you do have one of the affected devices, and are still unable to reproduce? If the latter, have you tried to reproduce with your proper release apk (e.g. install from Play Store, as your end users would), rather than just with a debug apk?

@jschamburger
Copy link
Author

Unfortunately, I cannot derive a percentage of playbacks that result in the crash, but in the logs, I saw that not every playback leads to a crash on the affected devices. For some users, I have seen several "zapping" events (= playback stop + playback start of another stream) before the crash occurred.

I haven't been able to reproduce the crash because I do not have an affected device. I'm trying to get a Galaxy Tab A 10.1, but buying it (or using a "real device testing" service) needs approval, of course...
Judging by the numbers, I'm optimistic that I will be able to reproduce the crash on the device.
Good point regarding the release APK! I usually try the APK from the store first before installing the debug version in order to dig deeper.

@OMArikan
Copy link

I know that exception since 1.5.8 version. In old ExoPlayer i changed ASSERTIONS_ENABLED to false to get rid of crash but now I'm using 2.8.2 and same problem still appeared on Samsung Devices with OS 7.0. ASSERTIONS_ENABLED is public static final in ExoPlayerLibraryInfo and can't modified this time cause I'm using 2.8.2 with gradle implementation.

Device model ratio is changing according to popularity for application. For us 9 different Samsung devices with OS 7.0 facing with this exception. I couldn't reproduce crash yet but in a week around 90 crashes happened even %10 of total play session is using 2.8.2 on our app. I'm worried about that our app with real time play session over 20K how it will be affected.

By the way on my experience it's not related with neither media is protected with DRM or not nor content type is HLS or Dash. On my Crashlytics reports I set some keys about my contents and all possible combinations are available(Dash and HLS with or not DRM).

Here is my rendererFactory, allocator, loadControl and playerInstance;

DefaultRenderersFactory rendererFactory = new DefaultRenderersFactory(this, DefaultRenderersFactory.EXTENSION_RENDERER_MODE_OFF);
            DefaultAllocator allocator = new DefaultAllocator(true, C.DEFAULT_BUFFER_SEGMENT_SIZE);
            DefaultLoadControl loadControl = new DefaultLoadControl.Builder()
                    .setAllocator(allocator)
                    .setBufferDurationsMs(360000, 600000, 2500, 500)
                    .setTargetBufferBytes(C.LENGTH_UNSET)
                    .setPrioritizeTimeOverSizeThresholds(true)
                    .createDefaultLoadControl();

            player = ExoPlayerFactory.newSimpleInstance(rendererFactory, trackSelector, loadControl, drmSessionManager);

If you need any information by me let me know.

@jschamburger
Copy link
Author

@OMArikan Thanks for your input! It's good to know that the problem occurs for others as well.

@ojw28
Copy link
Contributor

ojw28 commented Jul 25, 2018

@OMArikan Please can you provide some sample stack traces, so we can double check it's exactly the same issue.

@OMArikan
Copy link

OMArikan commented Jul 25, 2018

@ojw28 I sent 3 sample stack traces files from crashlytics to dev.exoplayer@gmail.com with subject "Issue #4532". Also lately sent screen shot from Dashboard of Crashlytics to see only Samsung and OS 7.0

@ojw28
Copy link
Contributor

ojw28 commented Jul 26, 2018

Thanks. If either of you manage to reproduce the issue locally, it would be very useful to know if it reproduces only with your release build (but not with your dev build), and whether enabling/disabling ProGuard makes a difference. I suspect this might be a strange VM/ProGuard issue, but that's just a guess.

@OMArikan
Copy link

OMArikan commented Aug 3, 2018

I tried many times to reproduce issue locally with same devices but never catch. I don't believe it'd be related with my app ProGuard if it's not related with 3rd party SDKs such as Retrofit, okHttp, Fabric, Adjust.

@nicklasl
Copy link

I just wanted to pitch in and say that we've seen this issue in our app as well. I've got nothing new that I can think of to add to this, will be sending some stack traces to the gmail address.
The crash occurs on a very small percent of the playbacks and we have not been able to reproduce it ourselves (all though we have not made much effort using the production/proguarded build).
These are the devices that show up in our crash logs:

SM-T580 (NRD90M.T580XXU3BRF1)
SM-T585 (NRD90M.T585XXU3BRF1)
SM-J530F (NRD90M.J530FXXU2ARC3)

@nicklasl
Copy link

nicklasl commented Aug 16, 2018

One thing, and I'm just spitballing here:
From my understanding, we have a similar UX as @jschamburger the user can switch streams using a viewpager, however we only use one fragment with one PlayerView and then we create a new instance of SimpleExoPlayer for each new stream, binding it to that single view.

What I wanted to conclude is that we could be releasing and creating new instances rapidly...

@ojw28
Copy link
Contributor

ojw28 commented Aug 16, 2018

Interesting, although intuitively I wouldn't ever expect the design of the UI to result in a seemingly device specific assertion in this part of the player. As an initial step, what we'll do is add more information to the exception that's thrown (in the exception message). If this is a random VM/Proguard issue, then it's possible that doing this might just make the issue go away. If it doesn't go away, we'll at least have a little bit more information about the failure.

@OMArikan
Copy link

Not like @nicklasl and @jschamburger integrations I don't have any pager UI. SimpleExoPlayer implemented as demo app, releasing and preparing in each actions. Also I don't use PlayerView in my app, to customize everything I use my SurfaceView and custom controller with anchor methodology like 1.5.0 version. Letting us you know that maybe helps you to find out different thing.

Also we already reached 630 users with 756 crashes related with this issue.

@ojw28
Copy link
Contributor

ojw28 commented Aug 16, 2018

What percentage of your total active users is 630? Absolute numbers really don't tell us anything about how prevalent this issue is.

ojw28 added a commit that referenced this issue Aug 16, 2018
Issue: #4532

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208968252
@jschamburger
Copy link
Author

For us, approximately 0.16% of the users had the crash, half of them on the Galaxy Tab A 10.1.

ojw28 added a commit that referenced this issue Aug 17, 2018
Issue: #4532

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=208968252
@ojw28
Copy link
Contributor

ojw28 commented Aug 17, 2018

Thanks. There's a Galaxy Tab A 10.1 in the office, so I'll try and get hold of it next week.

In the meantime, we've cherry-picked the more informative exception message into 2.8.4, which will be released soon. Once someone has captured a failure with this release, please share the message that you see! Note that the change will result in a slightly different stack trace (it'll be the same down to DefaultAllocator.release, but will no longer have the Assertions.checkArgument line).

@jschamburger
Copy link
Author

Hopefully you'll manage to reproduce it with this device - we got one as well (SM-T580) in the meantime, but weren't lucky so far.

@OMArikan
Copy link

@ojw28 since the releases of my app have exoPlayer 2.8.2;
645K Total Users
415K Samsung Users
189K Samsung 7.0 Users
650 Users faced with crash due to Assertions.checkArgument
Ratio of crash for that user specification : 0.34%
SM-G610F %43 of these devices

Also I realized this crash is not appearing on GA/Behavior/Crashes and Exceptions.

As soon as 2.8.4 will be released I'll update my app fast and inform you on here.

@ojw28
Copy link
Contributor

ojw28 commented Aug 31, 2018

I tested on a Galaxy Tab A 10.1, but was unable to reproduce the issue. Would you mind linking to your application(s) on Play Store? I'll try installing them to see if I can reproduce with those.

@OMArikan
Copy link

OMArikan commented Sep 2, 2018

@ojw28 BluTV. However it built with ExoPlayer 2.8.2, newest version 2.8.4 already integrated to application but not distributed yet. You can follow last update date to realize latest version later next week.

@jschamburger
Copy link
Author

@ojw28 waipu.tv Current version still uses ExoPlayer 2.8.2 - the upcoming version 3.7.0 will use ExoPlayer 2.8.4 (will be released in ~ 2 weeks).
Unfortunately, the app is only usable inside the EU - in other countries, the streams will not start. In case you are outside the EU, please let me know, so we can try to create a working setup for testing.

@nicklasl
Copy link

nicklasl commented Sep 4, 2018

We just released Com Hem Play v4.3.5 which has ExoPlayer 2.8.4. No crashes reported yet though.

@ojw28
Copy link
Contributor

ojw28 commented Sep 4, 2018

How frequently were you seeing reports before (i.e. how long would you have to wait before you'd expect to see a crash report, if the problem still remains)?

@nicklasl
Copy link

nicklasl commented Sep 4, 2018

@ojw28 I would suspect we will see a crash before weeks end. I'll post here when I have more information.

@nicklasl
Copy link

nicklasl commented Sep 5, 2018

@ojw28, got the first crash:

Android: 7.0
Android Build: NRD90M.T580XXU3BRF1
Manufacturer: samsung
Model: SM-T580
Thread: Loader:ChunkSampleStream-13144
CrashReporter Key: F05C6376-3715-4373-8165-A4B1A2A44A853D08E23C
Start Date: Tue Sep 04 15:10:00 GMT+02:00 2018
Date: Tue Sep 04 17:21:44 GMT+02:00 2018

java.lang.IllegalArgumentException: Unexpected allocation: 12719672, 0, 0, 65536
	at com.google.android.exoplayer2.upstream.DefaultAllocator.release(DefaultAllocator.java)
	at com.google.android.exoplayer2.source.SampleQueue.clearAllocationNodes(SampleQueue.java)
	at com.google.android.exoplayer2.source.SampleQueue.reset(SampleQueue.java)
	at com.google.android.exoplayer2.source.SampleQueue.reset(SampleQueue.java)
	at com.google.android.exoplayer2.source.chunk.ChunkSampleStream.onLoaderReleased(ChunkSampleStream.java)
	at com.google.android.exoplayer2.upstream.Loader$ReleaseTask.run(Loader.java)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
	at java.lang.Thread.run(Thread.java:762)

@ojw28
Copy link
Contributor

ojw28 commented Sep 5, 2018

Thanks! This shows that the check is failing because allocation.data.length == 0, where-as we expect it to be 65536. I still think this is a platform/VM issue, probably at a fairly low level, because:

  • There are only two places in the library that instantiate Allocation, here and here. This is easy to verify if you have the project checked out in Android Studio.
  • The first case is irrelevant, since you can determine from the IllegalArgumentException logging that an initial allocation hasn't been made.
  • In the second case the length of the allocated array is individualAllocationSize, which must be greater than 0 due to this assertion.
  • Hence for any Allocation, it must be true that data.length > 0. This implies that data.length evaluating to 0 at the point where the IllegalArgumentException is thrown is incorrect.

I'm discussing next steps with the Android VM team, but without a way to reproduce the problem it might be a little tricky to figure out exactly what's going wrong. If it is a VM issue, it's likely caused by something Samsung have done specifically for their devices, because this thread implies it doesn't reproduce anywhere else.

@jschamburger
Copy link
Author

jschamburger commented Sep 24, 2018

Took a while for us to release the version with ExoPlayer 2.8.4, but now I have seen several occurrences of the crash. The values differ quite a bit - the length is sometimes 0, but quite often it is a large negative number:

0, 0, 0, 65536
200894819, 0, 0, 65536
93042059, 0, -1249135221, 65536
8224893, 0, -2139258755, 65536
111515010, 0, -1549490778, 65536
45975423, 0, 1, 65536
8421504, 0, -2139062144, 65536

@ojw28
Copy link
Contributor

ojw28 commented Sep 24, 2018

This is definitely a platform/VM bug then. I don't think array.length can ever be negative if the VM is functioning correctly, right?

@jschamburger
Copy link
Author

@ojw28
Copy link
Contributor

ojw28 commented Sep 25, 2018

For those encountering this issue: Do you have any native code in your app? It's one theory that:

The bad length is presumably in a large object allocation, which could be preceded by native memory, and thus also could conceivably have been scribbled over by someone else?

@jschamburger
Copy link
Author

We do not have any native code in the project.

@OMArikan
Copy link

@ojw28 , I have just replied with new stacktraces to old e-mail with issue number.
Crash number is fewer than old but can't say crash ratio is decreased according to playing/crashes ratio.
And yes I have some native codes but they are just constants to keep secrets really secret :)

I'm totally agreed to you about device specific problem related with VM. However what if we could change ASSERTIONS_ENABLED to false, is it gonna be problem for further steps or not ? When I was using 1.5.8 always changed that to false and never seen any problem with player. Either opening issue to Samsung or letting change value of ASSERTIONS_ENABLED could help us fast.

Is it possible that developers evaluate the risk of changing it to not to final?

@ojw28
Copy link
Contributor

ojw28 commented Sep 25, 2018

When I was using 1.5.8 always changed that to false and never seen any problem with player.

A question I'd ask about this, is how closely were you looking? If playback failed but the app didn't crash, would you have noticed?

@OMArikan
Copy link

Why do you have AnalyticsListener or letting listen PAYER_STATE to developers ?
Changing it to false is on developers responsibility and they could handle all failures otherwise it's their problem not ExoPlayer.

Since 1.5.8 all events on EventLogger I'm sending to GoogleAnalytics and our bigData side to analysis problems and fixed some of them with our head-end side which are related with encoder & transcoder or manifest standards. Also some DRM errors that we faced on our app related with devices already sent to Google and they found problem on their side. This errors neither disappear with ASSERTIONS_ENABLED nor decreased. For us only change that we realize is this crash on Samsung 7.0 devices. Because of this I asked that do you know or realize any other risk to disable it. Don't you have any Test Automation to compare differences of ASSERTIONS_ENABLED true or false?

@ojw28
Copy link
Contributor

ojw28 commented Sep 28, 2018

Sorry, but I don't understand what the post above is trying to say. Please try and rephrase if possible.

@OMArikan
Copy link

OMArikan commented Oct 5, 2018

We can watch all of the error logs via AnalyticsListener and DefaultEventlistener. If a developer disables ASSERTIONS_ENABLED property from the code, he/she has to have the responsibility of the solution.

Since 1.5.8 version, i'm storing all of the EventLogger logs and checking. Since now wtih these logs we just learnt about some DRM and transcoding problems. For DRM problem we informed Google and they accepted the problem of Widevine. In that period ASSERTIONS_ENABLED was always false. After 2.xx upgrade i started to use Exoplayer from maven and i couldn't change ASSERTIONS_ENABLED anymore and the only problem with this the Samsung 7.0 problem.

If you have some information about this problem and if you can share with me, that would be good. Do you have any test result with ASSERTIONS_ENABLED false or true usage.

If you can allow developers to change this property it would be helpful. I would prefer to serve lower quality than the application crash, i can fix and understand issues better with this way.

@ojw28
Copy link
Contributor

ojw28 commented Oct 15, 2018

Regarding ASSERTIONS_ENABLED: We only assert things that should always be true. In most cases disabling an assertion will result in a subsequent failure shortly thereafter, which is often much harder to debug (e.g. because it occurs on a different thread to where an API was originally misused). Hence it's clearly preferable to leave them enabled.

For the particular assertion being discussed here, it's pretty unclear to me whether a subsequent failure would occur. I think we should probably just make sure we're catching it and propagating it as a playback error, rather than crashing the process.

@ojw28
Copy link
Contributor

ojw28 commented Oct 15, 2018

Actually, we're just going to remove the assertion in question. It's not a particularly useful one. What happens to playbacks on the affected devices once the assertion is removed, we don't really know.

ojw28 added a commit that referenced this issue Oct 15, 2018
The assertion was so weak it probably wouldn't detect genuine
misuse of the DefaultAllocator API, so it seems fine just to
remove it.

We don't really know what happens when the player is allowed to
continue on the affected devices, but hopefully it either "just
works" or fails in a more graceful way.

Issue: #4532

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=217113060
@ojw28
Copy link
Contributor

ojw28 commented Oct 15, 2018

Removal of the assertion will be in the next release.

@ojw28 ojw28 closed this as completed Oct 15, 2018
ojw28 added a commit that referenced this issue Oct 20, 2018
The assertion was so weak it probably wouldn't detect genuine
misuse of the DefaultAllocator API, so it seems fine just to
remove it.

We don't really know what happens when the player is allowed to
continue on the affected devices, but hopefully it either "just
works" or fails in a more graceful way.

Issue: #4532

-------------
Created by MOE: https://github.com/google/moe
MOE_MIGRATED_REVID=217113060
@google google locked and limited conversation to collaborators May 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants