Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeking ExoPlayer Rapidly with OkHttp with http2 connection causes sockettimeoutexception #4078

Closed
natez0r opened this issue Apr 3, 2018 · 51 comments
Assignees

Comments

@natez0r
Copy link
Contributor

natez0r commented Apr 3, 2018

Issue description

When using the OkHttp extension and playing media over http/2, if you seek the player rapidly the OkHttp connection pool eventually will contain a connection that is hung and will never work. This is much easier to get to reproduce in our own application, but it's definitely possible to get to happen in a demo app.

I suspect that a connection isn't being closed properly and the following h/2 connections which are multiplexed over the same socket fail.

Reproduction steps

I have emailed the sample app and the stream to the alias. You need to serve the file over http/2. I used https://www.npmjs.com/package/@http2/ to serve the stream from my machine and used genymotion/emulator.

Setup @http2 to have host.js with the IP of your machine and start it pointing at the directory containing your HLS files.

replace "YOUR_IP" in test_app.nroy.twitter.com.exoplayerplayground.MainActivity#M3U8_URL with the IP of your machine

Run the provided app and play the video. Scrub back and forth using the red scrubber at the top quickly until the video no longer continues playing (it may take 30+ seconds). The sample app includes logging to ensure that you're using http/2 (this issue does not manifest unless using http/2). If you connect stetho, you'll see the player attempting to re-request the .ts segment, receiving the 200 response and headers, but never receiving any bytes.

Link to test content

emailed.

Version of ExoPlayer being used

2.7.0

Device(s) and version(s) of Android being used

geny motion emulator 8.0
nexus 5x 8.0

It takes awhile to reproduce in the sample, it happens almost instantly in our app.

A full bug report captured from the device

emailed

@ojw28
Copy link
Contributor

ojw28 commented Apr 3, 2018

If the issue is unique to the OkHttp extension then do you think it's likely that the issue is in OkHttp rather than ExoPlayer? Would it be possible for you to validate whether we're forgetting to close connections in the ExoPlayer layer? If we're not, that points to an issue in OkHttp.

By the way, we've just updated the OkHttp extension to point to the latest release of OkHttp (3.10.0, whereas previously we were depending on 3.9.0). The change is c861b50. It might be worth seeing if the issue still reproduces when using the newer release.

@natez0r
Copy link
Contributor Author

natez0r commented Apr 3, 2018

We are using 3.10 locally and are still seeing the issue.

It appears to correlate with the underlying Http2 stream getting InterruptedIoException, it may be an okhttp issue with how it handles Interrupts (square/okhttp#1903). I noticed that ExoPlayer is using ExecutorService to do the downloads and calling shutdown on them interrupts the thread doing the OKHttp operations (java.util.concurrent.ThreadPoolExecutor#interruptIdleWorkers(boolean)). I think this is the underlying cause and is a grey area, but feels like an ExoPlayer issue with its use of OkHttp?

Let me know what you decide

@ojw28
Copy link
Contributor

ojw28 commented Apr 9, 2018

It would be good if you could spend some more time root causing the issue. ExoPlayer calls into OkHttp are isolated to a single class, so it should be pretty easy for you to work out whether we're misusing the OkHttp API. If we're not misusing the API, which I suspect is the case, then it should be possible for you to create a simple reproduction case that calls OkHttp directly and does not depend on ExoPlayer. That issue should then be filed on the OkHttp project.

@natez0r
Copy link
Contributor Author

natez0r commented Apr 9, 2018

ExoplayerPlayground.zip

I have attached a VERY rough test project that illustrates the issue. I was able to reproduce 100% by using a genymotion emulator and turning the network connection to GPRS so the timing of the loader.cancelLoading call would be while the OKHTTP client was performing a read from the network.

The second loadable will error every time with a sockettimeoutexception because the initial cancel call left the socket in a bad state. I know this doesn't use the exact ExtractorLoadables or anything, but it shows that if we call cancelload() on a loader class that is reading from an OKHTTP socket, it will cause the situation. Do you think this should be fixed on ExoPlayer's end, or should I file this over in the OKHttp project?

@ojw28
Copy link
Contributor

ojw28 commented Apr 10, 2018

This looks a lot like square/okhttp#3146 to me. Which is apparently fixed in 3.10.0, but given this issue, I'm not so sure. I'd suggest you try and further cut down and clean up your test project, then file an issue on OkHttp.

A few notes:

  • I tried removing the interrupt in Loader, and I don't think your sample app is shutting down the executors, but the issue is still reproduced. So unless you know of any interrupt calls, I don't think thread interruption is a cause.
  • Don't return 0 from onLoadError. The issue still reproduces if you return Loader.DONT_RETRY_FATAL.
  • I tried adding Call.cancel calls in OkHttpDataSource.closeConnectionQuietly, but that didn't help either.
  • The test project does recover if you leave it for long enough! I'm guessing 5 minutes as per the OkHttp issue ref'd above.

@natez0r
Copy link
Contributor Author

natez0r commented Apr 11, 2018

I think this is actually caused by: square/okhttp#3915

@mireille0426
Copy link

mireille0426 commented Sep 11, 2018

Hi @natez0r, I encountered the same issue as what's been described here. I'd like to know if the issue has been resolved for you?

I traced this thread to issue square/okhttp#3915 and square/okhttp#4127, used okhttp-3.11.0 + exoplayer 2.7.3 but still experience the same issue after some frequent seek. Bumping up the readTimeOut threshold does remedy the endless buffering, but it still take a long wait before it come back.

So I'm wondering what's the current status on your end. How long does your playback back after some frequent seeking? Also did you do any specific configuration on your okHttpClient config?

Thanks!

@natez0r
Copy link
Contributor Author

natez0r commented Sep 11, 2018

we have code in place to mitigate the ability to seek rapidly right now (we don't perform a seek until the user has lifted their finger on a scrub).

One of the main underlying issues causing it was fixed in okhttp 3.11, which @ojw28 has told me has been bumped in the current dev-v2 branch. I'd imagine this will be a lot better after that.

@ojw28
Copy link
Contributor

ojw28 commented Oct 31, 2018

@natez0r - Any idea what the status of this issue is at this point?

@natez0r
Copy link
Contributor Author

natez0r commented Oct 31, 2018

@ojw28 We are currently upgrading to exo 2.9 and okhttp 3.11, plan to have something testable by end of week where we can determine if this is fixed/mitigated

@ojw28
Copy link
Contributor

ojw28 commented Nov 23, 2018

@natez0r Any update on this?

@natez0r
Copy link
Contributor Author

natez0r commented Nov 26, 2018

upgraded to 2.9.1, I tested locally with our throttling code removed and i was still getting socket timeouts on rapidly cancelled requests:

11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: Source error.
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: com.google.android.exoplayer2.upstream.HttpDataSource$HttpDataSourceException: java.net.SocketTimeoutException: timeout
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.read(OkHttpDataSource.java:217)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.upstream.DefaultDataSource.read(DefaultDataSource.java:255)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.upstream.cache.CacheDataSource.read(CacheDataSource.java:328)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.upstream.StatsDataSource.read(StatsDataSource.java:91)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFromDataSource(DefaultExtractorInput.java:260)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.extractor.DefaultExtractorInput.advancePeekPosition(DefaultExtractorInput.java:135)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.extractor.DefaultExtractorInput.peekFully(DefaultExtractorInput.java:116)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.source.hls.HlsMediaChunk.peekId3PrivTimestamp(HlsMediaChunk.java:317)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.source.hls.HlsMediaChunk.prepareExtraction(HlsMediaChunk.java:271)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.source.hls.HlsMediaChunk.loadMedia(HlsMediaChunk.java:246)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.source.hls.HlsMediaChunk.load(HlsMediaChunk.java:195)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.upstream.Loader$LoadTask.run(Loader.java:381)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at java.lang.Thread.run(Thread.java:764)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: Caused by: java.net.SocketTimeoutException: timeout
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:616)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:624)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okhttp3.internal.http2.Http2Stream$FramingSource.waitUntilReadable(Http2Stream.java:400)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.java:349)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okhttp3.internal.http2.Http2Codec$StreamFinishingSource.read(Http2Codec.java:213)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at okio.RealBufferedSource$1.read(RealBufferedSource.java:440)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.readInternal(OkHttpDataSource.java:364)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.read(OkHttpDataSource.java:214)
11-26 12:08:33.368  1780  9784 E ExoPlayerImplInternal: 	... 14 more

@noamtamim
Copy link
Contributor

What's the current status of this issue? The underlying issue in okhttp seems to be resolved (yet @natez0r says a variant of it still happens).
When testing with an HLS sample on http/2 it still occurs.
We're considering moving from DefaultHttpDataSource to OkHttp (we use okhttp for other network requests), and this is a blocker.

Is there another okhttp issue that tracks this?

@ojw28
Copy link
Contributor

ojw28 commented Jan 2, 2019

It's unlikely to be an ExoPlayer issue. In which case this is really a question about whether OkHttp still has a problem, and if so, whether there's a corresponding tracking bug for it.

@natez0r
Copy link
Contributor Author

natez0r commented Jan 2, 2019

@noamtamim I have not filed a new OKHttp issue for the underlying cause (I have not been able to spend any time on this recently). It definitely still occurs, but it does appear to be an OKHttp issue vs ExoPlayer. In our codebase, we have a throttling mechanism that works around the issue right now.

@noamtamim
Copy link
Contributor

noamtamim commented Jan 6, 2019

@natez0r is there a way to prove to okhttp that this issue still occurs (and that it's caused by their library and not ExoPlayer)? As a workaround, is it possible to avoid the http/2 "upgrade", so that the connection remains http/1.1 even if the server supports http/2?
It's really easy to reproduce with ExoPlayer 2.9.3, but I can't prove that the issue is in okhttp and not in ExoPlayer's okhttp extension.

@noamtamim
Copy link
Contributor

Just to be clear: right now, it's not safe to use ExoPlayer with OkHttp if there's a chance you'll end up talking to a http/2 server - at least not without throttling seek() calls.

@ojw28
Copy link
Contributor

ojw28 commented Jan 6, 2019

It's really easy to reproduce with ExoPlayer 2.9.3, but I can't prove that the issue is in okhttp and not in ExoPlayer's okhttp extension.

If it's really easy to reproduce then file another issue on OkHttp with really easy reproduction steps. I don't think there's a plausible explanation for it being anything other than an OkHttp issue.

@ojw28
Copy link
Contributor

ojw28 commented Jan 6, 2019

As a workaround, is it possible to avoid the http/2 "upgrade", so that the connection remains http/1.1 even if the server supports http/2?

The internet suggests so:
https://stackoverflow.com/questions/41925501/java-okhttp3-only-use-http-1-1-or-http-2

@noamtamim
Copy link
Contributor

If it's really easy to reproduce then file another issue on OkHttp with really easy reproduction steps. I don't think there's a plausible explanation for it being anything other than an OkHttp issue.

As I wrote, it's easy to reproduce using ExoPlayer. But that doesn't prove the bug is not in how ExoPlayer uses okhttp. For example, they claim that if you're not reading the entire response (the part that was buffered so far) you're doing it wrong.

@ojw28
Copy link
Contributor

ojw28 commented Jan 7, 2019

I think it was concluded in that thread (possibly in the hidden items part) that calling response.body().close() should be sufficient.

As above, if it's really easy to reproduce then the correct thing to do is file an issue on OkHttp with really easy reproduction steps. Even if it is an issue with ExoPlayer's usage, they are the best people to explain what we're doing wrong. Trying to prove"something uses an API correctly isn't really a feasible thing to do. Unless you can find something it's doing incorrectly then you pretty much have to assume it's correct, after you've stared at it for long enough...!

It's really easy to reproduce with ExoPlayer 2.9.3, but I can't prove that the issue is in okhttp and not in ExoPlayer's okhttp extension.

Just to be clear: right now, it's not safe to use ExoPlayer with OkHttp if there's a chance you'll end up talking to a http/2 server - at least not without throttling seek() calls.

Are these statement of facts (i.e. you see this in your own experimentation) or are they questions?

@noamtamim
Copy link
Contributor

@ojw28 All that's required to reproduce the issue is:

  1. Put PlayerView in an Activity layout
  2. Setup SimpleExoPlayer and call playerView.setPlayer(player)
  3. Create a MediaSource hosted on an http/2 server
  4. Use OkHttpDataSourceFactory with the default OkHttpClient
  5. Prepare and play.

When the player loads, let it start playing. Then tap the fast-forward button quickly 5-10 times. Playback gets stuck.

When http/2 is disabled (using the builder and protocols(Collections.singletonList(Protocol.HTTP_1_1)), the player seeks as expected.

This is what I call "really easy to reproduce" -- it fails when using the default parameters and basic setup. I tested with the basic Widevine sample. This is also why I argue it's not safe to use ExoPlayer with okhttp (unless you explicitly disable http/2).

@ojw28
Copy link
Contributor

ojw28 commented Jan 7, 2019

Ok. So turn that into a small demo project, and file an issue on OkHttp. The fact it works fine with HTTP_1_1 is in itself strong evidence that it's an OkHttp issue, because the protocol that's selected is (presumably) supposed to be transparent to the user. Which obviously isn't the case here, because the behavior is different.

@Jogan
Copy link

Jogan commented Jan 7, 2019

I believe this issue is already being tracked in the OkHttp repo

square/okhttp#3146

@ojw28
Copy link
Contributor

ojw28 commented Jan 30, 2019

This is also why I argue it's not safe to use ExoPlayer with okhttp (unless you explicitly disable http/2).

I was just looking at whether we could disable http/2 in the ExoPlayer extension, but it looks like application code has to do this when building the OkHttpClient, which is always injected into the extension. As follow on conclusions:

  • I think it's more accurate to say that it's not safe to use OkHttp (with anything) unless you explicitly disable http/2. There's nothing about this that's unique to ExoPlayer.
  • If http/2 needs disabling by default anywhere until this issue can be properly resolved, I think it probably needs to be done directly in OkHttp.

I agree the issue here is what's being tracked by square/okhttp#3146.

@noamtamim
Copy link
Contributor

I was just looking at whether we could disable http/2 in the ExoPlayer extension, but it looks like application code has to do this when building the OkHttpClient, which is always injected into the extension.

The DataSourceFactory could derive a new OkHttpClient from the one provided (using OkHttpClient.newBuilder()) and disable http/2 there. But I don't think it's a good idea, because at some point OkHttp will (hopefully) get fixed, and then ExoPlayer will remain with an unnecessary restriction.

It might be a good idea to add a warning somewhere, telling developers about this issue -- so that users of the extension will disable http/2 (unless they use a fixed version, etc).

I think it's more accurate to say that it's not safe to use OkHttp (with anything) unless you explicitly disable http/2. There's nothing about this that's unique to ExoPlayer.

That's a valid point, although in most use cases you don't repeatedly and quickly cancel and restart requests. For playback (with seeking), that's very common unless you do what @natez0r did (add some artificial latency in the UI).

@natez0r
Copy link
Contributor Author

natez0r commented Sep 11, 2019

in OKHTTP 4.2 (and backported to 3.14.3), it appears that they may've found the underlying issue that I is causing this:

https://square.github.io/okhttp/changelog/#version-420 (https://square.github.io/okhttp/changelog_3x/#version-3143)

Is there any plan to upgrade to above 3.12 for okhttp extension? I know it deprecates old versions of android, but could there be some configuration we could use to force the newer versions of OKHTTP?

@natez0r
Copy link
Contributor Author

natez0r commented Sep 12, 2019

unfortunately, it's API 21.

@ojw28
Copy link
Contributor

ojw28 commented Sep 13, 2019

It looks like they're still doing 3.12.X releases, and that the fix was back-ported to 3.12.5 (release notes, details of 3.12.X releases. So I think we can just upgrade to 3.12.5 and mark this as fixed. Do you agree?

ojw28 added a commit that referenced this issue Sep 13, 2019
Issue: #4078
PiperOrigin-RevId: 268887744
@natez0r
Copy link
Contributor Author

natez0r commented Sep 13, 2019 via email

ojw28 added a commit that referenced this issue Sep 17, 2019
Issue: #4078
PiperOrigin-RevId: 268887744
@noamtamim
Copy link
Contributor

noamtamim commented Sep 18, 2019

@ojw28 Sadly, this error still happens (with the current tip of dev-v2-r2.10.5 branch, commit 2fca01b). I'll attach a test app soon.
Tested in two different methods:

  1. Latest ExoPlayer release (comes with okhttp 3.12.1) + explicitly specify okhttp 3.12.5 in build.gradle
  2. Use 2fca01b via JitPack. It seems that there's a audio/video sync issue with this build, but I don't think it's related to this http/2 timeout error.

@noamtamim
Copy link
Contributor

noamtamim commented Sep 18, 2019

Unzip and open in Android Studio:
ExoOkHttp-4078.zip

The current setup uses method 2 above, edit build.gradle to change to method 1.

Reproduction steps:

  1. Build, install, run the app in Android Studio
  2. Wait until the video starts playing.
  3. Quickly and repeatedly tap the FF button. Many times, until the seek bar almost reaches the end (but don't let it reach the end).

In logcat, filter on "okhttp". After every seek you'll see java.io.InterruptedIOException - this is expected and non-fatal, because every seek cancels a requests. However, after many of these, you'll eventually see a java.net.SocketTimeoutException. Within 10-15 seconds, ExoPlayer throws an HttpDataSource$HttpDataSourceException, caused by that timeout. Playback doesn't recover after that.

Now, to verify that it's an http/2 issue, uncomment line 53:
.protocols(Collections.singletonList(Protocol.HTTP_1_1))
This will prevent OkHttp from upgrading to http/2.

Repeat the steps above -- no timeouts (you'll see many connected protocol=http/1.1 instead of just a single connected protocol=h2).

@noamtamim
Copy link
Contributor

As for the audio/video sync issue I mentioned before - sometimes (about 1 in 4 attempts), when I start tapping the FF button, the music continues to play even when the video is buffering (due to seeks). When that happens, even when I stop seeking and let it play, I hear the original audio playing from the beginning and the "current" audio. Tapping pause pauses the video and audio, but the "original" audio continues to play until the app is killed.

If this is not a known issue you're currently working on, I'll file a separate bug -- please let me know.

@noamtamim
Copy link
Contributor

An alternative way to observe the http/2 timeout issue -- using the ExoPlayer demo app.
Open DemoApplication.java and modify buildHttpDataSourceFactory() to the following:

  public HttpDataSource.Factory buildHttpDataSourceFactory() {
    final OkHttpClient client = new OkHttpClient.Builder()
        .eventListener(new EventListener() {
          @Override
          public void connectEnd(Call call, InetSocketAddress inetSocketAddress, Proxy proxy, Protocol protocol) {
            Log.d(TAG, "okhttp connected protocol=" + protocol);
          }

          @Override
          public void callFailed(Call call, IOException ioe) {
            Log.d(TAG, "okhttp callFailed " + ioe);
          }
        })
        // Uncomment the "protocols" below to forbid http/2
//                .protocols(Collections.singletonList(Protocol.HTTP_1_1))
        .build();

    return new OkHttpDataSourceFactory(client, null);
  }

From the HLS section, play "Apple 4x3 basic stream". The issue reproduces when repeatedly and quickly tapping the FF button, as explained above.

@ojw28
Copy link
Contributor

ojw28 commented Sep 18, 2019

Thanks for the updates. You should probably post your sample and reproduction steps on the OkHttp issue, since this remains an OkHttp problem (although you may want to fix the bug below first :)). It's not something we're likely to be able to fix on our side.

It seems that there's a audio/video sync issue with this build, ... If this is not a known issue you're currently working on, I'll file a separate bug -- please let me know.

I think it's because your demo app isn't hooking the player into the activity lifecycle properly. It creates and starts a player in onCreate and sends it on its way. I'm not sure under what conditions it happens, but probably the activity is being recreated in the cases where you can reproduce this issue. Since you don't stop the player from the old instance of the activity when it's destroyed, its audio will continue to play until you kill the process. Creating and starting the player in onStart and releasing it in onStop, will likely resolve the problem. If you really think this is an ExoPlayer issue, please let us know (via a different issue) asap, since 2.10.5 is ready to ship. Thanks!

@noamtamim
Copy link
Contributor

@ojw28 thanks, you're right about the bug. It was late at night (past 11pm in Israel :-)), I was too lazy to handle Android methods in a test app, and I missed the obvious. I wanted to make sure you're not missing anything so I jumped the gun on this report.

ojw28 added a commit that referenced this issue Jan 8, 2020
Issue: #4078
PiperOrigin-RevId: 288651166
@ojw28
Copy link
Contributor

ojw28 commented Jan 8, 2020

Fingers crossed this is finally fixed!

@ojw28 ojw28 closed this as completed Jan 8, 2020
@noamtamim
Copy link
Contributor

noamtamim commented Jan 8, 2020 via email

ojw28 added a commit that referenced this issue Jan 17, 2020
Issue: #4078
PiperOrigin-RevId: 288651166
@hans-de-clercq
Copy link

I had a similar issue in my app. When I tapped my skip button quickly 5-10 times, playback got stuck with a SocketTimeoutException.

Updating exoplayer to 2.11.2 did not solve this issue for me. Disabling http/2 with .protocols(listOf(Protocol.HTTP_1_1)) did the trick though.

So I'm not sure this issue is fixed now.
@noamtamim, did you manage to verify this?

@noamtamim
Copy link
Contributor

Sadly, I didn't get the chance to do it. For now we have disabled http/2.

@ojw28
Copy link
Contributor

ojw28 commented Feb 19, 2020

I suggest you (once again) report this to the OkHttp project along with reproduction steps. There's not much we can do about it on the ExoPlayer side, unfortunately. You could alternatively use a network stack that doesn't have this issue (e.g., Cronet).

@noamtamim
Copy link
Contributor

I can confirm that the issue still occurs in the test app attached to #4078 (comment). I've updated it to use ExoPlayer 2.11.3 and a new stream.
Http2Test-Issue4078-Exo2.11.3.zip

It seems to be harder to reproduce now (you have to play with the seek bar more) - but it still happens.

I'll try to create a test app that does not include ExoPlayer.

@noamtamim
Copy link
Contributor

@ojw28 I couldn't reproduce it with OkHttp alone. As I mentioned above, it's much harder to reproduce it now. So the question is, how to recover?

These are the final logcat lines when it happens:

2020-02-19 22:28:33.715 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.io.InterruptedIOException
2020-02-19 22:28:33.715 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.io.InterruptedIOException
2020-02-19 22:28:34.498 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.io.InterruptedIOException
2020-02-19 22:28:34.498 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.io.InterruptedIOException
2020-02-19 22:28:44.852 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:28:44.852 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:28:54.889 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:28:54.890 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:29:05.927 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:29:05.927 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:29:17.986 32766-375/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:29:17.989 32766-376/com.exotest.http2test D/MainActivity: okhttp callFailed java.net.SocketTimeoutException: timeout
2020-02-19 22:29:17.998 32766-361/com.exotest.http2test E/ExoPlayerImplInternal: Source error.
    com.google.android.exoplayer2.upstream.HttpDataSource$HttpDataSourceException: java.net.SocketTimeoutException: timeout
        at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.read(OkHttpDataSource.java:277)
        at com.google.android.exoplayer2.upstream.StatsDataSource.read(StatsDataSource.java:91)
        at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFromDataSource(DefaultExtractorInput.java:287)
        at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFully(DefaultExtractorInput.java:74)
        at com.google.android.exoplayer2.extractor.mp4.FragmentedMp4Extractor.readAtomHeader(FragmentedMp4Extractor.java:336)
        at com.google.android.exoplayer2.extractor.mp4.FragmentedMp4Extractor.read(FragmentedMp4Extractor.java:310)
        at com.google.android.exoplayer2.source.chunk.ContainerMediaChunk.load(ContainerMediaChunk.java:135)
        at com.google.android.exoplayer2.upstream.Loader$LoadTask.run(Loader.java:391)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
        at java.lang.Thread.run(Thread.java:919)
     Caused by: java.net.SocketTimeoutException: timeout
        at okhttp3.internal.http2.Http2Stream$StreamTimeout.newTimeoutException(Http2Stream.java:672)
        at okhttp3.internal.http2.Http2Stream$StreamTimeout.exitAndThrowIfTimedOut(Http2Stream.java:680)
        at okhttp3.internal.http2.Http2Stream$FramingSource.read(Http2Stream.java:398)
        at okhttp3.internal.http2.Http2Codec$StreamFinishingSource.read(Http2Codec.java:205)
        at okio.RealBufferedSource$1.read(RealBufferedSource.java:439)
        at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.readInternal(OkHttpDataSource.java:428)
        at com.google.android.exoplayer2.ext.okhttp.OkHttpDataSource.read(OkHttpDataSource.java:274)
        at com.google.android.exoplayer2.upstream.StatsDataSource.read(StatsDataSource.java:91) 
        at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFromDataSource(DefaultExtractorInput.java:287) 
        at com.google.android.exoplayer2.extractor.DefaultExtractorInput.readFully(DefaultExtractorInput.java:74) 
        at com.google.android.exoplayer2.extractor.mp4.FragmentedMp4Extractor.readAtomHeader(FragmentedMp4Extractor.java:336) 
        at com.google.android.exoplayer2.extractor.mp4.FragmentedMp4Extractor.read(FragmentedMp4Extractor.java:310) 
        at com.google.android.exoplayer2.source.chunk.ContainerMediaChunk.load(ContainerMediaChunk.java:135) 
        at com.google.android.exoplayer2.upstream.Loader$LoadTask.run(Loader.java:391) 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167) 
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641) 
        at java.lang.Thread.run(Thread.java:919) 

The InterruptedIOException exceptions are actually ok, because they are caused by seek/cancel. However, after so many cancellations (this specific run had 140 of them), you get a new error - SocketTimeoutException. When that happens, the connection is doomed - but it takes a lot of time until ExoPlayer reports the error:

  • First timeout - 22:28:44.852
  • ExoPlayer error - 22:29:17.998

It took 33 seconds for ExoPlayer to report this error, too much for a recovery (the user won't wait).

There are two questions here:

  1. Is it possible to recover faster?
  2. Is it really so bad? The user had to work really hard to cause this behavior. Even without any smart recovery, he/she will go back and retry the video.

@ojw28
Copy link
Contributor

ojw28 commented Feb 19, 2020

I think (1) is a question for the OkHttp team, and (2) is a question for the app developer.

Ultimately, we're providing an extension that allows you to use the OkHttp network stack. No one has ever diagnosed an issue with our integration. If the underlying network stack doesn't work properly, it's the responsibility of the maintainers of that network stack to resolve that. We also provide an extension that allows you to use Cronet, which as far as I'm aware does not suffer from this kind of issue.

@noamtamim
Copy link
Contributor

@ojw28 I agree with you on 2.

By 1, I'm asking if you think any recovery (or quick failure, allowing the app to recover faster) code can be added to OkHttpDataSource. Kaltura's backend and CDN support http/2 and it's a shame we cannot utilize it on Android, which accounts for most of the plays.

@ojw28
Copy link
Contributor

ojw28 commented Feb 20, 2020

By 1, I'm asking if you think any recovery (or quick failure, allowing the app to recover faster) code can be added to OkHttpDataSource.

From the stack trace above, it looks like the issue is surfaced to OkHttpDataSource as a regular SocketTimeoutException. So I doubt it's possible to disambiguate this problem from other socket timeouts. I also doubt it's possible to do anything (sensible) that allows for recovery in a time shorter than it takes for the initial SocketTimeoutException to be surfaced. Does that already take too long, or does that happen relatively quickly?

If the initial SocketTimeoutException is surfaced relatively quickly, then it may be possible to implement some different behavior for all SocketTimeoutExceptions, just in case it's an instance of this problem. You could investigate ways of somehow resetting the OkHttpClient from within OkHttpDataSource when a SocketTimeoutException occurs (for experimentation, I think you can just cast callFactory to OkHttpClient). I wonder what ((OkHttpClient) callFactory).connectionPool().evictAll(); would do, for example. More generally, you could have a dig around the Javadoc to see what other methods are available via OkHttpClient, and whether any of them sufficiently resets the state to allow for recovery.

Let us know if you find anything useful!

Kaltura's backend and CDN support http/2 and it's a shame we cannot utilize it on Android, which accounts for most of the plays.

Why is it not an option for you to switch to a network stack that supports http/2 reliably?

@noamtamim
Copy link
Contributor

Does that already take too long, or does that happen relatively quickly?

The initial SocketTimeoutException fires relatively quickly - a few seconds I think. But as you can see below, ExoPlayer only gives up and reports an exception the app can handle after more than 30 sec.

I'll try to experiment with the client.

Why is it not an option for you to switch to a network stack that supports http/2 reliably?

AFAIK my options are (a) HttpURLConnection - does not support http/2 (b) OkHttp and (c) Cronet.

I'm sure that since Cronet is Chrome's native http stack, it more reliable than any other stack; however, last I checked it increases app size significantly. I can't find the size now, but I remember it was more than our entire SDK.
So, Cronet isn't an option, sadly.

@ojw28
Copy link
Contributor

ojw28 commented Feb 20, 2020

last I checked it increases app size significantly

Cronet is bundled inside Google Play Services these days, which means you no longer have to bundle the native stack directly in your apk. You can probably just rip out the 'org.chromium.net:cronet-embedded:76.3809.111' dependency and drop in 'com.google.android.gms:play-services-cronet:17.0.0' instead, which I think is the difference between ~8MB and ~40K.

I'm actually not sure why we haven't done that by default. It looks to me like we should do. Applications that still want to bundle the native stack can still do so, and CronetEngineWrapper will still detect and use it if it's present.

When using the Google Play Services stack, CronetDataSourceFactory will fall back to using DefaultHttpDataSourceFactory (or some other factory of your choosing) in the case that Google Play Services isn't present on the device.

So I'd suggest you take another look.

@noamtamim
Copy link
Contributor

This is great news, I will sure give it a go.

@ojw28
Copy link
Contributor

ojw28 commented May 5, 2020

Note: We will be merging #7324 to default the Cronet extension to using Google Play Services rather than the embedded library.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants