Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hanging in close() #9

Closed
dracos opened this issue Nov 5, 2013 · 11 comments
Closed

Hanging in close() #9

dracos opened this issue Nov 5, 2013 · 11 comments

Comments

@dracos
Copy link

dracos commented Nov 5, 2013

Mac OS X 10.9, ffmpeg 2.1 from MacPorts, audioread 1.0.1 from pypi.

If I have the following small file (with the directory of MP3s being the one from this repository - https://github.com/mysociety/sayit ):

import audioread.ffdec
import os

def get_audio_duration(in_filename):
    f = audioread.ffdec.FFmpegAudioFile(in_filename)
    return round(f.duration)

root = 'speeches/fixtures/expected_outputs/mp3'
for mp3 in os.listdir(root):
    mp3 = os.path.join(root, mp3)
    print mp3, get_audio_duration(mp3)

Then sometimes it will run fine, but frequently it will hang on one of the files. As far as I have been able to work out, it is hanging on the wait() inside close(), but I'm not sure what I should do in order to debug it further. Running ffmpeg manually, I can't see any problems. I can Ctrl-C, in which case I get a message: Exception KeyboardInterrupt: KeyboardInterrupt() in <bound method FFmpegAudioFile.__del__ of <audioread.ffdec.FFmpegAudioFile object at 0x1087cff90>> ignored and the file continues running without issue (printing the duration, in this case), but leaves an ffmpeg process lying about that I have to manually kill -9.

Hope that's useful, do let me know if you'd like any further information to help debug or fix this.

@sampsyo
Copy link
Member

sampsyo commented Nov 6, 2013

Thanks for the detailed & complete bug report. Alas, I still can't seem to reproduce this on my machine (also ffmpeg 2.1, but on a Linux box; my Mac has a dead battery today 😲).

I can think of two possibilities for what could be going on here:

  • The ffmpeg process is filling up an OS buffer somewhere that we're failing to drain and is therefore not exiting.
  • ffmpeg is waiting for user input for some reason—perhaps because of an error—and is hanging because we're not providing it.

In either case, maybe seeing ffmpeg's output will help clarify things. Can you try putting a print statement into your copy of audioread in the ReaderThread loop? Specifically, print data after line 57 (here) should do it

@dracos
Copy link
Author

dracos commented Nov 6, 2013

[ Original reply, please see below. https://gist.github.com/dracos/7334094 has two files, "works" and "hangs" - I cut streams of binary data and replaced with [Binary data], and changed to have print mp3 and print get_audio_duration(mp3) on separate lines. They look the same to me, apart from one just stops on the fourth file with no data returned. If I hit Ctrl-C, then it outputs the contents of "hangs-post-ctrl-c" - giving the output fine, but missing any ffmpeg output from the Ctrl-C file, or the last file either? ]

Ooh, if I get rid of the function and have this script:

import audioread.ffdec
import os
import sys

root = 'speeches/fixtures/expected_outputs/mp3'
for mp3 in os.listdir(root):
    mp3 = os.path.join(root, mp3)
    print >>sys.stderr, mp3
    f = audioread.ffdec.FFmpegAudioFile(mp3)
    print >>sys.stderr, round(f.duration)

Then I can't get that to hang ever, and STDOUT prints a lot more data than with the "in a function" version. Hang on, I'll print len(data) rather than data:

$ python duration.py 
speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
Data length= 1024
[repeated many times]
1.0Data length= 405
Data length= 0
Data length= 512
Data length= 0

speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
Data length= 1024
[repeated many times]
Data length= 405
Data length= 0
1.0
 Data length= 512
Data length= 0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
Data length= 1024
[repeated many times]
Data length= Data length= 457512
6.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3

Data length= Data length= 00

Data length= 1024
[repeated many times]
5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
Data length= 509Data length= 
128
Data length= 0
Data length= 0
Data length= 1024
[repeated many times]
5.0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3
Data length= Data length= 128401

Data length= Data length= 0
0
Data length= 1024
[repeated many times]
Data length= 512
Data length= 0
Data length= 405
Data length= 0
5.0
Data length= 1024
[repeated many times]
Data length= 405

Whilst if I switch it back to using the function:

$ python duration.py 
speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
Data length=Data length=  0477

Data length= 0
Data length= 1024
Data length= 1024
Data length= 1024
Data length= 1024
Data length= 404
Data length= 0
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
Data length= 512
Data length= 0
^CException KeyboardInterrupt: KeyboardInterrupt() in <bound method FFmpegAudioFile.__del__ of <audioread.ffdec.FFmpegAudioFile object at 0x10eda2f50>> ignored
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
Data length= 1024
6.0Data length= 508Data length= 

speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3Data length= 0
128
Data length= 0

5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
Data length= Data length= 4730

Data length= 0
5.0
Data length= 477Data length= 0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3

Data length= 0
5.0Data length= 477
Data length=Data length= 0
 0

No idea why it being in the function makes a difference, but hope that's helpful

@sampsyo
Copy link
Member

sampsyo commented Nov 6, 2013

Very interesting. I think the reason there's a difference between the two is that, with the function wrapper, f is deleted as soon as the function returns. In the other case, it's a global variable, so it gets deleted later, when the Python interpreter shuts down.

Based on this output, here's my best guess at what's going on. (Perplexingly, I can't reproduce this on my Mac either, so we're flying somewhat blind.)

  • The FFmpegAudioFile constructor gets called. An ffmpeg process is forked.
  • Immediately, the ffmpeg process fills up its stdout buffer. It somehow does this before writing to its stderr.
  • We attempt to read a line from the stderr stream in the _get_info method. There's nothing to read since the process has not written anything to stderr yet.

So we're deadlocked: the Python program is waiting for ffmpeg to write to stderr and the ffmpeg process is waiting for the Python program to empty its stdout buffer.

To verify whether this is happening, can you drop a few print statements into the FFmpegAudioFile constructor and other methods? Before and after the self._get_info() call are most critical, but inside read_data and close would also be helpful.

@dracos
Copy link
Author

dracos commented Nov 6, 2013

It's a global variable, but (e.g. if I put the print in close) I can see it's being called on each iteration (as there's no reference to what f was pointing at once it's overwritten in the next iteration, the same that happens when it's in a function (at least, theoretically!).

Okay, print statements before/after _get_info and at start of read_data/close. Not using function, no problem:

speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
6.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_iphone.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a150>
5.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_stereo.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
5.0
speeches/fixtures/expected_outputs/mp3/lamb_whole.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10320a210>
5.0
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x1031fbf90>

Using a function (the fourth time I ran the script, first three runs did not hang):

speeches/fixtures/expected_outputs/mp3/lamb_first_three_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
3.0
speeches/fixtures/expected_outputs/mp3/lamb_from_four_seconds_onwards.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_from_three_to_four_seconds.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
1.0
speeches/fixtures/expected_outputs/mp3/lamb_mp3_from_3gp.mp3
I am in the constructor, before _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in the constructor, after _get_info <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>
I am in close <audioread.ffdec.FFmpegAudioFile object at 0x10d1bafd0>

Ctrl-C as before lets it carry on successfully and complete. I note the code reference is always the same when using a function; dunno if that's of any relevance.

@sampsyo
Copy link
Member

sampsyo commented Nov 6, 2013

So much for that theory! Looks like _get_info is returning successfully, only to hang later in close.

Can we determine which line in close is hanging? Presumably, it's the wait call, but it would be good to check that assumption. (FWIW, the reason that this should never happen is the proc.terminate() call above, which should kill the process. We also flush the stdout and stderr buffers, so it can't be blocked on those.)

@dracos
Copy link
Author

dracos commented Nov 6, 2013

Yes, it's hanging at the wait() line.

@sampsyo
Copy link
Member

sampsyo commented Nov 6, 2013

Very mysterious. My next step, if I could reproduce this on my machine, would be to attach to the hung ffmpeg process with gdb and get a backtrace to see exactly where it's stuck. (It's likely in a system call, so hopefully debug symbols won't be necessary.) But I realize that's a lot to ask you to do—if you're comfortable with gdb basics, then please go ahead, but otherwise, maybe we can work harder to reproduce the problem on a box I have access to.

@dracos
Copy link
Author

dracos commented Nov 7, 2013

Interesting diversion into getting gdb to run on Mavericks :) I don't know gdb very well, so all I can give you is:

(gdb) backtrace
#0  0x00007fff9719e746 in ?? ()
#1  0x00007fff96761779 in ?? ()
#2  0x0000000000000000 in ?? ()

Which I'm guessing isn't that helpful. If I type run, it says do I want to start from the beginning and if I say yes it runs fine (and the parent python script then runs to completion).

Ran it a few more (hanging) times, generally get the same, but I have also seen longer backtraces, such as:

(gdb) backtrace
#0  0x00007fff9719e746 in ?? ()
#1  0x00007fff96761779 in ?? ()
#2  0xe200e162994d3e04 in ?? ()
#3  0x0000000101f37440 in ?? ()
#4  0x0000000101f37444 in ?? ()
#5  0x0000000101f3743c in ?? ()
#6  0x0000000101f37428 in ?? ()
#7  0x0000000000004100 in ?? ()
#8  0x0000000000004303 in ?? ()
#9  0x0000000000f50095 in ?? ()
#10 0x0000000000005e27 in ?? ()
#11 0x0000000000000000 in ?? ()

@sampsyo
Copy link
Member

sampsyo commented Nov 7, 2013

Argh; I was hoping we'd at least get to see symbols for the libc calls. No such luck. 😢

One more thing to try: while ffmpeg is hanging, run sudo sc_usage ffmpeg (a command provided on OS X). Ideally, that will show us the system call that ffmpeg is blocking in.

I wish there were a way for me to reproduce this so I could observe what's going on. If we need to resort to it, would it be possible for you to let me ssh into your machine to reproduce the problem? I realize that's a big security problem, so I understand if it's not possible, but I promise not to trash your machine.

@dracos
Copy link
Author

dracos commented Nov 7, 2013

Afraid all it gives is:

TYPE                           NUMBER        CPU_TIME   WAIT_TIME
------------------------------------------------------------------------------
System         Idle                                     00:00.000
System         Busy                                     00:00.000
ffmpeg         Usermode                      00:00.000

And nothing further.
We could share a screen in Skype, perhaps?

@sampsyo
Copy link
Member

sampsyo commented Nov 8, 2013

Ah, too bad! All my tools are failing me.

Yes, a Skype screen-share might be just the thing. (You can watch me fumble around with CLI tools.) I'm on Pacific time (US). Send me an email (on my profile page) and maybe we can find a time that works?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants