Use soxi instead of gstreamer to determine a track's length #67

chrysn · 2016-11-07T14:58:27Z

this patch changes the AudioLengthTask to use the external soxi program (part of sox) instead of running a gst pipeline. after this patch (and dropping the import gst in morituri/common/checksum.py -- the way to do this properly is probably to make all the GstPipelineTask subclasses use self.gst instead of gst, but what to do in __init__ when it's not available yet?), i could do an image verification on a system without a gst module.

the patch is rather simplistic in that it just replaces the gst AudioLengthTask with a sox one; for other tasks it seems that whipper has different implementations, but being new to the whipper tasks system didn't right now find how to do that. more potential for improvement seems to be status reporting (could there be progress on such short tasks?) and parallelization (the new implementation has no explicit locking but seems to be executed in series anyway).

please let me know in which areas the patch would need enhancement, and (for the known shortcomings above if they are actual issues) where in the code to look for examples.

MerlijnWajer · 2016-11-07T15:52:09Z

I think this should go in morituri/program/ - like arc.py and sox.py

chrysn · 2016-11-07T17:10:21Z

thanks @MerlijnWajer, i've moved the code to a soxi.py in the appropriate location. (this might also form a handy base when soxi is used for more; sample rate and metadata extraction are the only other uses that seem practical to me in the whipper context).

RecursiveForest · 2016-11-08T01:53:19Z

morituri/program/soxi.py

+        self._path = path
+        self.logName = os.path.basename(path).encode('utf-8')
+
+        self.command = [SOXI, '-s', self._path]


I've tried to learn as little about morituri.common.task as possible; do we need to assign self._path = path here if we only use it once?

RecursiveForest · 2016-11-08T01:53:47Z

morituri/program/soxi.py

+        self._output = []
+
+    def commandMissing(self):
+        raise common.MissingDependencyException('sox')


Should this say soxi instead of sox? I'm aware that the package is named sox on almost every system (if not every), but the command in particular we care about is soxi.

RecursiveForest · 2016-11-08T02:37:48Z

So the reason tests fail is that soxi balks at filenames that don't end in a file extension that indicates their encoding: it doesn't know what decoder to use for .12.

However, I did notice that sox whatever.format.12 -n stat, which is the same command we use to get the peak value, reports the number of samples read accurately for tracks of any name. The value soxi returns and the value whipper wants is the number of 'sample pairs' (a sample for each channel), so taking this value and dividing by 2 gives the desired value for determining track length.

Extending morituri.program.sox instead of depending on soxi seems to be the way to go here: I am happy to code this up myself, but you are welcome to do so and submit a PR as well. I'll hold off on pushing anything until I hear back from you.

chrysn · 2016-11-08T09:41:05Z

i was originally on the plain sox track until i found a discussion on
it on stack overflow1; using soxi generally seems more robust than
parsing the output of a -n stat. that disadvantage is relativised by
peak_level already depending on stat output. i do have reservations
against doing arithmetics with constants (what if at some point we start
storing mirrored-mono cds as mono flac?), but maybe that's just my pet
peeve.

as for the failing tests, i'd argue that as long as the presented file
is not a proper flac file, a "unknown type" extension is as good as good
as a GstException; the test should rather produce an oddly named but
otherwise correct flac file (here: with a .flac extension).

thus, i see two paths forward:

stick with soxi, provide proper test flacs (by creating a temporary
copy of track.flac under a weird name)
move to sox (this might mean making audio length discovery a function
rather than a task; that might be good or bad depending on the
project's roadmap for tasks and parallelization)

unless this changed your mind about which is the way to go in soxi vs
sox, i'll abandon this approach and let you go ahead with the sox.py
expansion.

RecursiveForest · 2016-11-08T14:41:30Z

I'm happy with using soxi if we change the tests, especially in light of your point about dealing with mono* (I wanted to use Samples read, not Length, from sox, so I don't think the SO discussion applies). In actual practice we only use this AudioLengthTask to verify rips we've made (as far as I can tell), and the expectation is that we've just ripped to .flac.

*If and only if you can prove to me that soxi gives the right value for a mono flac / flac of arbitrary channels.

And as an aside, we already divide by common.SAMPLES_PER_FRAME— not a magic number but definitely a constant.

Sorry for the comment/editing thrashing here, I should not github immediately after waking up.

JoeLametta · 2016-11-28T21:58:04Z

@chrysn
Should I close this pull request?

chrysn · 2016-11-29T10:39:52Z

@JoeLametta ad close: Sorry, i missed the last post and was waiting for feedback on mine.

@RecursiveForest ad proof (output abbreviates):

$ file test-original.flac
test-original.flac: flac audio bitstream data, 16 bit, stereo, 44.1 khz, 7898016 samples
$ sox test-original.flac -n stat           
samples read:          15796032
$ soxi -s test-original.flac    
7898016
$ sox test-original.flac -c1 test-mono.flac
$ file test-mono.flac                      
test-mono.flac: flac audio bitstream data, 16 bit, mono, 44.1 khz, 7898016 samples
$ sox test-mono.flac -n stat               
samples read:           7898016
$ soxi -s test-mono.flac    
7898016

(Replace -c1 with -c5 or an arbitrary number of channels, results are equivalent).

i'll add a patch that uses a .flac name for the file in the unit test, then it should pass.

Contributes-To: whipper-team#29

* Use extensions soxi understands (ie. ".flac") * Actually test for result correctness on files with odd characters in their names by copying the test track * Relax the requirements on the "track absent" task to only raise some TaskError (the previously tested behavior was backend dependent, and the application did not actually depend on that behavior)

chrysn · 2016-11-29T11:59:39Z

The updated branch contains one code change (catch- and testable Task.setException on failed soxi execution to indicate that the task failed instead of .error which just sys.exits), and an overworked AudioLengthTask unit test that does tests based on actual (just funnily named) flac files, and on an absent file for comparison.

RecursiveForest · 2016-12-15T20:01:46Z

Sorry for how long this took, there was a miscommunication about who would merge it.

chrysn force-pushed the sox-audiolengthtask branch from ac4629b to eff9cb9 Compare November 7, 2016 17:06

RecursiveForest reviewed Nov 8, 2016

View reviewed changes

chrysn added 3 commits November 29, 2016 12:56

Use soxi instead of gstreamer to determine a track's length

ba10b57

Contributes-To: whipper-team#29

AudioLengthPath: move unit test in parallel to code

89d6c8f

chrysn force-pushed the sox-audiolengthtask branch from eff9cb9 to bacda81 Compare November 29, 2016 11:57

RecursiveForest merged commit a443f30 into whipper-team:master Dec 15, 2016

Freso mentioned this pull request Mar 6, 2017

Remove notes related to GStreamer flacparse #140

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use soxi instead of gstreamer to determine a track's length #67

Use soxi instead of gstreamer to determine a track's length #67

chrysn commented Nov 7, 2016

MerlijnWajer commented Nov 7, 2016

chrysn commented Nov 7, 2016

RecursiveForest Nov 8, 2016 •

edited

Loading

RecursiveForest Nov 8, 2016 •

edited

Loading

RecursiveForest commented Nov 8, 2016 •

edited

Loading

chrysn commented Nov 8, 2016

RecursiveForest commented Nov 8, 2016 •

edited

Loading

JoeLametta commented Nov 28, 2016 •

edited

Loading

chrysn commented Nov 29, 2016

chrysn commented Nov 29, 2016

RecursiveForest commented Dec 15, 2016

Use soxi instead of gstreamer to determine a track's length #67

Use soxi instead of gstreamer to determine a track's length #67

Conversation

chrysn commented Nov 7, 2016

MerlijnWajer commented Nov 7, 2016

chrysn commented Nov 7, 2016

RecursiveForest Nov 8, 2016 • edited Loading

Choose a reason for hiding this comment

RecursiveForest Nov 8, 2016 • edited Loading

Choose a reason for hiding this comment

RecursiveForest commented Nov 8, 2016 • edited Loading

chrysn commented Nov 8, 2016

RecursiveForest commented Nov 8, 2016 • edited Loading

JoeLametta commented Nov 28, 2016 • edited Loading

chrysn commented Nov 29, 2016

chrysn commented Nov 29, 2016

RecursiveForest commented Dec 15, 2016

RecursiveForest Nov 8, 2016 •

edited

Loading

RecursiveForest Nov 8, 2016 •

edited

Loading

RecursiveForest commented Nov 8, 2016 •

edited

Loading

RecursiveForest commented Nov 8, 2016 •

edited

Loading

JoeLametta commented Nov 28, 2016 •

edited

Loading