this doesn't seem to work on short files, about 3s or under? #87

RossArnott · 2018-11-22T21:56:09Z

If you want to report a bug, or have a specific question, please make sure to include this information:

Your operating system
Your Python version / distribution
Your ffmpeg version
The exact command you were trying to run
Any output you get when running the command with the --debug flag

slhck · 2018-11-22T21:57:24Z

Please provide some details as mentioned in the issue template.

RossArnott · 2018-11-22T22:08:25Z

I'm running this command:

ffmpeg-normalize $i -c:a aac -nt ebu -t -5 -f -o processed_audio/$i.m4v

And I find that short files, less than 3 seconds or so, don't get normalized. This may be an artefact of the algorithm needing more samples to work?

ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G65

Python 2.7.10 (default, Oct 6 2017, 22:29:07)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin

DEBUG LOG:

Ross-MBP:audio_clips rossarnott$ ffmpeg-normalize W5S1-Rest-3-9-Introduction.m4v -c:a aac -nt ebu -t -5 -f --debug -o processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
creation_time : 2018-11-19T19:25:57.000000Z
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
title : W5S1 3-9 Section 3 Rest Audio
Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Output #0, null, to '/dev/null':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
title : W5S1 3-9 Section 3 Rest Audio
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
encoder : Lavf58.12.100
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

DEBUG: Found audio stream at index 0
INFO: Normalizing file W5S1-Rest-3-9-Introduction.m4v (1 of 1)
DEBUG: Running normalization for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Parsing normalization info for W5S1-Rest-3-9-Introduction.m4v
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-nostdin', '-y', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']
DEBUG: Loudnorm first pass command output:
DEBUG: ffmpeg version 4.0.2 Copyright (c) 2000-2018 the FFmpeg developers
built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
creation_time : 2018-11-19T19:25:57.000000Z
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
title : W5S1 3-9 Section 3 Rest Audio
Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
Metadata:
creation_time : 2018-11-19T19:25:57.000000Z
handler_name : Core Media Audio
Stream mapping:
Stream #0:0 (aac) -> loudnorm
loudnorm -> Stream #0:0 (pcm_s16le)
Output #0, null, to '/dev/null':
Metadata:
major_brand : M4V
minor_version : 1
compatible_brands: M4V M4A mp42isom
title : W5S1 3-9 Section 3 Rest Audio
description : This video is about W5S1 3-9 Section 3 Rest Audio
album_artist : Gabriel Kava
keywords : Week 5,w5 s1 audio
artist : Gabriel Kava
encoder : Lavf58.12.100
Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s (default)
Metadata:
encoder : Lavc58.18.100 pcm_s16le
size=N/A time=00:00:02.00 bitrate=N/A speed=38.7x
video:0kB audio:1504kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x7fab4fc095c0]
{
"input_i" : "-22.02",
"input_tp" : "-9.06",
"input_lra" : "0.00",
"input_thresh" : "-32.82",
"output_i" : "-21.55",
"output_tp" : "-8.62",
"output_lra" : "0.00",
"output_thresh" : "-32.36",
"normalization_type" : "linear",
"target_offset" : "16.55"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-22.02", "input_tp": "-9.06", "input_lra": "0.00", "input_thresh": "-32.82", "output_i": "-21.55", "output_tp": "-8.62", "output_lra": "0.00", "output_thresh": "-32.36", "normalization_type": "linear", "target_offset": "16.55"}
INFO: Running second pass for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-y', '-nostdin', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:measured_i=-22.02:measured_lra=0.0:measured_tp=-9.06:measured_thresh=-32.82:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_chapters', '0', '-c:v', 'copy', '-map', '[norm0]', '-c:a', 'aac', '-c:s', 'copy', '/var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v']
DEBUG: Moving temporary file from /var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v to processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: Normalization finished
INFO: Normalized file written to processed_audio/W5S1-Rest-3-9-Introduction2.m4v

slhck · 2018-11-22T22:14:20Z

In the log it says that an output file was written. Is this file the same as the input file, or silent, or...?

It could be that the EBU-type normalization requires more input, but I'd have to check.

RossArnott · 2018-11-22T22:20:26Z

Thanks for the quick response! The output file is the same amplitude as the input file for the example given. The longer files get normalised, which in this case is mostly making them significantly louder. The short files don't get changed. I'm batch processing dozens of files and I end up with a few (the short ones, it seems) at significantly different volume levels.

slhck · 2018-11-22T22:24:03Z

OK, thanks for clarifying this. I'll see if there's a way to tune the parameters to make it work for small files. If not I'll have to at least print a warning.

michaelcrossland · 2018-11-22T22:29:50Z

I can tell you this is an ffmpeg thing. Their docs, and even their mailing list, state that ffmpeg needs at least 30 seconds of sound to be able to do any kind of normalization. There's no way around it other than adding however many seconds of silence to the end of the file to pad it to the 1 minute mark.

slhck · 2018-11-22T22:37:30Z

Can you please point to a reference for your claim that ffmpeg requires at least 30 seconds of audio material to be able to normalize a file?

RossArnott · 2018-11-22T22:37:44Z

Fair enough. Empirically it looks like ffmpeg-normalize does actually work on files of about 4s or longer, but that's not exactly a scientific test and it could be luck or depend on the audio content.
I'll need to figure out a way to try and automatically flag these files that are not being adjusted correctly.

slhck · 2018-11-22T22:40:46Z

You can use the option to print the statistics and inspect the loudness before and after. But that's not a proper solution either. I'll see what I can do.
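
One way to flag files that were not adjusted is to measure the output with a loudnorm pass and compare its integrated loudness against the target. The following is a minimal sketch, not part of ffmpeg-normalize: the function names `parse_loudnorm_stats` and `misses_target` and the 1 LU tolerance are assumptions made for illustration. It only parses the JSON block that loudnorm prints with `print_format=json`, as seen in the debug log above.

```python
import json
import re


def parse_loudnorm_stats(ffmpeg_output: str) -> dict:
    """Return the last flat JSON object found in ffmpeg's loudnorm output."""
    blocks = re.findall(r"\{[^{}]*\}", ffmpeg_output)
    if not blocks:
        raise ValueError("no loudnorm JSON block found")
    raw = json.loads(blocks[-1])
    stats = {}
    for key, value in raw.items():
        # loudnorm prints numbers as strings; keep non-numeric values as-is.
        try:
            stats[key] = float(value)
        except ValueError:
            stats[key] = value
    return stats


def misses_target(measured_i: float, target_i: float, tolerance: float = 1.0) -> bool:
    """True if the measured loudness is more than `tolerance` LU off target."""
    return abs(measured_i - target_i) > tolerance


# Values copied from the debug log in this thread (target was -5 LUFS).
sample = """[Parsed_loudnorm_0 @ 0x7fab4fc095c0]
{
"input_i" : "-22.02",
"input_tp" : "-9.06",
"normalization_type" : "linear"
}"""
stats = parse_loudnorm_stats(sample)
print(misses_target(stats["input_i"], -5.0))
```

In a batch job, the text passed to `parse_loudnorm_stats` would be the stderr of a measurement run on each output file; how that run is invoked is left out here.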

kylophone · 2018-11-26T22:32:58Z

I've known about this for a while but haven't had time to fix it. I should really just fix the ffmpeg filter. Can you leave this open and assign to me?

slhck · 2018-11-26T22:37:39Z

I guess that would be way more efficient than me digging through your code. Thanks!

slhck · 2018-11-26T22:41:13Z

Seems I can't assign you, unless you're a collaborator or the OP. I'll leave it open and assign to me in the meantime.

NiloCK · 2019-04-24T05:41:52Z

@kylophone Hoping for some input before putting shovels in the ground:

Is file-length the only issue here? Do you expect a scripted solution that pads a file with 5 seconds of blank audio, runs loudnorm, and then strips the blank audio to work?

kylophone · 2019-04-24T23:46:43Z

The problem has to do with the definition of Integrated Loudness in BS 1770 / EBU R128. IL by definition needs at least 3 seconds. I haven't had a chance to look but padding with silence should work, I think.

NiloCK · 2019-04-25T05:40:13Z

That's as much of a go-ahead as I need. It'll be at least a couple of weeks before I try this, but I'll report back. Thanks.

NiloCK · 2019-05-03T05:31:25Z

For anyone interested, the steps outlined in my issue:

ffmpeg -i input -af "adelay=10000|10000" enlarged

Pads the audio with ten seconds of silence at the beginning. Necessary because of this bug

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -

Gets loudness data from the file.

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-XXX:measured_LRA=XXX:measured_TP=-XXX:measured_thresh=-XXX:offset=0.58:linear=true:print_format=summary -ar 48k paddedNormalized

Feeds the loudness data back into the normalization alg for better results

ffmpeg -i paddedNormalized -ss 00:00:10.000 -acodec copy normalized

Removes the 10 seconds of silence

Work just fine. This could be added to this library as a work-around for the upstream bug.
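
The four steps above can be sketched as functions that build the ffmpeg argument lists. This is an illustration under stated assumptions, not code from ffmpeg-normalize: the function names and file names are hypothetical, the two `adelay` values assume stereo input, and mapping the first-pass `input_*` and `target_offset` values onto the second-pass `measured_*` and `offset` options is an assumption based on the debug log earlier in this thread.

```python
from typing import Dict, List


def pad_cmd(src: str, dst: str, pad_ms: int = 10000) -> List[str]:
    # adelay takes one delay per channel; two values assume stereo input.
    return ["ffmpeg", "-i", src, "-af", f"adelay={pad_ms}|{pad_ms}", dst]


def measure_cmd(src: str, i: float = -16.0, tp: float = -1.5,
                lra: float = 11.0) -> List[str]:
    # First pass: print loudness statistics as JSON, discard the audio.
    flt = f"loudnorm=I={i}:TP={tp}:LRA={lra}:print_format=json"
    return ["ffmpeg", "-i", src, "-af", flt, "-f", "null", "-"]


def normalize_cmd(src: str, dst: str, stats: Dict[str, str], i: float = -16.0,
                  tp: float = -1.5, lra: float = 11.0) -> List[str]:
    # Second pass: feed the measured values back into loudnorm.
    flt = (
        f"loudnorm=I={i}:TP={tp}:LRA={lra}"
        f":measured_I={stats['input_i']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}"
        ":linear=true:print_format=summary"
    )
    return ["ffmpeg", "-i", src, "-af", flt, "-ar", "48k", dst]


def trim_cmd(src: str, dst: str, pad_ms: int = 10000) -> List[str]:
    # Skip the leading padding; stream copy as in the original step.
    return ["ffmpeg", "-i", src, "-ss", f"{pad_ms / 1000:.3f}",
            "-acodec", "copy", dst]


print(pad_cmd("input.wav", "enlarged.wav"))
```

Each list can be passed to `subprocess.run`; the `stats` dictionary for `normalize_cmd` comes from the JSON printed by the measurement step.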

slhck · 2019-05-03T16:23:57Z

Thanks for sharing this. I have to admit, I'm not in favor of adding functionality to automatically pad and truncate the audio streams. That always bears potential for issues with audio-video sync. I'd rather just provide a warning when the audio stream is < 3 s and link to a FAQ entry.

csestili · 2020-03-13T23:36:24Z

The warning pointing to this issue appears even when using RMS normalization (-nt rms). If I understand the explanation above correctly, there is no minimum length requirement for RMS normalization; is that correct? If so, could this warning please be removed for RMS normalization? Thank you!

slhck · 2020-03-14T12:58:04Z

@csestili True, this should only affect EBU-type normalization. I fixed the message in 69ac934, v1.15.7 available now.
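
The behaviour described here, warning only for EBU-type normalization on streams shorter than 3 seconds, could look roughly like the sketch below. It is not the actual ffmpeg-normalize code: the function name, the constant, and the message wording are assumptions for illustration.

```python
from typing import Optional

# Threshold discussed in this thread for EBU-type (loudnorm) normalization.
MIN_EBU_DURATION_S = 3.0


def short_file_warning(duration_s: float, normalization_type: str) -> Optional[str]:
    """Return a warning for short streams, but only for EBU normalization."""
    if normalization_type == "ebu" and duration_s < MIN_EBU_DURATION_S:
        return (
            "Audio stream has a duration of less than 3 seconds. "
            "Normalization may not work."
        )
    # RMS and other normalization types are not affected by the minimum length.
    return None


print(short_file_warning(2.0, "ebu"))
print(short_file_warning(2.0, "rms"))
```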

5tan · 2020-08-17T14:33:31Z

The method presented by @NiloCK works nicely, except that it can clip the file a little bit.

I am not an ffmpeg expert, but after some experiments I have concluded that the following step

# Removes the 10 seconds of silence
ffmpeg -i paddedNormalized -ss 00:00:10.000 -acodec copy normalized

works only with an accuracy of 2048 samples.

Thus for my 16 kHz sound files I use 16 seconds of padding (16 s = 256 kS, being the LCD of 16k and 2048) to avoid clipping.

dotancohen · 2020-10-06T10:59:54Z

> Thanks for sharing this. I have to admit, I'm not in favor of adding functionality to automatically pad and truncate the audio streams. That always bears potential for issues with audio-video sync. I'd rather just provide a warning when the audio stream is < 3 s and link to a FAQ entry.

That would make this otherwise wonderful tool useless for shorter clips, which, as evidenced by the existence of this bug, people need. A use case for shorter audio clips is normalizing single spoken words when learning a language, as seen here. In this use case, audio normalization is important but the ability to sync is not.

Therefore I suggest implementing the warning that audio may not sync properly after normalization, but enabling the pad-then-truncate to happen.

slhck · 2020-10-06T11:25:43Z

@dotancohen Thanks for your feedback. I'm not against such a feature per se, it's just that it is a bit of additional work and may lead to files out of sync, so it needs to be well-tested. I'll look into how to implement it, but I can't give you an ETA on it, unfortunately.

GabArl · 2021-05-11T17:44:28Z

@5tan
no issues with clipping if in your last line you change -acodec copy to -acodec %codec_name%
After getting %codec_name% from
FOR /F "tokens=*" %%C IN ('"ffprobe -i "%input_file%" -select_streams a:0 -show_entries stream=codec_name -hide_banner -v quiet -of csv=p=0"') DO ( SET codec_name=%%C)

This way the codec will be preserved instead of copied (the exact difference I have not been able to understand so far 👍 )

Yes, when using the approach with -acodec copy, a block size of 2048 samples does work for 16-bit files at any sample rate (Fs) if the padding time (t) is computed as t = LCM(Fs, size) / Fs (not LCD!), but it no longer worked for me once I dealt with 24-bit files. Keep in mind that, for example, 44100 Hz results in a pad time of 512 seconds...

Honestly, I did not fully understand what was going on, but I have a table if someone wants to experiment with it more 😄
I was not able to figure out the math for 24-bit; all my values became ridiculously high and did not work at all.
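
The formula t = LCM(Fs, size) / Fs can be checked with a few lines of Python. This is a small sketch: the function name `pad_seconds` is hypothetical, and the 2048-sample block size is the empirical value reported in this thread for 16-bit files, not a documented ffmpeg constant.

```python
from math import gcd


def pad_seconds(sample_rate_hz: int, block_size: int = 2048) -> int:
    """Padding time in whole seconds: t = LCM(Fs, size) / Fs."""
    lcm = sample_rate_hz * block_size // gcd(sample_rate_hz, block_size)
    # The LCM is a multiple of the sample rate, so this division is exact.
    return lcm // sample_rate_hz


for fs in (16000, 44100, 48000):
    print(fs, pad_seconds(fs))
```

For 16000 Hz this gives 16 seconds and for 44100 Hz it gives 512 seconds, matching the values quoted above.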

NiloCK · 2021-08-17T00:35:09Z

> @dotancohen Thanks for your feedback. I'm not against such a feature per se, it's just that it is a bit of additional work and may lead to files out of sync, so it needs to be well-tested.

Another potentially useful distinction is that there are no sync issues on pure audio files (ie, non-videos). From this thread, it looks like most people running into this bug are normalizing single spoken words, which is much more likely to be audio than video.

Clipping issues notwithstanding (thanks to everyone who pointed this out), I think a "better" fix for THIS utility might be to keep throwing that error for <3s video files, but do a pad-and-truncate hack on audio files and spit out a warning. Would you consider a PR that adds this behavior?

Heck, vine doesn't even exist anymore!

(although, honestly, some ex-vine content processing people are exactly the ones who have a fully-baked solution to this problem!)

dotancohen · 2021-08-17T05:34:26Z

> I think a "better" fix for THIS utility might be to keep throwing that error for <3s video files, but do a pad-and-truncate hack on audio files and spit out a warning.

I agree that this is the best solution, given the use cases stated above.

slhck · 2021-08-17T06:10:38Z

Yes, that seems like a useful solution. It should apply to audio-only files then, which would make the processing easier.
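
The proposal discussed here (warn for short files that contain video, pad and truncate short audio-only files) amounts to a small decision function. The sketch below is hypothetical and was not implemented in this form: the function name, the returned labels, and the way stream types are passed in are all assumptions for illustration.

```python
from typing import Iterable


def choose_action(duration_s: float, stream_types: Iterable[str],
                  min_duration_s: float = 3.0) -> str:
    """Pick a strategy for one input file based on length and stream types."""
    if duration_s >= min_duration_s:
        return "normalize"
    if "video" in set(stream_types):
        # Padding could break audio-video sync, so only warn.
        return "warn"
    # Audio-only: no sync to preserve, so padding is safe to try.
    return "pad_and_truncate"


print(choose_action(2.0, ["audio"]))
print(choose_action(2.0, ["video", "audio"]))
```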

richardpl · 2022-11-09T10:58:11Z

Can this be reproduced reliably somehow? I could not find any input sample to test.
Also, I pushed some patches for the loudnorm filter to FFmpeg master; I guess one of them should fix this bug.

slhck · 2022-11-09T14:24:06Z

Thanks! I guess in particular this one: FFmpeg/FFmpeg@36572a0

I will leave this open until I get time to test that. I will leave the warning in until this fix lands in a specific ffmpeg version.

richardpl · 2022-11-09T16:48:06Z

Maybe, maybe not. There is also a fix for LRA being reported as 0.0 for short audio; I will look into posting that too.

homocomputeris · 2023-04-23T22:15:43Z

Can the current warning for <3s files be ignored somehow? Or maybe it's been solved?

slhck · 2023-04-24T06:58:04Z

This fix should be in FFmpeg v6.0 or higher. I will close this issue for now.

dailylama · 2023-07-22T06:40:40Z

if it's fixed, then why does it show warning redirecting here

	ffmpeg -version
ffmpeg version 6.0 Copyright (c) 2000-2023 the FFmpeg developers

	ffmpeg-normalize 1.wav
WARNING: Audio stream has a duration of less than 3 seconds. Normalization may not work. See https://github.com/slhck/ffmpeg-normalize/issues/87 for more info.

slhck · 2023-07-22T18:34:38Z

Because I forgot to remove it. It no longer shows a warning now.

lorenblue · 2024-02-26T03:39:46Z

Hey there, sorry but I just want to clarify... I am trying to get short (< 3s) spoken word audio to normalize to around -14 LUFS, is this supported or not? Cheers thanks.

slhck · 2024-02-26T07:38:03Z

This should work better now. Just make sure to use a recent ffmpeg version.

slhck closed this as completed Nov 22, 2018

slhck reopened this Nov 22, 2018

slhck self-assigned this Nov 26, 2018

slhck added the bug label Nov 26, 2018

slhck added the upstream-bug label Jan 2, 2019

NiloCK mentioned this issue Apr 29, 2019

Add background service for processing uploaded content NiloCK/vue-skuilder#57

Closed

slhck added enhancement and removed bug labels Jun 2, 2019

slhck closed this as completed in 42ef9ca Jun 17, 2019

pranshurastogi29 mentioned this issue Nov 28, 2019

Question about normalize-resample.sh maum-ai/voicefilter#10

Closed

slhck mentioned this issue Dec 4, 2019

Error initializing complex filters. Numerical result out of range #110

Closed

fripso mentioned this issue Feb 7, 2020

Add flag to enable/disable short-term loudness (S) measurement to decrease minimum file length #113

Closed

HugoGresse mentioned this issue Mar 26, 2020

Normalize all files using ffmpef-normalize (EBU R128) 2ec0b4/kaamelott-soundboard#140

Closed

auricgoldfinger pushed a commit to auricgoldfinger/audio-normalize that referenced this issue Dec 15, 2020

warn when file is too short, fixes slhck#87

0db19fa

slhck reopened this Aug 17, 2021

MinmoTech mentioned this issue Dec 13, 2021

[FEATURE] Use ffmpeg-normalize to level audio volume migaku-official/Migaku-Anki-Addon#8

Closed

slhck closed this as completed Apr 24, 2023

slhck added a commit that referenced this issue Jul 22, 2023

remove warning for short files (#87)

a3d1c11
