MP3 cues shifted in time going from Traktor to Rekordbox (AKA 26ms problem) #3

Closed
pestrela opened this issue Jan 24, 2019 · 81 comments · Fixed by #18
Labels: bug (Something isn't working)

Comments

@pestrela

pestrela commented Jan 24, 2019

UPDATE SEP 2019 - FINDINGS SUMMARY:

  • We found that 6% of the files shift by 26 milliseconds when going from Traktor to Rekordbox. The other 94% of the files are fine.
  • This shift is very noticeable and breaks beatgrids/loops. See below for a graphical example of this issue.
  • The root issue is different interpretations of the tricky MP3 LAME tag (and its Lavc/Lavf variants).
  • Problem: zero LAME CRC ("case C"):
    • Traktor rejects the LAME tag and decodes the whole MPEG frame as "music", producing 26ms of garbage;
    • Rekordbox also rejects the tag, but skips the whole MPEG frame instead.
  • Problem: Lavc/Lavf reduced tags ("case B"):
    • Traktor produces 26ms of garbage because it doesn't understand this tag;
    • Rekordbox accepts the tag as a control frame.
  • We've now SOLVED this problem in dj-data-converter, a free command-line tool that works on all systems (Windows, Mac, Linux, WSL).
    • This is done without any dependencies, using our own home-grown MP3 LAME header decoder.

LINKS:

ALGORITHM: (updated: 16 Sep 2019)

if mp3 does NOT have a Xing/INFO tag:
    case = 'A'
    correction = 0ms

elif mp3 has Xing/INFO, but does NOT have a LAME tag:
    # typical case: has a LAVC header instead
    case = 'B'
    correction = 26ms

elif LAME tag has invalid CRC:
    # typical case: CRC is zero
    case = 'C'
    correction = 26ms

elif LAME tag has valid CRC:
    case = 'D'
    correction = 0ms
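The decision tree above can be sketched as a small Python classifier (illustrative only; detecting the Xing/INFO and LAME tags and validating the LAME CRC is assumed to be done elsewhere by an MP3 header parser):

```python
def classify(has_xing, has_lame, lame_crc_valid):
    """Map the MP3 header facts to a case letter and a cue correction in ms.

    Implements the decision tree above; the three boolean inputs are
    assumed to come from a separate Xing/INFO and LAME tag parser.
    """
    if not has_xing:
        return "A", 0    # no Xing/INFO tag: nothing to correct
    if not has_lame:
        return "B", 26   # Xing/INFO only (e.g. a Lavc/Lavf header)
    if not lame_crc_valid:
        return "C", 26   # LAME tag present but CRC invalid (typically zero)
    return "D", 0        # valid LAME tag: no shift needed
```

Note the 26ms figure applies to 44.1kHz files; the Rekordbox release notes quoted later in this thread give 24ms for 48kHz.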

EXAMPLE:

(image: graphical example of the shifted cues)

@alza-bitz
Contributor

Hi @pestrela,

Thanks for reporting this. I wasn't aware of it, since I haven't encountered the issue with my audio files yet!

Is the issue intermittent depending on the files, or for you does this issue occur for every file? If it's intermittent depending on the file, it could be tricky to fix, but first of all let's see if I can reproduce it.

If you could attach a zip archive containing an audio file for which you've observed the issue, it might help to speed up the investigation. From there I can try and reproduce it by analysing in Traktor, and then using the converter and checking the result in Rekordbox.

Thanks!

@pestrela
Author

pestrela commented Jan 29, 2019

Edit 30 July 2019: Original request follows:

Hi, many thanks for this project.
I just tested the 0.2.0 release, which produced a valid XML file.

However, it suffers from the "cues are shifted in time" issue that all converters face when going to/from Rekordbox/Traktor.
The result looked the same as in this example (taken from CrossDJ):
crossdj - cues shifted in time

The root cause is different definitions of the 00:00:00 time point:
https://www.youtube.com/watch?v=Vl4nbvYmiP4

AFAIK only these 2x tools are able to fix this issue:

could you please consider addressing this issue? I can provide demo mp3s if you don't see this issue in your mp3s.
thanks!

Edit 30 July 2019: First reply follows:

Hi Alza,
This issue only happens for specific MP3s. Below is an example; I can provide a lot more later:
https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1

I've tested many converters - they all suffer from the same issue for this example.
Exceptions:

  • DJCU: initially this gets confused as well. A post-correction step fixes this, but it depends on Rekordbox post-analyzing the file.
  • rekordbuddy: somehow they detect the shift, WITHOUT the rekordbox post-step. (speculation: is this correlated to specific versions of LAME?)

@pestrela pestrela reopened this Jan 29, 2019
@pestrela
Author

pestrela commented Jan 29, 2019

I've now analysed by hand 67 different files and found an almost perfect pattern.

If the file was encoded by LAME "3.99" or "3.99.5", the simple conversion produces shifted cues; the exception is "3.99r".
Same story for "3.98", except "3.98r" or "3.98 " (with a trailing space).

For the other LAME versions / encoders, no shifted cues were seen.
Note: "unk" means the tag was empty or not present.

please see the below table for my results so far:
image

python code:

import pandas as pd
from io import StringIO

# 'a' holds the hand-collected version/shift table, pasted as a tab-separated string
df1 = pd.read_csv(StringIO(a), sep="\t", names=['version', 'shift']).dropna()
df1['version'] = df1['version'].str.replace(" ", "_")
print("number of entries: %d" % len(df1))

df2 = pd.crosstab(index=df1["version"], columns=df1["shift"]).sort_values(["bad", "good"], ascending=False)
df2

To analyse the encoder of the files, I've used: https://mediaarea.net/en/MediaInfo
To customize the output: preferences / custom / edit / audio / %Encoded_Library%

what do you think?

@pestrela
Author

pestrela commented Jan 30, 2019

extended the analysis to 300 files, analysed manually.
Of these 300, I've subjectively found that 11% have shifted cues.

For LAME 3.99 files, all of them result in shifted cues.
LAME 3.99.5 is now mixed: 60% of its predictions are wrong.
Everything else, including 3.99r etc., only results in 2% false positives.

code and data:
https://github.com/pestrela/music_scripts/blob/master/lame_shifted_cues.py

image

@pestrela
Author

Rekordbuddy is able to correct this issue in a single go. Well done!
In their own words:
"Rekord Buddy deals with 5 different issues related to cue timings, and one that we are aware of but haven’t found enough data to compose a decent fix for."
https://forums.next.audio/t/traktor-rekordbox-cues-shifted-in-time/415

@alza-bitz
Contributor

alza-bitz commented Feb 3, 2019

Hi @pestrela,

Ok I've started to look into this now, I have some interesting results!

First, I actually had some LAME 3.99 & LAME 3.98 encoded files already, so I tried to reproduce the issue with those. In this case, I found that the cue shifting did not occur with any of the 3.99 and 3.98 files I tried.

Second, I tried to reproduce the issue with the file you provided:

https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1

In this case, I found that the cue shifting did occur, but notably when I checked the encoder metadata for this particular file:

ffprobe -v verbose <file>

It was not LAME, but Lavf a.k.a libavformat (& the related libavcodec). I believe this encoder string indicates that FFMPEG was used to encode the file. Internally, libavcodec uses libmp3lame for mp3 encoding, but for this file it seems that the version used is not present in the file metadata, it just states Lavf.

Based on this, I then tried to reproduce the issue with Lavf and Lavc xx.xx encoded files. In this case, I found that the cue shifting issue did occur for the vast majority of files with these encoder values (although not for all of them; there was at least one exception).

Conclusion: my findings do support the encoder version hypothesis to some extent, however I found that a different encoder is the culprit, Lavf and/or Lavc.

Next steps: our findings are different, so we need to clarify the situation there first before I can proceed.

Assuming we can account for this, I would then try and work out what the shift value(s) are (in seconds), and whether it's constant or not etc.

Let me know what you think!

@pestrela
Author

pestrela commented Feb 4, 2019

I've now sent you privately a link to an upload of 35x files that have a clear shift.
Also changed my analysis scripts to use latest ffprobe 4.1.

  • Of the 35x files with bad shifts,
    • 22x have no tag
    • 8x are made by Lav*
    • 5x are LAME
  • regarding good files: of the ~300,
    • 5x from LAV had no shift.

note: "good" files could be actually bad files, but with a very small shift. When I used RECU it sometimes reported marginal (but present) shifts

image

@pestrela
Author

pestrela commented Feb 6, 2019

yet another program to guess the encoder:
http://www.rarewares.org/rrw/encspot.php

which is a wrapper around this lib:
http://mp3guessenc.sourceforge.net/

found this program on a list of mp3 tools collected by Pulse@Pioneer (mp3 information / mp3 error checkers)
https://forums.pioneerdj.com/hc/en-us/articles/204681699-MP3-Tools-More

@alza-bitz
Contributor

alza-bitz commented Feb 7, 2019

Hi @pestrela,

Thanks for these. What's your thinking here: is this regarding a method of detecting the encoder for files that don't have an encoder tag (or where the encoder tag is empty)? I'll call these files "unknown files".

I assume this is your focus, since unknown files are the biggest proportion of files in your dataset of 300 (although LAME files are a close second), and the proportion with the biggest number of shifted cues?

However, it's worth noting that although this proportion has the biggest number of shifted cues, it's not the proportion with the biggest percentage of shifted cues - that goes to Lavf/Lavc:

| Category | Total | Number Shifted | % Shifted |
|---|---|---|---|
| Lavf/Lavc (all versions?) | 13 | 8 | 62% |
| Unknown | 143 | 22 | 15% |
| Lame (all versions?) | 122 | 5 | 4% |

Based on the above, I am thinking that the % numbers are the most helpful indicator for determining what to do next. Although the number of Lavf/Lavc files in your dataset is comparatively small, the percentage result for those does correlate somewhat with my findings.

My current thinking for a solution is to implement a "blacklist lookup table", which would map source + target + encoder (string regex) -> shift (seconds)

For example (shift values are just made up):

| Source | Target | Encoder | Shift |
|---|---|---|---|
| Traktor | Rekordbox | Lavc57.80 | 0.135 |
| Traktor | Rekordbox | Lavc* | 0.143 |
| Traktor | Rekordbox | Lavf* | 0.143 |
| Traktor | Rekordbox | LAME3.99 | 0.128 |

I am assuming that for a given conversion (source -> target, encoder), the shift is a fixed value (this could be verified using a random sample of files for each encoder).

Of course, this solution doesn't consider unknown files.. some options for those:

  • Include them in the converted output by default, but print a warning and generate a report with a list of the unknown files
  • Don't include them in the converted output by default (but still print a warning and generate a report)

There could also be a command-line option to override whether they are included or not.
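A minimal sketch of that blacklist lookup table in Python (the shift values are the made-up examples from the table above, and matching the encoder string by regex is an assumption about how the table would be keyed):

```python
import re

# Hypothetical blacklist: (source, target, encoder regex) -> shift in seconds.
# Ordered from most to least specific, since the first match wins.
SHIFT_TABLE = [
    ("Traktor", "Rekordbox", r"Lavc57\.80", 0.135),
    ("Traktor", "Rekordbox", r"Lavc.*", 0.143),
    ("Traktor", "Rekordbox", r"Lavf.*", 0.143),
    ("Traktor", "Rekordbox", r"LAME3\.99", 0.128),
]

def lookup_shift(source, target, encoder):
    """Return the first matching shift, or None for unknown encoders."""
    for src, tgt, pattern, shift in SHIFT_TABLE:
        if src == source and tgt == target and re.fullmatch(pattern, encoder):
            return shift
    return None
```

Files whose encoder matches no entry would get `None` back, which maps onto the "warn and report" options for unknown files above.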

For unknown files, mp3guessenc might be helpful to determine the encoder (I've used it before), but unfortunately there doesn't seem to be a build for Mac OSX, which is a show-stopper in any case..

What do you think?

@pestrela
Author

pestrela commented Feb 8, 2019

Today I tried the following experiment: identify the precise sample of the 0:0:0 point of each DJ software.

Method:
I played MP3 files in the DJ software while recording, putting the play position at negative values beforehand. Then I aligned the recordings on the first downbeat, and normalized at the first 16-bit samples whose absolute value is greater than zero.

A description of the test procedure, inputs and all outputs are in this zip: https://www.dropbox.com/s/pgpnrw4sl3xv2tp/DAW%20shifted%20cues.zip?dl=0
DAW shifted cues.txt
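The "first sample greater than zero" step of the method can be sketched as follows (a hypothetical helper for illustration; the actual alignment was done by hand on the recordings):

```python
def first_nonzero_sample(samples, threshold=0):
    """Index of the first 16-bit PCM sample whose absolute value exceeds
    the threshold, or None if the input is all digital silence."""
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            return i
    return None

# a recording that starts with three samples of digital silence
print(first_nonzero_sample([0, 0, 0, -1, 5]))  # prints: 3
```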

Results:

  • The CUE Shift amount is different for every mp3.
    • This is also the experience using RECU
  • Traktor plays always the same data
  • Rekordbox adds variable amounts of data when playing mp3

Example:

  • 3rd vs 4th row: RBox starts outputting data (values 0..-1) at a variable time. In this example, it varied by 2ms (3rd vs 4th row).
  • 3rd vs 2nd row: RBox outputs a shift of 29ms. This is not a constant value across MP3s. In the beginning it has a characteristic pattern of (0..-1).
  • 1st vs 2nd row: Traktor outputs the same data at the same time.
    sample 3 explanation - render

@pestrela
Author

pestrela commented Feb 8, 2019

Maybe I found a hint in the Rekordbox release notes.
They mention an issue with LAME gapless encoding, and state the 44.1kHz shift to be a constant 26ms.

https://rekordbox.com/en/support/releasenote.php

What's new in rekordbox Ver.2.0.2
● Fixed an issue with beat grid inaccuracy created with v1.6.0/v1.6.2/v2.0.0/v2.0.1.

Ver.1.6.2 (2012.08.21)
What's new in rekordbox Ver.1.6.2
...
●Improved the accuracy of beat grid information analyzed by rekordbox.
●Added a function to fix the misaligned BeatGrid and cue points in mp3 files which
(i) have been encoded by LAME encoder with the gapless setting and
(ii) have been analyzed and adjusted by rekordbox before version 1.5.3.
(As of version 1.5.4, rekordbox has disabled gapless playback of LAME-encoded mp3 files.)
...

Ver.1.5.4 (2012.07.03)
About rekordbox Version 1.5.4
Version 1.5.4 is only for MEP-4000 and new rekordbox users.
Version 1.5.4 disables gapless playback for MP3 files encoded with the LAME encoder on players such as the CDJ-2000.
Disabling gapless playback for MP3 files encoded with the LAME encoder in Version 1.5.4 will shift existing beat grids,
loops or cue points of mp3 files encoded with the LAME encoder that have been analysed and adjusted with an older version of rekordbox.
The offset value depends on the sampling frequency of the file: 24ms (in the case of 48kHz), 26ms (in the case of 44.1 kHz).
However, it does not alter the audio play back on the CDJ's just visually inside rekordbox, therefore you do not need to
reanalyse your tracks and redefine the beat grids, loops or cue points.
Pioneer will provide a tool to automatically adjust the beat grids, loops or cue point data in a future update. We recommend that you wait.
Thank you for your understanding.
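The two offsets quoted in the release note correspond exactly to one MPEG-1 Layer III frame (1152 PCM samples) at each sample rate, which fits the "one garbage/skipped frame" explanation:

```python
# One MPEG-1 Layer III frame decodes to 1152 PCM samples; the offsets
# quoted by Pioneer are exactly one frame of audio at each sample rate.
FRAME_SAMPLES = 1152
for rate in (44100, 48000):
    print(f"{rate} Hz -> {1000 * FRAME_SAMPLES / rate:.1f} ms")
# prints: 44100 Hz -> 26.1 ms
#         48000 Hz -> 24.0 ms
```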

@pestrela
Author

pestrela commented Feb 8, 2019

Gapless encoding is detectable using byte $AF of the full lame mp3 info tag:
https://wiki.hydrogenaud.io/index.php?title=Gapless_playback#Format_support
http://gabriel.mp3-tech.org/mp3infotag.html

eyeD3 -P lameinfo displays --nogap:
https://eyed3.readthedocs.io/en/latest/_modules/eyed3/mp3/headers.html

however this doesn't match my current dataset:

$ for DIR in {bad,good} ; do
    echo -n "$DIR "
    (for FILE in $DIR/*.mp3 ; do
        eyeD3 -P lameinfo "$FILE" 2>/dev/null | grep -a -c nogap
    done) | awk '{A=A+$1} END{ print NR-A, A}'
done

| what | no nogap | has nogap |
|---|---|---|
| bad | 29 | 0 |
| good | 237 | 25 |

@alza-bitz
Contributor

Hi @pestrela,

Thanks for continuing the investigation.

The CUE Shift amount is different for every mp3.

Just to clarify, you're saying that the shift is different, even for files with the same encoder? This is contrary to the hypothesis in the video above, cited as root cause: https://www.youtube.com/watch?v=Vl4nbvYmiP4

Rekordbox adds variable amounts of data when playing mp3

Just to clarify, you're saying that the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file? Although it's small, i.e. 2ms as you say. I wonder if this is just a related, but separate Rekordbox peculiarity that can be ignored (since it's only 2ms).

Re: gapless encoding, my conclusion based on the results in your other comment is that it's not related; it's just a coincidence due to the similar values 24/26ms vs 29ms.

@alza-bitz alza-bitz self-assigned this Feb 9, 2019
@pestrela
Author

pestrela commented Feb 9, 2019

the shift is different, even for files with the same encoder

This comment was because sample1 and sample2, which have the same encoder, would have different offsets according to the above method.
The issue is that I now see the above method (find the first non-zero byte after play) doesn't seem to predict the correct offset shift that we need to apply.

the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file?

yes.
This is yet another sign that this method is not reliable enough

@pestrela
Author

pestrela commented Feb 9, 2019

moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection.
I expect a lot of outliers from the different beat-grid algorithms, but I expect that most >5ms offsets will cluster somehow correlated with mp3guessenc / ffprobe / eyeD3.

RECU is a tool that takes 2 files:

  1. converted RBox XML, as converted by DJCU/DJDC
  2. original RBox XML, as analysed by rekordbox

it then matches files on the first beat, computes the offset, and applies such offset to all cues of the converted XML
The current RECU requires the first beat to be marked; below is some code to avoid this:

def bpm_period(bpm):
    return 60.0 / bpm

def find_min_beat(bpm, cue):
    # reduce a cue position to its offset within one beat period
    period = bpm_period(bpm)
    beats = int(cue / period)
    return cue - beats * period

def find_offset(bpm1, cue1, bpm2, cue2):
    return find_min_beat(bpm1, cue1) - find_min_beat(bpm2, cue2)
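As a quick sanity check, the helper can be exercised end to end (functions repeated here so the snippet runs standalone; the sample values are the 126 BPM TEMPO positions from the XML example later in this thread):

```python
def bpm_period(bpm):
    return 60.0 / bpm

def find_min_beat(bpm, cue):
    # offset of the cue position within one beat period
    period = bpm_period(bpm)
    return cue - int(cue / period) * period

def find_offset(bpm1, cue1, bpm2, cue2):
    return find_min_beat(bpm1, cue1) - find_min_beat(bpm2, cue2)

# converted XML marks the first beat at 1.95724s; the original at 0.024s
offset = find_offset(126.0, 1.95724, 126.0, 0.024)
print(f"offset = {offset * 1000:.1f} ms")  # prints: offset = 28.5 ms
```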

@alza-bitz
Contributor

Hi @pestrela,

moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection.
I expect a lot of outliers from the different beat-grid algorithms, but I expect that some >5ms offsets will cluster somehow correlated with mp3guessenc / ffprobe / eyeD3.

Ok I can see how this would be useful. Then we can cross-reference the shifts with other info e.g. encoder, to see if there's a pattern?

RECU is a tool that takes 2 files:

converted RBox XML, as converted by DJCU/DJDC
original RBox XML, as analysed by rekordbox

it then matches files on the first beat, computes the offset, and applies such offset to all cues of the converted XML

Ok so the process could be:

  1. Run the converter app to convert collection.nml to rekordbox.xml (whole collection converted)
  2. In an empty Rekordbox collection, add music folders matching Traktor, let Rekordbox analyse all of the files to create the grid for each (tempo data), and export to rekordbox-2.xml
  3. Create a new mini-app which will take the rekordbox.xml and rekordbox-2.xml, calculate the offset based on the earliest tempo position for each track, and output csv data.
  4. Join csv data from step 3. with csv data output from ffprobe (for encoder etc) (or, parse the tag in step 3. to get the encoder, avoiding the need for this step).

One issue I can see with the above process is that step 2 could take a long time for a large collection. For example, my collection is ~10,000 tracks...

The current RECU requires the first beat to be marked, below some code to avoid this

I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?

@pestrela pestrela reopened this Feb 10, 2019
@pestrela
Author

pestrela commented Feb 10, 2019

Ok I can see how this would be useful.

I see 2x different use cases for this effort:

  1. Minimum: provide statistics of the offsets, optionally triggering mp3guessenc etc.
  2. Optional: serve as a post-correction tool, just like RECU, if no definitive encoder patterns arise from #1 (Universal spec and model)

Ok so the process could be:
...

Indeed, this is how RECU works

is that step 2. could take a long time, for a large collection?

We can match the files exactly by filename.
In my Python tool I'm matching my 7000-file collection exactly using AUDIO_ID, and this is quite fast. In Python, strings are hashed and matched using hash tables.

I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?

We should take the closest RBox Inizio to the TK one, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU, which required the exact same beat to be marked.

An example will make this clear:

converted XML:
      <TEMPO Inizio="1.95724" Bpm="126.000000" Metro="4/4" Battito="1"/>

original XML:
      <TEMPO Inizio="0.024" Bpm="126.00" Metro="4/4" Battito="3"/>
      <TEMPO Inizio="6.215" Bpm="126.00" Metro="4/4" Battito="4"/>
      <TEMPO Inizio="164.787" Bpm="126.00" Metro="4/4" Battito="1"/>
      <TEMPO Inizio="343.359" Bpm="126.00" Metro="4/4" Battito="4"/>

find_offset(126.00000, 1.95724, 126.00,0.024)
0.028478095238095434

@alza-bitz
Contributor

Hi @pestrela,

Minimum: provide statistics of the offsets, optionally trigger mp3guessenc etc

Agreed, I'm currently working on a separate mini-app to get the offset based on two Rekordbox files. I'll post the results when I have them.

Optional: Serve as a post-correction tool, just like RECU, if no definitive encoder patterns arise from #1

Ok let's see what the stats tell us first. It would be good to avoid the post-correction like RECU, because even if we have a method that avoids manually marking the first beat (tempo inizio), users will still be required to do a full analysis in Rekordbox which isn't ideal.

is that step 2. could take a long time, for a large collection?

We can match the files exactly by filename.
In my Python tool I'm matching my 7000-file collection exactly using AUDIO_ID, and this is quite fast. In Python, strings are hashed and matched using hash tables.

I was referring to the analysis time in Rekordbox when starting from an empty collection and adding the music folders, in order to export the rekordbox-2.xml. Actually I just left my laptop analyzing for a while, and it's finished now!

I was thinking I'd just take the inizio (i.e. the position) of the earliest tempo for each track, that would be the simplest?

We should take the closest Rbox Inizio to TK, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU that required the exact same beat to be marked.
An example will make this clear:
converted XML:

original XML:



find_offset(126.00000, 1.95724, 126.00,0.024)
0.028478095238095434

I'm not 100% sure about the correctness of find_offset, after trying a few examples, but I'll use it as-is for now and let's see what the stats look like.

@pestrela
Author

agreed that a RECU-like step that depends on Rekordbox analysis is slow and cumbersome.
Hopefully we will catch the LAME pattern and correct it in a single go.

regarding slowness: the Rekordbox analysis is always required; it happens anyway when the user imports the converted XML

@pestrela
Author

Trying to guess which decoding library the DJ software uses:

$ strings  Traktor\ Pro\ 3/Traktor.exe | grep FhG
FhG-IIS MP3sDec Libinfo
$ strings rekordbox\ 5.4.1/rekordbox.exe | egrep -i "libmpg123.dll"
libmpg123.dll

some interesting comments from library maintainers:

https://sourceforge.net/p/lame/mailman/message/27315501/
as maintainer of the mpg123 decoding engine, I can tell you what works:
Simply encode your files with lame and decode them with mpg123/libmpg123, with gapless decoding enabled. Lame stores all the necessary information by default and libmpg123 just omits the leading/trailing junk. I tested this with encode/decode roundtrips ... if you don't get the exactly same sample count that you had in the intial WAV, you found a bug in either lame or mpg123 and it should be fixed.

https://thebreakfastpost.com/2016/11/26/mp3-decoding-with-the-mad-library-weve-all-been-doing-it-wrong/
If an mp3 file starts with a Xing/LAME information frame, they are feeding that frame to the mp3 decoder rather than filtering it out, resulting in an unnecessary 1152 samples of silence at the start of the decoded audio.

@pestrela
Author

pestrela commented Feb 14, 2019

In a really interesting development, some users started seeing this issue when upgrading TP2 collections to TP3. The mentioned patterns were locked files ON and multi-processing OFF.

It would be very useful to replicate this issue using traktor alone.

 TP3 release dates:
- 3.0.0 — 2018-10-18 
- 3.0.1 — 2018-11-01 
- 3.0.2 — 2018-12-06 

@alza-bitz
Contributor

Hi @pestrela,

Thanks for the updates. I was thinking to include the Traktor and Rekordbox version numbers in the analysis, since the decoders used might change between versions, affecting the results.

I've completed my initial analysis using the offset algorithm above, comparing Traktor and Rekordbox data. The code I wrote to produce the data is in a new project here: https://github.com/digital-dj-tools/dj-data-offset-analysis

The ETL happens in two steps:

  • Firstly, /dev/notebook-1-ffprobe.clj gets ffprobe data for the data set, and saves it to sample-ffprobe-df.edn
  • Secondly, /dev/notebook-2-offset-encoder.clj loads Traktor data from a collection.nml file, loads Rekordbox data from a rekordbox.xml file that was exported from Rekordbox, joins them, adds the offset values, joins that to the ffprobe data, calculates the stats and outputs csv data.

Please see the sheet here, for the raw offset data, the calculated stats and the included candlestick chart: https://docs.google.com/spreadsheets/d/1uTBJSNc7zB2dN05LMkMORbxP4HxN7wc15MtoYAH6Qv0/edit?usp=sharing

Points of interest:

  1. If we assume the offset algorithm is correct (which I am still not sure about), then to my eye
    • I can see a small +ve shift for all LAVC encoded files. The shift amount depends on the LAVC encoder version.
    • I can't see a shift for LAME encoded files.
  2. The sample size was ~10,000 files, but in the ETL this gets filtered down to ~2,000 for my collection, since quite a big proportion of my Traktor collection is not analysed! I didn't actually realise this until I saw the numbers. Non-analysed files translate to a missing tempo (inizio and bpm) which results in no offset value being calculated, and these rows are then filtered out. So I could get a better data set by analysing all remaining files in Traktor. I might have a go at that.
  3. For my collection, the raw data says ~18% have unknown encoder. I don't know if we can do anything about these.
  4. If we trust these stats (or the stats of a bigger sample if we can make one, using the same code), then we could use the encoder to lookup the median offset for example, and make an adjustment when converting. If we don't trust these stats, then we need to get a bigger sample or try some other hypothesis.

Please let me know your thoughts and opinions on these results.

Thanks,

Alex.

@pestrela
Author

pestrela commented Feb 23, 2019

Hi
thanks for this new tool, and for analyzing 1/5 of your collection.

Below is the same data as CDFs, broken down by encoder version.

  • for AV there is a cluster around 28ms.
  • for UNK the cluster is also there, but in a smaller percentage of files
  • for LAME the values are all over the place.

script: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py

offset-encoder

I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce.
As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by Winamp v5.666 build 3516).
In this particular example the WAV difference is just 2.2ms.
Also interesting that Traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.

mp3 vs wav differences

I'm currently travelling, will analyse later my collection and the hand-tagged dataset as well (good shift / bad shift).

@alza-bitz
Contributor

Hi @pestrela,

I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce.
As an example, this is the difference for the reference track in both MP3 and WAV formats (generated by Winamp v5.666 build 3516).
In this particular example the WAV difference is just 2.2ms.
Also interesting that Traktor sees an extra 38ms, and RB an extra 12ms, between MP3 and WAV.

I have a few questions and thoughts on this:

  • I am thinking that offset issues for other formats, e.g. WAV (or possibly FLAC), ought to be treated as a separate issue? Although it may be related, I am just concerned that opening the investigation to other formats might slow us down in narrowing down and resolving the issue for MP3. Having said that, I was vaguely aware of an offset issue some time ago (for Traktor alone) between FLAC and MP3, since I had converted a lot of files from FLAC to MP3 after I had previously analysed the FLAC files, and then used relocate in Traktor to point at the MP3 files. Ultimately though, I am thinking to treat offsets between different formats as a separate (but possibly related) issue, and perhaps even an expected issue due to the natural differences between formats. There is also AAC to consider, which I haven't even looked at!

  • I am wondering how you calculated these millisecond values, and what they actually represent? Could you give a worked example?

Also, a few other updates:

  • Just to let you know I am planning to update the Google Sheet stats soon, after analysing the rest of my Traktor collection.

  • Based on the results so far, do you agree there are any "definitive encoder patterns" yet? As in, are we closer to a solution, using the encoder? For example, the results for LAME and LAVC mostly correspond with the examples I observed visually (but I didn't try many files). If so, I am wondering if it's the right time to implement and test a solution:

    • With each track, get the encoder
    • Lookup the median offset for the encoder (based on my summary dataset, not ideal but it's the best we've got.. we can always make this dataset better over time by using the analysis code and combining various user's data)
    • Adjust the tempo and cues using that value
    • Files with unknown encoder would not be adjusted, and optionally not included in the output
    • Files with an encoder for which there is no sample data would not be adjusted, and optionally not included in the output
    • A report would be generated showing what adjustments were made (or none) for each file
  • Or, do you think there is no definitive pattern yet, and we need to investigate further? Perhaps run the analysis code against your collection and compare the results?

Let me know!

Thanks,

Alex.

@pestrela
Author

pestrela commented Mar 2, 2019

I've now made CDFs for your whole 8335-file collection, zoomed at both 50ms and 500ms.
source code: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py

offset-encoder - 8335 files

some comments:

  • Medians of UNK/LAME are now tight around zero
  • these continue not to be representative. Deviation is still way too large, all the way up to half a beat (0.5s at 60 BPM)
  • the 28ms cluster of AV is now even clearer. But the median is still not representative (too much deviation)
    • UNK has the 28ms cluster as well

@pestrela
Author

pestrela commented May 31, 2019

QUESTION 2: how many of these files are fixed correctly, and how many are we going to damage?

Answer: we predict we will successfully fix 463 files, but will damage 393 in the process (because of FPs).

Why? The problem is the false positives that we have.
This is especially painful in case D, where the latest FFMPEG always produces FPs:

Regarding case D, this script encodes a FLAC with the latest FFMPEG and the latest LAME.
Offline we saw that both are case D, as expected, but FFMPEG produced bad shifts while LAME produced good shifts. We also saw that all remaining outliers are in case D.

https://github.com/pestrela/music_scripts/blob/master/offsets/fhg/bin/encode%20flac%20with%20latest%20ffmpeg%20and%20lame%20(self-encode).sh

To find this value, I combined the last two tables:

| case | fp | file_count | % (total) | % (case) |
|---|---|---|---|---|
| case A | False | 4311 | 68% | 97% |
| case A | True | 134 | 2% | 3% |
| case B | False | 386 | 6% | 93% |
| case B | True | 27 | 0% | 7% |
| case C | False | 77 | 1% | 100% |
| case C | True | 0 | 0% | 0% |
| case D | False | 1208 | 19% | 84% |
| case D | True | 232 | 4% | 16% |
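
That convolution can be sketched in a few lines of Python. This is a hedged sketch: the per-case FP rates come from the 99-file training table, and integer truncation is an assumption about the rounding, which happens to reproduce the predicted figures:

```python
# Sketch: apply the FP rate measured per case on the 99-file training
# set to the per-case counts of the 6375-file collection.
# (Integer truncation is an assumption; it reproduces the figures above.)

training = {            # case: (files_in_training, measured_FPs)
    "A": (33, 1),
    "B": (15, 1),
    "C": (20, 0),
    "D": (31, 5),
}
collection = {"A": 4445, "B": 413, "C": 77, "D": 1440}

predicted_fp = {
    case: int(collection[case] * fps / files)
    for case, (files, fps) in training.items()
}
print(predicted_fp)                # {'A': 134, 'B': 27, 'C': 0, 'D': 232}
print(sum(predicted_fp.values()))  # 393 predicted FPs in total
```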

@pestrela
Author

pestrela commented May 31, 2019

SUMMARY SUMMARY SUMMARY SUMMARY

Hopefully showing all results in sequence explains clearly what is going on between TK->RB.
Our biggest problem is FPs on case D, as generated by latest FFMPEG.

MP3 HEADER CASES

| Case | Signature | TK->RB Correction |
|---|---|---|
| case A | no extra headers | 0 ms |
| case B | Only Xing | 26 ms |
| case C | Lame + CRC fail | 26 ms |
| case D | Lame + CRC ok | 0 ms |

TRAINING DATASET: 99 files

| case | file_count | % (total) |
|---|---|---|
| case A | 33 | 33% |
| case B | 15 | 15% |
| case C | 20 | 20% |
| case D | 31 | 31% |

MANUAL TRAINING STEP: 99 files (with 7 measured FPs)

| case | fp | file_count | % (total) | % (case) |
|---|---|---|---|---|
| case A | False | 32 | 32% | 97% |
| case A | True | 1 | 1% | 3% |
| case B | False | 14 | 14% | 93% |
| case B | True | 1 | 1% | 7% |
| case C | False | 20 | 20% | 100% |
| case C | True | 0 | 0% | 0% |
| case D | False | 26 | 26% | 84% |
| case D | True | 5 | 5% | 16% |

COLLECTION DATASET: 6375 files

| case | file_count | % (total) |
|---|---|---|
| case A | 4445 | 70% |
| case B | 413 | 6% |
| case C | 77 | 1% |
| case D | 1440 | 23% |

PREDICTION STEP: 6375 files (with 393 predicted FPs)

| case | fp | file_count | % (total) | % (case) |
|---|---|---|---|---|
| case A | False | 4311 | 68% | 97% |
| case A | True | 134 | 2% | 3% |
| case B | False | 386 | 6% | 93% |
| case B | True | 27 | 0% | 7% |
| case C | False | 77 | 1% | 100% |
| case C | True | 0 | 0% | 0% |
| case D | False | 1208 | 19% | 84% |
| case D | True | 232 | 4% | 16% |

HOW MANY CORRECTIONS AND FALSE POSITIVES?

We predict we will successfully fix 463 files, but will damage 393 files in the process due to FPs

TP: 463 files (=386+77, 7% of 6375); FP: 393 files (=134+27+0+232, 6% of 6375)
Success metric: 8% (-100% is only_FPs; 0% is 50/50; 100% is only_TPs)

@pestrela pestrela changed the title FEATURE REQUEST: Fix cues shifted in time from TK->RB Fix cues shifted 26ms in time from Traktor into Rekordbox (caused by mp3 extra headers confusing their mp3 decoders) May 31, 2019
@pestrela pestrela changed the title Fix cues shifted 26ms in time from Traktor into Rekordbox (caused by mp3 extra headers confusing their mp3 decoders) Fix cues shifted in time from Traktor into Rekordbox (26ms problem) May 31, 2019
@pestrela pestrela changed the title Fix cues shifted in time from Traktor into Rekordbox (26ms problem) MP3 cues shifted in time going from Traktor to Rekordbox (AKA 26ms problem) Jun 1, 2019
@pestrela
Author

pestrela commented Jun 1, 2019

About the "case D" False Positives from the training dataset:

  • I've now noticed that all of them have their encoder padding set to zero.
    • There is a single exception among these 31 case D files.
  • All other files have random numbers or "UNK".


Using this new criterion to split case D would raise the "Success metric" quite a bit.

@pestrela
Author

pestrela commented Jun 2, 2019

NEW ALGORITHM USING EYED3

I've now greatly reduced the false positives for "case D" by doing the following:

  • check if the Xing tag is there using mp3guessenc
  • if and only if this is the case, check if the LAME tag is valid according to "eyeD3"

Following these exact steps, in this exact sequence with these exact versions, predicts very closely what will happen between TK->RB:

We predict we will fix 403 files out of 6299, at the cost of damaging 10 files, while leaving 134 files unfixed

This is because I'm now finding all kinds of exceptions where mp3s have both LAVC and LAME tags, null streams, etc.
In the details section below please find the new algorithm, new revised cases, and latest results.

DETAILS

MP3 HEADER CASES

note: Xing is detected by "mp3guessenc -v"; LAME is detected by "eyeD3 -P lameinfo"

| Case | Signature | TK->RB Correction |
|---|---|---|
| case A | no headers / lame error | 0 ms |
| case B | Only Xing | 26 ms |
| case C | (deprecated) | |
| case D | Lame is valid | 0 ms |
| case Z | VBRI | (not tested) |

Current algorithm

if mp3guessenc sees VBRI tag:
    case = "Z"
    exit

if ! (mp3guessenc sees Xing/INFO tag):
    case = "A"
    correction = 0ms

elif ! (eyeD3 sees correct LAME tag):
    case = "B"
    correction = 26ms
         
elif eyeD3 sees correct LAME tag:
    case = "D"
    correction = 0ms
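
The algorithm above can be expressed as a small pure function; the three booleans stand in for the tool outputs (mp3guessenc for the VBRI/Xing checks, eyeD3 for the LAME tag), and wiring them up to the actual tools is omitted in this sketch:

```python
def classify(has_vbri: bool, has_xing: bool, lame_tag_valid: bool):
    """Return (case, correction_ms) per the decision tree above."""
    if has_vbri:
        return "Z", None       # VBRI tag: behaviour not tested
    if not has_xing:
        return "A", 0          # no Xing/INFO header: no shift
    if not lame_tag_valid:
        return "B", 26         # Xing present but no valid LAME tag: 26 ms
    return "D", 0              # valid LAME tag: no shift

print(classify(False, True, False))  # ('B', 26)
```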

TRAINING DATASET: 99 files

| case | file_count | % (total) |
|---|---|---|
| case A | 33 | 33% |
| case B | 40 | 40% |
| case D | 26 | 26% |
| case Z | 0 | 0% |

MANUAL TRAINING STEP: 99 files (with 2 measured FPs)

| case | fp | file_count | % (total) | % (case) |
|---|---|---|---|---|
| case A | False | 32 | 32% | 97% |
| case A | True | 1 | 1% | 3% |
| case B | False | 39 | 39% | 98% |
| case B | True | 1 | 1% | 2% |
| case D | False | 26 | 26% | 100% |
| case D | True | 0 | 0% | 0% |
| case Z | False | 0 | 0% | 0% |
| case Z | True | 0 | 0% | 0% |

WHOLE COLLECTION DATASET: 6299 files

| case | file_count | % (total) |
|---|---|---|
| case A | 4445 | 71% |
| case B | 413 | 7% |
| case D | 1441 | 23% |
| case Z | 0 | 0% |

PREDICTION STEP: 6299 files (with 144 predicted FPs)

| case | fp | file_count | % (total) | % (case) |
|---|---|---|---|---|
| case A | False | 4311 | 68% | 97% |
| case A | True | 134 | 2% | 3% |
| case B | False | 403 | 6% | 98% |
| case B | True | 10 | 0% | 2% |
| case D | False | 1441 | 23% | 100% |
| case D | True | 0 | 0% | 0% |
| case Z | False | 0 | 0% | 0% |
| case Z | True | 0 | 0% | 0% |

HOW MANY CORRECTIONS AND FALSE POSITIVES?

We predict we will fix 403 files out of 6299 (6%), at the cost of damaging 10 files, while leaving 134 files unfixed (2%)

TP: 403 (=403, 6% of 6299)
FP: 10 (=10, 0% of 6299)
Miss: 134 (=134+0+0, 2% of 6299)

Success metric EXCLUDING misses: 95% (success value between -100% and 100%)
Success metric INCLUDING misses: 47% (success value between -100% and 100%)
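
The success metrics are not stated as explicit formulas above; the following reading (true positives minus false positives and misses, over their sum) reproduces the reported 95% and 47% figures, so it is presumably the intended definition:

```python
def success_metric(tp: int, fp: int, miss: int = 0) -> float:
    """-100% means only FPs, 0% means 50/50, +100% means only TPs."""
    return 100.0 * (tp - fp - miss) / (tp + fp + miss)

tp, fp, miss = 403, 10, 134
print(round(success_metric(tp, fp)))        # 95  (excluding misses)
print(round(success_metric(tp, fp, miss)))  # 47  (including misses)
```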

@pestrela
Author

pestrela commented Jun 3, 2019

EyeD3

The last algorithm using eyeD3 fixes a really important use case: files encoded with the latest FFMPEG, featuring LAVC/LAVF. These are now correctly classified as "case B" instead of "case D".

In a sentence: eyeD3 0.8.4 has not been updated for LAVC/LAVF, while mp3guessenc 0.27.4 has, so they see different things for the LAME tag. Please see the examples below.

And this happens to match the TK->RB behaviour at the moment.

Implication

Implication: the situation will change when the versions change. If Traktor gets updated first, the whole problem just disappears; if eyeD3 gets updated first, we lose this current detection method.

Examples

LAME ENCODED FILE:

> mp3_check_encoder.sh */*vbr* -1 -q


file -> 1 good shift/self-encode-lame-vbr-0.mp3

case -> D

eyed3_lame_present -> yes    <<<<<<<<<<<<<<<<<
eyed3_lame_tag_valid -> yes
mp3guessenc_lame_present -> yes  <<<<<<<<<<<<<<
mp3guessenc_lame_tag_valid -> yes

raw_dd_anything -> LAME 64BITS VERSION 3.100 (HTTP://LAME.SF.NET)TPE1_XING_DLAME3.100

FFMPEG ENCODED FILE:

file -> 2 bad shift/self-encode-lavc-vbr-0.mp3

case -> B

eyed3_lame_present -> no   <<<<<<<<<<<<<<<<<
eyed3_lame_tag_valid -> unk
mp3guessenc_lame_present -> yes  <<<<<<<<<<<<<<
mp3guessenc_lame_tag_valid -> yes

raw_dd_anything -> XING_LAVC57.10

@pestrela
Author

pestrela commented Jun 5, 2019

collection of mp3 test files, including many LAME versions and VBRI:
https://github.com/JamesHeinrich/getID3-testfiles/tree/master/mp3

@pestrela
Author

pestrela commented Aug 17, 2019

latest version of mp3_check_encoder improvements:

  • now uses eyeD3 ONLY for the decision algorithm. No more mp3guessenc dependency!
  • now supports mp3-parser from this project
  • cleaned up the code a lot: it now dumps the basic decision elements (Xing tag / LAME tag / lame_valid) from each of the three tools

Using this eyeD3-only algorithm we have a single FP on the whole dataset!
(Adam Beyer, Bart Skils - Your Mind (Original Mix).mp3)

source:
https://github.com/pestrela/music_scripts/tree/master/offsets/fhg/bin

@pestrela
Author

pestrela commented Aug 17, 2019

As mentioned above, the analysis now depends on eyeD3 ONLY. What does it do differently from mp3guessenc? Let's look at the source code.

eyeD3 source code:

MPEG header:

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
line 58: the only loop, used to find the FIRST valid MPEG frame.

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
line 239: calculates the MPEG frame_length, according to the bitrate etc.
Note that any possible LAME tag outside this range is ignored!

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
line 81: reads the FIRST valid MPEG frame (identified by the MPEG sync bits)
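
As an aside, the frame length in bytes depends on the bitrate, but the frame duration does not: one MPEG-1 Layer III frame always carries 1152 samples. That is exactly where the 26ms figure comes from; at 44.1 kHz, misreading or skipping a single frame shifts everything by:

```python
samples_per_frame = 1152   # fixed for MPEG-1 Layer III
sample_rate = 44100        # Hz, the usual rate for these files

shift_ms = 1000.0 * samples_per_frame / sample_rate
print(round(shift_ms, 2))  # 26.12 -> the "26ms problem"
```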

XING/INFO header:

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
line 82: searches for the FIRST regular string "re.compile(b'Xing|Info').search(mp3_frame)"

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
line 327: adjusts byte position "pos", then confirms the Xing/Info string is on the right position
Other possible matches are ignored.

(note: eyed3 only dumps these infos on debug mode)

LAME header:

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
line 96: goes straight to try creating the lame header object

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
line 560: finds the FIRST occurrence of exactly "LAME" in the frame. Later occurrences are ignored.
This ignores possible LAVC/LAVF entries (matching the Traktor behavior).
line 569: computes the CRC at position 190. If it fails, it raises a warning (leading to case C).
In all cases the fields are populated and presented "as-is".

https://github.com/nicfit/eyeD3/blob/master/src/eyed3/plugins/lameinfo.py
line 54 of the plug-in: prints an error if no LAME tag is detected (the trigger for cases A+B)

Source Code

        try:
            pos = frame.index(b"LAME")
        except:                                                     # noqa: B901
            return
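
A minimal illustration of what that lookup implies (the byte strings below are synthetic stand-ins, not real MPEG frames): only a literal "LAME" string is found, so ffmpeg's "Lavc"/"Lavf" encoder strings go undetected and such files fall through to case B:

```python
def find_lame_tag(frame: bytes):
    """eyeD3-style: offset of the first literal b'LAME', else None."""
    try:
        return frame.index(b"LAME")
    except ValueError:
        return None

print(find_lame_tag(b"....Xing....LAME3.100...."))  # 12
print(find_lame_tag(b"....Xing....Lavc57.10...."))  # None
```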

@pestrela
Author

pestrela commented Aug 18, 2019

Same analysis for the mp3guessenc source code.
Differences from eyeD3:

  • they accept the tag "lavc", case-insensitively, as a LAME tag. See their changelog below.
    • note that they ignore the tag "lavf", which is produced by ffmpeg
  • if there is any error in the stream, they ignore any Xing tag present (this will confuse case A & case B)

mp3guessenc source code:

http://mp3guessenc.sourceforge.net/

MPEG header:

mp3guessenc.c: line 3169: main loop to process mpeg frames

XING/INFO header:

mp3guessenc.c: line 3212: we try calling checkvbrinfotag()
tags.c: line 493: checkvbrinfotag(): memcmp to find Xing/Info tags
tags.c: line 498: if yes, we now call parse_xing_tag() to try to find LAME/LAVC tags

LAME/LAVC header

tags.c: line 436: parse_xing_tag(): besides the string "LAME", it detects the "lavc" string case-insensitively as a valid LAME tag.
tags.c: line 454: parse_xing_tag(): CRC calculation

mp3guessenc.c: line 3359: call show_info_tag()
tags.c: line 522: show_info_tag(): if we find a lame/lavc tag, we print the details here

ChangeLog:

version 0.27.2 (2017/08/25 "Roundabout")

  • support for "reduced" lame tag written by Lavc when encoding layerIII
    files. The encoding engine is lame indeed, nevertheless the tool fills some
    tag fields with unreliable infos

Code:

    if (
        (buf[offset] == 'L' && buf[offset+1] == 'A' && 
        buf[offset+2] == 'M' && buf[offset+3] == 'E')
        ||
        (buf[offset] == 'L' && isdigit((int)buf[offset+1]) && 
        buf[offset+2] == '.' && isdigit((int)buf[offset+3]) && isdigit((int)buf[offset+4]))
        ||
        (tolower(buf[offset])=='l' && tolower(buf[offset+1])=='a' && 
         tolower(buf[offset+2])=='v' && tolower(buf[offset+3])=='c')
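
For contrast, here is a hedged Python rendering of that condition (with synthetic inputs): "LAME", short version strings like "L3.99", and case-insensitive "lavc" are all accepted, while "Lavf" is not:

```python
def looks_like_lame_tag(buf: bytes, offset: int = 0) -> bool:
    """mp3guessenc-style acceptance of the encoder string."""
    b = buf[offset:offset + 5]
    if b[:4] == b"LAME":
        return True
    if (len(b) == 5 and b[:1] == b"L" and b[1:2].isdigit()
            and b[2:3] == b"." and b[3:4].isdigit() and b[4:5].isdigit()):
        return True                       # short "L3.99"-style version
    return b[:4].lower() == b"lavc"       # Lavc accepted; Lavf is not

print(looks_like_lame_tag(b"LAME3.100"))  # True
print(looks_like_lame_tag(b"Lavc57.10"))  # True
print(looks_like_lame_tag(b"Lavf58.20"))  # False
```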

@pestrela
Author

pestrela commented Sep 16, 2019

I've now uploaded a small set of mp3 examples, all tagged, that cover the known corner cases.
https://github.com/pestrela/music_scripts/tree/master/traktor/26ms_offsets/examples_tagged

I have a lot more tagged examples available by PM request.

See also the hexdumps of the MP3s. This is a simple way to understand the corner cases and why there is so much variety in MP3 files:
https://github.com/pestrela/music_scripts/blob/master/traktor/26ms_offsets/examples_tagged/hexdump%20of%20all%20examples.txt

@pestrela
Author

pestrela commented Sep 16, 2019

Analysis of FFMPEG and LIBAV source code

encoder

https://ffmpeg.org/doxygen/2.7/mp3enc_8c_source.html

231     // encoder short version string
232     if (enc) {
233         uint8_t encoder_str[9] = { 0 };
234         if (   strlen(enc->value) > sizeof(encoder_str)
235             && !strcmp("Lavc libmp3lame", enc->value)) {
236             memcpy(encoder_str, "Lavf lame", 9);
237         } else
238             memcpy(encoder_str, enc->value, FFMIN(strlen(enc->value), sizeof(encoder_str)));
239 
240         avio_write(dyn_ctx, encoder_str, sizeof(encoder_str));
241     } else
242         avio_write(dyn_ctx, "Lavf\0\0\0\0\0", 9);

decoder

https://ffmpeg.org/doxygen/2.7/mp3dec_8c_source.html

227     /* Encoder delays */
228     v= avio_rb24(s->pb);
229     if(AV_RB32(version) == MKBETAG('L', 'A', 'M', 'E')
230         || AV_RB32(version) == MKBETAG('L', 'a', 'v', 'f')
231         || AV_RB32(version) == MKBETAG('L', 'a', 'v', 'c')
232     ) {
233 

@pestrela
Author

I've tested the latest version of dj-data-converter and I FINALLY have no shifts going from TK->RB.
Thanks @alzadude for making the final implementation of the recommendations of this huge research effort!

@pestrela
Author

The problem is fully fixed, but I'm re-opening the ticket to add some further follow-ups and research comments.

@pestrela
Author

Implementation is going great for MIXXX:
https://github.com/mixxxdj/mixxx/blob/8876f5b8ec9af9fd9a948f2c2aa1483b2a486e12/lib/mp3guessenc-0.27.4/tags.c

line 515:

int check_timing_shift_case(vbrtagdata_t *p)
{
    if (p->infoTag!=TAG_VBRITAG)
    {
        if (p->lametag[0]==0 || (tolower(p->lametag[0])=='l' && tolower(p->lametag[1])=='a' && tolower(p->lametag[2])=='v' && tolower(p->lametag[3])=='c'))
        {
            return EXIT_CODE_CASE_B;
        }
        if (!p->lametagVerified)
        {
            return EXIT_CODE_CASE_C;
        } 
    }

    return EXIT_CODE_CASE_D;
}
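
A hedged Python translation of check_timing_shift_case(), with the relevant vbrtagdata_t fields passed as plain arguments instead of the C struct:

```python
def check_timing_shift_case(is_vbri: bool, lametag: bytes,
                            lametag_verified: bool) -> str:
    """Mirror of the C logic above; returns the case letter."""
    if not is_vbri:
        if (not lametag or lametag[:1] == b"\x00"
                or lametag[:4].lower() == b"lavc"):
            return "B"   # empty tag or Lavc "reduced" tag: 26 ms shift
        if not lametag_verified:
            return "C"   # LAME tag present but CRC fails: 26 ms shift
    return "D"           # valid LAME tag (or VBRI): no shift

print(check_timing_shift_case(False, b"Lavc57.10", True))  # B
```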

@alza-bitz alza-bitz reopened this Oct 2, 2019
@alza-bitz alza-bitz added the bug Something isn't working label Oct 3, 2019
@alza-bitz
Contributor

Opened on 24th Jan, closed on 24th October: 9 months of effort!

This has been a complex issue that has taken many months to resolve. Special thanks go to @pestrela for his amazing analysis work, without which this fix wouldn't have been possible! 🙂

@FunctionDJ

I'm so glad I stumbled across this; it will help me a lot with this project! https://github.com/FunctionDJ/rekordbox.xml.js
