MP3 cues shifted in time going from Traktor to Rekordbox (AKA 26ms problem) #3
Hi @pestrela, thanks for reporting this. I wasn't aware of it, since I haven't encountered the issue with my own audio files yet! Is the issue intermittent depending on the file, or does it occur for every file? If it's intermittent, it could be tricky to fix, but first of all let's see if I can reproduce it. If you could attach a zip archive containing an audio file for which you've observed the issue, it might help speed up the investigation. From there I can try to reproduce it by analysing in Traktor, then using the converter and checking the result in Rekordbox. Thanks! |
Edit 30 July 2019: original request follows:
Hi, many thanks for this project. However it suffers from the "cues are shifted in time" issue that all translators face when going to/from Rekordbox/Traktor. The root cause is different definitions of the 00:00:00 time point. AFAIK only these two tools are able to fix this issue:
Could you please consider addressing this issue? I can provide demo mp3s if you don't see this issue in your own files.
Edit 30 July 2019: first reply follows:
Hi Alza, I've tested many converters - they all suffer from the same issue for this example.
|
I've now analysed 67 different files by hand and found an almost perfect pattern. If the file was encoded by LAME "3.99" or "3.99.5", the simple conversion produces shifted cues; the exception is "3.99r". For the other LAME versions and encoders, no shifted cues were seen. Please see the table below for my results so far. Python code:
To analyse the encoder of the files, I've used: https://mediaarea.net/en/MediaInfo What do you think? |
Extended the analysis to 300 files, analysed manually. For LAME 3.99 files, all of them result in shifted cues. Code and data: |
Rekord Buddy is able to correct this issue in a single go. Well done! |
Hi @pestrela, Ok I've started to look into this now, I have some interesting results! First, I actually had some LAME 3.99 & LAME 3.98 encoded files already, so I tried to reproduce the issue with those. In this case, I found that the cue shifting did not occur with any of the 3.99 and 3.98 files I tried. Second, I tried to reproduce the issue with the file you provided: https://www.dropbox.com/s/phdpvhv9s8k9u3y/demo%20shifted%20cues%20rekordbuddy2.mp3?dl=1 In this case, I found that the cue shifting did occur, but notably when I checked the encoder metadata for this particular file:
It was not LAME, but Lavf, a.k.a. libavformat (and the related libavcodec). I believe this encoder string indicates that FFMPEG was used to encode the file. Internally, libavcodec uses libmp3lame for mp3 encoding, but for this file it seems the version used is not present in the file metadata; it just states Lavf. Based on this, I then tried to reproduce the issue with Lavf and Lavc xx.xx encoded files. In this case, I found that the cue shifting issue did occur for the vast majority of files with these encoder values (although not all of them; there was at least one exception). Conclusion: my findings do support the encoder version hypothesis to some extent, however I found that a different encoder is the culprit: Lavf and/or Lavc. Next steps: our findings are different, so we need to clarify the situation there first before I can proceed. Assuming we can account for this, I would then try to work out what the shift value(s) are (in seconds), and whether the shift is constant or not. Let me know what you think! |
I've now sent you privately a link to an upload of 35x files that have a clear shift.
note: "good" files could actually be bad files with a very small shift. When I used RECU it sometimes reported marginal (but present) shifts |
Yet another program to guess the encoder: which is a wrapper around this lib: I found this program on a list of mp3 tools collected by Pulse@Pioneer (mp3 information / mp3 error checkers) |
Hi @pestrela, thanks for these. What's your thinking here - is this regarding a method of detecting the encoder for files that don't have an encoder tag (or where the encoder tag is empty)? I'll call these "unknown files". I assume this is your focus, since unknown files are the biggest proportion of files in your dataset of 300 (although LAME files are a close second), and the proportion with the biggest number of shifted cues? However, it's worth noting that although this proportion has the biggest number of shifted cues, it's not the proportion with the biggest percentage of shifted cues - that goes to Lavf/Lavc:
Based on the above, I am thinking that the % numbers are the most helpful indicator for determining what to do next. Although the number of Lavf/Lavc files in your dataset is comparatively small, the percentage result for those does correlate somewhat with my findings. My current thinking for a solution is to implement a "blacklist lookup table", which would map For example (shift values are just made up):
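The proposed "blacklist lookup table" could be sketched as a plain mapping from (source, target, encoder prefix) to a shift. As in the comment above, every encoder name and shift value here is a made-up placeholder, not a measured result:

```python
# Sketch of the proposed "blacklist lookup table": it maps
# (source, target, encoder-prefix) -> cue shift in milliseconds.
# All encoder names and shift values below are hypothetical
# placeholders, mirroring the "made up" values mentioned above.
SHIFT_TABLE_MS = {
    ("traktor", "rekordbox", "Lavf"): 26,
    ("traktor", "rekordbox", "Lavc"): 26,
}

def lookup_shift_ms(source: str, target: str, encoder: str) -> int:
    """Return the shift to apply for a conversion, or 0 if the
    encoder is not blacklisted."""
    for (src, tgt, prefix), shift in SHIFT_TABLE_MS.items():
        if src == source and tgt == target and encoder.startswith(prefix):
            return shift
    return 0
```

For example, `lookup_shift_ms("traktor", "rekordbox", "Lavf57.83.100")` would return 26, while any encoder not in the table returns 0 (no correction).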
I am assuming that for a given conversion (source -> target, encoder), the shift is a fixed value (this could be verified using a random sample of files for each encoder). Of course, this solution doesn't consider unknown files.. some options for those:
There could also be a command-line option to override whether they are included or not. For unknown files, mp3guessenc might be helpful to determine the encoder (I've used it before), but unfortunately there doesn't seem to be a build/version for Mac OSX, which is a show-stopper in any case. What do you think? |
Today I tried the following experiment: identify the precise sample of the 0:0:0 point of each DJ software. Method: a description of the test procedure, inputs and all outputs is in this zip: https://www.dropbox.com/s/pgpnrw4sl3xv2tp/DAW%20shifted%20cues.zip?dl=0 Results:
Example:
|
Maybe found a hint in the Rekordbox release notes.
|
Gapless encoding is detectable using byte $AF of the full LAME mp3 info tag; eyeD3 -P lameinfo displays --nogap. However this doesn't match my current dataset:
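A minimal sketch of reading that flags byte. Per the published LAME info-tag layout, byte $AF of the first MPEG frame sits 0x13 bytes past the start of the 9-byte "LAME..." encoder string; its upper nibble holds the encoding flags (including the two --nogap bits) and its lower nibble the ATH type. Treat the bit positions as an assumption from that spec, not verified against every LAME version:

```python
# Decode the LAME info-tag flags byte (offset $AF in the first MPEG
# frame, i.e. 0x13 bytes past the "LAME" encoder string). Bit layout
# is an assumption from the published LAME tag description.
def decode_lame_flags(flags_byte: int) -> dict:
    return {
        "ath_type":    flags_byte & 0x0F,
        "nspsytune":   bool(flags_byte & 0x10),
        "nssafejoint": bool(flags_byte & 0x20),
        "nogap_next":  bool(flags_byte & 0x40),  # --nogap: a track follows
        "nogap_prev":  bool(flags_byte & 0x80),  # --nogap: continuation
    }

def lame_flags_from_frame(frame: bytes):
    """Find the 'LAME' encoder string in the first frame and decode
    the flags byte 0x13 bytes further on; None if no LAME tag."""
    pos = frame.find(b"LAME")
    if pos < 0:
        return None
    return decode_lame_flags(frame[pos + 0x13])
```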
|
Hi @pestrela, Thanks for continuing the investigation.
Just to clarify, you're saying that the shift is different, even for files with the same encoder? This is contrary to the hypothesis in the video above, cited as root cause: https://www.youtube.com/watch?v=Vl4nbvYmiP4
Just to clarify, you're saying that the 2nd, 3rd or 4th load of a given file in Rekordbox will have an additional shift, compared with the 1st load of the file? Although it's small, i.e. 2ms as you say. I wonder if this is just a related, but separate Rekordbox peculiarity that can be ignored (since it's only 2ms). Re: gapless encoding, my conclusion based on the results in your other comment, is that it's not related, it's just a coincidence due to the similar values 24/26ms vs 29ms. |
This comment was because sample1 and sample2, which have the same encoder, would have different offsets according to the above method.
yes. |
Moving forward, I think we should recreate parts of what RECU does, to get proper statistics on all the offsets from a whole collection. RECU is a tool that takes 2 files:
it then matches the files on the first beat, computes the offset, and applies that offset to all cues of the converted XML
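The RECU-like step just described can be sketched in a few lines: measure the offset between the two programs' beatgrids at the first beat, then apply it to every cue. Positions are in seconds; the names are hypothetical, not RECU's actual API:

```python
# Minimal sketch of the RECU-like correction: compute the offset at
# the first beat, then shift every cue of the converted XML by it.
def compute_offset(traktor_first_beat: float, rekordbox_first_beat: float) -> float:
    """Offset between the two beatgrids, measured at the first beat."""
    return rekordbox_first_beat - traktor_first_beat

def apply_offset(cue_positions, offset: float):
    """Shift every cue position (seconds) by the measured offset."""
    return [pos + offset for pos in cue_positions]
```

With a 26 ms shift, `compute_offset(0.500, 0.526)` gives `0.026`, and `apply_offset([10.0, 65.2], 0.026)` moves every cue by that amount.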
|
Hi @pestrela,
Ok I can see how this would be useful. Then we can cross-reference the shifts with other info e.g. encoder, to see if there's a pattern?
Ok so the process could be:
One issue I can see with the above process is that step 2 could take a long time for a large collection. For example, my collection is ~10,000 tracks...
I was thinking I'd just take the |
I see two different use cases for this effort:
Indeed, this is how RECU works
We can match the files exactly by filenames.
We should take the Rekordbox inizio closest to the Traktor one, and reduce both to the same beat using the simple function from the last post. This was an issue in RECU, which required the exact same beat to be marked. An example will make this clear:
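A hypothetical sketch of the "reduce both to the same beat" step (not the original example): the Traktor and Rekordbox anchors may sit on different beats of the same grid, so remove whole-beat multiples of the difference before reading off the residual shift:

```python
# Fold the difference between two beatgrid anchors to within half a
# beat of zero, so anchors marked on different beats still yield the
# true sub-beat shift. Function name and signature are hypothetical.
def residual_offset(tk_inizio: float, rb_inizio: float, bpm: float) -> float:
    """Offset (seconds) between two beatgrid anchors after removing
    whole beats."""
    period = 60.0 / bpm
    diff = rb_inizio - tk_inizio
    whole_beats = round(diff / period)
    return diff - whole_beats * period
```

At 120 BPM (0.5 s per beat), anchors of 0.100 s and 1.126 s differ by two whole beats plus 26 ms, and `residual_offset(0.100, 1.126, 120)` recovers the 0.026 s shift.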
|
Hi @pestrela,
Agreed, I'm currently working on a separate mini-app to get the offset based on two Rekordbox files. I'll post the results when I have them.
Ok let's see what the stats tell us first. It would be good to avoid the post-correction like RECU, because even if we have a method that avoids manually marking the first beat (tempo inizio), users will still be required to do a full analysis in Rekordbox which isn't ideal.
I was referring to the analysis time in Rekordbox when starting from an empty collection and adding the music folders, in order to export the
I'm not 100% sure about the correctness of |
Agreed that a RECU-like step depending on Rekordbox analysis is slow and cumbersome. Regarding slowness: the Rekordbox analysis is always required; it happens anyway when the user imports the converted XML |
Trying to guess which decoding library the DJ software uses:
some interesting comments from library maintainers:
|
In a really interesting development, some users started seeing this issue when upgrading TP2 collections to TP3. Mentioned patterns were locked files ON and multi-processing OFF. It would be very useful to replicate this issue using Traktor alone.
|
Hi @pestrela, thanks for the updates. I was thinking of including the Traktor and Rekordbox version numbers in the analysis, since the decoders used might change between versions, affecting the results. I've completed my initial analysis using the offset algorithm above, comparing Traktor and Rekordbox data. The code I wrote to produce the data is in a new project here: https://github.com/digital-dj-tools/dj-data-offset-analysis The ETL happens in two steps:
Please see the sheet here, for the raw offset data, the calculated stats and the included candlestick chart: https://docs.google.com/spreadsheets/d/1uTBJSNc7zB2dN05LMkMORbxP4HxN7wc15MtoYAH6Qv0/edit?usp=sharing Points of interest:
Please let me know your thoughts and opinions on these results. Thanks, Alex. |
Hi, far below is the same data as CDFs, broken down by encoder version.
script: https://github.com/pestrela/music_scripts/blob/master/offsets/offset-encoder.py I'm now wondering how much noise the TK and RB beatgrid analysis algorithms introduce. I'm currently travelling; I will later analyse my collection and the hand-tagged dataset as well (good shift / bad shift). |
Hi @pestrela,
I have a few questions and thoughts on this:
Also, a few other updates:
Let me know! Thanks, Alex. |
I've now made CDFs for your whole collection of 8335 files, zoomed to both 50ms and 500ms. Some comments:
|
QUESTION 2: how many of these files are fixed correctly, and how many are we going to damage?
Answer: we predict we will successfully fix 463 files, but will damage 393 in the process (because of FPs). Why? The problem is the false positives that we have.
To find this value I've convolved the last two tables:
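That table "convolution" amounts to multiplying, per MP3 header case, the collection file counts by the FP/TP rates measured on the hand-tagged training set. The counts and rates below are illustrative stand-ins, not the actual tables from this thread:

```python
# Per-case file counts in the full collection (illustrative numbers)
collection_counts = {"A": 3000, "B": 500, "C": 400, "D": 2475}
# Per-case false-positive rate measured on the training set
training_fp_rate = {"A": 0.00, "B": 0.05, "C": 0.00, "D": 0.15}
# Header cases the algorithm decides to shift
corrected_cases = {"B", "D"}

# Predicted damage (FPs) and predicted successful fixes (TPs)
predicted_fp = sum(collection_counts[c] * training_fp_rate[c]
                   for c in corrected_cases)
predicted_tp = sum(collection_counts[c] * (1 - training_fp_rate[c])
                   for c in corrected_cases)
```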
|
SUMMARY
Hopefully showing all results in sequence explains clearly what is going on between TK->RB.
MP3 HEADER CASES
TRAINING DATASET: 99 files
MANUAL TRAINING STEP: 99 files (with 7 measured FPs)
COLLECTION DATASET: 6375 files
PREDICTION STEP: 6375 files (with 393 predicted FPs)
HOW MANY CORRECTIONS AND FALSE POSITIVES?
We predict we will successfully fix 463 files, but will damage 393 files in this process from FPs. TP: 463 files (=386+77, 7% of 6375); FP: 393 files (=134+27+0+232, 6% of 6375) |
About the "case D" false positives from the training dataset:
Using this new criterion to split case D would raise the "success metric" quite a bit
NEW ALGORITHM USING EYED3
I've now reduced the false positives a lot for "case D" by doing the following:
Doing these exact steps, using this exact sequence and exact versions, predicts very closely what will happen between TK->RB:
This is because I'm now finding all kinds of exceptions where mp3s have both LAVC and LAME tags, null streams, etc.
DETAILS
MP3 HEADER CASES
note: XING is measured by "mp3guessenc -v"; LAME is measured by "eyeD3 -P lameinfo"
Current algorithm
TRAINING DATASET: 99 files
MANUAL TRAINING STEP: 99 files (with 2 measured FPs)
WHOLE COLLECTION DATASET: 6299 files
PREDICTION STEP: 6299 files (with 144 predicted FPs)
HOW MANY CORRECTIONS AND FALSE POSITIVES?
We predict we will fix 403 files out of 6299 (6%), at the cost of damaging 10 files, while leaving 134 files unfixed (2%). TP: 403 (=403, 6% of 6299). Success metric EXCLUDING misses: 95% (success value between -100% and 100%) |
EyeD3
The last algorithm using eyeD3 fixes a really important use case: the latest FFMPEG-encoded files featuring LAVC/LAVF. These are now correctly classified as "case B" instead of "case D". In a sentence: because eyeD3 0.8.4 was not updated for LAVC/LAVF, and mp3guessenc 0.27.4 was updated, they see different things for the LAME tag. Please see the examples below. And somehow this matches the TK->RB experience at the moment.
Implication
The situation will change when the versions change: if Traktor gets updated first, the whole problem just disappears; if eyeD3 gets updated first, we lose this current detection method.
Examples
LAME ENCODED FILE:
FFMPEG ENCODED FILE:
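The tool-dependent split described above can be sketched as a pure decision function. The inputs stand in for what "mp3guessenc -v" (Xing/Info tag) and "eyeD3 -P lameinfo" (LAME tag) report; the exact case letters come from the (elided) table in this thread, so treat this mapping as an assumption for illustration:

```python
# Rough sketch of the MP3 header-case split: the case letters and
# their definitions are an assumption based on this thread, not a
# verified reproduction of the original table.
def classify_case(xing_present: bool, eyed3_sees_lame: bool) -> str:
    if eyed3_sees_lame:
        return "B"  # LAME tag visible to eyeD3: encoder delay known
    if xing_present:
        return "D"  # Xing/Info tag only: candidate for correction
    return "A"      # no VBR info tag at all
```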
|
collection of mp3 test files, including many LAME versions and VBRI: |
latest version of mp3_check_encoder improvements:
Using this eyeD3-only algorithm we have a single FP on the whole dataset! Source:
As mentioned above, the analysis depends on eyeD3 ONLY. What does it do differently than mp3guessenc? Let's see the source code.
eyeD3 source code:
MPEG header:
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
XING/INFO header:
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
(note: eyeD3 only dumps these infos in debug mode)
LAME header:
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/__init__.py
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/mp3/headers.py
https://github.com/nicfit/eyeD3/blob/master/src/eyed3/plugins/lameinfo.py
Source Code
|
Same analysis for the mp3guessenc source code.
mp3guessenc source code:
http://mp3guessenc.sourceforge.net/
MPEG header:
mp3guessenc.c: line 3169: main loop to process mpeg frames
XING/INFO header:
mp3guessenc.c: line 3212: we try calling checkvbrinfotag()
LAME/LAVC header:
tags.c: line 436: parse_xing_tag(): besides the string "LAME", we detect the "lavc" string case-insensitively as a valid LAME tag.
mp3guessenc.c: line 3359: call show_info_tag()
ChangeLog:
Code:
|
I've now uploaded a small set of mp3 examples, all tagged, covering the known corner cases. I have a lot more tagged examples available by PM request. See also the hexdumps of the MP3s. This is a simple way to understand the corner cases and why there is so much variety in MP3 files: |
Analysis of FFMPEG and LIBAV source code
encoder:
https://ffmpeg.org/doxygen/2.7/mp3enc_8c_source.html
decoder:
https://ffmpeg.org/doxygen/2.7/mp3dec_8c_source.html
|
I've tested the latest version of the dj-data-converter and I FINALLY now have no shifts going from TK->RB |
The problem is fully fixed; |
Implementation is going well for Mixxx. Line 515:
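For context on the magic number in this issue's title: the ~26 ms shift matches exactly one MPEG-1 Layer III frame, which is 1152 samples; at 44.1 kHz a decoder that keeps or drops one extra frame of encoder priming moves every cue by this amount. This can be checked in a couple of lines:

```python
# One MPEG-1 Layer III frame is 1152 samples; at 44.1 kHz that is
# about 26.12 ms, matching the observed ~26 ms cue shift.
SAMPLES_PER_FRAME = 1152
SAMPLE_RATE_HZ = 44100

frame_ms = SAMPLES_PER_FRAME / SAMPLE_RATE_HZ * 1000
print(round(frame_ms, 2))  # → 26.12
```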
|
Opened on 24th January, closed on 24th October... 9 months of effort! This has been a complex issue that took many months to resolve. Special thanks go to @pestrela for his amazing analysis work, without which this fix wouldn't have been possible! 🙂 |
I'm so glad I stumbled across this; it will help me a lot with this project! https://github.com/FunctionDJ/rekordbox.xml.js |
UPDATE SEP 2019 - FINDINGS SUMMARY:
LINKS:
ALGORITHM: (updated: 16 Sep 2019)
EXAMPLE:
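A minimal sketch of the correction implied by this summary: once a file is classified as shifted, move every cue by the fixed 26 ms. The function name and the sign convention are assumptions; the direction may need flipping depending on which program the cues come from:

```python
# Apply the fixed ~26 ms shift to cue positions (seconds) when the
# file was classified as shifted. Names and the sign convention are
# hypothetical; verify the direction per conversion.
SHIFT_S = 0.026

def correct_cues(cue_positions, is_shifted: bool):
    """Return cue positions with the fixed shift applied when the
    file was classified as shifted; unchanged otherwise."""
    if not is_shifted:
        return list(cue_positions)
    return [pos + SHIFT_S for pos in cue_positions]
```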