2024-05-01 MP3 Detection improvements #63
Merged
Closes #32
MP3s are a strange beast: many bits have been grafted on over the decades, and the word 'standard' requires a big ⭐ next to it when talking about them.
To get a higher (and hopefully definitive) match I've added versioned main fingerprints and a lot of multi-match data (seriously, loads). This should match pretty much any MP3 you come across with a confidence of 0.8 (assuming a correct extension), beating false `.koz` matches into the dirt. I left the non-versioned `.mp3` match in and added a `TAG` multi match to allow for fringe cases.
The .json has grown somewhat in file size to accommodate these matches. Technically we could strip some of the 4-letter matches from 2.3 if I could find which ones did not apply until 2.4; however, there is little data on exactly which ones were added. To ensure the best confidence I also had to duplicate the 4-letter matches for both v2.3 and v2.4. It would be possible to sacrifice some of the more obscure 3/4-letter matches too, but as there is no set rule for the ordering of the frame headers, there is the potential for fringe cases where a rarely used one comes first.
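To illustrate the ordering concern, here is a minimal Python sketch. It is not the project's matching code: `first_frame_id`, `frame_matches`, `COMMON_FRAMES` and `OBSCURE_FRAMES` are hypothetical names, the sets are a tiny excerpt of the real frame codes, and it assumes the first frame starts at byte 10 (directly after the 10-byte ID3v2 header, with no extended header).

```python
# Hypothetical excerpts of the 4-letter frame codes (the real list is far longer).
COMMON_FRAMES = {"TIT2", "TPE1", "TALB", "TRCK"}
OBSCURE_FRAMES = {"PRIV", "UFID", "GEOB"}


def first_frame_id(data: bytes) -> str:
    """Return the 4 bytes at offset 10 as text (assumes no extended header)."""
    return data[10:14].decode("ascii", errors="replace")


def frame_matches(data: bytes, known: set) -> bool:
    """True if the first frame ID is one of the known codes."""
    return first_frame_id(data) in known


# Synthetic v2.3 tag whose first frame happens to be the rarely used PRIV frame.
frame = b"PRIV" + b"\x00\x00\x00\x0a" + b"\x00\x00" + b"x" * 10  # id, size, flags, body
header = b"ID3\x03\x00" + b"\x00" + b"\x00\x00\x00\x14"          # version, flags, tag size
sample = header + frame

print(frame_matches(sample, COMMON_FRAMES))                   # trimmed set misses it
print(frame_matches(sample, COMMON_FRAMES | OBSCURE_FRAMES))  # full set catches it
```

With the trimmed set the frame check fails even though the file is a valid MP3 tag, which is the fringe case that keeping the obscure codes guards against.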
Main fingerprints:
- `0x4944330200` / `ID3` = ID3v2.2.0, Rare
- `0x4944330300` / `ID3` = ID3v2.3.0, Common
- `0x4944330400` / `ID3` = ID3v2.4.0, Common

Multi-Part fingerprints:
- `0x544147` / `TAG` = v1.x tag marker at -128 bytes, if a file has tags it will have this. NOTE: It is possible for a v2 file to not have v1 tags but unlikely.
- AENC, APIC, ASPI, COMM, COMR, ENCR, EQU2, ETCO, GEOB, GRID, LINK, MCDI, MLLT, OWNE, PRIV, PCNT, POPM, POSS, RBUF, RVA2, RVRB, SEEK, SIGN, SYLT, SYTC, TALB, TBPM, TCOM, TCON, TCOP, TDEN, TDLY, TDOR, TDRC, TDRL, TDTG, TENC, TEXT, TFLT, TIPL, TIT1, TIT2, TIT3, TKEY, TLAN, TLEN, TMCL, TMED, TMOO, TOAL, TOFN, TOLY, TOPE, TOWN, TPE1, TPE2, TPE3, TPE4, TPOS, TPRO, TPUB, TRCK, TRSN, TRSO, TSOA, TSOP, TSOT, TSRC, TSSE, TSST, TXXX, UFID, USER, USLT, WCOM, WCOP, WOAF, WOAR, WOAS, WORS, WPAY, WPUB, WXXX = 4 letter frame codes at byte 10 for 2.3/2.4 files
- BUF, CNT, COM, CRA, CRM, ETC, EQU, GEO, IPL, LNK, MCI, MLL, PIC, POP, REV, RVA, SLT, STC, TAL, TBP, TCM, TCO, TCR, TDA, TDY, TEN, TFT, TIM, TKE, TLA, TLE, TMT, TOA, TOF, TOL, TOR, TOT, TP1, TP2, TP3, TP4, TPA, TPB, TRC, TRD, TRK, TSI, TSS, TT1, TT2, TT3, TXT, TXX, TYE, UFI, ULT, WAF, WAR, WAS, WCM, WCP, WPB, WXX = 3 letter frame codes used by v2.2 files

Test file:
This is a weird one I found in a corner of a drive. It would not have matched as a `.koz` as it's a v2.2 file, but equally it would have had a low-confidence match as it had no tags. Using the additional 3-letter frame match you'll get a solid match. The output comes from my own confidence test script so I can easily see/test patterns.
congratulations.zip
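
As a rough illustration of how the fingerprints listed in this description apply to a file like this one, the sketch below checks the three signals on a synthetic buffer: the versioned 5-byte header, the `TAG` marker at -128 bytes, and a 3-letter frame code at byte 10. It is an assumption-laden sketch, not the confidence test script mentioned above: `sniff`, `ID3V2_VERSIONS` and `V22_FRAMES` are hypothetical names, and only a handful of the v2.2 codes are included.

```python
# Main fingerprints from this description, keyed by their raw bytes.
ID3V2_VERSIONS = {
    bytes.fromhex("4944330200"): "ID3v2.2.0",
    bytes.fromhex("4944330300"): "ID3v2.3.0",
    bytes.fromhex("4944330400"): "ID3v2.4.0",
}

# Hypothetical excerpt of the 3-letter v2.2 frame codes.
V22_FRAMES = {"TT2", "TP1", "TAL", "TRK"}


def sniff(data: bytes) -> dict:
    """Report which of the three signals are present in the buffer."""
    version = ID3V2_VERSIONS.get(data[:5])
    # An ID3v1 tag is a 128-byte trailer that starts with "TAG".
    has_v1_tag = len(data) >= 128 and data[-128:-125] == b"TAG"
    # Only v2.2 uses 3-letter frame IDs; assume the first frame is at byte 10.
    frame = data[10:13].decode("ascii", errors="replace") if version == "ID3v2.2.0" else None
    return {
        "version": version,
        "has_v1_tag": has_v1_tag,
        "v22_frame_match": frame in V22_FRAMES,
    }


# Synthetic v2.2 file: header, one TT2 frame, filler bytes, and no v1 trailer.
frame = b"TT2" + b"\x00\x00\x06" + b"\x00Hello"          # id, 3-byte size, body
header = b"ID3\x02\x00" + b"\x00" + b"\x00\x00\x00\x0c"  # version, flags, tag size
untagged = header + frame + b"\x00" * 200

print(sniff(untagged))
print(sniff(untagged + b"TAG" + b"\x00" * 125))  # same file with a v1 trailer appended
```

For the first buffer the `TAG` check fails, so the 3-letter frame code is the only multi-part signal left to back up the v2.2 header, which mirrors the situation described for the test file.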
Example matches:
Links: