Unfortunately, I was never able to finish this project. Fortunately, James Elliott has produced some fantastic documentation on the subject. Check out his work over at the crate-digger repo.
This document attempts to describe the format of the Pioneer .pdb database file, which can be found on removable storage devices that have been synced with RecordBox. The .edb file, found on the host RecordBox system has a different format, though the database schema seems to be the same.
The .pdb file is a relational database. It’s data is organised into tables of rows and columns. For example, a typical .pdb file will have have a table containing rows that represent tracks, and another table that contains rows representing artists. Each row in a table has a unique id, and the data for each row is stored in the same layout, according to the table schema.
The file consists of fixed size blocks of 4096 bytes. I have not yet managed to decode the data in the first block, although the 4 byte integer at 0x04 is always 4096, so this probably defines the block size. Integers are stored in little endian format.
The remaining blocks each consist of a 40 byte header, followed by a body of row data. At the end of the block, there is a variable sized footer. The rows within a block all seem to belong to the same table, though a block might not contain all of the rows for a table, since the block has a fixed size. Therefore, large tables will be split across multiple blocks.
Here’s what I’ve decoded from the 40 byte header so far:
Location | Data type | Description |
---|---|---|
0x00 - 0x03 | Zero? | |
0x04 | uint32 | Block id / index |
0x08 - 0x17 | Unknown | |
0x18 | uint8 | Number of rows in block |
0x19 | uint8 | Next row id? |
0x1a - 0x1b | Unknown | |
0x1c | uint16 | Remaining bytes in block |
0x1e | uint16 | Size of data in block |
0x20 - 0x27 | Unknown |
The bytes at 0x20 and 0x22 tend to add up to the number of rows in the block.
The block footer contains the locations of the rows within the block. These are stored as uint16. The footer ends with four bytes which I have not yet decoded.
For example, in a block that contains 4 rows, the footer will look something like this:
+---------------------------------------------+
| 54 00 | 3C 00 | 1C 00 | 00 00 | 1F 00 10 00 |
+-------+-------+-------+-------+-------------+
| Row 4 | Row 3 | Row 2 | Row 1 | Unknown |
+---------------------------------------------+
This tells us that the first row is located at byte offset 0, and the second row at 28 etc. These locations are relative to the block body, so we need to add 40 (header size) to get the location relative to the start of the block.
Each row can be broken into three sections; a four byte header, followed by the fixed size column data, followed by variable sized string data.
The first two bytes of the header indicate the table that the row belongs to. So each table has a unique two byte id. The next two bytes seem to represent an index, or an id for rows within just that block; The first row in a block will have value 0x00, and successive rows will increase by 0x20 each time.
The second section contains the actual column data. The data is always stored in the same order for rows in the same table, so if you know the table schema, you can easily locate the value for a given column. Whilst this section does not contain actual string data, it does provide the location of strings, relative to the start of the row.
The final section contains the actual string data (if there are any strings in the table schema). Each string starts with one byte indicating the number of bytes to follow. However, this length byte seems to be coded in a strange way (or am I missing something obvious?). To get the actual length, take the byte value, subtract 1, divide by 2, and then subtract 1 again.
Let’s look at an example for getting the name of a track. We know that, according to the track table schema, the two bytes at 0x80 hold the location of the track name string. This value may be 0xde00, in which case we then read the byte at that offset, which is the length byte. If say, this byte has a value of 0x25 (37), then we calculate (37-1)/2 - 1 = 17. So we read the next 17 bytes as a UTF8 string.
I can’t seem to find an index for looking up the location of rows, and rows don’t seem to have any data indicating the total row size. Therefore, to iterate over rows, you must first seek to the end of the current row (by finding the last string and it’s size). Rows are not tightly packed, and tend to be padded with null bytes (Need to confirm this as I think I had one instance where these bytes were not zero). Once you have found the end of a row, you can keep reading bytes whilst they are zero, or keep reading until you find a valid row header.
Also, you will find that there are often duplicate rows. If you edit data for a track in Recordbox, it will often add a new row to the .pdb file rather than modifying the existing one. This means that, in order to get a row, you have to find all of the rows with the same id first, and then use the last one.
Location | Data type | Description |
---|---|---|
0x00 | uint16 | Table ID = 0x24 |
0x02 | uint16 | Count / Row ID (increases by 0x20) |
0x04 - 0x07 | Unknown | |
0x08 | uint32 | Sample Rate |
0x0c | uint32 | Composer (Artist ID) |
0x10 | uint32 | File size (bytes) |
0x14 | uint32 | Track ID |
0x18 - 0x23 | Unknown | |
0x24 | uint32 | Original artist (Artist ID) |
0x28 - 0x2b | Unknown | |
0x2c | uint32 | Remixer (Artist ID) |
0x30 | uint32 | Bitrate (kbps) |
0x34 | uint32 | Track number |
0x38 - 0x3f | Unknown | |
0x40 | uint32 | Album (Album ID) |
0x44 | uint32 | Artist (Artist ID) |
0x48 - 0x4b | Unknown | |
0x4c | uint16 | Disc number |
0x4e | uint16 | Play count |
0x50 - 0x53 | Unknown | |
0x54 | uint16 | Duration (seconds) |
0x56 - 0x5d | Unknown | |
0x5e | uint16 | ? (string location) |
0x60 | uint16 | Lyricist (string location) |
0x62 | uint16 | ? (string location) |
0x64 | uint16 | ? (string location) |
0x66 | uint16 | ? (string location) |
0x68 | uint16 | KUVO (string location) |
0x6a | uint16 | Public (string location) |
0x6c | uint16 | Autoload HotCue (string location) |
0x6e | uint16 | ? (string location) |
0x70 | uint16 | ? (string location) |
0x72 | uint16 | Date (string location) |
0x74 | uint16 | ? (string location) |
0x76 | uint16 | Mix Name (string location) |
0x78 | uint16 | ? (string location) |
0x7a | uint16 | DAT file (string location) |
0x7c | uint16 | Date (string location) |
0x7e | uint16 | Comment (string location) |
0x80 | uint16 | Track Name (string location) |
0x82 | uint16 | ? (string location) |
0x84 | uint16 | File name (string location) |
0x86 | uint16 | File path (string location) |
Todo...
Todo...
See example-parse for a very basic example of how to parse a pdb file.