Rust port of the Stockfish binpack reader from the C++ version.
Binpacks store chess positions and their evaluations in a compact format. Instead of storing complete positions, they store the differences between moves. This makes them very space efficient - using only 2.5 bytes per position on average.
If your machine has the fast BMI2 instruction set (Zen 3+), you should enable the bmi2 feature flag when building:
cargo build --release --features bmi2
or enable it in your Cargo.toml file (adjust the version as needed):
[dependencies]
sfbinpack = { version = "0.1.0", features = ["bmi2"] }
Run the following Cargo command in your project directory:
cargo add sfbinpack
use sfbinpack::CompressedTrainingDataEntryReader;

fn main() {
    let mut reader = CompressedTrainingDataEntryReader::new(
        "test80.binpack", // path to file
    )
    .unwrap();

    while reader.has_next() {
        let entry = reader.next();

        println!("entry:");
        println!("fen {}", entry.pos.fen());
        println!("uci {:?}", entry.mv.as_uci());
        println!("score {}", entry.score);
        println!("ply {}", entry.ply);
        println!("result {}", entry.result);
        println!("\n");

        // progress percentage
        // let percentage = reader.read_bytes() as f64 / reader.file_size() as f64 * 100.0;
    }
}
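The commented-out progress line in the example can be wrapped in a small helper. The sketch below is only an illustration: progress_percent is a hypothetical name, not part of the crate, and it takes plain u64 values of the kind read_bytes() and file_size() appear to provide in the example above.

```rust
/// Hypothetical helper (not part of sfbinpack): converts a byte position
/// and a total size into a percentage.
fn progress_percent(read_bytes: u64, file_size: u64) -> f64 {
    if file_size == 0 {
        // Avoid dividing by zero; an empty file is treated as fully read.
        return 100.0;
    }
    read_bytes as f64 / file_size as f64 * 100.0
}

fn main() {
    // Stand-in values; in real code these would come from the reader.
    let read_bytes: u64 = 50;
    let file_size: u64 = 200;
    println!("{:.1}%", progress_percent(read_bytes, file_size));
}
```

Printing the percentage for every entry would dominate the runtime, so in practice it is usually printed only every few million entries.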
More examples can be found in the examples directory.
If you count entries, use a u64 for the counter: a large binpack can hold more positions than a u32 can represent.
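To illustrate the advice on counter width, here is a minimal sketch that does not depend on the crate. count_entries is a hypothetical helper that counts the items of any iterator with a u64 accumulator; the main function shows where a u32 counter would stop working.

```rust
/// Hypothetical helper: counts the items of any iterator with a u64 accumulator.
fn count_entries<I: Iterator>(entries: I) -> u64 {
    let mut count: u64 = 0;
    for _ in entries {
        count += 1;
    }
    count
}

fn main() {
    // A u32 counter cannot go past 4_294_967_295.
    let at_limit: u32 = u32::MAX;
    assert_eq!(at_limit.checked_add(1), None);

    // The same value widened to u64 has plenty of headroom.
    let widened: u64 = u64::from(at_limit) + 1;
    println!("u32::MAX + 1 as u64 = {}", widened);

    // Stand-in iterator; with the reader this would be the entry loop.
    println!("counted {}", count_entries(0..1000));
}
```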
When compressing new data, store the entire continuation of the actual game.
This allows a much better compression ratio.
Otherwise, the resulting file will be larger than with alternative formats.
This port is slightly faster than the upstream version when compiled with the bmi2 feature, because it uses a _pdep_u64 trick that upstream lacks.
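For readers unfamiliar with _pdep_u64: it is the BMI2 parallel bit deposit instruction, which scatters the low bits of a source value onto the set-bit positions of a mask. One well-known use is finding the n-th set bit of a 64-bit mask in a single step. The sketch below shows that idea with a portable software fallback. Whether the crate uses pdep in exactly this way is an assumption; pdep_soft, pdep and nth_set_bit are hypothetical names chosen for illustration.

```rust
/// Portable software version of parallel bit deposit: the low bits of `src`
/// are placed, in order, at the positions of the set bits of `mask`.
fn pdep_soft(src: u64, mut mask: u64) -> u64 {
    let mut result = 0u64;
    let mut bit = 1u64; // selects the next low bit of `src`
    while mask != 0 {
        let lowest = mask & mask.wrapping_neg(); // isolate lowest set bit of mask
        if src & bit != 0 {
            result |= lowest;
        }
        mask &= mask - 1; // clear lowest set bit of mask
        bit <<= 1;
    }
    result
}

/// Uses the hardware instruction when the build enables BMI2.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
fn pdep(src: u64, mask: u64) -> u64 {
    // SAFETY: this function is only compiled when BMI2 is enabled for the target.
    unsafe { std::arch::x86_64::_pdep_u64(src, mask) }
}

/// Falls back to the software loop everywhere else.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
fn pdep(src: u64, mask: u64) -> u64 {
    pdep_soft(src, mask)
}

/// Index of the n-th (0-based) set bit of `mask`, or None if there are fewer.
fn nth_set_bit(mask: u64, n: u32) -> Option<u32> {
    if n >= mask.count_ones() {
        return None;
    }
    // Depositing a single bit at rank n lands on the n-th set bit of the mask.
    Some(pdep(1u64 << n, mask).trailing_zeros())
}

fn main() {
    let mask = 0b11010u64; // set bits at indices 1, 3 and 4
    for n in 0..4 {
        println!("set bit #{} -> {:?}", n, nth_set_bit(mask, n));
    }
}
```

Without BMI2 the fallback loops once per set bit in the mask, which is why the hardware instruction gives a speedup on CPUs where it is fast.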
The extended Backus-Naur form (EBNF) of the binpack format is as follows:
(* BINP Format EBNF Specification *)
File = { Block } ;
Block = ChunkHeader , { Chain } ;
ChunkHeader = Magic , ChunkSize ;
Magic = '"BINP"' ;
ChunkSize = UINT32LE ; (* 4 bytes, little endian *)
Chain = Stem , Count , MoveText ;
Stem = Position , Move , Score , PlyResult , Rule50 ;
Count = UINT16BE ; (* 2 bytes, big endian *)
MoveText = { MoveScore } ;
(* Stem components - total 32 bytes *)
Position = CompressedPosition ; (* 24 bytes *)
Move = CompressedMove ; (* 2 bytes *)
Score = INT16BE ; (* 2 bytes, big endian, signed *)
PlyResult = UINT16BE ; (* 2 bytes, big endian, unsigned *)
Rule50 = UINT16BE ; (* 2 bytes, big endian *)
(* MoveText components *)
MoveScore = EncodedMove , EncodedScore ;
(* Encoded components *)
EncodedMove = VARLEN_UINT ; (* Variable length encoding *)
EncodedScore = VARLEN_INT ; (* Variable length encoding *)
(* Terminal symbols *)
UINT32LE = ? 4-byte unsigned integer in little-endian format ? ;
UINT16BE = ? 2-byte unsigned integer in big-endian format ? ;
INT16BE = ? 2-byte signed integer in big-endian format ? ;
UINT8 = ? 1-byte unsigned integer ? ;
VARLEN_UINT = ? Variable-length encoded unsigned integer ? ;
VARLEN_INT = ? Variable-length encoded signed integer ? ;
CompressedPosition = ? 24-byte compressed chess position ? ;
CompressedMove = ? 2-byte compressed chess move ? ;
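As a rough illustration of the grammar, the sketch below reads a ChunkHeader and splits a 32-byte Stem into its raw fields. It takes the grammar at face value and is not the crate's implementation: read_chunk_header, RawStem and split_stem are hypothetical names, PlyResult is assumed to occupy 2 bytes so that the fields add up to the stated 32, and the position and move are left as undecoded bytes because their internal layout is not specified here.

```rust
use std::io::{self, Read};

const MAGIC: &[u8; 4] = b"BINP";

/// Reads one ChunkHeader (Magic, ChunkSize) and returns ChunkSize.
fn read_chunk_header<R: Read>(reader: &mut R) -> io::Result<u32> {
    let mut header = [0u8; 8];
    reader.read_exact(&mut header)?;
    if header[..4] != MAGIC[..] {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "missing BINP magic",
        ));
    }
    // ChunkSize is a 4-byte little-endian unsigned integer.
    Ok(u32::from_le_bytes([header[4], header[5], header[6], header[7]]))
}

/// Raw, undecoded fields of a Stem (hypothetical type for illustration).
struct RawStem {
    position: [u8; 24], // CompressedPosition, left undecoded
    mv: [u8; 2],        // CompressedMove, left undecoded
    score: i16,         // INT16BE as written in the grammar
    ply_result: u16,    // assumed 2 bytes, big endian
    rule50: u16,        // UINT16BE
}

/// Splits the 32 stem bytes at the field boundaries given by the grammar.
fn split_stem(bytes: &[u8; 32]) -> RawStem {
    let mut position = [0u8; 24];
    position.copy_from_slice(&bytes[..24]);
    RawStem {
        position,
        mv: [bytes[24], bytes[25]],
        score: i16::from_be_bytes([bytes[26], bytes[27]]),
        ply_result: u16::from_be_bytes([bytes[28], bytes[29]]),
        rule50: u16::from_be_bytes([bytes[30], bytes[31]]),
    }
}

fn main() -> io::Result<()> {
    // Synthetic bytes, not real training data: magic plus a chunk size of 16.
    let mut data: &[u8] = &[b'B', b'I', b'N', b'P', 16, 0, 0, 0];
    println!("chunk size: {}", read_chunk_header(&mut data)?);

    let mut stem_bytes = [0u8; 32];
    stem_bytes[26..28].copy_from_slice(&(-5i16).to_be_bytes());
    stem_bytes[31] = 7;
    let stem = split_stem(&stem_bytes);
    println!("score {} rule50 {}", stem.score, stem.rule50);
    println!("move bytes {:?}, ply/result {}", stem.mv, stem.ply_result);
    println!("position bytes: {}", stem.position.len());
    Ok(())
}
```

A real decoder would go on to read Count and then that many variable-length MoveScore pairs, which this sketch leaves out.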
GNU General Public License v3.0