Zstandard decompressor #14394

Merged
merged 62 commits into from
Feb 21, 2023
62 commits
19984d8
std.hash: add XxHash64 and XxHash32
Jan 20, 2023
61cb514
std.compress: add zstandard decompressor
Jan 21, 2023
1809172
std.compress.zstandard: cleanup decodeBlock
Jan 22, 2023
05e63f2
std.compress.zstandard: add functions decoding into ring buffer
Jan 22, 2023
c819e58
std.compress.zstandard: add decodeZStandardFrameAlloc
Jan 22, 2023
cbfaa87
std.compress.zstandard: cleanup ReverseBitReader
Jan 23, 2023
fc64c27
std.compress.zstandard: clean up api
Jan 23, 2023
082acd7
std.compress.zstandard: clean up integer casts
Jan 23, 2023
6b85373
std.compress.zstandard: validate sequence lengths
Jan 24, 2023
95953e1
std.compress.zstandard: fix dictionary field size
Jan 24, 2023
31d1cae
std.compress.zstandard: validate fse table value count
Jan 24, 2023
774e2f5
std.compress.zstandard: add input length safety checks
Jan 24, 2023
d40b135
std.compress.zstandard: properly track consumed count in decodeFrameB…
Jan 24, 2023
ab18adf
std.compress.zstandard: remove debug logging
Jan 24, 2023
7558bf6
std.compress.zstandard: minor cleanup and add doc comments
Jan 24, 2023
e2306ef
std.compress.zstandard: add integer casts u64 -> usize
Jan 26, 2023
1e5b8be
std.compress.zstandard: add window size limit param
Jan 26, 2023
3c06e2e
std.compress.zstandard: add doc comments for RingBuffer
Jan 27, 2023
3bfba36
std.compress.zstandard: clean up error sets and line lengths
Jan 28, 2023
e92575d
std.compress.zstandard: verify checksum in decodeFrameAlloc()
Jan 28, 2023
2d35c16
std.compress.zstandard: add init/deinit for ring buffer, fix len()
Jan 31, 2023
947ad3e
std.compress.zstandard: add FrameContext and add literals into Decode…
Jan 31, 2023
5723291
std.compress.zstandard: add `decodeBlockReader`
Feb 2, 2023
a180fcc
std.compress.zstandard: add `ZstandardStream`
Feb 2, 2023
6e3e728
std.compress.zstandard: fix crashes
Feb 2, 2023
7e27556
std.compress.zstandard: split decompressor into multiple files
Feb 2, 2023
89f9c5c
std.compress.zstandard: improve doc comments
Feb 2, 2023
ddeabc9
std.compress.zstandard: add `decodeFrameAlloc()`
Feb 2, 2023
3f1c430
std.compress.zstandard: fix capitalisation of Zstandard
Feb 2, 2023
a651704
std.compress.zstandard: free allocated result on error
Feb 2, 2023
596a97f
std.compress.zstandard: fix crashes
Feb 3, 2023
1c509f4
std.compress.zstandard: fix crashes
Feb 3, 2023
a625df4
std.compress.zstandard: fix fse decoding crash
Feb 4, 2023
06ab5a2
std.compress.zstandard: add multi-frame decoding functions
Feb 4, 2023
a9c8376
std.compress.zstandard: make ZstandardStream decode multiple frames
Feb 4, 2023
ece52e0
std.compress.zstandard: verify content size and fix crash
Feb 5, 2023
98bbd95
std.compress.zstandard: improve block size validation
Feb 6, 2023
2134769
std.compress.zstandard: validate skippable frame size
Feb 6, 2023
d9a90e1
std.compress.zstandard: fix decodeAlloc() and remove decodeFrameAlloc()
Feb 7, 2023
77ca1f7
std.compress.zstandard: remove UnusedBitSet error
Feb 7, 2023
3975a9d
std.compress.zstandard: error when FSE bitstream is no fully consumed
Feb 9, 2023
6d48b05
std.compress.zstandard: add decodeFrameHeader
Feb 9, 2023
55e6e94
std.compress.zstandard: fix content size check
Feb 9, 2023
31cc460
std.compress.zstandard: fix errors and crashes in ZstandardStream
Feb 9, 2023
ee5af3c
std.compress.zstandard: cleanup high-level api docs and error sets
Feb 9, 2023
1530e73
std.compress.zstandard: bytes read assert to error in decodeBlockReader
Feb 10, 2023
373d8ef
std.compress.zstandard: check FSE bitstreams are fully consumed
Feb 11, 2023
476d2fe
std.compress.zstandard: fix zstandardStream finishing early
Feb 12, 2023
8fd4131
std.compress.zstandard: remove unneeded branch
Feb 12, 2023
5a31fc2
std.compress.zstandard: fix erroneous literal stream empty checks
Feb 12, 2023
a53cf29
std.compress.zstandard: add error condition to ring buffer decoding
Feb 12, 2023
12aa478
std.compress.zstandard: also check block size when sequence count is 0
Feb 13, 2023
1a86217
std.compress.zstandard: fix zstandardStream content size validation
Feb 13, 2023
2766b70
std.compress.zstandard: add DictionaryIdFlagUnsupported ZstandardStre…
Feb 14, 2023
a74f800
std.compress.zstandard: update for multi-for-loop change
Feb 20, 2023
a34c2de
std.hash: use std.math.rotl in Xxhash64 and Xxhash32
Feb 21, 2023
12d9f73
std.compress.zstandard: remove use of usingnamespace
Feb 21, 2023
1c518bd
std.compress.zstandard: rename ZStandardStream -> DecompressStream
Feb 21, 2023
c7c35bf
std.RingBuffer: add (non-concurrent) RingBuffer implementation
Feb 21, 2023
c6ef83e
std.compress.zstandard: clean up streaming API
Feb 21, 2023
32cf1d7
std.compress.zstandard: fix error sets for streaming API
Feb 21, 2023
765a6d3
std.compress.zstd: renamed from std.compress.zstandard
Feb 21, 2023
3 changes: 3 additions & 0 deletions build.zig
@@ -113,8 +113,11 @@ pub fn build(b: *std.Build) !void {
".gz",
".z.0",
".z.9",
".zstd.3",
".zstd.19",
"rfc1951.txt",
"rfc1952.txt",
"rfc8478.txt",
// exclude files from lib/std/compress/deflate/testdata
".expect",
".expect-noinput",
136 changes: 136 additions & 0 deletions lib/std/RingBuffer.zig
@@ -0,0 +1,136 @@
//! This ring buffer stores read and write indices while being able to utilise
//! the full backing slice by incrementing the indices modulo twice the slice's
//! length and reducing indices modulo the slice's length on slice access. This
//! means that whether the ring buffer is full or empty can be distinguished by
//! looking at the difference between the read and write indices without adding
//! an extra boolean flag or having to reserve a slot in the buffer.
//!
//! This ring buffer has not been implemented with thread safety in mind, and
//! therefore should not be assumed to be suitable for use cases involving
//! separate reader and writer threads.

const Allocator = @import("std").mem.Allocator;
const assert = @import("std").debug.assert;

const RingBuffer = @This();

Review comment (Member):
This API has quite a bit of overlap with std.fifo.LinearFifo(), I'm not sure we should have both in the standard library.

Reply (Contributor Author):
There is quite a bit of overlap (I wasn't aware of LinearFifo). However, if I'm reading it correctly, LinearFifo as it currently stands isn't designed to be used the way the ring buffer is used here, so I'm not sure that adapting or extending LinearFifo would be a good option. If having both would be confusing, I can take RingBuffer back out of the std namespace, or we can leave it pending the pre-1.0 stdlib review. Thoughts @andrewrk?

Reply (Member):
To add to this discussion, it looks like the lzma implementation has its own LzCircularBuffer type as well which is also a somewhat specialized ring buffer.

Reply:

Then maybe LzCircularBuffer should be replaced by this RingBuffer, too.

Reply (Member):

Opened #19231 to track this


data: []u8,
read_index: usize,
write_index: usize,

pub const Error = error{Full};

/// Allocate a new `RingBuffer`; `deinit()` should be called to free the buffer.
pub fn init(allocator: Allocator, capacity: usize) Allocator.Error!RingBuffer {
    const bytes = try allocator.alloc(u8, capacity);
    return RingBuffer{
        .data = bytes,
        .write_index = 0,
        .read_index = 0,
    };
}

/// Free the data backing a `RingBuffer`; must be passed the same `Allocator` as
/// `init()`.
pub fn deinit(self: *RingBuffer, allocator: Allocator) void {
    allocator.free(self.data);
    self.* = undefined;
}

/// Returns `index` modulo the length of the backing slice.
pub fn mask(self: RingBuffer, index: usize) usize {
    return index % self.data.len;
}

/// Returns `index` modulo twice the length of the backing slice.
pub fn mask2(self: RingBuffer, index: usize) usize {
    return index % (2 * self.data.len);
}

/// Write `byte` into the ring buffer. Returns `error.Full` if the ring
/// buffer is full.
pub fn write(self: *RingBuffer, byte: u8) Error!void {
    if (self.isFull()) return error.Full;
    self.writeAssumeCapacity(byte);
}

/// Write `byte` into the ring buffer. If the ring buffer is full, the
/// oldest byte is overwritten.
pub fn writeAssumeCapacity(self: *RingBuffer, byte: u8) void {
    self.data[self.mask(self.write_index)] = byte;
    self.write_index = self.mask2(self.write_index + 1);
}

/// Write `bytes` into the ring buffer. Returns `error.Full` if the ring
/// buffer does not have enough space, without writing any data.
pub fn writeSlice(self: *RingBuffer, bytes: []const u8) Error!void {
    if (self.len() + bytes.len > self.data.len) return error.Full;
    self.writeSliceAssumeCapacity(bytes);
}

/// Write `bytes` into the ring buffer. If there is not enough space, older
/// bytes will be overwritten.
pub fn writeSliceAssumeCapacity(self: *RingBuffer, bytes: []const u8) void {
    for (bytes) |b| self.writeAssumeCapacity(b);
}

/// Consume a byte from the ring buffer and return it. Returns `null` if the
/// ring buffer is empty.
pub fn read(self: *RingBuffer) ?u8 {
    if (self.isEmpty()) return null;
    return self.readAssumeLength();
}

/// Consume a byte from the ring buffer and return it; asserts that the buffer
/// is not empty.
pub fn readAssumeLength(self: *RingBuffer) u8 {
    assert(!self.isEmpty());
    const byte = self.data[self.mask(self.read_index)];
    self.read_index = self.mask2(self.read_index + 1);
    return byte;
}

/// Returns `true` if the ring buffer is empty and `false` otherwise.
pub fn isEmpty(self: RingBuffer) bool {
    return self.write_index == self.read_index;
}

/// Returns `true` if the ring buffer is full and `false` otherwise.
pub fn isFull(self: RingBuffer) bool {
    return self.mask2(self.write_index + self.data.len) == self.read_index;
}

/// Returns the number of bytes currently stored in the ring buffer, i.e. the
/// distance from the read index to the write index.
pub fn len(self: RingBuffer) usize {
    const wrap_offset = 2 * self.data.len * @boolToInt(self.write_index < self.read_index);
    const adjusted_write_index = self.write_index + wrap_offset;
    return adjusted_write_index - self.read_index;
}

/// A `Slice` represents a region of a ring buffer. The region is split into two
/// sections as the ring buffer data will not be contiguous if the desired
/// region wraps to the start of the backing slice.
pub const Slice = struct {
    first: []u8,
    second: []u8,
};

/// Returns a `Slice` for the region of the ring buffer starting at
/// `self.mask(start_unmasked)` with the specified length.
pub fn sliceAt(self: RingBuffer, start_unmasked: usize, length: usize) Slice {
    assert(length <= self.data.len);
    const slice1_start = self.mask(start_unmasked);
    const slice1_end = @min(self.data.len, slice1_start + length);
    const slice1 = self.data[slice1_start..slice1_end];
    const slice2 = self.data[0 .. length - slice1.len];
    return Slice{
        .first = slice1,
        .second = slice2,
    };
}

/// Returns a `Slice` for the last `length` bytes written to the ring buffer.
/// Does not check that any bytes have been written into the region.
pub fn sliceLast(self: RingBuffer, length: usize) Slice {
    return self.sliceAt(self.write_index + self.data.len - length, length);
}
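
The module doc comment above describes keeping the read and write indices modulo twice the slice length so that "full" and "empty" can be told apart without a spare slot or an extra flag. The following is a minimal, self-contained sketch of that idea and is not code from this pull request: `MiniRing` and its fixed capacity of 4 are hypothetical, chosen only to keep the arithmetic easy to follow, and the wrap offset in `len()` is written with an `if` expression rather than `@boolToInt`.

```zig
const std = @import("std");

/// Hypothetical miniature of the index scheme; illustration only.
const MiniRing = struct {
    data: [4]u8 = undefined,
    read_index: usize = 0,
    write_index: usize = 0,

    /// Index into the backing array.
    fn mask(self: MiniRing, index: usize) usize {
        return index % self.data.len;
    }

    /// Stored indices live in the range [0, 2 * capacity).
    fn mask2(self: MiniRing, index: usize) usize {
        return index % (2 * self.data.len);
    }

    fn isEmpty(self: MiniRing) bool {
        return self.write_index == self.read_index;
    }

    /// Full means the write index is exactly one capacity ahead of the read index.
    fn isFull(self: MiniRing) bool {
        return self.mask2(self.write_index + self.data.len) == self.read_index;
    }

    fn len(self: MiniRing) usize {
        const wrap: usize = if (self.write_index < self.read_index) 2 * self.data.len else 0;
        return self.write_index + wrap - self.read_index;
    }

    fn write(self: *MiniRing, byte: u8) error{Full}!void {
        if (self.isFull()) return error.Full;
        self.data[self.mask(self.write_index)] = byte;
        self.write_index = self.mask2(self.write_index + 1);
    }

    fn read(self: *MiniRing) ?u8 {
        if (self.isEmpty()) return null;
        const byte = self.data[self.mask(self.read_index)];
        self.read_index = self.mask2(self.read_index + 1);
        return byte;
    }
};

test "full and empty are distinguishable without a spare slot" {
    var rb = MiniRing{};
    try std.testing.expect(rb.isEmpty());

    // All four slots are usable; a fifth write is rejected.
    for ("abcd") |c| try rb.write(c);
    try std.testing.expect(rb.isFull());
    try std.testing.expect(!rb.isEmpty());
    try std.testing.expectEqual(@as(usize, 4), rb.len());
    try std.testing.expectError(error.Full, rb.write('e'));

    // Drain two bytes and refill: the write index now exceeds the capacity,
    // but the buffer is still correctly reported as full.
    try std.testing.expectEqual(@as(?u8, 'a'), rb.read());
    try std.testing.expectEqual(@as(?u8, 'b'), rb.read());
    try rb.write('e');
    try rb.write('f');
    try std.testing.expect(rb.isFull());
    try std.testing.expectEqual(@as(usize, 4), rb.len());
}
```

With indices kept only modulo the capacity, the states after zero writes and after four writes would both have `write_index == read_index`; keeping them modulo twice the capacity makes the two states differ by exactly one capacity, which is what `isFull()` checks.
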
2 changes: 2 additions & 0 deletions lib/std/compress.zig
@@ -6,6 +6,7 @@ pub const lzma = @import("compress/lzma.zig");
pub const lzma2 = @import("compress/lzma2.zig");
pub const xz = @import("compress/xz.zig");
pub const zlib = @import("compress/zlib.zig");
pub const zstd = @import("compress/zstandard.zig");

pub fn HashedReader(
    comptime ReaderType: anytype,
@@ -44,4 +45,5 @@ test {
    _ = lzma2;
    _ = xz;
    _ = zlib;
    _ = zstd;
}