Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle incorrect VZV bytes representation #3883

Merged
merged 3 commits into from
Aug 17, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion utils/zerovec/src/varzerovec/components.rs
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,7 @@ impl<'a, T: VarULE + ?Sized, F: VarZeroVecFormat> VarZeroVecComponents<'a, T, F>
/// `things`, such that it parses to a `T::VarULE`
#[inline]
pub fn parse_byte_slice(slice: &'a [u8]) -> Result<Self, ZeroVecError> {
// The empty VZV is special-cased to the empty slice
if slice.is_empty() {
return Ok(VarZeroVecComponents {
len: 0,
Expand Down Expand Up @@ -219,6 +220,7 @@ impl<'a, T: VarULE + ?Sized, F: VarZeroVecFormat> VarZeroVecComponents<'a, T, F>
/// The bytes must have previously successfully run through
/// [`VarZeroVecComponents::parse_byte_slice()`]
pub unsafe fn from_bytes_unchecked(slice: &'a [u8]) -> Self {
// The empty VZV is special-cased to the empty slice
if slice.is_empty() {
return VarZeroVecComponents {
len: 0,
Expand Down Expand Up @@ -485,12 +487,13 @@ where
}

/// Collects the bytes for a VarZeroSlice into a Vec.
pub fn get_serializable_bytes<T, A, F>(elements: &[A]) -> Option<Vec<u8>>
pub fn get_serializable_bytes_non_empty<T, A, F>(elements: &[A]) -> Option<Vec<u8>>
where
T: VarULE + ?Sized,
A: EncodeAsVarULE<T>,
F: VarZeroVecFormat,
{
debug_assert!(!elements.is_empty());
let len = compute_serializable_len::<T, A, F>(elements)?;
debug_assert!(len >= LENGTH_WIDTH as u32);
let mut output: Vec<u8> = alloc::vec![0; len as usize];
Expand Down
17 changes: 11 additions & 6 deletions utils/zerovec/src/varzerovec/owned.rs
Original file line number Diff line number Diff line change
Expand Up @@ -84,13 +84,18 @@ impl<T: VarULE + ?Sized, F: VarZeroVecFormat> VarZeroVecOwned<T, F> {
where
A: EncodeAsVarULE<T>,
{
Ok(Self {
marker: PhantomData,
// TODO(#1410): Rethink length errors in VZV.
entire_slice: components::get_serializable_bytes::<T, A, F>(elements).ok_or(
"Attempted to build VarZeroVec out of elements that \
Ok(if elements.is_empty() {
Self::from_slice(VarZeroSlice::new_empty())
} else {
Self {
marker: PhantomData,
// TODO(#1410): Rethink length errors in VZV.
entire_slice: components::get_serializable_bytes_non_empty::<T, A, F>(elements)
.ok_or(
"Attempted to build VarZeroVec out of elements that \
cumulatively are larger than a u32 in size",
)?,
)?,
}
})
}

Expand Down
4 changes: 2 additions & 2 deletions utils/zerovec/src/varzerovec/slice.rs
Original file line number Diff line number Diff line change
Expand Up @@ -108,8 +108,8 @@ pub struct VarZeroSlice<T: ?Sized, F = Index16> {
impl<T: VarULE + ?Sized, F: VarZeroVecFormat> VarZeroSlice<T, F> {
/// Construct a new empty VarZeroSlice
pub const fn new_empty() -> &'static Self {
let arr: &[u8] = &[];
unsafe { mem::transmute(arr) }
// The empty VZV is special-cased to the empty slice
unsafe { mem::transmute(&[] as &[u8]) }
}

/// Obtain a [`VarZeroVecComponents`] borrowing from the internal buffer
Expand Down
34 changes: 30 additions & 4 deletions utils/zerovec/src/varzerovec/vec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -425,8 +425,12 @@ where
{
#[inline]
fn from(elements: &[A]) -> Self {
#[allow(clippy::unwrap_used)] // TODO(#1410) Better story for fallibility
VarZeroVecOwned::try_from_elements(elements).unwrap().into()
if elements.is_empty() {
VarZeroSlice::new_empty().into()
} else {
#[allow(clippy::unwrap_used)] // TODO(#1410) Better story for fallibility
VarZeroVecOwned::try_from_elements(elements).unwrap().into()
}
}
}

Expand All @@ -451,8 +455,14 @@ where
{
#[inline]
fn eq(&self, other: &VarZeroVec<'b, T, F>) -> bool {
// VarULE has an API guarantee that this is equivalent
// to `T::VarULE::eq()`
// VZV::from_elements used to produce a non-canonical representation of the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine allowing this to slip

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though this is ok too, note that this also breaks the VarULE "equality is byte equality" invariant on VZS

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we remove this we should also special case is_empty to check for empty length

Copy link
Member Author

@robertbastian robertbastian Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_empty should already work, because it checks VZComponent's indices slice, which gets special-case initialised.

note that this also breaks the VarULE "equality is byte equality" invariant on VZS

We already broke that invariant

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that testdata CI is failing so the incorrect representation is out there

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, that's fine by me then, since it's legacy-only. We may wish to deprecate this in the long run.

// empty VZV, so we cannot use byte equality for empty vecs.
if self.is_empty() || other.is_empty() {
return self.is_empty() && other.is_empty();
}
// VarULE has an API guarantee that byte equality is semantic equality.
// For non-empty VZVs, there's only a single metadata representation,
// so this guarantee extends to the whole VZV representation.
self.as_bytes().eq(other.as_bytes())
}
}
Expand Down Expand Up @@ -503,3 +513,19 @@ impl<'a, T: VarULE + ?Sized + Ord, F: VarZeroVecFormat> Ord for VarZeroVec<'a, T
self.iter().cmp(other.iter())
}
}

#[test]
fn assert_single_empty_representation() {
assert_eq!(
VarZeroVec::<str>::new().as_bytes(),
VarZeroVec::<str>::from(&[] as &[&str]).as_bytes()
);
}

#[test]
fn weird_empty_representation_equality() {
assert_eq!(
VarZeroVec::<str>::parse_byte_slice(&[0, 0, 0, 0]).unwrap(),
VarZeroVec::<str>::parse_byte_slice(&[]).unwrap()
);
}
Loading