Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add const accessors for data #56

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

Caellian
Copy link

@Caellian Caellian commented Nov 6, 2024

I didn't want to use as_ptr because that exists on Deref with str Target. I don't like this name, but it's very explicit and long so it won't be confused with std which doesn't use get_.

Alternatively, maybe as_raw_ptr would be good. But IMO it's way to similar to as_ptr, so I opted for get_raw_ptr because it's clear what it does and just enough imperfect/annoying to make collisions with Cowed types unlikely.

Once const traits get into stable, this could maybe be deprecated and point to str::as_ptr or addr_of!(**deref). But I feel like that won't be the case for additional 4-5 years.

Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
@maciejhirsz
Copy link
Owner

maciejhirsz commented Nov 7, 2024

@Caellian I reckon if we don't want to confuse people by overloading methods on the underlying deref types, the idiomatic way of doing it is to not make it a method at all and instead of &self take this: &Self as parameter. This is pretty common on smart pointer types in std:

The first one of these is very analogous to what's going on here, so if you don't mind if it's "incredibly convoluted" 🙃 then Cow::as_ptr(&my_cow) should work well.

src/generic.rs Outdated Show resolved Hide resolved
Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
- Add is_empty as suggested by clippy.

Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
@Caellian Caellian changed the title Add const accessor for data Add const accessors for data Nov 7, 2024
src/generic.rs Outdated
Comment on lines 173 to 229
impl<'a, T> Cow<'a, T, Wide>
where
T: Beef + ?Sized,
{
/// Returns `true` if the contained data is empty
pub const fn is_empty(&self) -> bool {
self.len() == 0
}

/// Returns length of underlying data
#[inline]
pub const fn len(&self) -> usize {
self.fat
}

/// Returns capacity of underlying data
#[inline]
pub const fn capacity(&self) -> Option<usize> {
match self.cap {
Some(_) => unsafe {
// SAFETY: Transmute from Option<NonZero<usize>> to usize is
// sound by definition.
Some(core::mem::transmute::<
Option<core::num::NonZero<usize>>,
usize,
>(self.cap))
},
None => None,
}
}
}
impl<'a, T> Cow<'a, T, Lean>
where
T: Beef + ?Sized,
{
/// Returns `true` if the contained data is empty
pub const fn is_empty(&self) -> bool {
self.len() == 0
}

/// Returns length of underlying data
#[inline]
pub const fn len(&self) -> usize {
Lean::mask_len(self.fat)
}

/// Returns capacity of underlying data
#[inline]
pub const fn capacity(&self) -> Option<usize> {
let cap = Lean::mask_cap(self.fat);
if cap == 0 {
None
} else {
Some(cap)
}
}
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After using my version with as_ptr I realized it's only useful if one has externally stored length (in my case I took a string until separator). But if not, just the pointer without length doesn't help with constructing strings in const context.

I added these additional functions to allow access to all information. I'm not 100% familiar with the library internals so let me know whether they are correct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, whether Lean capacity is correct - is it 0 for borrowed values?

Added mask_cap does (value & MASK_HI) >> 4.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it 0 for borrowed values?

Correct

src/generic.rs Outdated
Comment on lines 162 to 164
fn capacity(&self) -> Option<U::NonZero> {
fn get_capacity(&self) -> Option<U::NonZero> {
U::maybe(self.fat, self.cap)
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just had to rename the existing internal method to something else.

Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
@maciejhirsz
Copy link
Owner

maciejhirsz commented Nov 7, 2024

This looks like a solution to XY problem and I'm not really a big fan of leaking internals. If you just need a way to get a &str from Cow<str> then we should just have Cow::as_str[_const](my_cow) implemented for Cow<str, Lean> and Cow<str, Wide>.

@Caellian
Copy link
Author

Caellian commented Nov 7, 2024

I don't see it as leaking internals as they don't expose something that will ever change, nor use internal types in signatures:

  • Accessing data ptr is very useful for unsafe code. I don't think repr(C) is an option to enable transmute of the struct. So not having as_ptr makes Cow opaque to external code. That's why Rc, Box, etc. provide as_ptr and Cow falls into the same category of smart wrappers as those, so it should have it.
  • Accessing len/cap makes sense to me given that Cow actually stores those, so as a container (like Vec and String) it makes sense for it to expose them.

std Cow probably doesn't expose these methods because a large part of it was designed without const in mind, aannd once const traits land they would be just noise because those methods are always defined on cowed types as well. In case of a library they can be removed with a simple semver change.

we should just have Cow::as_str_const implemented for Cow<str, Lean> and Cow<str, Wide>

The exposed functions are building blocks for functions like as_str - this only looks like more code now that my use case is just str.

Exposed functions only need to handle additional representations which are known to beef (if they ever get added (unlikely)). Whereas as_[T][_const] can't ever be exhaustive because some FooBar defined in one of dependents isn't known to beef and whoever deals with Cow<'a, FooBarRef> types would then need to fork this crate to have const access to len and capacity.

@maciejhirsz
Copy link
Owner

Accessing data ptr is very useful for unsafe code.

If you have &str in const context you can use its as_ptr method to get the pointer.

Accessing len/cap makes sense to me given that Cow actually stores those, so as a container (like Vec and String) it makes sense for it to expose them.

capacity is internal, the fact that it can be 0 for non-empty strings is an implementation detail, if we ever want to change how beef differentiates between owned and borrowed values this might no longer be the case and could break usercode down the road. I also don't really know what you need actual capacity for since you can't create heap allocated (non-empty) Strings in const expressions.

Signed-off-by: Tin Švagelj <tin.svagelj@live.com>
@Caellian
Copy link
Author

Caellian commented Nov 7, 2024

If you have &str in const context you can use its as_ptr method to get the pointer.

I know that, but I don't. I forked this crate and made a PR because I'm using Cow which doesn't have that in const context. I'm using this crate because I need a no_std compatible one.

I also don't really know what you need actual capacity for

I don't. I added it without thinking for completeness.

@maciejhirsz
Copy link
Owner

I know that, but I don't.

My point is that if you have a Cow<str> in const context and you can get &str out of it with a const function, you can also get a pointer out of it this way:

const PTR: *mut u8 = COW_STR.as_str().as_ptr();

Likewise you don't need the len method if you can just get it from the &str. It makes the API surface smaller and more versatile at the same time.

Don't think there are any issues if we override as_str if it actually returns a &str, there will need to be two implementations for Cow<str, Lean> and Cow<str, Wide> to make them const but I reckon that's acceptable.

@Caellian
Copy link
Author

Caellian commented Nov 7, 2024

It makes the API surface smaller and more versatile at the same time.

It's not more versatile if it only works for &str type and not for other Ts:

#[repr(transparent)]
struct TagBorrowed {
    name: [u8]
}
impl beef::generic::Beef for TagBorrowed {
    // Assuming TagBorrowed isn't sealed
}

struct TagOwned {
    name: alloc::string::String
}

impl core::ops::Deref for TagOwned {
    type Target = TagBorrowed;

    fn deref(&self) -> &Self::Target {
        unsafe {
            &*(self.name.as_bytes() as *const [u8] as *const TagBorrowed)
        }
    }
}

let example: beef::Cow<TagBorrowed> = beef::Cow::owned(TagOwned {
    name: alloc::string::String::from("hui")
});

But that assumes beef::Cow can be used with types other than str and [u8].

Something like #20 would be fine if it worked in const, but I see it's using a trait. Hmmm....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants