Make string interner and backends compatible with more types #42
Hi and thanks for your PR! Looks interesting! I have yet to grasp what it actually provides users of this library in essence. So will it be possible to use C-style strings or … I also conducted the compiled … I also conducted some benchmarks using …
Thank you for the quick response!
Noted for the next time!
In Boa, a JavaScript interpreter, we use …
IIRC, the most notable additions to the API are …
Finally, …
That's... weird. Maybe the generics are adding additional indirection to the trait methods? Probably inlining should fix that.
Thanks for your elaborate explanations!
I think it is best if you just run the benchmarks on your own computer to get a feeling. Just check out … With respect to the suggested API, I am always looking for the simplest design. Yet, I think having all this flexibility with …
Will do!
Sorry! I exposed …
Ah, so unify the requirements of all backends into a single trait? I could do that for most of the backends. If that's the plan, then far fewer types need to be exposed; only …
On another note, I had to adjust the allocation benchmarks for the … (see `string-interner/src/backend/bucket/mod.rs`, lines 119 to 123 in d552e76).
Calling … (see `string-interner/src/backend/bucket/mod.rs`, lines 152 to 155 in 30f6186)
And obviously, fixing the bug increased the memory consumption, so I had to adjust the expected minimum and maximum memory usage.
@Robbepop, did you have a chance to check the latest changes?
Sure, but what does @jedel1043 think about the state of this PR?
I'd like for …
Can you elaborate on this? I don't see how this results in undefined behavior atm. The memory consumption regressions for the …
Sure :) See `string-interner/src/backend/bucket/mod.rs`, lines 55 to 60 in d552e76.

This backend works by creating several … However, … Unfortunately, when we call the … (see `string-interner/src/backend/bucket/fixed_str.rs`, lines 60 to 63 in d552e76)

And calling … leads to `Vec::shrink_to_fit`:

```rust
pub fn shrink_to_fit(&mut self) {
    // The capacity is never less than the length, and there's nothing to do when
    // they are equal, so we can avoid the panic case in `RawVec::shrink_to_fit`
    // by only calling it with a greater capacity.
    if self.capacity() > self.len {
        self.buf.shrink_to_fit(self.len);
    }
}
```

This calls `RawVec::shrink_to_fit`:

```rust
pub fn shrink_to_fit(&mut self, cap: usize) {
    handle_reserve(self.shrink(cap));
}
```

Which calls `RawVec::shrink`:

```rust
fn shrink(&mut self, cap: usize) -> Result<(), TryReserveError> {
    assert!(cap <= self.capacity(), "Tried to shrink to a larger capacity");
    let (ptr, layout) = if let Some(mem) = self.current_memory() { mem } else { return Ok(()) };
    let ptr = unsafe {
        // `Layout::array` cannot overflow here because it would have
        // overflowed earlier when capacity was larger.
        let new_layout = Layout::array::<T>(cap).unwrap_unchecked();
        self.alloc
            .shrink(ptr, layout, new_layout)
            .map_err(|_| AllocError { layout: new_layout, non_exhaustive: () })?
    };
    self.set_ptr_and_cap(ptr, cap);
    Ok(())
}
```

Which calls the allocator's `shrink`:

```rust
new_size => unsafe {
    let new_ptr = self.allocate(new_layout)?;
    ptr::copy_nonoverlapping(ptr.as_ptr(), new_ptr.as_mut_ptr(), new_size);
    self.deallocate(ptr, old_layout);
    Ok(new_ptr)
},
```

Oops, the old memory is deallocated! Any pointer still referring to the old buffer now dangles, so this causes a use after free and results in undefined behaviour. (Strictly speaking, `RawVec::shrink` keeps the alignment unchanged, so the global allocator takes its `realloc` arm rather than the one quoted above; `realloc` is equally free to move the data and release the old block, so the conclusion stands.)
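The invariant the bucket backend depends on can be sketched as follows: once a bucket has handed out references into its buffer, that buffer must never reallocate, which rules out both growing past its capacity and calling `shrink_to_fit`. This is a minimal sketch under that assumption; `FixedBucket` and `try_push_str` are hypothetical names, not the crate's `FixedString` API.

```rust
use std::ops::Range;

/// Hypothetical fixed-capacity bucket (a sketch, not the crate's `FixedString`).
/// It refuses strings that do not fit in the spare capacity, so its heap
/// allocation never moves once created. It deliberately has no `shrink_to_fit`,
/// because shrinking is allowed to reallocate and free the old buffer.
struct FixedBucket {
    contents: String,
}

impl FixedBucket {
    fn with_capacity(cap: usize) -> Self {
        Self { contents: String::with_capacity(cap) }
    }

    /// Appends `s` only if it fits without reallocating; returns its byte range.
    fn try_push_str(&mut self, s: &str) -> Option<Range<usize>> {
        let start = self.contents.len();
        if self.contents.capacity() - start < s.len() {
            return None;
        }
        // Enough spare capacity, so `push_str` does not reallocate.
        self.contents.push_str(s);
        Some(start..self.contents.len())
    }

    fn get(&self, range: Range<usize>) -> &str {
        &self.contents[range]
    }
}

fn main() {
    let mut bucket = FixedBucket::with_capacity(16);
    let base = bucket.contents.as_ptr();

    let hello = bucket.try_push_str("hello").unwrap();
    let world = bucket.try_push_str("world").unwrap();

    // Anything longer than the total capacity can never fit.
    let too_long = "x".repeat(bucket.contents.capacity() + 1);
    assert!(bucket.try_push_str(&too_long).is_none());

    // The buffer never moved, so data handed out earlier is still where it was.
    assert_eq!(base, bucket.contents.as_ptr());
    assert_eq!(bucket.get(hello), "hello");
    assert_eq!(bucket.get(world), "world");
}
```

Returning byte ranges rather than raw pointers keeps the sketch in safe Rust; the address check shows that the allocation itself stays put, which is the property a pointer-based bucket needs.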
I don't think that's possible. You cannot require a generic type to implement a certain trait without exposing said trait. The alternative would be to maintain a separate definition of each backend for each string type, which is not ideal.
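A minimal sketch of the "single trait" idea, assuming the shared requirements are just `ToOwned + Hash + Eq`; the trait name `InternStr` and the helper `distinct_owned` are hypothetical and not part of `string-interner`. A blanket impl makes every qualifying type implement the trait automatically, but because the trait appears in the bounds of a public function it is still declared `pub`, which is the exposure discussed in the comment above.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical bundle of the bounds every backend would share.
/// It has no methods of its own; it only collects supertraits.
pub trait InternStr: ToOwned + Hash + Eq {}

/// Blanket impl: `str`, `[u8]`, `CStr`, ... implement it automatically.
impl<T: ?Sized + ToOwned + Hash + Eq> InternStr for T {}

/// Hypothetical helper using one bound instead of three. Because `InternStr`
/// appears in this public signature, the trait itself is public too.
pub fn distinct_owned<S: ?Sized + InternStr>(items: &[&S]) -> Vec<S::Owned> {
    let mut seen: HashSet<&S> = HashSet::new();
    let mut out = Vec::new();
    for &item in items {
        // `insert` returns true only the first time a value is seen.
        if seen.insert(item) {
            out.push(item.to_owned());
        }
    }
    out
}

fn main() {
    let words = distinct_owned::<str>(&["a", "b", "a"]);
    assert_eq!(words, vec!["a".to_string(), "b".to_string()]);

    let bytes = distinct_owned::<[u8]>(&[b"x".as_slice(), b"x".as_slice()]);
    assert_eq!(bytes, vec![b"x".to_vec()]);
}
```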
I don't think it is a good idea to name a trait after a specific type of the standard library. Aside from the name conflict, it would be very confusing if a trait, which is implemented by types, had the same name as a specific type. What about …
@Robbepop, did you have a chance to read my explanation and comments?
…rner` (#2147) So, @raskad and I had a short discussion about the state of #736, and we came to the conclusion that it would be a good time to implement our own string interner; partly because the `string-interner` crate is a bit unmaintained (as shown by Robbepop/string-interner#42 and Robbepop/string-interner#47), and partly because it would be hard to experiment with custom optimizations for UTF-16 strings. I still want to thank @Robbepop for the original implementation though, because some parts of this design have been shamelessly stolen from it 😅. Having said that, this PR is a complete reimplementation of the interner, but with some modifications to (hopefully!) make it a bit easier to experiment with UTF-16 strings, apply optimizations, and whatnot :)
I think I would like to keep the …
This PR rewrites the interner to be generic, making it compatible with many more types.
Now the interner explicitly needs a backend parameter, because the type checker cannot resolve `&str` as the input argument to `str`-based backends (there is more than one `impl AsRef` for the `&str` type). One solution would be to restrict `T` to `Borrow`, with the downside of making it less ergonomic to pass a `&[String]` to the `extend` method of the interner. I'd like to hear your ideas :)
… `BucketBackend`.
… `<S as ToOwned>::Owned` as the buffer type of the `StringBackend`, but that made the API a bit cleaner with fewer generic parameters. Would it be preferable as it is, or do you prefer more general types?
Comments are very much appreciated :)
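The inference problem described in the PR can be reproduced with a toy interner generic over the borrowed string type `S`. This is a minimal sketch assuming `usize` symbols and a plain `HashMap`; `Interner`, `get_or_intern` and `resolve` here are illustrative and not the crate's actual definitions, and the explicit `S` stands in for the PR's explicit backend parameter. Because `&str` implements `AsRef<str>`, `AsRef<[u8]>`, `AsRef<OsStr>` and `AsRef<Path>`, a call such as `Interner::new().get_or_intern("hello")` cannot infer `S`, so the sketch names it with a turbofish. It also shows C-style strings (`CStr`) going through the same code.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::ffi::{CStr, CString};
use std::hash::Hash;

/// Toy interner generic over the borrowed string type `S`
/// (e.g. `str`, `[u8]`, `CStr`). Symbols are plain indices.
/// Each string is stored twice for simplicity; a real backend would not do that.
struct Interner<S: ?Sized + ToOwned> {
    map: HashMap<S::Owned, usize>,
    strings: Vec<S::Owned>,
}

impl<S> Interner<S>
where
    S: ?Sized + ToOwned + Hash + Eq,
    S::Owned: Hash + Eq,
{
    fn new() -> Self {
        Self { map: HashMap::new(), strings: Vec::new() }
    }

    /// Accepts anything that can be viewed as `&S` (`&str`, `String`, ...).
    fn get_or_intern<T: AsRef<S>>(&mut self, value: T) -> usize {
        let s: &S = value.as_ref();
        // Look up by the borrowed form; `S::Owned: Borrow<S>` comes from `ToOwned`.
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = self.strings.len();
        self.strings.push(s.to_owned());
        self.map.insert(s.to_owned(), sym);
        sym
    }

    fn resolve(&self, sym: usize) -> Option<&S> {
        self.strings
            .get(sym)
            .map(|owned| <S::Owned as Borrow<S>>::borrow(owned))
    }
}

fn main() {
    // `S` must be named: `&str` implements several `AsRef<_>` traits,
    // so it cannot be inferred from the argument alone.
    let mut strs = Interner::<str>::new();
    let a = strs.get_or_intern("hello");
    let b = strs.get_or_intern(String::from("hello"));
    let c = strs.get_or_intern("world");
    assert_eq!(a, b);
    assert_ne!(a, c);
    assert_eq!(strs.resolve(c), Some("world"));

    // The same literal interned as raw bytes.
    let mut bytes = Interner::<[u8]>::new();
    let d = bytes.get_or_intern("hello");
    assert_eq!(bytes.resolve(d), Some(&b"hello"[..]));

    // C-style strings work too, via `CString: AsRef<CStr>`.
    let hello_c = CString::new("hello").unwrap();
    let mut cstrs = Interner::<CStr>::new();
    let e = cstrs.get_or_intern(&hello_c);
    assert_eq!(cstrs.resolve(e), Some(hello_c.as_c_str()));
}
```

Restricting the argument to `Borrow<S>` instead of `AsRef<S>` would narrow the set of candidate impls, at the cost of the ergonomics trade-off mentioned above for `extend`.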