-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update query API to borrow Arrays instead of consume them #151
base: main
Are you sure you want to change the base?
Conversation
@@ -0,0 +1,51 @@ | |||
error[E0277]: `Rc<context::RawContext>` cannot be shared between threads safely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to check this in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a little strange for me to assert on compiler output. Is the error format guaranteed to be stable going forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can simply assert it fails to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to check this in?
Yep
Seems a little strange for me to assert on compiler output. Is the error format guaranteed to be stable going forward?
I don't imagine it would drift too much but I agree we should not assume it won't change. The trybuild
crate makes it easy to recapture (run with TRYBUILD=1
).
Maybe we can simply assert it fails to compile.
That is the goal but I think just "it fails to compile" is far too broad. Syntax error is also a failure to compile but doesn't tell me what I'm looking for and possibly hides a success. I'd be happy just checking the error number, but the other option to do that using doc tests did not work for some reason. I'm fine with the extra verbose output that we might have to recapture every once in a while as a tradeoff.
I'd definitely love to use some kind of regexp or something if trybuild
had the means...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting from https://docs.rs/trybuild/latest/trybuild/
In all of these cases, the compiler’s output can change because our crate or one of our dependencies broke something, or as a consequence of changes in the Rust compiler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rest of that paragraph seems like an endorsement rather than a reason not to capture the compiler errors.
The beginning of the section seems more on point:
When it comes to compile-fail tests, write tests for anything for which you care to find out when there are changes in the user-facing compiler output. As a negative example, please don’t write compile-fail tests simply calling all of your public APIs with arguments of the wrong type; there would be no benefit.
This test is closer to "calling a public API with the wrong type" than "changes in user-facing compiler output".
I want to test that Array
is not Sync
. For some reason the doc test compile_fail,E0277
syntax did not work - it correctly checked that compilation failed but the error number was not used for some reason. I would like to have something stronger than "fails to compile" but not as sensitive "if the compiler changes the text of its error message then the test fails and need to evaluate recapturing", but I haven't found an option for that. And I think the over-sensitivity is the better choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to test that
Array
is notSync
. For some reason the doc testcompile_fail,E0277
syntax did not work -
As recently as a year ago, that required cargo nightly to work:
,E0515 indicates the error number you expect, which avoids any other errors that would make the test false negative. (This error number check will be ignored if cargo is not running as nightly)
via https://stackoverflow.com/questions/74989492/how-to-assert-that-code-wont-compile-in-a-test
@@ -44,10 +44,10 @@ impl Drop for RawQuery { | |||
} | |||
} | |||
|
|||
pub trait Query { | |||
fn base(&self) -> &QueryBase; | |||
pub trait Query<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call 'a 'array instead? Is it meant to represent the lifetime of the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is, yep. Happy to do that. Most of the standard library uses 'a
but I agree that the lifetime names ought to be more verbose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with a comment which makes it clear instead of renaming everything
Overall looks good. Left a few questions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is intended to make it easier to submit multiple queries over an iteration. Moving the array into and out of each iteration is a bit of a pain.
Can you expand on this? I'm not entirely sure what it means and without understanding the motivation I'm having a hard time seeing wha this change accomplishes.
I can say that now that I've read even more Rust written by neither of us, it seems to me that lifetime annotations in public APIs are relatively rare in practice (as in, where we might have defaulted to lifetimes, it seems more like idiomatic Rust uses other primitives like Arc, Rc, and similar constructs). So adding a whole bunch of lifetime annotations in a public API is becoming a bit of a code smell to me.
That said, there are certainly places where lifetimes do show up, but those are seem to be in temporary instances like Iterators that are expected to be consumed directly rather than held onto for indeterminate lengths of time. I wouldn't consider an instance of Query to meet this sort of temporary status given how much internal state it keeps.
Multiple concurrent executions of tiledb_query_submit from different threads are only possible if Array is Sync, so we add a test which ensures that Array is not Sync.
I haven't spent the time trying to figure out whether or not you did something fancy or whether there's a difference between Sync and Send marker traits, but this assertion reminded me that we're already skirting the edges of safety with things like SendArray
. Is there something that prevents the same trick from working for SyncArray
?
The difff obfuscates it a bit but this is probably the best place to look. We open an array. We iterate over a vec of write inputs and turn each into a Write query. Previously, the construction of the Write query consumes the open array, so we have to get the array back from This gets more complicated if we want to defer the construction of the Write query to a function inside the loop. The previous API would require passing the array in as a consumed argument and then returning it from the function and then assigning back to the array. This is a bit awkward but it manifests in a more ugly fashion when interacting with the |
Using If Write query construction required that the |
We (or someone else) can do |
Another reason to change the API in a fashion like this - |
Re-reading this discussion after a few weeks and more time spent in other Rust code bases, I'm going to double down on "lifetime annotations that spider through a code base seems like very non-idiomatic Rust" stance. However, I'm pretty sure this is a text book case for using a RefCell and its try_borrow_mut method. The one thing I think you'd want that I don't have an immediate answer to is how to hold a mutable reference for the lifetime of the Query instance. Once we give a query that ref cell, nothing else should be able to borrow it (I think). Where a naive approach would only have the query hold a mutable ref during each method call, which would allow for folks attempting to interleave method calls on two different Query instances using a RefCell to the same Array which would be bad. |
This pull request changes the query API to borrow an Array rather than consume it. The entry points of
ReadBuilder::new(array: Array)
andWriteBuilder::new(array: Array)
are changed toReadBuilder::new(array: &Array)
andWriteBuilder::new(array: &mut Array)
respectively.This change is intended to make it easier to submit multiple queries over an iteration. Moving the array into and out of each iteration is a bit of a pain.
The reason that the API originally consumed the Array is because it is unsafe to submit multiple queries against the same open array in parallel in the core library.
https://app.shortcut.com/tiledb-inc/story/35602/add-c-multithreaded-query-example
This story is about that (though it may not contain that detail directly, one of the linked stories might instead).
Since
WriteBuilder::new
now takes an exclusive reference, the array it borrows cannot be used as long as the query object is in scope - this satisfies the safety guarantee we want.ReadBuilder::new
takes a shared reference, which does allow construction of multipleReadBuilder
objects borrowing the same array. Multiple concurrent executions oftiledb_query_submit
from the same thread are not possible since there is no async query support from core (there is an API but we do not currently wrap it and I think Paul suggested it is not stable for some reason?). Multiple concurrent executions oftiledb_query_submit
from different threads are only possible ifArray
isSync
, so we add a test which ensures thatArray
is notSync
.