-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: [sc-57786] [rs] minimize set of functions which need to link with core tiledb to run #179
Conversation
test-utils/cells/Cargo.toml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cells
crate contains the structure definitions and trait impls of code which used to live in api/src/query/strategy.rs
and api/src/query/write/strategy.rs
. This code has been exported and is used in tables
tests. This crate depends on tiledb-serde
because it uses the schema to make some decisions about values and etc.
I don't think there is any new code in the cells
crate itself, but splitting this out is what has led to the ToReadQuery
and ToWriteQuery
traits in tiledb-api
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new crate proptest-config
just contains the configuration settings for proptest that we previously had. I moved it to its own crate because cells
and tiledb-serde
both use it.
test-utils/signal/Cargo.toml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had some signal stuff in tiledb-test-utils
, this crate contains that code.
test-utils/strategy-ext/Cargo.toml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tiledb-test-utils
previously had a bunch of generic strategy extensions which now live in this new crate.
test-utils/uri/Cargo.toml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This crate is for the URI generator stuff which previously lived in tiledb-test-utils
.
Initially I had split all of the tiledb-test-utils
modules into separate features rather than separate crates; but this was sort of gnarly for dependencies, which is when I moved them all to their own crates.
No new code in this crate, just moved around.
@@ -473,330 +476,18 @@ impl Builder { | |||
} | |||
} | |||
|
|||
/// Trait for data which can be used as a fill value for an attribute. | |||
pub trait IntoFillValue { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved into tiledb-common
because of the impls which build upon PhysicalType
. I did this before adding the CellValue
trait; that would probably be more appropriate to use and we could move this trait back into tiledb-api
.
} | ||
} | ||
|
||
impl TryFrom<&Attribute> for AttributeData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversions between AttributeData
and Attribute
now live in the serde
module, i.e. tiledb/api/src/array/attribute/serde.rs
.
This is the case for all the other ThingData
structs as well
@@ -170,275 +158,16 @@ impl PartialEq<Dimension> for Dimension { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug, Deserialize, OptionSubset, PartialEq, Serialize)] | |||
pub enum DimensionConstraints { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now in tiledb-common
though perhaps it should go into tiledb-serde
.
let c_name = cstring!(name); | ||
let c_domain = constraints.domain_ptr(); | ||
let c_extent = constraints.extent_ptr(); | ||
let c_domain = dimension_constraints_go!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the body of DimensionConstraints::domain_ptr
.
It didn't really feel right to have an otherwise Rust-only crate provide methods which deal with raw pointers.
#[test] | ||
fn subarray_strategy_dense() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test doesn't actually use any tiledb APIs so it now lives in serde/src/array/dimension/strategy.rs
}; | ||
|
||
#[derive(Clone, Debug, PartialEq)] | ||
pub enum Mode { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the enums have been moved to tiledb-common
.
Except for Encryption which is not used by anything else yet. Though it does seem like something we might want to put in an RPC, so it probably should be moved there.
|
||
/// Represents tiledb (`Datatype`, `CellValNum`) compatibility for an arrow `DataType`. | ||
#[derive(Clone, Debug, Eq, PartialEq)] | ||
pub enum DatatypeToArrowResult { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of this is moved to tiledb-common/src/datatype/arrow.rs
tiledb/api/src/datatype/mod.rs
Outdated
|
||
use crate::error::DatatypeErrorKind; | ||
use crate::Result as TileDBResult; | ||
impl ToStringCore for Datatype { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The enum is moved to tiledb-common/src/datatype/mod.rs
.
What's here now is just trait impls for new traits ToStringCore
and FromStringCore
which call the tiledb core string functions.
Since these require linking with libtiledb, tiledb-common
specifically does not want them. Hence Debug is derived and Display just uses that.
@@ -26,107 +29,6 @@ impl Drop for RawError { | |||
} | |||
} | |||
|
|||
#[derive(Clone, Debug)] | |||
pub enum DatatypeErrorKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some variants are moved to tiledb_common::datatype::Error
and others are now just Error
variants.
#[derive( | ||
Clone, Debug, Default, Deserialize, OptionSubset, PartialEq, Serialize, | ||
)] | ||
pub struct FilterListData(Vec<FilterData>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In contrast to other items which are moved somewhere else, this one I just deleted and replaced uses of it with Vec<FilterData>
.
#[derive( | ||
Copy, Clone, Debug, Deserialize, Eq, OptionSubset, PartialEq, Serialize, | ||
)] | ||
pub enum CompressionType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to tiledb_common::filter
use util::option::OptionSubset; | ||
|
||
#[derive(Clone, Debug, Deserialize, OptionSubset, PartialEq, Serialize)] | ||
pub enum Value { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is moved to tiledb_common
. This module now just has functions for FFI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops. I totes forgot that I've got a dinner thing tonight so I didn't get as far into this as I'd like.
After my skimming so far, about the only thing I wish were different is the proptest strategies in the pod crate being pushed into their own. Granted I'm well aware of why that didn't and might not ever happen so its not really important. And its not like this isn't an improvement anyway.
Also heads up that there's a bunch of modules in the API crate all called serde that look like they should have been renamed to match the s/serde/pod/ rename you did earlier.
Other than that, this all looks pretty good. But given how large it is I am gonna give it another look over tomorrow before I approve. Today I mostly just looked at the new structure and spot checked that things pretty much ended up where I expected and so far. Tomorrow I'll give a closer skim on the actual diff before +1'ing.
Good work though! This will be a nice clean up of some of our initial decisions made while earlier in our Rust learning adventure.
pub nullability: Option<bool>, | ||
} | ||
|
||
#[cfg(any(test, feature = "proptest-strategies"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason why you don't just move this impl to the strategy module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a stylistic choice to have the impl
in the same file that the struct
is defined in.
I have a weak preference for this because it means only searching one file for the impls. Though admittendly one should probably build rustdocs for exploring the impls.
Good catch, fixed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big change but everything looks good to me. The only thing I spotted that had me wondering are the extra features that don't feel very feature level. But those are easy enough to undo later that I'm not gonna request any changes there.
Other than that, all looked good. The error handling cleanup was quite nice to see throughout. I left a few notes, but I think more than half were basically just notes to myself.
+1
@@ -303,6 +291,19 @@ impl PartialEq<Attribute> for Attribute { | |||
} | |||
} | |||
|
|||
#[cfg(any(test, feature = "pod"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems slightly odd to have the pod crate as an optional dependency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well... it definiitely makes use of the API way easier but it's not necessary to use it. If it's a feature than there is a leaner option to use tiledb as a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it was just that it seems like the pod definitions are. the way to interact with the bindings, so making it a feature that requires opting into didn't feel quite right. And this isn't super duper well thought out. I was just thinking "How often are we going to not want those pod APIs?" and I'm reasonably sure "Almost never, and the cost of having them and not using them would likely be miniscule."
Also its fine for now. We can always go back and change things up if we decide that it really is unnecessary for some particular use case where it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed we could - and everyone loves PRs with lots of red
} | ||
} | ||
|
||
impl TryFrom<Attribute> for AttributeData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume this was just moved from wherever it was before, but seeing a try_from for both a ref and consuming seems odd. I can't quite convince myself this version would never be needed, but it sure seems unlikely.
Not asking for a change here. I'm just noting something I saw for future me writing trait implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really remember why I added this the first time, yeah. Probably trying to do something mildly clever with iterators.
#[cfg(any(test, feature = "proptest-strategies"))] | ||
pub mod strategy; | ||
#[cfg(any(test, feature = "pod"))] | ||
pub mod pod; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure on this pod as feature thing. Also, we should probably standardize that all module definitions (that aren't tests
) should go at the top of the file. Finding the code/test transition isn't the easiest thing to navigate to just to check if there's anything interesting going on. Don't worry about that as part of this PR. Just another thing I've noticed while reading through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think we have some stylization to do
- modules at the top
use
declaration un-grouping
serde_json = { workspace = true, optional = true } | ||
thiserror = { workspace = true } | ||
tiledb-proc-macro = { workspace = true, optional = true } | ||
tiledb-sys = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm reading this PR pretty quickly, so I may have missed an actual ffi call, but I would only expect (and have only seen) references to various constants. We could pretty easily split those off into a tiledb-sys-defs crate to avoid this crate from requiring a link to libtiledb.so. Not needed for this PR, just a thought I had keeping in mind part of the motivation for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each time I add a new crate I have to add a new build.rs
file for that crate. Isn't that what sets the linker path, not the presence of the sys
crate?
That said I do think it might be nice to split out the typedefs and constants for sys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's two halves: The dependency on tiledb-sys introduces the requirement that libtiledb.so is linked to the final executable. The build.rs in non-tiledb-sys crates tells the linker where to find it on the current system (assuming its not in a system standard path).
So basically, if we split out tiledb-sys-defs that contains just the constant definitions, then anything that doesn't depend on tiledb-sys (directly or indirectly) will no longer need to provide the build.rs.
Or in other words, the build.rs requirement is a symptom of what's actually linking to libtiledb.so, when we remove that need we should be able to remove a lot of the build.rs scripts from the crates that no longer need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see that more clearly now from looking at sys/build.rs
. Definitely will follow up with that in a subsequent PR. Thanks!
Copy, Clone, Debug, Deserialize, Eq, OptionSubset, PartialEq, Serialize, | ||
)] | ||
#[derive(Copy, Clone, Debug, Eq, PartialEq)] | ||
#[cfg_attr(feature = "option-subset", derive(OptionSubset))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another feature that I'm not 100% sold on. Easy to change down the line, so not requesting a change. Just seems a bit "feature happy" is all.
serde_json = { workspace = true, optional = true } | ||
strategy-ext = { workspace = true, optional = true } | ||
thiserror = { workspace = true } | ||
tiledb-common = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a note to self that this hasn't removed the libtiledb.so linkage since we've got this transitive dependency. Though it looks pretty straightforward to cut that tie in a follow up PR that I assume is probably already half written waiting for me to approve this one. So on with the show!
- move cfg impls into strategy.rs - remove ToStringCore and FromStringCore - fix 'cargo check --tests' from API crate - Fix comment typo - Remove empty pod mod query
Story details: https://app.shortcut.com/tiledb-inc/story/57786
This pull request is a rather large "Part 1" of the linked story, whose goal is to restructure our crates in
tiledb-rs
so that the only code which requires linking with core tiledb is the code which actually calls the C APIs. Doing this will simplify the build process for users of thetiledb-rs
crates which do not need to use the C APIs directly, such as a client which sends RPCs to a server in tiledb-cloud.We have written handles to tiledb schema objects:
struct Attribute
,struct Dimension
,struct Domain
,struct Schema
, and so on. When we turned our eye towards property-based testing, we concluded that we needed to write structures whose fields duplicate the fields of the objects inside of the core library (whereas the handles just provide methods to get or set them). But over time we have come to use these secondary structures more and more, as they are often more user-friendly. The goal of this "Part 1" pull request is to move these secondary structures to their own crate, which is tentatively namedtiledb-serde
in the diff.Unfortunately this goal requires moving lots and lots of other pieces.
The new
tiledb
crates:tiledb-common
: contains enums and other utility pieces, most notably includingDatatype
,FilterData
, andRange
.tiledb-serde
: contains the secondary structures described above. The name comes from the idea that these structures can be used to deriveDesesrialize
andSerialize
for the API handles. I think the name should be changed but I wanted to get things compiling and running again before taking that step.tiledb-api
: renamed from the previous monolithictiledb
.Each of the above crates has features
serde
andproptest-strategies
. Theserde
feature enables derivedDeserialize
andSerialize
impls, but fortiledb-api
this also addstiledb-serde
as a dependency. Theproptest-strategies
feature similarly definesArbitrary
impls (for the most part) and is used as adev-depedency
by thetiledb-api
crate.There is a bit of
thiserror
-ization as entities which previously usedTileDBError
were moved into a new crate which does not haveTileDBError
. But thetiledb-api
crate itself still uses the largeTileDBError
enum.Additionally, this pull request adds a new top-level directory to the workspace,
test-utils
. This moves the various disjoint items from the previoustiledb-test-utils
crate into their own crates. All of theCells
testing code is also moved into a new cratecells
in this directory. This was mostly done to split outstrategy-ext
but the rest followed from there.For the most part, code has just moved around, without any changes, but there are some exceptions:
PhysicalType
trait. Because this trait is now defined in a different crate thantiledb-api
, we cannot provide generic impls onPhysicalType
for some of the query stuff. A new traitCellValue
is introduced to work around this.cells
crate cannot have itsattach_read
andattach_write
methods anymore since those use structures fromtiledb-api
, which uses oncells
as a test dependency. To resolve this thetiledb-api
crate declares traitsToReadQuery
andToWriteQuery
and provides impls for the items fromcells
.FilterListData
to represent a filter list handle for proptests; this has been replaced withVec<FilterData>
.