-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add geo-traits crate #1157
base: main
Are you sure you want to change the base?
Add geo-traits crate #1157
Conversation
Geometry::MultiPolygon(p) => GeometryType::MultiPolygon(p), | ||
Geometry::GeometryCollection(p) => GeometryType::GeometryCollection(p), | ||
Geometry::Rect(p) => GeometryType::Rect(p), | ||
_ => todo!(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bloop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because I didn't add Line
and Triangle
traits yet, but we can do that before merging if you'd like
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for driving this initiative @kylebarron! I'm excited to use these traits in my own projects.
This pull request looks mergeable to me! With the understanding it is not yet ready to be published. I want to hear from @michaelkirk and @urschrei before merging though.
In the medium term (in 2024) I would love to add 3d support to geoarrow, so I want to keep thinking about xyz geometry traits, but I'm pretty happy with these for 2d geometries. They've worked well so far in geoarrow. |
Thanks for keeping this up to date WRT to main @kylebarron! I said in #1011 (and still believe) that having some examples of algorithms that use these trait definitions would be key to evaluating them. Are there examples anywhere that utilize these traits? @frewsxcv - have you tried, or would you be willing to see how this code works in your project? |
Are the usages in geoarrow-rs sufficient? Or are you looking for something else? Also would love to hear why you find this a blocker to merging these in to the repository. |
In rgis I would like to be able to read geospatial data and perform operations on the underlying data without having to convert through intermediary representations like |
Examples of how they're used today in geoarrow-rs:
@JosiahParry has also implemented these traits in serde_esri, and I believe he's tested against geoarrow. |
Thanks for the examples @kylebarron! I'll take a look. And relevant to @frewsxcv question:
I continue to think it's reasonable that code in main is intended to be useful, and that seeing code actually used in pursuit of the problem it purports to solve seems like a reasonable bar. I look forward to looking at @kylebarron's examples to that end.
I understand that's the theoretical goal, and a worthy one. I'm asking to see if this actually works towards that goal. Care to give it a shot? Is there something that keeps you from building against this branch, for example? |
@JosiahParry - can you elaborate on what these traits enable you to do? Presumably it's useful to you or you wouldn't have done it 😉 |
Nope! There is nothing blocking me from building on-top off a branch. The point of merging the code into |
@michaelkirk Correct me if this is wrong, but it sounds like you are skeptical of the usefulness of these traits without actually saying that. Am I mind reading correctly? I feel like there's an unusual and surprising amount of resistance to merging this in. |
I feel a bit attacked! But it's true - I am skeptical of the traits code. Truly though, I am skeptical of all code. I am admittedly bit extra skeptical of the geo-traits code in that it's not just a single isolated algorithm, it's a cross cutting concern. I feel like I've said similar things before, but maybe not clearly enough. I appreciate @kylebarron patience with me.
This implementation looks good to me! You're using the trait accessors to implement a useful algorithm and it doesn't seem overly ceremonious compared to the implementation on geo-types.
Ugh, these ones on the other hand, seem a bit unfortunate. Is there a way to make these look more like the BoundingRect one? Or is it reasonable to assume that most algorithms implemented on geo-traits will need to have a separate trait definition ( I haven't yet grokked why these implementations can't look more like the BoundingRect one - something about having a dedicated struct for the algorithm (i.e. Also, am I reading it correctly that the "trait based" implementation for FrechetDistance is simply converting to a geo-types and then calling FrechetDistance on that? Like we're not actually implementing the algorithm using the trait here in any meaningful way - or am I misunderstanding? Is there something preventing a geo-types free implementation? All the conversion API's seem reasonable at first blush, but can you connect the dots for me a little bit? let line_string_1_wkb: &[u8] = "..."
let line_string_bbox = ???
let line_string_2_wkb: &[u8] = "..."
let frechet_distance = ??? |
All algorithm traits from geo need a separate implementation in geoarrow because geo's traits always return scalar objects while geoarrow's traits need to return arrays or chunked arrays. I would love to have only one algorithm trait, e.g. just
In BoundingRect I'm reimplementing the core algorithm in a native way on the traits. I do this for The copy to a
I added a test for the first one in this example PR: geoarrow/geoarrow-rs#542 // A builder for a columnar WKB arrays
let mut wkb_builder = WKBBuilder::<i32>::new();
// Add a geo polygon to the WKB array
// This uses geo-traits to serialize to WKB and adds the binary to the array
wkb_builder.push_polygon(Some(&p0()));
// Finish the builder, creating an array of logical length 1.
let wkb_arr = wkb_builder.finish();
// Access the WKB scalar at position 0
// This is a reference onto the array. At this point the WKB is just a "blob" with no other
// information.
let wkb_scalar = wkb_arr.value(0);
// This is a "parsed" WKB object. The [WKBGeometry] type is an enum over each geometry
// type. WKBGeometry itself implements GeometryTrait but we need to unpack this to a
// WKBPolygon to access the object that has the PolygonTrait impl
let wkb_object = wkb_scalar.to_wkb_object();
// This is a WKBPolygon. It's already been scanned to know where each ring starts and ends,
// so it's O(1) from this point to access any single coordinate.
let wkb_polygon = match wkb_object {
WKBGeometry::Polygon(wkb_polygon) => wkb_polygon,
_ => unreachable!(),
};
// Add this wkb object directly into the BoundingRect
let mut bounding_rect = BoundingRect::new();
bounding_rect.add_polygon(&wkb_polygon);
assert_eq!(bounding_rect.minx, -111.);
assert_eq!(bounding_rect.miny, 41.);
assert_eq!(bounding_rect.maxx, -104.);
assert_eq!(bounding_rect.maxy, 45.); Run with
|
I think part of why these implementations look the way they do is because I want to have compile-time safety across geometry types. E.g. I don't want to have (The chunked array abstraction, where you might have 10 million geometries in 100 separate chunks where each of those chunks is a contiguous buffer holding 100,000 geometries, is quite useful and is currently my unit of "automatic parallelism". I.e. a rayon But the largest area of complexity in geoarrow-rs is handling these combinations. I think it's useful for UX to be able to call an operation like |
I would quite like this! I think it would be really neat if the geo-type traits also had a Take this super derived example (i adjusted @kylebarron's geo-traits/src/line_string.rs file) https://gist.github.com/JosiahParry/f3469f8df4706a4d36f8a47db9ae3e09) trait StartEnd where Self: LineStringTrait {
fn start_end(&self) -> Self::Owned {
let start = self.coord(0).unwrap();
let end = self.coord(self.num_coords() - 1).unwrap();
Self::new(vec![start, end])
}
}
impl<T: CoordNum> StartEnd for LineString<T> {}
#[test]
fn test_start_end() {
let coords = vec![
Coord {x: 1.0, y: 2.0},
Coord {x: 3.0, y: 4.0},
Coord {x: 5.0, y: 6.0},
];
let line_string = LineString::new(coords);
let start_end = line_string.start_end();
println!("{:?}", start_end);
assert_eq!(start_end.num_coords(), 2);
}
|
I think that trait definition has the same downsides as mentioned elsewhere, where you'll need a different trait definition for each geometry type. (You'd need a I also think that it's just extra unnecessary complexity, because I argue that the implementation of the algorithm should choose the most appropriate return type, as long as the geometry trait is also implemented on that return type. That means, for any fn convert_to_my_repr(geom: &LineStringTrait) -> MyLineStringStruct {} But for other libraries, especially in the context of libraries like geoarrow where you want to optimize the vectorized execution and not the scalar execution, it's much better to defer the data transformation until you have an array of geometries. |
I need something like this for Geodesy, but obviously only need the First, given your comments, @kylebarron, about wanting to extend towards 3D, I find the name of the Both because for my TeX-eyes, x_y looks like x with subscript y, and because with 3D, 4D and/or Measure extensions, it will be too much of an eyecatcher to spot Second, I find it reasonable to extract the Also, if I understand correctly, most of the reservations expressed above would not apply to Third, as a general remark of the Georust naming conventions Geodetically speaking, coordinates are ordered, and can be referred to as 1st, 2nd, 3rd and 4th, and as you never know what is the CRS specific convention regarding (N,E) vs (E,N), any kind of implied (mis)understanding is unfortunate. I suggest to let the trait implementer provide x(), y(), z(), t(), and let the trait autoprovide first(), second(), third(), fourth() (or preferably the other way round), and then just postulate that the naming is an internal convention, not to be confused with any external conventions. Evidently, trying to extend this to the entire georust ecosystem would neither be valuable or reasonable - wherever the culture descends from an "earth-is-flat-as-my-monitor" computer graphics world, x and y are obviously the more useful convention. But in a EDIT 2: On second thoughts, I actually think the Third item can be ignored: It doesn't matter how the "this is the internal convention" is communicated, and first()...fourth() is probably a unnecessary complication, that impedes communication. Fourth, I have implemented an extended version of As you can see, I took the liberty to extend to 4D+M, but probably not in a very elegant way. Also, I need the I would find it awesome to be able to implement a tighter integration between Geodesy and the Georust universe. And a somewhat geodesy-aware I know, however, that I am stomping on your grass, but please consider my suggestions - and I would be very happy for any comments regarding how I am wrong, and how I could work further towards a Georust<->Geodesy bridge. |
I'm interested in multi-dimensional geometry support, and think that handling it in traits could be a good way to incrementally add support. But I think it would probably be ideal if the dimension could be a generic, so that you'd statically know what type of coordinate you have, and which type of operations could be done on it. I'm happy to switch
geo has had some discussion about whether to merge Point and Coord, and so it's not 100% clear that we do want both |
That would require a good deal more genericity than just the dimension. In PROJ, we check the pipelines upon construction, such that the output type of step That's why I entirely dropped the "different coordinate types" in Rust Geodesy, and only support different dimensionalities: It just doesn't make sense - coordinates are interpreted by operators which expect a specific input type, and the gamut of possible types is infinite. No reason to try to make the type system reject having me feeding a utm zone 32 coordinate into a pipeline expecting some other type of projected coordinate (or even geographical or geocentric cartesians): The output becomes garbage, but its consistent garbage :-) and there are just too many kinds of parametrizations, each of which would lead to principally different gamuts for valid operations. So it's a tough job! |
@kylebarron My current, and much improved version, takes off from a much less modified version of your original pub trait CoordTrait {
type T: CoordNum;
const DIMENSION: usize;
const MEASURE: bool;
/// Accessors for the coordinate tuple components
fn x(&self) -> Self::T;
fn y(&self) -> Self::T;
fn z(&self) -> Self::T;
fn t(&self) -> Self::T;
fn m(&self) -> Self::T;
/// Returns a tuple that contains the two first components of the coord.
fn x_y(&self) -> (Self::T, Self::T) {
(self.x(), self.y())
}
} |
I had been thinking more along the lines of the proposal in this PR, where the struct is generic over a In particular, when Or maybe the better way of looking at this is that |
Essentially, I believe it is up to the person implementing the trait for a concrete data type to decide: It's your trait, but their data! In Geodesy, I use NaN for t and 0 for z, because that fits well with geodetical reasoning: If you have no height information, your coordinate is probably assumed to be placed directly on the ellipsoid, whereas if you have no time coordinate, but accidentally call a time dependent operation on the coordinate, you actually want the NaN value to spill out all over your coordinate, to indicate its invalidity ("stomping on it with the x-large NaN boots"). In a sense, NaN is the IEEE 754 equivalent of |
In general, I disagree because I think it's important to have a well-specified data contract for these traits. If
In this case I think it would be better to have |
I think the dream is that rather than NaN spilling out all over at runtime, you have the type system refuse to even compile the code, saying "hey dummy, you're trying to use a time-dependent algorithm with non-time coords". Maybe in practice it's not always possible when you instantiate your coordinates to know a priori what dimensions they'll need to have, but it seems like it would be possible some of the time. |
💯
In geoarrow this is where dynamic downcasting comes in, and where there's both geometry-specific types like |
@kylebarron said:
I, on the contrary, stipulate, that it is the implementer of the trait for a concrete type, who knows how best to adapt the concrete type to the (well documented, but not necessarily perfectly obtainable) expectations of the trait In Geodesy, I provide implementations of the
In that case, I believe all should be optional. Partially for consistency, partially to support "only height" or "only M" systems. My unfortunate knack for premature optimization is also concerned by the fact that making things optional will introduce branching right in the inner coordinate loop. In general, it will probably be of minor influence, though |
@michaelkirk said
As I also indicated wrt. PROJ pipelines in a previous message, that will be virtually impossible: You can never know what run-time pipelines a user may enter into the system, so compile-time checks will at best be spotty. Also note that in both PROJ and Geodesy, the data flow model is such that conversions are done in place, so the actual memory location holding the coordinate, will change coordinate type between the steps of a transformation pipeline. So the only thing you can reasonably say about a coordinate tuple is that it is a collection of 0 or more numbers. Their interpretation is entirely up to the associated CRS. While geoinformatics (and hence Geo) is, broadly speaking, a branch of mathematics, where a CRS is interpreted as a mathematical platonic ideal, geodesy (and hence Geodesy, and PROJ) is a branch of geophysics, and a CRS is an entirely empirical contraption, encompassing all the noise and messy realities of the physical world. |
I do worry about this, though I'd love to see some numbers/data on the ballpark overhead to expect here. In general, I also wonder whether there exists a trait that matches up to the expectations of geo and geodesy. It seems to me that geo tries to follow the Simple Features spec as closely as possible, which I believe, for example, stipulates that x and y always exist. It seems that geodesy needs a fundamentally different model of coordinates?
I've been coming around to this. Some people will want to use three-dimensional coords to mean XYM instead of XYZ, and maybe that's ok to allow in the trait? Maybe the trait should define structure more than intent? I've been thinking more about additional dimensions in geoarrow-rs, because I want to potentially use geoarrow-rs as a dependency in some Python projects where 3D coordinates are allowed. I started outlining some thoughts but to support more dimensions I think a const generic for the structural size is the way to go. Then the interpretation of XYM vs XYZ and XYZM vs XYZT are a function of the associated metadata, not the array types. |
In particular, thinking about something like pub trait CoordTrait<const DIM: usize> {
type T: CoordNum;
/// x component of this coord
fn x(&self) -> Self::T;
/// y component of this coord
fn y(&self) -> Self::T;
fn third_unchecked(&self) -> Self::T;
fn third_opt(&self) -> Option<Self::T> {
if DIM >= 3 {
Some(self.third_unchecked())
} else {
None
}
}
fn fourth_unchecked(&self) -> Self::T;
fn fourth_opt(&self) -> Option<Self::T> {
if DIM >= 4 {
Some(self.fourth_unchecked())
} else {
None
}
}
/// Returns a tuple that contains the x/horizontal & y/vertical component of the coord.
fn xy(&self) -> (Self::T, Self::T) {
(self.x(), self.y())
}
} then e.g. a LineStringTrait would become pub trait LineStringTrait<const DIM: usize>: Sized {
type T: CoordNum;
type ItemType<'a>: 'a + CoordTrait<DIM, T = Self::T>
where
Self: 'a; The unchecked variants allow for no-overhead access. I'm not sure if there's any way to make that smarter. Maybe having a method like fn array(&self) -> [Self::T; DIM]; makes sense. (Though I don't know when the compiler will catch an out of bounds access on that array). |
If we were to introduce a Something like: trait Coord {
type Scalar: GeoNum;
fn x(&self) -> Self::Scalar;
fn y(&self) -> Self::Scalar;
}
pub trait Area<C: Coord> {
fn signed_area(&self) -> Coord::Scalar;
fn unsigned_area(&self) -> Coord::Scalar;
} |
Won't you have to be generic over the |
Something like this: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=4ade74994f39a0e5fd995f2e237670e6
Does this code answer your question? The |
I think so! To me, the Coord trait seems the most imminently useful, if only to paper over our current split-brain Is that the use case you were thinking of, or did you have some other idea where parameterizing over a |
Bikeshed time– should the one-dimensional trait unit be called |
I have both coord and point traits at this point to mirror geo |
For whatever reason, GitHub stopped reminding me of updates on this PR, so I have not been following the discussion the last month or more. But just to follow up on @kylebarron's observation that:
You are probably right about that @kylebarron, and realizing this, I reworked Geodesy internals to be more trait based, and have recently merged trait implementations for both coordinate tuples (skip to line 110, to ignore macro based implementations of index, add, sub etc.), and ellipsoids. I believe this has led to huge code maintainability improvements. I have, however, not checked what it means for the speed. But Geodesy is still amply fast for my use cases. |
@kylebarron - I can understand why you'd do it for symmetry, but if you weren't worried about symmetry, do you anticipate any complications arising from getting rid of one of them? They seem functionally redundant from what I understand. Provided that there will be either a re: name bikeshedding: I think it mostly doesn't matter, but I'd vote for Point. I think it's a little more natural: For example:
Being confused by the second example might be a stretch... but somehow "Coordinate Geometry" feels to me like it could be referring to any geometry that contains coordinates (any of Point, LineString, Polygon, etc.), as opposed to "Point Geometry" which feels like it's more likely talking about only a single spot. I'd also say "Point" is maybe slightly more approachable of a term. And we don't need to abbreviate it! Any counter examples where it's worse? It wouldn't be the end of the world for me either way. |
I agree they're functionally redundant, and only exist to be semantically explicit.
That's fine with me.
Not that I can think of right now |
I followed the trail over from this issue in geopolars to this discussion here in geo. And then the trail seemed to go cold and I was a little sad. But lo and behold! An open PR that implements the traits! I really admire your perseverance here @kylebarron and I think you're doing great work. Actually on topic: I agree with @michaelkirk that |
CHANGES.md
if knowledge of this change could be valuable to users.Updated version of #1019 that's clean against the latest
main
. Aside from having no extra diff frommain
, it also has:Sized
required bound, but means that all sources get iterators out of the box with minimal required impls.