
A Consistent and Principled Approach to GDAL C Pointers #445

Closed
wants to merge 4 commits into from


metasim
Contributor

@metasim commented Oct 1, 2023


Note: This PR is likely to be abandoned. While the foreign-types changes work well with the raster-related types, the ownership model behind the vector and mdarray types is complicated enough that a transition would require more interested participants than are currently available, and is likely not worth the effort at this time.


Remaining Tasks:

  • Convert Dataset (dependency Maintenance cleanup on dataset.rs #447)
  • Convert GCP (possibly; it has a Rust type mirroring the C type)
  • Figure out what to do with {Owned}Layer (it's really strange)
  • Convert FieldDefn
  • Convert mdarray::MDArray
  • Convert mdarray::Group
  • Convert mdarray::Dimension
  • Convert mdarray::ExtendedDataType
  • Convert mdarray::Attribute
  • Rename SpatialRef to SpatialReference to avoid SpatialRefRef
  • Convert any relevant secondary APIs, e.g. things from vsi, cpl, etc.
  • Unify mutable reference handling (see below) (dependency Inconsistent mutable reference tracking in vector module. #448)
  • Extend CHANGELOG documentation to help users migrate
  • Run under Sanitizer workflow
  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Background

The Rust bindings to the GDAL library have been developed and maintained by many contributors over a span of almost a decade (the first commit on record is March 25, 2014). As is normal in such a situation, inconsistencies and diverging principles have arisen. The scope of this PR is to address inconsistencies in the handling of pointers to C-instantiated GDAL types, using the foreign-types crate as the means.

The foreign-types crate is mature and widely used, with 45 public dependents (including openssl), approximately 120,000 daily downloads, and > 65 million total downloads. It is dual licensed under the Apache and MIT licenses. It provides an opinionated set of types and traits for wrapping pointers to C memory, with support for borrowed vs. owned references, Clone, and Drop, plus optional proc-macros that eliminate some boilerplate. The included documentation and examples are somewhat sparse, but the maintainer is responsive to questions, and digging deeper (including into the very large projects that use the crate) turns up good examples. While the names and abstractions declared in foreign-types may not suit every user's preferences, maturity, consistency, and low friction were the primary criteria in choosing it.
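To make the owned/borrowed split concrete, here is a dependency-free sketch of the shape of code that the crate's `foreign_type!` macro generates: an owned `SpatialRef` that releases its handle on `Drop`, a zero-sized `SpatialRefRef` that is only ever seen behind `&` (modeled on the crate's `Opaque`), `Deref` from owned to borrowed, and `ToOwned`/`Clone` backed by a C-side clone function. It deliberately uses neither the real crate nor `gdal-sys`: the `mock_*` functions, the `SrsH` alias, and the thread-local registry are hypothetical stand-ins for GDAL's C API (roughly the roles of `OSRNewSpatialReference`, `OSRClone`, and `OSRRelease`), so treat it as an illustration of the pattern rather than the crate's exact trait definitions.

```rust
use std::cell::{Cell, RefCell, UnsafeCell};
use std::collections::HashMap;
use std::ffi::c_void;
use std::marker::PhantomData;
use std::ops::Deref;
use std::ptr::NonNull;

// ---- Hypothetical mock C API (not GDAL) ----------------------------------
// Handles are never dereferenced: each is a unique non-null address used as a
// key into a thread-local registry, the way C hides its internals.

/// Handle alias shaped like bindgen's `OGRSpatialReferenceH = *mut c_void`.
pub type SrsH = *mut c_void;

thread_local! {
    static REGISTRY: RefCell<HashMap<usize, String>> = RefCell::new(HashMap::new());
    static NEXT_ID: Cell<usize> = Cell::new(1);
}

fn mock_new(name: &str) -> SrsH {
    let id = NEXT_ID.with(|n| {
        let id = n.get();
        n.set(id + 1);
        id
    });
    REGISTRY.with(|r| r.borrow_mut().insert(id, name.to_string()));
    id as SrsH
}
fn mock_name(h: SrsH) -> String {
    REGISTRY.with(|r| r.borrow()[&(h as usize)].clone())
}
fn mock_clone(h: SrsH) -> SrsH {
    mock_new(&mock_name(h))
}
fn mock_release(h: SrsH) {
    REGISTRY.with(|r| r.borrow_mut().remove(&(h as usize)));
}
/// Number of handles the mock "C library" still considers alive.
pub fn mock_live_count() -> usize {
    REGISTRY.with(|r| r.borrow().len())
}

// ---- Borrowed type: zero-sized, its address *is* the C handle -------------
pub struct SpatialRefRef(PhantomData<UnsafeCell<*mut ()>>);

impl SpatialRefRef {
    /// Unsafe: caller asserts `ptr` is a live handle that outlives `'a`.
    pub unsafe fn from_ptr<'a>(ptr: SrsH) -> &'a Self {
        unsafe { &*(ptr as *const Self) }
    }
    /// Safe: handing out the handle value is not itself unsafe.
    pub fn as_ptr(&self) -> SrsH {
        self as *const Self as SrsH
    }
    /// Methods live on the Ref type so both owned and borrowed values get them.
    pub fn name(&self) -> String {
        mock_name(self.as_ptr())
    }
}

// ---- Owned type: releases the handle exactly once ------------------------
pub struct SpatialRef(NonNull<c_void>);

impl SpatialRef {
    pub fn new(name: &str) -> Self {
        SpatialRef(NonNull::new(mock_new(name)).expect("mock returned null"))
    }
    /// Unsafe: takes ownership of a live, non-null, otherwise unowned handle.
    pub unsafe fn from_ptr(ptr: SrsH) -> Self {
        SpatialRef(unsafe { NonNull::new_unchecked(ptr) })
    }
    pub fn as_ptr(&self) -> SrsH {
        self.0.as_ptr()
    }
    /// Gives up ownership without releasing (for C functions that adopt it).
    pub fn into_ptr(self) -> SrsH {
        let ptr = self.as_ptr();
        std::mem::forget(self);
        ptr
    }
}

impl Deref for SpatialRef {
    type Target = SpatialRefRef;
    fn deref(&self) -> &SpatialRefRef {
        // The returned borrow is tied to `&self`, so it cannot outlive the owner.
        unsafe { SpatialRefRef::from_ptr(self.as_ptr()) }
    }
}
impl Drop for SpatialRef {
    fn drop(&mut self) {
        mock_release(self.as_ptr());
    }
}
impl std::borrow::Borrow<SpatialRefRef> for SpatialRef {
    fn borrow(&self) -> &SpatialRefRef {
        self
    }
}
impl ToOwned for SpatialRefRef {
    type Owned = SpatialRef;
    /// Borrowed -> owned goes through the C-side clone, as `XYZRef::to_owned` does.
    fn to_owned(&self) -> SpatialRef {
        unsafe { SpatialRef::from_ptr(mock_clone(self.as_ptr())) }
    }
}
impl Clone for SpatialRef {
    fn clone(&self) -> Self {
        (**self).to_owned()
    }
}
```

Note that `from_ptr` is `unsafe` on both types while `as_ptr` is safe, and that `into_ptr` uses `mem::forget` so a C function adopting the handle never sees it released a second time.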

Note: For simplicity I use the term "pointer" to include memory addresses, handles, or any other identifier to a C-allocated resource in GDAL.

Goals

The impetus for this PR is several logged issues around pointer handling:

This PR aims to provide foundational work helpful in addressing these issues, primarily focusing on "safe consistency". Specifically:

  • Visibility: Pointer access functions are inconsistently pub. Because georust/gdal (currently) covers only a subset of the functionality of OSGeo/gdal, it's important to give advanced users principled access to GDAL pointers so they can reach functionality the bindings don't yet expose.
  • unsafe: Pointer-related methods are inconsistently marked as unsafe. Marking the mere reading or passing of a pointer value as unsafe is not idiomatic: Rust considers obtaining, copying, comparing, and casting raw pointers (and wrapping arithmetic on them) safe, while dereferencing them or turning them back into something usable is unsafe.
  • Naming:
    • Accessor methods: We currently have names like c_dataset, to_c_hsrs, c_options, to_c_hct, c_geometry, etc. for accessing the C pointer, with no common convention.
    • Constructor methods: Similarly, there are currently various mechanisms (e.g. SpatialRef::from_c_obj) for constructing Rust types from C pointers.
  • Owned vs borrowed values: GDAL assumes a delineation between owned vs. borrowed pointers, as declared in the documentation of most functions. With foreign-types we can lift those documented concepts into compiler-checked constraints, handling the allocation, deallocation, cloning, ownership transfer, and lifetime tracking aspects of the resource lifecycle.
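The `unsafe` goal above follows from Rust's own rules for raw pointers. This minimal std-only sketch (the `Handle` alias and helper names are hypothetical, chosen to echo bindgen-style GDAL handles) shows that producing, copying, comparing, and null-checking a pointer compile without `unsafe`; only the dereference needs it, because that is the step where memory safety can actually be violated.

```rust
use std::ffi::c_void;

/// Hypothetical handle alias in the style of bindgen's GDAL handles
/// (e.g. `OGRSpatialReferenceH = *mut c_void`).
pub type Handle = *mut c_void;

/// Safe: turning a reference into a raw pointer needs no `unsafe`.
pub fn handle_of(value: &mut u32) -> Handle {
    value as *mut u32 as Handle
}

/// Safe: copying, comparing, and null-checking pointer values.
pub fn is_same(a: Handle, b: Handle) -> bool {
    !a.is_null() && a == b
}

/// Unsafe: the caller must guarantee `h` points to a live `u32`.
pub unsafe fn read_u32(h: Handle) -> u32 {
    unsafe { *(h as *const u32) }
}
```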

Benefits

The intended benefits are:

  • Improved user ergonomics via consistency
  • Improved efficiency when binding to additional GDAL features (e.g. VRTs, Warp API, etc.)
  • Reduced maintenance burden via lower cognitive load
  • Better ergonomics for advanced users implementing alternative or bespoke bindings to unexposed GDAL features
  • The Big One: Full compiler-checked resource lifecycle handling, including allocation, deallocation, cloning, ownership transfer, and lifetimes

Scope

This PR migrates the following types to use foreign-types:

  • Defn
  • Feature
  • Geometry
  • SpatialRef
  • CoordTransform
  • CoordTransformOptions
  • RasterBand

(See Remaining Tasks above)

Note: Because of this scope, many APIs will break, necessitating clear documentation and appropriate semantic versioning (even though we're still < 1.0).


Notes on Migration

These are some thoughts I had about the migration experience, some of which are copied from the related discussion on this topic. I include them here for those interested in the deeper details behind certain implementation decisions.

  • At first I didn't love the idea of using the foreign_type! macro, but after implementing ForeignType and ForeignTypeRef manually, I find it definitely worth it where appropriate. It is not always appropriate, as in the case of RasterBand, where we are always working with shared references (a band is always owned by its Dataset). See below.
  • When using the macro, looking at the generated code can be helpful in learning how to properly use the types. See here for an example.
  • Both generic and lifetime parameters can be handled, but doing so takes a deeper understanding of foreign-types.
  • ForeignType::from_ptr, which is unsafe, is how you create instances.
  • ForeignType::as_ptr, which is "safe" (as we desire here), lets you get the pointer.
  • ForeignType::into_ptr consumes/moves ownership, helpful for C functions that take over ownership.
  • Except for constructors and other methods that do not take a &self parameter, most methods should be implemented on the borrowed XYZRef type. This is because the owned XYZ type auto de-refs to it, enabling methods to work on both types.
  • If Clone is supported, then ToOwned is implemented by the macro, which is nice. This is how you go from XYZRef to XYZ (via XYZRef::to_owned), i.e. how you convert from a borrowed reference to an owned instance.
  • CType must be the pointee type, not the pointer type, because ForeignType::from_ptr and as_ptr work with *mut Self::CType. Therefore the bindgen handle alias can't be used directly. For example, bindgen gives us type OGRSpatialReferenceH = *mut libc::c_void, but in a ForeignType implementation type CType = OGRSpatialReferenceH would turn those signatures into *mut *mut c_void and cause type errors. Instead it needs to be type CType = libc::c_void, which seems unfortunate when using foreign-types with bindgen.
  • There are several cases (such as Layer) where the code can be made significantly more principled through foreign_types:
    • e.g.: Layer was holding a &Defn solely for lifetime tracking, which wasn't technically needed originally and is easier to eliminate with foreign_types.
    • OwnedLayer can go away.
    • We can eliminate the owned: bool fields that some types carry (e.g. Geometry::owned), because ownership becomes explicit in the type (XYZ vs. XYZRef).
  • In cases where the type is always owned by the C library and never managed directly in Rust (e.g. RasterBand), the implementation is quite simple: use Opaque and implement ForeignTypeRef.
  • I haven't yet figured out whether pointers to pointers to C structures (such as CslStringList) can be managed effectively with foreign-types.
  • The mutability semantics of RasterBand are decoupled from Dataset. Even though Dataset owns every RasterBand and defines its lifetime, you can mutate a RasterBand without holding a mutable reference to the Dataset, which seems wrong. I think this is what we want: under foreign_types, Dataset::rasterband(&self, band_index: isize) would return Result<&mut RasterBand>, or perhaps there would be a new method Dataset::rasterband_mut(&mut self, band_index: isize). See Remaining Tasks above.
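The rasterband_mut option in the last note can be sketched with plain borrows. In this hypothetical Rust-only model, the bands live in a `Vec` rather than in GDAL-owned memory, and the method names echo the crate's no-data accessors but are simplified stand-ins for any read or mutate operation. Tying `&mut RasterBand` to `&mut Dataset` lets the borrow checker enforce the exclusivity that GDAL's ownership implies.

```rust
/// Hypothetical stand-in: in the real bindings the band lives in GDAL-owned
/// memory and `RasterBand` would be an `Opaque`-based reference type.
pub struct RasterBand {
    no_data: Option<f64>,
}

impl RasterBand {
    pub fn no_data_value(&self) -> Option<f64> {
        self.no_data
    }
    pub fn set_no_data_value(&mut self, value: Option<f64>) {
        self.no_data = value;
    }
}

pub struct Dataset {
    bands: Vec<RasterBand>,
}

impl Dataset {
    /// Hypothetical constructor for the sketch.
    pub fn with_bands(count: usize) -> Self {
        Dataset { bands: (0..count).map(|_| RasterBand { no_data: None }).collect() }
    }

    /// Converts GDAL's 1-based band index to a 0-based position.
    fn position(&self, band_index: isize) -> Result<usize, String> {
        band_index
            .checked_sub(1)
            .and_then(|i| usize::try_from(i).ok())
            .filter(|&i| i < self.bands.len())
            .ok_or_else(|| format!("invalid band index {band_index}"))
    }

    /// Shared access: many `&RasterBand`s may coexist, but none can mutate.
    pub fn rasterband(&self, band_index: isize) -> Result<&RasterBand, String> {
        let i = self.position(band_index)?;
        Ok(&self.bands[i])
    }

    /// Exclusive access: borrowing `&mut self` means no other band reference
    /// obtained from this dataset can be alive at the same time.
    pub fn rasterband_mut(&mut self, band_index: isize) -> Result<&mut RasterBand, String> {
        let i = self.position(band_index)?;
        Ok(&mut self.bands[i])
    }
}
```

With this split, `let a = ds.rasterband_mut(1)?; let b = ds.rasterband(1)?; a.set_no_data_value(None);` is rejected at compile time (E0502), which is exactly the aliasing that a `&self -> &mut RasterBand` signature would permit.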

metasim changed the title from "Initial foreign-types prototype." to "A Consistent and Principled Approach to GDAL C Pointers" on Oct 1, 2023
metasim marked this pull request as draft on October 1, 2023 17:09
metasim force-pushed the prototype/foreign_types branch 2 times, most recently from 1974267 to c2e29cc on October 1, 2023 20:09
metasim mentioned this pull request Oct 1, 2023
bors bot added a commit that referenced this pull request Oct 2, 2023
447: Maintenance cleanup on `dataset.rs` r=lnicola a=metasim

- [X] I agree to follow the project's [code of conduct](https://github.com/georust/gdal/blob/master/CODE_OF_CONDUCT.md).
- [X] I added an entry to `CHANGES.md` if knowledge of this change could be valuable to users.
---

Changes: 
* Moved ancillary types from `dataset.rs` into their own or related files.
* Moved non-core Dataset methods into respective `raster` or `vector` modules. Aims to simplify maintainability via "separation of concerns".
* Removed `unsafe` from pointer accessor methods.


_Possibly_ breaking changes:

* Moved `LayerIterator`, `LayerOptions` and `Transaction` to `crate::vector`.

---

Aside: The impetus for this refactor is to focus the changes necessary for adding `Dataset` to #445. But even if that PR takes a different route, this change will benefit long-term maintenance.

Co-authored-by: Simeon H.K. Fitch <fitch@astraea.io>
<MDI key="STATISTICS_STDDEV">83.68444773935</MDI>
<MDI key="STATISTICS_VALID_PERCENT">100</MDI>
</Metadata>
</PAMRasterBand>
Contributor Author

I'll revert this.

@@ -25,13 +26,13 @@ impl Dataset {
///
/// Applies to raster datasets, and fetches the
/// rasterband at the given _1-based_ index.
-pub fn rasterband(&self, band_index: isize) -> Result<RasterBand> {
+pub fn rasterband(&self, band_index: isize) -> Result<&mut RasterBand> {
Contributor Author

This will need to change. Semantically, it's no different from what we have now where you can get a RasterBand from a non-mut Dataset, and then mutate it. This just makes it more clear that (even now) you can get multiple mutable references to the same band data. See PR notes on rasterband_mut.

metasim force-pushed the prototype/foreign_types branch 2 times, most recently from a2b3614 to 0b010cd on October 2, 2023 21:44
@@ -153,7 +154,7 @@ impl Dataset {

Ok(Some(sql::ResultSet {
layer,
-dataset: c_dataset,
+dataset: c_dataset, // TODO: We're ending up with a shared reference here...
Contributor Author

Will need a second pair of eyes on this one.

-unsafe { SpatialRef::from_c_obj(c_ptr) }.ok()
+// NB: Creating a `SpatialRefRef` from a pointer acknowledges that GDAL is giving us
+// a shared reference. We then call `to_owned` to appropriately get a clone.
+Some(unsafe { SpatialRefRef::from_ptr(c_ptr) }.to_owned())
Contributor Author

lnicola said in 7db2868#r128939164

We could return a SpatialRefRef here, right?
One more reason to rename these to SpatialReference.

Contributor Author

metasim commented Feb 25, 2024

While I think the approaches herein are ones we should adopt, I think it's too late (with volunteer labor) to apply a unified approach across the whole library. While porting the raster bindings was pretty straightforward, the vector and mdarray bindings are complicated enough that either their original author would need to do the work, or someone would need to explain to another contributor how ownership and exclusive access should be handled.

metasim closed this Feb 25, 2024
2 participants