Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for mixed-geometry-type layers #298

Open
02JanDal opened this issue Jun 27, 2024 · 10 comments
Open

Support for mixed-geometry-type layers #298

02JanDal opened this issue Jun 27, 2024 · 10 comments

Comments

@02JanDal
Copy link

QGIS Enhancement: Support for mixed-geometry-type layers

Date 2024/06/27

Author Jan Dalheimer (@02JanDal)

Contact jan.dalheimer at sweco dot se

Version QGIS 3

Summary

Background

In the GIS world we have traditionally had a very clear split of one layer per geometry type, even if the layers otherwise are identical (same fields etc.). This however has a few drawbacks, in the case of QGIS this manifests in a few ways:

  • Attribute form configurations etc. need to be kept in sync between the layers
  • Relationships need to be configured for each layer, if both layers have multiple geometry types you'll quickly end up with 9 (given each layer has point, line and polygon geometries) or even more relationships, rather than just one
  • Processing needs to handle each geometry type separately initially (for example three parallell buffers followed by a union, rather than just a buffer)

Though slowly, modern geospatial technology is moving away from this restriction; GeoPackage, GeoJSON, OGC API Features, PostGIS and many other formats, protocols and databases allow for mixed geometry types per layer/table/collection.

Allowing for mixed geometry types gives a lot of additional flexibility in data modelling, a common example I like to use is shrubs. From a data modelling perspective it can make a lot of sense to have a single layer (with fields such as species, spread (how far from the center does it spread, as seen from above), etc.) and then geometries such as points (for single bushes), lines (for linear hedges) and polygons (for larger shrubs).

Another practical example, which triggered this need/proposal, are comprehensive plans (sv. översiktsplan, example), for which work is ongoing to produce a national specification in Sweden, as well as a national API to retrieve such plans from all the municipalities. The specification (and, consequently, the API) has a lot of mixed geometry classes to allow for flexibility in the planning.

Apart from layers in QGIS currently being locked to a single geometry type, there is also no consistent handling of geometry type selection; some data providers (like PostGIS) allow you to select which geometry type to use, others show a dialog allowing you to add a layer each (like GeoPackage) and further others just pick the first geometry type the find and ignore the rest (like OAPIF).

Proposed Solution

Introduce "mixed" as a geometry type for vector layers and streamline how geometry type selection is handled.

  • Qgis.GeometryType and possibly Qgis.WkbType gain a Mixed member
  • The geometry override will be moved from the data providers that implement it themselves into the base class
  • By default, pulling in a layer from the browser will add a mixed layer, while adding a layer from the "add layer" dialog will allow the user to select the geometry type (either specific or mixed)
  • Layers with mixed geometries will allow setting setting different symbols per geometry type, possible also some convenient fallbacks (i.e. if you only specify a point symbol then lines and polygons will be rendered with the point symbol at the centroid, though that can also be approximated by the user using geometry generators)
  • The Layers dock widget will, for mixed geometry layers, show an additional level after the layer name for each geometry type (either directly showing the symbol when one symbol per geometry type, like today per layer, or with a subtree each when using classified/etc. symbology)
  • A new geoprocessing tool will be added that takes a mixed layer and produces a layer each per geometry type
  • (maybe) A combobox to select geometry type will be added to the Source/Geometry tab of the layer properties dialog
  • Processing algorithms for which it makes sense (for example buffer and most that only work with attributes) will be adapted to be able to handle mixed geometries, though many (especially those working with specific geometry types) will refuse to use such a layer (requiring a split first) same as how it works with the wrong geometry type today

Initially, within the scope of this QEP, mixed geometry layers will be read-only. If you want to edit the data you'll still need to add the layers separately. This is mostly because editing will introduce a few more considerations (such as how to select which geometry type to draw), however that should of course be added eventually.

One semi-open question is how to handle GeometryCollection (i.e. were there are mixed geometry types within a single feature). Ideally, they'd be rendered with each respective geometry type symbology, and selecting a part of it selects all parts (similar to how multi-geometries work). Implementing this might have to wait for later though.

Deliverables

Roughly split by planned PR:

  • Handle geometry overrides in QgsDataProvider, adjust other providers accordingly, consistent UI for each data provider
  • Add Mixed as a geometry type, add it as an option in the UI, symbology handling, setting to change geometry type of layer
  • Processing algorithm to "split" mixed layer, adaption of some processing algorithms
  • (later, out-of-scope) Adaption of additional processing algorithms
  • (later, out-of-scope) Metadata on which specific geometry types are provided (if known), so that only they are shown when setting symbology etc.
  • (later, out-of-scope) Editing of mixed geometry layers

While pending it currently seems likely that we'll be able to get funding to implement the first three points from two Swedish governmental agencies.

Example(s)

Affected Files

TBD

Risks

This change is a large step away from a traditional and somewhat deeply rooted assumption about GIS. There might be plugins and other integrations that assume a single geometry type per layer that will cease to work with this change.

Performance Implications

Performance should be better or equal to having separate layers per geometry type.

Further Considerations/Improvements

Backwards Compatibility

This will introduce a new geometry type (other code might need to be adapted to handle this case) as well as move where overridden geometry types are stored in the project (requires a minor migration on project load, though it could also be implemented to read from old and new location).

See Risks.

Issue Tracking ID(s)

@rouault
Copy link

rouault commented Jun 27, 2024

  • Qgis.GeometryType and possibly Qgis.WkbType gain a Mixed member

Why not just using Qgis::WkbType::Unknown? For most, if not all, datasources that accept mixed geometry types, you know that you get a mix of geometry types only after doing a full scan of the layer (think of a multi gigabyte WFS layer when the server), which isn't practical in the general case. So effectively for those layers the geometry type is unknown at opening time. At least that's how this is modeled on the GDAL side with the wkbUnknown geometry type which means "unknown, potentially mixed". PostGIS base GEOMETRY type has similar semantics.

The geometry override will be moved from the data providers that implement it themselves into the base class

What do you mean by "geometry override"?

This change is a large step away from a traditional and somewhat deeply rooted assumption about GIS.

Indeed. This has significant breaking potential, and I don't really think this can be done in an incremental backwards compatible way in the 3.x series, or allowing the mixed geometry mode should be opt-in in all places. That sounds very much as a 4.0 topic, at least to allow the mixed geometry mode to be the default behavior.

While we are discussing significant changes, a closely related topic is the management of several geometry fields per layer, where currently providers need to expose one layer per geometry field. But admittedly that's a less frequent use case

@nyalldawson
Copy link
Contributor

I'm curious how we'd alter the gui for eg categorised renderers to accommodate multiple symbol types. Did you have ideas here?

@02JanDal
Copy link
Author

Why not just using Qgis::WkbType::Unknown?

Though it would, in a way, make linguistical semantically sense, that's not purely how it is used in the code today so it would change the semantics a lot of other code expects. A quick look through and you'll find places that use it to mean "mixed geometry", "not yet determined geometry", "no geometry" and "invalid geometry", likely more if one starts to look.

Now, for QGIS Core the best thing would of course be to take the opportunity to clean this up and introduce further enumeration literals for those other meanings, but that would further reduce the backwards compatibility with existing code (because a changed meaning is much harder to catch and can go unnoticed until the code breaks somewhere else, making debugging harder), even more than introducing a single new literal.

What do you mean by "geometry override"?

For example mRequestedGeomType in https://github.com/qgis/QGIS/blob/b578f1ea82ba9bd091198c1673583dadf21aa495/src/providers/postgres/qgspostgresprovider.cpp#L872

(it "overrides" the geometry as detected from the data source)

This has significant breaking potential, and I don't really think this can be done in an incremental backwards compatible way in the 3.x series, or allowing the mixed geometry mode should be opt-in in all places. That sounds very much as a 4.0 topic, at least to allow the mixed geometry mode to be the default behavior.

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

QGIS doesn't guarantee backwards compatibility between minor versions AFAIK, so my personal thinking is that as long as the change does not require major changes to external plugins etc. it should be fine? (an example of something I'd consider warranting a major version bump is changing the primary builds to Qt 6, or a complete rewrite of some major subsystem) Otherwise (in my experience from other projects) it's easy to get stuck in a loop of "these things all need to wait until the next major version, but they never get done so we can never release the next major version, and no-one wants to wait with their thing for the following major version because that'll take further years". (then again, I haven't really gotten enough insight into QGIS specifically to really have an informed opinion, so any previous discussions, plans or policys you can link me to would be greatly appreciated!)

In general though, I think this change should be doable with only minor disruption to external dependents; for example processing algorithms from plugins would simple not allow selecting a mixed layer if they have a filter on geometry type.

While we are discussing significant changes, a closely related topic is the management of several geometry fields per layer, where currently providers need to expose one layer per geometry field. But admittedly that's a less frequent use case

That's indeed also something I have considered in this context, and would really like to see added to QGIS at some point (and, like this, also challenges the traditional GIS-way of thinking while beginning to gain traction in protocols and formats). I think it's mostly orthogonal to this proposal though, so shouldn't affect or be affected to much.

I'm curious how we'd alter the gui for eg categorised renderers to accommodate multiple symbol types. Did you have ideas here?

A good question indeed!

Really I can think of two major approaches how to structure the symbology pane:

  1. A tab per geometry type, each tab containing a symbology editor widget for the geometry type
  2. Adjustments to the existing symbology widgets to integrate multi-geometry-type handling (for example in the case of categorized rendering you'd have the table of categories and for each row you'd be able to select a symbol per geometry type)

My plan is to, within the scope of this proposal and what I'll likely be able to get funding for, go for option 1, as it is significantly easier to implement ("just" a tab widget containing a QgsSymbolSelectorWidget, with some minor changes to QgsSymbolSelectorWidget and related to be able to choose a specific geometry type rather than just take it from the layer).

However, more long term, I think option 2 would be a lot more appreciated by users; in most cases I can imagine one would like to have the same setup (classification categories, etc.) for each geometry type, and likely very similar symbols (same colors etc.). But I think that that would require more work to QgsSymbolSelectorWidget and related widgets than I can justify at the time being. (but if the consensus here is option 2 or nothing I'll of course take it for another spin with the customer)

@nyalldawson
Copy link
Contributor

@02JanDal

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

Honestly, this makes me a little nervous. What you're describing is a very major change to qgis, and a change which will touch classes all over the codebase (data providers, gui, analysis tools, the vector rendering loop, map legends, layer tree, etc). I'm not convinced this is the kind of project which is suitable for a relative newcomer to core qgis development.

For a point of reference, I'd be estimating that doing this change properly would take someone very familiar with the qgis codebase at least 3-5 weeks full time development. For someone learning the code base, you'd want to allow much longer, and also factor in a lot of time for the follow up fixes which will be needed over the next ~12 months as the changes get widespread testing.

I just want you to be fully aware of the magnitude of the changes and the amount of work this will require before committing yourself to undertaking it. 👍

@andreasneumann
Copy link
Member

Should we perhaps consider this for QGIS 4.x - where API breaks are also allowed? Like @nyalldawson I also think that this would have tons of side-effects ....

I think it would be really great to have these mixed-geometry-type layers. Something I wanted to have for quite some time. But it needs to involve careful planning and is probably like a "Marathon". A lot of follow-up fixes will have to be done.

@NathanW2
Copy link
Member

NathanW2 commented Jun 28, 2024 via email

@haubourg
Copy link
Member

Old remembrances of MapInfo untyped geometries , with oracle untyped geometries reminds me of how my information was highly chaotic and hard to use. To me this sounds like more like a regression than an enhancement, because I lived with the pain of untyped geometries years ago.

Not typing geometry types might imply very bad performances for medium to large datasets. We've had infinite fixes of the oracle provider with such lack of constraints.

I also share how profoundly this layer type is in all concepts of QGIS. I'm really afraid of the impacts on the GUI.
So the balance between the current glitches and the amount of work implied makes me nervous like Nyall.
Maybe I start to get old and grew up so much with QGIS that I miss the point of this change and I probably need a small proof of concept to convince me :)

@02JanDal
Copy link
Author

Honestly, this makes me a little nervous. What you're describing is a very major change to qgis, and a change which will touch classes all over the codebase (data providers, gui, analysis tools, the vector rendering loop, map legends, layer tree, etc). I'm not convinced this is the kind of project which is suitable for a relative newcomer to core qgis development.

Sure, it's a pretty big task, but I'm willing to give it a go. I'm also not completely new to this scene; I have a decent amount of years of experience with Qt/C++, and have done my fair share of QGIS plugin work. It's also not my first foray into QGIS Core, though my previous work has been minor adjustments for internal use. But I'm not quite sure how this all is relevant for this discussion, though I'd be happy to discuss how you can help newcomers like myself get more active in another forum.

I'm still figuring out the details of the extent of this change (that's, as I understand, one of the reasons for QEPs) to be able to refine my estimate in the discussion with the client.

Should we perhaps consider this for QGIS 4.x - where API breaks are also allowed?

See previous response with some questions from me.

There is also a lot of possible plugin breaks and hidden behavior changes that would crop up with this kind of project

I guess it would make sense to do some research on which plugins would be affected by this change (starting by checking those that use the geometryType and wkbType methods of QgsDataProvider and QgsVectorLayer). Would that help alleviate some fears?

Old remembrances of MapInfo untyped geometries , with oracle untyped geometries reminds me of how my information was highly chaotic and hard to use. To me this sounds like more like a regression than an enhancement, because I lived with the pain of untyped geometries years ago.

Well yes, it's quite easy to come up with scenarios where mixed geometry layers are a really bad idea, yet the same is the case for any number of other things. That doesn't mean there are some other really good reasons to have something.

@nyalldawson
Copy link
Contributor

@02JanDal

I guess it would make sense to do some research on which plugins would be affected by this change (starting by checking those that use the geometryType and wkbType methods of QgsDataProvider and QgsVectorLayer). Would that help alleviate some fears?

The problem isn't that the impact of this is unknown. It's that it's already known, and is very extensive! 😃

I haven't really been active enough in the Core parts of QGIS to be able to determine this, is there some sort of policy or similar on which changes require a new major version?

(Again, my suggestion would be to get started with core QGIS development on much smaller projects until you ARE familiar with all the processes and policies before undertaking a huge task like this, but let's ignore that recommendation for now).

QGIS policy is that no API breaks are permitted during any major cycle. A plugin or user script written for QGIS 3.0 MUST still run on QGIS 3.40 and later. This is a hard requirement of all QGIS development work, and it's designed to give stability for plugin authors and enterprise deployments of QGIS. The responsibility and burden of work lies with the developer modifying QGIS to ensure that they don't break API, and they have to do the (often extensive) extra work required to ensure that existing APIs do not change.

That's what makes this particular task even harder. You'll be doing open-heart surgery on QGIS, replacing very low level, vital components while still having to keep all the public facing APIs unchanged. 🙀

QGIS 4.0 is not on the horizon yet. Even the transition to Qt 6 has been done carefully to avoid any API breakage with QGIS 3.0 plugins and scripts.

Why not just using Qgis::WkbType::Unknown?

Though it would, in a way, make linguistical semantically sense, that's not purely how it is used in the code today so it would change the semantics a lot of other code expects. A quick look through and you'll find places that use it to mean "mixed geometry", "not yet determined geometry", "no geometry" and "invalid geometry", likely more if one starts to look.

@rouault's suggestion is the correct approach here. We shouldn't just be adding additional WKB types to make a change like this simpler, as WKB IS a standard. (You'd actually create a lot more work for yourself by adding a new WKB type, rather then using the existing unknown type which is already handled by all the existing code paths)

A tab per geometry type, each tab containing a symbology editor widget for the geometry type

This sounds somewhat messy to me, and would result in a quite cramped UI eg when styling through the layer styling dock. The UX concerns need to be thought through carefully to ensure we're not just throwing extra widgets (and user complexity) into QGIS . Which brings me to my advice:

Having undertaken hundreds of massive changes like to QGIS over the past decade, I'd like to suggest what I consider to be the most likely path to success for this project. (I'm not trying to "big note" myself here, rather I'm hoping to share my experience so that you don't get stuck in a situation with your customer which could potentially end up financially disastrous for you!).

  • Break this project up into smaller, more manageable chunks. Commit to these atomic chunks individually only, so that you can re-assess after each when you've a better understanding of the level of work required for the next steps.
  • Start with the smallest logical change. I think this should be adding opt-in support at the vector data provider level for mixed geometry type layers. This should be an API only change, and completely opt-in, so that there's no user-facing changes and no changes to how existing plugins/scripts (and other parts of QGIS) operate. You could do this by adapting individual data providers so that you specify via an extra token on the layer's source that the layer should be loaded as mixed geometry type. Extra API could be added on the QgsVectorDataProvider / QgsVectorLayer level to add getters so that you can determine whether a data provider/layer has mixed geometry types. The layer/data provider would return the Unknown WKB type, ensuring all the existing code for handling non polygon/line/point layers "just works" without change. Implement the required logic in each vector data provider's feature iterator implementations to handle this change. Adapt the memory data provider to support mixed geometry types (since this provider has to be a lossless super-set of the functionality of all other vector data providers, this is a must-do).
  • When this change has been proposed, accepted, coded and merged, move onto another set of smaller atomic changes. Eg: "add geometry type filtering support to feature requests, implement for all provider feature iterators, and adapt test suite to add coverage". "Add support for mixed geometry type layers to processing tools". "Add processing tool for splitting mixed geometry type layers to strict geometry type subsets". "Modify layer loading UI in QGIS to permit loading layers as mixed geometry types". "Handle mixed geometry type layers in vector renderer loop", etc. Let each atomic change be discussed individually.
  • If you try for a mega "all-in-one" approach like you propose here, you'll spend weeks and weeks coding up something you think is correct, implementing GUI for this, etc ... and then submit it and be rejected because you've made some fundamentally incorrect choice at step 1. 😱

Hope that helps, and it's heard it in the collaborative/helpful spirit it's intended to be given in 👍

@02JanDal
Copy link
Author

Again, my suggestion would be to get started with core QGIS development on much smaller projects until you ARE familiar with all the processes and policies before undertaking a huge task like this, but let's ignore that recommendation for now

I also have some other, smaller, stuff I'll hopefully find the time to work on before I get to do any work on this. Though I also intended to break this one down into smaller pieces (Deliverables in original text) rather than tackle the full extent at once (see below).

I've for a long time wanted to get more into QGIS Core development, but through a combination of it being hard to find paid work, most things of personal interest being even larger in scope than this, and free time that is already taken up by several other projects it hasn't happened yet, though I hope to be able to break through that barrier now.

QGIS policy is that no API breaks are permitted during any major cycle.

Thank you, this clarifies it! I think it would be useful (for people like me) if this would be documented at https://docs.qgis.org/3.34/en/docs/developers_guide/index.html or in a CONTRIBUTING.md. I'd be happy to contribute it to the docs, though I think it would make more sense for someone who knows more of the details (for example if this only applies to the Python interface or also C++, if it is only API-compatibility or also ABI-compatibility, etc.) to do so, I'll see about opening an issue on the docs repo.

This sounds somewhat messy to me, and would result in a quite cramped UI eg when styling through the layer styling dock

Yeah, as I noted I don't think it's an ideal solution either, just what I could come up with within the scope of funding.

Break this project up into smaller, more manageable chunks

Based on your input, here is a revised plan (partly so that I can rephrase it in my own words to make sure I've understood your suggestions):

  1. Essentially your second bullet point. Introduce some way for the data provider and layer to tell exactly which geometry types it (might) contain. For single-geometry-type layers this will be a single-item list, for no-geometry layers it will be an empty list, and likely the same for truly unknown. Multi-geometry-type layers would return a more-than-one-item list as well as "unknown" for the existing getters. Implement this at least for the memory data provider and 2-3 others (like OAPIF, GDAL (if doable with GDALs interface, haven't looked into that), and postgres).
    • Also make a note (likely as a code comment) to revisit this interface in QGIS 4 to see if it can be better combined with the existing getters)
    • At this point nothing will have changed for the user
  2. Allow the user to specify a feature type filter in the data source URI, use it in the feature iterators
    • Still no user-visible change, though it will be possible to add filtered layers using Python
    • Having set a filter will result in the new getter from step 1 to only return that geometry type, getting access to the full unfiltered list would need another metadata getter (possibly combined with getting information about potential geometry types in multiple geometry fields, which as noted is related, but out-of-scope)
  3. Use this functionality for selecting feature types in the UI (like can be done for postgres and GeoPackage layers today); those data providers implementing similar functionality today will need to support both until QGIS 4. Implement it consistently for all data providers based on the now common API.
    • At this point the most crucial user-facing feature is in place; it is possible to work with mixed geometry layers by adding them as a single geometry layer
  4. Implement rendering and symbology editing for mixed-geometry layers
  5. Adjust code from 4. to also allow adding a mixed geometry layer
  6. Adapt processing algorithms
  7. Editing of mixed geometry layers

Some of these might result in multiple PRs.

I'll have to talk to my client to see how this would affect their ability to provide funding (steps 1-3 should be fine, as noted I think a "proper" implementation of the GUI for step 5 might be to expensive), might be that I end up doing some steps in my spare time (as I'm also interested in this).

Does this sound better?

Let each atomic change be discussed individually.

If you try for a mega "all-in-one" approach like you propose here

I think there might have been some confusion here based on my understanding of how high-level QEPs are intended to be; I assumed a broader scope which would then at implementation time be further split into separate units of work and PRs (see my initial planned PRs, now the list above, I never intended to try to do this all in one go). Do I understand you correctly here that I should open separate QEPs for each of the steps above?

And for the future; does it still make sense to open one broader QEP and then do smaller, more focused, QEPs after some initial discussion, or should I go straight for smaller focused ones (with some background etc. repeated between them)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants