Updating shallowgroundwater raw data source #61

Merged
merged 65 commits into from
Jan 27, 2022

florisvdh
Member

@florisvdh florisvdh commented Aug 6, 2021

This builds on findings in https://github.com/inbo/n2khab-mne-design/pull/74#issue-698959257: shallowgroundwater has now been updated.

All explanation can be found in the compiled HTML file. Validation of this result is very good (the new data source version is available as a gpkg in GDrive): only 1.4% of obliggwdep types are missed (see evaluation update in https://github.com/inbo/n2khab-mne-design/pull/74#issuecomment-894343758). Still, more work will be needed to make the result more robust in covering (almost) all areas with 'shallow groundwater'. This will most probably be done by dropping some of the additions made here and instead adding areas based on other (dunes) + changed (soil?) environmental criteria. See comments at the end of the 'check results' chapter.

Beware, this code contains unordered R chunks with various
comparative (but uncommented) trials in doing geoprocessing:

- using sf package
- using rgrass7
- using qgisprocess
- using future package

In short, these are my conclusions:

- Large vector datasets that need a lot of feature cleaning by GRASS take hours to import into GRASS. That is not workable here; it is only feasible as a one-time investment at organization/network level.
- sf can do it all, although unioning or (especially) buffering can take from several minutes to more than half an hour. In some cases the features need to be repaired first (st_make_valid).
- QGIS copes reasonably well with these tasks. Its timings seem similar to those of sf (it also uses GEOS), although that depends on the algorithm. E.g. native:extractbylocation is much faster than sf_object1[sf_object2, ], at least for large objects. However, qgisprocess doesn't yet support all argument types.
- When preparing a GPKG layer for QGIS in which only one geometry type is present, make sure that the sf object being written has that geometry type at the top level, not 'GEOMETRY' (which tends to appear after geoprocessing steps). If needed, use st_cast() before writing the GPKG. This lets QGIS read and render the GPKG layer much faster.
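
The st_make_valid and st_cast() remarks in the list above can be illustrated with a minimal sketch. It assumes toy geometries (a square and a self-intersecting 'bowtie', both hypothetical and without CRS) rather than the real datasets, and the object names and output filename are made up for illustration:

```r
library(sf)

# Hypothetical toy data: one valid square and one self-intersecting 'bowtie'.
square <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
bowtie <- st_polygon(list(rbind(c(2, 0), c(3, 1), c(3, 0), c(2, 1), c(2, 0))))
toy <- st_sf(id = 1:2, geometry = st_sfc(square, bowtie))

# Check validity, then repair invalid features before further geoprocessing.
st_is_valid(toy)
repaired <- st_make_valid(toy)

# Repairing (like other geoprocessing steps) may leave a mix of geometry
# types, in which case the geometry column as a whole is typed 'GEOMETRY'.
st_geometry_type(repaired, by_geometry = FALSE)

# Cast to one geometry type at the top level before writing the GPKG.
fixed <- st_cast(repaired, "MULTIPOLYGON")
st_geometry_type(fixed, by_geometry = FALSE)

# Hypothetical output path in a temporary folder.
out <- file.path(tempdir(), "toy_multipolygon.gpkg")
st_write(fixed, out, layer = "toy", delete_dsn = TRUE, quiet = TRUE)
```

Checking `st_geometry_type(x, by_geometry = FALSE)` right before `st_write()` is a cheap way to notice that a layer would otherwise be written with the generic geometry type.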
@florisvdh florisvdh requested a review from DriesAdriaens August 6, 2021 15:08
This prevents new_areas.gpkg from containing multiple polygons per
layer, whereas the text says 'one multipolygon per layer' (which is what
we eventually apply anyway).
@florisvdh
Member Author

florisvdh commented Oct 20, 2021

Up to and including 74f1643: we added buffered obliggwdep types within dune-texture soilmap polygons. All explanation can be found in the compiled HTML file. The evaluation is in https://github.com/inbo/n2khab-mne-design/pull/74#issuecomment-947829246.

@florisvdh
Member Author

And in a7110e1 + 5a39b4c, we dropped the additional buffering of newly added areas. This was done in order to avoid too many false positives. All explanation can be found in the compiled HTML file. The evaluation is in https://github.com/inbo/n2khab-mne-design/pull/74#issuecomment-947839596.

…ed files *

This makes it easier when updating the input shallow_groundwater version.
Following previous naming, the output_filename would have been
sg3_extended_4.gpkg.
However, we now consider '_extended' to refer to the latest
approach of extending, and we use the suffix '_20211129' to refer
to the input shallowgroundwater layer.

Further, a parameter was added to select the appropriate layer
from the multilayer input GPKG.

Finally, the datestamp to get a reproducible GPKG file has been updated.
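
As a minimal sketch of what selecting one layer from a multilayer GPKG by parameter can look like with sf (this is not the code from the commit; the file, the layer names and the variable `layer_name` are hypothetical):

```r
library(sf)

# Hypothetical toy feature, written to two layers of one temporary GPKG.
pt <- st_sf(id = 1L, geometry = st_sfc(st_point(c(0, 0))))
gpkg <- file.path(tempdir(), "multilayer_example.gpkg")
st_write(pt, gpkg, layer = "layer_a", delete_dsn = TRUE, quiet = TRUE)
st_write(pt, gpkg, layer = "layer_b", quiet = TRUE)  # added as a second layer

# List the available layers, then read only the one named by the parameter.
layer_names <- st_layers(gpkg)$name
layer_name <- "layer_b"  # hypothetical parameter value
selected <- st_read(gpkg, layer = layer_name, quiet = TRUE)
```

Keeping the layer name in one variable means that only that value needs to change when the input version changes.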
@florisvdh
Member Author

A new input version of shallowgroundwater was prepared by @DriesAdriaens, i.e. version 20211129 (see google doc). The same updates as above were subsequently applied, resulting in sg_20211129_extended.gpkg. All explanation can be found in the compiled HTML file. The evaluation is in https://github.com/inbo/n2khab-mne-design/pull/74#issuecomment-988875414.

- The single needed layer is now made available on GDrive in a separate (zipped) GPKG.
This makes the file much smaller to download.
- Consequently, downloading the file is now mandatory and the downloaded file is not stored
persistently, but in a temp folder.
- The resulting file is written to the n2khab_data folder, which is consistent with
the other R scripts (R markdown) present in this repository.

To make the single-layer derivative from ZonesOndiepGrondwater_20211129.gpkg, the
following R code was run in a separate R session:

```r
library(sf) # must be at least 1.0.4
library(rprojroot)

sgpath_20211129 <- find_root_file("n2khab_data/10_raw/shallowgroundwater_20211129",
                                  criterion = has_dir("n2khab_data"))

sgpath_20211129 <- file.path(sgpath_20211129, "shallowgroundwater_20211129.gpkg")

st_layers(sgpath_20211129)
st_delete(sgpath_20211129, layer = "ZOG_20211129_diss_allvars")
st_delete(sgpath_20211129, layer = "ZOG_20211129")

# the following requires the ogrinfo program
# on Windows, use shell() instead of system():
system(paste0("ogrinfo '", sgpath_20211129, "' -sql VACUUM"))

```
@florisvdh
Member Author

The same version as in the previous comment has been re-created (from input version 20211129 - cf. google doc), with the following noteworthy changes:

  • keeping all essential attributes of the input layer, resulting in 14 (input) + 3 (calculated in R) = 17 attributes that document the origin of data.
  • with each multipolygon representing a different combination of attribute values, this now leads to many more rows than in the previous 4-attribute versions (which deliberately focused only on the 3 new columns)
  • converting all attributes to boolean variables
  • renaming and reordering of attribute columns
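
The kind of processing listed above can be sketched as follows. This is only an illustration under stated assumptions, not the workflow from the compiled HTML file: the toy squares, the 0/1 coding and the column names `src_a` and `src_b` are hypothetical, and dplyr is assumed to be available.

```r
library(sf)
library(dplyr)

# Helper to build a unit square starting at x0 (hypothetical toy data).
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1),
                        c(x0, 1), c(x0, 0))))
}

# Three polygons with two hypothetical 0/1 attributes.
areas <- st_sf(
  src_a = c(1L, 1L, 0L),
  src_b = c(0L, 0L, 1L),
  geometry = st_sfc(sq(0), sq(1), sq(5))
)

# Convert the attributes to boolean variables.
areas_lgl <- mutate(areas, across(c(src_a, src_b), as.logical))

# Union the geometries per combination of attribute values:
# summarising a grouped sf object unions each group's geometries.
grouped <- group_by(areas_lgl, src_b, src_a)
combos <- summarise(grouped, .groups = "drop")

# One multipolygon per combination, with columns in the wanted order.
combos <- st_cast(combos, "MULTIPOLYGON")
combos <- select(combos, src_a, src_b)
```

Because every distinct combination of attribute values becomes its own row, the number of rows grows with the number of attributes that are kept.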

This result should be the first official version of the data source, for release on Zenodo.

All explanation, the R workflow and some checks (incl. file hashes) can be found in the compiled HTML file.

@florisvdh
Member Author

From #61 (comment):

This result should be the first official version of the data source, for release on Zenodo.

This result has been labelled as version shallowgroundwater_v1 of the datasource and has been released by @DriesAdriaens at https://doi.org/10.5281/zenodo.5902881!

@florisvdh florisvdh merged commit e06d124 into master Jan 27, 2022
@florisvdh florisvdh deleted the shallowgw_update branch January 27, 2022 10:54