Feature request: allow extract() to do multiple functions #1335

Closed
dfriend21 opened this issue Nov 7, 2023 · 6 comments

@dfriend21
Contributor

I've been wanting to find a way to extract multiple summary stats for polygons that doesn't require calling extract() multiple times, since that's not especially efficient. I've found that I can accomplish this by creating a function that calculates the statistics and then passing that to extract():

library(terra)
v <- vect(system.file("ex/lux.shp", package = "terra"))
r <- rast(system.file("ex/elev.tif", package = "terra"))

f <- function(x){
  return(cbind(mean = mean(x, na.rm = TRUE), 
               sd = sd(x, na.rm = TRUE)))
}
e <- extract(r, v, f)
head(e)
#>      ID elevation elevation.1
#> [1,]  1  467.1052    34.58480
#> [2,]  2  333.8629    68.03058
#> [3,]  3  377.3712    77.14169
#> [4,]  4  373.6000    82.79129
#> [5,]  5  418.6490    48.35488
#> [6,]  6  314.9969    49.02558

Based on some quick investigation into the code, it looks like if you use common functions like mean, median, sum, etc., then it uses C++ functions that are presumably more optimized (correct me if I'm wrong). So as it is right now, there doesn't seem to be any way to calculate two of the optimized C++ statistics at the same time, since the function I created uses the base R functions. It'd be nice if there was a way to do this.

Just a suggestion, feel free to close if you don't want to implement this.
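
As a hedged sketch of the workaround described above, the wrapper can be generated from a named list of functions, so that adding a statistic does not mean rewriting `f`. The helper name `multi_stat` is hypothetical and not part of terra. It uses only base R and returns the same one-row matrix shape as the `cbind()` version. It still calls R functions for every polygon, so it does not reach the optimized C++ path.

```r
# Hypothetical helper (not part of terra): combine several summary
# functions into one function that returns a one-row, named matrix,
# the same shape as cbind(mean = ..., sd = ...) in the example above.
multi_stat <- function(funs) {
  stopifnot(is.list(funs), !is.null(names(funs)))
  function(x, ...) {
    # Apply every function to the same vector of cell values
    vals <- vapply(funs, function(f) f(x, ...), numeric(1))
    matrix(vals, nrow = 1, dimnames = list(NULL, names(funs)))
  }
}

f <- multi_stat(list(mean = mean, sd = sd))
f(c(1, 2, 3, NA), na.rm = TRUE)
```

The resulting `f` should be usable in place of the hand-written function, e.g. `extract(r, v, f, na.rm = TRUE)`, assuming extra arguments are forwarded to the function.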

@kadyb
Contributor

kadyb commented Nov 7, 2023

BTW: Do you know {exactextractr}? This is the fastest package for zonal statistics (but requires {sf} too for vector data).

library("sf")
library("terra")
library("exactextractr")

v <- read_sf(system.file("ex/lux.shp", package = "terra"))
r <- rast(system.file("ex/elev.tif", package = "terra"))

e <- exact_extract(r, v, fun = c("mean", "stdev"))
head(e)
#       mean    stdev
# 1 467.3792 33.98623
# 2 334.6855 68.61759
# 3 377.2069 77.09263
# 4 372.2498 81.72250
# 5 418.7867 48.63316
# 6 314.7698 49.40102

@dfriend21
Contributor Author

Yes, I'm familiar with it. However, I've been struggling to make it work effectively in my situation. I'm working with rasters over HTTP (using /vsicurl/), and I want to calculate summary statistics for polygons without downloading the entire raster (some of them are 30+ GB). That is why I want to calculate the statistics simultaneously rather than with separate extract() calls.

I tried exactextractr, and in a lot of the test cases I've tried it works just fine. But I've had a few instances where it hangs. I'm guessing it has something to do with the max_cells_in_memory parameter. The case where it hangs is one where I'm extracting data for two polygons that aren't close to each other, so I'm wondering if it's trying to download all of the data covered by the combined extent, even though the individual polygons aren't that big. That said, a different test case with two small polygons that were very far apart worked just fine. Perhaps I could get it to work with a bit more fiddling; I haven't spent a ton of time on it. But terra::extract() runs quite quickly in the same situation, and since it's been more consistent for me, that's what I'm using right now.
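
A minimal sketch of the remote-raster workflow described above, under stated assumptions: the URL and both helper names (`vsicurl_path`, `extract_remote_stats`) are hypothetical, and the vector-valued `fun` argument requires a terra version that supports multiple summary functions. GDAL reads /vsicurl/ sources through HTTP range requests, so in principle only the blocks the polygons touch are fetched, although how much is actually downloaded depends on the file layout.

```r
# Hypothetical helper: prefix a URL with GDAL's /vsicurl/ handler,
# leaving paths that already carry the prefix unchanged.
vsicurl_path <- function(url) {
  if (startsWith(url, "/vsicurl/")) url else paste0("/vsicurl/", url)
}

# Hypothetical wrapper: open a remote raster lazily and summarize it
# per polygon in a single extract() call. terra is referenced with
# terra:: so that defining the function does not load the package.
extract_remote_stats <- function(url, polygons,
                                 funs = c("min", "mean", "max")) {
  r <- terra::rast(vsicurl_path(url))
  terra::extract(r, polygons, fun = funs, na.rm = TRUE)
}

# Example call (not run; the URL is a placeholder):
# v <- terra::vect(system.file("ex/lux.shp", package = "terra"))
# extract_remote_stats("https://example.com/elevation.tif", v)
```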

@kadyb
Contributor

kadyb commented Nov 8, 2023

Ok, I see. I don't know how to fix it, but it may be worth opening an issue in {exactextractr}; Dan may be able to suggest a solution.

Another workaround, I think, is to extract the raw cell values and summarize them with {data.table} (or {collapse}). The statistics should then be calculated in parallel for groups, which should also be faster than the current {terra} approach:

library("terra")
library("data.table")

v <- vect(system.file("ex/lux.shp", package = "terra"))
r <- rast(system.file("ex/elev.tif", package = "terra"))

e <- extract(r, v)
e <- setDT(e, key = "ID")
e <- e[, .(mean = mean(elevation, na.rm = TRUE),
           sd = sd(elevation, na.rm = TRUE)),
       by = ID]

@dfriend21
Contributor Author

Thanks for the suggestion. I hadn't thought of just extracting the values and then doing the calculations on that (probably should have...). That's a good alternative.
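
A base R sketch of the same extract-then-summarize idea, for readers without {data.table}. The data frame below is a made-up stand-in for the output of `extract(r, v)` (an `ID` column plus one value column), and `summarise_by_id` is a hypothetical helper name.

```r
# Hypothetical helper: per-polygon mean and standard deviation from a
# long table of extracted cell values (one row per raster cell).
summarise_by_id <- function(e, value_col) {
  groups <- split(e[[value_col]], e$ID)
  data.frame(
    ID   = as.numeric(names(groups)),
    mean = vapply(groups, mean, numeric(1), na.rm = TRUE),
    sd   = vapply(groups, sd, numeric(1), na.rm = TRUE),
    row.names = NULL
  )
}

# Made-up values standing in for extract(r, v) output
e <- data.frame(ID = c(1, 1, 1, 2, 2, 2),
                elevation = c(10, 20, 30, 5, 7, NA))
res <- summarise_by_id(e, "elevation")
res
```

Because every cell value is held in memory before summarizing, this trades memory for flexibility, which may matter for very large polygons.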

@rhijmans
Member

Thank you for the suggestion. You can now do:

library(terra)
#terra 1.7.60
v <- vect(system.file("ex/lux.shp", package = "terra"))
r <- rast(system.file("ex/elev.tif", package = "terra"))
e <- extract(r, v, fun=c("min", "mean", "max"), na.rm=TRUE)
head(e)
#  ID min_elevation mean_elevation max_elevation
#1  1           339       467.1052           547
#2  2           195       333.8629           514
#3  3           256       377.3712           517
#4  4           213       373.6000           520
#5  5           293       418.6490           511
#6  6           164       314.9969           403

@dfriend21
Contributor Author

Great, thanks!

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Dec 10, 2024
# version 1.7-83

## bug fixes

- `flip(direction="vertical")` failed in some cases
  [#1518](rspatial/terra#1518) by Ed Carnell

- `zonal(as.raster=TRUE)` failed when the zonal raster was categorical
  [#1514](rspatial/terra#1514) by Jessi L
  Brown

- `distance<data.frame,data.frame>` and `<matrix,matrix>` ignored the
  unit
  argument. [#1545](rspatial/terra#1545) by
  Wencheng Lau-Medrano

- NetCDF files with a monthly time-step encoded as 0-11 made R crash
  [#1544](rspatial/terra#1544) by Martin
  Holdrege

- `split<SpatVector>` only worked well if the split field was of type
  character. [#1530](rspatial/terra#1530) by
  Igor Graczykowski

- `gridDist` (and probably some other methods) emitted a "cannot
  overwrite existing file" error when processing large datasets
  [#1522](rspatial/terra#1522) by Clare
  Pearson

- `terrain` did not accept multiple variables
  [#1561](rspatial/terra#1561) by Michael
  Mahoney

- `rotate` was vulnerable to an integer overflow
  [#1562](rspatial/terra#1562) by Sacha
  Ruzzante

- `getTileExtents` could return overlapping tiles or tiles with gaps
  due to floating point
  imprecision. [#1564](rspatial/terra#1564)
  by Michael Sumner


## enhancements

- `as.list<SpatRasterDataset>` sets the names of the list
  [#1513](rspatial/terra#1513)

- a SpatVectorCollection can now be subset with its names; and if made
  from a list it takes the names from the list.
  [#1515](rspatial/terra#1515) by jedgroev

- argument `fill_range` to `plot<SpatRaster>` and `plot<SpatVector>` to
  use the color of the extreme values of the specified range
  [#1553](rspatial/terra#1553) by Mike
  Koontz

- `plet<SpatRaster>` can now handle rasters with a "local" (Cartesian)
  CRS. [#1570](rspatial/terra#1570) by
  Augustin Lobo.

## new

- `map-region` returns the coordinates of the axes position of a map
  created with `plot<Spat*>`
  [https://github.com/rspatial/terra/issues/1517](https://github.com/rspatial/terra/issues/1517)
  by Daniel Schuch

- `polys<leaflet>` method
  [#1543](rspatial/terra#1543) by Márcia
  Barbosa

- `plot<SpatVectorCollection>` method
  [#1532](rspatial/terra#1532) by jedgroev

- `add_mtext` to add text around the margins of a
  map. [#1567](rspatial/terra#1567) by
  Daniel Schuch

# version 1.7-78

Released 2024-05-22

## bug fixes

- `writeVector` and `readVector` better handle empty geopackage layers
  [#1426](rspatial/terra#1426) by Andrew
  Gene Brown.

- `writeCDF` only wrote global variables if there was more than one
  [#1443](rspatial/terra#1443) by Daniel
  Schlaepfer

- `rasterize` with "by" returned odd layernames
  [#1435](rspatial/terra#1435) by Philippe
  Massicotte

- `convHull`, `minCircle` and `minRect` with a zero-row SpatVector
  crashed R [#1445](rspatial/terra#1445) by
  Andrew Gene Brown

- `rangeFill` with argument `circular=TRUE` did not work properly
  [#1460](rspatial/terra#1460) by Alice

- `crs(describe = TRUE)` returned a mis-ordered extent
  [#1485](rspatial/terra#1485) by Dimitri
  Falk

- `tapp` with a custom function and an index like "yearmonths" could
  shift time for not considering the time
  zone. [#1483](rspatial/terra#1483) by Finn
  Roberts

- `plot<SpatRaster>` could fail when there were multiple values with
  very small differences
  [#1491](rspatial/terra#1491) by srfall

- `as.data.frame<SpatRaster>` with "xy=TRUE" and "wide=FALSE" could
  fail if coordinates were very similar
  [#1476](rspatial/terra#1476) by Pascal
  Oettli

- `rasterizeGeom` now returns the correct layer name
  [#1472](rspatial/terra#1472) by
  HRodenhizer

- `cellSize` with "mask=TRUE" failed if the output was to be written
  to a temp file
  [#1496](rspatial/terra#1496) by Pascal
  Sauer

- `ext<SpatVectorProxy>` did not return the full extent
  [#1501](rspatial/terra#1501) by
  erkent-carb


## enhancements

- `extract` has new argument "small=TRUE" to allow for strict use of
  "touches=FALSE"
  [#1419](rspatial/terra#1419) by Floris
  Vanderhaeghe.

- `as.list<SpatRaster>` has new argument "geom=NULL"

- `rast<list>` now recognizes (x, y, z) base R "image" structures
  [stackoverflow](https://stackoverflow.com/questions/77949551/rspatial-convert-a-grid-list-to-a-raster-using-terra)
  by Ignacio Marzan.

- `inset` has new arguments "offset" and "add"
  [#1422](rspatial/terra#1422) by Armand-CT

- `expanse<SpatRaster>` has argument `usenames`
  [#1446](rspatial/terra#1446) by Bappa Das

- the default color palette is now `terra::map.pal("viridis")` instead
  of `terrain.colors`. The default can be changed with
  `options(terra.pal=...)`
  [#1474](rspatial/terra#1474) by Derek
  Friend

- `as.list<SpatRasterDataset>` now returns a named
  list. [#1513](rspatial/terra#1513) by Eric
  R. Scott


## new

- `bestMatch<SpatRaster>` method

- argument "pairs=TRUE" to `cells` [https://github.com/rspatial/terra/issues/1487](https://github.com/rspatial/terra/issues/1487) by Floris Vanderhaeghe

- `add_grid` to add a grid to a map


# version 1.7-71

Released 2024-01-31

## bug fixes

- `k_means` did not work if there were NAs
  [#1314](rspatial/terra#1314) by Jakub
  Nowosad

- `layerCor` with a custom function did not work anymore
  [#1387](rspatial/terra#1387) by Jakub
  Nowosad

- `plet` broke when using "panel=TRUE"
  [#1384](rspatial/terra#1384) by Elise
  Hellwig

- using /vis3/ to open a SpatRaster did not work
  [#1382](rspatial/terra#1382) by Mike
  Koontz

- `plot<SpatRaster>(add=TRUE)` sampled the raster data without
  considering the extent of the
  map. [#1394](rspatial/terra#1394) by
  Márcia Barbosa

- `plot<SpatRaster>(add=TRUE)` now only considers the first layer of a
  multi-layer SpatRaster
  [#1395](rspatial/terra#1395) by Márcia
  Barbosa

- `set.cats` failed when a tibble was used instead of a data.frame
  [#1406](rspatial/terra#1406) by Mike
  Koontz

- `polys` argument "alpha" was ignored if a single color was
  used. [#1413](rspatial/terra#1413) by
  Derek Friend

- `query` ignored the "vars" argument if all rows were
  selected. [#1398](rspatial/terra#1398) by
  erkent-carb.

- `spatSample` ignored "replace=TRUE" with random sampling,
  na.rm=TRUE, and a sample size larger than the non NA
  cells. [#1411](rspatial/terra#1411) by
  Babak Naimi

- `spatSample` sometimes returned fewer values than requested and
  available for lonlat
  rasters. [#1396](rspatial/terra#1396) by
  Márcia Barbosa.


## enhancements

- `vect<character>` now has argument "opts" for GDAL open options,
  e.g. to declare a file
  encoding. [#1389](rspatial/terra#1389) by
  Mats Blomqvist

- `plot(plg=list(tic=""))` now allows choosing alternative continuous
  legend tic-mark styles ("in", "out", "through" or "none")

- `makeTiles` has new argument "buffer"
  [#1408](rspatial/terra#1408) by Joy
  Flowers.


## new

- `prcomp<SpatRaster>` method
  [#1361](rspatial/terra#1361 (comment))
  by Jakub Nowosad

- `add_box` to add a box around the map. The box is drawn where the
  axes are, not around the plotting region.

- `getTileExtents` provides the extents of tiles. These may be used in
  parallelization. See
  [#1391](https://github.com/rspatial/terra/issues/1391) by Alex Ilich.


# version 1.7-65

Released 2023-12-15

## bug fixes

- `flip` with argument `direction="vertical"` failed in some cases with
  large rasters processed in chunks
  [0b714b0](rspatial/terra@0b714b0)
  by Dulci on
  [stackoverflow](https://stackoverflow.com/questions/77304534/rspatial-terraflip-error-when-flipping-a-multi-layer-spatrast-object)

- SpatRaster now correctly handles `NA & FALSE` and `NA | TRUE`
  [#1316](rspatial/terra#1316) by John Baums

- `set.names` wasn't working properly for SpatRasterDataset or
  SpatRasterCollection
  [#1333](rspatial/terra#1333) by Derek Friend

- `extract` with argument "layer" not NULL shifted the layers
  [#1332](rspatial/terra#1332) by Ewan
  Wakefield

- `terraOptions` did not capture "memmin" on
  [stackoverflow](https://stackoverflow.com/questions/77552234/controlling-chunk-size-in-terra)
  by dww

- `rasterize` with points and a built-in function could crash if no
  field was used
  [#1369](rspatial/terra#1369) by
  anjelinejeline


## enhancements

- `mosaic` can now use `fun="modal"`

- `rast<matrix>` and `rast<data.frame>` now have option `type="xylz"`
  [#1318](rspatial/terra#1318) by Agustin
  Lobo

- `extract<SpatRaster,SpatVector>` can now use multiple summarizing
  functions [#1335](rspatial/terra#1335) by
  Derek Friend

- `disagg` and `focal` have more optimistic memory requirement
  estimation [#1334](rspatial/terra#1334) by
  Mikko Kuronen

## new

- `k_means<SpatRaster>` method
  [#1314](rspatial/terra#1314) by Agustin
  Lobo

- `princomp<SpatRaster>` method
  [#1361](rspatial/terra#1361) by Alex Ilich

- `has.time<SpatRaster>` method

- new argument "raw=FALSE" to `rast`, `sds`, and `sprc` to allow
  ignoring scale and offset
  [#1354](rspatial/terra#1354) by Insang Song


# version 1.7-55

Released 2023-10-14

## bug fixes

- `mosaic` ignored the filename argument if the SpatRasterCollection
  only had a single SpatRaster
  [#1267](rspatial/terra#1267) by Michael
  Mahoney

- Attempting to use `extract` with a raster file that had been deleted
  crashed R. [#1268](rspatial/terra#1268) by
  Derek Friend

- `split<SpatVector,SpatVector>` did not work well in all
  cases. [#1256](rspatial/terra#1256) by
  Derek Corcoran Barrios

- `intersect` with two SpatVectors crashed R if there was a date/time
  variable [#1273](rspatial/terra#1273) by
  Dave Dixon

- "values=FALSE" was ignored by
  `spatSample<SpatRaster>(method="weights")`
  [#1275](rspatial/terra#1275) by François
  Rousseu

- `coltab<-` again works with a list as value
  [#1280](rspatial/terra#1280) by Diego
  Hernangómez

- `stretch` with histogram equalization was not memory-safe
  [#1305](rspatial/terra#1305) by Evan Hersh

- `plot` now resets the "mar" parameter
  [#1297](rspatial/terra#1297) by Márcia
  Barbosa

- `plotRGB` ignored the "smooth" argument
  [#1307](rspatial/terra#1307) by Timothée
  Giraud


## enhancements

- argument "gdal" in `project` was renamed to "use_gdal"
  [#1269](rspatial/terra#1269) by Stuart
  Brown.

- SpatVector attributes can now be stored as an ordered factor
  [#1277](rspatial/terra#1277) by Ben Notkin

- `plot<SpatVector>` now uses an "interval" legend when breaks are
  supplied [#1303](rspatial/terra#1303) by
  Gonzalo Rizzo

- `crop<SpatRaster>` now keeps more metadata, including variable names
  [#1302](rspatial/terra#1302) by rhgof

- `extract(fun="table")` now returns an easier to use data.frame
  [#1294](rspatial/terra#1294) by Fernando
  Aramburu.


## new
- `metags<-` and `metags` to set arbitrary SpatRaster/file level
  metadata [#1304](https://github.com/rspatial/terra/issues/1304) by
  Francesco Chianucci

# version 1.7-46

Released 2023-09-06

## bug fixes

- `plot<SpatVector>` used the wrong main label in some cases
  [#1210](rspatial/terra#1210) by Márcia
  Barbosa

- `plotRGB` failed with an "ext=" argument
  [#1228](rspatial/terra#1228) by Dave Edge

- `rast<array>` failed badly when the array had less than three
  dimensions. [#1254](rspatial/terra#1254)
  by andreimirt.

- `all.equal` for a SpatRaster with multiple layers
  [#1236](rspatial/terra#1236) by Sarah
  Endicott

- `zonal(wide=FALSE)` could give wrong results if the zonal SpatRaster
  had "layer" as
  layername. [#1251](rspatial/terra#1251) by
  Jeff Hanson

- `panel` now supports argument "range"
  [#1241](rspatial/terra#1241) by Jakub
  Nowosad

- `rasterize` with `by=` returned wrong layernames if the by field was
  not sorted [#1266](rspatial/terra#1266) by
  Sebastian Dunnett

- `mosaic` with multiple layers was not correct
  [#1262](rspatial/terra#1262) by
  Jean-Romain


## enhancements

- `wrap<SpatRaster>` now stores color tables
  [#1215](rspatial/terra#1215) by Patrick
  Brown

- `global` now has a "maxcell" argument
  [#1213](rspatial/terra#1213) by Alex Ilich

- `layerCor` with fun='pearson' now returns output with the layer
  names [#1206](rspatial/terra#1206)

- `vrt` now has argument "set_names"
  [#1244](rspatial/terra#1244) by sam-a-levy

- `vrt` now has argument "return_filename"
  [#1258](rspatial/terra#1258) by Krzysztof
  Dyba

- `project<SpatRaster>` has new argument "by_util" exposing the GDAL
  warp utility [#1222](rspatial/terra#1222) by
  Michael Sumner.


## new
- `compareGeom` for list and SpatRasterCollection
  [#1207](rspatial/terra#1207) by Sarah
  Endicott

- `is.rotated<SpatRaster>` method
  [#1229](rspatial/terra#1229) by Andy Lyons

- `forceCCW<SpatVector>` method to force counter-clockwise orientation
  of polygons [#1249](rspatial/terra#1249)
  by srfall.

- `vrt_tiles` returns the filenames of the tiles in a vrt file
  [#1261](rspatial/terra#1261) by Derek
  Friend

- `extractAlong` to extract raster cell values for a line that are
  ordered along the
  line. [#1257](rspatial/terra#1257) by
  adamkc.