Joint modeling of point and areal data #125
Automated support for this is on the feature list; a basic version can be set up manually. First build the integration points:

```r
ips <- ipoints(samplers = polygons, domain = mesh, group = "polygonID")
```

Then we need a way to apply this to the predictor:

```r
A <- inla.spde.make.A(
  mesh,
  coordinates(ips),
  block = ips$polygonID,
  block.rescale = "none",
  weights = ips$weight)
```

Then the likelihood:

```r
like(formula = response ~ log(A %*% exp(here + goes + your + predictor + expression)),
     model = "poisson",
     data = ips,
     allow_combine = TRUE)
```

Any non-spatial covariates would need to be added to `ips`. I'm sure there are more details to work out depending on your model details, but the principle is given above. Apart from internal, as yet unpublished, projects we haven't really shown how to do this.

There's also a snag with the construction above: I'm almost certain it doesn't allow you to specify the response variable directly, as it has a different length than `ips`. A workaround is to use

```r
data = c(list(response = data$response), as.list(as.data.frame(ips)))
```

and explicit variable names. This should become a bit easier to deal with once I add a separate response variable argument to `like()`. |
Additional note: the proper count model aggregation is only well-defined for Poisson, since Po(lambda1) + Po(lambda2) ~ Po(lambda1 + lambda2). For other models, including Binomial with different probabilities, this doesn't hold, so there's no longer a strong theoretical argument that the aggregated observations follow the same model family. |
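The Poisson additivity fact above can be checked numerically. The following is a toy Python sketch (independent of INLA/inlabru; rates 3 and 5 are made-up): sums of independent Poisson draws with rates lambda1 and lambda2 behave like a single Poisson with rate lambda1 + lambda2, so mean and variance both land near the summed rate.

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 3.0, 5.0
n = 200_000

# Sum of two independent Poisson variables...
summed = rng.poisson(lam1, n) + rng.poisson(lam2, n)
# ...versus a single Poisson with the summed rate
direct = rng.poisson(lam1 + lam2, n)

# For a Poisson distribution, mean == variance == rate,
# so both samples should have mean and variance near 8
print(summed.mean(), summed.var())
print(direct.mean(), direct.var())
```

For a Binomial with differing success probabilities per point, no such closure property exists, which is the theoretical snag the note describes.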
I expect to wrap the basic version of this into a more convenient interface at some point. |
Thanks for the quick and detailed response. I'm working with Gaussian observations rather than a count model aggregation, so something like:

```r
ips <- ipoints(samplers = polygons, domain = mesh, group = "polygonID")
A <- inla.spde.make.A(
  mesh,
  coordinates(ips),
  block = ips$polygonID,
  block.rescale = "sum")
like(formula = response ~ A %*% (my + predictor + expression),
     model = "gaussian",
     data = ips,
     allow_combine = TRUE)
```

I then define a second `like()` for the point data:

```r
like(formula = response ~ my + predictor + expression + field(pt_coordinates, model = matern),
     model = "gaussian",
     data = pts)
```

and combine them with `bru()`. |
Yes, for Gaussian observations it's also fine!

```r
comp <- ~ Intercept1(1) + Intercept2(1) +
  covariate(eval_covar(cbind(easting, northing)), model = "linear") +
  field(pt_coordinates, model = matern)

like1 <-
  like(formula = response1 ~ A %*% (Intercept1 + covariate),
       family = "gaussian",
       data = c(list(response1 = data$response1), as.list(as.data.frame(ips))),
       allow_combine = TRUE)

like2 <-
  like(formula = response2 ~ .,
       include = c("Intercept2", "field"),
       family = "gaussian",
       data = pts)

bru(comp, like1, like2)
```

For this to work, the coordinate names in the data objects need to match the inputs used in the component definitions. |
In testing a simple example, I can't seem to get around a differing-number-of-rows error when trying to index the spde object for the polygons:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 14271, 17369

For reference, in my existing INLA code I'm doing something like this, with different A matrices for points and polys:

```r
s.index <- inla.spde.make.index(name = "spatial.field", n.spde = spde$n.spde)
# data and poly_data are data.frames
stack.pts <- inla.stack(
  data = list(response = data$response),
  A = list(A.pts, 1),
  effects = list(c(s.index, list(Intercept = 1)),
                 list(covar = data$covar)))
stack.poly <- inla.stack(
  data = list(response = poly_data$response),
  A = list(A.poly, 1),
  effects = list(c(s.index, list(Intercept = 1)),
                 list(covar = poly_data$covar)))
formula ~ Intercept + f(spatial.field, model = spde)
```

The A matrix you've suggested for the aggregation is identical to the one I'm already using, so I'm not sure where the issue is arising. Manually running `A %*% spatial.field` runs just fine. Do I need a custom mapper to access the spde object with a list of indices rather than coordinate pairs?

```r
comp <- ~ Intercept1(1) + field(spatial.field, model = spde)
like1 <-
  like(formula = response ~ A %*% (Intercept1 + field),
       family = "gaussian",
       data = c(list(response = data$response), s.index),
       allow_combine = TRUE)
fit <- bru(comp, like1)
```
|
Aha, I see the problem; I mixed two different approaches.

In your Gaussian/linear case, you can either change the A matrix to handle it correctly, or set the field component mapper to `bru_mapper_index(matern$n.spde)`. The downside of the latter approach is that it affects both likelihoods, so the other one would then also need an A matrix. Correcting the A matrix is therefore a generally better method here, I think.

In the nonlinear case (Poisson) the correct solution is to change the A matrix. Instead of `inla.spde.make.A` it should actually be a much simpler matrix, where each row contains the integration point weight values corresponding to each "block". I think this should do it:

```r
A <- sparseMatrix(
  i = ips$polygonid,
  j = seq_len(NROW(ips)),
  x = ips$weight,
  dims = c(max(ips$polygonid), NROW(ips)))
```
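To illustrate what this weight matrix does, here is a language-agnostic toy sketch in Python/SciPy (not inlabru code; the polygon ids, weights, and predictor values are made-up): row k of A holds the integration weights of the points in block k, so `A @ eta` produces one weighted sum per polygon.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Toy integration scheme: 5 integration points in 2 polygons ("blocks")
polygonid = np.array([1, 1, 1, 2, 2])          # 1-based block index per point
weight = np.array([0.2, 0.3, 0.5, 0.6, 0.4])   # integration weights

n_blocks = polygonid.max()
n_points = len(weight)

# Row k contains the weights of the integration points in block k
A = coo_matrix(
    (weight, (polygonid - 1, np.arange(n_points))),
    shape=(n_blocks, n_points),
).tocsr()

eta = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # predictor at integration points
agg = A @ eta  # one weighted sum per polygon
# block 1: 0.2*1 + 0.3*2 + 0.5*3 = 2.3; block 2: 0.6*4 + 0.4*5 = 4.4
print(agg)
```

This is the same structure as the `sparseMatrix(i, j, x, dims)` call above, with 0-based instead of 1-based row indices.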
Finn
|
No luck. Still getting the same error (using both approaches), just with different numbers of rows. |
Can you send a reproducible example I can check/debug? |
The corrected A matrix can also be constructed via

```r
A <- INLA::inla.spde.make.block.A(
  inla.as.dgTMatrix(Diagonal(NROW(ips), ips$weight)),
  block = ips$polygonid,
  rescale = ...)
```

which allows choosing an optional rescaling method ("none" for plain integration or "sum" for spatial averaging). |
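For intuition about the two rescaling options, here is a toy Python sketch (not INLA code; the weights and predictor values are invented): `rescale = "sum"` divides each row of the weight matrix by its row sum, turning the weighted integral over each polygon into a weighted average.

```python
import numpy as np

# Toy weight matrix: 2 polygons x 4 integration points
W = np.array([
    [0.2, 0.8, 0.0, 0.0],   # polygon 1 (weights sum to 1.0)
    [0.0, 0.0, 0.5, 1.5],   # polygon 2 (weights sum to 2.0)
])

eta = np.array([1.0, 3.0, 2.0, 6.0])  # predictor at integration points

# rescale = "none": plain integration (weighted sums per polygon)
integral = W @ eta

# rescale = "sum": divide each row by its sum -> spatial averaging
W_avg = W / W.sum(axis=1, keepdims=True)
average = W_avg @ eta

print(integral)  # [2.6, 10.0]
print(average)   # [2.6, 5.0]
```

For the Gaussian case discussed above, averaging ("sum") is the natural choice; for Poisson count aggregation, the plain weights ("none") give the integrated intensity.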
Sure, this shows the same error on the gorillas data.

```r
polygons <- gorillas$plotsample$plots
polygons@data <- tibble::rownames_to_column(polygons@data) # add group
mesh <- gorillas$mesh
spde <- inla.spde2.matern(mesh = mesh)
ips <- ipoints(samplers = polygons, domain = mesh, group = "rowname")
# ipoints drops the grouping variable, adding back in
ips@data <- cbind(
  ips@data,
  polygonid = floor(as.numeric(tibble::rownames_to_column(ips@data)$rowname)))
A <- inla.spde.make.block.A(Diagonal(nrow(ips), ips$weight),
                            block = ips$polygonid, rescale = "sum")
comp <- ~ Intercept1(1) + field(cbind(x, y), model = spde)
data$response <- rnorm(nrow(polygons), sd = 0.4)
like1 <-
  like(formula = response ~ A %*% (Intercept1 + field),
       family = "gaussian",
       data = c(list(response = data$response), as.list(as.data.frame(ips))),
       allow_combine = TRUE)
fit <- bru(comp, like1)
```
|
I'm not sure precisely what error message you get, and for what part of the code.

```r
library(INLA)
#> Loading required package: Matrix
#> Loading required package: foreach
#> Loading required package: parallel
#> Loading required package: sp
#> This is INLA_21.11.16 built 2021-11-16 14:00:50 UTC.
#> - See www.r-inla.org/contact-us for how to get help.
#> - To enable PARDISO sparse library; see inla.pardiso()
library(inlabru)

data("gorillas", package = "inlabru")
polygons <- gorillas$plotsample$plots
polygons@data <- tibble::rownames_to_column(polygons@data) # add group
mesh <- gorillas$mesh
spde <- inla.spde2.matern(mesh = mesh)
ips <- ipoints(samplers = polygons, domain = mesh, group = "rowname")
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj
#> = prefer_proj): Discarded ellps unknown in Proj4 definition: +proj=longlat +R=1
#> +no_defs +type=crs
#> Warning in showSRID(SRS_string, format = "PROJ", multiline = "NO", prefer_proj =
#> prefer_proj): Discarded datum unknown in Proj4 definition
#> (further similar showSRID warnings omitted)

# The rowname column is still there. Convert to numeric.
# head(ips)
#            coordinates rowname       weight
# 1 (581.9648, 674.5587)       1 0.0006067968
ips@data$polygonid <- as.numeric(ips@data$rowname)

A <- inla.spde.make.block.A(
  inla.as.dgTMatrix(Diagonal(nrow(ips), ips$weight)),
  block = ips$polygonid,
  rescale = "sum")
comp <- ~ Intercept1(1) + field(cbind(x, y), model = spde)
data <- data.frame(response = rnorm(nrow(polygons), sd = 0.4))
like1 <-
  like(formula = response ~ as.vector(A %*% (Intercept1 + field)),
       family = "gaussian",
       data = c(list(response = data$response), as.list(as.data.frame(ips))),
       allow_combine = TRUE)
fit <- bru(comp, like1)
```

Created on 2021-11-17 by the reprex package (v2.0.1) |
The |
This is getting closer to having full support in the package. In particular, there are mappers that can be used in the predictor expression for the aggregations. For examples in the unit tests, see https://github.com/inlabru-org/inlabru/blob/devel/tests/testthat/test-aggregate.R |
@bradleyswilson did you figure out how to do this in the end? I'm new to inlabru but trying something similar with two point datasets and one areal gridded dataset and keep hitting blocks... |
@nickschurch I believe I went back to using INLA syntax instead of inlabru for this particular instance, although I haven't tested the updates Finn has made since then. My issue was mostly computational, which I solved another way with mesh projections in a custom tiling scheme. |
@bradleyswilson I have just scanned through this, but I believe this is now supported in the bru mapper system with the logsumexp mapper. We have included an example in the software paper that we are currently drafting (very nearly finished!). I can't share the data here, but here is a sneak peek at the code:

```r
# Define Matern/SPDE model
matern <- inla.spde2.pcmatern(
  mesh,
  prior.range = c(2, 0.1),
  prior.sigma = c(3, 0.1)
)
# Define the intermediate zone integration scheme
ips <- fm_int(mesh, samplers = glasgow_iz)
agg <- bru_mapper_logsumexp(
  rescale = FALSE,
  n_block = nrow(glasgow_iz)
)
# Define model components and formula
cmp <- ~ 0 + beta(1) + xi(main = geometry, model = matern)
fml <- count ~ ibm_eval(
  agg,
  input = list(weights = weight, block = .block),
  state = beta + xi
)
# Construct the likelihood
lik <- like(
  formula = fml,
  family = "poisson",
  data = ips,
  response_data = glasgow_iz
)
# Fit the model
fit <- bru(
  components = cmp,
  lik
)
```
|
Greetings, I've been working with INLA for joint modeling of point and areal data (similar to the model in this paper). I'm looking to transfer the model over to inlabru to help handle large numbers of predictions, but am struggling to figure out how I'd implement the model specification.

Is the preferred approach to use the `like()` and `bru()` functionality to specify two likelihoods, one for the point data and another for the polygons? When I pass spatial polygons directly to `field(polygons, model = matern)`, I get a

cannot coerce class ‘structure("Polygons", package = "sp")’ to a data.frame

error. I'm looking for it to be constructed similarly to `inla.spde.make.block.A` from the normal INLA package.