Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sp::sp_project() slow compared with rgdal::project() #1242

Closed
dankelley opened this issue Jan 14, 2020 · 7 comments
Closed

sp::sp_project() slow compared with rgdal::project() #1242

dankelley opened this issue Jan 14, 2020 · 7 comments

Comments

@dankelley
Copy link
Contributor

This is not a bug report, but rather a question about speed.

Comments within the code shown below reveal that on my test machine (8 year old macbook pro, running catalina beta), the sf map projection is about 7 times slower than the rgdal version. And, if I pass the projection strings through st_crs, the speed drops by a further factor of about 3, i.e. what I think is the "proper" way of calling st_project seems to be about 21 times slower than rgdal::project().

(An aside: I say "proper" because I'm guessing that passing through st_crs may avoid warnings I get about differing celestial bodies, mentioned at dankelley/oce#1629 (comment) in a discussion of my tests of the possibility of switching the oce package from rgdal to sf. Oddly, those warnings do not seem to come up in interactive use, but they come up when I use Rmarkdown.)

Question: Are these factor of 7 and 21 slowdowns as expected, or are they maybe a result of debugging code that's in the current github sf code?

## R Under development (unstable) (2020-01-10 r77651)
## PROJ.4 runtime: Rel. 6.2.1, Novembr 1st, 2019, [PJ_VERSION: 621]
## GDAL 2.4.2, rlased 2019/06/28
## Linking to sp version: 1.3-2
library(rgdal)                         # sf_0.8-1 installed from github 2020-01-14
library(sf)                            # rgdal_1.4-8 installed from cran

## user 0.029, system 0.001, elapsed 0.029
## user 0.035, system 0.000, elapsed 0.036
## user 0.027, system 0.000, elapsed 0.027
system.time(RGDAL<-lapply(-180:180, function(lon) unname(rgdal::project(cbind(lon,0),"+proj=moll")[1,1])))

## usr 0.192, system 0.000, elapsed 0.193
## usr 0.202, system 0.001, elapsed 0.204
## usr 0.200, system 0.001, elapsed 0.201
system.time(SF1<-lapply(-180:180, function(lon) unname(sf::sf_project("+proj=longlat", "+proj=moll", cbind(lon,0))[1,1])))
stopifnot(all.equal(SF1, RGDAL))

## usr 0.586, system 0.089, elapsed 0.675
## usr 0.603, system 0.092, elapsed 0.696
## usr 0.591, system 0.091, elapsed 0.683
system.time(SF2<-lapply(-180:180, function(lon) unname(sf::sf_project(sf::st_crs("+proj=longlat")$proj4string, sf::st_crs("+proj=moll")$proj4string, cbind(lon,0))[1,1])))
stopifnot(all.equal(SF2, RGDAL))
@edzer
Copy link
Member

edzer commented Jan 14, 2020

The third option doesn't make sense: you moved the st_crs() stuff inside the inner loop. I measure a different of a factor 5, no 7, between 1 and 2, when I make the number of points 10000 instead of 361. Are you using in both cases the old (PROJ 4.x) interface or the newer, or different ones for rgdal and sf? You didn't report the package startup message after loading both libraries. Another possibility is the use of Rcpp by sf, vs. .Call by rgdal but I find it hard to believe that that would matter that significantly.

@dankelley
Copy link
Contributor Author

Thanks for the comment.

I was doing the st_crs() inside the loop because that's how it will come up in oce. The only reason this is in a loop is so that the fast cases give times in the millisecond range.

I do not know what interface is being used by rgdal and sf. All I know is that I'm using rgdal built up locally using source code from CRAN (which gets compiled locally because I am on R-devel and CRAN doesn't supply built-up packages for that) and sp built up locally with devtools::install_github("r-spatial/sf"). I see 6.2.1 quoted for both, in the startup message (see below), so maybe that's the answer.

I lack intuition on the speed of RCPP, since I only use .Call and .C in the oce package.

Since this is providing a bottleneck in oce, I will likely try to alter things so there are fewer of these calculations. That will make the sf approach more feasible, giving oce a clear pathway to switch between the rgdal and sf approaches, if it happens that one of the packages removes the functionality that oce needs. (I have this quirky need to deal with NA input and uncomputable map inversions, as discussed in some other issues.) Although my understanding is that the plan to deprecate rgdal::project() has been dropped, I want flexibility so that if such a thing happens to either rgdal or sf, I can make a quick switch to keep oce from breaking. (Historical note: I used to have proj4 code within oce, but I was instructed by the CRAN community to remove it.)

This comment is just for the information of @edzer; I think I have the information that I need, so if @edzer wants to close the issue, I'd agree.

R Under development (unstable) (2020-01-10 r77651) -- "Unsuffered Consequences"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin15.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> ## R Under development (unstable) (2020-01-10 r77651)
> ## PROJ.4 runtime: Rel. 6.2.1, Novembr 1st, 2019, [PJ_VERSION: 621]
> ## GDAL 2.4.2, rlased 2019/06/28
> ## Linking to sp version: 1.3-2
> library(rgdal)                         # sf_0.8-1 installed from github 2020-01-14
Loading required package: sp
rgdal: version: 1.4-8, (SVN revision 845)
 Geospatial Data Abstraction Library extensions to R successfully loaded
 Loaded GDAL runtime: GDAL 2.4.2, released 2019/06/28
 Path to GDAL shared files: /usr/local/Cellar/gdal/2.4.2_3/share/gdal
 GDAL binary built with GEOS: TRUE 
 Loaded PROJ.4 runtime: Rel. 6.2.1, November 1st, 2019, [PJ_VERSION: 621]
 Path to PROJ.4 shared files: (autodetected)
 Linking to sp version: 1.3-2 
> library(sf)                            # rgdal_1.4-8 installed from cran
Linking to GEOS 3.8.0, GDAL 2.4.2, PROJ 6.2.1
> 
> ## user 0.029, system 0.001, elapsed 0.029
> ## user 0.035, system 0.000, elapsed 0.036
> ## user 0.027, system 0.000, elapsed 0.027
> system.time(RGDAL<-lapply(-180:180, function(lon) unname(rgdal::project(cbind(lon,0),"+proj=moll")[1,1])))
   user  system elapsed 
  0.044   0.002   0.046 
> 
> ## usr 0.192, system 0.000, elapsed 0.193
> ## usr 0.202, system 0.001, elapsed 0.204
> ## usr 0.200, system 0.001, elapsed 0.201
> system.time(SF1<-lapply(-180:180, function(lon) unname(sf::sf_project("+proj=longlat", "+proj=moll", cbind(lon,0))[1,1])))
   user  system elapsed 
  0.229   0.004   0.250 
> stopifnot(all.equal(SF1, RGDAL))
> 
> ## usr 0.586, system 0.089, elapsed 0.675
> ## usr 0.603, system 0.092, elapsed 0.696
> ## usr 0.591, system 0.091, elapsed 0.683
> system.time(SF2<-lapply(-180:180, function(lon) unname(sf::sf_project(sf::st_crs("+proj=longlat")$proj4string, sf::st_crs("+proj=moll")$proj4string, cbind(lon,0))[1,1])))
   user  system elapsed 
  0.658   0.105   0.767 
> stopifnot(all.equal(SF2, RGDAL))

@edzer
Copy link
Member

edzer commented Jan 15, 2020

OK, thanks. rgdal uses pj_fwd(), which is part of proj_api.h, which is going to disappear. sf uses proj_trans, part of proj.h. You could try to find out whether that matters, both in terms of runtime and in terms of what oce will have to do a few years from now. In any case, I don't see this as an issue sf developers should spend time on. More general cross-package r-spatial issues can be opened at https://github.com/r-spatial/discuss/issues

@edzer
Copy link
Member

edzer commented Jan 15, 2020

Possibly also of interest:

> library(sf)
Linking to GEOS 3.8.0, GDAL 3.0.2, PROJ 6.2.1
> n = 1000000
> xy = cbind(seq(-180,180,length.out=n), 0)
> system.time(RGDAL<- rgdal::project(xy, "+proj=moll"))
   user  system elapsed 
  0.224   0.008   0.231 
> system.time(SF1<- sf::sf_project("+proj=longlat", "+proj=moll", xy))
   user  system elapsed 
  0.159   0.015   0.175 
> system.time(SF1<- sf::sf_project("+proj=longlat", "+proj=moll", xy, keep = TRUE))
   user  system elapsed 
  0.287   0.008   0.296 

which may inspire you to reconsider how oce calls these functions.

@rsbivand
Copy link
Member

While my system isn't yours, running on GDAL 3.1 master and PROJ 6.3.0 with sp github/edzer master and R-forge rgdal, so using proj.h not proj_api.h, rgdal::project() in your repeated call scenario also runs at about 0.21 seconds. Your case uses proj_api.h for this because this is what released sp and rgdal do, and your GDAL is <3. The timing increase is because the new implementation needs calls to PROJ code first to extract the GEOGCS from the PROJCS, then to search for the coordinate operation to project or inverse project. Since you are making 360 calls, it is costly. So projecting the underlying matrix with new generation rgdal::project() is:

> system.time(RGDAL<-sapply(-180:180, function(lon) unname(rgdal::project(cbind(lon,0),"+proj=moll")[1,1])))
   user  system elapsed 
  0.190   0.014   0.204 
> system.time(RGDAL1 <- rgdal::project(cbind(-180:180,0),"+proj=moll")[,1])
   user  system elapsed 
  0.001   0.000   0.001 
> all.equal(RGDAL, RGDAL1)
[1] TRUE

For sf installed from master now:

> system.time(SF1 <- sapply(-180:180, function(lon) unname(sf::sf_project("+proj=longlat", "+proj=moll", cbind(lon,0))[1,1])))
   user  system elapsed 
  0.096   0.000   0.096 
> system.time(SF1a <- sf::sf_project("+proj=longlat", "+proj=moll", cbind(-180:180,0))[,1])
   user  system elapsed 
  0.001   0.000   0.001 
> all.equal(SF1, SF1a)
[1] TRUE

so part of the timing problem is the *apply. This sf is master, so using proj_api.h I think?

@edzer
Copy link
Member

edzer commented Jan 15, 2020

No, sf master uses proj.h by default, but has a configure flag --with-proj-api=yes that is no by default.

@dankelley
Copy link
Contributor Author

Thanks for the help and for the extra information. I am not sure about the protocol for this project, in terms of who closes issues, but I am now closing it since my question has been answered. Dan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants