Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

switch to 'sf' for projection and polygon calculations? #1629

Closed
16 tasks done
dankelley opened this issue Dec 12, 2019 · 36 comments
Closed
16 tasks done

switch to 'sf' for projection and polygon calculations? #1629

dankelley opened this issue Dec 12, 2019 · 36 comments
Assignees
Labels
development for next CRAN high priority pinned Prevent automatic stale designation projection Map projection

Comments

@dankelley
Copy link
Owner

dankelley commented Dec 12, 2019

In an f2f meeting yesterday, @richardsc and I agreed that it is worth seeing whether oce can rely on the sf package for map projection calculations. I've been kicking this idea around for a while now (see e.g. #1599 and #1604).

The goal is to future-proof oce against changes that ar coming to rgdal. That package is presently used for map-projection calculations, with the rgdal::project() function. This is a problem, because that function is destined for deprecation and later deletion.

I am under the impression that the proposed alternative to rgdal::project() is rgdal::spTransform(), but that is not a solution for oce because it cannot handle unprojectable points, returning nothing if any point is unprojectable. (An example of an unprojectable point would be one that is just on the "edge of the world" in a view: does it go on one side or the other? This problem comes up in drawing grid lines, for example.)

A helpful comment on r-sig-geo-request@r-project.org led me to consider using st::sf_project(), and my initial tests (see #1599) show that this has promise.

I also want to switch some other bits in the code, where I do point-within-polygon calculations with sp::point.in.polygon() and also where I do intersection calculations with e.g. raster::intersect(). I think all of these things can be done with sf, and I would prefer to rely on just one single package, sf, instead of the sp/raster/rgdal triplet. (Relying on multiple packages imposes a cost of learning how to navigate the docs for each of those packages, which is a developer/debugger cost. There are other, more general, problems with excessive package dependence.)

I have started doing this work, in a new branch called sf that will be unstable for a while, as I fiddle. My approach, so far, is to start by reproducing the older method and using all.equal() tests to see if the results are the same. If not, I issue a warning().

Checklist for point-in-polygon calculations

  • R/argo.R:532: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)
  • R/map.R:788: erase <- 1==sp::point.in.polygon(xc, yc,
  • R/section.R:681: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)

Checklist for map projection

  • map.R:57: XY <- unname(rgdal::project(xy, proj=proj, inv=inv))
  • map.R:67: XY <- unname(rgdal::project(xy, proj=proj, inv=inv, legacy=legacy, allowNAs_if_not_legacy=TRUE))
  • map.R:73: XY <- unname(rgdal::project(xy=xy, proj=proj, inv=inv, legacy=legacy))

Checklist for polygon extent

  • coastline.R:222: box <- as(raster::extent(W, E, S, N), "SpatialPolygons")

Checklist for polygon intersection

  • coastline.R:242: i <- raster::intersect(box, C)

Checklist of sp:: calls (checked when sf equivalents are added; comment-strings ignored)

  • argo.R:532: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)
  • coastline.R:239: A <- sp::Polygon(cbind(lon, lat))
  • coastline.R:240: B <- sp::Polygons(list(A), "A")
  • coastline.R:241: C <- sp::SpatialPolygons(list(B))
  • map.R:788: erase <- 1==sp::point.in.polygon(xc, yc,
  • section.R:681: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)
  • section.R:2953: input_data=sp::SpatialPointsDataFrame(xy, data.frame(F)), (will not change, since this is an add-on kriging method)
  • section.R:2954: new_data=sp::SpatialPoints(expand.grid(xg/xr, yg/yr))) (will not change, since this is an add-on kriging method)
@dankelley dankelley self-assigned this Dec 12, 2019
@dankelley
Copy link
Owner Author

I handled the R/argo.R case a couple of days ago.

@dankelley
Copy link
Owner Author

dankelley commented Dec 12, 2019

A checklist for point.in.polygon() coding (a ticked box means done, and pushed to GH, but not necessarily after the half-hour it takes for a "build check"):

  • R/argo.R:532: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)
  • R/map.R:788: erase <- 1==sp::point.in.polygon(xc, yc,
  • R/section.R:681: keep <- 1==sp::point.in.polygon(lon, lat, lonp, latp)

@richardsc
Copy link
Collaborator

I second this work (obviously 😛 ).

Installing sf on my Ubuntu instance involves a few system dependencies that were not required for other "geo" packages (see the really nice webpage), but they seem to be readily available in the standard ubuntu repos so not a big deal. Not sure about OSX, but sf is widely used by the "tidyverse" folks so I expect it's relatively easy.

@dankelley
Copy link
Owner Author

Thanks for checking into sf availability. I think sf will get more popular as time goes on. (If you've not read the article on it, I recommend it (first item in the 'blogs' part of their webpage).)

@dankelley
Copy link
Owner Author

R/coastline.R handled in "sf" branch commit bf6ae77; a test code is below. Note that this increases the elapsed time by about 15% on a test case, but bear in mind that the test case does both methods, which suggests that the "sf" method is likely to be faster than the old method.

library(oce)
data(coastlineWorld)
options("oce:test_sf" = 1)             # increases time from 1.8s to 2.1s
print(system.time(
cl <- subset(coastlineWorld, -80<lon & lon<-50 & 30<lat & lat<60)
))
if (!interactive()) png("sf_20191229.png")
plot(cl, clon=-65, clat=45, span=6000)
rect(-80, 30, -50, 60, bg="transparent", border="red")
if (!interactive()) dev.off()

@dankelley
Copy link
Owner Author

dankelley commented Dec 26, 2019

I think I've done the first phase at each relevant spot, in the "sf" branch commit 99fb05e.

The test code at https://github.com/dankelley/oce-issues/tree/master/16xx/1629 runs without producing errors and warnings, which means that the "old" scheme and the "new" (sf-based) scheme produce identical results. These tests are only performed if options("oce:test_sf"=1) has been set prior to the function calls.

I am now ready for phase 2, in which the tests are done even without setting that option. That way, anyone using the sf branch will be running these tests. My plan is to work on that branch for a while, perhaps weeks, to see if any problems crop up in everyday work. In addition, I will test by

  • rebuild oce (and its vignettes)
  • rebuild my book
  • rebuild my course notes
  • rebuild my blog
    (and other things I think of) to get broader tests. I will alter the present comment over time, as I do those tests and take notes on possible problems.

@dankelley
Copy link
Owner Author

dankelley commented Dec 26, 2019

Hm, the vignettes do not build. I get as below. The problem is that points at the south pole cause an error with this projection.

Screen Shot 2019-12-26 at 2 10 54 PM

@dankelley
Copy link
Owner Author

The vignette now builds with the "sf" branch, with the following status indicators.

  1. The build test/check hurdles pass.
  2. I added a new blog entry (http://dankelley.github.io/r/2020/01/04/oce-proj.html) to document the progress to date. Things seem ok (that is, the rgdal and sf versions give matching results), but I am seeing many messages of following form, which are coming from the underlying libproj system library, I think.
proj_create_operations: Source and target ellipsoid do not belong to the same celestial body

@dankelley
Copy link
Owner Author

I just noticed that sf::st_project() is really slow compared with rgdal::project(). With the latter, a simple mapPlot() is quite a quick operation (say a second or so -- certainly not a problem the demands of typical use) but with the former, a map can take several tens of seconds.

A lot of the time is in determining where to place axis labels. This might seem like a simple thing, but it isn't, because lon and lat lines might intersect either atlas. The way I handle this (say for the lower axis) is to use optimize() with the desired (say) longitude, seeing if there is any latitude that would place a point on the axis. That can mean dozens of calls to lonlat2map(), which was not a problem with rgdal::project() but which is quite slow with sf::st_project(). I've instrumented the code, and with the sf::st_project() call commented out, it takes 1 to 10 milliseconds for each label, but 1 to 2 seconds with sf::st_project() being used. (Both cases keep the rgdal::project() call.)

This is with sf_0.8-1 and rgdal_1.4-8

@richardsc
Copy link
Collaborator

Sigh. One step forward, two steps back.

@dankelley
Copy link
Owner Author

I have an idea for another way to get the calculation done, without calling optimize(), which calls sf::sf_project() repeatedly, creating a bottleneck. I may be able to think of other ways to increase efficiency, too. We haven't needed these with the rgdal approach, because it's fast enough to be comparable to rendering times, whereas the sf approach visibly stalls the axis drawing by tens of seconds, which is very noticeable and thus might annoy oce users.

@richardsc
Copy link
Collaborator

This might be better in a f2f, but is this comment relevant here:

r-spatial/sf#1227

@dankelley
Copy link
Owner Author

As noted in our f2f earlier today, I have another way of sidestepping the slowness in finding locations for longitude and latitude labels: record intersections of longitude and latitude lines with the edges of the plot region.

A proof of concept, showing which sf functions might be the best to detect the intersections, is as follows. The test is whether the green dots appear at the intersections of the dotted curve (think of this as a curve of constant longitude) and the red line (think of this as the bottom of the box). Whether sf::st_linestring() is the best function to use is unknown to me; it's just what I found first in the documentation. I'll need to check on whether this is a slow function, but the fact that we will only need to call it once per longitude/latitude graticule suggests to me that we shouldn't have to worry about speed.

library(sf)
x <- seq(0, 5, 0.1)
y <- sin(x*pi*2/2)
curve <- sf::st_linestring(cbind(x, y))
ax <- sf::st_linestring(cbind(c(0,5), c(-0.5,-0.5)))
I <- sf::st_intersection(curve, ax)
par(mar=c(3,3,1,1), mgp=c(1.8,0.8,0))
plot(x, y)
plot(ax, add=TRUE, col=2)
plot(I, add=TRUE, col=3, pch=20)

Screen Shot 2020-01-15 at 3 28 40 PM

@dankelley
Copy link
Owner Author

My testing shows this is pretty simple, e.g. "sf" branch commit 58b7a1e with

    library(oce)
    data(coastlineWorld)
    par(mar=c(2,2,1,1))
    mapPlot(coastlineWorld, longitudelim=c(-100,0),latitudelim=c(10,70),projection="+proj=robin", col='gray',debug=2)

yields as below, so we're getting the intersections right, so now it's a matter of hooking this up with axis(). (I use axis() instead of e.g. mtext() or text() calls, because axis() takes care of the matter of overplotting.)

Screen Shot 2020-01-15 at 5 00 59 PM

@dankelley
Copy link
Owner Author

dankelley commented Jan 18, 2020

I've coded mapPlot() using the method hinted-at in the previous comment. It seems to work properly in my tests. It is much faster, with axis labels appearing with no perceptible lag, after the longitude and latitude lines are drawn. (There was a lag of maybe a second per axis, before, on an admittedly slow computer.)

Note that mapAxis() is no longer called by mapPlot(); instead, mapPlot() calls drawGrid() and saves the (new) return value from this function, which is used later within mapPlot() in an axis() call.

As http://dankelley.github.io/r/2020/01/04/oce-proj.html there is an update to a previous blog entry demonstrating map projections.

Since things are looking good for a possible conversion to sf, I have done some tests of the suite of projections that were available before, and the larger st that sf offers; see http://dankelley.github.io/r/2020/01/16/map-projection.html for more on this.

@stale
Copy link

stale bot commented Feb 20, 2020

This issue has been automatically marked as stale because no comments have been made in a month. Unless comments are made in the next week (or the 'pinned' label is applied), this issue will be closed automatically in a week.

@stale stale bot added the stale label Feb 20, 2020
@dankelley dankelley added the pinned Prevent automatic stale designation label Feb 20, 2020
@stale stale bot removed the stale label Feb 20, 2020
@dankelley dankelley removed the pinned Prevent automatic stale designation label Jul 27, 2020
@dankelley
Copy link
Owner Author

I removed the "pinned" tag, because I have ignored this issue for far too long. We have enough features for a new CRAN coming fairly soon, and it would make sense to have this sf work merged into "develop" before then.

@dankelley
Copy link
Owner Author

dankelley commented Jul 27, 2020

As might be expected for a branch that has been ignored for so long, there are some merge conflicts.

Merging "develop" into "sf" gives

Unmerged paths:
  (use "git add <file>..." to mark resolution)
        both modified:   NAMESPACE
        both modified:   R/coastline.R
        both modified:   R/map.R
        both modified:   vignettes/map_projections.Rmd

And git grep -n '<<<<<<<' returns (with an added checklist)

  • NAMESPACE:24:<<<<<<< HEAD
  • R/coastline.R:171:<<<<<<< HEAD
  • R/coastline.R:614:<<<<<<< HEAD
  • R/coastline.R:830:<<<<<<< HEAD
  • R/map.R:493:<<<<<<< HEAD
  • R/map.R:572:<<<<<<< HEAD
  • R/map.R:931:<<<<<<< HEAD
  • R/map.R:1693:<<<<<<< HEAD
  • R/map.R:2036:<<<<<<< HEAD
  • Binary file tests/testthat/local_data/f34_20160808v7.2.gz matches
  • vignettes/map_projections.Rmd:299:<<<<<<< HEAD

@dankelley dankelley changed the title switch to 'sf' for projection and polygon calculations switch to 'sf' for projection and polygon calculations? Jul 28, 2020
@dankelley
Copy link
Owner Author

dankelley commented Aug 1, 2020

Below are some tests, which are also present in tests/testthat/test_map.R. Note the options() call, which turns warnings into errors, so the test suite will fail if rgdal and sf vary. (The tests are that the two mappings to xy space agree to within 1m.)

This test suite shows no problems with "sf" branch commit bf87ce6

library(oce)
data(coastlineWorld)
par(mar=c(2,2,1,1))
options(warn=3)
mapPlot(coastlineWorld, projection="+proj=moll", col="gray")
mapPlot(coastlineWorld, projection="+proj=ortho", col="gray")
mapPlot(coastlineCut(coastlineWorld,-100), longitudelim=c(-130,-55), latitudelim=c(35,60),
        projection="+proj=lcc +lat_0=30 +lat_1=60 +lon_0=-100", col="gray")
mapPlot(coastlineCut(coastlineWorld, -135), longitudelim=c(-130, 50), latitudelim=c(70, 110),
        projection="+proj=stere +lat_0=90 +lon_0=-135", col="gray")

@dankelley
Copy link
Owner Author

VOTING: other oce developers are asked to object to the plan outlined here before noon, next Tuesday.

I have rebuilt my previous blog items that went through tests of the available map projections within oce. The results are at http://dankelley.github.io/r/2020/08/02/oce-proj.html (with plots) and http://dankelley.github.io/r/2020/08/02/map-projection.html (just a listing with some tests).

This has uncovered four projections that ought to be removed from oce, for reasons explained in the Executive Summary of http://dankelley.github.io/r/2020/08/02/oce-proj.html, which reads

Executive Summary. a transition of oce to sf would require the removal of healpix, nesper, pconic and rhealpix projections, all of which cause errors in sf. The loss of healpix, rhealpix and pconic projections is not of much concern, since these were interrupted projections that were not handled properly by oce::mapPlot() anyway (see http://dankelley.github.io/r/2020/01/04/oce-proj.html for the healpix and rhealpix examples; the nesper case was ignored in that blog item). The nesper projection seems unlikely to enjoy wide usage, since it is showing a near-earth satellite view. Thus, the loss of these 4 oce projections should not stand in the way of a transition from rgdal to sf ... and the benefits of switching to the newer sf package are convincing, in terms of accuracy and future availability.

My plan is to remove those projections. Note that nesper was not actually provided by oce, but that it was erroneously mentioned in one of the documentation items. The other three were really not worth keeping in oce anyway, all being interrupted projections (with the earth "split open" like an orange peel pressed on a table) but mapPlot() was erroneously drawing the coastline through those interrupted regions. So, the loss is not significant, in my view; nobody would have considered publishing those plots. And, what we gain from a switch to sf is important, in terms of positional accuracy, etc (I won't try to reproduce in sentences here what has been discussed in perhaps dozens of essays written by map-projection experts over the past few years).

PLAN I plan to remove healpix, rhealpix and pconic and to remove the phrase in the oce docs that suggested that nesper was available. If other developers object to my timetable, I ask that they do so before noon next Tuesday. Hearing nothing by then, I will remove these projections from the sf branch at 1300h next Tuesday, and continue with my testing of that branch for a few days until I merge it into develop. All of this is to close this issue, and lay the ground for a CRAN release before the end of August (in time for September classes).

@dankelley
Copy link
Owner Author

I see that the oce docs mentioned the helmert projection, which was not in the table of permitted transformations, but which did appear in a sentence about PROJ.4 projections.

And that reminds me -- the docs need to stop referring to PROJ.4, because now it's just called PROJ (with a stroke through the O), not PROJ.6 (although that would have made some sense, I suppose).

@richardsc
Copy link
Collaborator

+1 from me!

@dankelley
Copy link
Owner Author

I did the changes described in #1629 (comment) (a few comments up) in the "sf" branch commit 18c4007 and several commits before that. (I accidentally made a bunch of other fixes in "sf" that I should have made in "develop" ... this was the usual fixing of broken/changed links that seems to be housekeeping that's needed about once per month.)

My plan is to live on "sf" for the remainder of the week, and then to merge it into "develop" on Saturday morning. That way, any user who lives on "develop" will become part of the testing team. I imagine there will be some problems, e.g. I had to relax some of my tests that "sf" and "rgdal" match, by only demanding a 1-metre agreement in x-y space. (Granted, that is basically total agreement in an ocean-mapping context, but I am being very careful here because once we transition fully to "sf", it will be hard to transition back to "rgdal".)

Maybe I'll start an issue for a CRAN release...

@stale
Copy link

stale bot commented Aug 24, 2020

This issue has been automatically marked as stale because no comments have been made in two weeks. Unless comments are made in the next week (or the 'pinned' label is applied), this issue will be closed automatically in a week.

@stale stale bot added the stale label Aug 24, 2020
@dankelley dankelley added the pinned Prevent automatic stale designation label Aug 24, 2020
@stale stale bot removed the stale label Aug 24, 2020
@dankelley
Copy link
Owner Author

I've merged "sf" into "develop" (commit 874a6c3) without difficulties. It builds fine locally, and the following blog posts also suggest that it's OK:

  1. http://dankelley.github.io/r/2020/08/02/oce-proj.html
  2. http://dankelley.github.io/r/2020/08/02/map-projection.html

@dankelley
Copy link
Owner Author

The travis-CI build failed (after under 2 minutes) but that was just because it could not set up the test machine, since curl was failing repeatedly (not a surprise, given worldwide internet problems today; see e.g. https://news.ycombinator.com/item?id=24322861)

@dankelley
Copy link
Owner Author

I plan to use this code for a week, and after that I'll start commenting-out the existing rgdal calls, and also the comparisons between sf and rgdal.

@dankelley
Copy link
Owner Author

I've been busy with other things, and have not followed my "a week" plan. However, I just tried to do update.packages() to get the new rgdal-1.5-17 (prompted by R-package-devel Digest, Vol 66, Issue 6) and I see as below on macos. I know there are workarounds to get rgdal working, but they involve doing system commands (installing libraries) that will be a lot to expect from oce users.

Presumably, over the next while, rgdal will get to a state in which it can cleanly install on macos in the way R users expect (without doing system-level work). Still, this is a reminder that oce users will benefit from dropping oce dependence on rgdal.

configure: rgdal: 1.5-17
checking for /usr/bin/svnversion... yes
svnversion: error: The subversion command line tools are no longer provided by Xcode.
configure: svn revision: 1070
checking for gdal-config... no
no
configure: error: gdal-config not found or not executable.
ERROR: configuration failed for packagergdal* removing/Library/Frameworks/R.framework/Versions/4.0/Resources/library/rgdal* restoring previous/Library/Frameworks/R.framework/Versions/4.0/Resources/library/rgdalThe downloaded source packages are in/private/var/folders/8b/l4h64m1j22v5pb7vj049ff140000gn/T/Rtmpi7lK2q/downloaded_packagesWarning message:
In install.packages(update[instlib == l, "Package"], l, repos = repos,  :
  installation of packagergdalhad non-zero exit status

@dankelley
Copy link
Owner Author

I made a little video (https://youtu.be/3VFRsZPzNFw) about my plans to work on this over the solstice holiday.

Partly, this is an experiment to see whether videos are useful. I suspect the answer is 'no' for people reading issues (because a well-written comment can be examined in less time than it takes to watch a video), but 'yes' for people solving issues (because the time to create is about the same, plus there is an advantage to 'thinking out loud').

@dankelley
Copy link
Owner Author

In my video I was noting that I want to check whether most users will have an sf version after 0.8.1, and a glance at https://cran.r-project.org/web/packages/sf/news/news.html tells me that the current version of sf is 0.9.6, and that 0.8.1 is 6 versions old.

@dankelley
Copy link
Owner Author

dankelley commented Dec 24, 2020

Checklist (mutable, code not yet commited)

  • remove rgdal:: calls as shown in video
  • deprecate allowNA arg of oceProject()
  • deprecate legacy "
  • deprecate use_ob_tran "
  • docs
  • tests
  • DESCRIPTION should no longer suggest rgdal

@richardsc
Copy link
Collaborator

I strongly recommend that @paleolimbot get involved with this, given his experience with sf and other geo-computations. If you are wanting the help and additional eyes in code, of course.

@dankelley
Copy link
Owner Author

Right now, I'm mainly uncommenting code. I'm mainly done, but won't push to GH for a day or two. None of this is anything like a wholesale conversion of code. It's just a matter of switching a function call (and removing a lot of fluff that was needed to get rgdal to work across all platforms).

@paleolimbot
Copy link

Happy to test when it hits the develop branch!

The reproj::reproj() function might be worth benchmarking if you are concerned about speed (I forget whether that or sf::st_project() is faster). You could also try libproj, although this won't be back on CRAN in time for the next OCE release. The advantage of libproj is that it requires zero system package installs; the advantage of reproj::reproj() is that it only requires a PROJ install (and even old versions of PROJ are OK).

In case you're curious and don't already know, the 1-meter ish difference between sf and rgdal might be because North America moved since 1983 and PROJ 6 was the first version that took into account the datum shift between NAD83 and WGS84 and rgdal must not be taking this into account.

@dankelley
Copy link
Owner Author

@paleolimbot I won't need eyes on the changes I'm making today -- it's just code cleanup. What we've had for the past year is that all calculations were done once by rgdal and then by sf and then at 1-metre resolution check that you mention. This was because we had rgdal for a long time, and I wanted to switch to sf but didn't want to affect user's results. (The move to sf started when rgdal deprecated project, which didn't actually happen, but it got me looking at sf and I liked it a lot.)

Wow, libproj -- that's a flash from the past. I wonder when it will get to CRAN?

The place where oce could really use your help (in the new year, perhaps by Q2 if you'll have time) is with what we call the UHL (ugly horizontal line) problem. This is what happens when tracing a path that crosses the 'edge of the earth'. We have some ugly kludges to try to fix that, but I suspect that the real solution is to do proper calculations of edge-of-earth intersection, as I think you have in s2 (https://cran.r-project.org/web/packages/sf/vignettes/sf7.html : polygons-divide-the-sphere section). But that would be another issue! I am terrible for topic-drift in issues, and if I were a New-Years-Resolution sort of person, I'd resolve to resolve that :-)

dankelley added a commit that referenced this issue Dec 24, 2020
(Some is removed, some is commented-out for a while as a placeholder.)

See #1629
@dankelley
Copy link
Owner Author

This was fixed in commit e70e266 so I'll close the issue now (finally).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development for next CRAN high priority pinned Prevent automatic stale designation projection Map projection
Projects
None yet
Development

No branches or pull requests

3 participants