diff --git a/vignettes/bcdata.Rmd b/vignettes/bcdata.Rmd index 2fd59130..bbc30666 100644 --- a/vignettes/bcdata.Rmd +++ b/vignettes/bcdata.Rmd @@ -26,14 +26,14 @@ See the License for the specific language governing permissions and limitations The `bcdata` [R](https://www.r-project.org/) package contains functions for searching & retrieving data from the [B.C. Data Catalogue]( https://catalogue.data.gov.bc.ca). -The [B.C. Data Catalogue](https://www2.gov.bc.ca/gov/content?id=79B5224167334667A44C9E8B5143D0C5) is the place to find British Columbia Government data, applications and web services. Much of the data are released under the [Open Government Licence --- British Columbia](https://www2.gov.bc.ca/gov/content/data/open-data/open-government-licence-bc), as well as numerous other [licences](https://catalogue.data.gov.bc.ca/dataset?download_audience=Public). +The [B.C. Data Catalogue](https://www2.gov.bc.ca/gov/content?id=79B5224167334667A44C9E8B5143D0C5) is the place to find British Columbia Government data, applications and web services. Much of the data are released under the [Open Government Licence --- British Columbia](https://www2.gov.bc.ca/gov/content/data/policy-standards/open-data/open-government-licence-bc), as well as numerous other [licences](https://catalogue.data.gov.bc.ca/dataset?download_audience=Public). You can install `bcdata` directly from CRAN: -```r +``` r install.packages("bcdata") library(bcdata) @@ -44,7 +44,7 @@ library(bcdata) `bcdata::bcdc_browse()` lets you access the [B.C. Data Catalogue web interface](https://catalogue.data.gov.bc.ca) directly from R---opening the catalogue search page in your default browser: -```r +``` r ## Take me to the B.C. Data Catalogue home page bcdc_browse() ``` @@ -52,7 +52,7 @@ bcdc_browse() If you know the catalogue "human-readable" record name or permanent ID you can open directly to the record web page: -```r +``` r ## Take me to the B.C. 
Winery Locations catalogue record using the record name bcdc_browse("bc-winery-locations") @@ -67,7 +67,7 @@ bcdc_browse("1d21922b-ec4f-42e5-8f6b-bf320a286157") Let's search the catalogue for records that contain the word "recycling": -```r +``` r ## Give me the catalogue search results for 'recycling' bcdc_search("recycling") #> List of B.C. Data Catalogue Records @@ -93,7 +93,7 @@ bcdc_search("recycling") You can set the number of records to be returned from the search and/or you can customize your search using the catalogue search _facets_ `license_id`, `download_audience`, `res_format`, `sector`, `organization`, and `groups`: -```r +``` r ## Give me the first catalogue search result for 'recycling' bcdc_search("recycling", n = 1) #> List of B.C. Data Catalogue Records @@ -123,29 +123,47 @@ bcdc_search("recycling", license_id = "2") You can see all valid values for the catalogue search facets using `bcdata::bcdc_search_facets()`: -```r +``` r ## Valid values for search facet 'license_id' bcdc_search_facets(facet = "license_id") -#> facet count display_name name -#> 1 license_id 92 Statistics Canada Open Licence 21 -#> 2 license_id 12 Open Government Licence – TransLink 48 -#> 3 license_id 13 Open Government Licence – Municipality of North Cowichan 44 -#> 4 license_id 5 Open Government Licence – Industry Training Authority 50 -#> 5 license_id 3 Open Government Licence - Destination BC 43 -#> 6 license_id 61 Open Government Licence - Canada 24 -#> 7 license_id 1837 Open Government Licence - British Columbia 2 -#> 8 license_id 2 Open Government Licence - BC Assessment 47 -#> 9 license_id 5 Open Data Licence for ICBC Information 49 -#> 10 license_id 2 Open Data Commons - Public Domain Dedication and Licence 45 -#> 11 license_id 1 King's Printer Licence - British Columbia 25 -#> 12 license_id 15 Elections BC Open Data Licence 42 -#> 13 license_id 1710 Access Only 22 +#> facet name display_name +#> 1 license_id 21 Statistics Canada Open Licence +#> 2 license_id 53 
Open Licence - University of Northern British Columbia +#> 3 license_id 48 Open Government Licence – TransLink +#> 4 license_id 44 Open Government Licence – Municipality of North Cowichan +#> 5 license_id 50 Open Government Licence - SkilledTradesBC +#> 6 license_id 43 Open Government Licence - Destination BC +#> 7 license_id 24 Open Government Licence - Canada +#> 8 license_id 2 Open Government Licence - British Columbia +#> 9 license_id 47 Open Government Licence - BC Assessment +#> 10 license_id 49 Open Data Licence for ICBC Information +#> 11 license_id 52 Open Data Licence - Office of the Registrar of Lobbyists for British Columbia +#> 12 license_id 45 Open Data Commons - Public Domain Dedication and Licence +#> 13 license_id 25 King's Printer Licence - British Columbia +#> 14 license_id 42 Elections BC Open Data Licence +#> 15 license_id 22 Access Only +#> count +#> 1 70 +#> 2 2 +#> 3 12 +#> 4 13 +#> 5 5 +#> 6 3 +#> 7 61 +#> 8 1638 +#> 9 2 +#> 10 5 +#> 11 1 +#> 12 2 +#> 13 1 +#> 14 18 +#> 15 1642 ``` Finally, you can retrieve the _metadata_ for a single catalogue record by using the record name or permanent ID with `bcdc_get_record()`. It is advised to use the permanent ID rather than the human-readable name in non-interactive situations---like scripts---to guard against future name changes of a record: -```r +``` r ## Give me the catalogue record metadata for `bc-first-tire-recycling-data-1991-2006` bcdc_get_record("a29ad492-29a2-44b9-8693-d27a8cc8e686") #> B.C. Data Catalogue Record: BC FIRST Tire Recycling Data 1991-2006 @@ -169,76 +187,70 @@ bcdc_get_record("a29ad492-29a2-44b9-8693-d27a8cc8e686") Once you have located the B.C. Data Catalogue record with the data you want, you can use `bcdata::bcdc_get_data()` to download and read the data from the record. You can use the record name, permanent ID or the result from `bcdc_get_record()`. Let's look at the B.C. 
Highway Web Cameras data: -```r +``` r ## Get the data resource for the `bc-highway-cams` catalogue record bcdc_get_data("bc-highway-cams") -#> # A tibble: 999 × 19 -#> links_bchi…¹ links…² links…³ links…⁴ id highw…⁵ highw…⁶ camName caption credit orien…⁷ latit…⁸ -#> -#> 1 http://imag… http:/… http:/… http:/… 2 5 Coquih… Coquih… Hwy 5,… N 49.6 -#> 2 http://imag… http:/… http:/… http:/… 5 3 Kooten… Hwy 3,… E 49.1 -#> 3 http://imag… http:/… http:/… http:/… 6 16 Smithe… Hwy 16… N 54.8 -#> 4 http://imag… http:/… http:/… http:/… 7 1 Fraser… Cole R… Hwy 1 … E 49.1 -#> 5 http://imag… http:/… http:/… http:/… 8 1 Vancou… Malaha… Hwy 1 … N 48.6 -#> 6 http://imag… http:/… http:/… http:/… 9 19 Nanaim… Hwy 19… N 49.2 -#> 7 http://imag… http:/… http:/… http:/… 10 97 Northe… South … Hwy 97… N 56.1 -#> 8 http://imag… http:/… http:/… http:/… 11 1 Trans … Revels… Hwy 1 … NE 51.0 -#> 9 http://imag… http:/… http:/… http:/… 12 1 Trans … Three … Hwy 1,… E 50.9 -#> 10 http://imag… http:/… http:/… http:/… 13 99 Peace … Peace … Hwy 99… N 49.0 -#> # … with 989 more rows, 7 more variables: longitude , imageStats_updatePeriodMean , -#> # imageStats_updatePeriodStdDev , markedDelayed , updatePeriodMean , -#> # updatePeriodStdDev , fetchMean , and abbreviated variable names ¹​links_bchighwaycam, -#> # ²​links_imageDisplay, ³​links_imageThumbnail, ⁴​links_replayTheDay, ⁵​highway_number, -#> # ⁶​highway_locationDescription, ⁷​orientation, ⁸​latitude +#> # A tibble: 1,007 × 13 +#> links_bchighwaycam links_imageDisplay links_imageThumbnail links_replayTheDay id +#> +#> 1 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 2 +#> 2 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 5 +#> 3 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 6 +#> 4 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 7 +#> 5 https://images.drivebc.ca/bchig… 
https://images.dr… https://images.driv… https://images.dr… 8 +#> 6 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 9 +#> 7 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 10 +#> 8 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 11 +#> 9 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 12 +#> 10 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 13 +#> # ℹ 997 more rows +#> # ℹ 8 more variables: highway_number , highway_locationDescription , camName , +#> # caption , credit , orientation , latitude , longitude ## OR use the permanent ID, which is better for scripts or non-interactive use bcdc_get_data("6b39a910-6c77-476f-ac96-7b4f18849b1c") -#> # A tibble: 999 × 19 -#> links_bchi…¹ links…² links…³ links…⁴ id highw…⁵ highw…⁶ camName caption credit orien…⁷ latit…⁸ -#> -#> 1 http://imag… http:/… http:/… http:/… 2 5 Coquih… Coquih… Hwy 5,… N 49.6 -#> 2 http://imag… http:/… http:/… http:/… 5 3 Kooten… Hwy 3,… E 49.1 -#> 3 http://imag… http:/… http:/… http:/… 6 16 Smithe… Hwy 16… N 54.8 -#> 4 http://imag… http:/… http:/… http:/… 7 1 Fraser… Cole R… Hwy 1 … E 49.1 -#> 5 http://imag… http:/… http:/… http:/… 8 1 Vancou… Malaha… Hwy 1 … N 48.6 -#> 6 http://imag… http:/… http:/… http:/… 9 19 Nanaim… Hwy 19… N 49.2 -#> 7 http://imag… http:/… http:/… http:/… 10 97 Northe… South … Hwy 97… N 56.1 -#> 8 http://imag… http:/… http:/… http:/… 11 1 Trans … Revels… Hwy 1 … NE 51.0 -#> 9 http://imag… http:/… http:/… http:/… 12 1 Trans … Three … Hwy 1,… E 50.9 -#> 10 http://imag… http:/… http:/… http:/… 13 99 Peace … Peace … Hwy 99… N 49.0 -#> # … with 989 more rows, 7 more variables: longitude , imageStats_updatePeriodMean , -#> # imageStats_updatePeriodStdDev , markedDelayed , updatePeriodMean , -#> # updatePeriodStdDev , fetchMean , and abbreviated variable names 
¹​links_bchighwaycam, -#> # ²​links_imageDisplay, ³​links_imageThumbnail, ⁴​links_replayTheDay, ⁵​highway_number, -#> # ⁶​highway_locationDescription, ⁷​orientation, ⁸​latitude +#> # A tibble: 1,007 × 13 +#> links_bchighwaycam links_imageDisplay links_imageThumbnail links_replayTheDay id +#> +#> 1 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 2 +#> 2 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 5 +#> 3 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 6 +#> 4 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 7 +#> 5 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 8 +#> 6 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 9 +#> 7 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 10 +#> 8 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 11 +#> 9 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 12 +#> 10 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 13 +#> # ℹ 997 more rows +#> # ℹ 8 more variables: highway_number , highway_locationDescription , camName , +#> # caption , credit , orientation , latitude , longitude ## OR use the result from bcdc_get_record() my_record <- bcdc_get_record("6b39a910-6c77-476f-ac96-7b4f18849b1c") bcdc_get_data(my_record) -#> # A tibble: 999 × 19 -#> links_bchi…¹ links…² links…³ links…⁴ id highw…⁵ highw…⁶ camName caption credit orien…⁷ latit…⁸ -#> -#> 1 http://imag… http:/… http:/… http:/… 2 5 Coquih… Coquih… Hwy 5,… N 49.6 -#> 2 http://imag… http:/… http:/… http:/… 5 3 Kooten… Hwy 3,… E 49.1 -#> 3 http://imag… http:/… http:/… http:/… 6 16 Smithe… Hwy 16… N 54.8 -#> 4 http://imag… http:/… http:/… http:/… 7 1 
Fraser… Cole R… Hwy 1 … E 49.1 -#> 5 http://imag… http:/… http:/… http:/… 8 1 Vancou… Malaha… Hwy 1 … N 48.6 -#> 6 http://imag… http:/… http:/… http:/… 9 19 Nanaim… Hwy 19… N 49.2 -#> 7 http://imag… http:/… http:/… http:/… 10 97 Northe… South … Hwy 97… N 56.1 -#> 8 http://imag… http:/… http:/… http:/… 11 1 Trans … Revels… Hwy 1 … NE 51.0 -#> 9 http://imag… http:/… http:/… http:/… 12 1 Trans … Three … Hwy 1,… E 50.9 -#> 10 http://imag… http:/… http:/… http:/… 13 99 Peace … Peace … Hwy 99… N 49.0 -#> # … with 989 more rows, 7 more variables: longitude , imageStats_updatePeriodMean , -#> # imageStats_updatePeriodStdDev , markedDelayed , updatePeriodMean , -#> # updatePeriodStdDev , fetchMean , and abbreviated variable names ¹​links_bchighwaycam, -#> # ²​links_imageDisplay, ³​links_imageThumbnail, ⁴​links_replayTheDay, ⁵​highway_number, -#> # ⁶​highway_locationDescription, ⁷​orientation, ⁸​latitude +#> # A tibble: 1,007 × 13 +#> links_bchighwaycam links_imageDisplay links_imageThumbnail links_replayTheDay id +#> +#> 1 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 2 +#> 2 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 5 +#> 3 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 6 +#> 4 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 7 +#> 5 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 8 +#> 6 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 9 +#> 7 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 10 +#> 8 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 11 +#> 9 https://images.drivebc.ca/bchig… https://images.dr… https://images.driv… https://images.dr… 12 +#> 10 https://images.drivebc.ca/bchig… https://images.dr… 
https://images.driv… https://images.dr… 13 +#> # ℹ 997 more rows +#> # ℹ 8 more variables: highway_number , highway_locationDescription , camName , +#> # caption , credit , orientation , latitude , longitude ``` A catalogue record can have one or multiple data files---or "resources". If there is only one resource, `bcdc_get_data()` will return that resource by default, as in the above `bc-highway-cams` example. If there are multiple data resources you will need to specify which resource you want. Let's look at a catalogue record that contains multiple data resources---BC Schools - Programs Offered in Schools: -```r +``` r ## Get the record ID for the `bc-schools-programs-offered-in-schools` catalogue record bcdc_search("school programs", n = 1) #> List of B.C. Data Catalogue Records @@ -270,45 +282,44 @@ bcdc_get_record("b1f27d1c-244a-410e-a361-931fac62a524") We see there are two data files or resources available in this record, so we need to tell `bcdc_get_data()` which one we want. When used interactively, `bcdc_get_data()` will prompt you with the list of available resources through `bcdata` and ask you to select the resource you want. 
The resource ID for each data set is available _in_ the metadata record ☝️: -```r +``` r ## Get the txt data resource from the `bc-schools-programs-offered-in-schools` ## catalogue record bcdc_get_data("b1f27d1c-244a-410e-a361-931fac62a524", resource = 'a393f8cf-51ec-42c6-8449-4cea4c75385c') #> # A tibble: 16,152 × 24 -#> Data Le…¹ Schoo…² Facil…³ Publi…⁴ Distr…⁵ Distr…⁶ Schoo…⁷ Schoo…⁸ Has E…⁹ Has C…˟ Has E…˟ Has L…˟ -#> -#> 1 SCHOOL L… 2005/2… STANDA… BC Pub… 005 Southe… 005010… Sparwo… NA TRUE NA NA -#> 2 SCHOOL L… 2006/2… STANDA… BC Pub… 005 Southe… 005010… Sparwo… NA TRUE NA NA -#> 3 SCHOOL L… 2007/2… STANDA… BC Pub… 005 Southe… 005010… Sparwo… NA TRUE NA NA -#> 4 SCHOOL L… 2005/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 5 SCHOOL L… 2006/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 6 SCHOOL L… 2007/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 7 SCHOOL L… 2008/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 8 SCHOOL L… 2009/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 9 SCHOOL L… 2010/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> 10 SCHOOL L… 2011/2… STANDA… BC Pub… 005 Southe… 005010… Jaffra… NA TRUE NA NA -#> # … with 16,142 more rows, 12 more variables: `Has Prog Francophone` , +#> `Data Level` `School Year` `Facility Type` `Public Or Independent` `District Number` +#> +#> 1 SCHOOL LEVEL 2005/2006 STANDARD BC Public School 005 +#> 2 SCHOOL LEVEL 2006/2007 STANDARD BC Public School 005 +#> 3 SCHOOL LEVEL 2007/2008 STANDARD BC Public School 005 +#> 4 SCHOOL LEVEL 2005/2006 STANDARD BC Public School 005 +#> 5 SCHOOL LEVEL 2006/2007 STANDARD BC Public School 005 +#> 6 SCHOOL LEVEL 2007/2008 STANDARD BC Public School 005 +#> 7 SCHOOL LEVEL 2008/2009 STANDARD BC Public School 005 +#> 8 SCHOOL LEVEL 2009/2010 STANDARD BC Public School 005 +#> 9 SCHOOL LEVEL 2010/2011 STANDARD BC Public School 005 +#> 10 SCHOOL LEVEL 2011/2012 STANDARD BC Public 
School 005 +#> # ℹ 16,142 more rows +#> # ℹ 19 more variables: `District Name` , `School Number` , `School Name` , +#> # `Has Eng Lang Learner Prog` , `Has Core French` , `Has Early French Immersion` , +#> # `Has Late French Immersion` , `Has Prog Francophone` , #> # `Has Any French Immersion Prog` , `Has Any French Prog` , #> # `Has Aborig Supp Services` , `Has Other Appr Aborig Prog` , -#> # `Has Aborig Lang And Cult` , `Has Continuing Ed Prog` , -#> # `Has Distributed Learn Prog` , `Has Career Prep Prog` , `Has Coop Prog` , -#> # `Has Apprenticeship Prog` , `Has Career Technical Prog` , and abbreviated variable -#> # names ¹​`Data Level`, ²​`School Year`, ³​`Facility Type`, ⁴​`Public Or Independent`, … +#> # `Has Aborig Lang And Cult` , `Has Continuing Ed Prog` , … ``` Alternatively, you can retrieve the full details of the available resources for a given record as a data frame using `bcdc_tidy_resources()`: -```r +``` r ## Get a data frame of data resources for the `bc-schools-programs-offered-in-schools` ## catalogue record bcdc_tidy_resources("b1f27d1c-244a-410e-a361-931fac62a524") #> # A tibble: 2 × 9 -#> name url id format ext packa…¹ locat…² wfs_a…³ bcdat…⁴ -#> -#> 1 ProgramsOfferedinSchools.txt http://www.bced.… a393… txt txt b1f27d… catalo… FALSE TRUE -#> 2 ProgramsOfferedinSchools.xlsx http://www.bced.… 1e34… xlsx xlsx b1f27d… catalo… FALSE TRUE -#> # … with abbreviated variable names ¹​package_id, ²​location, ³​wfs_available, ⁴​bcdata_available +#> name url id format ext package_id location wfs_available bcdata_available +#> +#> 1 ProgramsOfferedinScho… http… a393… txt txt b1f27d1c-… catalog… FALSE TRUE +#> 2 ProgramsOfferedinScho… http… 1e34… xlsx xlsx b1f27d1c-… catalog… FALSE TRUE ``` `bcdc_get_data()` will also detect if the data resource is a geospatial file, and automatically reads and returns it as an [`sf` object](https://r-spatial.github.io/sf/) in your R session. 
@@ -316,7 +327,7 @@ bcdc_tidy_resources("b1f27d1c-244a-410e-a361-931fac62a524") Let's get the air zones for British Columbia: -```r +``` r ## Find the B.C. Air Zones catalogue record bcdc_search("air zones", res_format = "geojson") #> List of B.C. Data Catalogue Records @@ -342,7 +353,10 @@ bc_az %>% theme_minimal() ``` +
[Figure: plot of chunk air_zones]
+
**Note:** The `bcdata` package supports downloading _most_ file types, including zip archives. It will do its best to identify and read data from @@ -356,7 +370,7 @@ Many geospatial data sets in the B.C. Data Catalogue are available through a [We Let's get the Capital Regional District boundary from the [B.C. Regional Districts geospatial data](https://catalogue.data.gov.bc.ca/dataset/d1aff64e-dbfe-45a6-af97-582b7f6418b9)---the whole file takes 30-60 seconds to download and I only need the one polygon, so why not save some time: -```r +``` r ## Find the B.C. Regional Districts catalogue record bcdc_search("regional districts administrative areas", res_format = "wms", n = 1) #> List of B.C. Data Catalogue Records @@ -392,7 +406,7 @@ bcdc_describe_feature(bc_regional_districts_metadata) #> 8 UPDATE_TYPE FALSE xsd:string character "A short description of the lates… #> 9 WHEN_UPDATED FALSE xsd:date date "The date and time the record was… #> 10 MAP_STATUS FALSE xsd:string character "That the digital map has been ap… -#> # … with 11 more rows +#> # ℹ 11 more rows ## Get the Capital Regional District polygon from the B.C. Regional ## Districts geospatial data @@ -407,6 +421,9 @@ my_regional_district %>% theme_minimal() ``` +
[Figure: plot of chunk regional_districts]
+
The vignette [Querying Spatial Data with bcdata](https://bcgov.github.io/bcdata/articles/efficiently-query-spatial-data-in-the-bc-data-catalogue.html) provides a full demonstration on how to use `bcdata::bcdc_query_geodata()` to fine tune a [Web Feature Service](https://www2.gov.bc.ca/gov/content?id=95D78D544B244F34B89223EF069DF74E) request for geospatial data from the B.C. Data Catalogue. diff --git a/vignettes/bcdata.Rmd.orig b/vignettes/bcdata.Rmd.orig index 942db865..05f49541 100644 --- a/vignettes/bcdata.Rmd.orig +++ b/vignettes/bcdata.Rmd.orig @@ -48,7 +48,7 @@ library(ggplot2) The `bcdata` [R](https://www.r-project.org/) package contains functions for searching & retrieving data from the [B.C. Data Catalogue]( https://catalogue.data.gov.bc.ca). -The [B.C. Data Catalogue](https://www2.gov.bc.ca/gov/content?id=79B5224167334667A44C9E8B5143D0C5) is the place to find British Columbia Government data, applications and web services. Much of the data are released under the [Open Government Licence --- British Columbia](https://www2.gov.bc.ca/gov/content/data/open-data/open-government-licence-bc), as well as numerous other [licences](https://catalogue.data.gov.bc.ca/dataset?download_audience=Public). +The [B.C. Data Catalogue](https://www2.gov.bc.ca/gov/content?id=79B5224167334667A44C9E8B5143D0C5) is the place to find British Columbia Government data, applications and web services. Much of the data are released under the [Open Government Licence --- British Columbia](https://www2.gov.bc.ca/gov/content/data/policy-standards/open-data/open-government-licence-bc), as well as numerous other [licences](https://catalogue.data.gov.bc.ca/dataset?download_audience=Public). 
diff --git a/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd b/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd index 789ce8b0..a420af9c 100644 --- a/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd +++ b/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd @@ -1,6 +1,6 @@ --- title: "Querying Spatial Data with bcdata" -date: "2022-10-28" +date: "2024-12-11" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Querying Spatial Data with bcdata} @@ -31,7 +31,7 @@ This vignette illustrates how to use `bcdata::bcdc_query_geodata` to request and First you need to load the package. We will also load the `sf` and `dplyr` packages to help us work with spatial data. You can learn more about the `sf` package [here](https://r-spatial.github.io/sf/) and `dplyr` [here](https://dplyr.tidyverse.org/): -```r +``` r library(bcdata) library(sf) library(dplyr) @@ -45,7 +45,7 @@ The [B.C. Data Catalogue](https://catalogue.data.gov.bc.ca/dataset) provides man Our first step is to extract the [school district polygons](https://catalog.data.gov.bc.ca/dataset/78ec5279-4534-49a1-97e8-9d315936f08b) from the B.C. Data Catalogue. This layer is described using this command: -```r +``` r bcdc_get_record("78ec5279-4534-49a1-97e8-9d315936f08b") #> B.C. Data Catalogue Record: School Districts of BC #> Name: school-districts-of-bc (ID: 78ec5279-4534-49a1-97e8-9d315936f08b) @@ -67,7 +67,7 @@ bcdc_get_record("78ec5279-4534-49a1-97e8-9d315936f08b") This data is the boundary of each school district. The key information in this metadata is that the layer has a resource in `"wms"` format ---which means it is available through a Web Feature Service. From this we know we can make use of `bcdc_query_geodata`. 
-```r +``` r bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") #> Querying 'school-districts-of-bc' record #> • Using collect() on this object will return 59 features and 9 fields @@ -79,23 +79,22 @@ bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") #> Bounding box: xmin: 956376 ymin: 475108.4 xmax: 1635228 ymax: 901924.4 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 6 × 10 -#> id ADMIN…¹ SCHOO…² SCHOO…³ FEATU…⁴ FEATU…⁵ FEATU…⁶ OBJEC…⁷ SE_AN…⁸ geometry -#> -#> 1 WHSE_TA… 300 Arrow … 10 FH1000… 7.39e 9 6.38e5 534 ((1600278 678750, 160026… -#> 2 WHSE_TA… 301 Revels… 19 FH1000… 9.42e 9 7.00e5 535 ((1509789 831133.3, 1509… -#> 3 WHSE_TA… 302 Kooten… 20 FH1000… 3.07e 9 3.49e5 536 ((1554806 563111.3, 1554… -#> 4 WHSE_TA… 303 Vernon 22 FH1000… 5.59e 9 6.23e5 537 ((1542093 677612.6, 1542… -#> 5 WHSE_TA… 304 Centra… 23 FH1000… 2.92e 9 3.57e5 538 ((1464990 601366.2, 1464… -#> 6 WHSE_TA… 305 Caribo… 27 FH1000… 6.12e10 2.14e6 539 ((1372394 901260.4, 1372… -#> # … with abbreviated variable names ¹​ADMIN_AREA_SID, ²​SCHOOL_DISTRICT_NAME, -#> # ³​SCHOOL_DISTRICT_NUMBER, ⁴​FEATURE_CODE, ⁵​FEATURE_AREA_SQM, ⁶​FEATURE_LENGTH_M, ⁷​OBJECTID, -#> # ⁸​SE_ANNO_CAD_DATA +#> id ADMIN_AREA_SID SCHOOL_DISTRICT_NAME SCHOOL_DISTRICT_NUMBER FEATURE_CODE FEATURE_AREA_SQM +#> +#> 1 WHSE_TAN… 300 Arrow Lakes 10 FH10000300 7392472526. +#> 2 WHSE_TAN… 301 Revelstoke 19 FH10000300 9416076465. +#> 3 WHSE_TAN… 302 Kootenay-Columbia 20 FH10000300 3072672101. +#> 4 WHSE_TAN… 303 Vernon 22 FH10000300 5588468673. +#> 5 WHSE_TAN… 304 Central Okanagan 23 FH10000300 2916757936. +#> 6 WHSE_TAN… 305 Cariboo-Chilcotin 27 FH10000300 61213520885. +#> # ℹ 4 more variables: FEATURE_LENGTH_M , OBJECTID , SE_ANNO_CAD_DATA , +#> # geometry ``` This is the initial query to the data in the catalogue. What has been returned is *not* the actual data but rather a subset to help you tune your query. The printed output of this query offers several useful pieces of information. 
Because we have queried with a unique ID, we are shown the name of the record. We also receive instruction that using `collect()` will retrieve a given number of features and fields present for this query. Lastly, there is a reminder that what is printed is only the first 6 rows of the record. Since we are limiting the scope of analysis to the Greater Victoria, Prince George and Kamloops/Thompson school districts, we want to ask the catalogue for only those polygons just like we would in a typical `dplyr` workflow: -```r +``` r bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% filter(SCHOOL_DISTRICT_NAME %in% c("Greater Victoria", "Prince George","Kamloops/Thompson")) #> Querying 'school-districts-of-bc' record @@ -108,18 +107,17 @@ bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% #> Bounding box: xmin: 1126789 ymin: 821142.1 xmax: 1528155 ymax: 1224202 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 1 × 10 -#> id ADMIN…¹ SCHOO…² SCHOO…³ FEATU…⁴ FEATU…⁵ FEATU…⁶ OBJEC…⁷ SE_AN…⁸ geometry -#> -#> 1 WHSE_TA… 328 Prince… 57 FH1000… 5.19e10 2.26e6 562 ((1137478 1221549, 11373… -#> # … with abbreviated variable names ¹​ADMIN_AREA_SID, ²​SCHOOL_DISTRICT_NAME, -#> # ³​SCHOOL_DISTRICT_NUMBER, ⁴​FEATURE_CODE, ⁵​FEATURE_AREA_SQM, ⁶​FEATURE_LENGTH_M, ⁷​OBJECTID, -#> # ⁸​SE_ANNO_CAD_DATA +#> id ADMIN_AREA_SID SCHOOL_DISTRICT_NAME SCHOOL_DISTRICT_NUMBER FEATURE_CODE FEATURE_AREA_SQM +#> +#> 1 WHSE_TAN… 328 Prince George 57 FH10000300 51888780641. +#> # ℹ 4 more variables: FEATURE_LENGTH_M , OBJECTID , SE_ANNO_CAD_DATA , +#> # geometry ``` To further tune our query, we can also request only the columns we want. Really we only want the school district column and the spatial information. During an actual analysis, it is possible that you may need to initially collect more data than you want to determine value to subset by. For example, there is currently no way to ask the catalogue for all possible unique values of `SCHOOL_DISTRICT_NAME`. 
In that case the data will need to be brought into R and unique values will need to be determined there. -```r +``` r bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% filter(SCHOOL_DISTRICT_NAME %in% c("Greater Victoria", "Prince George","Kamloops/Thompson")) %>% select(SCHOOL_DISTRICT_NAME) @@ -133,17 +131,16 @@ bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% #> Bounding box: xmin: 1126789 ymin: 821142.1 xmax: 1528155 ymax: 1224202 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 1 × 6 -#> id ADMIN…¹ SCHOO…² SCHOO…³ OBJEC…⁴ geometry -#> -#> 1 WHSE_TANTALIS.TA_SCHOOL_DISTRICTS_SVW.f… 328 Prince… 57 562 ((1137478 1221549, 11373… -#> # … with abbreviated variable names ¹​ADMIN_AREA_SID, ²​SCHOOL_DISTRICT_NAME, -#> # ³​SCHOOL_DISTRICT_NUMBER, ⁴​OBJECTID +#> id ADMIN_AREA_SID SCHOOL_DISTRICT_NAME SCHOOL_DISTRICT_NUMBER OBJECTID +#> +#> 1 WHSE_TANTALIS.TA_SCHOOL_DISTR… 328 Prince George 57 562 +#> # ℹ 1 more variable: geometry ``` Note that in the `select` statement, we did not explicitly ask for the spatial data and also that there are several columns present that we didn't select. This is because within each data set in the data catalogue, there are several columns that will always be returned regardless of what is selected. If you really don't want those columns after you `collect` the data, which we will take care of right now, you can drop them: -```r +``` r districts <- bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% filter(SCHOOL_DISTRICT_NAME %in% c("Greater Victoria", "Prince George","Kamloops/Thompson")) %>% select(SCHOOL_DISTRICT_NAME) %>% @@ -153,11 +150,14 @@ districts <- bcdc_query_geodata("78ec5279-4534-49a1-97e8-9d315936f08b") %>% Again note here that we have assigned the object a name and added the `collect` statement. This step happens when you have selected the data you want and wish to begin working with it in R like a normal `sf` object. 
For example, we can now plot these three school districts: -```r +``` r plot(st_geometry(districts)) ``` +
[Figure: plot of chunk districts]
+
Now that we have the spatial boundaries narrowed by specific school districts we can perform some spatial operations to determine parks in the school districts. @@ -165,7 +165,7 @@ Now that we have the spatial boundaries narrowed by specific school districts we For the purposes of this example, let's consider [this greenspace](https://catalogue.data.gov.bc.ca/dataset/6a2fea1b-0cc4-4fc2-8017-eaf755d516da) layer in the catalogue. This layer is described here: -```r +``` r bcdc_get_record("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") #> B.C. Data Catalogue Record: Local and Regional Greenspaces #> Name: local-and-regional-greenspaces (ID: 6a2fea1b-0cc4-4fc2-8017-eaf755d516da) @@ -191,68 +191,68 @@ bcdc_get_record("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") Again we recognize this is [WFS-enabled](https://en.wikipedia.org/wiki/Web_Feature_Service) geospatial data, which means we can make use of `bcdc_query_geodata`. -```r +``` r bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") #> Querying 'local-and-regional-greenspaces' record -#> • Using collect() on this object will return 8555 features and 19 fields +#> • Using collect() on this object will return 9143 features and 19 fields #> • At most six rows of the record are printed here #> ──────────────────────────────────────────────────────────────────────────────────────────────────── #> Simple feature collection with 6 features and 19 fields #> Geometry type: MULTIPOLYGON #> Dimension: XY -#> Bounding box: xmin: 1228935 ymin: 455032.1 xmax: 1236528 ymax: 471352 +#> Bounding box: xmin: 1205812 ymin: 461894.2 xmax: 1210343 ymax: 463217.4 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 6 × 20 -#> id LOCAL…¹ PARK_…² PARK_…³ PARK_…⁴ REGIO…⁵ MUNIC…⁶ CIVIC…⁷ CIVIC…⁸ STREE…⁹ LATIT…˟ LONGI…˟ -#> -#> 1 WHSE_BASE… 30 Blumse… Local Park Metro … Surrey 3536 Rosema… 49.1 -123. -#> 2 WHSE_BASE… 31 Bob Ru… Local Park Metro … Surrey 5448 148 St 49.1 -123. -#> 3 WHSE_BASE… 32 Bog Pa… Local Park Metro … Surrey 9740 130 St 49.2 -123. 
-#> 4 WHSE_BASE… 33 Bonacc… Local Park Metro … Surrey 14962 98 Ave 49.2 -123. -#> 5 WHSE_BASE… 34 Cotton… Local Playgr… Metro … Surrey 9356 132A St 49.2 -123. -#> 6 WHSE_BASE… 35 North … Local Green … Metro … Surrey 11260 164 St 49.2 -123. -#> # … with 8 more variables: WHEN_UPDATED , WEBSITE_URL , LICENCE_COMMENTS , -#> # FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , SE_ANNO_CAD_DATA , -#> # geometry , and abbreviated variable names ¹​LOCAL_REG_GREENSPACE_ID, -#> # ²​PARK_NAME, ³​PARK_TYPE, ⁴​PARK_PRIMARY_USE, ⁵​REGIONAL_DISTRICT, ⁶​MUNICIPALITY, ⁷​CIVIC_NUMBER, -#> # ⁸​CIVIC_NUMBER_SUFFIX, ⁹​STREET_NAME, ˟​LATITUDE, ˟​LONGITUDE +#> id LOCAL_REG_GREENSPACE…¹ PARK_NAME PARK_TYPE PARK_PRIMARY_USE REGIONAL_DISTRICT MUNICIPALITY +#> +#> 1 WHSE_B… 40 Wowk Nei… Local Park Metro Vancouver Richmond +#> 2 WHSE_B… 41 Hugh Boy… Local Park Metro Vancouver Richmond +#> 3 WHSE_B… 42 Sandifor… Local Park Metro Vancouver Richmond +#> 4 WHSE_B… 43 Kozier N… Local Park Metro Vancouver Richmond +#> 5 WHSE_B… 44 Maple La… Local Park Metro Vancouver Richmond +#> 6 WHSE_B… 45 South Ar… Local Park Metro Vancouver Richmond +#> # ℹ abbreviated name: ¹​LOCAL_REG_GREENSPACE_ID +#> # ℹ 13 more variables: CIVIC_NUMBER , CIVIC_NUMBER_SUFFIX , STREET_NAME , +#> # LATITUDE , LONGITUDE , WHEN_UPDATED , WEBSITE_URL , +#> # LICENCE_COMMENTS , FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , +#> # SE_ANNO_CAD_DATA , geometry ``` Since we are interested in only "Park" data we can subset our query: -```r +``` r bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") %>% filter(PARK_PRIMARY_USE == "Park") #> Querying 'local-and-regional-greenspaces' record -#> • Using collect() on this object will return 4251 features and 19 fields +#> • Using collect() on this object will return 4517 features and 19 fields #> • At most six rows of the record are printed here #> ──────────────────────────────────────────────────────────────────────────────────────────────────── #> Simple feature collection with 6 
features and 19 fields #> Geometry type: MULTIPOLYGON #> Dimension: XY -#> Bounding box: xmin: 1228935 ymin: 455032.1 xmax: 1238850 ymax: 468825.5 +#> Bounding box: xmin: 1205812 ymin: 461894.2 xmax: 1210343 ymax: 463217.4 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 6 × 20 -#> id LOCAL…¹ PARK_…² PARK_…³ PARK_…⁴ REGIO…⁵ MUNIC…⁶ CIVIC…⁷ CIVIC…⁸ STREE…⁹ LATIT…˟ LONGI…˟ -#> -#> 1 WHSE_BASE… 30 Blumse… Local Park Metro … Surrey 3536 Rosema… 49.1 -123. -#> 2 WHSE_BASE… 31 Bob Ru… Local Park Metro … Surrey 5448 148 St 49.1 -123. -#> 3 WHSE_BASE… 32 Bog Pa… Local Park Metro … Surrey 9740 130 St 49.2 -123. -#> 4 WHSE_BASE… 33 Bonacc… Local Park Metro … Surrey 14962 98 Ave 49.2 -123. -#> 5 WHSE_BASE… 37 Freedo… Local Park Metro … Surrey 15452 84 Ave 49.2 -123. -#> 6 WHSE_BASE… 46 Barnst… Local Park Metro … Surrey 9998 Lyncea… 49.2 -123. -#> # … with 8 more variables: WHEN_UPDATED , WEBSITE_URL , LICENCE_COMMENTS , -#> # FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , SE_ANNO_CAD_DATA , -#> # geometry , and abbreviated variable names ¹​LOCAL_REG_GREENSPACE_ID, -#> # ²​PARK_NAME, ³​PARK_TYPE, ⁴​PARK_PRIMARY_USE, ⁵​REGIONAL_DISTRICT, ⁶​MUNICIPALITY, ⁷​CIVIC_NUMBER, -#> # ⁸​CIVIC_NUMBER_SUFFIX, ⁹​STREET_NAME, ˟​LATITUDE, ˟​LONGITUDE +#> id LOCAL_REG_GREENSPACE…¹ PARK_NAME PARK_TYPE PARK_PRIMARY_USE REGIONAL_DISTRICT MUNICIPALITY +#> +#> 1 WHSE_B… 40 Wowk Nei… Local Park Metro Vancouver Richmond +#> 2 WHSE_B… 41 Hugh Boy… Local Park Metro Vancouver Richmond +#> 3 WHSE_B… 42 Sandifor… Local Park Metro Vancouver Richmond +#> 4 WHSE_B… 43 Kozier N… Local Park Metro Vancouver Richmond +#> 5 WHSE_B… 44 Maple La… Local Park Metro Vancouver Richmond +#> 6 WHSE_B… 45 South Ar… Local Park Metro Vancouver Richmond +#> # ℹ abbreviated name: ¹​LOCAL_REG_GREENSPACE_ID +#> # ℹ 13 more variables: CIVIC_NUMBER , CIVIC_NUMBER_SUFFIX , STREET_NAME , +#> # LATITUDE , LONGITUDE , WHEN_UPDATED , WEBSITE_URL , +#> # LICENCE_COMMENTS , FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , +#> # 
SE_ANNO_CAD_DATA , geometry ``` Here we see that this greatly reduces the number of features that we are dealing with (and correspondingly the amount of data that needs to be transferred over the web). Remember also that we still have not actually requested the full data set. This is still just a preview. Also this query still includes all municipal parks in BC while we only want the ones in the three school districts - the polygons defined by the `districts` object. To find that subset of parks we can make use of the built-in geometric operators, which allow us to perform spatial operations remotely, fine-tuning our query even further. Here using the `INTERSECTS` function is appropriate and since this is the last tuning step, we can call `collect` and assign a name to this object. These requests can sometimes take quite a long time: -```r +``` r districts_parks <- bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") %>% filter(PARK_PRIMARY_USE == "Park") %>% filter(INTERSECTS(districts)) %>% @@ -267,12 +267,15 @@ districts_parks <- bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") %> Plotting both the filtered parks data and the district polygons reveals an important consideration when using `bcdata`: +
[Figure: plot of chunk district_parks]
In this example, many parks not contained within the school districts are included in the `districts_parks` object. This is because rather than a full intersection, `bcdata` draws a bounding box around all the polygons that are doing the intersection (in this case `districts`) and does the intersection based on that bounding box. This behaviour is imposed by the Web Feature Server but controlled via the `bcdata.max_geom_pred_size` option (see `?bcdc_options` for default values). Using this example, you can check to see if the size of the `districts` object exceeded the current value of `bcdata.max_geom_pred_size`: -```r +``` r bcdc_check_geom_size(districts) #> The object is too large to perform exact spatial operations using bcdata. #> Object size: 948576 bytes @@ -283,21 +286,27 @@ bcdc_check_geom_size(districts) Drawing the bounding box illustrates this point: +
[Figure: plot of chunk bbox]
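The over-selection can be reproduced locally without querying the catalogue. The following is a minimal sketch on made-up geometries (`toy_districts`, `toy_parks` and the helper `square()` are hypothetical names, not part of `bcdata` or the vignette's data). It compares an intersection against the bounding box with an intersection against the polygons themselves:

``` r
library(sf)

# Hypothetical helper: a square polygon with its lower-left corner at (x, y)
square <- function(x, y, size = 1) {
  st_polygon(list(rbind(
    c(x, y), c(x + size, y), c(x + size, y + size), c(x, y + size), c(x, y)
  )))
}

# Two small, widely separated "districts" (invented data, BC Albers CRS)
toy_districts <- st_sf(
  district = c("A", "B"),
  geometry = st_sfc(square(0, 0), square(9, 9), crs = 3005)
)

# Three point "parks": one in each district and one in the gap between them
toy_parks <- st_sf(
  park = c("in_A", "in_B", "between"),
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(9.5, 9.5)), st_point(c(5, 5)),
    crs = 3005
  )
)

# The bounding box around all districts, converted to a polygon
toy_bbox <- st_as_sfc(st_bbox(toy_districts))

# Which parks intersect the bounding box, and which intersect a district?
in_bbox <- lengths(st_intersects(toy_parks, toy_bbox)) > 0
in_districts <- lengths(st_intersects(toy_parks, toy_districts)) > 0

toy_parks$park[in_bbox]
toy_parks$park[in_districts]
```

All three points fall inside the bounding box, but only two fall inside a district polygon, which is the same pattern seen in the plot above.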
We are left with two options to get around this problem. First, we can simply do some additional processing with the `sf` package. Specifically we can use a spatial join to assign parks into their respective district: -```r +``` r districts_parks_join <- districts_parks %>% st_join(districts, left = FALSE) ``` +
[Figure: plot of chunk dp_join]
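To see what `left = FALSE` does in that join, here is a small sketch on invented data (the objects and the `district` and `park` columns are illustrative assumptions, not catalogue fields). `st_join()` matches features with `st_intersects` by default, and `left = FALSE` turns it into an inner join, so features that match no district are dropped:

``` r
library(sf)

# Invented data for illustration only: two circular "districts" ...
toy_districts <- st_sf(
  district = c("North", "South"),
  geometry = st_buffer(st_sfc(st_point(c(0, 10)), st_point(c(0, 0))), 2)
)

# ... and three point "parks", the last of which lies outside both districts
toy_parks <- st_sf(
  park = c("P1", "P2", "P3"),
  geometry = st_sfc(st_point(c(0, 10)), st_point(c(0, 0)), st_point(c(0, 5)))
)

# Inner spatial join: keeps only parks that intersect a district and
# attaches the attributes of the matching district
toy_join <- st_join(toy_parks, toy_districts, left = FALSE)

toy_join$park
toy_join$district
```

With the default `left = TRUE`, the unmatched park would instead be kept with `NA` in the `district` column.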
A second approach is to set an internal option (`bcdata.max_geom_pred_size`) and raise the threshold at which a bounding box is drawn. Options are set in R like this: -```r +``` r options("bcdata.max_geom_pred_size" = {object size in bytes}) ``` @@ -306,7 +315,7 @@ The value of `bcdata.max_geom_pred_size` is set conservatively so that requests Finally, to address our original question of which school district has the most municipal park space we can calculate the area of each park polygon and then sum those areas by school district: -```r +``` r districts_parks_join %>% mutate(area = st_area(geometry)) %>% st_set_geometry(NULL) %>% @@ -324,7 +333,7 @@ districts_parks_join %>% Suppose we now want to find all of the parks within 1 km of the school districts we are interested in. We can use `sf::st_buffer()` to make a buffer around the `districts` object, then intersect that with the greenspaces data. Note that `st_buffer()` needs to be executed in R on our computer, to create the buffered area that is sent to the WFS server to perform the `INTERSECTS` query remotely. We tell `filter()` to evaluate that piece of code locally by wrapping it in a `local()` call: -```r +``` r greenspaces_around <- bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") %>% filter(INTERSECTS(local(st_buffer(districts, 1000)))) %>% collect() @@ -334,7 +343,7 @@ greenspaces_around <- bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") There are a couple of other functions in `bcdata` that are useful to know when working with spatial data from the catalogue. 
`bcdc_describe_feature` gives the column names, whether the column is selectable, and the column types in both R and on the remote server: -```r +``` r bcdc_describe_feature("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") #> # A tibble: 20 × 5 #> col_name sticky remote_col_type local_col_type column_comments @@ -366,7 +375,7 @@ This is a helpful initial step to learn column names and types when you construc Another useful function is `show_query()` which provides information on the request issued to the remote server: -```r +``` r bcdc_query_geodata("6a2fea1b-0cc4-4fc2-8017-eaf755d516da") %>% filter(PARK_PRIMARY_USE == "Park") %>% filter(INTERSECTS(districts)) %>% @@ -386,7 +395,7 @@ This output is what being created by the dplyr code outlined above. ## Using B.C. Geographic Warehouse (BCGW) layer names -If you are familiar with the [B.C. Geographic Warehouse (BCGW)](https://www2.gov.bc.ca/gov/content/data/geographic-data-services/bc-spatial-data-infrastructure/bc-geographic-warehouse), +If you are familiar with the [B.C. Geographic Warehouse (BCGW)](https://www2.gov.bc.ca/gov/content/data/finding-and-sharing/bc-geographic-warehouse), you may already know the name of a layer that you want from the BCGW. `bcdc_query_geodata()` (as well as all other related functions) supports supplying that name directly. 
For example, the @@ -395,7 +404,7 @@ shows that the object name is `WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW`, and we can use that in `bcdc_query_geodata()`: -```r +``` r # Look at the columns available: bcdc_describe_feature("WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW") #> # A tibble: 42 × 5 @@ -411,7 +420,7 @@ bcdc_describe_feature("WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW") #> 8 AIRPORT_NAME TRUE xsd:string character AIRPORT_NAME is a business n… #> 9 DESCRIPTION FALSE xsd:string character DESCRIPTION describes the Oc… #> 10 PHYSICAL_ADDRESS FALSE xsd:string character PHYSICAL_ADDRESS contains th… -#> # … with 32 more rows +#> # ℹ 32 more rows # Query the data with bcdc_query_geodata and filter + select: bcdc_query_geodata("WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW") %>% @@ -427,16 +436,16 @@ bcdc_query_geodata("WHSE_IMAGERY_AND_BASE_MAPS.GSR_AIRPORTS_SVW") %>% #> Bounding box: xmin: 833323.9 ymin: 406886.6 xmax: 1266385 ymax: 1054950 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 6 × 12 -#> id CUSTO…¹ BUSIN…² BUSIN…³ OCCUP…⁴ SOURC…⁵ SUPPL…⁶ AIRPO…⁷ LOCAL…⁸ NUMBE…⁹ SEQUE…˟ -#> -#> 1 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 455 N Terrac… Terrace 2 578 -#> 2 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 464 N Victor… North … 3 734 -#> 3 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 482 N Nanaim… Nanaimo 1 854 -#> 4 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 483 N Tofino… Tofino 3 642 -#> 5 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 484 N Abbots… Abbots… 2 1036 -#> 6 WHSE_IMAGERY_AND_… "Minis… airTra… Air Tr… BC Air… 487 N Bounda… Delta 2 828 -#> # … with 1 more variable: geometry , and abbreviated variable names -#> # ¹​CUSTODIAN_ORG_DESCRIPTION, ²​BUSINESS_CATEGORY_CLASS, ³​BUSINESS_CATEGORY_DESCRIPTION, -#> # ⁴​OCCUPANT_TYPE_DESCRIPTION, ⁵​SOURCE_DATA_ID, ⁶​SUPPLIED_SOURCE_ID_IND, ⁷​AIRPORT_NAME, ⁸​LOCALITY, -#> # ⁹​NUMBER_OF_RUNWAYS, ˟​SEQUENCE_ID +#> id CUSTODIAN_ORG_DESCRI…¹ BUSINESS_CATEGORY_CL…² 
BUSINESS_CATEGORY_DE…³ OCCUPANT_TYPE_DESCRI…⁴ +#> +#> 1 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> 2 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> 3 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> 4 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> 5 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> 6 WHSE_… "Ministry of Forest, … airTransportation Air Transportation BC Airports +#> # ℹ abbreviated names: ¹​CUSTODIAN_ORG_DESCRIPTION, ²​BUSINESS_CATEGORY_CLASS, +#> # ³​BUSINESS_CATEGORY_DESCRIPTION, ⁴​OCCUPANT_TYPE_DESCRIPTION +#> # ℹ 7 more variables: SOURCE_DATA_ID , SUPPLIED_SOURCE_ID_IND , AIRPORT_NAME , +#> # LOCALITY , NUMBER_OF_RUNWAYS , SEQUENCE_ID , geometry ``` diff --git a/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd.orig b/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd.orig index 6587c578..942b11ea 100644 --- a/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd.orig +++ b/vignettes/efficiently-query-spatial-data-in-the-bc-data-catalogue.Rmd.orig @@ -220,7 +220,7 @@ This output is what being created by the dplyr code outlined above. ## Using B.C. Geographic Warehouse (BCGW) layer names -If you are familiar with the [B.C. Geographic Warehouse (BCGW)](https://www2.gov.bc.ca/gov/content/data/geographic-data-services/bc-spatial-data-infrastructure/bc-geographic-warehouse), +If you are familiar with the [B.C. Geographic Warehouse (BCGW)](https://www2.gov.bc.ca/gov/content/data/finding-and-sharing/bc-geographic-warehouse), you may already know the name of a layer that you want from the BCGW. `bcdc_query_geodata()` (as well as all other related functions) supports supplying that name directly. 
For example, the diff --git a/vignettes/explore-silviculture-data-using-bcdata.Rmd b/vignettes/explore-silviculture-data-using-bcdata.Rmd index 607d4b5f..e7e50f7e 100644 --- a/vignettes/explore-silviculture-data-using-bcdata.Rmd +++ b/vignettes/explore-silviculture-data-using-bcdata.Rmd @@ -1,6 +1,6 @@ --- title: "Exploring Silviculture Data with bcdata" -date: "2022-10-28" +date: "2024-12-11" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Exploring Silviculture Data with bcdata} @@ -47,7 +47,7 @@ Let's use the `bcdata` package to search, retrieve and explore the RESULTS silvi To start, let's load the `bcdata` package. We will also load the `dplyr` and `ggplot2` packages to help us work with the data. You can learn more about the `dplyr` package [here](https://dplyr.tidyverse.org/) and the `ggplot2` package [here](https://ggplot2.tidyverse.org/): -```r +``` r library(bcdata) library(dplyr) library(ggplot2) @@ -59,7 +59,7 @@ library(ggplot2) We can gather the data we need to answer our question using the [RESULTS - silviculture forest cover dataset](https://catalogue.data.gov.bc.ca/dataset/258bb088-4113-47b1-b568-ce20bd64e3e3). First, let's take a look at the metadata record using `bcdc_get_record()`: -```r +``` r # Get the metadata using the human-readable record name bcdc_get_record("results-forest-cover-silviculture") #> B.C. 
Data Catalogue Record: RESULTS - Forest Cover Silviculture @@ -84,12 +84,12 @@ bcdc_get_record("results-forest-cover-silviculture") We see that this is a [Web Feature Service-enabled](https://en.wikipedia.org/wiki/Web_Feature_Service) geospatial data set--the list of data resources includes `WMS getCapabilities request`--so we can query and retrieve this geospatial data set using `bcdc_query_geodata()`: -```r +``` r # Query the data using the permanent ID of the record to guard against name changes bcdc_query_geodata("258bb088-4113-47b1-b568-ce20bd64e3e3") #> Querying 'results-forest-cover-silviculture' record -#> • Using collect() on this object will return 920718 features and 159 fields -#> • Accessing this record requires pagination and will make 921 separate requests to the WFS. +#> • Using collect() on this object will return 957189 features and 159 fields +#> • Accessing this record requires pagination and will make 96 separate requests to the WFS. #> • See ?bcdc_options #> • At most six rows of the record are printed here #> ──────────────────────────────────────────────────────────────────────────────────────────────────── @@ -99,21 +99,21 @@ bcdc_query_geodata("258bb088-4113-47b1-b568-ce20bd64e3e3") #> Bounding box: xmin: 1184167 ymin: 526455.5 xmax: 1801754 ymax: 1083545 #> Projected CRS: NAD83 / BC Albers #> # A tibble: 6 × 160 -#> id FORES…¹ STOCK…² OPENI…³ STAND…⁴ SILV_…⁵ SILV_…⁶ SILV_…⁷ SILV_…⁸ STOCK…⁹ STOCK…˟ STOCK…˟ -#> -#> 1 WHSE_FORE… 4177991 2243439 1724262 1 C 1.6 1.6 0 IMM ART -#> 2 WHSE_FORE… 3994007 NA 1248495 91 1.2 1.2 0 MAT NAT -#> 3 WHSE_FORE… 3994067 NA 1120935 86 0.6 0.6 0 MAT NAT -#> 4 WHSE_FORE… 3994009 1404558 1248495 A AA 22.1 22.1 0 IMM ART -#> 5 WHSE_FORE… 3994057 1211894 1120935 B 2B 16.9 16.9 0 IMM ART -#> 6 WHSE_FORE… 3994063 NA 1120935 82 0.1 0.1 0 MAT NAT -#> # … with 148 more variables: SILV_RESERVE_CODE , SILV_RESERVE_OBJECTIVE_CODE , +#> id FOREST_COVER_ID STOCKING_STANDARD_UN…¹ OPENING_ID STANDARDS_UNIT_ID 
SILV_POLYGON_NUMBER +#> +#> 1 WHSE_FORE… 4177991 2243439 1724262 1 C +#> 2 WHSE_FORE… 3994007 NA 1248495 91 +#> 3 WHSE_FORE… 3994067 NA 1120935 86 +#> 4 WHSE_FORE… 3994009 1404558 1248495 A AA +#> 5 WHSE_FORE… 3994057 1211894 1120935 B 2B +#> 6 WHSE_FORE… 3994063 NA 1120935 82 +#> # ℹ abbreviated name: ¹​STOCKING_STANDARD_UNIT_ID +#> # ℹ 154 more variables: SILV_POLYGON_AREA , SILV_POLYGON_NET_AREA , +#> # SILV_NON_MAPPED_AREA , STOCKING_STATUS_CODE , STOCKING_TYPE_CODE , +#> # STOCKING_CLASS_CODE , SILV_RESERVE_CODE , SILV_RESERVE_OBJECTIVE_CODE , #> # TREE_COVER_PATTERN_CODE , REENTRY_YEAR , REFERENCE_YEAR , SITE_INDEX , #> # SITE_INDEX_SOURCE_CODE , BGC_ZONE_CODE , BGC_SUBZONE_CODE , BGC_VARIANT , -#> # BGC_PHASE , BEC_SITE_SERIES , BEC_SITE_TYPE , BEC_SERAL , -#> # IS_SILV_IMPLIED_IND , FOREST_COVER_SILV_TYPE , S_FOREST_COVER_LAYER_ID , -#> # S_TOTAL_STEMS_PER_HA , S_TOTAL_WELL_SPACED_STEMS_HA , -#> # S_WELL_SPACED_STEMS_PER_HA , S_FREE_GROWING_STEMS_PER_HA , … +#> # BGC_PHASE , BEC_SITE_SERIES , BEC_SITE_TYPE , BEC_SERAL , … ``` This query shows that this data set has many features and over 150 fields. Each feature is a treatment unit within a harvested opening, and contains information on the leading five tree species that are present in each treatment unit, including stems per hectare, age, and height. 
@@ -128,7 +128,7 @@ To address our question, we need the treatment data (1) from the Prince George N First, we can use the `bcdata` package to download the spatial boundary for the Prince George Natural Resource District (`DPG` is its `ORG_UNIT`): -```r +``` r ## Create a spatial feature object named dpg dpg <- bcdc_query_geodata("natural-resource-nr-district") %>% filter(ORG_UNIT=="DPG") %>% # filter for Prince George Natural Resource District @@ -138,21 +138,24 @@ dpg <- bcdc_query_geodata("natural-resource-nr-district") %>% Let's plot this spatial object and double check we have what we need: -```r +``` r dpg %>% ggplot() + geom_sf() + theme_minimal() ``` +
[Figure: plot of chunk plot-dpg]
Now we have a spatial object that we can use as a bounding box to filter and download records in the RESULTS - silviculture layer from the Prince George Natural Resource District. We only need to download the treatments that have western larch planted. We can use the `bcdc_describe_feature()` helper function to examine the column names and types of the layer. In this case, we want to keep rows where the five `S_SPECIES_CODE_*` columns contain `"LW"`, the code for western larch. -```r +``` r # Make a vector of tree species we are interested in # (in this case only LW for western larch) spp_list = c("LW") @@ -177,15 +180,15 @@ trees_dpg <- Let's look at the dimensions of this now much more manageable data object we have downloaded from the B.C. Data Catalogue: -```r +``` r dim(trees_dpg) -#> [1] 212 160 +#> [1] 261 160 ``` We can see there are several treatment units planted with western larch, and we can make a quick map of these harvested openings for the Prince George Natural Resource District: -```r +``` r trees_dpg %>% ggplot() + geom_sf() + @@ -193,7 +196,10 @@ trees_dpg %>% theme_minimal() ``` +
[Figure: plot of chunk map-larch-plantations-dpg]
We can also create some quick descriptive summaries of the data, treating the geospatial attribute table as a data frame in R, and answer our original question---how much western larch has been planted in the Prince George Natural Resource District? @@ -202,7 +208,7 @@ We can also create some quick descriptive summaries of the data, treating the ge #### What is the size and age distribution of larch plantations in the Prince George Natural Resource District in the year 2020? -```r +``` r trees_dpg %>% mutate(age = 2020 - REFERENCE_YEAR + S_SPECIES_AGE_1) %>% #create a plantation age column ggplot() + #start a plot @@ -220,7 +226,10 @@ trees_dpg %>% theme(legend.position = "none") ``` +
[Figure: plot of chunk unnamed-chunk-1]
#### What is the Biogeoclimatic Ecosystem Classification (BEC) distribution of western larch plantations in the Prince George Natural Resource District? @@ -228,7 +237,7 @@ We can download [British Columbia biogeoclimatic (BEC) data](https://catalogue.d -```r +``` r library(sf) #load the sf package # Load the BEC data for Prince George Natural Resource District @@ -245,7 +254,7 @@ trees_bec_dpg <- trees_dpg %>% Now, we can summarize the area planted with western larch by biogeoclimatic unit: -```r +``` r trees_bec_dpg %>% group_by(MAP_LABEL) %>% # group polygons by biogeoclimatic unit summarise(Area = sum(FEATURE_AREA_SQM)/10000) %>% @@ -262,4 +271,7 @@ trees_bec_dpg %>% theme(legend.position = "none") ``` +
[Figure: plot of chunk unnamed-chunk-3]
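The division by 10000 in the summary above converts square metres to hectares. A minimal sketch with invented numbers (`toy_units` and its values are hypothetical; only the column names mirror the vignette's code):

``` r
library(dplyr)

# Invented treatment units: a biogeoclimatic label and an area in square metres
toy_units <- tibble(
  MAP_LABEL = c("SBSwk1", "SBSwk1", "ICHvk2"),
  FEATURE_AREA_SQM = c(25000, 15000, 50000)
)

# Sum the area per biogeoclimatic unit and convert to hectares
# (1 hectare = 10,000 square metres)
area_by_bec <- toy_units %>%
  group_by(MAP_LABEL) %>%
  summarise(Area = sum(FEATURE_AREA_SQM) / 10000)

area_by_bec
```

Here the two `SBSwk1` units add up to 4 hectares and the single `ICHvk2` unit is 5 hectares.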
diff --git a/vignettes/local-filter.Rmd b/vignettes/local-filter.Rmd index ef3f6605..bcb307f9 100644 --- a/vignettes/local-filter.Rmd +++ b/vignettes/local-filter.Rmd @@ -1,6 +1,6 @@ --- title: "Update to `filter()` behaviour in bcdata v0.4.0" -date: "2022-10-28" +date: "2024-12-11" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Update to `filter()` behaviour in bcdata v0.4.0} @@ -33,7 +33,7 @@ bounding box. -```r +``` r library(sf) library(bcdata) @@ -46,13 +46,14 @@ Previously, we could just do this, with `sf::st_bbox()` embedded in the call: -```r +``` r bcdc_query_geodata("local-and-regional-greenspaces") %>% filter(BBOX(st_bbox(two_points, crs = st_crs(two_points)))) ``` ``` -## Error: Unable to process query. Did you use a function that should be evaluated locally? If so, try wrapping it in 'local()'. +## Error: Error : Cannot translate a object to SQL. +## ℹ Do you want to force evaluation in R with (e.g.) `!!x` or `local(x)`? ``` However you must now use `local()` to force local evaluation of @@ -60,35 +61,35 @@ However you must now use `local()` to force local evaluation of into a query plan to be executed on the server: -```r +``` r bcdc_query_geodata("local-and-regional-greenspaces") %>% filter(BBOX(local(st_bbox(two_points, crs = st_crs(two_points))))) ``` ``` ## Querying 'local-and-regional-greenspaces' record -## • Using collect() on this object will return 1154 features and 19 fields +## • Using collect() on this object will return 1158 features and 19 fields ## • At most six rows of the record are printed here ## ──────────────────────────────────────────────────────────────────────────────────────────────────── ## Simple feature collection with 6 features and 19 fields ## Geometry type: POLYGON ## Dimension: XY -## Bounding box: xmin: 1200113 ymin: 385903.5 xmax: 1202608 ymax: 386561.8 +## Bounding box: xmin: 1200113 ymin: 385903.5 xmax: 1202130 ymax: 388026 ## Projected CRS: NAD83 / BC Albers ## # A tibble: 6 × 20 -## id LOCAL…¹ 
PARK_…² PARK_…³ PARK_…⁴ REGIO…⁵ MUNIC…⁶ CIVIC…⁷ CIVIC…⁸ STREE…⁹ LATIT…˟ LONGI…˟ -## -## 1 WHSE_BASE… 3347 Konuks… Local Green … Capital Distri… 48.5 -123. -## 2 WHSE_BASE… 3304 Local Trail Capital Distri… 48.5 -123. -## 3 WHSE_BASE… 3380 Local Water … Capital Distri… 48.5 -123. -## 4 WHSE_BASE… 3369 Local Water … Capital Distri… 48.5 -123. -## 5 WHSE_BASE… 3453 Local Water … Capital Distri… 48.5 -123. -## 6 WHSE_BASE… 3361 Local Trail Capital Distri… 48.5 -123. -## # … with 8 more variables: WHEN_UPDATED , WEBSITE_URL , LICENCE_COMMENTS , -## # FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , SE_ANNO_CAD_DATA , -## # geometry , and abbreviated variable names ¹​LOCAL_REG_GREENSPACE_ID, ²​PARK_NAME, -## # ³​PARK_TYPE, ⁴​PARK_PRIMARY_USE, ⁵​REGIONAL_DISTRICT, ⁶​MUNICIPALITY, ⁷​CIVIC_NUMBER, -## # ⁸​CIVIC_NUMBER_SUFFIX, ⁹​STREET_NAME, ˟​LATITUDE, ˟​LONGITUDE +## id LOCAL_REG_GREENSPACE…¹ PARK_NAME PARK_TYPE PARK_PRIMARY_USE REGIONAL_DISTRICT MUNICIPALITY +## +## 1 WHSE_B… 689 Cranford… Local Water Access Capital District of… +## 2 WHSE_B… 634 Local Water Access Capital District of… +## 3 WHSE_B… 725 Local Water Access Capital District of… +## 4 WHSE_B… 665 Konukson… Local Green Space Capital District of… +## 5 WHSE_B… 622 Local Trail Capital District of… +## 6 WHSE_B… 698 Local Water Access Capital District of… +## # ℹ abbreviated name: ¹​LOCAL_REG_GREENSPACE_ID +## # ℹ 13 more variables: CIVIC_NUMBER , CIVIC_NUMBER_SUFFIX , STREET_NAME , +## # LATITUDE , LONGITUDE , WHEN_UPDATED , WEBSITE_URL , +## # LICENCE_COMMENTS , FEATURE_AREA_SQM , FEATURE_LENGTH_M , OBJECTID , +## # SE_ANNO_CAD_DATA , geometry ``` There is another illustration in the ["querying spatial data vignette"](https://bcgov.github.io/bcdata/articles/efficiently-query-spatial-data-in-the-bc-data-catalogue.html#a-note-about-using-local-r-functions-in-constructing-filter-queries). 
diff --git a/vignettes/vignette-fig-air_zones-1.png b/vignettes/vignette-fig-air_zones-1.png index eaba6e0b..fdc87245 100644 Binary files a/vignettes/vignette-fig-air_zones-1.png and b/vignettes/vignette-fig-air_zones-1.png differ diff --git a/vignettes/vignette-fig-bbox-1.png b/vignettes/vignette-fig-bbox-1.png index f04e4f5d..60e0584b 100644 Binary files a/vignettes/vignette-fig-bbox-1.png and b/vignettes/vignette-fig-bbox-1.png differ diff --git a/vignettes/vignette-fig-district_parks-1.png b/vignettes/vignette-fig-district_parks-1.png index 464270fe..9836f88b 100644 Binary files a/vignettes/vignette-fig-district_parks-1.png and b/vignettes/vignette-fig-district_parks-1.png differ diff --git a/vignettes/vignette-fig-dp_join-1.png b/vignettes/vignette-fig-dp_join-1.png index df5135a4..cf1d0bc5 100644 Binary files a/vignettes/vignette-fig-dp_join-1.png and b/vignettes/vignette-fig-dp_join-1.png differ diff --git a/vignettes/vignette-fig-map-larch-plantations-dpg-1.png b/vignettes/vignette-fig-map-larch-plantations-dpg-1.png index 01c8482d..218b1e83 100644 Binary files a/vignettes/vignette-fig-map-larch-plantations-dpg-1.png and b/vignettes/vignette-fig-map-larch-plantations-dpg-1.png differ diff --git a/vignettes/vignette-fig-plot-dpg-1.png b/vignettes/vignette-fig-plot-dpg-1.png index 2d57998b..0d6ea7c0 100644 Binary files a/vignettes/vignette-fig-plot-dpg-1.png and b/vignettes/vignette-fig-plot-dpg-1.png differ diff --git a/vignettes/vignette-fig-regional_districts-1.png b/vignettes/vignette-fig-regional_districts-1.png index f81ad3ae..77f8d550 100644 Binary files a/vignettes/vignette-fig-regional_districts-1.png and b/vignettes/vignette-fig-regional_districts-1.png differ diff --git a/vignettes/vignette-fig-unnamed-chunk-1-1.png b/vignettes/vignette-fig-unnamed-chunk-1-1.png index f6ec2f7b..c228206b 100644 Binary files a/vignettes/vignette-fig-unnamed-chunk-1-1.png and b/vignettes/vignette-fig-unnamed-chunk-1-1.png differ diff --git 
a/vignettes/vignette-fig-unnamed-chunk-3-1.png b/vignettes/vignette-fig-unnamed-chunk-3-1.png index c63392fe..441b7f0d 100644 Binary files a/vignettes/vignette-fig-unnamed-chunk-3-1.png and b/vignettes/vignette-fig-unnamed-chunk-3-1.png differ