---
title: "Exercise 5: Simple Data Exploration <br><small>Geographic Information Systems 1 Lab</small></br>"
author: "GEOG 3150"
output:
  html_notebook:
    df_print: paged
    rows.print: 10
    theme: cosmo
    highlight: breezedark
    number_sections: yes
    toc: yes
    toc_float:
      collapsed: no
      smooth_scroll: yes
  pdf_document: default
  html_document:
    toc: yes
    df_print: paged
editor_options:
  chunk_output_type: inline
  mode: gfm
---

```{=html}
<style type="text/css">

h1.title {
  font-size: 40px;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}
h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 20px;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

.zoom {
  transform-origin: 40% 50% 0;
  transition: transform .2s;
  margin: 0 auto;
}
.zoom img{
	width:auto;
	height:auto;	
}
.zoom:hover {
  transform: scale(2);
}

th, td {padding: 5px;}

</style>
```
<hr></hr>

The purpose of this exercise is to help familiarize you with simple ways to explore attributes in various datasets. These skills will help you extract new datasets, connect to tabular data, and qualitatively compare different variables.

# The Introduction

With the unprecedented growth in middle Tennessee, the Montgomery County Commission, Stormwater Management, and Health Department are working with the Tennessee Department of Environment and Conservation on an initiative to assess the relationship of brownfield sites to our community and watersheds. Brownfields are locations in communities that pose risks to future land use and development as a result of previous land use practices, particularly commercial and industrial (check out more information on brownfields here: [https://www.osha.gov/brownfields/brownfields-qna](https://www.osha.gov/brownfields/brownfields-qna)). They often contain high levels of soil and water contamination, and in some cases pollutants can remain in the ecosystem for decades. Unfortunately, brownfields are often point source locations for ground and surface water contamination. 
The goal of the initiative is to determine if there are any spatial characteristics of these hazardous locations that have the potential to impact current and future residents of the area. The primary objectives of the initiative are to: a) examine the location of brownfields in the county, b) determine which watersheds would be primarily impacted, and c) ascertain if there is a relationship between brownfield sites and any particular demographics in the county. With these three objectives, the county partners may make data-informed decisions to best support and prioritize programs that keep our community and environment safe.

In this exercise you will:

-   Continue working with geoprocessing tools
-   Obtain and import data from external sources
-   Learn to link tabular data
-   Qualitatively compare datasets

Software-specific directions can be found for each step below. Please submit your answers to the questions and your final map by the due date.

## Step One: The Data

The datasets used in this exercise will be found on the [Exercise 5](https://github.com/chrismgentry/GIS1-Exercise-5) GitHub page, previous exercises such as [Exercise 2](https://github.com/chrismgentry/GIS1-Exercise-2) and [Exercise 3](https://github.com/chrismgentry/GIS1-Exercise-3), and also from the [Tennessee Geographic Information Council](http://www.tngis.org/data-collections.htm). **TN GIS** maintains a number of datasets in their collections that are useful for projects involving the state of Tennessee.

<details>
<summary><big>View Directions in <b> [ArcGIS Pro]{style="color:#ff4500"} </b></big></summary>

As with previous exercises you should begin by launching [ArcGIS Pro]{style="color:#ff4500"}, creating a new blank template, and creating a folder for this specific exercise. You should now see the typical starting screen that greeted you in all of the previous exercises. While some of the data for this exercise you may already have in previous exercise folders, you will start this lab by downloading a dataset from **TN GIS**. While they maintain a number of quality collections, you will specifically download the _statewide watershed coverage (12 digit Hydrologic Unit Code)_ for Tennessee. This information can be found at the following link: [http://www.tngis.org/water.htm](http://www.tngis.org/water.htm). On that page you will find the link for **"Download Watershed Coverage"**. Click the link, and using the download button <img src= "Images/google-drive-download.jpg" alt="Google Drive Download Button" width = "20" height = "20"> in the upper-right corner, save the _tn_wbd_ zip file to your project folder.

<p align="center"><img src= "Images/tngis-website.png" alt="TN GIS Water Data Download" style="width:100%"></p>

Once you have downloaded the file, navigate to the saved location to unzip the file. Within the unzipped folder you will find three additional folders titled:

- tn_8dig_huc
- tn_12dig_huc
- tn_250k_huc

These are watershed files at varying levels of detail. For hydrologic units, the code with the largest number of digits provides the largest-scale (most detailed) data. So for this exercise you will unzip the **tn_12dig_huc** dataset.

<p align="center"><img src= "Images/second-unzip.jpg" alt="tn_12dig_huc.zip extract" style="width:100%"></p>

Finally, with that final folder extracted you will find a folder titled _hydrologic_units_ that will contain a shapefile named **wbdhu12_a_tn.shp** that will be used in this exercise. This is the polygon file representing the 12 digit hydrologic unit codes for the entire state of Tennessee.

Next, you will need the _tornado_data_ file from Exercise 2 and the _census_tracts_ data from Exercise 3. You have a few options for obtaining this data. You can download the data again (but this time to the new project folder), you can navigate to the Exercise 2 and Exercise 3 project folders, respectively, on your computer and copy the zip files to the Exercise 5 project folder, or you can copy the data over using the catalog pane in [ArcGIS Pro]{style="color:#ff4500"}. While the first two options are relatively straightforward, it is important to learn how to navigate and use the catalog in ArcGIS.

On the _View_ tab, click the Catalog Pane button <img src= "Images/catalog-pane-button.jpg" alt="Catalog Pane Button" width = "14" height = "20"> to open the Catalog Pane on the right side of the screen. On the project tab, right-click on the folders option and click "Add Folder Connection". In the resulting window navigate to the folder you would like to connect to and single-click the folder to select it. You don't want to double-click into the folder. You should see the name of the folder appear at the bottom of the window and the OK button should be available.

<p align="center"><img src= "Images/catalog-pane-windows.png" alt="Accessing Catalog Pane" style="width:100%"></p>

Once you have connected to the additional folders you want to use in conjunction with this project you can navigate to them within the _Folders_ link in the Catalog Pane. While you could add data directly from the other folders, the best practice might be to copy the data from one project to another. If for example you plan to alter the data then using it directly from the previous folder would alter it there as well. This could cause future issues when returning to that project. For this exercise you can navigate to the Exercise 2 folder and copy the _tornado_data_ file and paste it in the Exercise 5 folder. This is the safest way to move data such as shapefiles or geodatabases. Because the various data types contain numerous individual files to make up a dataset, catalog will copy/move them all correctly. If you tried to move them using File Explorer and missed one of the files associated with that data it might not work appropriately. So for Exercise 5, you will need to copy the _tornado_data_ and _montco_tracts_ data from exercises two and three respectively.

<p align="center"><div class="zoom"><img src= "Images/catalog-copy-paste.png" alt="Copy and Paste Data in Catalog Pane" style="width:100%"></div></p>

Finally, you will need to download the **Brownfields** and **Demographics** data from the [Exercise 5, GitHub Data](https://github.com/chrismgentry/GIS1-Exercise-5/tree/main/Data) page. Save both in your Exercise 5 project folder and unzip the **brownfields.zip** file to access the dataset.

<big><b>Question No. 1</b></big><br>
Using **File Explorer**, examine the dataset contained in the _brownfields.zip_ file and answer the following question:
<blockquote>
What is the common name of the extracted files? How many are there? What are the various file extensions?
</blockquote>
<small>The Library of Congress has a great description of the various extensions [here](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml).</small>
</details>
<hr></hr>

<details>
<summary><big>View directions in <b> [QGIS]{style="color: #006400"} </b></big></summary>

As with previous exercises you should begin by launching [QGIS]{style="color: #006400"}, creating a new empty project, and creating a project folder for this specific exercise. You should now see the typical starting screen that greeted you in all of the previous exercises. While some of the data for this exercise you may already have in previous exercise folders, you will start this lab by downloading a dataset from **TN GIS**. While they maintain a number of quality collections, you will specifically download the _statewide watershed coverage (12 digit Hydrologic Unit Code)_ for Tennessee. This information can be found at the following link: [http://www.tngis.org/water.htm](http://www.tngis.org/water.htm). On that page you will find the link for **"Download Watershed Coverage"**. Click the link, and using the download button <img src= "Images/google-drive-download.jpg" alt="Google Drive Download Button" width = "20" height = "20"> in the upper-right corner, save the _tn_wbd_ zip file to your project folder.

<p align="center"><img src= "Images/tngis-website.png" alt="TN GIS Water Data Download" style="width:100%"></p>

Once you have downloaded the file, navigate to the saved location to unzip the file. Within the unzipped folder you will find three additional folders titled:

- tn_8dig_huc
- tn_12dig_huc
- tn_250k_huc

These are watershed files at varying levels of detail. For hydrologic units, the code with the largest number of digits provides the largest-scale (most detailed) data. So for this exercise you will unzip the **tn_12dig_huc** dataset.

<p align="center"><img src= "Images/second-unzip.jpg" alt="tn_12dig_huc.zip extract" style="width:100%"></p>

Finally, with that final folder extracted you will find a folder titled _hydrologic_units_ that will contain a shapefile named **wbdhu12_a_tn.shp** that will be used in this exercise. This is the polygon file representing the 12 digit hydrologic unit codes for the entire state of Tennessee.

Next, you will need the _tornado_data_ file from Exercise 2 and the _census_tracts_ data from Exercise 3. You have a few options for obtaining this data. You can download the data again (but this time to the new project folder), you can navigate to the Exercise 2 and Exercise 3 project folders, respectively, on your computer and copy the zip files to the Exercise 5 project folder, or you can copy the data over using the _browser window_ in [QGIS]{style="color: #006400"}. While the first two options are relatively straightforward, it is important to be confident navigating and using the browser in QGIS.

If you created a "favorites" folder you will most likely navigate within that location; however, if you haven't created a favorite folder you will search through your drives for the _tornado_data_ file from Exercise 2. Once you locate the file, right-click (or CTRL-click) on the file and select **Export Layer > To File...**. In the resulting window select _ESRI Shapefile_ as the "Format", for the "File name" click on the browse button <img src= "Images/qgis-file-location-button.jpg" alt="Browse Location Button" width = "20" height = "20"> and give it a file name and save it to your Exercise 5 project folder. If you check the "Add Saved File to Map" button and click OK the file will be added to your layers.

<p align="center"><div class="zoom"><img src= "Images/qgis-browser.png" alt="Export file from Browser" style="width:100%"></div></p>

Repeat this process for the _census_tracts_ data from Exercise 3. While you could add data directly from the other folders, the best practice might be to export the data from one project to another. If for example you plan to alter the data then using it directly from the previous folder would alter it there as well. This could cause future issues when returning to that project. With these two files added to your layers you only need to download the **Brownfields** and **Demographics** data from the [Exercise 5, GitHub Data](https://github.com/chrismgentry/GIS1-Exercise-5/tree/main/Data) page. Save both in your Exercise 5 project folder and unzip the **brownfields.zip** file to access the dataset.

<big><b>Question No. 1</b></big><br>
Using **File Explorer/Finder**, examine the dataset contained in the _brownfields.zip_ file and answer the following question:
<blockquote>
What is the common name of the extracted files? How many are there? What are the various file extensions?
</blockquote>
<small>The Library of Congress has a great description of the various extensions [here](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml).</small>

</details>
<hr></hr>

<details><summary><big>View directions in <b> [R]{style="color: #6495ED"} </b></big></summary>

Before you begin, you will need to open the [Ex5 Colab Notebook](https://github.com/chrismgentry/GIS1-Exercise-5/blob/main/GIS1_EX5.ipynb) and insert **tocolab** after _github_ in the URL to open in the _Colab Environment_. As you have seen before, [R]{style="color: #6495ED"} requires various packages to complete certain analyses. In this exercise you will be using a large number of packages including: **googledrive, tidyverse, ggsn, cowplot, maps, mapproj, raster, rgeos, rgdal, sp, sf, biscale**. Each of these packages also contains various dependencies so it will take a while to load. In previous exercises you installed and loaded packages individually. This requires two lines of code for each package. Therefore this exercise would begin with twenty-four lines to install and load the necessary packages. So in this exercise you will learn to install and load the packages in a three-line script. The first line lists the packages, the second line installs all packages, and the third line loads them. In later exercises you will learn to use a library management package to check for libraries on your computer, install them if necessary, and load the packages necessary for the project. For this exercise you will use the following script:

```{r load packages, message=FALSE, warning=FALSE, echo = TRUE}
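# 1) List the packages needed for this exercise
packages <- c('googledrive','tidyverse','ggsn','cowplot','maps','mapproj',
              'raster','rgeos','rgdal','sp','sf','biscale')
# 2) Install them all; install.packages() accepts a character vector
install.packages(packages)
# 3) Load each package; require() needs character.only = TRUE for string names
sapply(packages, require, character.only = TRUE)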
```
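If you rerun the notebook, reinstalling every package each time can be slow. As a minimal sketch, assuming you only want to install what is missing, the base R approach below compares the package list against `installed.packages()`. The helper names `missing_packages` and `install_and_load` are hypothetical and are not part of the exercise script.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Hypothetical helper: install anything missing, then load every package.
install_and_load <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(sapply(pkgs, require, character.only = TRUE))
}

# Base packages ship with R, so nothing is reported as missing here.
missing_packages(c('stats', 'utils'))
```

Calling `install_and_load(packages)` could then stand in for the install and load lines of the script above.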

As with [Exercise 3](https://chrismgentry.github.io/GIS1-Exercise-3/) the `tigris` package needs to be loaded separately from other packages with the following script:

```{r load tigris, message=FALSE, warning=FALSE, echo=TRUE}
devtools::install_github('walkerke/tigris')
library('tigris')
```

The datasets needed for this exercise include: census tracts from [Exercise 3](https://github.com/chrismgentry/GIS1-Exercise-3/tree/main/Data), watersheds data from [TN GIS](http://www.tngis.org/water.htm), brownfields information and demographics data from this [exercise](https://github.com/chrismgentry/GIS1-Exercise-5/tree/main/Data). As with previous exercises all of the data for this lab will be able to be downloaded directly from either a [GitHub Page](https://github.com/chrismgentry/GIS1-Exercise-5) or from a public website. The data from the TN GIS is stored in a Google Drive folder. Most cloud storage platforms have unique data structures that require more detailed download information than a simple \*.csv being stored on a webpage. So to download a file from Google Drive you will use the `googledrive` package that was installed in the list of packages above.

To avoid connecting your own Google credentials you will begin by using the `drive_deauth` function which suspends authorization credentials. Depending on your use of Google Drive within R you may need to provide your login credentials or an access token. If you navigate to [http://www.tngis.org/water.htm](http://www.tngis.org/water.htm) you will find the link for **"Download Watershed Coverage"**.

<p align="center"><img src= "Images/tngis-website.png" alt="TN GIS Water Data Download" style="width:100%"></p>

By clicking the link, a new window will open with a Google Drive download page. On this screen you can locate the _file ID_ within the URL.

<p align="center"><img src= "Images/google-drive-link.png" alt="Google Drive Download Link" style="width:100%"></p>

This ID will be used in the `drive_download` function to obtain the file.

```{r google drive download, message=FALSE, warning=FALSE, echo=TRUE}
drive_deauth()
drive_download(as_id("0B9UIdGiB_LXOeVVQNm91bGpvUUE"), overwrite = TRUE)
```
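If you would rather pull the file ID out of a copied link than read it by eye, a regular expression can do it. This is only a sketch: the URL below assumes the common `.../file/d/<ID>/view` link shape, which may differ from the link you see, and `drive_url` and `file_id` are illustrative names.

```r
# Assumed link shape; only the ID itself comes from the exercise.
drive_url <- "https://drive.google.com/file/d/0B9UIdGiB_LXOeVVQNm91bGpvUUE/view"

# Keep the text between "/d/" and the next "/".
file_id <- sub(".*/d/([^/]+).*", "\\1", drive_url)
print(file_id)
```

The resulting string is what would be passed to `as_id()` in the download call above.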

On the Google Colaboratory page you will see a folder button <img src= "Images/google-colab-files-button.jpg" alt="Google Colab Files Button" width="20" height="20"> on the left that opens a new pane on that side of the screen. The "directory" for this location is `/content/` which can be directly accessed within Colab.

<p align="center"><img src= "Images/colab-files-tab.png" alt="Colab Files Tab" style="width:100%"></p>

The script above downloaded the **tn_wbd.zip** file that is now located in the content folder. You can use the `unzip` function to extract the necessary data.

```{r unzip 1, message=FALSE, warning=FALSE, echo=TRUE}
unzip('tn_wbd.zip')
```

In the folders pane you can see three additional files were extracted. These are watershed files at varying levels of detail. For hydrologic units the file option with the largest number of digits (e.g. 8-digit HUC vs 12-digit HUC) provides the largest-scale (most detailed) data. So for this exercise you will unzip the **tn_12dig_huc.zip** dataset. In order to help organize the data, you will add an _exdir =_ argument to the script to create a new folder for the data within the content folder. Because this exercise is specific to Colab the scripts below will differ if you are using a different IDE for R.

```{r unzip 2, message=FALSE, warning=FALSE, echo=TRUE}
unzip('tn_12dig_huc.zip', exdir = "/content/watersheds")
```

Now you can open the watersheds folder and view the contents. Due to the file structure of the zip file, the uncompressed data now contains the characters **hydrologic_units\\** in front of the file name. [R]{style="color: #6495ED"} will not permit these characters within a file or object name; therefore, you need to rename the data before you can continue.

<p align="center"><div class="zoom"><img src= "Images/colab-file-names.png" alt="Inaccessible File Names" style="width:100%"></div></p>

To start this process you will create a list of files in the _watersheds_ folder that contain the "hydrologic_units" prefix. Since we need to consistently remove the first seventeen characters, you can use the `sub` function to remove those characters. Alternatively, if you just needed to rename some files, you could add replacement characters between the second pair of apostrophes.

```{r list file names, message=FALSE, warning=FALSE, echo=TRUE}
names <- list.files(path = "/content/watersheds", pattern = "hydrologic_units")
new_names <- sub('.................','',names)
```
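Counting seventeen dots is easy to get wrong, so here is a hedged sketch of two equivalent ways to strip the prefix. The file names below are illustrative stand-ins (only _wbdhu12_a_tn.shp_ appears in this exercise), and `old_names`, `by_count`, and `by_prefix` are hypothetical object names.

```r
# Illustrative names; "\\" in an R string is a single backslash character.
old_names <- c("hydrologic_units\\wbdhu12_a_tn.shp",
               "hydrologic_units\\wbdhu12_a_tn.dbf")

# Option 1: drop the first 17 characters with a counted regular expression.
by_count <- sub("^.{17}", "", old_names)

# Option 2: drop the literal prefix; fixed = TRUE turns off regex matching.
by_prefix <- sub("hydrologic_units\\", "", old_names, fixed = TRUE)

print(by_count)
print(identical(by_count, by_prefix))
```

Matching the literal prefix has the advantage that a file without the prefix is left unchanged rather than losing its first seventeen characters.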

Because `list.files` returns only the file names by default, and `file.rename` resolves those names relative to the current working directory, you need to temporarily set the working directory to the folder containing the inappropriately named files, rename them, and then return to the original working directory. An alternative is to pass full paths to `file.rename` (for example, paths built with `file.path`), which avoids changing the working directory at all.

```{r rename files, message=FALSE, warning=FALSE, echo=TRUE}
setwd("/content/watersheds")
file.rename(names,new_names)
setwd("/content")
```
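The same rename can also be done without changing the working directory by giving `file.rename` full paths. The sketch below is a self-contained demonstration in a temporary folder, not the exercise data: `demo_dir` and the file _demo.shp_ are hypothetical, and the backslash in the file name assumes a Linux-style file system such as the one Colab uses.

```r
# Build a throwaway folder with one badly named file (hypothetical example).
demo_dir <- file.path(tempdir(), "watersheds_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, "hydrologic_units\\demo.shp"))

# list.files() returns bare names, so the folder is pasted back on with file.path().
old_names <- list.files(demo_dir, pattern = "hydrologic_units")
new_names <- sub("hydrologic_units\\", "", old_names, fixed = TRUE)
renamed <- file.rename(file.path(demo_dir, old_names),
                       file.path(demo_dir, new_names))

print(list.files(demo_dir))
```

For the exercise data, `demo_dir` would correspond to `/content/watersheds`.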

If you navigate back to the files pane you can now see the watershed files have been renamed. To return to the content folder you can use the _up directory_ button <img src= "Images/colab-files-up-button.jpg" alt="Google Colab Up Directory Button" width="20" height="20">.

<p align="center"><div class="zoom"><img src= "Images/colab-new-names.png" alt="Renamed File" style="width:100%"></div></p>

Now you can use the `readOGR` function from `rgdal` to read in the shapefile to a new object.

```{r read in watersheds, message=FALSE, warning=FALSE, echo=TRUE}
watersheds_data <- readOGR("/content/watersheds/wbdhu12_a_tn.shp")
```

With the watersheds dataset created you need to address the remaining datasets. Using similar steps to previous exercises you will now download and create a brownfields dataset. You will also create a new folder for this data just like in the watershed script above.

```{r read in brownfields, message=FALSE, warning=FALSE, echo=TRUE}
download.file('https://github.com/chrismgentry/GIS1-Exercise-5/raw/main/Data/brownfields.zip', 'brownfields.zip')
unzip('brownfields.zip', exdir = "/content/brownfields")
brownfields_data <- readOGR("/content/brownfields/brownfields.shp")
```

The next dataset to import will be the [demographics](https://github.com/chrismgentry/GIS1-Exercise-5/blob/main/Data/demographics.csv) dataset from the exercise GitHub page. Because the data is a simple \*.csv file it can easily be read in with the `read.csv` function.

```{r read in demographics, message=FALSE, warning=FALSE, echo=TRUE}
demographics <- read.csv('https://raw.githubusercontent.com/chrismgentry/GIS1-Exercise-5/main/Data/demographics.csv')
```
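Tract identifiers look like numbers, so `read.csv` will treat them as numeric unless told otherwise, which can silently change values that are later used to match tracts. The sketch below uses made-up rows to show the difference; only the _Tract_ column name comes from the exercise, while the _Population_ column and all values are hypothetical.

```r
# Made-up CSV content standing in for demographics.csv.
csv_text <- "Tract,Population\n1001,2500\n1013.10,4100"

# Default behaviour: Tract is parsed as a number and the trailing zero is lost.
as_number <- read.csv(text = csv_text)

# Forcing Tract to character keeps the identifier exactly as written.
as_text <- read.csv(text = csv_text, colClasses = c(Tract = "character"))

print(as_number$Tract)
print(as_text$Tract)
```

Whether the real file needs this depends on how its tract values are written, so it is worth checking with `str(demographics)` after import.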

The final dataset to import is the census tracts for Montgomery County. Using the `tigris` package like in [Exercise 3, Step 1](https://chrismgentry.github.io/GIS1-Exercise-3/#11_Step_One:_The_Data) you can use `tracts` to obtain the dataset.

```{r read in census tracts, message=FALSE, warning=FALSE, echo=TRUE}
montco_tracts <- tracts("TN", county = "Montgomery")
```

Three of these datasets (brownfields, watersheds, and census tracts) are already in spatial data formats. In order to perform further analysis you need to make sure they have the same coordinate reference system (crs).

```{r check crs, message=FALSE, warning=FALSE, echo=TRUE}
crs(watersheds_data)
crs(brownfields_data)
crs(montco_tracts)
```

You can see they are all in different projections. Therefore you need to reproject them under a single crs. For this exercise you will use **EPSG:4326** which in R appears as _+proj=longlat +datum=WGS84 +no_defs_ when referenced in the script. Because the brownfields data is already in this crs you can use it or the EPSG to correct the other datasets. Remember from _Exercise 3_ that data from `tigris` is in a slightly different spatial data structure ( _sf_ vs. _SpatialPolygonsDataFrame_) so the process to reproject that data will vary from the watersheds dataset.

```{r reproject data, message=FALSE, warning=FALSE, echo=TRUE}
watersheds_data <- spTransform(watersheds_data, crs(brownfields_data))
montco_tracts <- st_transform(montco_tracts, 4326)
```

You can now check the crs information for each dataset and they should all be in EPSG:4326 (or +proj=longlat +datum=WGS84 +no_defs). With the data created and reprojected to the same crs you can move on to the analysis.
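Rather than comparing printed crs strings by eye, you can ask R whether two coordinate reference systems match. The following is a minimal sketch using only the `sf` package and a single made-up point near Clarksville, TN; the objects `pt_wgs84`, `pt_utm`, and `pt_fixed` are hypothetical, and UTM zone 16N (EPSG:32616) is used simply as an example of a mismatched crs. It does not cover the _sp_ objects created by `readOGR`.

```r
if (requireNamespace("sf", quietly = TRUE)) {
  # Illustrative point stored in EPSG:4326 (longitude, latitude).
  pt_wgs84 <- sf::st_sfc(sf::st_point(c(-87.36, 36.53)), crs = 4326)

  # A projected copy standing in for a layer with a different crs.
  pt_utm <- sf::st_transform(pt_wgs84, 32616)
  before <- sf::st_crs(pt_utm) == sf::st_crs(pt_wgs84)

  # Reproject to the reference layer's crs and compare again.
  pt_fixed <- sf::st_transform(pt_utm, sf::st_crs(pt_wgs84))
  after <- sf::st_crs(pt_fixed) == sf::st_crs(pt_wgs84)

  print(c(before = before, after = after))
}
```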

<big><b>Question No. 1</b></big><br>
Using the files button on the left, examine the dataset contained in the brownfields.zip file and answer the following question:
<blockquote>
What is the common name of the extracted files? How many are there? What are the various file extensions?
</blockquote>
<small>The Library of Congress has a great description of the various extensions [here](https://www.loc.gov/preservation/digital/formats/fdd/fdd000280.shtml).</small>

</details>

## Step Two: The Analyses

The data collected in the previous section requires additional processing so you can reduce the dataset to only the pertinent information for the analyses. In this step you will use additional geoprocessing techniques and data management tools to link two datasets for further examination.
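Whichever software you use, the two core operations in this step are selecting records by an attribute value and joining a table to a layer through a shared key. As a software-neutral sketch of that logic, the base R example below works on small made-up tables. The field names _NAME_ and _Tract_ follow the exercise; every value, the _Population_ column, and the object names are hypothetical.

```r
# Select by attribute: keep only the record whose NAME is "Montgomery".
counties <- data.frame(NAME = c("Montgomery", "Cheatham", "Robertson"))
montgomery <- counties[counties$NAME == "Montgomery", , drop = FALSE]

# Join by key: tracts use NAME, the demographics table uses Tract.
tracts <- data.frame(NAME = c("1001", "1002", "1003"))
demo <- data.frame(Tract = c("1001", "1002"), Population = c(2500, 3100))

# all.x = TRUE keeps every tract even when no matching table row exists.
joined <- merge(tracts, demo, by.x = "NAME", by.y = "Tract", all.x = TRUE)
print(joined)
```

Both key columns are stored as text here; if one were numeric and the other text, the values would need to be converted to a common type before the join.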

<details>
<summary><big>View Directions in <b> [ArcGIS Pro]{style="color:#ff4500"} </b></big></summary>

With the data collected you can now add the _brownfields_, census tracts, _tornado_data_, and _wbdhu12_a_tn_ (watersheds) data to your project. Although there are a number of ways of isolating data to make derived datasets (e.g. Select > Lasso in [Exercise 4, Step 1](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data)), in this exercise you will use another tool from the **Geoprocessing** Toolbox to complete this task. On the _View Tab_ click on the Geoprocessing Toolbox button <img src= "Images/geoprocessing-button.jpg" alt="Geoprocessing Button" width = "20" height = "20"> to open the Geoprocessing pane on the right side of the screen. By navigating through the tools menus you will find **Select** under _Analysis Tools > Extract_. 
With this tool you will write a simple expression to "select" a small portion of the data you need for further analysis. To do this, double-click the _Select_ tool and in the resulting pane input the following parameters:

- Input Features = tornado_data\*
- Output Feature Class = Here you will insert the file name you want to use. Click on the folder icon to the right of the field and in the new window save the file as _montgomery_county.shp_ in your project folder.
- Expression = Click the new expression button <img src= "Images/new-expression-button.jpg" alt="Expression Button" width = "87" height = "20"> and use the drop-down boxes to provide the following information:
  - Where _NAME_ &nbsp; _is equal to_ &nbsp; _Montgomery_
  - Click Run

<p align="center"><div class="zoom"><img src= "Images/select-geoprocessing.png" alt="Select Geoprocessing Tool" style="width:100%"></div></p>
\*The tornado dataset is only being used to obtain a polygon for Montgomery County for the clip process in the next step. 

This will add the new **montgomery_county** shapefile to your contents. You can now remove the tornado dataset because it will no longer be needed. With the polygon of Montgomery County available you can now use the **Clip** tool like in [Exercise 4, Step One](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data) to clip the _brownfields_ and _watersheds_ datasets to reduce them to only those within Montgomery County. If you receive a "Datum conflict" warning, for the purposes of this exercise, you can ignore it and continue with the clip. Recall that the _Input Features_ is the data you want to reduce, the _Clip Feature_ is the data you want it to take the shape of, and _Output Feature Class_ is what you are naming the new file. Refer back to [Exercise 4, Step One](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data) for more information about Clip.

<p align="center"><div class="zoom"><img src= "Images/clipped-data.png" alt="Select Clipped Datasets" style="width:100%"></div></p>

With the new clipped datasets you can remove or just uncheck (in case you want to use them in your final map) the full _brownfields_ and _watersheds_ datasets to reduce clutter. You can also now zoom in closer to view only Montgomery County.

In the final step to prepare the data, you are going to connect non-spatial data to the census tract dataset. In [Step One](https://chrismgentry.github.io/GIS1-Exercise-5/#11_Step_One:_The_Data) you downloaded a file titled **demographics.csv**. This file contains comma-separated values detailing additional demographic data that you need to append to the census tract data. Although the process is relatively straightforward, there are a number of steps that need to be taken in order to join the data.

First, if you haven't already, add the _demographics.csv_ file to your table of contents. This can be done from the Catalog Pane or with the "Add Data" button <img src= "Images/arcgis-add-data-button.png" alt="Add Data Button" width = "16" height = "20"> like in previous exercises. Because [ArcGIS Pro]{style="color:#ff4500"} treats \*.csv files as "read only" you need to convert it to a table that can be edited in the software. Now, right-click on the _demographics.csv_ standalone table and go to **Data > Export Table**. In the resulting window choose the following options:

- Input Rows = demographics.csv
- Output Location = Similar to before, navigate only to your project folder and click once to select. In this step you are simply designating the folder where the file will be saved.
- Output Name = Give the file a new name such as demo_table.dbf

Before clicking OK, you need to expand the **Fields** section of the window and click on _Tract_ in the Output Fields column. Then click on the _Properties_ tab and change the **Type** field to _Text_. Then click OK. If you continued without changing the field type, the variable would most likely be treated as a numerical value. If you open the attribute table for any dataset and hover over a variable column without clicking, a pop-up window will appear detailing the _Type_ and other parameters of the variable. In the census dataset from the previous exercise, the _NAME_ variable is Type: Text (7). The seven in parentheses means the maximum number of available characters is seven. So before you export a table it is good practice to make sure the variables match the type of the variables you intend to join, or that the variables will be treated in a manner necessary for additional analyses.

<p align="center"><img src= "Images/arcgis-table-export.png" alt="Export CSV to DBF" style="width:90%"></p>

The new standalone table should have been added to the Table of Contents. If not you should add it now; the csv table can be removed. Now you can connect the new table to the census tract dataset. Begin by right-clicking on the census data and selecting **Joins and Relates > Add Join**. In the new _Add Join_ window select the following options (your file names may vary):

- Input Table = montco_tracts, or whatever you named the census tract information
- Input Join Field = NAME
- Join Table = demo_table, or whatever you named the new demographics data
- Join Table Field = Tract

For this exercise keep the "Keep All Target Features" box checked, and if you receive a warning about an indexing error with the census data you can ignore it for this exercise. Then click the _Validate Join_ button. This will open a new window that describes the process of checking the two datasets to see if they can be joined. At the bottom of the dialog you should see a line that says there were 39 joins. Close that message and click OK to run the join.

<p align="center"><div class="zoom"><img src= "Images/arcgis-join.png" alt="Join Datasets" style="width:100%"></div></p>

Finally, open the attribute table for the census tracts and scroll to the far right of the table. If the join worked properly you should see a number of additional fields added to the table.

<p align="center"><div class="zoom"><img src= "Images/arcgis-joined-table.png" alt="Join Datasets, Attribute Table" style="width:100%"></div></p>

This will provide all of the data and information you need to visualize the data and make comparisons of the watersheds.

<big><b>Question No. 2</b></big>
<blockquote>
How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest?
</blockquote>

</details>
<hr></hr>

<details>
<summary><big>View Directions in <b> [QGIS]{style="color:#006400"} </b></big></summary>

With the data collected you can now add the _brownfields_, census tracts, _tornado_data_, and _wbdhu12_a_tn_ (watersheds) data to your project. Although there are a number of ways of isolating data to make derived datasets (e.g. Select Features > Select Features by Freehand in [Exercise 4, Step 1](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data)), in this exercise you will use another tool from **Vector Selection** in the _Processing Toolbox_ to complete this task.

<p align="center"><div class="zoom"><img src= "Images/qgis-ex5-data.png" alt="Vector Selection" style="width:100%"></div></p>

With this tool you will select only a small portion of the data you need for further analysis. To do this, double-click the "Extract by Attribute" tool and in the resulting window input the following parameters (file names may vary):

- Input layer = tornado_data
- Selection attribute = NAME
- Operator = = (equals sign)
- Value = Montgomery

Remember that in [QGIS]{style="color:#006400"} you have the ability to either create permanent files or temporary layers. Because you will only be using the Montgomery County dataset to clip files later on, you can decide whether to use the browse button <img src= "Images/qgis-file-location-button.jpg" alt="Browse Location Button" width = "20" height = "20"> to save the file for future use or just create a temporary file.

<p align="center"><img src= "Images/qgis-extract-by-attribute.png" alt="Select by Attribute" style="width:85%"></p>

This will add the new **montgomery_county** temporary file (or shapefile if saved) to your layers. You can now remove the tornado dataset because it will no longer be needed. You may also consider renaming it if necessary. With the polygon of Montgomery County available you can now use the **Clip** tool like in [Exercise 4, Step One](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data) to clip the _brownfields_ and _watersheds_ datasets to reduce them to only those within Montgomery County. Recall that the _Input layer_ is the data you want to reduce (e.g. brownfields or watersheds) and the _Overlay layer_ is the data you want it to take the shape of. Refer back to [Exercise 4, Step One](https://chrismgentry.github.io/GIS1-Exercise-4/#11_Step_One:_The_Data) for more information about Clip. You should go ahead and use the browse button <img src= "Images/qgis-file-location-button.jpg" alt="Browse Location Button" width = "20" height = "20"> to save these as permanent files. Be sure to use a naming convention that will allow you to recall what the files are later on (e.g. montco_brownfields).

With the new clipped datasets you can remove or just uncheck (in case you want to use them in your final map) the full _brownfields_ and _watersheds_ datasets to reduce clutter. You can also now zoom in closer to view only Montgomery County.

<p align="center"><div class="zoom"><img src= "Images/qgis-montco-data.png" alt="Montgomery County Datasets" style="width:100%"></div></p>

In the final step to prepare the data, you are going to connect non-spatial data to the census tract dataset. In [Step One](https://chrismgentry.github.io/GIS1-Exercise-5/#11_Step_One:_The_Data) you downloaded a file titled **demographics.csv**. This file contains comma-separated values detailing additional demographic data that you need to append to the census tract data. Although the process is relatively straightforward, there are a number of steps that need to be taken in order to join the data.

First, you will add the **demographics.csv** using _Layer > Add Layer > Add Delimited Text Layer_ from the menu bar, by clicking the "Add Delimited Layer" button <img src= "Images/qgis-add-csv-button.jpg" alt="Add Delimited Layer" width = "20" height = "20">, or by using the shortcut keys CTRL+Shift+T/CMD+Shift+T.

In the resulting window, use the browse button to find the **demographics.csv** data in your project folder. Be sure that "No geometry (attribute only table)" is selected and leave the rest of the information as the default. Click Add.

<p align="center"><div class="zoom"><img src= "Images/qgis-delimited-data.png" alt="Text Delimited Dialog" style="width:100%"></div></p>

The _demographics_ table should now be added to your layers. To connect it to the population information, right/CTRL click on the census tract data and select "Properties". In the Properties menu, select the _Joins_ <img src= "Images/qgis-joins.jpg" alt="Joins Tab" width = "40" height = "20"> tab from the left side menu. At the bottom of the screen click the plus (**+**) symbol to add a join. In the new window select the following parameters:

- Join layer = demographics
- Join Field = Tract
- Target field = NAME

Leave the "Cache join layer in memory" checked and scroll down to "Joined Fields". Click all of the fields and scroll down to the "Custom field name prefix" option and check the box. Remove all of the text in the box and then click OK.

<p align="center"><img src= "Images/qgis-vector-join-dialog.png" alt="Vector Join Window" style="width:85%"></p>

Be sure to click OK on the properties window as well to complete the join. Finally, open the attribute table for the census tracts and scroll to the far right of the table. If the join worked properly you should see a number of additional fields added to the table.

<p align="center"><div class="zoom"><img src= "Images/qgis-join-table.png" alt="Join Table" style="width:100%"></div></p>

This will provide all of the data and information you need to visualize the data and make comparisons of the watersheds.

<big><b>Question No. 2</b></big>
<blockquote>
How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest?
</blockquote>

</details>
<hr></hr>

<details>
<summary><big>View Directions in <b> [R]{style="color:#6495ED"} </b></big></summary>

There are some additional processing steps needed on the collected datasets before you can move on to the visualizations. In [Exercise 3, Step 2](https://chrismgentry.github.io/GIS1-Exercise-3/#12_Step_Two:_The_Analyses) you used the `merge` function to combine the census tracts and population data. In this exercise you will repeat those steps to connect the _demographics_ dataset to the census tracts.

```{r add demographics data, message=FALSE, warning=FALSE, echo=TRUE}
census_tracts <- merge(x = montco_tracts, y = demographics, by.x = "NAME", by.y = "Tract", all = TRUE)
```
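
As a minimal sketch of what `merge` does with `by.x`, `by.y`, and `all = TRUE`, the example below uses two small, made-up data frames in place of the real tract and demographics data (all names and values are hypothetical). Rows without a partner in the other table are kept and filled with `NA`.

```r
# Hypothetical stand-ins for the tract attributes and the demographics table.
tracts <- data.frame(NAME = c("1001", "1002", "1003"),
                     area_km2 = c(12.5, 8.1, 20.3),
                     stringsAsFactors = FALSE)
demo <- data.frame(Tract = c("1001", "1003", "1004"),
                   tot_pop = c(4200, 3100, 5000),
                   stringsAsFactors = FALSE)

# all = TRUE keeps every row from both tables (a full outer join).
joined <- merge(x = tracts, y = demo, by.x = "NAME", by.y = "Tract", all = TRUE)
joined

# Number of keys found in both tables, a quick check before relying on a join.
n_matched <- sum(tracts$NAME %in% demo$Tract)
n_matched
```

Comparing `n_matched` with the number of rows you expect to join is a quick way to catch mismatched key fields before moving on.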

By plotting the _brownfields_ or _watersheds_ data using `ggplot2` or quickly using `plot(x)` where x = the name of the dataset, you will likely be able to determine that the data extends far beyond the boundaries of Montgomery County. In the previous [exercise](https://chrismgentry.github.io/GIS1-Exercise-4/#12_Step_Two:_The_Analyses) the `intersect` function was used to clip out the counties within the hurricane buffer. In this exercise you will use the `crop` function to see how it varies from `intersect`. For this step you will want to crop the brownfields and watersheds datasets by the census tracts to retain only those within the county.

```{r crop data, message=FALSE, warning=FALSE, echo=TRUE}
montco_brownfields <- crop(brownfields_data,census_tracts)
montco_watersheds <- crop(watersheds_data,census_tracts)
```

If you create a `ggplot()` of the data you will see how the datasets were subset. Essentially, a bounding box (extent of the dataset from the most upper-left portion to the lower-right) was created for the census tract data and any data within that box was retained. Why might this potentially be problematic in some instances? In what instance might this be the best method?
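
To make the bounding-box idea concrete, here is a small base R sketch that does not use any spatial packages. The L-shaped outline and the two sites are hypothetical, and the ray-casting function is a generic point-in-polygon test written for illustration, not the method used internally by `crop` or `intersect`.

```r
# Hypothetical L-shaped "county" outline (x, y vertices, not real coordinates).
poly_x <- c(0, 4, 4, 2, 2, 0)
poly_y <- c(0, 0, 2, 2, 4, 4)

# Two hypothetical sites: one inside the outline, one in the empty corner.
sites <- data.frame(name = c("site_a", "site_b"),
                    x = c(1, 3), y = c(3, 3))

# Bounding-box test: is the point between the min and max x and y values?
in_bbox <- sites$x >= min(poly_x) & sites$x <= max(poly_x) &
           sites$y >= min(poly_y) & sites$y <= max(poly_y)

# Ray-casting test: count how many polygon edges a horizontal ray crosses.
point_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    crosses <- (vy[i] > py) != (vy[j] > py)
    if (crosses) {
      x_at_py <- vx[i] + (py - vy[i]) * (vx[j] - vx[i]) / (vy[j] - vy[i])
      if (px < x_at_py) inside <- !inside
    }
    j <- i
  }
  inside
}

in_polygon <- mapply(point_in_polygon, sites$x, sites$y,
                     MoreArgs = list(vx = poly_x, vy = poly_y))

data.frame(sites, in_bbox, in_polygon)
```

Both sites pass the bounding-box test, but only `site_a` falls inside the outline itself. An irregularly shaped county can produce the same situation with real data.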

Before proceeding to the visualization portion, you need to create a couple of tables to help answer the questions asked by your community partners, one being how many brownfields occur within each watershed and census tract. To do this you can use the `over` function (defined in the `sp` package, with additional methods provided by `rgeos`) to overlay the two datasets. This is called a _spatial join_: for each point, it returns the attributes of the polygon that the point falls within, which can then be tallied into a count of points per polygon. More information on `over` can be found [here](https://www.rdocumentation.org/packages/rgeos/versions/0.5-5/topics/over).

```{r spatial join with watersheds, message=FALSE, warning=FALSE, echo=TRUE}
brownfields_per_watershed <- over(montco_brownfields,montco_watersheds)
watershed_table <- as.data.frame(table(brownfields_per_watershed$HUC_12))
watershed_table <- transform(watershed_table, Var1 = as.character(Var1))
colnames(watershed_table) <- c("HUC_12","BF_Count")
watershed_table
```

In the script above, `over` performed the spatial join. The next line tallied the results and converted them into a data frame with columns of variables and rows of observations. However, `as.data.frame(table())` stores the watershed names (HUC_12) as a factor. To keep the join key in the same type as the watershed dataset, where the numeric-looking name for each watershed is treated as a character (nominal data), the `transform` function was used to convert the names to characters. Finally, `colnames` was used to rename the columns to names that are more appropriate for the data.
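
The tally, convert, and rename steps described above can be tried on a small, made-up example without any spatial data. The watershed codes below are hypothetical and stand in for the output of the spatial join.

```r
# Hypothetical result of a spatial join: one watershed code per brownfield point.
joined_points <- data.frame(
  HUC_12 = c("051302050101", "051302050101", "051302060302", NA),
  stringsAsFactors = FALSE
)

# table() tallies how many points share each code (NA values are dropped).
counts <- as.data.frame(table(joined_points$HUC_12))
class(counts$Var1)   # the codes come back as a factor

# Convert the codes to character and give the columns clearer names.
counts <- transform(counts, Var1 = as.character(Var1))
colnames(counts) <- c("HUC_12", "BF_Count")
counts
```

Note that the point with a missing code is not counted, and watersheds with no points never appear in the table at all.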

Once the table has been created you will need to connect the table to the original data so the information can potentially be included in the visualization. Because some of the watersheds do not contain a brownfield, there will be NA values reported for those observations. So a line is added to the script below to replace the NA values with zeros:

```{r connect table to watersheds, message=FALSE, warning=FALSE, echo=TRUE}
montco_watershed_data <- merge(x = montco_watersheds, y = watershed_table, by.x = "HUC_12", by.y = "HUC_12", all = TRUE)
montco_watershed_data@data$BF_Count[is.na(montco_watershed_data@data$BF_Count)] <- 0
montco_watershed_data@data
```

While you have seen the `merge` function before, notice the syntax for the second line. Because the data is a SpatialPolygonsDataFrame, the attribute table is held in a _slot_ (a container for information in certain object classes) called **data**. So to examine the information you call the object and add _@data_ after the object name. In the example above, the script is identifying the **BF_Count** variable in the watersheds data slot and saying "if there _is_ an _NA_ value in that variable within the slot, it should be replaced with a 0." If you examine the object itself you will notice there are actually several slots including:

- bbox = bounding box
- data = the data table
- plotOrder = the order of the polygons when drawn
- polygons = the longitude and latitude values corresponding to the different polygons
- proj4string = the crs information
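
If slots are new to you, the following sketch builds a toy object with two slots and applies the same NA replacement. The `ToyLayer` class and its values are hypothetical and are only meant to show how `@` reaches into a slot. It is not how spatial objects are normally created.

```r
library(methods)

# A toy S4 class with two slots, loosely imitating a spatial object (hypothetical).
setClass("ToyLayer", representation(data = "data.frame", bbox = "matrix"))

toy <- new("ToyLayer",
           data = data.frame(HUC_12 = c("A", "B", "C"), BF_Count = c(3, NA, 1)),
           bbox = matrix(c(0, 0, 4, 4), nrow = 2,
                         dimnames = list(c("x", "y"), c("min", "max"))))

slotNames(toy)   # lists the slots in the object

# Reach into the data slot with @ and replace missing counts with zero.
toy@data$BF_Count[is.na(toy@data$BF_Count)] <- 0
toy@data
```

Running `slotNames()` on one of your own spatial objects is a quick way to see which slots it contains.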

Remember that you should always examine new datasets when they are imported into your project. Because the census tract data is a slightly different format than the watershed you will complete a very similar process to the above script with a slight modification to convert the census tracts to a SpatialPolygonsDataFrame.

```{r create tracts table, message=FALSE, warning=FALSE, echo=TRUE}
census_tracts_spdf <- sf::as_Spatial(census_tracts)
brownfields_per_tract <- over(montco_brownfields,census_tracts_spdf)
census_tract_table <- as.data.frame(table(brownfields_per_tract$NAME))
census_tract_table <- transform(census_tract_table, Var1 = as.character(Var1), Freq = as.numeric(Freq))
colnames(census_tract_table) <- c("Name","BF_Count")
census_tract_table
```

With the table created you can now attach it to the original census data.

```{r connect to tracts, message=FALSE, warning=FALSE, echo=TRUE}
census_tract_dataset <- merge(x = census_tracts, y = census_tract_table, by.x = "NAME", by.y = "Name", all = TRUE)
census_tract_dataset$BF_Count[is.na(census_tract_dataset$BF_Count)] <- 0
str(census_tract_dataset)
```

While you could have kept the SpatialPolygonsDataFrame version of the census data, it is important to know how to manage different classes of data. So in this exercise the census tract data will continue to be [simple features (sf)](https://cran.r-project.org/web/packages/sf/vignettes/sf1.html) data while the watershed data will be [sp](https://cran.r-project.org/web/packages/sp/vignettes/intro_sp.pdf).

If you examine the brownfields dataset it is also a type of _sp_ data called a **SpatialPointsDataFrame**. Because the values are points and not polygons you can see the type changes accordingly. If you create a visualization of the data you will find that while `crop` worked to subset the information based on the bounding rectangle of the census tracts, one record was retained erroneously because of the shape of Montgomery County and the location of the brownfields. The first record, "The Compost Company", is located south of the county and therefore needs to be removed from the dataset. To do this you will convert the brownfields dataset to a data frame and remove the first row of data. Because items in a data frame can be identified by row (first value) and column (second value), simply adding [-1,] behind the object name will "subtract" the entire row from the dataset. Since you are already editing the data, it would also make sense to rename the columns to names that match `ggplot2` nomenclature.

```{r convert fix rename brownfields, message=FALSE, warning=FALSE, echo=TRUE}
brownfields_dataset <- as.data.frame(montco_brownfields)
colnames(brownfields_dataset) <- c("Name", "long", "lat", "NA")
brownfields_dataset <- brownfields_dataset[-1,]
brownfields_dataset
```
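
The row-removal syntax can be tested on a small, made-up data frame first. The site names and coordinates below are hypothetical. The second approach, filtering by name, is shown as an alternative because it does not depend on the unwanted record being in the first row.

```r
# Hypothetical brownfield records (names and coordinates are made up).
sites <- data.frame(Name = c("Site outside county", "Site B", "Site C"),
                    long = c(-87.40, -87.35, -87.30),
                    lat  = c(36.30, 36.55, 36.60))

# [-1, ] drops row 1 and keeps every column (nothing after the comma).
sites_trimmed <- sites[-1, ]
sites_trimmed

# A name-based filter removes the same record regardless of row order.
sites_by_name <- sites[sites$Name != "Site outside county", ]
sites_by_name
```

Either approach works here. Removing by position is shorter, while removing by name is easier to verify if the data is ever re-sorted or re-downloaded.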

Finally, just in case you need a simple outline of Montgomery County for your visualization in the next step you can create an object polygon of the county with a similar script to create the states information in [Exercise 2, Step 1](https://chrismgentry.github.io/GIS1-Exercise-2/#11_Step_One:_The_Data).

```{r create montgomery county polygon, message=FALSE, warning=FALSE, echo=TRUE}
counties <- map_data("county")
tn_counties <- subset(counties, region == "tennessee")
montco <- subset(tn_counties, subregion == "montgomery")
```

<big><b>Question No. 2</b></big>
<blockquote>
How many watersheds cover Montgomery County? Although they have been clipped from their original geometry, which watershed is the largest? Which is the smallest?
</blockquote>
<small>HINT: View acres in the watershed dataset.</small>
</details>

## Step Three: The Visualization

In this step you will need to examine the spatial distribution of brownfields within the watersheds of Montgomery County and make some qualitative interpretations of potentially impacted urban areas. 

<details><summary><big>View directions in <b> [ArcGIS Pro]{style="color:#ff4500"} </b></big></summary>

Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. In order to help quantify the number of brownfields in each watershed you can use a **Spatial Join** to create a count variable for this information. To do this, right-click on the _montgomery county watershed_ dataset and go to _Joins and Relates > Spatial Join_.

<p align="center"><img src= "Images/arcgis-spatial-join.png" alt="Spatial Join Datasets" style="width:85%"></p>

In the resulting window, select the following parameters (your file names may vary):

- Target Features = Montgomery County watersheds
- Join Features = Montgomery County brownfields
- Output Feature Class = This is where you name the new dataset. For continuity you can name it something like: brownfields_per_watershed.shp
- Join Operation = Choose "Join one to one" from the drop-down menu
  - Keep all Target Features should remain checked
- Match Option = Choose "Intersect" from the drop-down menu

You can leave the remaining items blank and click OK.

<p align="center"><img src= "Images/arcgis-spatial-join-pane.png" alt="Spatial Join Datasets Dialog" style="width:85%"></p>

By examining the attribute table for the new dataset you should see a new variable called **Join_Count**. This is the number of brownfields that occur within each watershed.
 
Using the skills you learned in Exercises [Two](https://chrismgentry.github.io/GIS1-Exercise-2/), [Three](https://chrismgentry.github.io/GIS1-Exercise-3/), and [Four](https://chrismgentry.github.io/GIS1-Exercise-4/) you can now make a map that shows Montgomery County, the location of brownfields and watersheds in a graduated color scheme by number of brownfields. Remember to include cartographic elements such as legend, scale bar, north arrow, etc. In this visualization you may also want to add a different basemap or inset map that provides additional supporting information.

<big><b>Question No. 3</b></big>
<blockquote>
Which watershed contains the most brownfields?
</blockquote>

</details>
<hr></hr>

<details><summary><big>View directions in <b> [QGIS]{style="color:#006400"} </b></big></summary>

Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. In order to help quantify the number of brownfields in each watershed you can use **Join attributes by location (Summary)** from the _Vector General_ menu in the Processing Toolbox. This will create a count of the total number of brownfields per watershed polygon. In the _Join attributes by location (Summary)_ window select the following options (layer names may vary):

- Input layer = watershed layer for Montgomery County
- Join layer = brownfields layer for Montgomery County
- Geometric predicate = Intersects

In the _Summaries to calculate..._ you will click the browse button and select only "count". When selected, click the blue arrow button <img src= "Images/qgis-blue-arrow.jpg" alt="blue Arrow" width = "20" height = "20"> to return to the previous page. As before, you can leave this as a temporary file\* or you can choose to create a permanent file for future use. Now click Run.

<p align="center"><div class="zoom"><img src= "Images/qgis-spatial-join.png" alt="Spatial Join Datasets Dialog" style="width:100%"></div></p>
\*If you choose to make a temporary file you should rename it in the layers with a sensible name.

By examining the attribute table for the new dataset you should see a new variable called **Name_count**. This is the number of brownfields that occur within each watershed. Unfortunately, some of the cells are populated with "Null" values. You need to replace these in order to create a proper graduated color scheme. To do this, open the attribute table for the newly created dataset and click the _Field calculator_ button <img src= "Images/qgis-field-calculator-button.jpg" alt="Field Calculator button" width = "20" height = "20">. In the new window uncheck the box for "Create a new field" and check the box for "Update existing field". In the drop-down menu below the box select _Name_count_. In the Expression box type the following and click OK:

```if("Name_count" is null, 0, "Name_count")```

<p align="center"><div class="zoom"><img src= "Images/qgis-field-calculator.png" alt="Field Calculator" style="width:100%"></div></p>

This will replace all of the null values with zero values. Finally, click the _Toggle editing mode_ button <img src= "Images/qgis-editor-button.jpg" alt="toggle Editing" width = "20" height = "20"> and save the changes to the table.

Now, using the skills you learned in Exercises [Two](https://chrismgentry.github.io/GIS1-Exercise-2/), [Three](https://chrismgentry.github.io/GIS1-Exercise-3/), and [Four](https://chrismgentry.github.io/GIS1-Exercise-4/) you can make a map that shows Montgomery County, the location of brownfields and watersheds in a graduated color scheme by number of brownfields. Remember to include cartographic elements such as legend, scale bar, north arrow, etc. In this visualization you may also want to add a different basemap or inset map that provides additional supporting information.

<big><b>Question No. 3</b></big>
<blockquote>
Which watershed contains the most brownfields?
</blockquote>

</details>
<hr></hr>

<details><summary><big>View directions in <b> [R]{style="color:#6495ED"} </b></big></summary>

Examine the spatial distribution of the brownfields throughout the county. The clustering should be relatively apparent and might match up with your knowledge of industrial activities in the various areas of Montgomery County. Use the skills you developed in Exercises [4](https://chrismgentry.github.io/GIS1-Exercise-4/#13_Step_Three:_The_Visualization), [3](https://chrismgentry.github.io/GIS1-Exercise-3/#13_Step_Three:_The_Visualization), and [2](https://chrismgentry.github.io/GIS1-Exercise-2/) to **_complete_** the scripts below and create the visualizations for the number of brownfields per watershed and the number of brownfields per census tract. Don't forget to add all of the necessary map elements and feel free to add any ancillary data from this or previous exercises that enhance your map.

Watershed Map:
```{r watershed visualization, message=FALSE, warning=FALSE, echo=TRUE}
#ggplot() +
#geom_polygon(data = DATASET, aes(x=long, y=lat, group=group, fill = SELECT THE APPROPRIATE VARIABLE), color = "gray", size = 0.25, linetype="dashed") + 
#geom_polygon(data = montco, aes(x=long, y=lat, group=group), fill = NA, color = "black", size = 1) + 
#geom_point(data = DATASET, aes(x=long, y=lat), color = "SELECT A COLOR") +
#coord_fixed() +
#ADD THE NECESSARY ELEMENTS HERE
```

Census Tract Map:
```{r census tract visualization, message=FALSE, warning=FALSE, echo=TRUE}
#ggplot() +
#geom_sf(data = DATASET, aes(fill = SELECT THE APPROPRIATE VARIABLE)) + scale_fill_viridis_c(direction = -1, option = "A") +
#geom_point(data = DATASET, aes(x=long, y=lat), color = "red") +
#coord_sf() +
#ADD THE NECESSARY ELEMENTS HERE
```

Remember to remove all \# comment tags before running the edited scripts.

<big><b>Question No. 3</b></big>
<blockquote>
Which watershed contains the most brownfields?
</blockquote>

</details>

## Step Four: The County Commission Report

After discussing the results of the previous analysis with your colleagues at County Commission, Stormwater Management, Health Department, and TDEC, they are interested in seeing how the location of brownfields impacts the community. Although the [commission districts](https://mcgtn.org/storage/departments/commission/maps/DistrictMap.pdf) do not perfectly replicate the census tracts, the County Commissioners and the Health Department want to know if the brownfield sites are directly related to census tracts with large minority populations. They are concerned by a [recently published report](https://www.epa.gov/cleanups/olem-programs-address-contamination-superfund-brownfields-and-rcra-sites-near-60-percent-us) that states:

> "While there is no single way to characterize communities located near our sites, this population is more minority, low income, linguistically isolated, and less likely to have a high school education than the U.S. population as a whole. As a result, these communities may have fewer resources with which to address concerns about their health and environment."

During these discussions the Health Department would also like to know if the areas with a high number of brownfields have higher populations of children.


<details><summary><big>View directions in <b> [ArcGIS Pro]{style="color:#ff4500"} </b></big></summary>

Using the skills you learned in this and previous exercises, create a new _spatial join_ between the _census tracts_ and _brownfields_ datasets (for this exercise ignore any datum warning). As with the previous spatial join, you should now have an additional variable labeled "Join_Count" that details the number of brownfields per census tract.

One way you can view two variables at once on a map is to create bivariate symbology.

<p align="center"><img src= "Images/symbology-bivariate-pane.jpg" alt="Bivariate Color Selection" style="width:65%"></p>

This creates a grid of colors with an X and Y axis with the following categories:

<p align="center"><img src= "Images/bivariate-color-scheme.png" alt="Bivariate Color Scheme" style="width:65%"></p>

- Where the Bottom Left cell indicates low values in both variables
- Where the Upper Left cell indicates high values in the first variable and low values in the second variable
- Where the Bottom Right cell indicates low values in the first variable and high values in the second variable
- Where the Upper Right cell indicates high values in the first variable and high values in the second variable

The other grid cells represent midpoints in the variables. The variables can be selected in the _symbology pane_ where "Field 1" is one variable and "Field 2" is the other variable. So to visualize _brownfields_ and one of the population demographics select one for each field. For "Grid Size" select 3x3. You can select your own color scheme with the drop-down menu. In the fields section below, you can rename the fields (e.g. "join count" means nothing so give it an appropriate name).

<p align="center"><img src= "Images/arcgis-bivariate-options.png" alt="Bivariate Color Options" style="width:65%"></p>

With these settings you should have a symbology that shows if a census tract is high or low in either of the particular variables. Test a number of different variable combinations with the count of brownfields and various demographic categories. 
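
The logic behind a 3x3 bivariate grid can be sketched in a few lines of R. The values below are hypothetical, and the sketch assumes tercile (quantile) breaks. The classification method chosen in the symbology pane may use different breaks, so treat this only as an illustration of how two class numbers combine into one grid cell.

```r
# Hypothetical values for nine census tracts.
bf_count <- c(0, 0, 1, 2, 2, 3, 5, 8, 12)
tot_pop  <- c(900, 4100, 2500, 1200, 5200, 3300, 2800, 6100, 1500)

# Split a variable into three classes (1 = low, 2 = middle, 3 = high)
# using tercile breaks; other classification methods would give other breaks.
to_class <- function(v) {
  breaks <- quantile(v, probs = c(0, 1/3, 2/3, 1))
  as.integer(cut(v, breaks = breaks, include.lowest = TRUE))
}

class_x <- to_class(bf_count)
class_y <- to_class(tot_pop)

# Combine the two classes into one label, e.g. "3-1" = high count, low population.
bivariate_class <- paste(class_x, class_y, sep = "-")
data.frame(bf_count, tot_pop, bivariate_class)
```

Each label corresponds to one cell of the color grid. In this made-up example the last tract is labeled "3-1": a high brownfield count paired with a low population.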

<big><b>Question No. 4</b></big>
<blockquote>
Which census tract contains the most brownfields?
</blockquote>

</details>
<hr></hr>

<details><summary><big>View directions in <b> [QGIS]{style="color:#006400"} </b></big></summary>

Using the skills you learned in this and previous exercises, create a new _Join attributes by location (Summary)_ between the _census_tracts_ and _brownfields_ datasets (for this exercise ignore any datum warning). As with the previous spatial join, you should now have an additional variable labeled "Name_count" that details the number of brownfields per census tract.

One way you can view two variables at once on a map is to create bivariate symbology. This creates a grid of colors with an X and Y axis with the following categories:

<p align="center"><img src= "Images/bivariate-color-scheme.png" alt="Bivariate Color Scheme" style="width:65%"></p>

- Where the Bottom Left cell indicates low values in both variables
- Where the Upper Left cell indicates high values in the first variable and low values in the second variable
- Where the Bottom Right cell indicates low values in the first variable and high values in the second variable
- Where the Upper Right cell indicates high values in the first variable and high values in the second variable

The other grid cells represent midpoints in the variables. You can create these bivariate maps in QGIS with some editing of the individual color palettes in your graduated colors map and by using a plug-in to create the legend. In order to create the map you first need to duplicate your layer. This example will use the _census tract_ dataset with the "Name_count" variable for number of brownfields and variables for different demographics such as total population (tot_pop). To duplicate a layer, right/CTRL click on a layer and select **Duplicate Layer**. If it helps you to keep the two layers organized, you can amend their layer names to include the variable used for the graduated colors (e.g. tracts_brownfields_count or tracts_brownfields_pop). Otherwise just remember what each one is displaying.

Next, you will need to create a graduated symbology for each layer with three (3) categories and a specific color palette. Because bivariate maps blend two color schemes, it is important to have the appropriate color selection. Here is an example of several bivariate color palettes from [Joshua Stevens](https://www.joshuastevens.net/cartography/make-a-bivariate-choropleth-map/):

<p align="center"><div class="zoom"><img src= "Images/bivariate-color-palettes.png" alt="Bivariate Color Palettes" style="width:100%"></div></p>

This example will use the second color palette. There are a few steps required to match your graduated colors categories to the color palettes on the X and Y axes in the image above:

  1. Select the appropriate _value_. Tot_Pop and Name_Count will be used for this example.
  2. Set the classes to 3.
  3. Double-click on the first of the selected colors.
  4. In the new _Symbol Selector_ window, click on the color bar.
  5. In the next _Select Color_ window, type in the HTML notation that matches your selected palette. Remember, one of your layers will start at the bottom left and work up and the other will start at the bottom left and work to the right.

Finally, click OK on the windows in Steps 3 and 4 and repeat the process for the next color moving progressively up or right on the selected bivariate palette depending on the layer. This process will need to be completed for both of the layers (one with the brownfield count and the other with a demographic variable) you will use in your bivariate map.

<p align="center"><div class="zoom"><img src= "Images/qgis-customize-color-steps.png" alt="Bivariate Color Palette Steps" style="width:100%"></div></p>

Once you have completed this process for each of the layers you should return to the properties menu of the first layer and under the _Symbology_ tab scroll down to _Layer Rendering_ and expand the options to see the options for **Blending mode**. Choose _multiply_ for the layer, and _normal_ for the feature and click OK.

<p align="center"><div class="zoom"><img src= "Images/qgis-layer-rendering.jpg" alt="Layer Rendering" style="width:100%"></div></p>
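
To see why the _multiply_ blending mode produces a grid of combined colors, the sketch below applies the standard multiply formula (each color channel of one layer times the same channel of the other, divided by 255) to two three-step color ramps. The ramps are hypothetical and are not taken from the palette image, and the way QGIS renders the layers may differ in detail (for example, in how transparency is handled).

```r
# Multiply two hex colors channel by channel: result = a * b / 255.
multiply_blend <- function(top_hex, bottom_hex) {
  top <- col2rgb(top_hex)[, 1]
  bottom <- col2rgb(bottom_hex)[, 1]
  blended <- unname(round(top * bottom / 255))
  rgb(blended[1], blended[2], blended[3], maxColorValue = 255)
}

# Hypothetical three-step ramps, one per layer (not taken from the palette image).
ramp_a <- c("#FFFFFF", "#FFC0C0", "#FF8080")
ramp_b <- c("#FFFFFF", "#C0C0FF", "#8080FF")

# Every combination of the two ramps gives the 3 x 3 grid of blended colors.
blend_grid <- outer(ramp_a, ramp_b, Vectorize(multiply_blend))
blend_grid
```

Because white leaves the other color unchanged under this formula, the lightest class of each layer lets the other layer's ramp show through, while the darkest classes of both layers combine into the darkest cell of the grid.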

With both layers viewable you should now have a bivariate color palette for the data.

<p align="center"><div class="zoom"><img src= "Images/qgis-bivariate-map.png" alt="Bivariate Map" style="width:100%"></div></p>

Because QGIS is unable to interpret the appropriate legend style from the available data you need to add a plug-in. Click **Plugins > Manage and Install Plugins** from the menu bar. 

<p align="center"><img src= "Images/qgis-manage-plugins.png" alt="Install and Manage Plugins" style="width:100%"></p>

Next, search for "Bivariate Legend" in the search bar. Be sure _All_ is selected in the left side column. This should bring up the *Bivariate Legend* plugin. Click _Install Plugin_ in the lower right of the window to install the legend generator.

<p align="center"><div class="zoom"><img src= "Images/qgis-bivariate-legend-plugin.png" alt="Install Bivariate Legend Plugin" style="width:100%"></div></p>

With the plugin installed you can return to your project. You will now have a _Bivariate Legend_ button <img src= "Images/qgis-bivariate-plugin-button.jpg" alt="Bivariate Legend button" width = "20" height = "20"> on your toolbar. Click the button to open the legend generator. Select the appropriate _Top layer_ from the drop-down menu and click the box for Reverse colors. Select the appropriate _Bottom layer_, set the _Square width_ to 48, and choose _Multiply_ for the drop-down menu below. Then click _Generate legend_.

<p align="center"><img src= "Images/qgis-bivariate-legend-generator.png" alt="Bivariate Legend Generator" style="width:85%"></p>

Finally, click **Export legend to image** and save it in your project folder with a **.png** extension. In your layout, use the _Add image_ button <img src= "Images/qgis-add-image-button-layout.jpg" alt="Add image to layout button" width = "20" height = "20"> and draw the image box while holding the shift key to constrain the box to a perfect square. Next, right-click in the box and go to _Item properties_ to locate the image file. In the properties pane, select _Raster Image_ and navigate to your project folder to add the image. 

<p align="center"><div class="zoom"><img src= "Images/qgis-insert-image-legend.png" alt="Insert Legend Image" style="width:100%"></div></p>

Remember that this is now simply an image, which means you will need to build the legend manually from the image and inserted text. Be sure you remember which data belongs on which axis, and think about how you want the reader to interpret the information. Here is an example of one that could be used for the legend created above.

<p align="center"><img src= "Images/qgis-sample bivariate-legend.png" alt="Bivariate Legend Example" style="width:50%"></p>

You should <u>experiment with different demographics</u> to see how they relate to the number of brownfields before completing the assignment. You will want to select the variable that you believe shows which demographic would be most impacted by the presence of brownfields in the various tracts.

<big><b>Question No. 4</b></big>
<blockquote>
Which census tract contains the most brownfields?
</blockquote>

</details>
<hr></hr>

<details><summary><big>View directions in <b> [R]{style="color:#6495ED"} </b></big></summary>

As stated above, the County Commissioners and the Health Department want to see the relationship between census tracts with large minority populations and brownfields. The Health Department would also like to see the relationship between brownfields and juvenile populations. The visualizations you created above simply show the number of brownfields per watershed/census tract. You could create a qualitative visualization where demographics (either minority class or age class) are used as the fill with brownfield locations as an overlay. However, you can also create a two-variable quantitative map, called a bivariate map, that displays quantitative categories for two different variables. This creates a grid of colors with an X and Y axis with the following categories:

<p align="center"><img src= "Images/bivariate-color-scheme.png" alt="Bivariate Color Scheme" style="width:65%"></p>

- Where the Bottom Left cell indicates low values in both variables
- Where the Upper Left cell indicates high values in the first variable and low values in the second variable
- Where the Bottom Right cell indicates low values in the first variable and high values in the second variable
- Where the Upper Right cell indicates high values in the first variable and high values in the second variable

The other grid cells represent midpoints in the variables. You can create these bivariate maps in [R]{style="color:#6495ED"} manually or by using the `biscale` package loaded at the beginning of this exercise. Because bivariate maps blend two color schemes, it is important to choose an appropriate palette. [Graduated symbology palettes](https://slu-opengis.github.io/biscale/reference/bi_pal.html) have been created that cover a range of color possibilities for the legend. These include:

<p align="center"><div class="zoom"><img src= "Images/biscale-bivariate-colors.png" alt="Biscale Bivariate Color Palettes" style="width:100%"></div></p>
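
To make the color grid described above more concrete, here is a minimal sketch in base [R]{style="color:#6495ED"} of how two variables can each be split into three classes and combined into one label per tract. The values, the object names (`first_var`, `second_var`, `grid_class`), and the equal-interval breaks are assumptions made for illustration; they are not the exercise data and this is not how `biscale` calculates its breaks.

```r
# Hypothetical values for six tracts (illustrative only, not the exercise data)
first_var  <- c(120, 450, 980, 300, 760, 50)
second_var <- c(0, 2, 9, 1, 5, 0)

# Assign each value to class 1 (low), 2 (middle), or 3 (high)
# using three equal-width intervals as a simple stand-in
to_class <- function(v) as.integer(cut(v, breaks = 3, labels = FALSE))

# Combine the two class numbers into one "first-second" label per tract
grid_class <- paste(to_class(first_var), to_class(second_var), sep = "-")
grid_class
```

A label of "1-1" means low values in both variables and "3-3" means high values in both. Each of the nine possible labels corresponds to one cell in the color grid.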

Like the visualization you created for [Exercise 4, Step 3](https://chrismgentry.github.io/GIS1-Exercise-4/#13_Step_Three:_The_Visualization), you will rely on syntax from `cowplot` to overlay the map and legend from `biscale`. However, the only demographic data currently in this exercise is by race. So before you begin this process, you need to take the additional step of connecting the population data from [Exercise 3, Step 2](https://chrismgentry.github.io/GIS1-Exercise-3/#12_Step_Two:_The_Analyses). The same script can be used to obtain the dataset:

```{r population data, message=FALSE, warning=FALSE, echo=TRUE}
population <- read.csv('https://raw.githubusercontent.com/chrismgentry/GIS1-Exercise-3/main/Data/mont_co_pop.csv', colClasses=c(Tract="character"))
```

To connect the data to the census tract dataset **_complete_** the following script:

```{r connect pop data to tracts, message=FALSE, warning=FALSE, echo=TRUE}
#census_with_pop <- merge(x = census_tract_dataset, y = DATASET, by.x = "VARIABLE", by.y = "VARIABLE", all = TRUE)
```

If you have questions about completing the script above review the information from [Exercise 3, Step 2](https://chrismgentry.github.io/GIS1-Exercise-3/#12_Step_Two:_The_Analyses) or from similar steps earlier in this exercise.
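
If it helps to see how the `by.x`, `by.y`, and `all` arguments behave, here is a small sketch of `merge()` on two made-up tables. The data frames and column names (`tracts_demo`, `pop_demo`, `TRACT_ID`) are hypothetical and are not the names you need for the script above.

```r
# Hypothetical tract table with a brownfield count (illustrative only)
tracts_demo <- data.frame(TRACT_ID = c("000100", "000200", "000300"),
                          BF_Count = c(2, 0, 5),
                          stringsAsFactors = FALSE)

# Hypothetical population table keyed on a differently named column
pop_demo <- data.frame(Tract = c("000100", "000200", "000400"),
                       total_pop = c(3100, 4250, 1800),
                       stringsAsFactors = FALSE)

# by.x names the key column in x, by.y names the key column in y;
# all = TRUE keeps unmatched rows from both tables and fills gaps with NA
joined_demo <- merge(x = tracts_demo, y = pop_demo,
                     by.x = "TRACT_ID", by.y = "Tract", all = TRUE)
joined_demo
```

Notice that both key columns are stored as character. This is why the population data above was read with `colClasses=c(Tract="character")`: leading zeros in tract identifiers are lost if the column is read as a number, and the keys would no longer match.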

To create the bivariate dataset you need to use the `bi_class` function from _biscale_.

```{r bivariate data, message=FALSE, warning=FALSE, echo=TRUE}
bivariate_data <- bi_class(census_with_pop, x = total_pop, y = BF_Count, dim = 3, style = "jenks")
```

In this script you identify the dataset used to create the _bi_class_ data. You then identify the x variable (total population in this example) and the y variable (the count of brownfields). Finally, you provide the number of dimensions (3) and the _style_ used to calculate the divisions in the data. For this example you will use [Jenks Natural Breaks Classification](https://pro.arcgis.com/en/pro-app/latest/help/mapping/layer-properties/data-classification-methods.htm), simply identified as "jenks".
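
To get a feel for what natural breaks are doing, the sketch below searches for the three groups that keep similar values together by minimizing the squared differences within each group. This is a brute-force illustration for three classes on made-up counts; the function `natural_breaks_3` is hypothetical and is not the algorithm `biscale` runs internally.

```r
# Hypothetical brownfield counts per tract (illustrative only)
counts <- c(0, 0, 1, 1, 2, 6, 7, 8, 20, 22)

# Sum of squared deviations from the mean for one class
ssd <- function(v) sum((v - mean(v))^2)

# Try every pair of cut positions in the sorted values and keep the pair
# with the smallest total within-class sum of squared deviations
natural_breaks_3 <- function(x) {
  x <- sort(x)
  n <- length(x)
  best <- list(cost = Inf, i = NA, j = NA)
  for (i in 1:(n - 2)) {
    for (j in (i + 1):(n - 1)) {
      cost <- ssd(x[1:i]) + ssd(x[(i + 1):j]) + ssd(x[(j + 1):n])
      if (cost < best$cost) best <- list(cost = cost, i = i, j = j)
    }
  }
  # Return the largest value in each of the three classes
  c(x[best$i], x[best$j], x[n])
}

class_upper_bounds <- natural_breaks_3(counts)
class_upper_bounds
```

With these counts the classes end at 2, 8, and 22, so the groups follow the gaps in the data rather than splitting the range or the number of tracts evenly.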

Now you can use the information above to create the visualization based on total population. You will need to create a new code block or alter the one above to identify an x variable appropriate to answer the questions posed by the County Commission and Health Department.

```{r bivariate map total pop, message=FALSE, warning=FALSE, echo=TRUE}
bivariate_map <- ggplot() +
  geom_sf(data = bivariate_data, mapping = aes(fill = bi_class), color = "white", size = 0.1, show.legend = FALSE) +
  geom_point(data = brownfields_dataset, aes(x=long, y=lat), color = "red") + 
  bi_scale_fill(pal = "DkViolet", dim = 3) +
  theme_void()
```

You can use this same example script to create a map based on a different demographic variable (x) as long as it is identified appropriately in the **bivariate_data** script block above. Next you need to create an object to serve as the legend. Unfortunately, because `biscale` legends are not linked to the data, you will need to overlay the legend as an "image". So be careful that any changes made to the data are reflected in the legend as well.

```{r bivariate map legend, message=FALSE, warning=FALSE, echo=TRUE}
legend <- bi_legend(pal = "DkViolet",
                    dim = 3,
                    xlab = "Total Population",
                    ylab = "No. Brownfields",
                    size = 10)
```
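
Because the legend is not linked to the data, one way to reduce mismatches is to store the palette, dimensions, and axis labels in objects and reuse them wherever they are needed. The sketch below is one possible approach; the object names (`map_pal`, `map_dim`, `x_label`, `y_label`, `legend_shared`) are hypothetical and are not used elsewhere in this exercise.

```r
# Shared settings (hypothetical object names) defined once
map_pal <- "DkViolet"
map_dim <- 3
x_label <- "Total Population"
y_label <- "No. Brownfields"

# Runs only if biscale is installed; same bi_legend() arguments as above,
# but supplied from the shared objects
if (requireNamespace("biscale", quietly = TRUE)) {
  legend_shared <- biscale::bi_legend(pal = map_pal,
                                      dim = map_dim,
                                      xlab = x_label,
                                      ylab = y_label,
                                      size = 10)
}
```

The same `map_pal` and `map_dim` objects could then be passed to `bi_scale_fill()` and `bi_class()`, so changing the x variable would only require updating `x_label` in one place.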

With the map and legend objects created you can now script the base of the final map using the same process as the [last exercise](https://chrismgentry.github.io/GIS1-Exercise-4/#13_Step_Three:_The_Visualization).

```{r final map, message=FALSE, warning=FALSE, echo=TRUE}
final_map <- ggdraw() +
  draw_plot(bivariate_map, 0, 0, 1, 1) +
  draw_plot(legend, 0.7, 0, 0.25, 0.25)
final_map
```

For this map, total population was simply used to provide an example script. So to finish this exercise you will need to alter the **x** variable in the _bivariate_data_ script to determine whether there is any relationship between demographics (race or age) and brownfields for the County Commission and Health Department.

<big><b>Question No. 4</b></big>
<blockquote>
Which census tract contains the most brownfields?
</blockquote>

</details>

# The Write-Up

In the report you provide to the County Commission, Stormwater Management, Health Department, and TDEC please provide the following information:

- Which watershed is most potentially impacted by brownfields?
- What is the total population of the tracts with brownfields?
- What demographic could be the most impacted by brownfields?
  - By examining the tract with the most brownfields, what demographic is the most impacted, and how does it relate to location within the city/county?

Be sure to include the maps you created to support your report. When complete, send a link to your _Colab Notebook_ or Word document with answers to Questions 1-4 and your completed map(s) via email.