-
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Codebook.Rmd
287 lines (175 loc) · 17.7 KB
/
Codebook.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
---
title: |
<center> Codebook for data repository </center>
<center> Building age map, Vienna, around 1920 </center>
output:
pdf_document:
toc: true
number_sections: true
toc_depth: '2'
author: "Ulrich Kral, Ferdinand Reimer"
urlcolor: blue
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(openxlsx)
library(rmarkdown)
library(knitr)
library(sf)
library(tmap)
library(readxl)
library(mapview)
library(ggplot2)
rm(list=ls())
tmap_mode("view")
```
```{r specify_inputs, echo = FALSE}
###########################################
# specify local directory that includes the files from the GitHub repository https://github.com/ukral/building.map_1920
dir_github = "C:/Users/U/Documents/03_TU Wien/building.map_1920/01_Github/building.map_1920/"
###########################################
###########################################
# specify local directory that includes the files from the Zenodo repository https://www.doi.org/10.5281/zenodo.3715200
dir_zenodo = "C:/Users/U/Documents/03_TU Wien/building.map_1920/02_Zenodo/"
###########################################
BSM_1920 <- st_read(paste(dir_zenodo, "Building age map around 1920/BSM_1920.shp", sep = ""), quiet = T)
dataset_final <- read.csv(paste(dir_zenodo, "Building age map around 1920/BSM_1920_attribute_table.csv", sep = ""), sep = ";")
CB_2020 <- st_read(paste(dir_zenodo, "City boundary 2020/CB_2020.shp", sep = ""), quiet = T)
AOOS_1920 <- st_read(paste(dir_zenodo, "Areas out of scope 1920/AOOS_1920.shp", sep = ""), quiet = T)
CB_1920 <- st_read(paste(dir_zenodo, "City boundary 1920/Version_1/CB_1920.shp", sep = ""), quiet = T)
UDB_1920 <- st_read(paste(dir_zenodo, "Urban district boundaries 1920/Version_1/UDB_1920.shp", sep = ""), quiet = T)
SABAM_1920 <- st_read(paste(dir_zenodo, "Scope of analog building age map 1920/SABAM_1920.shp", sep = ""), quiet = T)
DD_T_10 <- read_xlsx(paste(dir_github, "Background_data.xlsx", sep = ""), sheet = "DD_T_10", col_names = TRUE, range = "A3:C31")
CB_SABAM_1920_AT <- read_xlsx(paste(dir_github, "Background_data.xlsx", sep = ""), sheet = "CB_SABAM_1920_AT", col_names = TRUE, range = "A3:B5")
```
# Preface {-}
**This Codebook**
* **is part** of the GitHub repository [building.map_1920](https://github.com/ukral/building.map_1920). Check out the [readme file](https://github.com/ukral/building.map_1920/blob/main/README.md) to learn more about the corresponding datasets and the use and re-use the underlying R Markdown file.
* **refers** to the geospatial datasets "Building age map around 1920" and "Scope of analog building age map 1920", which can be retrieved from [Zenodo](https://doi.org/10.5281/zenodo.3715200).
* **specifies** the formats of the datasets and the data fields in the attribute tables of the datasets.
* **is for users** of the datasets who want to use and re-use the datasets.
# General dataset specifications
## Location and geographical coverage
The spatial coverage is the city of Vienna, which is the capital of Austria (Latitude: 48° 12' 30.56" N, 16° 22' 19.49" E).
```{r, scope, fig.cap = "City of Vienna", echo=FALSE}
# tm_shape(CB_2020) + tm_polygons("red", alpha = 0.3)
```
## Dataset format
Geospatial data
* File extension: *.shp
* Coordinate reference system: [EPSG-Code 31256](https://spatialreference.org/ref/epsg/mgi-austria-gk-east/).
* Software readability: Examples: [QGIS](https://www.qgis.org), [ArcGIS](https://desktop.arcgis.com/de/), [R](https://www.r-project.org/) and [FME Data Integration Platform from Safe Software](https://www.safe.com/)
Attribute tables
* File extension: *.csv
* CSV specifications: The data records are separated by semicolons (“;”). The decimal separator is a decimal point “.”. The Universal Coded Character Set (UCS) is UTF-8.
* Software readability: Files are readable with the aforementioned software tools.
# Dataset "Building age map around 1920"
## Introduction
This chapter specifies the geospatial dataset [Building age map around 1920](https://doi.org/10.5281/zenodo.3715200), briefly called BSM_1920.
The dataset includes `r nrow(dataset_final)` data entries (rows) and `r ncol(dataset_final)` data fields (columns). The dataset can be retrieved from the [Zenodo repository](https://www.doi.org/10.5281/zenodo.3715200), whereas the dataset can be combined from the following files
* "BSM_1920.shp", which includes the polygon-features and an unique identifier in the data field "ID".
* "BSM_1920_attribute_table_final.csv", which is the attribute table without geometry information. It also includes the data field "ID" and therefore can be linked with the "Building stock and age 1920.shp" file.
## Attribute table
The dataset includes `r nrow(DD_T_10)` data fields. `r kable(head(DD_T_10[,1:2], nrow(DD_T_10)))`
## Comments
This section comments on the individual data fields by giving background information and details for using the data.
### ID
The field includes a unique character for each of the `r nrow(dataset_final)` polygon-features. The character the combination of "UD" for urban district", the district number "UD.2020", an underline "_" and an integer. Example: "UD1_1543".
It is noted that the integer at the end of the "ID" is unique per urban district and doesn't follow a specific logical order. It is also noted the series of integers has gaps, which is due to the removal and merging of polygon-features during the clean-up process and the integration of all building footprints from more than 100 different maps sheets into a single map.
### ID.bs
The field includes a unique character for `r length(which(dataset_final$ID.bs != ""))` polygon-features. It corresponds with the data field "ID" of the [Building schematic of Vienna in the late 1920s](https://doi.org/10.5281/zenodo.4106173) dataset and therefore allows to assign data records "year of construction", "year of purchase", "cadastral community" and "floors".
It is noted that the "ID.bs" records are based on address matching between the BSM_1920 and the [Building schematic of Vienna in the late 1920s](https://doi.org/10.5281/zenodo.4106173) dataset. The addresses in the BSM_1920 dataset are not completely verified, because we were not able to reconstruct the address data of 1920 based on address labels in historical building stock maps.
### Area
The field specifies the area of the polygon-feature in squared meters. It is noted that polygon-features represent built-up land but not necessary buildings in a definitional sense.
### UD.1920
The field specifies the location of the polygon-feature with respect to the urban district distribution in 1920. At this time, the city was divided into 21 districts.
The field covers the following unique records: `r sort(unique(dataset_final$UD.1920), na.last = T)`. The integers `r min(dataset_final$UD.1920, na.rm=T)`-`r max(dataset_final$UD.1920, na.rm=T)` stand for the number of the urban districts. "NA" records are for polygon-features beyond the city boundary 1920.
### UD.2020
The field specifies the location of the polygon-feature with respect to the urban district distribution in 2020.
The field covers the following unique records: `r unique(dataset_final$UD.2020)`. The integers `r min(dataset_final$UD.2020, na.rm=T)`-`r max(dataset_final$UD.2020, na.rm=T)` stand for the number of the urban districts.
### Location.city_limit_1920
This data field specifies if the polygon-feature is located inside our outside the city boundary of 1920.
The field covers the following unique records: `r unique(dataset_final$Location.city_limit_1920)`.
### Location.abam_1920
The [analog building age map 1920](https://www.geschichtewiki.wien.gv.at/Baualter_1920) covers `r round(as.numeric(st_area(SABAM_1920)/st_area(CB_1920)*100),digits=0)`% of the city area at this time. This field specifies if the polygon-feature is inside or outside of the coverage area of the [analog building age map 1920](https://www.geschichtewiki.wien.gv.at/Baualter_1920).
The field covers the following unique records: `r unique(dataset_final$Location.abam_1920)`.
### Polygon.source_id
The field identifies the data sources from which the building footprints were retrieved. The field covers `r length(unique(dataset_final$Polygon.source_id))` unique records. The first two positions indicate the type of data source, whereas
* "FF" stands for "Fortführungsmappe" (in German). It is an analog cadastral map sheet, which we used to vectorize building footprints. The cadastral maps sheets are available in TIFF format and can be retrieved from [Federal Office of Metrology and Surveying (BEV)](https://www.bev.gv.at). The nomenclatur is presented in Figure \@ref(fig:nomenclatur).
```{r nomenclatur, fig.align = 'center', out.width = "30%", fig.cap = "Nomenclatur for identifying historical cadastral map sheets.", echo=FALSE}
include_graphics(paste(dir_github, "Figures/Codebook_figure_FF.png", sep = ""))
```
* "FW" stands for "Feuerwehrplan" (in German). It is an historical fire brigade map sheet, which we used to vectorize building footprints. The cadastral maps sheets are available in TIFF format and can be retrieved from [Federal Office of Metrology and Surveying (BEV)](https://www.bev.gv.at).
* "MA" stands for "Magistratratsabteilung" (in German). It is the city authority - department 41, who provided the polygon-features. The polygon-features can be retrieved from [Open Data Österreich](https://www.data.gv.at/katalog/dataset/7cf0da04-1f77-4321-929e-78172c74aa0b).
* "BE" stands for "Bezirksplan" (in German). It covers historical urban district maps. The urban district maps are available in TIFF format and can be retrieved from [Municipal and Provincial Archives of Vienna ](https://www.wien.gv.at/kultur/archiv).
* "BA" stands for "Baualterskarte" (in German). It is the [analog building age map 1920](https://www.geschichtewiki.wien.gv.at/Baualter_1920). It covers historical urban district maps. The urban district maps are available in TIFF format and can be retrieved from [Municipal and Provincial Archives of Vienna ](https://www.wien.gv.at/kultur/archiv).
### Polygon.source_name
This field groups the "Polygon.source_id" records by the type of the cartographic document into: `r unique(dataset_final$Polygon.source_name)`.
### Editor
The field covers name of the person who digitized the building footprints from analog maps or the institution that provided the polygon-features.
The field covers the following unique records: `r unique(dataset_final$Editor)`.
### CD.abam
The field covers construction periods of buildings, which were retrieved from the [analog building age map 1920](https://www.geschichtewiki.wien.gv.at/Baualter_1920).
The field covers the following unique records: `r unique(dataset_final$CD.abam)`. Blank records indicate that no information is available.
### CD.bs
The field covers construction years of buildings, which were retrieved from [Building schematic of Vienna in the late 1920s](https://doi.org/10.5281/zenodo.4106173).
It is noted that the assignment of construction years is based on address matching and the addresses in the BAM_1920 were not entirely validated. It is also noted that the BSM_1920 includes polygon-features with multiple addresses. Due to the address matching, these polygon-features potentially got multiple construction year assignments.
The field covers `r length(unique(dataset_final$CD.bs))` unique records.
### TP_pub.date
The field stands for the temporal presence of the building footprint. The temporal presence was retrieved from the publication data of the analog building stock map, from which we retrieved and digitized the building footrint. In other words, the presence is confirmed if the builing is shown on the analog buildig age map. The temporal presence is either given by years or periods. Periods are given for the cadastral map sheets (see data field "Polygon.source_id"), because the map sheets were used to record changes in a specific period.
The field covers the following unique records: `r length(unique(dataset_final$TP_pub.date))`.
### TP_pub.date_year
The field includes either the year or the end of the period as given in "TP_pub.date". It documents the temporal presence of the building footprints by a single year.
The field covers the following unique records: `r sort(unique(dataset_final$TP_pub.date_year))`.
### TP_pub.date_period
This field groups the "TP_pub.date_year" records into the following periods: `r sort(unique(dataset_final$TP_pub.date_period))`.
### PoC
This field covers the building's period of construction. The periods were retrieved from "CD.abam", "CD.bs" and "TP_pub.date_year" and harmonized as followed: `r sort(unique(dataset_final$PoC))`.
It is noted that the periods include two distinctive types. Firs, periods with a timespan of 10 years. Second, periods that indicate that the building has been constructed before a given year (e.g. 0-1850).
### PoC.source_name
This field specifies the data source from which the "PoC" records were retrieved. The field covers the following unique records: `r unique(dataset_final$PoC.source_name )`.
### ID.Address.2019
The "Address.2019" records were retrieved on 6 July 2019 from the address database [Adressen Standorte Wien](https://www.data.gv.at/katalog/dataset/1d5c2411-9719-4c8f-b99d-57a5f4a4ae41). It's data field "OBJECTID" is identical with the data field "ID.Address.2019".
### Address.2019_reloc
The "Address.2019" records were retrieved from the address database [Adressen Standorte Wien](https://www.data.gv.at/katalog/dataset/1d5c2411-9719-4c8f-b99d-57a5f4a4ae41) as point data. Points that were located outside of polygon-features were re-located to the inside of polygon-features close by. This enables the assignment of address data to polygon-features. This data field specifies if the address point was manually re-located.
The field covers the following unique records: `r unique(dataset_final$Address.2019_reloc)`.
### Address.2019
The field specifies the address in 2019, in particular the name of the traffic area and the orientation number. This field corresponds with "NAME" in the address database [Adressen Standorte Wien](https://www.data.gv.at/katalog/dataset/1d5c2411-9719-4c8f-b99d-57a5f4a4ae41).
The field covers `r length(unique(dataset_final$Address.2019))` unique records.
### Address.1920
The field specifies the address in 1920, in particular the name of the traffic area and the orientation number. It is noted that the address data are not entirely validated.
The field covers `r length(unique(dataset_final$Address.1920))` unique records.
### Address_reviewed
We reviewed some "Address.2019" records to verify if the address was valid in 1920. This data field specifies if the "Address.2019" record was reviewed or not.
The field covers the following unique records: `r unique(dataset_final$Address_reviewed)`.
### Address_revised
Addresses under review (see “Address_reviewed”) were revised based on the address labels in historical building stock maps (see "Address_reviewed"). This data field includes all addresses, as shown on historical buildings stock maps with address labels, that refer to the polygon-features. This data fields includes the "Address.2019" record if it is valid for 1920 too, as well as additional addresses. It is noted that multiple addresses (e.g. for corner buildings) are separated by a comma (";").
The field covers `r length(unique(dataset_final$Address_revised))` unique records.
### Location.usm
The analog urban sprawl map [Die räumliche Ausdehnung Wien von 1857 bis 1957](https://doi.org/10.5281/zenodo.5594097) documents the built-up area in 1918. This field specifies if the polygon-feature is located within the built-up area 1918. The field covers the following unique records: `r unique(dataset_final$Location.usm)`.
### Location.lsm
The land use map documents the built up area 1912. The map is available in SHP format and can be retrieved from [S. Hohensinner](https://orcid.org/0000-0002-3517-0259) based on personal request. This field specifies if the polygon-feature is located within the built-up area 1918. The field covers the following unique records: `r unique(dataset_final$Location.lum)`.
### Presence.conf
The data field combines the information from “Location.usm” and “Location.lum” and confirms if the building footprint can be found in one of the two types of built-up areas. Building footprints outside of the built-up areas are considered as long as the temporal presence "TP_pub.date_year" equal or before 1921.
The field covers the following unique records: `r unique(dataset_final$Presence.conf)`.
### CD.bs_fit
This data field shows if the construction year “CD.bs” fits within the period of construction “PoC”. The field covers the following unique records: `r unique(dataset_final$CD.bs_fit)`.
The data record are interpreted as followed:
* "included": The construction year "CD.bs" is within the period of construction "PoC".
* "before": The construction year "CD.bs" is before the start year of the period of construction "PoC".
* "after": The construction year "CD.bs" is after the end year of the period of construction "PoC".
* "ambiguous": The "CD.bs" records include multiple years, which are within two distinctive periods of construction "PoC". So, the "CD.bs" record cannot be assigned to a single period.
* "No 'CD.bs' record": The polygon-feature has no "CD.bs" record.
# Dataset "Scope of analog building age map 1920"
## Introduction
This chapter refers to the geospatial dataset [Scope of analog building age map 1920](https://doi.org/10.5281/zenodo.3715200), briefly called SABAM_1920.
The dataset includes `r nrow(SABAM_1920)` polygon-feature.
## Attribute table
The dataset includes `r ncol(st_drop_geometry(SABAM_1920))` data fields. `r kable(head(CB_SABAM_1920_AT, nrow(CB_SABAM_1920_AT)))`
## Comments
### ID
This field includes a unique identifier for the polygon-feature.
### Area
This field specifies the area of the polygon-feature in squared meter.
# Competing interests {-}
The authors declare no competing interests.