-
Notifications
You must be signed in to change notification settings - Fork 2
/
README.Rmd
159 lines (117 loc) · 5.18 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
---
title: piRF - Prediction Intervals for Random Forests
output: github_document
author: Chancellor Johnstone and Haozhe Zhang
bibliography: REFERENCES.bib
nocite: |
@roy2019prediction, @ghosal2018boosting, @zhu2019hdi, @zhang2019random, @meinshausen2006quantile, @romano2019conformalized, @tung2014bias, @breiman2001random, @lopez2008neural
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
<img src="piRF.png" align="right" height="120"/>
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
```{r, echo = FALSE, results="hide", message=FALSE, warning=FALSE}
library(badger)
```
```{r, echo = FALSE, prompt=FALSE, results='asis', comment=""}
cat(
"[![](https://www.r-pkg.org/badges/version/piRF?color=orange)](https://cran.r-project.org/package=piRF)",
"[![](http://cranlogs.r-pkg.org/badges/last-month/piRF?color=blue)](https://cran.r-project.org/package=piRF)"
#badge_cran_release("piRF", "orange"),
#badge_cran_download("piRF", "grand-total", "blue")
)
```
## Introduction
The goal of *piRF* is to implement multiple state-of-the art random forest prediction interval methodologies in one complete package. Currently, the methods implemented can only be utilized within isolated packages, or the authors have not made a package publicly available. The package itself utilizes the functionality provided by the *ranger* package. If you utilize this package in any publications, please use the following citation:
Johnstone C, Zhang H (2020). piRF: Prediction Intervals for Random Forests. R package version 0.1.0,
<https://CRAN.R-project.org/package=piRF>.
A BibTeX entry for LaTeX users is
@Manual{,
title = {piRF: Prediction Intervals for Random Forests},
author = {Chancellor Johnstone and Haozhe Zhang},
year = {2020},
note = {R package version 0.1.0},
url = {https://CRAN.R-project.org/package=piRF},
}
## Installation
You can install the released version of *piRF* from [CRAN](https://CRAN.R-project.org) with:
``` r
install.packages("piRF")
```
And the development version from [GitHub](https://github.com/) with:
``` r
# install.packages("devtools")
devtools::install_github("chancejohnstone/piRF")
```
## Example
This is a basic example which utilizes the *airfoil* dataset included with *piRF*. The dataset comes from [UCI Archive](https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise#). The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack.
The following functions are not exported by *piRF* but are used for this example.
```{r example}
library(piRF)
## basic example code
data(airfoil)
head(airfoil)
#functions to get average length and average coverage of output
getPILength <- function(x){
#average PI length across each set of predictions
l <- x[,2] - x[,1]
avg_l <- mean(l)
return(avg_l)
}
getCoverage <- function(x, response){
#output coverage for test data
coverage <- sum((response >= x[,1]) * (response <= x[,2]))/length(response)
return(coverage)
}
```
Prediction intervals are generated for each of the methods implemented using train and test data sets constructed from the *airfoil* data.
```{r}
method_vec <- c("quantile", "oob", "bcqrf", "cqrf", "bop", "hdi", "brf")
#generate train and test data
set.seed(2020)
ratio <- .975
nrow <- nrow(airfoil)
n <- floor(nrow*ratio)
samp <- sample(1:nrow, size = n)
train <- airfoil[samp,]
test <- airfoil[-samp,]
#generate fitted models
res <- pirf(pressure ~ . , train_data = train,
method = method_vec,
concise= FALSE,
num_threads = 2,
alpha = .1)
#generate prediction intervals from fitted models
preds <- predict(res, test_data = test)
```
In this example, the *num_threads* option identifies the use of two cores for parallel processing. The default is to use all available cores. The *concise* option allows for the output of predictions for the test observations.
Below are the coverage rates and average prediction interval lengths using the test dataset. Both are important characteristics of prediction intervals.
```{r}
#empirical coverage, and average prediction interval length for each method
coverage <- sapply(preds$int, FUN = getCoverage, response = test$pressure)
coverage
length <- sapply(preds$int, FUN = getPILength)
length
```
Below are plots of the resulting prediction intervals generated for each method.
```{r}
#plotting intervals and predictions
par(mfrow = c(2,2))
for(i in 1:length(method_vec)){
#color based on empirical coverage
col <- ((test$pressure >= preds$int[[i]][,1]) * (test$pressure <= preds$int[[i]][,2])-1)*(-1)+1
plot(x = preds$preds[[i]], y = test$pressure, pch = 20,
col = "black", ylab = "true", xlab = "predicted", main = method_vec[i])
abline(a = 0, b = 1)
segments(x0 = preds$int[[i]][,1], x1 = preds$int[[i]][,2],
y1 = test$pressure, y0 = test$pressure, lwd = 1, col = col)
}
```
If you find any issues with the package, or have suggestions for improvements, please let us know.
## References