---
title: "fastbackward"
output: github_document
---
<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/fastbackward)](https://CRAN.R-project.org/package=fastbackward)
<!-- badges: end -->
# Overview
**fastbackward** is an R package that provides the `fastbackward()` function.
It performs backward elimination much like the `stepAIC()` function from the
**MASS** package, but it uses a bounding algorithm to perform the elimination
faster.
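For orientation, plain backward elimination by AIC can be sketched in base R as follows. This is a minimal illustration of the baseline procedure only, not the bounding algorithm that `fastbackward()` uses; the helper name `naive_backward()` and the toy data are hypothetical.

```r
# Naive backward elimination by AIC (illustrative only; this is NOT the
# bounding algorithm used by fastbackward()).
naive_backward <- function(fit) {
  repeat {
    terms_left <- attr(terms(fit), "term.labels")
    if (length(terms_left) == 0) break
    # Refit the model once for every term that could be dropped
    candidates <- lapply(terms_left, function(term) {
      update(fit, as.formula(paste(". ~ . -", term)))
    })
    aics <- vapply(candidates, AIC, numeric(1))
    # Stop when no single deletion improves on the current AIC
    if (min(aics) >= AIC(fit)) break
    fit <- candidates[[which.min(aics)]]
  }
  fit
}

# Hypothetical toy data: only x1 is related to the response
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
toy$y <- rbinom(200, 1, plogis(1.5 * toy$x1))

full <- glm(y ~ x1 + x2 + x3, data = toy, family = binomial)
reduced <- naive_backward(full)
formula(reduced)
```

Each pass refits one model per remaining term, so a full model with d terms can need up to d(d + 1)/2 fits, which is why the cost grows quickly with the number of covariates.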
# How to install
**fastbackward** can be installed from CRAN with the `install.packages()` function:
```{r, eval = FALSE}
install.packages("fastbackward")
```
It can also be installed from GitHub with the `install_github()` function from the **devtools** package:
```{r, eval = FALSE}
devtools::install_github("JacobSeedorff21/fastbackward")
```
# Usage
The following example compares the runtime of the `stepAIC()` function from the
**MASS** package with that of the `fastbackward()` function from the
**fastbackward** package. The comparison uses a randomly generated logistic
regression problem with 1000 observations and 50 covariates.
```{r, warning = FALSE, message = FALSE}
# Loading in fastbackward and MASS
library(fastbackward)
library(MASS)
# Defining function to generate datasets for logistic regression
LogisticSimul <- function(n, d, Bprob = .5, sd = 1, rho = 0.5) {
  # Covariates: multivariate normal with mean 1, unit variance, and pairwise correlation rho
  x <- MASS::mvrnorm(n, mu = rep(1, d), Sigma = diag(1 - rho, nrow = d, ncol = d) +
                       matrix(rho, ncol = d, nrow = d))
  # Coefficients (intercept first); a fraction Bprob of the slopes is set to zero
  beta <- rnorm(d + 1, mean = 0, sd = sd)
  beta[sample(2:length(beta), floor((length(beta) - 1) * Bprob))] <- 0
  # Center the nonzero coefficients
  beta[beta != 0] <- beta[beta != 0] - mean(beta[beta != 0])
  # Success probabilities and binary response
  p <- 1 / (1 + exp(-x %*% beta[-1] - beta[1]))
  y <- rbinom(n, 1, p)
  as.data.frame(cbind(y, x))
}
# Setting seed and creating dataset
set.seed(33391)
df <- LogisticSimul(1000, 50, .5, sd = 0.5)
# Fitting full logistic regression model
fullmodel <- glm(y ~ ., data = df, family = binomial(link = "logit"))
# Times
## Timing fast backward elimination
fastbackwardTime <- system.time(fastbackward1 <- fastbackward(fullmodel, trace = 0))
fastbackwardTime
## Timing step function
stepTime <- system.time(BackwardStep <- stepAIC(fullmodel, direction = "backward", trace = 0))
stepTime
```
For this logistic regression model, the fast backward elimination algorithm from
the **fastbackward** package was about `r round(stepTime[[3]] / fastbackwardTime[[3]], 2)`
times faster than `stepAIC()`. The speedup depends on the number of covariates
and on the strength of association between the covariates and the response
variable, so it will vary from problem to problem.
## Checking results
```{r}
# Checking if both methods give same results
all.equal(BackwardStep, fastbackward1)
```
Hence, the two methods give the same results for this model, and the fast
backward elimination algorithm is faster than `stepAIC()`.
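Because `all.equal()` on whole fitted objects also compares incidental components such as the stored call, a narrower check on the selected terms and coefficients can be useful when comparing model selection results in general. The sketch below uses a hypothetical helper, `same_selection()`, and toy data; it assumes both objects support `terms()` and `coef()`, as `glm` fits do.

```r
# Hypothetical helper: TRUE when two fitted models keep the same terms and
# have numerically equal coefficients, regardless of term order.
same_selection <- function(a, b, tolerance = 1e-6) {
  terms_a <- sort(attr(terms(a), "term.labels"))
  terms_b <- sort(attr(terms(b), "term.labels"))
  if (!identical(terms_a, terms_b)) return(FALSE)
  coef_a <- coef(a)
  coef_b <- coef(b)
  # Align coefficients by name before comparing them
  isTRUE(all.equal(coef_a[order(names(coef_a))],
                   coef_b[order(names(coef_b))],
                   tolerance = tolerance))
}

# Hypothetical toy data for demonstration
set.seed(2)
toy <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y <- rbinom(100, 1, plogis(toy$x1))

fit_a <- glm(y ~ x1 + x2, data = toy, family = binomial)
fit_b <- glm(y ~ x2 + x1, data = toy, family = binomial)  # same terms, different order
fit_c <- glm(y ~ x1, data = toy, family = binomial)       # different terms

same_selection(fit_a, fit_b)
same_selection(fit_a, fit_c)
```

Under the same assumption, the two fits from the usage example could be compared with `same_selection(BackwardStep, fastbackward1)`.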