-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
155 lines (111 loc) · 5.16 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include=FALSE}
options(width = 80)
knitr::opts_chunk$set(
eval = FALSE,
collapse = TRUE,
comment = "#>",
fig.path = "Readme_files/"
)
```
```{r, include=FALSE}
pkgs = c("here", "opalr", "DSI", "DSOpal", "dsBaseClient")
for (pkg in pkgs) {
if (! requireNamespace(pkg, quietly = TRUE))
install.packages(pkg, repos = c(getOption("repos"), "https://cran.obiba.org"))
}
devtools::install(quiet = TRUE, upgrade = "always")
## Install packages on the DataSHIELD test machine:
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"
opal = opalr::opal.login(username = username, password = password, url = surl)
check1 = opalr::dsadmin.install_github_package(opal = opal, pkg = "dsCWB", username = "schalkdaniel", ref = "main")
if (! check1)
stop("[", Sys.time(), "] Was not able to install dsCWB!")
check2 = opalr::dsadmin.publish_package(opal = opal, pkg = "dsCWB")
if (! check2)
stop("[", Sys.time(), "] Was not able to publish methods of dsCWB!")
opalr::opal.logout(opal = opal)
```
[![Actions Status](https://github.com/schalkdaniel/dsCWB/workflows/R-CMD-check/badge.svg)](https://github.com/schalkdaniel/dsCWB/actions) [![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0) [![codecov](https://codecov.io/gh/schalkdaniel/dsCWB/branch/main/graph/badge.svg?token=0K9P2WBKNH)](https://codecov.io/gh/schalkdaniel/dsCWB)
# Component-wise boosting for DataSHIELD
The package provides functionality to conduct and visualize component-wise boosting on decentralized data. The basis is the [DataSHIELD](https://www.datashield.org/) infrastructure for distributed computing. This package provides the calculation of the [__component-wise boosting__](https://www.tandfonline.com/doi/abs/10.1198/016214503000125). Note that DataSHIELD uses an option `datashield.privacyLevel` to indicate the minimal amount of numbers required to be allowed to share an aggregated value of these numbers. Instead of setting the option, we directly retrieve the privacy level from the [`DESCRIPTION`](https://github.com/schalkdaniel/dsCWB/blob/master/DESCRIPTION) file each time a function calls for it. This options is set to 5 by default.
## Installation
At the moment, there is no CRAN version available. Install the development version from GitHub:
```{r,eval=FALSE}
remotes::install_github("schalkdaniel/dsCWB")
```
#### Register methods
It is necessary to register the assign and aggregate methods in the OPAL administration. These methods are registered automatically when publishing the package on OPAL (see [`DESCRIPTION`](https://github.com/schalkdaniel/dsCWB/blob/main/DESCRIPTION)).
Note that the package needs to be installed at both locations, the server and the analysts machine.
## Usage
### Log into DataSHIELD
```{r}
library(DSI)
library(DSOpal)
surl = "https://opal-demo.obiba.org/"
username = "administrator"
password = "password"
builder = newDSLoginBuilder()
for (i in seq_len(3L)) {
builder$append(
server = paste0("server", i),
url = surl,
user = username,
password = password,
table = paste0("CNSIM.CNSIM", i)
)
}
connections = datashield.login(logins = builder$build(), assign = TRUE)
```
### Fit distributed component-wise boosting
```{r}
library(dsCWB)
#Remove all missings:
datashield.assign(connections, "Dclean", quote(dsNaRm("D")))
symbol = "Dclean"
target = "LAB_TSC"
feature_names = c("GENDER", "DIS_DIAB", "LAB_HDL", "LAB_TRIG")
cwb = dsCWB(connections, "Dclean", target, feature_names, mstop = 100L,
val_fraction = 0.2, patience = 3L, seed = 31415L)
# Visualize selected base learner:
plotBaselearnerTraces(cwb)
# Get log for further investigation:
l = cwb$getLog()
l$minutes = as.numeric(difftime(l$time, l$time[1], units = "mins"))
library(ggplot2)
# Plot train vs test risk:
ggplot(l, aes(x = minutes)) +
geom_line(aes(y = risk_train, color = "Train risk")) +
geom_line(aes(y = risk_val, color = "Val risk")) +
labs(color = "") + xlab("Minutes") + ylab("Risk")
# Visualize effect LAB_TRIG (no site-specific corrections):
pdata_LAB_TRIG = cwb$featureEffectData("LAB_TRIG")
ggplot(pdata_LAB_TRIG, aes(x = value, y = pred)) +
geom_line()
# Effect of GENDER (just site-specific effects):
pdata_GENDER = cwb$featureEffectData("GENDER")
ggplot(pdata_GENDER, aes(x = value, y = pred, color = server)) +
geom_boxplot() +
facet_grid(~ server) +
guides(color = "none")
datashield.logout(connections)
```
## Citing
To cite `dsCWB` in publications, please use:
> Schalk, D., Bischl, B., & Rügamer, D. (2022). Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models. arXiv preprint arXiv:2210.07723.
```
@article{schalk2022dcwb,
doi = {10.48550/ARXIV.2210.07723},
url = {https://arxiv.org/abs/2210.07723},
author = {Schalk, Daniel and Bischl, Bernd and Rügamer, David},
title = {Privacy-Preserving and Lossless Distributed Estimation of High-Dimensional Generalized Additive Mixed Models},
publisher = {arXiv},
year = {2022},
copyright = {Creative Commons Attribution 4.0 International}
}
```