---
title: "Getting Started"
output:
  rmarkdown::html_vignette:
    toc: true
    check_title: TRUE
vignette: >
  %\VignetteIndexEntry{Getting Started}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = " "
)
library(DT)
options(cli.num_colors = 1)
```
```{r, include=FALSE}
options(width = 60)
local({
  hook_output <- knitr::knit_hooks$get("output")
  knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(options$max.height)) {
      options$attr.output <- c(
        options$attr.output,
        sprintf('style="max-height: %s;"', options$max.height)
      )
    }
    hook_output(x, options)
  })
})
```
```{r, include=FALSE}
knitr::knit_hooks$set(output = function(x, options) {
  if (!is.null(options$max_height)) {
    paste('<pre style = "max-height:',
      options$max_height,
      '; float: left; width: 775px; overflow-y: auto;">',
      x, "</pre>",
      sep = ""
    )
  } else {
    x
  }
})
```
```{r, include=FALSE}
datatable_template <- function(input_data) {
  datatable(
    input_data,
    rownames = FALSE,
    options = list(
      autoWidth = FALSE,
      scrollX = TRUE,
      pageLength = 5,
      lengthMenu = c(5, 10, 15, 20)
    )
  ) %>%
    formatStyle(
      0,
      target = "row",
      color = "black",
      backgroundColor = "white",
      fontWeight = "500",
      lineHeight = "85%",
      fontSize = ".875em" # same as code
    )
}
```
# Getting Started with xportr
This demo uses a small `ADSL` dataset that ships with the `xportr` package. The dataset has the following features:

* 306 observations
* 51 variables
* Data types other than character and numeric
* Missing labels on variables
* Missing label for data set
* Order of variables not following specification file
* Formats missing

To create a fully compliant v5 xpt `ADSL` dataset from data developed in R, we will apply the 6 main functions within the `xportr` package:

* `xportr_type()`
* `xportr_length()`
* `xportr_order()`
* `xportr_format()`
* `xportr_label()`
* `xportr_write()`

```{r, eval = TRUE, message = FALSE, warning = FALSE}
# Loading packages
library(dplyr)
library(labelled)
library(xportr)
library(readxl)
# Loading in our example data
data("adsl_xportr", package = "xportr")
```
```{r, echo = FALSE}
datatable_template(adsl_xportr)
```
# Preparing your Specification Files
To use the functions within `{xportr}`, you will need an R data frame that contains your specification. After loading your spec sheets, you will most likely need to do some pre-processing before they work with the `xportr` functions. For the layout that `xportr` expects, see the example spec workbook at `system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr")`.
```{r}
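# Read the variable-level sheet of the example spec workbook
var_spec <- read_xlsx(
  system.file(file.path("specs", "ADaM_spec.xlsx"), package = "xportr"),
  sheet = "Variables"
) %>%
  # Rename "Data Type" to type, then lower-case all column names
  rename(type = "Data Type") %>%
  rename_with(tolower)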
```
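Before passing a specification to the functions below, it can help to confirm that it carries the columns this vignette relies on. The following is a minimal base R sketch on a small hypothetical spec, `toy_spec`, not the shipped workbook. The column names `dataset`, `variable`, `type`, `length`, `label`, `order` and `format`, and the type values shown, are assumptions based on the pre-processed example spec and may differ in your own spec sheets.

```{r}
# Hypothetical, hand-built spec standing in for the "Variables" sheet
toy_spec <- data.frame(
  dataset = c("ADSL", "ADSL"),
  variable = c("USUBJID", "AGE"),
  type = c("text", "integer"),
  length = c(11, 8),
  label = c("Unique Subject Identifier", "Age"),
  order = c(1, 2),
  stringsAsFactors = FALSE
)

# Column names assumed from this vignette's example spec
expected_cols <- c(
  "dataset", "variable", "type", "length", "label", "order", "format"
)

# Report which expected columns are absent from the spec
missing_cols <- setdiff(expected_cols, names(toy_spec))
missing_cols
```

Here the toy spec deliberately omits a `format` column, so the check reports it as missing.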
Below is a quick snapshot of the specification file pertaining to the `ADSL` data set, which we will make use of in the 6 `{xportr}` function calls below. Take note of the order, label, type, length and format columns.
```{r, echo = FALSE, eval = TRUE}
var_spec_view <- var_spec %>%
filter(dataset == "ADSL")
datatable_template(var_spec_view)
```
# xportr_type()
**NOTE:** We use `str()` to expose the attributes (length, labels, formats, type) of the datasets. The `str()` calls themselves are hidden for the sake of brevity.

To comply with the transport v5 specification, an `xpt` file can hold only two data types: character and numeric (dbl). Currently the `ADSL` dataset has chr, dbl, time, factor and date variables.
```{r, max_height = "200px", echo = FALSE}
str(adsl_xportr)
```
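As a rough illustration of what "only character and numeric" means in practice, the sketch below flags the columns of a small hypothetical data frame (`toy_adsl`, not the shipped `adsl_xportr`) that are neither plain character nor plain numeric. It uses base R only and is not how `xportr_type()` is implemented.

```{r}
# Hypothetical toy data with a factor and a Date column
toy_adsl <- data.frame(
  USUBJID = c("01", "02"),
  AGE = c(34, 51),
  SEX = factor(c("F", "M")),
  TRTSDT = as.Date(c("2020-01-01", "2020-02-01")),
  stringsAsFactors = FALSE
)

# TRUE for columns that are already character or numeric;
# is.numeric() returns FALSE for factors and Dates
is_xpt_type <- vapply(
  toy_adsl,
  function(x) is.character(x) || is.numeric(x),
  logical(1)
)

# Columns that would need coercing before export
non_xpt <- names(toy_adsl)[!is_xpt_type]
non_xpt
```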
Using `xportr_type()` and the supplied specification file, we can *coerce* the variables in the `ADSL` set to be either numeric or character.
```{r, echo = TRUE}
adsl_type <- xportr_type(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")
```
Now all appropriate types have been applied to the dataset as seen below.
```{r, max_height = "200px", echo = FALSE}
str(adsl_type)
```
# xportr_length()
Next we can apply the lengths from a variable-level specification file to the data frame. `xportr_length()` will identify variables that are missing from your specification file. The function will also tell you how many lengths have been applied successfully. Before we apply the lengths, let's verify that no lengths have been applied to the original data frame.
```{r, max_height = "200px", echo = FALSE}
str(adsl_xportr)
```
No lengths have been applied to the variables, as seen in the printout: the lengths would appear in the `attr()` part of each variable. Let's now use `xportr_length()` to apply our lengths from the specification file.
```{r}
adsl_length <- adsl_xportr %>% xportr_length(var_spec, domain = "ADSL", verbose = "message")
```
```{r, max_height = "200px", echo = FALSE}
str(adsl_length)
```
Note the additional `attr(*, "width")=` entry after each variable, which holds its width. These widths have been applied directly from the specification file that we loaded above!

# xportr_order()
Note that the order of the `ADSL` variables shown above does not match the `order` column of the specification file. We can quickly remedy this with a call to `xportr_order()`. After the call, `SITEID` and many other variables have moved to match the order in the specification file. Variables not in the spec are moved to the end of the data and a message is written to the console.
```{r, echo = TRUE}
adsl_order <- xportr_order(adsl_xportr, var_spec, domain = "ADSL", verbose = "message")
```
```{r, echo = FALSE}
datatable_template(adsl_order)
```
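The reordering rule described above (spec variables first, in spec order, then everything else) can be sketched in base R. This is an illustration on hypothetical toy objects (`toy_adsl`, `toy_spec`, and the column names `variable` and `order` are assumptions), not the implementation of `xportr_order()`.

```{r}
# Hypothetical data whose column order does not follow the spec;
# EXTRA is not in the spec at all
toy_adsl <- data.frame(
  AGE = c(34, 51),
  EXTRA = c("a", "b"),
  USUBJID = c("01", "02"),
  SITEID = c("701", "702"),
  stringsAsFactors = FALSE
)

# Hypothetical spec giving the desired position of each variable
toy_spec <- data.frame(
  variable = c("USUBJID", "SITEID", "AGE"),
  order = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Spec variables sorted by their order value
spec_vars <- toy_spec$variable[order(toy_spec$order)]

# Variables present in both, in spec order, then the leftovers
in_spec <- intersect(spec_vars, names(toy_adsl))
not_in_spec <- setdiff(names(toy_adsl), spec_vars)

toy_ordered <- toy_adsl[c(in_spec, not_in_spec)]
names(toy_ordered)
```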
# xportr_format()
Now we apply formats to the dataset. These will typically be `DATE9.`, `DATETIME20.` or `TIME5.`, but many others can be used. Notice that in the `ADSL` dataset there are 8 Date/Time variables and they are missing formats. Here we just take a peek at a few `TRT` variables, which have a `NULL` format.
```{r, max_height = "200px", echo = FALSE}
adsl_fmt_pre <- adsl_xportr %>%
  select(TRTSDT, TRTEDT, TRTSDTM, TRTEDTM)

tribble(
  ~Variable, ~Format,
  "TRTSDT", attr(adsl_fmt_pre$TRTSDT, which = "format"),
  "TRTEDT", attr(adsl_fmt_pre$TRTEDT, which = "format"),
  "TRTSDTM", attr(adsl_fmt_pre$TRTSDTM, which = "format"),
  "TRTEDTM", attr(adsl_fmt_pre$TRTEDTM, which = "format")
)
```
Using `xportr_format()`, we can apply the formats from the specification file to the dataset.
```{r}
adsl_fmt <- adsl_xportr %>% xportr_format(var_spec, domain = "ADSL")
```
```{r, max_height = "200px", echo = FALSE}
adsl_fmt_post <- adsl_fmt %>%
  select(TRTSDT, TRTEDT, TRTSDTM, TRTEDTM)

tribble(
  ~Variable, ~Format,
  "TRTSDT", attr(adsl_fmt_post$TRTSDT, which = "format"),
  "TRTEDT", attr(adsl_fmt_post$TRTEDT, which = "format"),
  "TRTSDTM", attr(adsl_fmt_post$TRTSDTM, which = "format"),
  "TRTEDTM", attr(adsl_fmt_post$TRTEDTM, which = "format")
)
```
**NOTE:** You can use `attr(data$variable, which = "format")` to inspect formats applied
to a dataframe. The above output has these individual calls bound together for easier viewing.
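
To inspect the `format` attribute of every column at once rather than one `attr()` call at a time, a small helper can bind the results into a table. The sketch below is base R only and runs on a hypothetical toy data frame; `toy_adsl` and `get_format()` are illustrative names, and the format is set by hand here rather than by `xportr_format()`.

```{r}
# Hypothetical toy data; only TRTSDT is given a format, by hand
toy_adsl <- data.frame(
  USUBJID = c("01", "02"),
  TRTSDT = as.Date(c("2020-01-01", "2020-02-01")),
  stringsAsFactors = FALSE
)
attr(toy_adsl$TRTSDT, "format") <- "DATE9."

# Return the format attribute, or NA when none is set
get_format <- function(x) {
  fmt <- attr(x, which = "format")
  if (is.null(fmt)) NA_character_ else as.character(fmt)
}

# One row per variable with its format
fmt_table <- data.frame(
  Variable = names(toy_adsl),
  Format = vapply(toy_adsl, get_format, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)
fmt_table
```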
# xportr_label()
Notice that our `ADSL` dataset is missing many variable labels. Labels can sometimes be lost when data passes through R functions. However, a CDISC-compliant dataset needs a label on every variable.
```{r, max_height = "200px", echo = FALSE}
adsl_no_lbls <- haven::zap_label(adsl_xportr)
str(adsl_no_lbls)
```
Using the `xportr_label()` function, we can take the specification file and label all the variables available. `xportr_label()` will produce a warning message if a variable in the dataset is not in the specification file.
```{r}
adsl_lbl <- adsl_xportr %>% xportr_label(var_spec, domain = "ADSL", verbose = "message")
```
```{r, max_height = "200px"}
str(adsl_lbl)
```
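Since every variable needs a label, a quick check for unlabelled variables can be useful before writing the file. The sketch below is base R only and assumes labels are stored in a `label` attribute on each column, the convention used by `haven` and `labelled`. `toy_adsl` is a hypothetical data frame, not the shipped data.

```{r}
# Hypothetical toy data; only USUBJID carries a label
toy_adsl <- data.frame(
  USUBJID = c("01", "02"),
  AGE = c(34, 51),
  SEX = c("F", "M"),
  stringsAsFactors = FALSE
)
attr(toy_adsl$USUBJID, "label") <- "Unique Subject Identifier"

# TRUE where a label attribute is present
has_label <- vapply(
  toy_adsl,
  function(x) !is.null(attr(x, "label")),
  logical(1)
)

# Variables that still need a label
unlabelled <- names(toy_adsl)[!has_label]
unlabelled
```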
# xportr_write()
Finally, we arrive at exporting the R data frame object as an `xpt` file with `xportr_write()`. The `xpt` file will be written directly to your current working directory. To make it more interesting, we have put together all six functions with the magrittr pipe, `%>%`. A user can now apply types, lengths, variable labels, ordering and formats, and write out the final `xpt` file, in one pipe! Appropriate warnings and messages are written to the console for any potential issues before the file is sent to a standard clinical dataset validator application or to data reviewers.
```{r}
adsl_xportr %>%
  xportr_type(var_spec, "ADSL", "message") %>%
  xportr_length(var_spec, "ADSL", verbose = "message") %>%
  xportr_label(var_spec, "ADSL", "message") %>%
  xportr_order(var_spec, "ADSL", "message") %>%
  xportr_format(var_spec, "ADSL") %>%
  xportr_write("adsl.xpt")
```
That's it! We now have an `xpt` file created in R with all appropriate types, lengths, labels, ordering and formats from our specification file. If you are interested in exploring more of the custom warnings and error messages, as well as more background on `xpt` generation, be sure to check out the [Deep Dive](deepdive.html) User Guide.

As always, we welcome your feedback. If you spot a bug, would like to see a new feature, or find any documentation unclear, submit an issue on [xportr's GitHub page](https://github.com/atorus-research/xportr/issues).