# Harmonisation with synthetic data {#harmonisation}
In this section we describe how to harmonise synthetic data on the client side. This assumes that you have used one of the previous methods to generate your synthetic data set.
Recall that the aim is to use synthetic data on the client side to design harmonisation algorithms, and then to implement these in Opal on the server side using the real data. This removes the need for the user to have full access to the real data. Harmonisation algorithms are implemented in Opal using MagmaScript (JavaScript with some additional functions). The idea is that writing JavaScript on the client side, with full access to the synthetic data, is easier than writing the code on the server side with access only to summaries.
Additional steps for harmonisation after generation of synthetic data are:
1. With the synthetic data on the client side, the user can view the data and develop their code. They will be able to see how the data change as the code is run.
2. When the code is complete, it can be run on the server side using the real data.
In detail, the steps proposed are:
1. Start a JavaScript session on the client side
2. Load the synthetic data into the session
3. Write and test JavaScript code in the session against the synthetic data
4. When happy, copy the code into Opal to generate the harmonised data
```{r echo=FALSE, fig.cap="Prototyping DataSHIELD harmonisation in JavaScript using synthetic data"}
knitr::include_graphics("images/dssynthetic_harm.png")
```
## Getting set up
First we start a JavaScript session and load the additional MagmaScript functionality that is found in Opal. We also load our synthetic data into the JavaScript session.
```{r start V8}
library(V8)
ct2 = v8()
ct2$source("https://raw.githubusercontent.com/tombisho/dsSyntheticClient/main/MagmaScript.min.js")
synth_data = read.csv(file = "data/synth_data.csv")
ct2$assign("synth_data", synth_data)
```
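The later steps index the data as `synth_data[0]` and loop up to `synth_data.length`, which relies on the data frame arriving in JavaScript as an array with one object per row. The sketch below illustrates that shape with invented values. The `id` column and both rows are assumptions made for illustration; only `y3age` is a column name taken from this chapter.

```javascript
// Invented stand-in for synth_data: one object per row of the data frame,
// keyed by column name. Only y3age comes from the chapter; id is hypothetical.
var example_data = [
  { id: 1, y3age: 31 },
  { id: 2, y3age: 19 }
];

// Number of rows, as used by the loop over the whole dataset.
var n_rows = example_data.length;

// Column names and a single value, read from the first row.
var col_names = Object.keys(example_data[0]);
var first_age = example_data[0].y3age;

console.log(n_rows, col_names.join(","), first_age);
```

Running the same three lookups on the real `synth_data` object in the console is a quick way to confirm that the columns you expect have arrived.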
We then open the interactive V8 JavaScript console from R.
```{r eval=FALSE}
ct2$console()
```
## Experiment with a single row
We bind MagmaScript's `$` function to the first row of the data (`synth_data[0]`), so that `$('y3age')` looks up that row's `y3age` variable. We can then write some JavaScript to operate on that single row and show the result:
```{javascript}
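// Bind the MagmaScript selector to the first row of the synthetic data
var $ = MagmaScript.MagmaScript.$.bind(synth_data[0]);
// Derive a binary indicator: 1 if y3age is above 25, otherwise 0
var out;
if ($('y3age').value() > 25) {
  out = 1
} else {
  out = 0
}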
```
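To make the logic of the rule easy to check away from the MagmaScript library, here is a minimal sketch in plain JavaScript. `makeSelector` is a hypothetical stand-in that mimics only the `$('name').value()` call pattern on a plain row object; it is not the Opal or MagmaScript implementation, and the example rows are invented.

```javascript
// Hypothetical stand-in for the MagmaScript `$` selector: it supports only
// $('name').value() on a plain row object, which is all this rule needs.
function makeSelector(row) {
  return function (name) {
    return { value: function () { return row[name]; } };
  };
}

// The rule from the text: 1 if y3age is above 25, otherwise 0.
function ageOver25(row) {
  var $ = makeSelector(row);
  return $('y3age').value() > 25 ? 1 : 0;
}

// Invented example rows, including the boundary value 25.
var exampleRows = [{ y3age: 31 }, { y3age: 25 }, { y3age: 19 }];
var exampleOut = exampleRows.map(ageOver25);
console.log(exampleOut);
```

Including a row exactly at the threshold makes it explicit that the comparison is strict, so an age of 25 is coded as 0.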
## Test on whole dataset
Now we test our code against the whole dataset. This is done by:
1. Defining the script as a string assigned to a variable
2. Executing this script in a loop over each row of the data
3. Capturing the output each time
```{javascript}
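// The harmonisation rule, stored as a string so it can be evaluated per row
var myScript = `
if ($('y3age').value() > 25 ){
  out = 1
} else {
  out = 0
}
`
// Evaluate the script against every row of the synthetic data
var my_out = [];
for (var j = 0; j < synth_data.length; j++){
  my_out.push(MagmaScript.evaluator(myScript, synth_data[j]))
}
// Leave the V8 console and return to R
exit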
```
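Before typing `exit`, it can be useful to tabulate the derived values in the console as a quick sanity check. The sketch below is one way to do that, assuming each element of `my_out` is a plain 0 or 1. `tally` is a hypothetical helper, and `sample_out` is an invented stand-in for `my_out`.

```javascript
// Hypothetical helper: count how often each derived value occurs.
function tally(values) {
  var counts = {};
  values.forEach(function (v) {
    var key = String(v);
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

// Invented stand-in for my_out.
var sample_out = [1, 0, 0, 1, 1];
var sample_counts = tally(sample_out);
console.log(JSON.stringify(sample_counts));
```

A count under an unexpected key, such as `undefined` or `null`, would suggest that the script did not return a value for some rows.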
Back in R, we pull the results out of the V8 context for inspection:
```{r eval=FALSE}
my_out = ct2$get("my_out")
synth_data_harm = synth_data
synth_data_harm$my_var = my_out
```
## Run the code on the real data
If we are happy with the code, we can paste it directly into the Opal *script* interface so that it can be executed on the real data:
```{r echo=FALSE, fig.cap="Script editor in Opal"}
knitr::include_graphics("images/opal_script.PNG")
```
This will generate a harmonised variable in the view on Opal which can be used in analyses. The summary statistics of the harmonised data can be checked to make sure the harmonisation is working correctly.
A similar process could be conducted in a platform like MOLGENIS.