---
title: "Homomorphic regression"
author: "Mat Weldon"
date: '2022-05-31'
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This project tests whether homomorphic encryption, as implemented in the `homomorpheR` package, can be used to solve a simplified data sharing problem.
```{r}
library(homomorpheR)
keyPair <- PaillierKeyPair$new(modulusBits = 256)
```
Set up convenience functions to encrypt and decrypt integers:
```{r}
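# Encrypt a single value, or each element of a vector or list, with the public key
encrypt = function(x, kp = keyPair){
  if(length(x) > 1){
    res = lapply(x, function(y) kp$pubkey$encrypt(y))
  }else{
    res = kp$pubkey$encrypt(x)
  }
  return(res)
}

# Decrypt a single ciphertext, or each element of a list, with the private key
decrypt = function(x, kp = keyPair){
  if(length(x) > 1){
    res = lapply(x, function(y) kp$getPrivateKey()$decrypt(y))
  }else{
    res = kp$getPrivateKey()$decrypt(x)
  }
  return(res)
}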
```
Under the Paillier system, multiplying two encrypted quantities gives an encryption of the sum of the unencrypted quantities.
```{r}
a = gmp::as.bigz(12738)
b = gmp::as.bigz(48231)
ea = encrypt(a)
eb = encrypt(b)
identical(decrypt(ea*eb), a+b)
```
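Why this works can be seen from the form of a Paillier ciphertext. As a sketch, write $N$ for the public modulus, $g$ for the public generator, and $r_1, r_2$ for the random values drawn at encryption time (notation assumed here for illustration, not taken from `homomorpheR`):

$$
\left(g^{a} r_1^{N}\right)\left(g^{b} r_2^{N}\right) \equiv g^{a+b}\,(r_1 r_2)^{N} \pmod{N^2}
$$

The right-hand side has the form of an encryption of $a+b$ with randomness $r_1 r_2$, so decrypting the product returns $a + b \bmod N$.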
Additionally, raising an encrypted quantity to some power $n$ is equivalent to multiplying the unencrypted quantity by $n$.
```{r}
identical(decrypt(ea^2),a*2)
```
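Both properties can be checked from first principles with a toy Paillier scheme in base R. This is a minimal sketch under stated assumptions, not how `homomorpheR` is implemented: the primes are tiny and insecure, the randomness is fixed by hand, and the names `pow_mod`, `L_fun`, `toy_encrypt`, and `toy_decrypt` are hypothetical.

```{r}
# Toy parameters (assumed for illustration; far too small to be secure).
# They keep every intermediate product below 2^53, so double arithmetic is exact.
p <- 61
q <- 53
N <- p * q        # public modulus
N2 <- N^2
g <- N + 1        # a standard choice of generator
lambda <- 780     # lcm(p - 1, q - 1) = lcm(60, 52)

# Modular exponentiation by repeated squaring
pow_mod <- function(base, exp, mod) {
  result <- 1
  base <- base %% mod
  while (exp > 0) {
    if (exp %% 2 == 1) result <- (result * base) %% mod
    base <- (base * base) %% mod
    exp <- exp %/% 2
  }
  result
}

L_fun <- function(u) (u - 1) %/% N

# Brute-force the inverse of L(g^lambda mod N^2) modulo N (fine for a toy modulus)
mu <- which((L_fun(pow_mod(g, lambda, N2)) * seq_len(N - 1)) %% N == 1)

# r must be coprime to N; it is passed explicitly to keep the example deterministic
toy_encrypt <- function(m, r) (pow_mod(g, m, N2) * pow_mod(r, N, N2)) %% N2
toy_decrypt <- function(ct) (L_fun(pow_mod(ct, lambda, N2)) * mu) %% N

m1 <- 12
m2 <- 30
c1 <- toy_encrypt(m1, r = 17)
c2 <- toy_encrypt(m2, r = 23)

# Multiplying ciphertexts adds the plaintexts
print(toy_decrypt((c1 * c2) %% N2) == m1 + m2)
# Raising a ciphertext to a power multiplies the plaintext
print(toy_decrypt(pow_mod(c1, 5, N2)) == 5 * m1)
```

Unlike the chunks above, this sketch reduces every product modulo $N^2$, which keeps ciphertexts at a fixed size.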
This means we can calculate the dot product of an encrypted vector with an unencrypted vector.
```{r}
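# Dot product of an encrypted vector enc_x (a list of ciphertexts)
# with a plaintext integer vector y
encrypted_dot_product = function(enc_x, y){
  N = length(enc_x)
  stopifnot(N == length(y))
  list_of_products = vector("list", N)
  for(i in seq_along(list_of_products)){
    # Raising a ciphertext to the power y[i] multiplies its plaintext by y[i]
    list_of_products[[i]] = enc_x[[i]]^y[i]
  }
  # Multiplying the ciphertexts sums the underlying plaintext products
  dot_prod = Reduce(`*`, list_of_products)
  return(dot_prod)
}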
```
```{r}
# Ten random 16-bit integers, encrypted elementwise
x = lapply(1:10, function(i) random.bigz(nBits = 16))
encrypted_x = encrypt(x)
numeric_x = sapply(x, as.integer)
y = 1:10
dot_prod = encrypted_dot_product(encrypted_x, y) |> decrypt()
# The plaintext and decrypted dot products should match
print(sum(numeric_x * y))
print(as.numeric(dot_prod))
```
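In symbols, assuming $E(\cdot)$ denotes encryption, $D(\cdot)$ decryption, and $N$ the public modulus (notation introduced here for illustration), the function above relies on:

$$
D\left(\prod_{i} E(x_i)^{y_i}\right) = \sum_{i} x_i y_i \bmod N
$$

The result equals the ordinary integer dot product only while that dot product is smaller than $N$ and the $y_i$ are non-negative integers, which holds comfortably for the 16-bit inputs and 256-bit modulus used here.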
This semi-encrypted dot product is immediately useful: one agency can share encrypted data with another, which performs computations on it and returns the still-encrypted result. Agencies could, for instance, fit a regression without sharing any data beyond the IDs needed to match records.
Suppose ONS holds a matrix of sensitive variables about households, identified by address, and energy companies hold energy consumption for those same households. ONS could compute and then elementwise encrypt the matrix:
$$
\newcommand{\mX}{\mathbf{X}}
\xi(\mathcal{X}) = \xi\left( \left[\mX^T\mX \right]^{-1} \mX^T\right)
$$
where $\xi(\cdot)$ represents elementwise encryption. The encryption protects each element of $\mathcal{X}$. In addition, what is shared is not $\mathbf{X}$ itself but its transpose pre-multiplied by $\left[\mathbf{X}^T\mathbf{X}\right]^{-1}$, the inverse of the cross-product matrix, which effectively obfuscates the values of $\mathbf{X}$, especially if there are many columns in the matrix.
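To make the algebra concrete, here is a plaintext sketch (no encryption involved) of the matrix ONS would share, and of what multiplying it by a response vector such as energy consumption produces. The data are simulated stand-ins, and the names `X`, `y`, and `P` are assumptions for illustration.

```{r}
set.seed(1)

# Simulated design matrix: 20 households, an intercept and 2 covariates
X <- cbind(1, matrix(rnorm(40), nrow = 20))
# Simulated response held by the other party
y <- as.vector(X %*% c(2, -1, 0.5)) + rnorm(20, sd = 0.1)

# The matrix that would be encrypted elementwise
P <- solve(crossprod(X)) %*% t(X)

# Multiplying by y gives the ordinary least squares coefficients
beta_hat <- as.vector(P %*% y)
beta_lm <- unname(coef(lm(y ~ X - 1)))
print(all.equal(beta_hat, beta_lm))
```

The entries of `P` are generally non-integer and can be negative, which matters for the integer-only encryption discussed at the end.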
Then the energy company could compute:
$$
\newcommand{\mX}{\mathbf{X}}
\xi(\mathcal{X}) \odot \mathbf{y} = \xi\left(\left[\mX^T\mX \right]^{-1} \mX^T \mathbf{y}\right) = \xi(\hat{\beta})
$$
where $\odot$ represents the homomorphic matrix product derived from the homomorphic dot product defined above. The energy company would then send the encrypted estimate vector back to ONS, who would decrypt the coefficient estimates and combine them with the estimates from other providers.
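The homomorphic matrix product can be sketched by applying the encrypted dot product row by row. The helper `encrypted_mat_vec` is hypothetical, and the example uses a small non-negative integer matrix rather than a real regression matrix; it relies only on the `homomorpheR` calls already used above.

```{r}
library(homomorpheR)

kp <- PaillierKeyPair$new(modulusBits = 256)

# Hypothetical helper: product of an elementwise-encrypted matrix (a list of
# rows, each a list of ciphertexts) with a plaintext integer vector y.
# Each output element is one encrypted dot product.
encrypted_mat_vec <- function(enc_rows, y) {
  lapply(enc_rows, function(enc_row) {
    stopifnot(length(enc_row) == length(y))
    terms <- Map(function(ct, k) ct^k, enc_row, y)
    Reduce(`*`, terms)
  })
}

# A 2 x 3 integer matrix, encrypted row by row by the data owner
M <- matrix(1:6, nrow = 2, byrow = TRUE)
enc_rows <- lapply(seq_len(nrow(M)), function(i) {
  lapply(M[i, ], function(v) kp$pubkey$encrypt(gmp::as.bigz(v)))
})

# The other party multiplies by its plaintext vector without decrypting
y <- c(7L, 8L, 9L)
enc_result <- encrypted_mat_vec(enc_rows, y)

# Only the key holder can decrypt the result
result <- sapply(enc_result, function(ct) as.numeric(kp$getPrivateKey()$decrypt(ct)))
print(result)
print(as.vector(M %*% y))
```

The two printed vectors should agree. The party holding `y` never sees the entries of `M`, and the key holder never sees `y`, only the product.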
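This is as far as I have been able to get, because encryption and decryption can only handle integers or rational numbers, whereas the entries of $\left[\mathbf{X}^T\mathbf{X}\right]^{-1}\mathbf{X}^T$ are in general real-valued.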
```{r}
# Attempt to round-trip a non-integer value through encrypt() and decrypt()
a = 1.245
new_a = encrypt(a) |> decrypt()
```
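One possible workaround, not something this project has implemented, is fixed-point encoding: scale each real number by a fixed factor, round it to an integer, and represent negative values in the upper half of the range modulo the plaintext modulus. The sketch below is base R only and uses a toy modulus in place of the key's modulus; `encode_fixed`, `decode_fixed`, `scale_factor`, and `N` are hypothetical names.

```{r}
scale_factor <- 1000   # keep three decimal places (an assumed precision)
N <- 2^31 - 1          # toy plaintext modulus standing in for the key's modulus

# Map a real number to a non-negative integer modulo N
encode_fixed <- function(x, scale_factor, N) {
  m <- round(x * scale_factor)   # signed integer representation
  m %% N                         # negatives wrap to the upper half of [0, N)
}

# Map an integer modulo N back to a real number
decode_fixed <- function(m, scale_factor, N) {
  signed <- ifelse(m > N / 2, m - N, m)   # upper half represents negatives
  signed / scale_factor
}

u <- encode_fixed(1.245, scale_factor, N)
v <- encode_fixed(-0.5, scale_factor, N)

# Arithmetic modulo N on the encoded values mirrors what the homomorphic
# operations would do to the underlying plaintexts
print(decode_fixed((u + v) %% N, scale_factor, N))   # sum of the two values
print(decode_fixed((u * 3) %% N, scale_factor, N))   # integer multiple
```

Multiplying two encoded values multiplies their scale factors, so a dot product of an encoded matrix row with an encoded vector would need decoding with `scale_factor^2`.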