added NewArchitecture but still needs a visual #330

njriasan · 2019-03-06T21:49:59Z

This is the template for the new architecture. You may want to wait for me to make a visual, but I wanted to know whether you had any base images you wanted me to work with first.

shankari

Looks pretty good overall. Just some minor typos and clarifications. Will wait for feedback from @jf87 before merging.

shankari · 2019-04-01T18:16:14Z

docs/future_work/NewArchitecture.md

+## Overview
+
+The plans to change the e-mission architecture are oriented around keeping user data encrypted and only decrypting the data when an approved service or algorithm needs to run on the data. The general workflow for maintaining detail is:
+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.


I've been thinking about this, and at least for this set of requirements and implementation, I believe we don't need a private key (i.e. part of a public-private keypair). We just need a secret key (it can be symmetric) that is known only to the user.

shankari · 2019-04-01T18:17:57Z

docs/future_work/NewArchitecture.md

+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.
+2. The user finds an algorithm which they wish to run on their data or an aggregating algorithm in which they comfortable participating. The user then acquires the hash for this algorithm (possibly with a QR code) and updates their profile on the server to grant permissions to run the algorithm.
+3. The user decides they want to run one of the algorithms they have approved. To do so they need to send their private key to the server so that it can decrypt their stored data. This is done by spawning a user enclave built through Graphene SGX running in a docker container. The user then remotely attests this container and once this establishes a secure between the user and enclave, the user transmits the private key over that channel.
+4. The server enclave uses the hash for the algorithm to determine a microservice to run on the server (or remotely). This then spawns a microservice enclave, which the server enclave will need to attest to develop a secure channel.


And by "server enclave" here, you mean the "user enclave" that you talked about earlier, right?

I ask because you also call it "secure enclave" in the next bullet point.

shankari · 2019-04-01T18:33:59Z

docs/future_work/NewArchitecture.md

+
+#### Aggregate Algorithms
+
+It is also possible for a user to agree to be a participant in algorithms that aggregate over larger groups of data. This requires a few changes to the architecture and a different form of interaction. First to facility these algorithms that are not requested by the user it is necessary to have the server enclave available even when a user is offline. To do this we will keep the server enclave running with the private key and the user profile and only shut down the enclave upon request from the user or if it necessary to update details about the profile or key in a manner which modifies existing behavior. 


to facility -> to facilitate?

Since this assumes SGX, can't we use sealing to suspend the server enclaves when they are not actively being used?

shankari · 2019-04-01T18:50:43Z

docs/future_work/NewArchitecture.md

+4. The server enclave uses the hash for the algorithm to determine a microservice to run on the server (or remotely). This then spawns a microservice enclave, which the server enclave will need to attest to develop a secure channel.
+5. The server enclave sends the data to the microservice to use in conducting its algorithm. In doing so, the server enclave will decrypt the data inside the secure enclave and then transmit the data over the secure channel formed between the enclaves.
+6. The microservice performs the algorithm and returns to the server enclave the output of the algorithm.
+7. The server enclave then returns to the user the result of running the algorithm.


I think that there should also be the option for the server enclave to store the results back to the encrypted datastore. The current algorithms do this (e.g. they store the results of running the pipeline under different keys).

shankari · 2019-04-01T18:59:21Z

docs/future_work/NewArchitecture.md

+
+1. The user's smartphone makes a request to a known access location (essential a server at a known domain) with a request to spawn a user cloud instance.
+2. The known access location spawns a container to produce a "user cloud." This user cloud consists of a server running inside a secure enclave via Graphene. The known access location then replies to the smartphone with an address and port of the spawned user cloud.
+3. The smart phone connects to the known access location. The two establish a secure channel through SGX's remote attestation. All user cloud will run the same general program, so this component is trusted to only allow a new user to connect once at the beginning. While the known access location is untrusted the user cloud code's will be open source and its hash known, allowing us to verify the connection. Then the smartphone will send its private key and profile of allowed algorithm to the user cloud.


this should be "The smart phone connects to the spawned user cloud", right?

shankari · 2019-04-01T19:00:14Z

docs/future_work/NewArchitecture.md

+2. The known access location spawns a container to produce a "user cloud." This user cloud consists of a server running inside a secure enclave via Graphene. The known access location then replies to the smartphone with an address and port of the spawned user cloud.
+3. The smart phone connects to the known access location. The two establish a secure channel through SGX's remote attestation. All user cloud will run the same general program, so this component is trusted to only allow a new user to connect once at the beginning. While the known access location is untrusted the user cloud code's will be open source and its hash known, allowing us to verify the connection. Then the smartphone will send its private key and profile of allowed algorithm to the user cloud.
+4. The user sends some data to the user cloud that it wishes to store over the established secure connection.
+5. The user cloud spawns the user's database instance as a container and provides the instance with the private key. The instance can be any paricular database which runs on a section of a distributed file system reserved just for the user (so all contents can be encrypted with the user's private key).


*particular

shankari · 2019-04-01T19:02:19Z

docs/future_work/NewArchitecture.md

+5. The user cloud spawns the user's database instance as a container and provides the instance with the private key. The instance can be any paricular database which runs on a section of a distributed file system reserved just for the user (so all contents can be encrypted with the user's private key).
+6. The user cloud sends the data to the database instance. This database instance will then store the data encrypted with the private key.
+
+Steps 1-3 constitute the process of launching a user cloud. If the user cloud is already running then in step 2 rather than launch a new user cloud the known access location should just return the address of the user's user cloud which is already running (which it should be possible to authenticate, although we may want to produce some shared secret for existing user clouds).


Do we need to authenticate? Since the known access location is untrusted, it can return an arbitrary user cloud. The phone should attest the user cloud before it communicates with it, presumably through the cert mechanism.
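
The attest-before-communicating rule can be sketched roughly as below. This is only an illustration under stated assumptions: real SGX remote attestation verifies a signed quote, whereas this sketch models just the final step of comparing a reported measurement with the known hash of the open-source user cloud code. `EXPECTED_MEASUREMENT`, `is_trusted_user_cloud`, and `connect` are hypothetical names, not part of e-mission or any SGX SDK.

```python
import hashlib
import hmac

# Hypothetical: hash of the published, open-source user cloud build.
EXPECTED_MEASUREMENT = hashlib.sha256(b"user-cloud-build-v1").hexdigest()


def is_trusted_user_cloud(reported_measurement: str) -> bool:
    """Return True only if the reported measurement matches the known hash.

    Stands in for verifying an SGX attestation quote; a real check would
    also validate the signature on the quote.
    """
    return hmac.compare_digest(reported_measurement, EXPECTED_MEASUREMENT)


def connect(address: str, reported_measurement: str) -> str:
    """Refuse to talk to a user cloud that fails the attestation check."""
    if not is_trusted_user_cloud(reported_measurement):
        raise ConnectionRefusedError(f"user cloud at {address} failed attestation")
    # Only after this point would the phone send its key and profile.
    return f"secure channel to {address}"
```

With this shape, it does not matter which address the untrusted known access location returns: an arbitrary user cloud is rejected before any secret is sent.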

shankari · 2019-04-01T19:03:43Z

docs/future_work/NewArchitecture.md

+Steps 1-3 constitute the process of launching a user cloud. If the user cloud is already running then in step 2 rather than launch a new user cloud the known access location should just return the address of the user's user cloud which is already running (which it should be possible to authenticate, although we may want to produce some shared secret for existing user clouds).
+Step 5 launches a database instance. It will likely be necessary to keep the database running for much of the life of the user cloud. This step may instead consist of resuming the container or can be skipped if it is actively running.
+
+Below are diagrams showing a visual of the stages numbered with the appropriate steps. Untrusted entities are in pink while the trusted components are light green.


Why does the user cloud need to see encrypted data (in "Architecture after a user cloud is spawned.")? If we are using the private key to decrypt the contents of the filesystem, the database and the user cloud can directly read encrypted data from it, right?

shankari · 2019-04-01T19:14:11Z

docs/future_work/NewArchitecture.md

+
+### Working with a Subset of Data
+
+Another challenge is how to give algorithms approval for only a subset of data. For example imagine I wanted to give an algorithm access to all my travel data for only the previous month. The biggest challenge in this domain is managing the complexity it produces. Do we want to use a unique key for each permission category? What if a subset of data is approved for some algorithms but not others? What happens if a key is lost or needs to be changed? Ultimately we hope many of these issues can be avoided by our implicit trust in the data fetching server enclave, but we do have concerns about inflating its size and complexity given the important data it manages.


I think that this is essentially the problem that WAVE (from David's other student, M. Andersen) is trying to solve. Once we get there, we should definitely explore WAVE.

jf87

I just went through everything.
See my comments.

jf87 · 2019-04-03T09:47:17Z

docs/future_work/NewArchitecture.md

+## Overview
+
+The plans to change the e-mission architecture are oriented around keeping user data encrypted and only decrypting the data when an approved service or algorithm needs to run on the data. The general workflow for maintaining detail is:
+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.


I think you want a scheme similar to PGP: we create a random symmetric key and use it to encrypt the data with AES or similar. We then encrypt that key with the public key of the server, so that the server can securely decrypt the key and then use it to decrypt the actual data. This also avoids a situation where the compromise of a single secret key compromises all data.

jf87 · 2019-04-03T09:48:18Z

docs/future_work/NewArchitecture.md

+## Overview
+
+The plans to change the e-mission architecture are oriented around keeping user data encrypted and only decrypting the data when an approved service or algorithm needs to run on the data. The general workflow for maintaining detail is:
+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.


Also, I would add an overview figure which displays the main components and their interactions. Then it's easier to follow the steps you describe.

jf87 · 2019-04-03T09:55:43Z

docs/future_work/NewArchitecture.md

+
+The plans to change the e-mission architecture are oriented around keeping user data encrypted and only decrypting the data when an approved service or algorithm needs to run on the data. The general workflow for maintaining detail is:
+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.
+2. The user finds an algorithm which they wish to run on their data or an aggregating algorithm in which they comfortable participating. The user then acquires the hash for this algorithm (possibly with a QR code) and updates their profile on the server to grant permissions to run the algorithm.


Maybe each data segment that is sent to the server can be associated with some permissions. By default, the user's smartphone has full access. Then, when we allow access to other applications (algorithms), we can amend these permissions. This would allow us to have permissions at a finer granularity (e.g., one trip or one day of data) and not just either full access or none.
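
A minimal sketch of what per-segment permissions could look like, assuming each uploaded segment carries its own set of allowed readers. The names `DataSegment`, `OWNER`, and `readable_segments` are hypothetical and only illustrate the idea; nothing here reflects an existing e-mission data model.

```python
from dataclasses import dataclass, field
from datetime import date

OWNER = "owner-phone"  # hypothetical identifier for the user's smartphone


@dataclass
class DataSegment:
    """One unit of uploaded data, e.g. one trip or one day."""
    segment_id: str
    day: date
    # By default only the owner's phone may read the segment.
    allowed: set = field(default_factory=lambda: {OWNER})

    def grant(self, algorithm_id: str) -> None:
        """Amend the permissions to let an algorithm read this segment."""
        self.allowed.add(algorithm_id)

    def revoke(self, algorithm_id: str) -> None:
        """Remove an algorithm's access; the owner's access is never removed."""
        if algorithm_id != OWNER:
            self.allowed.discard(algorithm_id)

    def can_read(self, requester: str) -> bool:
        return requester in self.allowed


def readable_segments(segments, requester):
    """Return the ids of the segments the requester is allowed to read."""
    return [s.segment_id for s in segments if s.can_read(requester)]
```

Granting `"mode-inference"` on a single trip makes only that trip visible to the algorithm, while the owner keeps full access to every segment.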

jf87 · 2019-04-03T10:02:48Z

docs/future_work/NewArchitecture.md

+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.
+2. The user finds an algorithm which they wish to run on their data or an aggregating algorithm in which they comfortable participating. The user then acquires the hash for this algorithm (possibly with a QR code) and updates their profile on the server to grant permissions to run the algorithm.
+3. The user decides they want to run one of the algorithms they have approved. To do so they need to send their private key to the server so that it can decrypt their stored data. This is done by spawning a user enclave built through Graphene SGX running in a docker container. The user then remotely attests this container and once this establishes a secure between the user and enclave, the user transmits the private key over that channel.
+4. The server enclave uses the hash for the algorithm to determine a microservice to run on the server (or remotely). This then spawns a microservice enclave, which the server enclave will need to attest to develop a secure channel.


It seems there is no need to send the key. My assumption is that we use SGX to securely store the keys to the data and to store permissions. If an application wants to access user data, it calls some API on the server, and we can then go through SGX to verify whether the application should have access at all and at which granularity. It is then the responsibility of the application to make use of the data that it gets.
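
As a rough sketch of such an API, assuming three access levels and an in-memory permission table that stands in for state kept inside SGX: `Granularity`, `PERMISSIONS`, `TRIP_DISTANCES_KM`, and `get_user_data` are all hypothetical names, and the trip distances are made-up sample values.

```python
from enum import IntEnum
from statistics import mean


class Granularity(IntEnum):
    NONE = 0     # no access at all
    SUMMARY = 1  # aggregated values only
    RAW = 2      # individual data points


# Hypothetical permission table that would be kept inside the enclave.
PERMISSIONS = {
    "commute-stats": Granularity.SUMMARY,
    "trip-viewer": Granularity.RAW,
}

# Hypothetical, already-decrypted trip distances in km (sample values).
TRIP_DISTANCES_KM = [2.0, 4.0, 9.0]


def get_user_data(app_id: str) -> dict:
    """Return user data reduced to the granularity granted to app_id."""
    level = PERMISSIONS.get(app_id, Granularity.NONE)
    if level == Granularity.NONE:
        raise PermissionError(f"{app_id} has no access to this user's data")
    if level == Granularity.SUMMARY:
        return {
            "trip_count": len(TRIP_DISTANCES_KM),
            "mean_km": mean(TRIP_DISTANCES_KM),
        }
    return {"trips_km": list(TRIP_DISTANCES_KM)}
```

The key never leaves the server side in this shape: the application only ever sees the response that its permission level allows.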

jf87 · 2019-04-03T10:05:14Z

docs/future_work/NewArchitecture.md

+1. The user collects data from the application. This application uses a phone specific private key to encrypt the data and sends the encrypted data to the server.
+2. The user finds an algorithm which they wish to run on their data or an aggregating algorithm in which they comfortable participating. The user then acquires the hash for this algorithm (possibly with a QR code) and updates their profile on the server to grant permissions to run the algorithm.
+3. The user decides they want to run one of the algorithms they have approved. To do so they need to send their private key to the server so that it can decrypt their stored data. This is done by spawning a user enclave built through Graphene SGX running in a docker container. The user then remotely attests this container and once this establishes a secure between the user and enclave, the user transmits the private key over that channel.
+4. The server enclave uses the hash for the algorithm to determine a microservice to run on the server (or remotely). This then spawns a microservice enclave, which the server enclave will need to attest to develop a secure channel.


Also, instead of a hash, might it be possible to use WAVE?

jf87 · 2019-04-03T10:12:09Z

docs/future_work/NewArchitecture.md

+4. The server enclave uses the hash for the algorithm to determine a microservice to run on the server (or remotely). This then spawns a microservice enclave, which the server enclave will need to attest to develop a secure channel.
+5. The server enclave sends the data to the microservice to use in conducting its algorithm. In doing so, the server enclave will decrypt the data inside the secure enclave and then transmit the data over the secure channel formed between the enclaves.
+6. The microservice performs the algorithm and returns to the server enclave the output of the algorithm.
+7. The server enclave then returns to the user the result of running the algorithm.


So does this mean that the algorithm will run outside SGX? How do we protect user data when it is processed by an algorithm?

jf87 · 2019-04-03T10:18:49Z

docs/future_work/NewArchitecture.md

+#### Aggregate Algorithms
+
+It is also possible for a user to agree to be a participant in algorithms that aggregate over larger groups of data. This requires a few changes to the architecture and a different form of interaction. First to facility these algorithms that are not requested by the user it is necessary to have the server enclave available even when a user is offline. To do this we will keep the server enclave running with the private key and the user profile and only shut down the enclave upon request from the user or if it necessary to update details about the profile or key in a manner which modifies existing behavior. 
+Since aggregation also occurs independent of user requests it is no longer feasible to have the server enclave launch a microservice. Instead the group intending to perform aggregation with launch an aggregator enclave which will launch a new enclave per user which produces a scalar value based upon the user's data. That scalar enclave will communicate directly with the server enclave to get the data and will need to be stored in the profile. Then this scalar can be directly communicated to the aggregator enclave to compute the aggregate result over the data.


Why can we not calculate this scalar in the user enclave itself?
In my mind, user enclaves can provide the same interface to user-initiated algorithms and to aggregate algorithms initiated by third parties. In both cases there is a request to the user enclave for some data range, with some algorithm ID. The permissions stored in the user enclave then ensure the correct response. Does this make sense?
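
To make the proposal concrete, here is a minimal sketch of a single request interface, assuming a request is just a data range plus an algorithm ID and that the scalar is computed inside the user enclave. `Request`, `UserEnclave`, and the two registered algorithms are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Request:
    """The same request shape for user-initiated and aggregate algorithms."""
    algorithm_id: str
    start: date
    end: date


class UserEnclave:
    """Toy stand-in for a user enclave holding data and permissions."""

    def __init__(self, distances_by_day, approved_algorithms):
        self._distances = dict(distances_by_day)  # {date: km}
        self._approved = set(approved_algorithms)
        # Hypothetical registry of algorithms that reduce data to a scalar.
        self._algorithms = {"total-distance": sum, "trip-days": len}

    def handle(self, request: Request) -> float:
        """Check the stored permissions, then compute the scalar in place."""
        if request.algorithm_id not in self._approved:
            raise PermissionError(f"{request.algorithm_id} is not approved")
        algorithm = self._algorithms[request.algorithm_id]
        values = [
            km for day, km in self._distances.items()
            if request.start <= day <= request.end
        ]
        return algorithm(values)
```

Under this assumption a third-party aggregator would send the same `Request` to each participating user enclave and combine the returned scalars, so raw data never leaves any enclave.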

jf87 · 2019-04-03T10:20:21Z

docs/future_work/NewArchitecture.md

+
+### Working with a Subset of Data
+
+Another challenge is how to give algorithms approval for only a subset of data. For example imagine I wanted to give an algorithm access to all my travel data for only the previous month. The biggest challenge in this domain is managing the complexity it produces. Do we want to use a unique key for each permission category? What if a subset of data is approved for some algorithms but not others? What happens if a key is lost or needs to be changed? Ultimately we hope many of these issues can be avoided by our implicit trust in the data fetching server enclave, but we do have concerns about inflating its size and complexity given the important data it manages.


Ok, I also commented regarding WAVE above ;-)

shankari · 2019-04-15T21:32:31Z

Merging this for now to make it easier to read.

njriasan added 5 commits March 6, 2019 13:46

added NewArchitecture but still needs a visual

a652ac3

added images and detailed steps

8f70714

trying an img fix

71934e3

cleaned up images a bit

b57138e

added inline html for centering

5e80ef9

shankari reviewed Apr 1, 2019

jf87 reviewed Apr 3, 2019

PatGendre mentioned this pull request Apr 5, 2019

user requirements related to "self data" #364

Open
