Async usercall interface for SGX enclaves #515

Merged (19 commits into fortanix:master, Jan 18, 2024)

Conversation

@vn971 vn971 commented Aug 28, 2023

Entering and exiting an SGX enclave is costly in terms of performance. It is much more efficient to keep executing within the enclave and communicate with the enclave-runner by passing messages. The tokio runtime can be used for such asynchronous communication.
This PR provides very basic support for this in EDP, but changes to mio and tokio still need to be upstreamed. These changes are fully backwards compatible; your existing enclaves will continue to run as expected.

Credits for this PR go to:
Mohsen: #404
YxC: #441

This commit is an attempt to have the async-usercalls finally merged into the main codebase (master branch).

@vn971 vn971 force-pushed the vas/async-usercalls-scratch branch from 12728f3 to f917b37 on August 29, 2023
@vn971 vn971 force-pushed the vas/async-usercalls-scratch branch 5 times, most recently from c2eae70 to 7dfaf7d on September 8, 2023
@raoulstrackx raoulstrackx (Contributor) left a comment:

Most things I found are minor, or can/should be picked up as part of another ticket.

Resolved review threads:
  • ipc-queue/src/position.rs (multiple threads)
  • intel-sgx/enclave-runner/src/usercalls/abi.rs
  • intel-sgx/enclave-runner/src/usercalls/mod.rs
  • intel-sgx/async-usercalls/src/callback.rs
  • intel-sgx/async-usercalls/src/lib.rs
  • .travis.yml
  • intel-sgx/async-usercalls/src/io_bufs.rs
  • intel-sgx/async-usercalls/src/queues.rs
@raoulstrackx (Contributor) commented:

Added one last comment related to MakeSend and ticket #530, but will approve the merge once this passes further testing.

@DragonDev1906 commented:

Short Questions (I have not read all the changes):

  • What exactly is meant by "async usercall interface"? As far as I can tell, the enclave_runner side already uses futures and thus allows writing UsercallExtensions with async functions. Does this change only affect the internals of rust-sgx (i.e. is it invisible from the enclave or runner), does it change the runner interface, or does it change how the enclave code can interact with the outside?
  • Are there examples of how the async usercall interface is used (from the application developer's side, if it doesn't just affect internals)?

At the moment I'm not sure what exactly is meant by "async usercall interface".

@raoulstrackx (Contributor) commented:

Good question @DragonDev1906, I've updated the description of this PR to make things clearer. Let me know if you still have questions. This PR doesn't have examples, but we'll add some once the changes to mio and tokio have been upstreamed and things are easier to use.

@DragonDev1906 commented Oct 16, 2023

Nice, I've had a few issues with dependencies that rely on tokio with the net feature, which made it impossible to use them. Thank you for the clarification.

I do have two more questions (though I'm not sure if this is the right place to ask them):
At the moment I only have sync code in the enclave, with a custom runner (using tokio and handling TLS termination, where I don't need it in the enclave) responsible for pushing data received from other systems to the enclave. Basically I'm just sending a continuous list of commands with data and processing any results returned from the enclave.

  • Will enclaves without async code, where the runner doesn't have to wait for the enclave to finish before sending the next command, benefit from this change? (I think it might be a good idea not to use async code in the enclave (application code) to lower the complexity, at least if the problem can be converted to a list of commands to execute, though please correct me if I'm wrong on that part.)
  • The second one is a bit less related to async usercalls, but since you've asked if I had more questions ...
    I'm currently contemplating which implementation is best for the situation stated above (goal: high throughput; I'm getting a medium amount of data but also need some computation on it, mainly hashing and signature verification, so I might even end up compute-bound):
    1. Communicate via TCP (no custom runner needed); likely slow because it needs to go into kernel space.
    2. Communicate via the existing usercall extensions (hence the question of whether there will be performance benefits in this situation).
    3. Communicate via the async usercall interface (unless that only makes sense when the enclave itself runs async code).
    4. Use the enclave in library mode. At the moment I have no idea how to estimate the performance of this approach, as it basically means no async at all (as far as I can tell) and having to wait for the previous call to finish before continuing. It may save on serialization and deserialization (unless that's done automatically anyway), but I think it gives less flexibility than a (buffered) TCP or (async) usercall extension.

I plan to test the throughput of those options, but perhaps you already have some experience or suggestions about which option may be the slowest or most inefficient, especially library mode, and whether such a system would even benefit from the async usercall interface changes. (It could also be useful to have such a comparison of communication options somewhere in the docs.)

(so many questions, sorry)

@raoulstrackx (Contributor) commented Oct 17, 2023

No worries @DragonDev1906

Will enclaves without async code, where the runner doesn't have to wait for the enclave to finish before sending the next command, benefit from this change?

No, without changes to your code, this PR doesn't have any impact for you.

... which implementation is best for the above stated situation...
i. Communicate via TCP (no custom runner needed), likely slow because it needs to go into kernel space

If you use the changes in this PR to build an async enclave, your code will be a bit more readable. The biggest change is that you don't need to enter/exit the enclave to request new commands or return responses. If the enclave is compute-heavy, the performance benefit of that may be minimal. Async code works best when it no longer blocks on I/O but can do something useful while it waits for some event. Based on your description, you may already be doing that with a custom runner.

ii. Communicate via the existing Usercall Extensions

See previous answer

iii. Communicate via the async usercall interface (unless that only makes sense when the enclave itself runs async code).

Yes, that only makes sense if the enclave runs async code.

iv. Use the enclave in library mode.

That seems unrelated to whether you write sync or async code.

@DragonDev1906 commented:

Biggest change would be that you don't need to enter/exit the enclave to request new commands/return responses.

Just to see if I understood that correctly: the changes in this PR (when using the new async interface) mean that multiple usercalls can/will be batched into a single ECALL (enter/exit), with the ability to use async code to send multiple usercalls without waiting for the response. But there still needs to be at least one ECALL (for the entire batch) before the runner can process the usercalls, and the same for the way back, correct?


Just some info if you're interested, @raoulstrackx:

Based on your description, you may already be doing that with a custom runner.

Yeah, my enclave is not waiting for any responses for requests sent out (that's handled outside the enclave) and only blocks while trying to read new commands (currently via TCP) or writing results (also via TCP), but new commands don't depend on previous results unless something goes wrong.

If you use the changes in this PR to build an async enclave, your code will be a bit more readable. [...]
Based on your description, you may already be doing that with a custom runner.

I've thought about implementing it in an "enclave requests the data and waits for the response" way, where async usercalls would likely be a big performance benefit and/or be a lot more readable. My conclusion was that there is a rather big trade-off:

  • If the enclave does the requests directly, without an intermediary, it needs to terminate TLS. That is necessary for most use cases, but for me the data integrity is provided in the data itself, using hashes, merkle trees and signatures, so TLS termination in the enclave only added complexity.
  • If I have a simple intermediary that just strips TLS and the enclave sends requests to get some data (typical async model), the requesting logic would be simpler, as the runner wouldn't have to know what data is needed next, but it hides things a malicious runner could do. Additionally, if there is a bug in the enclave code (e.g. requesting the wrong data), the enclave code would need to be updated, not just the runner code (in our case updating the runner code is a lot easier).
  • With the approach I've now chosen (which I hopefully won't regret): the runner provides the data and the enclave just checks its validity (and whether anything is missing). The runner clearly has the ability to decide the order of the data/commands (which it kind of could do anyway, but not as easily), and a change to the order only requires updating the runner code. The enclave code is simpler: it doesn't need much code for network communication, it doesn't need async code, and its execution can be deterministic given the input (hard to do with async), which makes auditing the enclave code easier. But that comes at additional complexity in the code generating the commands to run, and thus a bigger chance that the entire system stops working until the runner is updated. The main disadvantage is having to split the processing logic from the data fetching.

I'm not yet sure if this architecture is going to bite me at some point. It's good to know that there will be an efficient way to implement it in an "enclave asks for data" way, should the need arise because a complete separation of fetching and logic gets too difficult.

@raoulstrackx (Contributor) commented:

@DragonDev1906 sorry, I forgot to reply to your comment.

The changes in this PR (when using the new async interface) are going to mean that multiple usercalls can/will be batched into a single ECALL (enter/exit),

Strictly speaking: yes, but I think you misunderstood a bit how EDP is expected to be used. The idea is to run an entire application in the enclave. So the single ECALL you refer to comes from the enclave-runner calling the enclave for the very first time. This eventually leads to the enclave calling your main function within its boundaries. From then on, all usercalls can be done asynchronously from within the enclave. See also the enclave execution lifecycle.

For questions/comments not specifically related to this PR, let's switch to the #rust-sgx channel in the Runtime-Encryption Slack workspace.

fn make_progress(&self, deferred: &[Identified<Usercall>]) -> usize {
    let sent = self.core.try_send_multiple_usercalls(deferred);
    if sent == 0 {
        self.core.send_usercall(deferred[0]);
A reviewer (Contributor) commented:

I think there's a potential runtime error here if someone calls make_progress with an empty slice for deferred. I don't think there's currently a place that calls this function with an empty slice, but someone could change the code in the future.
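A minimal sketch of the suggested guard, assuming simplified stand-in types for the real Identified<Usercall> and sender core (the excerpt above is truncated, so everything after the blocking send is illustrative):

struct Core;

impl Core {
    // Stub: pretend every usercall was sent.
    fn try_send_multiple_usercalls(&self, usercalls: &[u64]) -> usize {
        usercalls.len()
    }
    // Stub: a blocking send of a single usercall.
    fn send_usercall(&self, _usercall: u64) {}
}

struct Provider {
    core: Core,
}

impl Provider {
    fn make_progress(&self, deferred: &[u64]) -> usize {
        // Returning early avoids the out-of-bounds panic on `deferred[0]`
        // below when the slice is empty.
        if deferred.is_empty() {
            return 0;
        }
        let sent = self.core.try_send_multiple_usercalls(deferred);
        if sent == 0 {
            self.core.send_usercall(deferred[0]);
            return 1;
        }
        sent
    }
}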

A reviewer (Contributor) replied:

Good catch!

arai-fortanix previously approved these changes Nov 27, 2023
        return;
    }
    let sent = self.make_progress(&self.deferred);
    let mut not_sent = self.deferred.split_off(sent);
A reviewer (Contributor) commented:

Is there a reason why we're using the methodology here of doing split_off/clear/append rather than using drain(0..sent) or something like a VecDeque?

I suspect that the answer is that the common case is expected to be that make_progress() should normally be able to process the entire queue, which makes the split_off() and append() effectively no-ops. Which is fine, but it's probably useful to leave a comment for the next person who looks at this code and wonders the same thing.

@raoulstrackx raoulstrackx replied Jan 2, 2024:
Yes it works, but your drain(..sent) seems more readable. I'll update.
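A small self-contained sketch of the drain-based variant agreed on above; deferred stands in for the queue of pending usercalls and sent for how many make_progress reported as sent:

fn main() {
    let mut deferred: Vec<u32> = vec![10, 20, 30, 40];
    let sent = 2;
    // drain(..sent) removes the sent prefix in place, replacing the
    // split_off/clear/append sequence with a single call.
    deferred.drain(..sent);
    assert_eq!(deferred, vec![30, 40]);
}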

Resolved review thread: intel-sgx/async-usercalls/src/batch_drop.rs
}
let mut wrote = 0;
for buf in bufs {
    wrote += self.write(buf);
A reviewer (Contributor) commented:

If self.write() ever returns 0, we can break out of the loop. This might be an efficiency concern if the slice of bufs is long and we're frequently writing to a nearly-full buffer.

A reviewer (Contributor) replied:

There is even a strong requirement to return immediately when write returns 0: following writes may still succeed and thus increase the wrote counter. The caller would then incorrectly assume that the first slices were written while the latter ones weren't, which may not be the case.
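A sketch of the early-return behavior both comments converge on; write_bufs and the toy writer below are hypothetical stand-ins, not code from this PR:

fn write_bufs(bufs: &[&[u8]], mut write: impl FnMut(&[u8]) -> usize) -> usize {
    let mut wrote = 0;
    for buf in bufs {
        let n = write(buf);
        wrote += n;
        // Once a write returns 0 (the buffer is full), stop immediately so
        // that `wrote` only ever covers a contiguous prefix of `bufs`.
        if n == 0 {
            break;
        }
    }
    wrote
}

fn main() {
    // A toy writer that accepts at most 5 bytes in total.
    let mut capacity = 5usize;
    let bufs: [&[u8]; 3] = [b"abc", b"defg", b"hi"];
    let wrote = write_bufs(&bufs, |buf| {
        let n = buf.len().min(capacity);
        capacity -= n;
        n
    });
    assert_eq!(wrote, 5); // "abc" fully, "de" of "defg", then stop
}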

Review threads: intel-sgx/async-usercalls/src/lib.rs (two, both resolved)
    n => &returns[..n],
};
// 2. try to lock the mutex, if successful, receive all pending callbacks and put them in the hash map
let mut guard = match self.callbacks.try_lock() {
A reviewer (Contributor) commented:

It's unclear to me why the try_lock() is useful here. It would seem simpler, as efficient, and possibly more correct to write this as something like:

let mut guard = self.callbacks.lock().unwrap();
for (id, cb) in self.callback_rx.try_iter() {
    guard.insert(id, cb);
}

What am I missing?

@raoulstrackx raoulstrackx replied Jan 3, 2024:

I think this is meant as an optimization: when self.callbacks.try_lock() fails, it means multiple threads are competing for this resource, and the code then avoids receiving more callbacks. I can imagine that self.callback_rx.try_iter() is a potentially time-costly operation.
But I don't understand why this is correct in all cases. When the system is under very heavy load, the result of the usercall may be received in step 1 before the callback is added to self.callbacks. That eventually leads to the callback never being called. There's also a memory leak: the callback is eventually received in step 2, but never gets removed from the hash map.
@mzohreva Are we missing something?

A reviewer (Contributor) replied:

I think you are right, this should be simplified.

pub fn insert(&mut self, value: T) -> u32 {
    let initial_id = self.next_id;
    loop {
        let id = self.next_id;
A reviewer (Contributor) commented:

There's an extremely unlikely but severe potential performance problem with this approach for allocating ids. Suppose you have 2^32 - 1 calls that never terminate, and then a single other call which is fast but you never have more than one of those outstanding at once. Each time you make the fast call, you'll need to do 2^32 map lookups to find the available id.

I don't think we need to worry about this in practice.
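For reference, a self-contained sketch of the wrap-around id allocation being discussed, with a plain HashMap standing in for the PR's actual storage (names are illustrative):

use std::collections::HashMap;

struct CallMap<T> {
    map: HashMap<u32, T>,
    next_id: u32,
}

impl<T> CallMap<T> {
    pub fn insert(&mut self, value: T) -> u32 {
        let initial_id = self.next_id;
        loop {
            let id = self.next_id;
            self.next_id = self.next_id.wrapping_add(1);
            if !self.map.contains_key(&id) {
                // Free id found. In the pathological case described above,
                // nearly the entire 2^32 id space is scanned to get here.
                self.map.insert(id, value);
                return id;
            }
            // All 2^32 ids in use: avoid spinning forever.
            assert_ne!(self.next_id, initial_id, "id space exhausted");
        }
    }
}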

The same reviewer added:

But it does make me wonder if we might want some way to limit the length of the call queue to something smaller than 2^32, either blocking when the queue is full or switching to making the call synchronously in that case. This might not be useful for DSM, but I can imagine some type of EDP application in which enclave threads generate asynchronous calls faster than they can be serviced, and you might want the ability to either throttle the enclave threads when the queue gets full or switch to synchronous ocalls, instead of growing the queue without bounds and panicking when you run out of memory.
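A hypothetical sketch of the throttling idea: cap the number of outstanding async usercalls and block the enclave thread at the cap. Nothing like this exists in the PR; CallLimiter and its methods are illustrative only:

use std::sync::{Condvar, Mutex};

struct CallLimiter {
    outstanding: Mutex<usize>,
    cv: Condvar,
    cap: usize,
}

impl CallLimiter {
    fn new(cap: usize) -> Self {
        CallLimiter { outstanding: Mutex::new(0), cv: Condvar::new(), cap }
    }

    // Call before enqueueing an async usercall; blocks while the number of
    // outstanding calls is at the cap.
    fn acquire(&self) {
        let mut n = self.outstanding.lock().unwrap();
        while *n >= self.cap {
            n = self.cv.wait(n).unwrap();
        }
        *n += 1;
    }

    // Call once a usercall's result has been handled.
    fn release(&self) {
        *self.outstanding.lock().unwrap() -= 1;
        self.cv.notify_one();
    }
}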

A reviewer (Contributor) replied:

I agree this is a potential problem, but it can/should be fixed as a separate issue. @arai-fortanix what do you think?

A reviewer (Contributor) replied:

Fixing later seems fine.

A reviewer (Contributor) replied:

Created issue #550 for this

Vasili Novikov added 3 commits January 17, 2024 11:51
Credits for this commit go to:
Mohsen: fortanix#404
YxC: fortanix#441

This commit is an attempt to have the async-usercalls finally merged into the main codebase (master branch).
@vn971 vn971 force-pushed the vas/async-usercalls-scratch branch from 273c394 to 2c49170 on January 17, 2024
This change ports the old tests from .travis.yml to the new GitHub Actions setup used on the current master branch.

@vn971 vn971 force-pushed the vas/async-usercalls-scratch branch from 2bf3f6c to 51ea191 on January 17, 2024
@raoulstrackx raoulstrackx added this pull request to the merge queue Jan 18, 2024
Merged via the queue into fortanix:master with commit fe323cb Jan 18, 2024
1 check passed