Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return to a pure Sync API #28

Closed
wjones127 opened this issue Aug 25, 2023 · 1 comment · Fixed by #38
Closed

Return to a pure Sync API #28

wjones127 opened this issue Aug 25, 2023 · 1 comment · Fixed by #38

Comments

@wjones127
Copy link
Collaborator

TDLR: return the API to be synchronous. Make the Rust default TableClient use async in it's implementation, by embedding a reference to a runtime/executor.

Why not async?

  • Async API brings complexity to the FFI layer
  • Async call stacks can be more difficult to debug
  • Offering both a sync and async API will be too much maintenance burden. In prototyping a parallel API, we found there was no way around writing duplicate code. See: feat: parallel sync and async APIs #25

How can TableClient default impl still use async?

Async is most valuable at the IO level to bring IO concurrency, so our TableClient methods should be designed to allow for concurrent IO operations in an async runtime.

For example, consider the FileSystemClient.

pub trait FileSystemClient: Send + Sync {
/// List the paths in the same directory that are lexicographically greater or equal to
/// (UTF-8 sorting) the given `path`. The result should also be sorted by the file name.
async fn list_from(&self, path: &Url) -> DeltaResult<BoxStream<'_, DeltaResult<FileMeta>>>;
/// Read data specified by the start and end offset from the file.
async fn read_files(&self, files: Vec<FileSlice>) -> DeltaResult<Vec<Bytes>>;
}

The list_from method could be a synchronous iterator, but under the hood the implementation could use asynchronous calls to buffer the next page. Similarly, read_files could also be implemented with async calls to read files concurrently.

Implementing correctly

If we want to run async functions inside of sync helpers, we need to submit them to some sort of executor. Thus, a table client that wants to run async tasks needs to hold a reference to an executor itself.

The main danger to be avoided is calling tokio::Runtime::block_on() from within one of the helpers. Users might wrap the kernel high-level APIs in a tokio runtime, and tokio will panic if block_on() is called while inside an async runtime.

We may wish to make it so the user could configure the executor used. For example, they might want to pass down their own tokio runtime.

@mmgeorge
Copy link

I understand why this was done, but I think this change makes it quite difficult to integrate with web clients which cannot block. Not sure if anyone has any tips or has tried this yet? Without support for atomics / shared memory which require certain flags, I think I'm forced to either maintain a separate async fork or try do to something hacky like throwing specific retry errors and caching everything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants