Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async or sync APIs #19

Closed
kpozin opened this issue Mar 27, 2020 · 21 comments · Fixed by #28
Closed

Async or sync APIs #19

kpozin opened this issue Mar 27, 2020 · 21 comments · Fixed by #28
Assignees
Labels
A-design Area: Architecture or design C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Milestone

Comments

@kpozin
Copy link
Contributor

kpozin commented Mar 27, 2020

Our data loading involves reading files on demand, and sometimes even requesting data from a REST provider over HTTP.

Is the plan to make all the APIs asynchronous? Or to have two versions of each method?

CC: @sffc @zbraniecki

@zbraniecki
Copy link
Member

@Manishearth

I think we should aim to have our I/O async, because it enforces a lot of downstream async decisions, and plugging sync into async is easy. The reverse is hard.

For async to work we mainly have to decide between tokio and async_std. We also have to understand that async is relatively new in Rust.
I expect us to hit complications downstream in some complex scenarios of interactions between lifetimes, iterators, and Futures. Those wrinkles will get smoothed out Rust matures GAT, generators, async iterators etc. but initially it may add some headache.

@nciric
Copy link
Contributor

nciric commented Mar 27, 2020

I think that core OmnICU code shouldn't care about where and how data comes to it. Wrapper layer should take care of sync/async decision.

For example, Wasm port would be wrapped in JS with async code that downloads locale data from the server, and passes it down to Wasm layer for final date formatting.

@zbraniecki
Copy link
Member

I think that core OmnICU code shouldn't care about where and how data comes to it.

I'm not sure if its possible.

If you have any code that does:

let data = getData();
operateOnData(data);

and that code has to be possible to be async, then it trickles down to each and every caller who has to accept an async callsite. There are core algos that are sync input, sync output, sure, but if my constructor takes a DataProvider and that DataProvider potentially is async, then I have to write all of the data dependent functionality as async and my maximize or minimize methods must be async as well.

@sffc
Copy link
Member

sffc commented Mar 27, 2020

I think constructors / factories should be async, and methods should be sync. In other words, you can "await" a new DateTimeFormat, and then when the DateTimeFormat is ready, it has all data that it needs ready to go for the format methods.

@nciric
Copy link
Contributor

nciric commented Mar 27, 2020

Then I would say that your constructors/factories belong to the client wrapper code.

The low level constructor should be:

var x = ugly_DateTimeFormat(locale, options, data);
...
x.format();

The wrapper should get data part from file/server and serve to the sync, base code.

So I agree that some part may have to be async - e.g. Chrome loads all of the data on IO thread, but it doesn't have to go all the way down.

@zbraniecki
Copy link
Member

The low level constructor should be:

Yes, that's one option, but I don't know if this is what we want to do. This basically means that we're writing a fully synchronous API which handles dataprovider only from the outside and expects it to be synchronous later.

That means that the provider has to have all the data at the moment when its fed to the constructor and cannot be augmented during lifetime and cannot be requested for any data update or additional data download.

That may be severely limiting.

@sffc
Copy link
Member

sffc commented Mar 27, 2020

I had discussed the "inline wrapper layer" approach last week (I still need to make an explainer for it), and in that model, it could be possible for data loading to occur exclusively in the wrapper layer. In that case, we could expose user-facing async functions, but the functional exports from OmnICU would be synchronous and have the data already loaded for them.

On the other hand, putting data loading in the wrapper layer makes the wrapper layer even more complex than it already is, and it may be limiting to us down the road in unknown ways.

@nciric
Copy link
Contributor

nciric commented Mar 28, 2020

That means that the provider has to have all the data at the moment when its fed to the constructor and cannot be augmented during lifetime and cannot be requested for any data update or additional data download.

That may be severely limiting.

But aren't we aiming to have immutable objects? Once I create a DateFormatter for sr I don't expect it to change. If we add data for pl after that, it will only affect new objects created with Polish locale.

Or am I misunderstanding the issue?

@sffc
Copy link
Member

sffc commented Mar 28, 2020

I think the question of whether data loading goes into the wrapper layer or deeper in the implementation code is tangential to the question of whether data loading is sync or async. And, even if data loading happens in the wrapper layer, it still requires async APIs in Rust.

@nciric
Copy link
Contributor

nciric commented Mar 28, 2020

I think the question of whether data loading goes into the wrapper layer or deeper in the implementation code is tangential to the question of whether data loading is sync or async. And, even if data loading happens in the wrapper layer, it still requires async APIs in Rust.

I think async requirement was never a question for me. So to answer @kpozin question - async API will be available, from client's perspective. How deep would asyncness go is another question (that we should hash out before implementing first API).

@zbraniecki
Copy link
Member

But aren't we aiming to have immutable objects? Once I create a DateFormatter for sr I don't expect it to change.

That's not necessarily the case. I can see a scenario where you may want to pull additional data from the provider or react to some dynamic event during lifetime of the object by updating data in the provider.

Two scenarios I have in mind:

  1. Formatter learns that it needs currency names for polish - it asks DataProvider to provide them, DataProvider notices that it needs this data lazily and pulls it.

Without that option, all data has to be loaded before the provider gets passed to the constructor. That may be limiting and costly. lazy data loading is powerful and allows for significant perf/mem wins.

  1. Formatter uses some data and during the app lifetime OmnICU gets an updated data from remote source. We'd like to update the data in long-living objects.

Without that option, you need to handle data updates separately and invalidate/reinitiate formatters.

We could handle that in several ways, including recreating instances on data updates, or having just one event to override dataprovider in the instance once the data loads. That may fix (2), but likely will still be insufficient or clunky for (1).

I think we should keep an option open that we'll want to handle async data providers.

@sffc
Copy link
Member

sffc commented Mar 28, 2020

Clients tend to assume calling a format function is synchronous; introducing asynchronicity into every terminal function* call is a huge deal that we shouldn't take lightly.

To respond to Zibi's two scenarios:

  1. Formatter learns that it needs currency names for polish - it asks DataProvider to provide them, DataProvider notices that it needs this data lazily and pulls it.

The formatter should know that it needs currency names for polish in the constructor. The constructor knows the locale, and it knows whether or not currency long names were requested as a formatting options. This is data that should additionally be known by static analysis, as I explained last week.

  1. Formatter uses some data and during the app lifetime OmnICU gets an updated data from remote source. We'd like to update the data in long-living objects.

If the constructor is async, then the data should be loaded before the promise resolves to the immutable formatter object.

I'm not convinced this is a scenario we should necessarily plan for, but if you did want to do it, you still can, externally to OmnICU: you can wait for an event to fire when new data is available, and then build new formatters and swap them into place, replacing the old formatters.

Also: even if we wanted to cover this scenario within OmnICU, I think doesn't actually require async terminal functions*; the function can check synchronously if updated data is available in a cache somewhere, and then swap it in and use it, all synchronously.

* A "terminal function" is a method that you call on a constructed OmnICU object, like .format(), .select(), or .segment().

@sffc
Copy link
Member

sffc commented Mar 28, 2020

(updated my previous post a bit)

To be clear, I'm all in favor of async data providers; I'm just saying that we should assume we no longer need the data providers after the factory methods return. During the construction phase, OmnICU should get all the data it might need (based on the input options), and store it in the memory space associated with the formatter instance such that terminal method calls can be fast and synchronous.

@zbraniecki
Copy link
Member

Clients tend to assume calling a format function is synchronous; introducing asynchronicity into every terminal function* call is a huge deal that we shouldn't take lightly.

I am not suggesting that.

The formatter should know that it needs currency names for polish in the constructor.

I can see scenarios where ability to lazily pull data would allow us to produce better quality software. It may not require us to make all methods asynchronous tho.

you can wait for an event to fire when new data is available, and then build new formatters and swap them into place, replacing the old formatters.

Right, or we could allow for updating providers inside the formatter:

async fn on_update() {
  let new_provider = fetch_provider().await;
  formatter.set_provider(new_provider);
}

right?

@zbraniecki
Copy link
Member

This leaves one scenario - lazy data loading. Seems like Shane and Nebojsa are of the opinion that the provider should have all data needed for the constructor available synchronously at construction time.

I'm suggestion that there may be a value in having ability to lazily fetch additional data. Not sure yet how it would work, and it may end up being on the wrapper, but something like:

async fn create_currency_formatter() -> NumberFormat {
  data_provider.ensure_currency().await;
  let nf = NumberFormat::new_with_provider(&data_provider);
  return nf;
}

@sffc
Copy link
Member

sffc commented Mar 28, 2020

you can wait for an event to fire when new data is available, and then build new formatters and swap them into place, replacing the old formatters.

Right, or we could allow for updating providers inside the formatter:

async fn on_update() {
  let new_provider = fetch_provider().await;
  formatter.set_provider(new_provider);
}

right?

I think Scenario 2 is something that can be built on top of OmnICU pretty easily with the model of async constructor, sync terminal. For example (apologies if my syntax is not correct; still learning):

// OmnICU code:

struct ICUCurrencyFormatter {
  data: /* a type that includes all data required for this particular formatter */
}

async fn create_icu_currency_formatter(data_provider: &dyn DataProvider) {
  let data = data_provider.get_currency_data().await;
  return ICUCurrencyFormatter { data: data };
}

// Client code:

struct AutoUpdatingCurrencyFormatter {
  p: DataProvider;
  f: Option<ICUCurrencyFormatter>;
}

impl AutoUpdatingCurrencyFormatter {
  async fn format(&self, num: i64) -> Result<Err, String> {
    if (self.f == None || self.p.has_new_data()) {
      self.f = create_icu_currency_formatter(self.p).await;
    }
    return self.f.format(i64);
  }
}

// Example call site:
data_provider: MyDataProvider = // ...
auto_fmt = AutoUpdatingCurrencyFormatter {
  p: data_provider
};
let result = auto_fmt.format(100).await;

This leaves one scenario - lazy data loading. Seems like Shane and Nebojsa are of the opinion that the provider should have all data needed for the constructor available synchronously at construction time.

Not exactly; I'm of the opinion that the constructor should be async such that it can use an async data provider. Once the constructor resolves, it should no longer need any more data from the async data provider, and the object can become synchronous from that point forward. (Thus, "async factory, sync terminal")

I'm suggestion that there may be a value in having ability to lazily fetch additional data. Not sure yet how it would work, and it may end up being on the wrapper, but something like:

async fn create_currency_formatter() -> NumberFormat {
  data_provider.ensure_currency().await;
  let nf = NumberFormat::new_with_provider(&data_provider);
  return nf;
}

Yes, that is an example of an async factory method loading data and passing it synchronously into OmnICU. Cira suggested something similar in #19 (comment). This is one possible model we could adopt. Another model is if we put the data resolution a little deeper in the constructor code, rather than up front in the wrapper layer.

@Manishearth
Copy link
Member

I think if possible all asyncness should be on the provider, and totally up to the provider on how to make work. So individual providers may have pluggable networking hooks, or whatever.

We should not be picking async-std or tokio here, at best we should have APIs that use Futures, letting users pick whichever framework they'd like. I'd point out that async IO frameworks like async-std and tokio are mostly useful when you're in a web server context, for applications they're less useful and really just spawning a thread to do the fetch should be enough.

@hagbard
Copy link
Contributor

hagbard commented Apr 2, 2020

I'm 100% in agreement about not having any async or risk of failure in the fetching of individual pieces of data".

Yes, this means that a message formatter will need to be constructed with all knowledge of any data it might need (which will affect the API for constructing it) but there's really no other option, since lazy loading can always fail, and failure isn't something the APIs should expose to users at that level (e.g. "You 20th message formatter call failed inside the UI thread because data wasn't available for that one locale" - no business logic will have a good response to this).

Failure of the data suppliers is all about a failure of system configuration. It's not in the hands of business logic, so business logic shouldn't be expected to code around it.

When constructing these things we should think about returning some semantically understandable information about the available data (e.g. ask for Polish formatter, get English - code will not fail at runtime and it's not likely the business logic can do better, but you might want to apologise to the end user). Just returning "this is the locale of the data we have" isn't great since there's a gulf between "pt_PT/RB" and "en_GB/US". Was it a "good" match or a "bad" match is probably the sort of level an end user will care about.

@zbraniecki
Copy link
Member

I'm happy with the conclusion here. Let's focus on keeping async to Provider.

@nciric
Copy link
Contributor

nciric commented Apr 15, 2020

Should we close this one as resolved and move discussion to actual proposals for providers?

@sffc
Copy link
Member

sffc commented Apr 15, 2020

OK. I think there's not much more to discuss here. I mention in #28 that sync/async is up to the host language and data provider.

@sffc sffc closed this as completed Apr 15, 2020
@sffc sffc linked a pull request Apr 17, 2020 that will close this issue
@sffc sffc added the T-docs-tests Type: Code change outside core library label Apr 17, 2020
@sffc sffc added A-design Area: Architecture or design C-meta Component: Relating to ICU4X as a whole labels May 7, 2020
@sffc sffc self-assigned this May 7, 2020
@sffc sffc added this to the 2020 Q2 milestone Jun 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-design Area: Architecture or design C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants