
Take loader_factory instead of loader in JsonldOptions #143

Closed
damooo opened this issue Nov 6, 2023 · 6 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Milestone: 0.8

Comments

@damooo
Contributor

damooo commented Nov 6, 2023

Currently, JsonLdOptions<L> takes a loader of type L and wraps it in a Mutex. This effectively makes the JSON-LD parser non-concurrent, as every concurrent operation has to wait for the others to complete.

Instead, it would be great to take a loader factory Fn() -> L that can return a fresh loader on every call. As most loaders are cheap to clone or construct, this would enable concurrency without much cost.
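A minimal sketch of the proposed shape (JsonLdOptions is reduced here to the loader-related field, and new_loader is a hypothetical helper, not the actual crate API):

```rust
// Illustration only: a factory stored in the options, so each parsing
// task can get its own fresh loader instead of contending on a Mutex.
struct JsonLdOptions<LF> {
    loader_factory: LF,
    // ...other options elided
}

impl<L, LF: Fn() -> L> JsonLdOptions<LF> {
    /// Hypothetical helper: every call returns an independent loader.
    fn new_loader(&self) -> L {
        (self.loader_factory)()
    }
}

fn main() {
    // A trivial "loader" stands in for a real document loader.
    let options = JsonLdOptions { loader_factory: || String::from("fresh loader") };
    let a = options.new_loader();
    let b = options.new_loader(); // independent of `a`, no shared lock
    println!("{a} / {b}");
}
```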

@pchampin pchampin added the bug and help wanted labels Nov 16, 2023
@pchampin
Owner

You are absolutely right, wrapping loaders (which have an async API) into a (sync) Mutex is an aberration. My bad!

However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient. The alternative would be to resort to some kind of mutex inside the loaders...

So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?
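For reference, a minimal sketch of that approach, assuming a simplified stand-in loader (FakeLoader and load_document are placeholders, not Sophia's actual types):

```rust
use futures_util::lock::Mutex; // async mutex from futures-util

// Placeholder loader with an async API, standing in for a real document loader.
struct FakeLoader;

impl FakeLoader {
    async fn load_document(&mut self, iri: &str) -> String {
        format!("document fetched from {iri}")
    }
}

struct Options {
    loader: Mutex<FakeLoader>,
}

impl Options {
    async fn load(&self, iri: &str) -> String {
        // Awaiting the async lock yields to the executor instead of blocking
        // the thread, but callers still take turns on the single loader.
        let mut loader = self.loader.lock().await;
        loader.load_document(iri).await
    }
}
```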

Could you maybe make a PR with these changes?

@damooo
Contributor Author

damooo commented Nov 16, 2023

However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient.

The Rust pattern for this is interior mutability wrapped in an Arc, which is cheap to clone. For example, moka's in-memory cache is cheap to clone, and all clones are backed by the same underlying cache. From their docs:

Cloning is a cheap operation for Cache as it only creates thread-safe reference-counted pointers to the internal data structures.
Don’t wrap a Cache by a lock such as Mutex or RwLock. All methods provided by the Cache are considered thread-safe, and can be safely called by multiple async tasks at the same time. No lock is needed.

And an on-disk cache, like CaCache, needs just a path to the cache directory.

Manas has an implementation of an HTTP loader for this exact purpose. Here it is instantiated with a moka in-memory caching middleware factory:

https://github.com/manomayam/manas/blob/7a5748925ba76ed5546cf5647bf10d203e98a58d/crates/manas_server/src/recipe/impl_/single_pod/mod.rs#L83-L99
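To illustrate the pattern with std types only (CachedLoader and its fetch stand-in are hypothetical, not Manas or moka code):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cached loader: the cache sits behind an Arc, so cloning
/// the loader only bumps a reference count and every clone shares the
/// same underlying cache.
#[derive(Clone)]
struct CachedLoader {
    cache: Arc<Mutex<HashMap<String, String>>>,
}

impl CachedLoader {
    fn new() -> Self {
        Self { cache: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn load(&self, iri: &str) -> String {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry(iri.to_string())
            .or_insert_with(|| format!("fetched {iri}")) // stand-in for a real fetch
            .clone()
    }
}

fn main() {
    let loader = CachedLoader::new();
    let clone = loader.clone(); // cheap: shares the same cache
    loader.load("https://example.org/ctx.jsonld");
    // The clone sees the entry populated through the original.
    assert!(clone.cache.lock().unwrap().contains_key("https://example.org/ctx.jsonld"));
}
```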

@damooo
Contributor Author

damooo commented Nov 16, 2023

So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?

That still makes the parser effectively non-concurrent, as each use of the parser has to wait for the others to complete.

It would be really practical to use a loader factory. If that is considered okay, I will make a PR with it.

@damooo
Contributor Author

damooo commented Nov 17, 2023

The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.

The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the options' mutex.

Maybe that is fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.

@damooo damooo closed this as completed Nov 17, 2023
@pchampin
Owner

pchampin commented Dec 4, 2023

The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.

Indeed. I stand corrected. Same for caches being cheap to clone.

The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the options' mutex.

I see that now.

Maybe that is fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.

Since this would be specific to the JSON-LD parser, while all other parsers are "concurrency friendly", this is probably not the best option.

I'm reopening this issue, and I welcome a PR replacing "loader" with a "loader_factory" :)

@pchampin pchampin reopened this Dec 4, 2023
@pchampin pchampin added the enhancement label and removed the bug label Dec 4, 2023
@pchampin pchampin added this to the 0.9 milestone Dec 14, 2023
@pchampin pchampin modified the milestones: 0.9, 0.8 Jan 8, 2024
@pchampin
Owner

pchampin commented Jan 8, 2024

This was actually included in v0.8.0

@pchampin pchampin closed this as completed Jan 8, 2024