
Take loader_factory instead of loader in JsonldOptions #143

Closed
damooo opened this issue Nov 6, 2023 · 6 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)
Milestone: 0.8

Comments

@damooo
Contributor

damooo commented Nov 6, 2023

Currently, JsonLdOptions<L> takes a loader of type L and wraps it in a Mutex. This effectively makes the JSON-LD parser non-concurrent, as every concurrent operation has to wait for the others to complete.

Instead, it would be great to take a loader factory Fn() -> L that can return a fresh loader on every call. As most loaders are cheap to clone or construct, this would enable concurrency without much cost.
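A minimal sketch of the proposed shape (JsonLdOptions is reduced here to the loader-related field, and new_loader is a hypothetical helper, not the actual crate API):

```rust
// Illustration only: a factory stored in the options, so each parsing
// task can get its own fresh loader instead of contending on a Mutex.
struct JsonLdOptions<LF> {
    loader_factory: LF,
    // ...other options elided
}

impl<L, LF: Fn() -> L> JsonLdOptions<LF> {
    /// Hypothetical helper: every call returns an independent loader.
    fn new_loader(&self) -> L {
        (self.loader_factory)()
    }
}

fn main() {
    // A trivial "loader" stands in for a real document loader.
    let options = JsonLdOptions { loader_factory: || String::from("fresh loader") };
    let a = options.new_loader();
    let b = options.new_loader(); // independent of `a`, no shared lock
    println!("{a} / {b}");
}
```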

@pchampin pchampin added the bug and help wanted labels Nov 16, 2023
@pchampin
Owner

You are absolutely right, wrapping loaders (which have an async API) into a (sync) Mutex is an aberration. My bad!

However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient. The alternative would be to resort to some kind of mutex inside the loaders...

So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?
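For reference, a minimal sketch of that approach, assuming a simplified stand-in loader (FakeLoader and load_document are placeholders, not Sophia's actual types):

```rust
use futures_util::lock::Mutex; // async mutex from futures-util

// Placeholder loader with an async API, standing in for a real document loader.
struct FakeLoader;

impl FakeLoader {
    async fn load_document(&mut self, iri: &str) -> String {
        format!("document fetched from {iri}")
    }
}

struct Options {
    loader: Mutex<FakeLoader>,
}

impl Options {
    async fn load(&self, iri: &str) -> String {
        // Awaiting the async lock yields to the executor instead of blocking
        // the thread, but callers still take turns on the single loader.
        let mut loader = self.loader.lock().await;
        loader.load_document(iri).await
    }
}
```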

Could you maybe make a PR with these changes?

@damooo
Contributor Author

damooo commented Nov 16, 2023

However, I'm not entirely convinced by your assertion that loaders are "usually cheap to clone or construct". Some loaders may maintain an in-memory copy of frequently used (and possibly big) contexts. Similarly, HTTP loaders should rely on some form of caching (which could be on disk or in memory). Copying those in-memory caches would be inefficient.

The Rust pattern for this is interior mutability wrapped in an Arc, which is cheap to clone. For example, moka's in-memory cache is cheap to clone, and all clones are backed by the same underlying cache. From their docs:

Cloning is a cheap operation for Cache as it only creates thread-safe reference-counted pointers to the internal data structures.
Don’t wrap a Cache by a lock such as Mutex or RwLock. All methods provided by the Cache are considered thread-safe, and can be safely called by multiple async tasks at the same time. No lock is needed.

And an on-disk cache, like CaCache, needs just a path to the cache directory.

Manas has an implementation of an HTTP loader for this exact purpose. Here it is instantiated with a moka in-memory caching middleware factory:

https://github.com/manomayam/manas/blob/7a5748925ba76ed5546cf5647bf10d203e98a58d/crates/manas_server/src/recipe/impl_/single_pod/mod.rs#L83-L99
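To illustrate the pattern with std types only (CachedLoader and its fetch stand-in are hypothetical, not Manas or moka code):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cached loader: the cache sits behind an Arc, so cloning
/// the loader only bumps a reference count and every clone shares the
/// same underlying cache.
#[derive(Clone)]
struct CachedLoader {
    cache: Arc<Mutex<HashMap<String, String>>>,
}

impl CachedLoader {
    fn new() -> Self {
        Self { cache: Arc::new(Mutex::new(HashMap::new())) }
    }

    fn load(&self, iri: &str) -> String {
        let mut cache = self.cache.lock().unwrap();
        cache
            .entry(iri.to_string())
            .or_insert_with(|| format!("fetched {iri}")) // stand-in for a real fetch
            .clone()
    }
}

fn main() {
    let loader = CachedLoader::new();
    let clone = loader.clone(); // cheap: shares the same cache
    loader.load("https://example.org/ctx.jsonld");
    // The clone sees the entry populated through the original.
    assert!(clone.cache.lock().unwrap().contains_key("https://example.org/ctx.jsonld"));
}
```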

@damooo
Contributor Author

damooo commented Nov 16, 2023

So I would keep the idea of wrapping the loader in a mutex, but use futures-util's async Mutex instead. Would that make sense?

That still makes the parser effectively non-concurrent, as each use of the parser has to wait for the others to complete.

It would be really practical to use a loader factory. If that is considered okay, I will make a PR with it.

@damooo
Contributor Author

damooo commented Nov 17, 2023

The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.

The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the options' mutex.

Maybe that is fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.

@damooo damooo closed this as completed Nov 17, 2023
@pchampin
Owner

pchampin commented Dec 4, 2023

The sync mutex is perfectly fine, as the lock is acquired in a non-async context. It will not block the runtime.

Indeed. I stand corrected. Same for caches being cheap to clone.

The issue was not about a parsing task blocking the runtime, but about the inability to run multiple parsing tasks concurrently with the same parser, as all tasks have to wait for the lock on the options' mutex.

I see that now.

Maybe that is fine enough, as we can instantiate a new parser for each task. Mentioning this in the docs would be greatly helpful, though.

Since this would be specific to the JSON-LD parser, while all other parsers are "concurrency friendly", this is probably not the best option.

I'm reopening this issue, and I welcome a PR replacing "loader" with a "loader_factory" :)

@pchampin pchampin reopened this Dec 4, 2023
@pchampin pchampin added the enhancement label and removed the bug label Dec 4, 2023
@pchampin pchampin added this to the 0.9 milestone Dec 14, 2023
@pchampin pchampin modified the milestones: 0.9, 0.8 Jan 8, 2024
@pchampin
Owner

pchampin commented Jan 8, 2024

This was actually included in v0.8.0

@pchampin pchampin closed this as completed Jan 8, 2024