OPEA should provide documentation and a reference architecture covering the mechanisms for storing and deploying applications along with all their dependencies (e.g., container images, Helm charts), as well as for hosting model repositories locally.
Enterprises operating in secure environments need a fully offline solution.
Currently, model downloading is done separately by each container when it starts, and each of those services has write access to the model volume. This means the user/admin may not even know whether the node will run out of disk space before all those services are ready...
I think there should be a separate model downloader that is used to pre-fetch all relevant models to the model volume, after which that volume would be made read-only. IMHO this should be the (documented) default; downloading models at run time should be the exception.
There could be a separate script / container for this, which would download all specified models to the location expected by the services. Models could be specified either directly, or the script could e.g. pick their names from the listed service specs / Helm charts.
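A minimal sketch of what such a downloader could look like, using `huggingface_hub` (the script name, the `MODEL_ROOT` path, and the `MODEL_ROOT`/`HF_TOKEN` env vars are illustrative assumptions, not existing OPEA conventions):

```python
#!/usr/bin/env python3
"""Pre-fetch all specified models into the shared model volume
before any inferencing service starts."""
import os
import sys

from huggingface_hub import snapshot_download  # pip install huggingface_hub

# Assumed mount point of the shared model volume; services would read
# from the same path once the volume is remounted read-only.
MODEL_ROOT = os.environ.get("MODEL_ROOT", "/data/models")


def prefetch(model_ids):
    # The HF token is needed only here, not by the inferencing services.
    token = os.environ.get("HF_TOKEN")
    for model_id in model_ids:
        target = os.path.join(MODEL_ROOT, model_id)
        print(f"Fetching {model_id} -> {target}")
        snapshot_download(repo_id=model_id, local_dir=target, token=token)


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: prefetch_models.py <model-id> [<model-id> ...]")
    prefetch(sys.argv[1:])
```

A Helm chart or compose file could then run this once as an init step with write access, while the actual services mount the volume read-only.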
PS. One more advantage of this would be not needing to provide the secret HF token to all the inferencing services.