Need documentation for air-gapped (offline) on-prem deployment #1405

Open
Yu-amd opened this issue Jan 16, 2025 · 2 comments

Comments


Yu-amd commented Jan 16, 2025

OPEA should provide documentation and a reference architecture covering the mechanisms for storing and deploying applications along with all their dependencies (e.g., container images, Helm charts), and for hosting model repositories locally.

Enterprises operating in secure environments need a fully offline solution.
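For illustration, a fully offline flow typically means mirroring public artifacts on a connected host and carrying them across the air gap into a local registry; a rough sketch, where the image name and registry address are placeholders rather than OPEA defaults:

```sh
# On a connected host: pull the image, retag it for the local registry,
# and export it for offline transfer (image/registry names are examples).
docker pull opea/chatqna:latest
docker tag opea/chatqna:latest registry.local:5000/opea/chatqna:latest
docker save -o chatqna.tar registry.local:5000/opea/chatqna:latest

# On the air-gapped host: load the archive and push it to the local
# registry that the Helm charts are pointed at.
docker load -i chatqna.tar
docker push registry.local:5000/opea/chatqna:latest
```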

eero-t (Contributor) commented Jan 20, 2025

I don't think documentation is enough.

Currently, model downloading is done separately by each container when it starts, which requires those services to have write access to the model volume. That means the user/admin may not even know whether the node will run out of disk space before all those services are ready...

I think there should be a separate model downloader that pre-fetches all relevant models to the model volume, and that volume should be set read-only afterwards. IMHO this is how it should be done (and documented) by default; downloading models at run time should be the exception.
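A minimal sketch of such a pre-fetch step, assuming the huggingface_hub CLI and using a placeholder volume path and model list (not OPEA defaults):

```sh
#!/bin/sh
# Hypothetical one-shot pre-fetch: populate the shared model volume once,
# before any inference service starts. Path and model list are examples.
set -e
MODEL_DIR=/mnt/models

pip install --quiet "huggingface_hub[cli]"

for m in "Intel/neural-chat-7b-v3-3" "BAAI/bge-base-en-v1.5"; do
    huggingface-cli download "$m" --local-dir "$MODEL_DIR/$m"
done

# Drop write permission so the serving containers can only read the volume.
chmod -R a-w "$MODEL_DIR"
```

In Kubernetes this could run as a one-shot Job against the model PVC, after which the services mount the same volume with readOnly: true.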

eero-t (Contributor) commented Jan 20, 2025

Helm charts already use the HF downloader in initContainers: https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/common/vllm/templates/deployment.yaml#L53

There could be a separate script / container using that, which would download all specified models to the location expected by the services. Models could be specified either directly, or the script could e.g. pick their names from the listed service specs / Helm charts.
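As a rough illustration of scraping model names from the charts, assuming the model IDs appear as env values named LLM_MODEL_ID in the rendered manifests (the variable name and chart path are assumptions):

```sh
# Render the charts, collect the unique model IDs, then pre-fetch each one.
helm template chatqna helm-charts/chatqna \
  | grep -A1 'name: LLM_MODEL_ID' \
  | awk '/value:/ {print $2}' \
  | tr -d '"' \
  | sort -u \
  | while read -r m; do
      huggingface-cli download "$m" --local-dir "/mnt/models/$m"
    done
```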

PS. One more advantage of this would be not needing to provide the secret HF token to all the inferencing services.
