---
layout: default
title: Home
notitle: true
---

# MLC LLM

MLC LLM is a universal solution that allows any language model to be deployed natively on a diverse set of hardware backends and native applications, plus a productive framework for everyone to further optimize model performance for their own use cases. Everything runs locally with no server support and is accelerated with local GPUs on your phone and laptop. Check out our GitHub repository to see how we did it, and check out our documentation if you are interested in exploring more possibilities of MLC LLM.

## Try it out

Visit our instruction page to try out MLC LLM!

The sections below also contain brief instructions for running large language models and chatbots natively in your environment. For more information, please visit our instruction page.

## iPhone

The MLC-Chat app is available on the App Store; try it out here to install and use it on iOS devices (iPhone/iPad). Vicuna-7B takes 4GB of RAM and RedPajama-3B takes 2.2GB to run. Accounting for iOS itself and other running applications, you will need a recent iPhone with 6GB of RAM for Vicuna-7B, or 4GB for RedPajama-3B, to run the app. The application has only been tested on the iPhone 14 Pro Max, iPhone 14 Pro, and iPhone 12 Pro.

You can also check out our GitHub repo to build the iOS app from source.

Note: The text generation speed of the iOS app can be unstable from time to time. It might run slowly at first and then recover to normal speed.

## Android

Download the APK file here and install it on your phone. You can then start a chat with the LLM. When you first open the app, the model parameters need to be downloaded, and the loading process can be slow. In future runs, the parameters are loaded from cache (which is fast), and you can use the app offline. Our current demo relies on OpenCL support on the phone and takes about 6GB of RAM; if you have a phone with the latest Snapdragon chip, you can try out our demo.

We tested our demo on the Samsung Galaxy S23. It does not yet work on Google Pixel phones due to their limited OpenCL support. We will continue to broaden support, and we welcome contributions from the open-source community. You can also check out our GitHub repo to build the Android app from source.

Check out our blog post for the technical details of how we made MLC LLM possible on Android.

## Windows, Linux, and Mac

We provide a CLI (command-line interface) app to chat with the bot in your terminal. Before installing the CLI app, we need to install some dependencies:

1. We use Conda to manage our app, so we need to install a version of conda. We can install Miniconda or Miniforge.
2. On Windows and Linux, the chatbot application runs on the GPU via the Vulkan platform, so please install the latest Vulkan driver. NVIDIA GPU users in particular should make sure the Vulkan driver is installed, as the CUDA driver alone may not work well. A quick sanity check for both dependencies is sketched after this list.
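
The following is a minimal sanity-check sketch, assuming `vulkaninfo` is available (it ships with the Vulkan SDK or, on many Linux distributions, the `vulkan-tools` package):

```bash
# Verify that conda is installed and on PATH.
conda --version

# Verify that a Vulkan driver is installed and your GPU is enumerated.
# On older vulkaninfo versions without --summary, run plain `vulkaninfo`.
vulkaninfo --summary
```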

After installing all the dependencies, just follow the instructions below to install the CLI app:

```bash
# Create a new conda environment, install the CLI app, and activate the environment.
conda create -n mlc-chat-venv -c mlc-ai -c conda-forge mlc-chat-cli-nightly
conda activate mlc-chat-venv

# Install Git and Git-LFS if you haven't already.
# They are used for downloading the model weights from Hugging Face.
conda install git git-lfs
git lfs install

# Create a directory, download the model weights from Hugging Face, and download the binary libraries
# from GitHub.
mkdir -p dist/prebuilt
git clone https://github.com/mlc-ai/binary-mlc-llm-libs.git dist/prebuilt/lib

# Download prebuilt weights of Vicuna-7B.
cd dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-vicuna-v1-7b-q3f16_0
cd ../..
mlc_chat_cli --local-id vicuna-v1-7b-q3f16_0

# Download prebuilt weights of RedPajama-3B.
cd dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_0
cd ../..
mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_0

# Download prebuilt weights of RWKV-raven-1.5B/3B/7B.
cd dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-1b5-q8f16_0
# or git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-3b-q8f16_0
# or git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-7b-q8f16_0
cd ../..
mlc_chat_cli --local-id rwkv-raven-1b5-q8f16_0  # Replace the local id if you use the 3B or 7B model.
```
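
If the chat app complains about missing or corrupted weights, the Git-LFS download may not have completed, leaving small text pointer files in place of the real binaries. The following is a hedged troubleshooting sketch, using the Vicuna directory from above as an example:

```bash
# Inspect the downloaded weights: real parameter shards are large (MBs to GBs),
# while un-fetched Git-LFS pointers are tiny text files.
ls -lh dist/prebuilt/mlc-chat-vicuna-v1-7b-q3f16_0

# Fetch the actual weight files if only pointers were downloaded.
cd dist/prebuilt/mlc-chat-vicuna-v1-7b-q3f16_0
git lfs pull
cd ../../..
```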

## Web Browser

Please check out WebLLM, our companion project that deploys models natively to browsers. Everything runs inside the browser with no server support and is accelerated with WebGPU.

## Links

- Check out our GitHub repo to see how we build, optimize, and deploy large language models to various devices and backends.
- Check out our companion project WebLLM to run the chatbot purely in your browser.
- You might also be interested in Web Stable Diffusion, which runs the Stable Diffusion model purely in the browser.
- You might want to check out our online public Machine Learning Compilation course for a systematic walkthrough of our approaches.

## Disclaimer

The pre-packaged demos are for research purposes only, subject to the model License.