Ensure can use offline #191

Merged
merged 1 commit into from May 28, 2023
33 changes: 13 additions & 20 deletions FAQ.md
@@ -178,6 +178,8 @@ See models that are currently supported in this automatic way, and the same dict

Note: when running `generate.py` and asking your first question, it will download the model(s), which for the 6.9B model takes about 15 minutes per 3 PyTorch bin files at a 10MB/s download speed.

If HF transformers has already placed all data into `~/.cache`, then the following steps (those related to downloading HF models) are not required.
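As a rough way to check this, one can scan the cache directory for downloaded weight files. This is a sketch only: the cache layout is an assumption that may differ between transformers versions, and `hf_cache_has_files` is a hypothetical helper, not part of this PR.

```python
import os


def hf_cache_has_files(cache_dir=None):
    """Return True if the given (or default) HF cache dir contains any model weight files.

    Assumes the conventional ~/.cache/huggingface layout; adjust if HF_HOME is set.
    """
    cache_dir = cache_dir or os.path.expanduser('~/.cache/huggingface')
    if not os.path.isdir(cache_dir):
        return False
    for _root, _dirs, files in os.walk(cache_dir):
        # weight files are typically .bin (pytorch) or .safetensors
        if any(f.endswith(('.bin', '.safetensors')) for f in files):
            return True
    return False
```

If this returns True for the model in question, the download steps below can be skipped.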

1) Download model and tokenizer of choice

```python
@@ -208,30 +210,21 @@ from langchain.embeddings import HuggingFaceEmbeddings
embedding = HuggingFaceEmbeddings(model_name=hf_embedding_model, model_kwargs=model_kwargs)
```

4) Gradio uses Cloudflare scripts; download them from Cloudflare:
```
iframeResizer.contentWindow.min.js
index-8bb1e421.js
```
place them into the Python environment at:
```
site-packages/gradio/templates/cdn/assets
site-packages/gradio/templates/frontend/assets
```

5) For the JupyterHub dashboard, modify `index-8bb1e421.js` to remove the port number or hardcode it into URLs where `/port/7860` appears. One may have to modify:
```
templates/cdn/index.html
templates/frontend/index.html
templates/frontend/share.html
```

6) Run generate with transformers in [Offline Mode](https://huggingface.co/docs/transformers/installation#offline-mode)
4) Run generate with transformers in [Offline Mode](https://huggingface.co/docs/transformers/installation#offline-mode)

```bash
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b'
HF_DATASETS_OFFLINE=1 TRANSFORMERS_OFFLINE=1 python generate.py --base_model='h2oai/h2ogpt-oasst1-512-12b' --gradio_offline_level=2
```
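The same environment variables can also be set from Python, provided this happens before `transformers` or `datasets` is imported, since both read them at import time. A sketch of standard HF behavior, not code from this PR:

```python
import os

# Must be set BEFORE importing transformers/datasets; both libraries
# read these environment variables at import time.
os.environ['HF_DATASETS_OFFLINE'] = '1'
os.environ['TRANSFORMERS_OFFLINE'] = '1'
# Also disabled unconditionally by generate.py in this PR:
os.environ['HF_HUB_DISABLE_TELEMETRY'] = '1'

# import transformers  # now safe: hub downloads will be refused, cache is used
```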

Some code that involves uploads outside user control is always disabled: Hugging Face telemetry, Gradio telemetry, and ChromaDB posthog.

The additional option `--gradio_offline_level=2` changes fonts to avoid downloading Google fonts. Downloading is less intrusive than uploading, but must still be avoided in the air-gapped case. The substitute fonts don't look as nice as Google fonts, but they ensure fully offline behavior.

If the front end can still access the internet but the backend should not, one can use `--gradio_offline_level=1` for slightly better-looking fonts.
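Mirroring the `gradio_runner.py` change in this PR, the level-to-font mapping can be sketched as a standalone helper (`offline_font_kwargs` is a hypothetical name, not part of the PR):

```python
def offline_font_kwargs(gradio_offline_level):
    """Sketch of how gradio_offline_level maps to gradio theme font kwargs."""
    if gradio_offline_level >= 1:
        # Level 1 keeps the nicer face as a plain (non-GoogleFont) name, so the
        # backend never fetches it; level 2 falls back to Helvetica so neither
        # backend nor frontend needs to download anything.
        base_font = 'Source Sans Pro' if gradio_offline_level == 1 else 'Helvetica'
        return dict(font=(base_font, 'ui-sans-serif', 'system-ui', 'sans-serif'),
                    font_mono=('IBM Plex Mono', 'ui-monospace', 'Consolas', 'monospace'))
    # Level 0: keep the theme defaults (GoogleFont-based)
    return dict()
```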

Note that gradio attempts to download [iframeResizer.contentWindow.min.js](https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.1/iframeResizer.contentWindow.min.js),
but nothing prevents gradio from working without it, so a simple firewall block is sufficient. For more details, see: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/10324.
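The import-time redirect this PR adds in `gradio_runner.py` could equivalently be wrapped in a context manager, which guarantees `requests.get` is restored even if the import fails. A sketch, not what the PR actually does:

```python
import contextlib

import requests


@contextlib.contextmanager
def no_network_get():
    """Temporarily redirect requests.get to localhost, e.g. around `import gradio`."""
    original_get = requests.get

    def my_get(url, **kwargs):
        # Same trick as the PR: answer any GET with a request to localhost
        kwargs.setdefault('allow_redirects', True)
        return requests.api.request('get', 'http://127.0.0.1/', **kwargs)

    requests.get = my_get
    try:
        yield
    finally:
        requests.get = original_get  # always restored, even on exception
```

Usage would then be `with no_network_get(): import gradio as gr`, replacing the manual patch/restore pair.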

### Isolated LangChain Usage:

See [tests/test_langchain_simple.py](tests/test_langchain_simple.py)
13 changes: 12 additions & 1 deletion generate.py
@@ -9,10 +9,15 @@
import time
import traceback
import typing
import warnings
from datetime import datetime
import filelock
import psutil

os.environ['HF_HUB_DISABLE_TELEMETRY'] = '1'
os.environ['BITSANDBYTES_NOWELCOME'] = '1'
warnings.filterwarnings('ignore', category=UserWarning, message='TypedStorage is deprecated')

from loaders import get_loaders
from utils import set_seed, clear_torch_cache, save_generate_output, NullContext, wrapped_partial, EThread, get_githash, \
import_matplotlib, get_device, makedirs, get_kwargs
@@ -22,7 +27,6 @@
SEED = 1236
set_seed(SEED)

os.environ['HF_HUB_DISABLE_TELEMETRY'] = '1'
from typing import Union

import fire
@@ -83,6 +87,7 @@ def main(
cli_loop: bool = True,
gradio: bool = True,
gradio_avoid_processing_markdown: bool = False,
gradio_offline_level: int = 0,
chat: bool = True,
chat_context: bool = False,
stream_output: bool = True,
@@ -174,6 +179,12 @@ def main(
:param cli_loop: whether to loop for CLI (False usually only for testing)
:param gradio: whether to enable gradio, or to enable benchmark mode
:param gradio_avoid_processing_markdown:
:param gradio_offline_level: > 0, then change fonts so fully offline
== 1 means the backend won't need internet for fonts, but the front-end UI might if a font is not cached
== 2 means neither backend nor frontend needs internet to download any fonts.
Note: Some things that involve uploading are always disabled, including HF telemetry, gradio telemetry, and chromadb posthog.
This option further disables downloading of google fonts, which is less intrusive than uploading
but still required in the air-gapped case. The fonts don't look as nice as google fonts, but ensure fully offline behavior.
:param chat: whether to enable chat mode with chat history
:param chat_context: whether to use extra helpful context if human_bot
:param stream_output: whether to stream output from generate
4 changes: 4 additions & 0 deletions gpt_langchain.py
@@ -727,6 +727,10 @@ def prep_langchain(persist_directory, load_db_if_exists, db_type, use_openai_emb
return db


import posthog
posthog.disabled = True


def get_existing_db(persist_directory, load_db_if_exists, db_type, use_openai_embedding, langchain_mode,
hf_embedding_model):
if load_db_if_exists and db_type == 'chroma' and os.path.isdir(persist_directory) and os.path.isdir(
29 changes: 27 additions & 2 deletions gradio_runner.py
@@ -9,8 +9,21 @@
import uuid
import filelock
import pandas as pd
import requests
import tabulate

# This is a hack to prevent Gradio from phoning home when it gets imported
os.environ['GRADIO_ANALYTICS_ENABLED'] = 'False'
def my_get(url, **kwargs):
print('Gradio HTTP request redirected to localhost :)', flush=True)
kwargs.setdefault('allow_redirects', True)
return requests.api.request('get', 'http://127.0.0.1/', **kwargs)

original_get = requests.get
requests.get = my_get
import gradio as gr
requests.get = original_get

from gradio_themes import H2oTheme, SoftTheme, get_h2o_title, get_simple_title, get_dark_js
from prompter import Prompter, \
prompt_type_to_model_name, prompt_types_strings, inv_prompt_type_to_model_lower, generate_prompt
@@ -19,7 +32,6 @@
from generate import get_model, languages_covered, evaluate, eval_func_param_names, score_qa, langchain_modes, \
inputs_kwargs_list, get_cutoffs, scratch_base_dir

import gradio as gr
from apscheduler.schedulers.background import BackgroundScheduler


@@ -95,6 +107,7 @@ def go_gradio(**kwargs):
else:
css_code = """footer {visibility: hidden}"""
css_code += """
@import url('https://fonts.googleapis.com/css2?family=Source+Sans+Pro:wght@400;600&display=swap');
body.dark{#warning {background-color: #555555};}
#small_btn {
margin: 0.6em 0em 0.55em 0;
@@ -131,7 +144,19 @@ def _postprocess_chat_messages(self, chat_message: str):

Chatbot._postprocess_chat_messages = _postprocess_chat_messages

theme = H2oTheme() if kwargs['h2ocolors'] else SoftTheme()
if kwargs['gradio_offline_level'] > 0:
# avoid GoogleFont that pulls from internet
if kwargs['gradio_offline_level'] == 1:
# front end would still have to download fonts or have cached it at some point
base_font = 'Source Sans Pro'
else:
base_font = 'Helvetica'
theme_kwargs = dict(font=(base_font, 'ui-sans-serif', 'system-ui', 'sans-serif'),
font_mono=('IBM Plex Mono', 'ui-monospace', 'Consolas', 'monospace'))
else:
theme_kwargs = dict()

theme = H2oTheme(**theme_kwargs) if kwargs['h2ocolors'] else SoftTheme(**theme_kwargs)
demo = gr.Blocks(theme=theme, css=css_code, title="h2oGPT", analytics_enabled=False)
callback = gr.CSVLogger()

41 changes: 40 additions & 1 deletion gradio_themes.py
@@ -1,7 +1,10 @@
from __future__ import annotations

from typing import Iterable

from gradio.themes.soft import Soft
from gradio.themes import Color
from gradio.themes.utils import colors, sizes
from gradio.themes.utils import colors, sizes, fonts

h2o_yellow = Color(
name="yellow",
@@ -43,6 +46,22 @@ def __init__(
spacing_size: sizes.Size | str = sizes.spacing_md,
radius_size: sizes.Size | str = sizes.radius_md,
text_size: sizes.Size | str = sizes.text_lg,
font: fonts.Font
| str
| Iterable[fonts.Font | str] = (
fonts.GoogleFont("Montserrat"),
"ui-sans-serif",
"system-ui",
"sans-serif",
),
font_mono: fonts.Font
| str
| Iterable[fonts.Font | str] = (
fonts.GoogleFont("IBM Plex Mono"),
"ui-monospace",
"Consolas",
"monospace",
),
):
super().__init__(
primary_hue=primary_hue,
@@ -51,6 +70,8 @@ def __init__(
spacing_size=spacing_size,
radius_size=radius_size,
text_size=text_size,
font=font,
font_mono=font_mono,
)
super().set(
link_text_color="#3344DD",
@@ -89,6 +110,22 @@ def __init__(
spacing_size: sizes.Size | str = sizes.spacing_md,
radius_size: sizes.Size | str = sizes.radius_md,
text_size: sizes.Size | str = sizes.text_md,
font: fonts.Font
| str
| Iterable[fonts.Font | str] = (
fonts.GoogleFont("Montserrat"),
"ui-sans-serif",
"system-ui",
"sans-serif",
),
font_mono: fonts.Font
| str
| Iterable[fonts.Font | str] = (
fonts.GoogleFont("IBM Plex Mono"),
"ui-monospace",
"Consolas",
"monospace",
),
):
super().__init__(
primary_hue=primary_hue,
@@ -97,6 +134,8 @@ def __init__(
spacing_size=spacing_size,
radius_size=radius_size,
text_size=text_size,
font=font,
font_mono=font_mono,
)

