Configurable context_window and tokenizer #1437

imartinez · 2023-12-21T13:35:37Z

Extract two key model parameters to settings:

Tokenizer: used to calculate the available context window for a given prompt in several LlamaIndex pipelines. The default tokenizer is the one used for gpt-3.5, which is not aligned with other models such as Mistral. Hugging Face is used as the provider of the tokenizer information, and downloading it has been added to the setup script.
context_window: the context window argument of the base model.
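To show why a model-aligned tokenizer matters, here is a minimal sketch. It assumes the space left for retrieved context is `context_window` minus the prompt's token count minus tokens reserved for the answer. The names `available_context` and `num_output` are hypothetical, and LlamaIndex's actual accounting differs in its details. The two toy tokenizers stand in for two real tokenizers (for example gpt-3.5's versus Mistral's) that split the same text into different numbers of tokens.

```python
from typing import Callable, Sequence


def available_context(
    prompt: str,
    context_window: int,
    num_output: int,
    tokenize: Callable[[str], Sequence[object]],
) -> int:
    """Hypothetical helper: tokens left for retrieved context.

    Assumes a simple budget: window minus prompt tokens minus tokens
    reserved for the model's answer (num_output). Never returns < 0.
    """
    prompt_tokens = len(tokenize(prompt))
    return max(context_window - prompt_tokens - num_output, 0)


# Toy tokenizers (illustrative only): they count the same text differently,
# just as tokenizers of different models do.
def whitespace_tokenize(text: str) -> list[str]:
    return text.split()


def char_tokenize(text: str) -> list[str]:
    return list(text)


prompt = "Summarize the attached documents"
print(available_context(prompt, 3900, 256, whitespace_tokenize))  # 3640
print(available_context(prompt, 3900, 256, char_tokenize))        # 3612
```

In a real setup, `tokenize` could be the `encode` method of a Hugging Face tokenizer loaded with `transformers.AutoTokenizer.from_pretrained(...)`, using the repository name that matches the configured model. If the count comes from a mismatched tokenizer, the computed budget can overflow the model's real context window or leave part of it unused.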

cognitivetech · 2023-12-31T23:23:24Z

Is there a way to make this setup script skip downloading the model if I already have it?

I'm running setup for the embeddings/tokenizer, since I don't know how to add those manually, but it also triggers an unnecessary model download.

Alternatively, it would be nice to have a documented way to manually download and configure the embeddings/tokenizer.


imartinez added 2 commits December 21, 2023 14:31

Extract tokenizer and context_window to settings

a38760e

Download tokenizer file during setup

a8cfb2a

imartinez requested review from pabloogc and lopagela December 21, 2023 13:35

pabloogc approved these changes Dec 21, 2023


imartinez merged commit 4780540 into main Dec 21, 2023
6 checks passed

imartinez deleted the feature/llm-settings branch December 21, 2023 13:49

github-actions bot mentioned this pull request Dec 21, 2023

chore(main): release 0.3.0 #1413

Merged

This was referenced Dec 26, 2023

Unable to run embeddings model from sagemaker #1383

Open

Unable to query Query Docs mode when using sagemaker #1367

Closed

cognitivetech mentioned this pull request Jan 4, 2024

can I remove context_window from settings.yml and let tokenizer choose? #1483

Closed

simonbermudez pushed a commit to simonbermudez/saimon that referenced this pull request Feb 24, 2024

feat(settings): Configurable context_window and tokenizer (zylon-ai#1437)

de5330d
