Recreated settings changes - Adds several options for llamacpp and ollama #1703

icsy7867 · 2024-03-11T16:25:00Z

Original PR here:
#1677

llama-cpp https://llama-cpp-python.readthedocs.io/en/latest/api-reference/
https://docs.llamaindex.ai/en/stable/examples/llm/llama_2_llama_cpp.html#

ollama - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-ollama/llama_index/llms/ollama/base.py

openailike (no configurable changes) - https://docs.llamaindex.ai/en/stable/examples/llm/localai.html#localai

Not sure about the model_kwargs. The value is referenced for openai, but I could not find documentation on which values are allowed.
openai - https://github.com/run-llama/llama_index/blob/eeb2a60387b8ae1994005ad0eebb672ee02074ff/llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py
https://docs.llamaindex.ai/en/stable/examples/llm/openai.html

For the text/description I used the values found here:
https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

LlamaCPP, where it uses the same keys, takes the same values. However, my setup currently uses ollama, so the LlamaCPP options still need testing.

I also added temperature under the main llm.settings. This should allow the value to be changed for the models that support it.
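For illustration, here is a minimal sketch of how a shared temperature and backend-specific sampling options could be modeled and merged into one set of kwargs. The class names, the `ollama_kwargs` helper, and the default values are assumptions made for this example (the option names follow the ollama Modelfile parameters linked above); this is not the actual code in this PR.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class LLMSettings:
    # Hypothetical shared setting, used by any backend that supports it.
    temperature: float = 0.1


@dataclass
class OllamaSettings:
    # Option names follow the ollama Modelfile parameters;
    # the defaults here are illustrative only.
    top_k: int = 40
    top_p: float = 0.9
    repeat_penalty: float = 1.1


def ollama_kwargs(llm: LLMSettings, ollama: OllamaSettings) -> Dict[str, Any]:
    """Merge the shared temperature with the ollama-specific options."""
    options: Dict[str, Any] = asdict(ollama)
    options["temperature"] = llm.temperature
    return options
```

Calling `ollama_kwargs(LLMSettings(temperature=0.5), OllamaSettings())` would yield the three sampling options plus `temperature: 0.5`, which could then be passed on to the LLM wrapper.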

icsy7867 · 2024-03-11T17:52:05Z

Hmm, small bug... num_predict: 128 doesn't do what I thought it did. It tells llamaindex the maximum size of the response, so it should probably be set to -1 or -2 by default.

It is odd, though, that the default says "128", but if you don't set that kwarg, you get a larger response.

icsy7867 · 2024-03-11T18:05:25Z

Looking at the ollama code:
https://github.com/ollama/ollama/blob/f878e91070af750709f1b3195eeb9fbdcaad2bef/openai/openai.go#L174

	if r.MaxTokens != nil {
		options["num_predict"] = *r.MaxTokens
	}

It looks like the default is 128 unless max tokens is set, in which case num_predict simply takes the same value as max tokens. Alternatively, setting this to "Max new tokens" might make more sense.
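The same idea can be sketched on the Python side: only send num_predict when a limit was explicitly configured, and otherwise leave it out so the backend applies its own default. The `build_options` helper below is hypothetical and simply mirrors the Go snippet above; it is not taken from llamaindex or from this PR.

```python
from typing import Any, Dict, Optional


def build_options(max_tokens: Optional[int]) -> Dict[str, Any]:
    """Mirror of the Go snippet: set num_predict only when max_tokens is given."""
    options: Dict[str, Any] = {}
    if max_tokens is not None:
        options["num_predict"] = max_tokens
    return options
```

With this shape, `build_options(None)` returns an empty dict and `build_options(512)` returns `{"num_predict": 512}`, so a default of None never forces a fixed response length.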


icsy7867 added 3 commits March 11, 2024 16:17

Recreated settings changes

10ffebe

post check

1afa6e1

Fixed variable value

942f2b1

icsy7867 added 2 commits March 11, 2024 20:14

Set default of num_predict of ollama to None, so that it automatically uses the context window size

2154fd2

Fixed suspect black issue

41865dc

imartinez approved these changes Mar 11, 2024


imartinez merged commit 02dc83e into zylon-ai:main Mar 11, 2024
6 checks passed

github-actions bot mentioned this pull request Mar 11, 2024

chore(main): release 0.5.0 #1708

Merged
