For llama.cpp, model_kwargs gets ignored #743

Closed
isamu-isozaki opened this issue Mar 14, 2024 · 0 comments · Fixed by #744

isamu-isozaki (Contributor) commented Mar 14, 2024

Describe the issue as clearly as possible:

I found that when we do

llm = outlines.models.llamacpp(
    model_path,
    model_kwargs={
        "n_gpu_layers": 15,
        "n_batch": 2048,
        "n_ctx": 2048,
    },
    device="cpu",
)

as shown in the documentation, the contents of model_kwargs get ignored. I think the reason is here:

from typing import Optional

def llamacpp(model_path: str, device: Optional[str] = None, **model_kwargs) -> LlamaCpp:
    from llama_cpp import Llama

    if device == "cuda":
        # default to offloading all layers to the GPU
        model_kwargs.setdefault("n_gpu_layers", -1)

    model = Llama(model_path, **model_kwargs)
    return LlamaCpp(model=model)
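
To illustrate, here is a minimal standalone sketch of the capture behavior (f is just a throwaway name for this demo):

def f(**model_kwargs):
    return model_kwargs

print(f(model_kwargs={"n_ctx": 2048}))
# prints {'model_kwargs': {'n_ctx': 2048}} -- the options end up nested one level too deep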

As the sketch shows, **model_kwargs captures the keyword argument that is literally named model_kwargs, so inside the function model_kwargs is a dict whose only key is "model_kwargs", and the nested options never reach Llama. To fix this I just did:

def llamacpp(model_path: str, device: Optional[str] = None, **model_kwargs) -> LlamaCpp:
    from llama_cpp import Llama

    # unwrap the options that **model_kwargs captured under the "model_kwargs" key
    model_kwargs = model_kwargs["model_kwargs"]
    if device == "cuda":
        model_kwargs.setdefault("n_gpu_layers", -1)

    model = Llama(model_path, **model_kwargs)
    return LlamaCpp(model=model)

I wanted to make sure this is actually a bug; I opened a PR for it, assuming the documented calling convention is the intended one. Note that passing the keywords directly to llamacpp (instead of wrapping them in model_kwargs=) does work, but that would differ from the transformers API.
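
For comparison, one way to keep the documented call working would be to take model_kwargs as an explicit parameter, transformers-style. This is only a sketch of the idea and may not match what #744 actually does (it assumes the LlamaCpp wrapper from the snippets above is in scope):

from typing import Optional

def llamacpp(
    model_path: str,
    device: Optional[str] = None,
    model_kwargs: Optional[dict] = None,
) -> LlamaCpp:
    from llama_cpp import Llama

    # Take the options dict explicitly instead of via **kwargs, so
    # callers can pass model_kwargs={...} as the documentation shows.
    model_kwargs = dict(model_kwargs or {})
    if device == "cuda":
        model_kwargs.setdefault("n_gpu_layers", -1)

    model = Llama(model_path, **model_kwargs)
    return LlamaCpp(model=model)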

Steps/code to reproduce the bug:

import outlines

llm = outlines.models.llamacpp(
    model_path,
    model_kwargs={
        "n_gpu_layers": 15,
        "n_batch": 2048,
        "n_ctx": 2048,
    },
    device="cpu",
)

Expected result:

llm should be initialized with a 2048-token context length, but it ends up with the default 512-token context because model_kwargs are never passed through to the model.
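
One way to observe this (assuming the wrapper exposes the underlying Llama instance as .model, as in the constructor calls above):

print(llm.model.n_ctx())
# expected 2048, but prints 512 because the kwargs were dropped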

Error message:

No response

Outlines/Python version information:

latest

Context for the issue:

No response
