Examples for LLM Use Cases with Langchain and Llamaindex #385
Comments
I'm getting a 404 Not Found error using the code below:

```python
from langchain_openai import OpenAI

llm = OpenAI(
    base_url=f"http://{INGRESS_HOST}:{INGRESS_PORT}/v1/",
    api_key=api_key,
    default_headers={
        "Host": SERVICE_HOSTNAME,
    },
    model="MODEL_NAME",
    temperature=0.8,
    top_p=1,
)

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
```

KServe version: 0.13.1
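A likely cause of the 404, judging by the path that works later in this thread, is that the OpenAI-compatible endpoints are served under an `/openai/v1` prefix rather than `/v1/`. The sketch below is a hedged rewrite under that assumption; it also switches to `ChatOpenAI`, since the `(role, content)` tuples above are LangChain's chat interface. `INGRESS_HOST`, `INGRESS_PORT`, `SERVICE_HOSTNAME`, `api_key`, and the model name are placeholders carried over from the snippet above.

```python
# Hedged sketch: assumes the KServe OpenAI-compatible route lives under
# /openai/v1 and that the model was registered as "MODEL_NAME".
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url=f"http://{INGRESS_HOST}:{INGRESS_PORT}/openai/v1",
    api_key=api_key,
    default_headers={"Host": SERVICE_HOSTNAME},  # needed when routing through the ingress gateway
    model="MODEL_NAME",
    temperature=0.8,
)

messages = [
    ("system", "You are a helpful assistant that translates English to French."),
    ("human", "I love programming."),
]
print(llm.invoke(messages).content)
```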
I saw in the README that you are using the Hugging Face runtime. The Open Inference API is working well, but having an OpenAI inference endpoint would be better for developers.
Hi @allilou, did you try to remove the
I'm using this code to make the call:

```python
from openai import OpenAI

client = OpenAI(
    base_url=f"{SERVER_URL}/openai/v1",
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="my-model-name",
)
```

And I'm getting this error.
My custom model is the following (KServe 0.13):

```python
import logging
from typing import Any, Dict

import kserve
from colorama import Fore
from kserve import InferOutput, InferRequest, InferResponse, ModelServer
from kserve.errors import ModelMissingError
from kserve.utils.utils import generate_uuid
from vllm import EngineArgs, LLMEngine, SamplingParams

_logger = logging.getLogger(__name__)

# LLM_ID and LLM_PATH are configuration constants defined elsewhere.


class KserveLLM(kserve.Model):
    llm: LLMEngine

    def __init__(self, name: str):
        super().__init__(name)
        self.name = name
        self.ready = False
        self.logger = _logger

    def load(self):
        engine_args = EngineArgs(
            model=LLM_ID, download_dir=LLM_PATH, dtype="half", enforce_eager=True
        )
        self.llm = LLMEngine.from_engine_args(engine_args)

    def predict(self, request: InferRequest, headers: Dict = None) -> InferResponse:
        """Run generation with the vLLM engine and wrap the result in an InferResponse."""
        input_query = ""
        max_new_tokens = 1000
        temperature = 0.01
        top_k = 2
        top_p = 0.01
        for input in request.inputs:
            if input.name == "input_text":
                input_query = input.data[0]
            elif input.name == "temperature":
                temperature = input.data[0]
            elif input.name == "top_k":
                top_k = input.data[0]
            elif input.name == "top_p":
                top_p = input.data[0]
            elif input.name == "max_new_tokens":
                max_new_tokens = input.data[0]

        if len(input_query) == 0:
            error_message = "Empty query text!"
            self.logger.warning(f"[LLM]: {error_message}")
            return self._error_response(error_message)  # helper defined elsewhere

        self.llm_sampling_params = SamplingParams(
            temperature=temperature,
            max_tokens=max_new_tokens,
            top_k=top_k,
            top_p=top_p,
        )
        self.logger.info(f"[LLM] Query: {input_query}")
        self.llm.add_request("0", input_query, self.llm_sampling_params)

        output_text = ""
        try:
            # Step the engine until all queued requests have finished.
            while True:
                request_outputs = self.llm.step()
                for request_output in request_outputs:
                    if request_output.finished:
                        output_text += " ".join(
                            [o.text for o in request_output.outputs]
                        )
                if not self.llm.has_unfinished_requests():
                    break
        except Exception as e:
            error_message = "ERROR: Failed to generate predictions!"
            self.logger.error(f"[LLM] {error_message} {str(e)}")
            return self._error_response(error_message)

        self.logger.info(f"[LLM-OUT]: {output_text}")
        response_id = generate_uuid()
        infer_output = InferOutput(
            name="predictions", shape=[1, 1], datatype="BYTES", data=[output_text]  # BYTES for string output
        )
        infer_response = InferResponse(
            model_name=self.name, infer_outputs=[infer_output], response_id=response_id
        )
        return infer_response


if __name__ == "__main__":
    models_list: list[Any] = []
    # model servers
    llm: KserveLLM = KserveLLM("my-model-name")
    try:
        llm.load()
        models_list.append(llm)
    except ModelMissingError:
        _logger.error("Failed to load model [LLM]")

    if len(models_list) == 0:
        print("[NO MODEL TO LOAD]")
        exit()

    print(f"[LOADED]: {[type(model).__name__ for model in models_list]}")
    print(f"{Fore.BLUE}[SERVER]: STARTING")

    # Init ModelServer
    model_server = ModelServer(http_port=8080, workers=1, enable_docs_url=True)
    # Start server
    model_server.start(models_list)
```
Hi @allilou, your example is using the Open Inference (V2) predict API.
You should implement the OpenAIModel chat completion API instead. |
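For context, the custom predictor above only answers the Open Inference (V2) predict route, which is why the `openai` client call fails against it. Below is a hedged sketch of calling that predictor through the route it does implement; `SERVER_URL` is a placeholder, and the input names mirror the ones parsed in `predict()` above.

```python
# Hedged sketch: calling the custom KserveLLM predictor via the V2 REST
# predict endpoint it implements. SERVER_URL is a placeholder.
import requests

payload = {
    "inputs": [
        {"name": "input_text", "shape": [1], "datatype": "BYTES",
         "data": ["Translate to French: I love programming."]},
        {"name": "temperature", "shape": [1], "datatype": "FP32", "data": [0.01]},
        {"name": "max_new_tokens", "shape": [1], "datatype": "INT32", "data": [256]},
    ]
}

resp = requests.post(
    f"{SERVER_URL}/v2/models/my-model-name/infer",
    json=payload,
    timeout=120,
)
print(resp.status_code, resp.json())
```

To keep using the OpenAI-style clients from the earlier comments, the model needs to expose KServe's OpenAI chat-completions endpoint instead, as the Hugging Face runtime mentioned above already does.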
Describe the change you'd like to see
With KServe now supporting the OpenAI Schema for LLM runtimes, it would be helpful to have a few examples of use cases that exercise native LangChain and LlamaIndex features (text generation, RAG/QA, chat, etc.) against KServe-hosted models.
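As one illustration of what such an example could look like, here is a hedged sketch of driving a KServe-hosted model from LlamaIndex through its `OpenAILike` wrapper. The base URL, API key, and model name are placeholders, and it assumes the `/openai/v1` prefix discussed earlier in this thread.

```python
# Hedged sketch, not an official KServe example.
# Requires: pip install llama-index-llms-openai-like
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model="MODEL_NAME",  # placeholder: the name the model is served under
    api_base="http://INGRESS_HOST:INGRESS_PORT/openai/v1",  # placeholder URL
    api_key="not-used",  # placeholder: set to whatever your gateway expects
    is_chat_model=True,
)

response = llm.chat([ChatMessage(role="user", content="Say this is a test")])
print(response.message.content)
```

`OpenAILike` is used rather than LlamaIndex's stock `OpenAI` class so that the model name is not validated against OpenAI's own model list.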
Additional context
Sample call -
Original Issue - kserve/kserve#3419