
feat: Support for Microsoft azure endpoints #67

Merged
64bit merged 14 commits into main from microsoft-azure-endpoints on Jun 16, 2023

Conversation

64bit
Owner

64bit commented Apr 28, 2023

New:

  • There are two configurations, OpenAIConfig and AzureConfig, to be used with Client::with_config(config)
  • Client::new() defaults to a client with the default OpenAIConfig

Breaking change:

  • api_base, org_id, and api_key are no longer configured through Client; instead, platform-specific configuration is done through OpenAIConfig or AzureConfig (see the sketch below)
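
A minimal sketch of the new configuration flow, for illustration (Client::with_config and with_deployment_id appear in this PR and thread; the other builder names and import paths are assumptions based on the fields above and may differ from the final API):

use async_openai::{Client, config::{AzureConfig, OpenAIConfig}};

// Unchanged for most users: a client with the default OpenAIConfig.
let client = Client::new();

// OpenAI, with the settings that previously lived on Client.
let openai_config = OpenAIConfig::new()
    .with_api_key("sk-...")
    .with_org_id("org-...");
let client = Client::with_config(openai_config);

// Azure OpenAI service (endpoint, version, and deployment are placeholders).
let azure_config = AzureConfig::new()
    .with_api_base("https://blahblah.openai.azure.com")
    .with_api_version("2023-03-15-preview")
    .with_deployment_id("gpt35")
    .with_api_key("...");
let client = Client::with_config(azure_config);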

Examples:

  • Add new example azure-openai-service

closes #32

64bit added the help wanted (Extra attention is needed) label Apr 28, 2023
@deipkumar

@64bit is testing examples/azure-openai-service with an Azure OpenAI deployment the only thing blocking this change?

@64bit
Owner Author

64bit commented May 19, 2023

Hi @deipkumar,

That's correct; testing is the blocker for publishing the changes in this PR to crates.io.

@deipkumar

I'm getting the following when I run:

(base) ➜  examples git:(microsoft-azure-endpoints) ✗ ./target/debug/azure-openai-service
Error: JSONDeserialize(Error("missing field `type`", line: 1, column: 302))

@deipkumar

That was because I ran with a gpt-3.5-turbo deployment, which failed on embedding_example (obviously).

completions_stream_example goes on repeat:

(base) ➜  examples git:(microsoft-azure-endpoints) ✗ ./target/debug/azure-openai-service
. Is it Attenborough? Is that what he says? So, okay. Many years ago, in a universe that was shaped by a great war between the Decepticons and the Autobots, an age of strife that raged across the galaxy. In such a place, many tales and legends would be born. This is one such tale. On a distant, barren moon, Optimus Prime and his trusted lieutenant, Bumblebee, were patrolin' a Autobot science facility. Situated in the remote regions of space, (laughs) situated in the remote regions of space, the facility was tasked with researching new means of combating the Decepticons. To help tip the scales of victory in the upcoming battle for dominance. Under their care, the scientists and engineers of the facility would tinker and work long hours in preparation for when the moment was right to reveal their secrets. But one moonlit night, their hard work was radically interrupted. (clears throat) By a massive crashed spaceship that fell suddenly from the sky. (crashing) Oh no, Optimus, cried Bumblebee. We have to help those survivors. The two of them raced to the crash site of the craft amongst the debris and shattered circuits were a group of unfamiliar robots, never before seen by the Autobots. "Are you Decepticons?" Optimus growled. The robots stirred and the first to speak was obviously a leader in his own right, something that Optimus Prime could easily recognize. "No, we are not Decepticons. "We are the Primals, guardians of the Thirteenth Cybertronian Colony Fleet. "We have escaped the destruction caused by our eternal enemy, the Unicron. "We were returning to our home when one of our ships malfunctioned "and the unfortunate events that you have seen before you happened." (laughs) Optimus considered his response thoughtfully. He knew well the chance that they were Decepticons, but something in their eyes made the Prime believe that they were genuine. "All right, very well. "You may stay here and recover." "We'll help you repair your ship and get you on your way "in the nearest stellar body." And so within days,stream failed: Stream ended
 after the crash, the Primals rejoined their fleet and the Autobots could only believe what they had been told and continued their work until the next night raid on the Decepticon stronghold. And that my good friends is a brief but amazing bedtime story about space robots (.

We're gonna do this for you, Champ. Mr. Attenborough's over there. We're gonna get in here for you. Let's rock.

And the Transformers slept soundly that night. His own bot cried out to one of the moons, and Bumblebee shifted a little in his sleep. Optimus Prime, despite everything, slept with his eyes a little open, always vigilant, since the threat was always present, no matter how peaceful the night seemed to be. But even that weighed light in the face of the quiet beauty of the moon's glow softly lighting the Cybertronian landscape from above. And that's the end of that.

That was excellent.

I love it.

Love it.

I love Sir David Attenborough's voice. It's like the same voice you can hear for years and years and never get tired of it.

So amazing.

Good job, yo. That was great.

So as a newcomer to the Transformers franchise, can you tell us a little bit about how you came to be involved in Transformers: War for Cybertron trilogy?

Yeah, absolutely. First and foremost, I'm really lucky that I have an extensive background in voiceover and doing animation VO work and things like that. And I think it's one of the things that opened the door for you guys took a chance on me when you were casting for the role of Alita within the trilogy. And I think, to be honest, when I found out it was the Transformers universe you guys were working on, that was even more exciting for me to be involved. And so, you know, it gave me the opportunity to do a lot of research and to really deep dive into the different series, which was kind of daunting because there's so much out there, but also really exciting. And I think one of the things I really loved was researching where exactly in the timeline we were within the trilogy because it gives you a lot of information as to what the characters are thinking, how things are moving around. And then obviously, you know, pairing that with, you know, working with the scripts and the director and everyone else involved, it was a really cool experience to see it all come together.

That's amazing. So since you're new to the franchise, can you tell us a little bit about how much preparation and research went into your role as Alita?

stream failed: Stream ended
...

and chat_completion_example works fine:

(base) ➜  examples git:(microsoft-azure-endpoints) ✗ ./target/debug/azure-openai-service

Response:

0: Role: assistant  Content: A large language model is an artificial intelligence model trained on a large dataset of text to predict the likelihood of a particular word or phrase given its context. This is known as language modeling. To achieve this, the model learns the relationships between the words and phrases in the training data and uses this knowledge to generate coherent text.

The basic architecture of a large language model typically involves a neural network with many layers of nodes. The input to the network is a sequence of words, and the output is a probability distribution over the next word in the sequence.

During training, the network is presented with large amounts of text and learns to predict the probability of the next word in the sequence based on the previous words. This involves adjusting the weights of the network to minimize the difference between the predicted probability distribution and the actual distribution of the next word in the training data.

Once the model has been trained, it can be used to generate text by taking a sequence of words as input and using the network to predict the next word in the sequence. This process can be repeated to generate longer sequences of text.

Large language models have been used for a variety of natural language processing tasks including language translation, question answering, and text summarization, among others.

When I run with an embeddings deployment I get:

Error: ApiError(ApiError { message: "Too many inputs. The max number of inputs is 1.  We hope to increase the number of inputs per request soon. Please contact us through an Azure support request at: https://go.microsoft.com/fwlink/?linkid=2213926 for further questions.", type: "invalid_request_error", param: None, code: None })

which is due to the fact that Azure only supports embedding one text chunk at a time.

In general, the code is a bit confusing for Azure: with_deployment_id("gpt35") in the Azure config and then model("text-davinci-003") in the request, for example.

@64bit
Owner Author

64bit commented May 19, 2023

Thank you for your help on this and posting the output!

Here are my observations:

completions_stream_example

It appears that Azure doesn't respond with [DONE] to end the stream? Ref: https://github.com/64bit/async-openai/blob/microsoft-azure-endpoints/async-openai/src/client.rs#L320-L330 ; what would need to change in the stream<O>(mut event_stream: EventSource) function? (Open Question Number 1)

This is then surfaced as an infinite loop in the example. Maybe the example should exit on error instead of printing and continuing, but ideally the Azure response would be handled in the stream<O>( ... ) function?
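
For reference, the loop in that function looks roughly like this (paraphrased, not verbatim; tx is the channel sender used by the surrounding function). This is where an Azure-specific fix would presumably go:

while let Some(ev) = event_stream.next().await {
    match ev {
        Err(e) => {
            // With Azure, the stream currently dies here with
            // OpenAIError::StreamError("Stream ended") instead of
            // terminating cleanly after a [DONE] message.
            if tx.send(Err(OpenAIError::StreamError(e.to_string()))).is_err() {
                break;
            }
        }
        Ok(Event::Message(message)) => {
            if message.data == "[DONE]" {
                break; // normal end of an OpenAI stream
            }
            // ... deserialize message.data into O and forward it on tx
        }
        Ok(Event::Open) => continue,
    }
}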

chat_completion_example

Nice

embedding_example

This should be easy to fix by changing the input from an array of two elements to a single &str input.
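
Something like this, for example (a sketch; CreateEmbeddingRequestArgs and client.embeddings().create(..) follow the library's existing builder pattern, but treat the exact names and the model id as assumptions):

let request = CreateEmbeddingRequestArgs::default()
    .model("text-embedding-ada-002")
    // Azure rejects batched inputs ("The max number of inputs is 1"),
    // so pass a single &str rather than an array of texts.
    .input("a single text chunk to embed")
    .build()?;

let response = client.embeddings().create(request).await?;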


In general, the code is a bit confusing for Azure: with_deployment_id("gpt35") in the Azure config and then model("text-davinci-003") in the request, for example.

I think this is from an openai-python comparison, where they take deployment_id as a method call parameter:

chat_completion = openai.ChatCompletion.create(deployment_id="deployment-name", model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hello world"}])

For async-openai, the reason behind having the deployment id in the config instead of in every request is that the deployment id is fixed (my assumption, please correct me if that's wrong; Open Question Number 2). So one only needs to configure it once.

But from your explanation it seems that "deployment id" refers to a model name... which may make AzureConfig a bad abstraction.

Also, the deployment id is not part of the request body in OpenAI's or Azure's OpenAPI spec (even though it's part of the URL in Azure's spec), so keeping the function signatures the same helps reuse; having the deployment id in function signatures would require further refactoring/design changes beyond what's in this PR.


So in summary I have two open questions; please let me know your feedback on how we can address them.

@deipkumar

deipkumar commented May 19, 2023

Question 1: Azure does respond with [DONE]. Using the Python SDK (I put print(line.strip().decode("utf-8")) right before this line: https://github.com/openai/openai-python/blob/fe3abd16b582ae784d8a73fd249bcdfebd5752c9/openai/api_requestor.py#L98):

data: {"id":"cmpl-7I0LcSGtHaT5RwObOEA7zBxKoNPU1","object":"text_completion","created":1684525596,"model":"text-davinci-003","choices":[{"text":" new","index":0,"finish_reason":"length","logprobs":null}],"usage":null}
data: [DONE]

Using your Rust SDK (I put println!("\ndata: {:?}", message.data); before this line: https://github.com/64bit/async-openai/blob/microsoft-azure-endpoints/async-openai/src/client.rs#L328):

data: "{\"id\":\"cmpl-7I0NowuFlhjJVD2V10KajaPMhPCkz\",\"object\":\"text_completion\",\"created\":1684525732,\"model\":\"text-davinci-003\",\"choices\":[{\"text\":\" heroes\",\"index\":0,\"finish_reason\":\"length\",\"logprobs\":null}],\"usage\":null}"
stream failed: Stream ended

You should be able to look for finish_reason, though?
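
A sketch of that suggestion (the field name finish_reason is taken from the data dump above; this is a proposed workaround, not the library's current behavior):

// After deserializing a chunk, treat a populated finish_reason as the end:
if response.choices.iter().any(|c| c.finish_reason.is_some()) {
    break;
}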

Question 2: Deployment ID refers to a deployment of a specific model. Each Azure OpenAI resource/service is fixed (api_base), but you may want to hit different deployments/models within each resource/service. This is what it looks like in Python:

import openai
openai.api_type = "azure"
openai.api_base = "https://blahblah.openai.azure.com/"
openai.api_version = "2023-03-15-preview"
openai.api_key = "*********"

prompt = "You are a mathematician. "

response = openai.Completion.create(
    engine="gpt3",
    prompt=prompt,
    temperature=0,
    max_tokens=20,
    stream=True
)

response = openai.ChatCompletion.create(
    engine="gpt35",
    messages=[
        {"role": "system", "content": prompt },
        {"role": "user", "content": "Write a poem." },
    ],
    temperature=0,
    max_tokens=20,
    stream=True
)

@64bit
Owner Author

64bit commented May 20, 2023

Question 1: It seems to be a bug in this library then; the stream should not end with OpenAIError::StreamError.

Question 2: Engine seems to be deprecated, at least in OpenAI. Even in their updated Python example https://github.com/openai/openai-python#microsoft-azure-endpoints I see deployment and model explicitly provided.

But semantically I see what you mean: one deployment is just one model, in which case a library user may want to call the API with different models/deployments.

As of this PR, that means the user creates one client instance for each deployment.
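
Concretely, something like this (a sketch; it assumes AzureConfig implements Clone, and reuses the deployment names from the Python snippet above):

let base = AzureConfig::new()
    .with_api_base("https://blahblah.openai.azure.com")
    .with_api_version("2023-03-15-preview")
    .with_api_key("...");

// One client per deployment, since the deployment id lives in the config:
let completions_client = Client::with_config(base.clone().with_deployment_id("gpt3"));
let chat_client = Client::with_config(base.with_deployment_id("gpt35"));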

Or this library needs to expose method signatures that accept a deployment id, like the Python library does.
Not sure what design choice to make here off the top of my head.


So this PR is not ready to ship without some non-trivial work; I was hoping it would work out of the box.

@deipkumar

Do you think we can merge support for Azure embeddings and non-streaming completions now, and defer streaming support?

@64bit
Owner Author

64bit commented May 23, 2023

Hypothetically, say we do ship this: the major issue is that the library immediately suffers from reliability problems, where some of the APIs exposed by the library don't work.

The second major issue is that this already introduces a breaking change, and when Azure support is implemented 100% there could be further breaking changes.

Neither is a desired effect for the users of the library. It's better to pick reliability and quality over shipping a partially working library.

That said, the feature implemented in this PR is still useful; if needed, you can pin the dependency to a Git commit on this branch:

async-openai = { git = "https://github.com/64bit/async-openai", rev = "17e097eecf79c0919100477650163709c833b3fe"}

@Czechh
Contributor

Czechh commented Jun 14, 2023

Hey @64bit, are you planning on holding off until there's full parity between both platforms? If there's anything missing I can help with, happy to jump in. Otherwise I'll use the pinned dependency. (Thank you for this great library!)

@64bit
Owner Author

64bit commented Jun 14, 2023

Hi @Czechh, thank you - I'm happy to know it's useful.

After today's news on function calls, it seems keeping parity may not be practical - the Azure API seems to lag the OpenAI API.

So I'm wondering if I should ship this and make it clear in the documentation that the OpenAI API is going to be 100% working, while Azure may not be, because of Rust type differences and because Azure support relies fully on contributions from community members who have access to the Azure OpenAI service.

That said, if you have cycles to contribute to this PR: making the completions stream example work without hitting stream failed: Stream ended would be a boost to quality - one less bug.

Lastly, knowing that #72 is being worked on (see #73), I'd hold off on shipping this to avoid interrupting their work on supporting function calls.

Do you think this is a reasonable plan to ship it?

@Czechh
Contributor

Czechh commented Jun 14, 2023

I think it's good to go. Azure endpoints aren't freely available, and I don't think they ever will be, so those users will inherently need to be more savvy. This is a change for the 1-5% of users.

Right now, there isn't any other option besides writing your own client or forking this.

The biggest issue I see is the fact that it had to be a breaking change. I understand why keeping ::new() as OpenAI wasn't the first pick, but I would reconsider so the change isn't breaking, also keeping in mind that this is for a small subset of users. Anyway, those are my thoughts.

@64bit
Owner Author

64bit commented Jun 15, 2023

Thank you for your thoughtful feedback. It's better to have an option than to have none. I also agree that ::new() should not be a breaking change.

Based on the percentage of users, I'm convinced to update this PR and ship it.

64bit mentioned this pull request Jun 15, 2023
64bit removed the help wanted (Extra attention is needed) label Jun 15, 2023
64bit merged commit e0b1543 into main Jun 16, 2023
64bit deleted the microsoft-azure-endpoints branch Jun 16, 2023 19:31
@64bit
Owner Author

64bit commented Jun 16, 2023

Released this in v0.11.0.

Thank you all for your contributions!

Also recorded the bug identified in this PR in #74

@Czechh
Contributor

Czechh commented Jun 17, 2023

Tested this, and works great! Thank you.

I'll add it to https://github.com/getmetal/motorhead now. :)
