add example: <Generate Embeddings> and <Embedding Similarity Search> #274

Closed
wants to merge 23 commits into from
23 commits
afe5890  add ImageEditRequest.ResponseFormat  (Apr 6, 2023)
45d45c9  add ImageEditRequest/ImageVariRequest.ResponseFormat  (aceld, Apr 6, 2023)
7515e50  add ImageVariRequest/ImageEditRequest.ResponseFormat  (aceld, Apr 18, 2023)
85a4d2e  complete image_test  (aceld, Apr 18, 2023)
0423b07  delete var prompt param  (aceld, Apr 18, 2023)
63ffc9c  Merge remote-tracking branch 'upstream/master'  (aceld, Apr 20, 2023)
5088b38  fix:model param type, add moderation Model Name const.  (aceld, Apr 20, 2023)
6cd1c33  rename ModerationText001  (aceld, Apr 20, 2023)
4152dfa  Merge remote-tracking branch 'upstream/master'  (aceld, Apr 21, 2023)
ce80107  add example: <Generate Embeddings> and <Embedding Similarity Search>  (aceld, Apr 21, 2023)
3704737  add DotProduct to Embeddings library, fix embedding example  (aceld, Apr 26, 2023)
973b711  Comment should end in a period (godot)  (aceld, Apr 26, 2023)
2112722  add embeddings TestCosineSimilarity, TestDotProduct  (aceld, Apr 26, 2023)
63d5fa0  delete CosineSimilarity  (aceld, Apr 26, 2023)
149d208  compared by []string, fix float comparison  (aceld, Jun 13, 2023)
b779885  add example: <Generate Embeddings> and <Embedding Similarity Search>  (aceld, Jun 13, 2023)
7786d16  Merge remote-tracking branch 'upstream/master' into embedding_example  (aceld, Jun 15, 2023)
8f51306  Merge remote-tracking branch 'upstream/master' into embedding_example  (aceld, Jun 16, 2023)
4e12e6d  add example: <Generate Embeddings> and <Embedding Similarity Search>  (aceld, Jun 16, 2023)
b0d7bf3  add example: <Generate Embeddings> and <Embedding Similarity Search>  (aceld, Jun 16, 2023)
e2a23c5  add example: <Generate Embeddings> and <Embedding Similarity Search>  (aceld, Jun 16, 2023)
330ec63  sort import  (aceld, Jun 16, 2023)
9e2d687  Merge branch 'master' into embedding_example  (aceld, Jul 12, 2023)
213 changes: 205 additions & 8 deletions README.md
@@ -527,6 +527,202 @@ func main() {
```
</details>


<details>
<summary>Generate Embeddings</summary>
sashabaranov marked this conversation as resolved.

```go
package main

import (
	"context"
	"encoding/gob"
	"fmt"
	"os"

	"github.com/sashabaranov/go-openai"
)

// getEmbedding requests embeddings for input and returns the first one.
func getEmbedding(ctx context.Context, client *openai.Client, input []string) ([]float32, error) {
	resp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
		Input: input,
		Model: openai.AdaEmbeddingV2,
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, fmt.Errorf("no embedding returned")
	}
	return resp.Data[0].Embedding, nil
}

func main() {
	ctx := context.Background()
	client := openai.NewClient("your token")

	// Example selections.
	selections := []string{
		"Welcome to the go-openai interface, which will be the gateway for golang software engineers to enter the OpenAI development world.",
		"It was tasty and fresh. The other one I bought was old and tasted moldy. But this one was good.",
		"Great coffee at a good price. I'm a subscription buyer and I buy this month after month. What more can I say?",
		"This chocolate is amazing..I love the taste and smell, this is the only chocolate for me...I found a new love!",
		"I love this coffee! And such a great price. Will buy more when I am running out which will be soon.",
		"The Raspberry Tea Syrup is great. I can use it for hot and cold drinks as well in certain recipes.",
		"Everyone that dips with this loves it! So easy to use! Olive oil and tasty bread is all you need.",
		"This is a favorite of mine for using over ice. Even bought it to give out as Christmas gifts last year.",
		"If you like a great , hot, sauce then buy this. If spicy with heat isn't to your liking then don't buy it.",
		"My name is Aceld, and I am a Golang software development engineer. I like young and beautiful girls.",
		"The competition was held over two days, 24 July and 2 August. The qualifying round was the first day with the apparatus final on the second day.",
		"There are 4 types of gymnastics apparatus: floor, vault, pommel horse, and rings. The apparatus final is a competition between the top 8 gymnasts in each apparatus.",
	}

	// Generate one embedding per selection, in the same order as selections.
	var selectionsEmbeddings [][]float32
	for _, selection := range selections {
		embedding, err := getEmbedding(ctx, client, []string{selection})
		if err != nil {
			fmt.Printf("GetEmbedding error: %v\n", err)
			return
		}
		selectionsEmbeddings = append(selectionsEmbeddings, embedding)
	}

	// Write the embeddings to a binary file using gob encoding.
	file, err := os.Create("embeddings.bin")
	if err != nil {
		fmt.Printf("Create file error: %v\n", err)
		return
	}
	defer file.Close()

	encoder := gob.NewEncoder(file)
	if err := encoder.Encode(selectionsEmbeddings); err != nil {
		fmt.Printf("Encode error: %v\n", err)
		return
	}
}
```

Comment on lines +593 to +605

Owner:
I think file I/O and marshalling is largely out of scope for the purpose of this example. Could you please remove it?

Contributor Author:
@sashabaranov Sure, you're right. Do you have any suggestions on how to store vector data more efficiently? I would appreciate some advice.

Owner:
@aceld I think the storage of vector data is largely out of scope for this README; the point is just to show an example, not to build a vector-search DB.
</details>
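The two examples hand embeddings to each other through `embeddings.bin` using `encoding/gob`. The minimal sketch below, assuming only the standard library, runs the same encode/decode round trip against an in-memory `bytes.Buffer` instead of a file, which shows that a `[][]float32` comes back unchanged. `roundTrip` and the toy vectors are hypothetical names and values used only for illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// roundTrip gob-encodes embeddings into a buffer and decodes them back,
// mirroring the file-based hand-off between the two examples (hypothetical helper).
func roundTrip(embeddings [][]float32) ([][]float32, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(embeddings); err != nil {
		return nil, err
	}
	var decoded [][]float32
	if err := gob.NewDecoder(&buf).Decode(&decoded); err != nil {
		return nil, err
	}
	return decoded, nil
}

func main() {
	// Toy vectors standing in for real embeddings (hypothetical values).
	in := [][]float32{{0.1, 0.2, 0.3}, {0.4, 0.5, 0.6}}
	out, err := roundTrip(in)
	if err != nil {
		fmt.Println("round trip error:", err)
		return
	}
	fmt.Println(len(out), out[1][2])
}
```

Swapping the buffer for an `*os.File` gives the behavior of the examples; the encoding itself does not depend on where the bytes go.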

<details>
<summary>Embedding Similarity Search</summary>

```go
package main

import (
	"context"
	"encoding/gob"
	"fmt"
	"os"
	"sort"
	"strings"

	"github.com/sashabaranov/go-openai"
)

// getEmbedding requests embeddings for input and returns the first one.
func getEmbedding(ctx context.Context, client *openai.Client, input []string) ([]float32, error) {
	resp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
		Input: input,
		Model: openai.AdaEmbeddingV2,
	})
	if err != nil {
		return nil, err
	}
	if len(resp.Data) == 0 {
		return nil, fmt.Errorf("no embedding returned")
	}
	return resp.Data[0].Embedding, nil
}

// sortIndexes returns the indexes of scores in descending order of similarity.
func sortIndexes(scores []float32) []int {
	indexes := make([]int, len(scores))
	for i := range indexes {
		indexes[i] = i
	}
	sort.SliceStable(indexes, func(i, j int) bool {
		return scores[indexes[i]] > scores[indexes[j]]
	})
	return indexes
}

func main() {
	ctx := context.Background()
	client := openai.NewClient("your token")

	// "embeddings.bin" is produced by the <Generate Embeddings> example.
	file, err := os.Open("embeddings.bin")
	if err != nil {
		fmt.Printf("Open file error: %v\n", err)
		return
	}
	defer file.Close()

	// Load all embeddings from the local binary file.
	var allEmbeddings [][]float32
	if err := gob.NewDecoder(file).Decode(&allEmbeddings); err != nil {
		fmt.Printf("Decode error: %v\n", err)
		return
	}

	// Any query text works here.
	input := "I am a Golang Software Engineer, I like Go and OpenAI."

	// Get the embedding of the input.
	inputEmbd, err := getEmbedding(ctx, client, []string{input})
	if err != nil {
		fmt.Printf("GetEmbedding error: %v\n", err)
		return
	}

	// Score every stored embedding against the input.
	var questionScores []float32
	for _, embed := range allEmbeddings {
		// OpenAI embeddings are normalized to length 1, so cosine similarity
		// can be computed slightly faster using just a dot product.
		score := openai.DotProduct(embed, inputEmbd)
		questionScores = append(questionScores, score)
	}

	// Keep the indexes of the 3 most similar selections (fewer if there are fewer embeddings).
	sortedIndexes := sortIndexes(questionScores)
	if len(sortedIndexes) > 3 {
		sortedIndexes = sortedIndexes[:3]
	}

	fmt.Println("input:", input)
	fmt.Println("----------------------")
	fmt.Println("similarity section:")

	// selections.txt is assumed to hold the selections from <Generate Embeddings>,
	// one per line and in the same order, so that line i matches embedding i.
	fileData, err := os.ReadFile("selections.txt")
	if err != nil {
		fmt.Printf("ReadFile error: %v\n", err)
		return
	}

	// Split by line.
	selections := strings.Split(string(fileData), "\n")

	for _, index := range sortedIndexes {
		if index >= len(selections) {
			continue // no matching line in selections.txt
		}
		fmt.Printf("%.4f %s\n", questionScores[index], selections[index])
	}
}
```

Comment on lines +675 to +695

Owner:
Could we please add this section to the previous example and have one single example for embeddings?
</details>

<details>
<summary>JSON Schema for function calling</summary>

@@ -593,19 +789,20 @@ The `Parameters` field of a `FunctionDefinition` can accept either of the above
OpenAI maintains clear documentation on how to [handle API errors](https://platform.openai.com/docs/guides/error-codes/api-errors).

example:
```go
e := &openai.APIError{}
if errors.As(err, &e) {
	switch e.HTTPStatusCode {
	case 401:
		// invalid auth or key (do not retry)
	case 429:
		// rate limiting or engine overload (wait and retry)
	case 500:
		// openai server error (retry)
	default:
		// unhandled
	}
}
```
25 changes: 22 additions & 3 deletions embeddings_test.go
@@ -1,15 +1,16 @@
package openai_test

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"math"
	"net/http"
	"testing"

	. "github.com/sashabaranov/go-openai"
	"github.com/sashabaranov/go-openai/internal/test/checks"
)

func TestEmbedding(t *testing.T) {
@@ -116,3 +117,21 @@ func TestEmbeddingEndpoint(t *testing.T) {
	_, err = client.CreateEmbeddings(context.Background(), EmbeddingRequestTokens{})
	checks.NoError(t, err, "CreateEmbeddings tokens error")
}

func TestDotProduct(t *testing.T) {
	v1 := []float32{1, 2, 3}
	v2 := []float32{2, 4, 6}
	expected := float32(28.0)
	result := DotProduct(v1, v2)
	if math.Abs(float64(result-expected)) > 1e-12 {
		t.Errorf("Unexpected result. Expected: %v, but got %v", expected, result)
	}

	v1 = []float32{1, 0, 0}
	v2 = []float32{0, 1, 0}
	expected = float32(0.0)
	result = DotProduct(v1, v2)
	if math.Abs(float64(result-expected)) > 1e-12 {
		t.Errorf("Unexpected result. Expected: %v, but got %v", expected, result)
	}
}
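`TestDotProduct` compares with an absolute tolerance of `1e-12`, far below `float32` precision (machine epsilon is about 1.19e-7), so it effectively demands an exact match. That holds for the small integer inputs used here, but non-integer inputs, or a compiler that fuses the multiply-add as the Go spec permits, may not reproduce a decimal expectation bit for bit. The sketch below shows a precision-aware comparison; `approxEqual32` and its `1e-6` relative tolerance are hypothetical choices, and `dotProduct` copies the loop from `embeddings_utils.go` so the block runs on its own.

```go
package main

import (
	"fmt"
	"math"
)

// dotProduct copies the DotProduct loop added in embeddings_utils.go.
func dotProduct(v1, v2 []float32) float32 {
	var result float32
	for i := 0; i < len(v1); i++ {
		result += v1[i] * v2[i]
	}
	return result
}

// approxEqual32 allows a relative difference of 1e-6 (roughly eight float32
// machine epsilons), falling back to an absolute 1e-6 when both values are below 1.
func approxEqual32(a, b float32) bool {
	const relTol = 1e-6
	diff := math.Abs(float64(a) - float64(b))
	scale := math.Max(math.Abs(float64(a)), math.Abs(float64(b)))
	return diff <= relTol*math.Max(scale, 1)
}

func main() {
	v := []float32{0.1, 0.2, 0.3}
	got := dotProduct(v, v)
	fmt.Println("got:", got, "approximately 0.14:", approxEqual32(got, 0.14))
}
```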
11 changes: 11 additions & 0 deletions embeddings_utils.go
@@ -0,0 +1,11 @@
package openai

// DotProduct calculates the dot product of two vectors.
// It loops over len(v1), so it panics if v2 is shorter than v1.
func DotProduct(v1, v2 []float32) float32 {
Owner:
Why not put it as an Embedding method?

Contributor Author:
[screenshot]
@sashabaranov, like you said.

Owner:
Totally, but we don't really have `[]float32` vectors in the library except for the `Embedding` struct. It might make sense to add it as `func (e Embedding) DotProduct(another Embedding)`.

	var result float32
	// Iterate over the vectors and accumulate the dot product.
	for i := 0; i < len(v1); i++ {
		result += v1[i] * v2[i]
	}
	return result
}
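The owner suggests exposing the dot product as `func (e Embedding) DotProduct(another Embedding)`. A sketch of that shape follows. `Embedding` here is a hypothetical local stand-in for `openai.Embedding` (only its `Embedding []float32` field matters), and returning an error on a length mismatch instead of panicking is a design assumption of this sketch, not something the review specifies.

```go
package main

import "fmt"

// Embedding is a hypothetical local stand-in for openai.Embedding.
type Embedding struct {
	Embedding []float32
	Index     int
}

// DotProduct returns the dot product of e and another, or an error
// when the two vectors have different lengths.
func (e Embedding) DotProduct(another Embedding) (float32, error) {
	if len(e.Embedding) != len(another.Embedding) {
		return 0, fmt.Errorf("vector length mismatch: %d vs %d",
			len(e.Embedding), len(another.Embedding))
	}
	var result float32
	for i, v := range e.Embedding {
		result += v * another.Embedding[i]
	}
	return result, nil
}

func main() {
	a := Embedding{Embedding: []float32{1, 2, 3}}
	b := Embedding{Embedding: []float32{2, 4, 6}}
	score, err := a.DotProduct(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("score:", score)
}
```

Binding the operation to `Embedding` keeps the API surface limited to the one vector type the library actually returns, which is the point of the review comment.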