Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix query embedding isn't normalized in Collection.QueryEmbedding() call #77

Merged
merged 1 commit into from
May 17, 2024

Conversation

philippgille
Copy link
Owner

For the Collection.Query call the embedding is created and normalized. For Collection.QueryEmbedding() the embedding is a parameter, and so far it was assumed/expected to be normalized by the user who calls the function, but that's not guaranteed, so now we check if the embedding is normalized and if it's not then we normalize it.

There's virtually no impact on query performance if the query was normalized already, so no need to extend the function with a boolean parameter to allow users to explicitly skip the check.

Here's a run on main branch:

Benchmark result
$ go test -benchmem -run=^$ -bench .
goos: linux
goarch: amd64
pkg: github.com/philippgille/chromem-go
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkCollection_Query_NoContent_100-8      	   13202	     90541 ns/op	    5165 B/op	      95 allocs/op
BenchmarkCollection_Query_NoContent_1000-8     	    2181	    512950 ns/op	   13577 B/op	     141 allocs/op
BenchmarkCollection_Query_NoContent_5000-8     	     564	   2140110 ns/op	   47125 B/op	     174 allocs/op
BenchmarkCollection_Query_NoContent_25000-8    	     121	   9984625 ns/op	  211688 B/op	     203 allocs/op
BenchmarkCollection_Query_NoContent_100000-8   	      30	  39467498 ns/op	  810281 B/op	     228 allocs/op
BenchmarkCollection_Query_100-8                	   13039	     91023 ns/op	    5154 B/op	      94 allocs/op
BenchmarkCollection_Query_1000-8               	    2186	    517386 ns/op	   13564 B/op	     141 allocs/op
BenchmarkCollection_Query_5000-8               	     564	   2095981 ns/op	   47145 B/op	     175 allocs/op
BenchmarkCollection_Query_25000-8              	     100	  10031972 ns/op	  211669 B/op	     203 allocs/op
BenchmarkCollection_Query_100000-8             	      30	  39626423 ns/op	  810543 B/op	     238 allocs/op
PASS
ok  	github.com/philippgille/chromem-go	28.154s

Here's a run on this branch with the normalization check:

Benchmark result
$ go test -benchmem -run=^$ -bench .
goos: linux
goarch: amd64
pkg: github.com/philippgille/chromem-go
cpu: 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
BenchmarkCollection_Query_NoContent_100-8      	   13184	     90867 ns/op	    5143 B/op	      94 allocs/op
BenchmarkCollection_Query_NoContent_1000-8     	    2170	    513390 ns/op	   13578 B/op	     141 allocs/op
BenchmarkCollection_Query_NoContent_5000-8     	     562	   2130892 ns/op	   47115 B/op	     174 allocs/op
BenchmarkCollection_Query_NoContent_25000-8    	     100	  10010782 ns/op	  211756 B/op	     207 allocs/op
BenchmarkCollection_Query_NoContent_100000-8   	      30	  39342416 ns/op	  810356 B/op	     230 allocs/op
BenchmarkCollection_Query_100-8                	   13170	     90577 ns/op	    5165 B/op	      95 allocs/op
BenchmarkCollection_Query_1000-8               	    2151	    513987 ns/op	   13574 B/op	     141 allocs/op
BenchmarkCollection_Query_5000-8               	     558	   2114246 ns/op	   47109 B/op	     173 allocs/op
BenchmarkCollection_Query_25000-8              	     100	  10089708 ns/op	  211777 B/op	     208 allocs/op
BenchmarkCollection_Query_100000-8             	      30	  39509327 ns/op	  810383 B/op	     232 allocs/op
PASS
ok  	github.com/philippgille/chromem-go	26.538s

It looks even faster, which is just from regular fluctuation.

Only when calling Collection.QueryEmbedding() and passing existing
embeddings that aren't normalized yet.
@philippgille philippgille marked this pull request as ready for review May 17, 2024 20:19
@philippgille philippgille merged commit a0850b1 into main May 17, 2024
4 checks passed
@philippgille philippgille deleted the fix-query-embedding-is-not-normalized branch May 17, 2024 20:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant