semantic-search: increase maximal allowed chunk size, fix error when timeout #752

Merged
merged 1 commit into from
Nov 17, 2021

Conversation

PrimozGodec
Collaborator

Issue
  1. When running a semantic search on laws, many documents are longer than 50000, and all of them are shorter than 300000.
  2. Embedding sometimes fails because of a timeout. In those cases the result for a chunk is None instead of a list, and the current implementation fails because it tries to extend the result list with None.
Description of changes
  • Handle the error described in issue 2.
  • Allow documents up to 300000 long while still building chunks with a maximum size of 50000. A document longer than MAX_CHUNK_SIZE is sent in a single request, which is therefore longer than MAX_CHUNK_SIZE.
Includes
  • Code changes
  • Tests
  • Documentation
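
The two changes described above can be illustrated with a minimal sketch. It is not the code merged in this PR: only MAX_CHUNK_SIZE and the value 50000 come from the description, while the function names make_chunks and collect_results, the embed_chunk callback, and the choice to record None for every document of a failed chunk are assumptions made for illustration. What happens to documents longer than 300000 is not described here, so the sketch does not cover it.

```python
from typing import Callable, List, Optional, Sequence

MAX_CHUNK_SIZE = 50_000  # value taken from the PR description

Embedding = List[float]


def make_chunks(
    docs: Sequence[str], max_chunk_size: int = MAX_CHUNK_SIZE
) -> List[List[str]]:
    """Group documents into chunks whose total length stays within the limit.

    A document that is itself longer than max_chunk_size ends up alone in
    its own chunk, so that single request exceeds the limit (hypothetical
    helper, not the add-on's actual function).
    """
    chunks: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for doc in docs:
        # Close the current chunk if adding this document would overflow it.
        if current and current_size + len(doc) > max_chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(doc)
        current_size += len(doc)
    if current:
        chunks.append(current)
    return chunks


def collect_results(
    chunks: Sequence[Sequence[str]],
    embed_chunk: Callable[[Sequence[str]], Optional[List[Embedding]]],
) -> List[Optional[Embedding]]:
    """Merge per-chunk results, tolerating chunks that returned None.

    Assumption: a failed chunk (for example, after a timeout) yields one
    None per document so the output stays aligned with the input.
    """
    results: List[Optional[Embedding]] = []
    for chunk in chunks:
        chunk_result = embed_chunk(chunk)
        if chunk_result is None:
            # Extending with None itself would raise a TypeError.
            results.extend([None] * len(chunk))
        else:
            results.extend(chunk_result)
    return results
```

Keeping one entry per input document, even for failed chunks, lets a caller tell which documents have no embedding without losing their positions.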

@PrimozGodec PrimozGodec marked this pull request as draft November 16, 2021 11:52

codecov-commenter commented Nov 16, 2021

Codecov Report

Merging #752 (98e56c6) into master (b3511fa) will increase coverage by 0.27%.
The diff coverage is 77.77%.

❗ Current head 98e56c6 differs from pull request most recent head 0570321. Consider uploading reports for the commit 0570321 to get more accurate results

@@            Coverage Diff             @@
##           master     #752      +/-   ##
==========================================
+ Coverage   74.31%   74.59%   +0.27%     
==========================================
  Files          73       74       +1     
  Lines        9669     9876     +207     
  Branches     1321     1333      +12     
==========================================
+ Hits         7186     7367     +181     
- Misses       2228     2252      +24     
- Partials      255      257       +2     

@PrimozGodec PrimozGodec marked this pull request as ready for review November 17, 2021 14:48
@djukicn djukicn merged commit 182d8a1 into biolab:master Nov 17, 2021
@PrimozGodec PrimozGodec deleted the fix-semantic-search branch November 17, 2021 17:28
3 participants