After using Danswer for a while at BrightInsight, we propose adding a feature to customize chunk sizes when creating a connector.
Main Goals:
Customize Chunk Size: Allow increasing or decreasing the vector database chunk size. Currently, this is set by DOC_EMBEDDING_CONTEXT_SIZE.
Customize Chunk Overlap: Allow increasing or decreasing the vector database chunk overlap. Currently, this is set by CHUNK_OVERLAP.
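To illustrate how these two values interact, here is a minimal sliding-window sketch. It is not Danswer's actual chunker: the function name `chunk_tokens` and the list-of-tokens input are assumptions made for this example only.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], chunk_size: int, chunk_overlap: int) -> List[List[str]]:
    """Split tokens into windows of chunk_size that share chunk_overlap tokens.

    Hypothetical helper for illustration; not part of Danswer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many tokens after the previous one.
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

A larger chunk size yields fewer, longer chunks, while a larger overlap repeats more context between neighbouring chunks at the cost of more stored vectors.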
Specific Details:
This modification will be off by default. To turn it on, we will use the environment setting ENABLE_VECTOR_DB_SETTINGS. This way, Danswer will continue working as usual unless this setting is enabled.
If ENABLE_VECTOR_DB_SETTINGS is true, when adding a new connector, two new fields will appear: one for DOC_EMBEDDING_CONTEXT_SIZE and another for CHUNK_OVERLAP.
Update the connector_credential_pair table to store the values of DOC_EMBEDDING_CONTEXT_SIZE and CHUNK_OVERLAP, so that the same settings are reused whenever the connector syncs again.
Modify the chunking logic to check if a connector has DOC_EMBEDDING_CONTEXT_SIZE and CHUNK_OVERLAP in the database. If not, use the existing logic.
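The fallback behaviour described above could look roughly like the sketch below. Everything except the three setting names is an assumption: `ConnectorChunkSettings` stands in for the new columns on connector_credential_pair, the default values are placeholders rather than Danswer's real defaults, and treating the string "true" as the enabled value is a guess at how the flag would be parsed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Placeholder defaults for illustration; the real values come from Danswer's config.
DEFAULT_DOC_EMBEDDING_CONTEXT_SIZE = 512
DEFAULT_CHUNK_OVERLAP = 0


@dataclass
class ConnectorChunkSettings:
    """Hypothetical per-connector values stored on connector_credential_pair."""
    doc_embedding_context_size: Optional[int] = None
    chunk_overlap: Optional[int] = None


def vector_db_settings_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    # Off by default: only an explicit "true" (case-insensitive) enables the feature.
    env = os.environ if env is None else env
    return env.get("ENABLE_VECTOR_DB_SETTINGS", "").strip().lower() == "true"


def resolve_chunk_settings(
    stored: Optional[ConnectorChunkSettings],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[int, int]:
    """Return (chunk size, chunk overlap), falling back to the global defaults."""
    defaults = (DEFAULT_DOC_EMBEDDING_CONTEXT_SIZE, DEFAULT_CHUNK_OVERLAP)
    if stored is None or not vector_db_settings_enabled(env):
        return defaults
    size = stored.doc_embedding_context_size
    overlap = stored.chunk_overlap
    return (
        defaults[0] if size is None else size,
        defaults[1] if overlap is None else overlap,
    )
```

Resolving the values in one place keeps the existing code path untouched when the flag is off or when a connector was created before the new columns existed.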