Add slim connector description #3303
26faec0 to 9c1212f
@@ -24,6 +24,8 @@ env:
GOOGLE_DRIVE_OAUTH_CREDENTIALS_JSON_STR: ${{ secrets.GOOGLE_DRIVE_OAUTH_CREDENTIALS_JSON_STR }}
GOOGLE_GMAIL_SERVICE_ACCOUNT_JSON_STR: ${{ secrets.GOOGLE_GMAIL_SERVICE_ACCOUNT_JSON_STR }}
GOOGLE_GMAIL_OAUTH_CREDENTIALS_JSON_STR: ${{ secrets.GOOGLE_GMAIL_OAUTH_CREDENTIALS_JSON_STR }}
# Slab
SLAB_BOT_TOKEN: ${{ secrets.SLAB_BOT_TOKEN }}
^ This is required for the tests to run in GitHub.
When submitting the PR for merging, make sure to share it with one of the developers at Danswer.
backend/danswer/connectors/README.md
Outdated
All changes to this file are just for contributors and do not need to be replicated for other slim connectors.
@@ -0,0 +1,5 @@
{
We keep the long texts that the tests check against in a separate JSON file to keep the test code clean.
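As a rough illustration of that approach, the sketch below reads expected values from a JSON file. The helper name `load_expected_data`, the file name, and the keys are all made up for this example and are not taken from the PR; a real test would point at a file checked in next to the test module instead of a temporary one.

```python
import json
import tempfile
from pathlib import Path


def load_expected_data(path: Path) -> dict[str, str]:
    """Read the expected values for a test from a JSON file."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


# Demo with a throwaway file so the sketch runs on its own.
# The file name and keys below are hypothetical.
with tempfile.TemporaryDirectory() as tmp_dir:
    data_file = Path(tmp_dir) / "test_slab_data.json"
    data_file.write_text(
        json.dumps({"title": "Example post", "text": "A long expected body..."}),
        encoding="utf-8",
    )
    expected = load_expected_data(data_file)

# A test can then compare retrieved content against the loaded values.
assert expected["title"] == "Example post"
```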
"Need a test account with a slab subscription to run this test."
"Trial only lasts 14 days."
)
)
If there is not already a test verifying the content of at least one of the retrieved documents, please add one!
"Need a test account with a slab subscription to run this test."
"Trial only lasts 14 days."
)
)
This test just checks that the IDs retrieved through load_from_state() are a subset of the IDs retrieved from retrieve_all_slim_documents().
We don't check direct equality between the two sets because there are some circumstances where the slim connector might find an ID that is filtered out by the load/poll connector.
This case is handled downstream and is therefore okay.
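A minimal sketch of the subset check described above, using hard-coded ID sets in place of real connector output. The helper name `ids_are_covered` and the sample IDs are assumptions for illustration only; the actual test in the PR may be structured differently.

```python
def ids_are_covered(full_doc_ids: set[str], slim_doc_ids: set[str]) -> bool:
    """Return True if every ID from the load/poll run also appears in the slim run."""
    return full_doc_ids.issubset(slim_doc_ids)


# Stand-ins for the IDs collected from load_from_state() and
# retrieve_all_slim_documents(); these values are invented.
full_ids = {"post-1", "post-2"}
slim_ids = {"post-1", "post-2", "post-3"}  # the slim run may see extra IDs

# The subset check passes even though the sets are not equal.
assert ids_are_covered(full_ids, slim_ids)
# The reverse direction fails, which is why equality is not asserted.
assert not ids_are_covered(slim_ids, full_ids)
```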
from danswer.connectors.models import Document
from danswer.connectors.slab.connector import SlabConnector
This is only necessary if loading data from an outside folder
@@ -28,6 +31,8 @@
SLAB_GRAPHQL_MAX_TRIES = 10
SLAB_API_URL = "https://api.slab.com/v1/graphql"

_SLIM_BATCH_SIZE = 1000
^ 1000 is a typical value for this.
return None

@property
This was added for cleanliness. However, it's specific to this connector, so it can be ignored.
end: SecondsSinceUnixEpoch | None = None,
) -> GenerateSlimDocumentOutput:
slim_doc_batch: list[SlimDocument] = []
for post_id in get_all_post_ids(self.slab_bot_token):
Notice that here we call only get_all_post_ids(self.slab_bot_token).
This means that when running this connector, we never retrieve the additional information that the load and poll connectors retrieve, which makes this a much lighter and faster process.
Make sure that:
- the ID creation process is identical to the one used by the load and poll connectors
- you are NOT retrieving the rest of the data that is used to make the document
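The pattern in this comment can be sketched roughly as follows. This is not the code from the PR: `SlimDocument` here is a stand-in dataclass, `get_all_post_ids` is a stub that yields fake IDs instead of querying Slab, and `build_document_id` is a hypothetical helper standing in for whatever ID logic the load and poll connectors share.

```python
from collections.abc import Iterator
from dataclasses import dataclass

_SLIM_BATCH_SIZE = 1000


@dataclass
class SlimDocument:
    """Stand-in for the real model: carries only an ID, no content."""

    id: str


def get_all_post_ids() -> Iterator[str]:
    """Stub for the ID-only lookup; yields fake IDs instead of calling Slab."""
    for i in range(2500):
        yield f"post-{i}"


def build_document_id(post_id: str) -> str:
    """Hypothetical shared helper so slim and full documents get the same ID."""
    return f"slab__{post_id}"


def retrieve_all_slim_documents(
    batch_size: int = _SLIM_BATCH_SIZE,
) -> Iterator[list[SlimDocument]]:
    """Yield batches of ID-only documents without fetching any post content."""
    batch: list[SlimDocument] = []
    for post_id in get_all_post_ids():
        batch.append(SlimDocument(id=build_document_id(post_id)))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


batch_sizes = [len(batch) for batch in retrieve_all_slim_documents()]
```

Routing both the slim path and the full path through one ID-building helper is one way to keep the two ID schemes from drifting apart.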
Description
This should be used as a reference when adding slim connectors to existing connectors!
Included are: