Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add slim connector description #3303

Merged
merged 4 commits into from
Dec 2, 2024
Merged

Conversation

hagen-danswer
Copy link
Contributor

@hagen-danswer hagen-danswer commented Dec 2, 2024

Description

This should be used as a reference when adding slim connectors to existing connectors!

Included are:

  • example slim connector implementation
  • example test format
  • some comments on the pr itself
  • updated documentation to reference this

Copy link

vercel bot commented Dec 2, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
internal-search ✅ Ready (Inspect) Visit Preview 💬 Add feedback Dec 2, 2024 7:30pm

@hagen-danswer hagen-danswer force-pushed the add-slim-connector-description branch from 26faec0 to 9c1212f Compare December 2, 2024 19:03
@@ -24,6 +24,8 @@ env:
GOOGLE_DRIVE_OAUTH_CREDENTIALS_JSON_STR: ${{ secrets.GOOGLE_DRIVE_OAUTH_CREDENTIALS_JSON_STR }}
GOOGLE_GMAIL_SERVICE_ACCOUNT_JSON_STR: ${{ secrets.GOOGLE_GMAIL_SERVICE_ACCOUNT_JSON_STR }}
GOOGLE_GMAIL_OAUTH_CREDENTIALS_JSON_STR: ${{ secrets.GOOGLE_GMAIL_OAUTH_CREDENTIALS_JSON_STR }}
# Slab
SLAB_BOT_TOKEN: ${{ secrets.SLAB_BOT_TOKEN }}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ This is required to make the tests run in github.

When submitting the PR for merging, make sure to share this with one of the developers @ danswer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes to this file are just for contributors and do not need to be replicated for other slim connectors

@@ -0,0 +1,5 @@
{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We keep long texts to check against in another file as a json for code cleanliness

"Need a test account with a slab subscription to run this test."
"Trial only lasts 14 days."
)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is not already a test verifying the content of at least one of the retrieved documents, please add one!

"Need a test account with a slab subscription to run this test."
"Trial only lasts 14 days."
)
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test just checks that the ids retrieved through the load_from_state() is a subset of the ids retrieved from retrieve_all_slim_documents().

We dont check direct equality between the 2 sets because there are some circumstances where the slim connector might find an id that is filtered out by load/poll connector.

This is a case that is handled downstream and is therefor okay

from danswer.connectors.models import Document
from danswer.connectors.slab.connector import SlabConnector


Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only necessary if loading data from an outside folder

@@ -28,6 +31,8 @@
SLAB_GRAPHQL_MAX_TRIES = 10
SLAB_API_URL = "https://api.slab.com/v1/graphql"

_SLIM_BATCH_SIZE = 1000

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^1000 is typical for this

return None

@property
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was added for cleanliness. However, its specific to this connector so can be ignored

end: SecondsSinceUnixEpoch | None = None,
) -> GenerateSlimDocumentOutput:
slim_doc_batch: list[SlimDocument] = []
for post_id in get_all_post_ids(self.slab_bot_token):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice here we call only get_all_post_ids(self.slab_bot_token).

This means when running this connector, we never retrieve all the additional information that load and poll connectors do retrieve, meaning this is a much lighter/faster process.

make sure:

  • the id creation process is identical
  • that you are NOT retrieving the rest of the data that is used to make the document

@onyx-dot-app onyx-dot-app deleted a comment from pablonyx Dec 2, 2024
@onyx-dot-app onyx-dot-app deleted a comment from pablonyx Dec 2, 2024
@hagen-danswer hagen-danswer added this pull request to the merge queue Dec 2, 2024
Merged via the queue into main with commit 5385bae Dec 2, 2024
9 of 10 checks passed
@hagen-danswer hagen-danswer deleted the add-slim-connector-description branch December 6, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants