
Paginate connector page #2328

Merged: 8 commits merged into main from paginate-connector-page on Sep 6, 2024
Conversation

hagen-danswer (Contributor) commented Sep 4, 2024

This should greatly improve load times by:

  • loading connector info first, then the index attempts
  • fetching and caching the pages in batches, prefetching ±1 batch around the current page (see the sketch below)
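
For illustration, a minimal sketch of the batch fetch-and-cache idea with ±1 prefetching; the endpoint path, batch size, and helper names here are assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch: fetch index attempts in batches and prefetch the
// neighboring batches so page changes feel instant. Names and the endpoint
// path are illustrative only.
const BATCH_SIZE = 50; // assumed batch size

type IndexAttempt = { id: number; status: string; new_docs_indexed: number };

const cachedBatches = new Map<number, IndexAttempt[]>();

async function fetchBatch(ccPairId: number, batchNum: number): Promise<IndexAttempt[]> {
  const res = await fetch(
    `/api/manage/admin/cc-pair/${ccPairId}/index-attempts` +
      `?skip=${batchNum * BATCH_SIZE}&limit=${BATCH_SIZE}`
  );
  return res.json();
}

async function ensureBatch(ccPairId: number, batchNum: number): Promise<void> {
  if (batchNum < 0 || cachedBatches.has(batchNum)) return;
  cachedBatches.set(batchNum, await fetchBatch(ccPairId, batchNum));
}

// Load the current batch, then prefetch the batches on either side.
async function loadAroundBatch(ccPairId: number, batchNum: number): Promise<void> {
  await ensureBatch(ccPairId, batchNum);
  void ensureBatch(ccPairId, batchNum - 1);
  void ensureBatch(ccPairId, batchNum + 1);
}
```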

vercel bot commented Sep 4, 2024

The latest updates on this project's Vercel deployments:
internal-search — ✅ Ready — preview available — updated Sep 6, 2024 4:25pm

hagen-danswer (author) left a comment:

yummy cooking

stmt = stmt.join(SearchSettings).where(
    SearchSettings.status == IndexModelStatus.PRESENT
)

hagen-danswer (author) commented on the diff:

Pretty much identical except for this pagination logic.

@@ -33,6 +39,38 @@
router = APIRouter(prefix="/manage")


hagen-danswer (author) commented on the diff:

I created a separate endpoint to get the index attempts for a couple of reasons:

  • enable pagination of the index attempts
  • allow the frontend to load the rest of the connector data without having to wait for the index attempts (a rough sketch of this split follows below)
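
As an illustration of the second point (the endpoint paths and limits are assumptions, not the PR's actual code), the connector info and the index attempts can be requested independently so the page renders connector details without waiting:

```typescript
// Hypothetical sketch: request the CC pair info and the first page of index
// attempts independently, so connector details can render without waiting for
// the (potentially slow) index-attempt query. Paths are illustrative only.
async function loadConnectorPage(ccPairId: number): Promise<void> {
  const ccPairPromise = fetch(`/api/manage/admin/cc-pair/${ccPairId}`)
    .then((r) => r.json());
  const attemptsPromise = fetch(
    `/api/manage/admin/cc-pair/${ccPairId}/index-attempts?skip=0&limit=50`
  ).then((r) => r.json());

  // The connector info usually arrives first; show it immediately.
  console.log("cc pair:", await ccPairPromise);
  // Index attempts fill in afterwards without blocking the rest of the page.
  console.log("index attempts:", await attemptsPromise);
}
```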

    num_docs_indexed: int,  # not ideal, but this must be computed separately
    is_editable_for_current_user: bool,
) -> "CCPairFullInfo":
    # figure out if we need to artificially deflate the number of docs indexed.
hagen-danswer (author) commented on the diff:

This logic was originally in the frontend, but I thought it made more sense to live here.

    and number_of_index_attempts == 1
):
    num_docs_indexed = last_index_attempt.new_docs_indexed

hagen-danswer (author) commented on the diff:

We now return the last indexing status and the number of docs so the frontend can keep some of its original display logic without the backend having to send back the IndexAttempts themselves.
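
A hypothetical sketch of what the enriched CC pair payload might look like to the frontend; only number_of_index_attempts, num_docs_indexed, and is_editable_for_current_user appear in this PR's diffs, the other field names are assumptions:

```typescript
// Hypothetical shape of the enriched CC pair payload. Field names other than
// number_of_index_attempts, num_docs_indexed, and is_editable_for_current_user
// are assumptions for illustration.
interface CCPairFullInfo {
  id: number;
  name: string;
  num_docs_indexed: number;
  number_of_index_attempts: number; // used to compute totalPages on the frontend
  last_index_attempt_status: string | null; // assumed name for the last status
  is_editable_for_current_user: boolean;
  // index_attempts are no longer embedded here; they come from the
  // paginated endpoint instead.
}
```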

const [indexAttemptTracePopupId, setIndexAttemptTracePopupId] = useState<
number | null
>(null);
const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(
const [currentPageData, setCurrentPageData] =
hagen-danswer (author) commented on the diff:

I cooked from here down. 🧑‍🍳
The main features (sketched below) are:

  • preloading pages
  • debouncing (in the form of a 200 ms delay from clicking the button to actually calling the API)
  • instant loading of preloaded pages
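
A minimal sketch of the debounce-plus-cache behavior described above, assuming a cached-pages map and a page-fetch helper; the 200 ms delay matches the comment, everything else is illustrative:

```typescript
// Hypothetical sketch: serve cached pages instantly, and debounce uncached
// page loads by ~200 ms so rapid clicking does not fire an API call per click.
let debounceTimer: ReturnType<typeof setTimeout> | undefined;

function goToPage(
  pageNum: number,
  cachedPages: Map<number, unknown[]>,
  fetchPage: (pageNum: number) => Promise<unknown[]>,
  show: (rows: unknown[]) => void
): void {
  const cached = cachedPages.get(pageNum);
  if (cached) {
    // Preloaded page: render immediately, no network round trip.
    show(cached);
    return;
  }
  // Uncached page: wait 200 ms before hitting the API, cancelling any
  // previously scheduled request if the user keeps clicking.
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(async () => {
    const rows = await fetchPage(pageNum);
    cachedPages.set(pageNum, rows);
    show(rows);
  }, 200);
}
```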

hagen-danswer (author) commented on the diff:

READ THE COMMENTS FOR EACH PART, THEY ARE INFORMATIVE

const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(

const totalPages = Math.ceil(ccPair.number_of_index_attempts / NUM_IN_PAGE);

hagen-danswer (author) commented on the diff:

This allows navigation to the different index_attempt pages through the URL.
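
For illustration, a sketch of URL-driven pagination via the query string; the page parameter name and helpers are assumptions, not the exact code in this PR:

```typescript
// Hypothetical sketch: keep the current index-attempt page in the URL so the
// page number survives reloads and can be shared. The "page" parameter name
// is an assumption.
function readPageFromUrl(): number {
  const params = new URLSearchParams(window.location.search);
  const page = Number(params.get("page") ?? "1");
  return Number.isFinite(page) && page >= 1 ? page : 1;
}

function writePageToUrl(pageNum: number): void {
  const url = new URL(window.location.href);
  url.searchParams.set("page", String(pageNum));
  // Update the address bar without triggering a full navigation.
  window.history.replaceState(null, "", url.toString());
}
```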

const [indexAttemptTracePopupId, setIndexAttemptTracePopupId] = useState<
number | null
>(null);
const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(
const [currentPageData, setCurrentPageData] =
hagen-danswer (author) commented on the diff:

READ THE COMMENTS FOR EACH PART, THEY ARE INFORMATIVE

  setCurrentPageData(cachedBatches[batchNum][batchPageNum]);
  setIsCurrentPageLoading(false);
} else {
  setIsCurrentPageLoading(true);
A reviewer (Contributor) commented on the diff:

Generally, we should try to avoid using effects to transform state (React's docs are great: https://react.dev/learn/you-might-not-need-an-effect). I'd try to use some memoization for expensive calculations and avoid using effects to continuously update state (it's fine for pagination, though).
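
As an illustration of this suggestion (not code from the PR), a derived value can be computed with useMemo instead of being mirrored into state from an effect:

```typescript
// Illustrative only: derive the rows for the current page with useMemo rather
// than copying them into state from a useEffect.
import { useMemo, useState } from "react";

function usePagedRows<T>(allRows: T[], pageSize: number) {
  const [page, setPage] = useState(0);

  // Recomputed only when its inputs change; no effect + setState round trip.
  const pageRows = useMemo(
    () => allRows.slice(page * pageSize, (page + 1) * pageSize),
    [allRows, page, pageSize]
  );

  return { page, setPage, pageRows };
}
```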

// we use it to avoid duplicate requests
const ongoingRequestsRef = useRef<Set<number>>(new Set());

const urlBuilder = (batchNum: number) =>
A reviewer (Contributor) commented on the diff:

nit: more descriptive function naming

hagen-danswer added this pull request to the merge queue on Sep 6, 2024
Merged via the queue into main with commit 8977b1b on Sep 6, 2024
7 checks passed
hagen-danswer deleted the paginate-connector-page branch on September 6, 2024 at 17:33
rajivml pushed a commit to UiPath/danswer that referenced this pull request Oct 2, 2024
* Added pagination to individual connector pages

* I cooked

* Gordon Ramsay in this b

* meepe

* properly calculated max chunk and switch dict to array

* chunks -> batches

* increased max page size

* renmaed var