
Paginate connector page #2328

Merged: 8 commits merged into main from paginate-connector-page on Sep 6, 2024
Conversation

hagen-danswer (Contributor) commented Sep 4, 2024

This should greatly improve load times by:

  • loading connector info first, then the index attempts
  • fetching and caching the pages in batches, prefetching ±1 batch around the current page (see the sketch below)
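
For illustration, a minimal sketch of the batch fetch-and-cache idea with ±1 prefetching; the endpoint path, batch size, and helper names here are assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch: fetch index attempts in batches and prefetch the
// neighboring batches so page changes feel instant. Names and the endpoint
// path are illustrative only.
const BATCH_SIZE = 50; // assumed batch size

type IndexAttempt = { id: number; status: string; new_docs_indexed: number };

const cachedBatches = new Map<number, IndexAttempt[]>();

async function fetchBatch(ccPairId: number, batchNum: number): Promise<IndexAttempt[]> {
  const res = await fetch(
    `/api/manage/admin/cc-pair/${ccPairId}/index-attempts` +
      `?skip=${batchNum * BATCH_SIZE}&limit=${BATCH_SIZE}`
  );
  return res.json();
}

async function ensureBatch(ccPairId: number, batchNum: number): Promise<void> {
  if (batchNum < 0 || cachedBatches.has(batchNum)) return;
  cachedBatches.set(batchNum, await fetchBatch(ccPairId, batchNum));
}

// Load the current batch, then prefetch the batches on either side.
async function loadAroundBatch(ccPairId: number, batchNum: number): Promise<void> {
  await ensureBatch(ccPairId, batchNum);
  void ensureBatch(ccPairId, batchNum - 1);
  void ensureBatch(ccPairId, batchNum + 1);
}
```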

vercel bot commented Sep 4, 2024

The latest updates on this project's Vercel deployments:
internal-search — ✅ Ready — preview available — updated Sep 6, 2024 4:25pm

hagen-danswer (author) left a comment:

yummy cooking

stmt = stmt.join(SearchSettings).where(
    SearchSettings.status == IndexModelStatus.PRESENT
)

hagen-danswer (author) commented on the diff:

Pretty much identical except for this pagination logic.

@@ -33,6 +39,38 @@
router = APIRouter(prefix="/manage")


hagen-danswer (author) commented on the diff:

I created a separate endpoint to get the index attempts for a couple of reasons:

  • enable pagination of the index attempts
  • allow the frontend to load the rest of the connector data without having to wait for the index attempts (a rough sketch of this split follows below)
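
As an illustration of the second point (the endpoint paths and limits are assumptions, not the PR's actual code), the connector info and the index attempts can be requested independently so the page renders connector details without waiting:

```typescript
// Hypothetical sketch: request the CC pair info and the first page of index
// attempts independently, so connector details can render without waiting for
// the (potentially slow) index-attempt query. Paths are illustrative only.
async function loadConnectorPage(ccPairId: number): Promise<void> {
  const ccPairPromise = fetch(`/api/manage/admin/cc-pair/${ccPairId}`)
    .then((r) => r.json());
  const attemptsPromise = fetch(
    `/api/manage/admin/cc-pair/${ccPairId}/index-attempts?skip=0&limit=50`
  ).then((r) => r.json());

  // The connector info usually arrives first; show it immediately.
  console.log("cc pair:", await ccPairPromise);
  // Index attempts fill in afterwards without blocking the rest of the page.
  console.log("index attempts:", await attemptsPromise);
}
```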

    num_docs_indexed: int,  # not ideal, but this must be computed separately
    is_editable_for_current_user: bool,
) -> "CCPairFullInfo":
    # figure out if we need to artificially deflate the number of docs indexed.
hagen-danswer (author) commented on the diff:

This logic was originally in the frontend, but I thought it made more sense to live here.

    and number_of_index_attempts == 1
):
    num_docs_indexed = last_index_attempt.new_docs_indexed

hagen-danswer (author) commented on the diff:

We now return the last indexing status and the number of docs so the frontend can keep some of its original display logic without the backend having to send back the IndexAttempts themselves.
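
A hypothetical sketch of what the enriched CC pair payload might look like to the frontend; only number_of_index_attempts, num_docs_indexed, and is_editable_for_current_user appear in this PR's diffs, the other field names are assumptions:

```typescript
// Hypothetical shape of the enriched CC pair payload. Field names other than
// number_of_index_attempts, num_docs_indexed, and is_editable_for_current_user
// are assumptions for illustration.
interface CCPairFullInfo {
  id: number;
  name: string;
  num_docs_indexed: number;
  number_of_index_attempts: number; // used to compute totalPages on the frontend
  last_index_attempt_status: string | null; // assumed name for the last status
  is_editable_for_current_user: boolean;
  // index_attempts are no longer embedded here; they come from the
  // paginated endpoint instead.
}
```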

const [indexAttemptTracePopupId, setIndexAttemptTracePopupId] = useState<
number | null
>(null);
const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(
const [currentPageData, setCurrentPageData] =
hagen-danswer (author) commented on the diff:

I cooked from here down. 🧑‍🍳
The main features (sketched below) are:

  • preloading pages
  • debouncing (in the form of a 200 ms delay from clicking the button to actually calling the API)
  • instant loading of preloaded pages
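
A minimal sketch of the debounce-plus-cache behavior described above, assuming a cached-pages map and a page-fetch helper; the 200 ms delay matches the comment, everything else is illustrative:

```typescript
// Hypothetical sketch: serve cached pages instantly, and debounce uncached
// page loads by ~200 ms so rapid clicking does not fire an API call per click.
let debounceTimer: ReturnType<typeof setTimeout> | undefined;

function goToPage(
  pageNum: number,
  cachedPages: Map<number, unknown[]>,
  fetchPage: (pageNum: number) => Promise<unknown[]>,
  show: (rows: unknown[]) => void
): void {
  const cached = cachedPages.get(pageNum);
  if (cached) {
    // Preloaded page: render immediately, no network round trip.
    show(cached);
    return;
  }
  // Uncached page: wait 200 ms before hitting the API, cancelling any
  // previously scheduled request if the user keeps clicking.
  clearTimeout(debounceTimer);
  debounceTimer = setTimeout(async () => {
    const rows = await fetchPage(pageNum);
    cachedPages.set(pageNum, rows);
    show(rows);
  }, 200);
}
```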

hagen-danswer (author) commented on the diff:

READ THE COMMENTS FOR EACH PART, THEY ARE INFORMATIVE

const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(

const totalPages = Math.ceil(ccPair.number_of_index_attempts / NUM_IN_PAGE);

hagen-danswer (author) commented on the diff:

This allows navigation to the different index_attempt pages through the URL.
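
For illustration, a sketch of URL-driven pagination via the query string; the page parameter name and helpers are assumptions, not the exact code in this PR:

```typescript
// Hypothetical sketch: keep the current index-attempt page in the URL so the
// page number survives reloads and can be shared. The "page" parameter name
// is an assumption.
function readPageFromUrl(): number {
  const params = new URLSearchParams(window.location.search);
  const page = Number(params.get("page") ?? "1");
  return Number.isFinite(page) && page >= 1 ? page : 1;
}

function writePageToUrl(pageNum: number): void {
  const url = new URL(window.location.href);
  url.searchParams.set("page", String(pageNum));
  // Update the address bar without triggering a full navigation.
  window.history.replaceState(null, "", url.toString());
}
```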

const [indexAttemptTracePopupId, setIndexAttemptTracePopupId] = useState<
number | null
>(null);
const indexAttemptToDisplayTraceFor = ccPair.index_attempts.find(
const [currentPageData, setCurrentPageData] =
hagen-danswer (author) commented on the diff:

READ THE COMMENTS FOR EACH PART, THEY ARE INFORMATIVE

  setCurrentPageData(cachedBatches[batchNum][batchPageNum]);
  setIsCurrentPageLoading(false);
} else {
  setIsCurrentPageLoading(true);
A reviewer (Contributor) commented on the diff:

Generally, we should try to avoid using effects to transform state (React's docs are great: https://react.dev/learn/you-might-not-need-an-effect). I'd try to use some memoization for expensive calculations and avoid using effects to continuously update state (it's fine for pagination, though).
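
As an illustration of this suggestion (not code from the PR), a derived value can be computed with useMemo instead of being mirrored into state from an effect:

```typescript
// Illustrative only: derive the rows for the current page with useMemo rather
// than copying them into state from a useEffect.
import { useMemo, useState } from "react";

function usePagedRows<T>(allRows: T[], pageSize: number) {
  const [page, setPage] = useState(0);

  // Recomputed only when its inputs change; no effect + setState round trip.
  const pageRows = useMemo(
    () => allRows.slice(page * pageSize, (page + 1) * pageSize),
    [allRows, page, pageSize]
  );

  return { page, setPage, pageRows };
}
```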

// we use it to avoid duplicate requests
const ongoingRequestsRef = useRef<Set<number>>(new Set());

const urlBuilder = (batchNum: number) =>
A reviewer (Contributor) commented on the diff:

nit: more descriptive function naming

hagen-danswer added this pull request to the merge queue on Sep 6, 2024
Merged via the queue into main with commit 8977b1b on Sep 6, 2024
7 checks passed
hagen-danswer deleted the paginate-connector-page branch on September 6, 2024 at 17:33
rajivml pushed a commit to UiPath/danswer that referenced this pull request Oct 2, 2024
* Added pagination to individual connector pages

* I cooked

* Gordon Ramsay in this b

* meepe

* properly calculated max chunk and switch dict to array

* chunks -> batches

* increased max page size

* renmaed var