[frontend] Kubeflow does N queries for in the "Runs" page #11346

Open
asaff1 opened this issue Oct 31, 2024 · 7 comments

Comments

@asaff1

asaff1 commented Oct 31, 2024

Environment

  • How did you deploy Kubeflow Pipelines (KFP)?
    Kubeflow pipelines standalone, AWS setup

Steps to reproduce

Open the network panel in your browser's dev tools, navigate to the Runs page, and watch the requests. Increase the page size in the UI and the number of requests grows with it.

When you navigate to the "Runs" page, Kubeflow sends an API call to fetch the list of runs. There are two problems here:

  1. The response includes a total_size field, which requires an unneeded COUNT(*) query over the whole "run_details" table. The count is not even displayed in the UI.
  2. More importantly, after the run list is fetched, the UI makes one API call per run (each of which is a DB query) to get its associated pipeline. Instead, the runs API could simply do an SQL JOIN to get the pipeline info.
    This is really slow even for a page size of 10, tested with my medium-size RDS instance. When the page size is 100, this page does over 100 SQL queries.
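
To make the second point concrete, here is a minimal sketch of the per-run lookup pattern next to a single JOIN, using an in-memory SQLite database. Only the "run_details" table name comes from this report; the "pipelines" table, all column names, and the function names are assumptions made for illustration and are not the actual KFP schema or API.

```python
import sqlite3


def make_db():
    """Build a toy database: 3 pipelines and 100 runs (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE pipelines (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE run_details (id TEXT PRIMARY KEY, name TEXT, pipeline_id TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO pipelines VALUES (?, ?)",
        [(f"p{i}", f"pipeline-{i}") for i in range(3)],
    )
    conn.executemany(
        "INSERT INTO run_details VALUES (?, ?, ?)",
        [(f"r{i:03d}", f"run-{i}", f"p{i % 3}") for i in range(100)],
    )
    return conn


def list_runs_n_plus_one(conn, page_size):
    """One query for the page of runs, then one extra query per run."""
    queries = 1
    runs = conn.execute(
        "SELECT id, name, pipeline_id FROM run_details ORDER BY id LIMIT ?",
        (page_size,),
    ).fetchall()
    rows = []
    for run_id, run_name, pipeline_id in runs:
        (pipeline_name,) = conn.execute(
            "SELECT name FROM pipelines WHERE id = ?", (pipeline_id,)
        ).fetchone()
        queries += 1
        rows.append((run_id, run_name, pipeline_name))
    return rows, queries


def list_runs_joined(conn, page_size):
    """Same result set, fetched with a single JOIN."""
    rows = conn.execute(
        """
        SELECT r.id, r.name, p.name
        FROM run_details AS r
        LEFT JOIN pipelines AS p ON p.id = r.pipeline_id
        ORDER BY r.id
        LIMIT ?
        """,
        (page_size,),
    ).fetchall()
    return rows, 1


conn = make_db()
slow_rows, slow_queries = list_runs_n_plus_one(conn, 100)
fast_rows, fast_queries = list_runs_joined(conn, 100)
print(slow_queries, fast_queries)  # 101 1
```

Both functions return the same rows for this toy data; only the number of round trips to the database differs.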

image

Expected result

To get the runs data with the pipeline info, one query should be enough. The page should load faster.

Materials and Reference


Impacted by this bug? Give it a 👍.

@droctothorpe
Contributor

#10797 should resolve this. Make sure that you're on the latest version of KFP.

@asaff1
Author

asaff1 commented Oct 31, 2024

@droctothorpe I see.
BTW, the experiments page has the same issue: it loads the last 5 runs separately for each experiment shown.
I'm using Kubeflow 2.1.0 and not the latest 2.3.0 because of regressions that were introduced (probably when Argo was upgraded; for example, retry doesn't work).
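
For the experiments page, one possible way to get the last 5 runs of every listed experiment in a single query is a window function. This is only a sketch: the "experiment_id" and "created_at" columns and the function name are hypothetical, and it assumes a database with window-function support (SQLite 3.25 or newer here).

```python
import sqlite3


def make_db():
    """Toy data: 3 experiments with 8 runs each (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE run_details ("
        "id TEXT PRIMARY KEY, experiment_id TEXT, created_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO run_details VALUES (?, ?, ?)",
        [(f"e{e}-r{i}", f"e{e}", i) for e in range(3) for i in range(8)],
    )
    return conn


def last_runs_per_experiment(conn, experiment_ids, limit=5):
    """Return {experiment_id: [newest run ids]} using one query in total."""
    if not experiment_ids:
        return {}
    placeholders = ",".join("?" * len(experiment_ids))
    sql = f"""
        SELECT experiment_id, id
        FROM (
            SELECT experiment_id, id, created_at,
                   ROW_NUMBER() OVER (
                       PARTITION BY experiment_id
                       ORDER BY created_at DESC
                   ) AS rn
            FROM run_details
            WHERE experiment_id IN ({placeholders})
        )
        WHERE rn <= ?
        ORDER BY experiment_id, created_at DESC
    """
    result = {}
    for experiment_id, run_id in conn.execute(sql, [*experiment_ids, limit]):
        result.setdefault(experiment_id, []).append(run_id)
    return result


conn = make_db()
latest = last_runs_per_experiment(conn, ["e0", "e1", "e2"])
print(latest["e0"])  # ['e0-r7', 'e0-r6', 'e0-r5', 'e0-r4', 'e0-r3']
```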

@asaff1
Author

asaff1 commented Oct 31, 2024

@droctothorpe I tried replacing only the image: in the ml-pipeline-ui deployment, changing it from gcr.io/ml-pipeline/frontend:2.1.0 to gcr.io/ml-pipeline/frontend:2.3.0. The issue still exists. I do see the PR in the changelog of 2.3.0, so it is really strange.

@droctothorpe
Contributor

That's odd. Maybe look at the source code in the Chrome console and make sure it includes the code in question. Alternatively, you can try to build and push a new image from the master branch and deploy that. Another option is to run the UI in development mode from your local machine. You can see how to do that here.

@asaff1
Author

asaff1 commented Nov 5, 2024

@droctothorpe I believe the 2.3.0 build didn't include your PR. It would be great if you could try it yourself and test the gcr.io/ml-pipeline/frontend:2.3.0 image.

@rimolive
Member

@asaff1 Can you confirm your issue is solved if you build the image manually and use it in your deployment? If the latest image did not fix this issue, we need to reopen it.

@asaff1
Author

asaff1 commented Nov 21, 2024

@rimolive I will try to build it and will update. What I can say for sure is that the 2.3.0 image still has the bug, which doesn't match the 2.3.0 release notes.
