%reload_kedro taking 2mins+ on Databricks #4042

Open
sagrawal128 opened this issue Jul 30, 2024 · 2 comments

@sagrawal128

Description

Hi, I am using Kedro 0.18.14 on Databricks 11.4 LTS. I am running Kedro as suggested in the documentation and using %reload_kedro to refresh the session. However, this command takes more than 2 minutes to finish. Is this expected behaviour?

Context

Slow run time

Steps to Reproduce

  1. Create a Databricks compute cluster with an 11.3 machine
  2. Install Kedro 0.18.14 in the notebook
  3. Add a Kedro project to the workspace using the Git repository option on Databricks
  4. Try to load the Kedro project using the %reload_kedro magic

Expected Result

The reload magic should be fast, probably a few seconds.

Actual Result

[Screenshot: %reload_kedro taking more than 2 minutes]

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.18.14
  • Python version used (python -V): 3.9
  • Operating system and version: Databricks (details in image below)

[Screenshot: Databricks environment details]

@ravi-kumar-pilla
Contributor

Hi @sagrawal128, thanks for raising the issue. We will try to replicate it on our end and see if there are areas to improve. Thank you.

@noklam
Contributor

noklam commented Sep 9, 2024

Sorry for the late response, @sagrawal128. How big is the project? Is the setup so heavy that it takes a lot of time? During %reload_kedro, Kedro basically tries to create:

  • DataCatalog
  • Pipeline
  • ConfigLoader

These are all components that are necessary before a pipeline run, and usually they should be fast. It would also be good to check what kind of datasets you are using and, if possible, upgrade kedro-datasets. There was an issue with some older datasets that set up a database connection early, which caused performance problems.
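
To narrow down which of these components is slow, one option is to time each setup stage separately in a notebook cell. The following is only a rough sketch: `profile_steps` and `kedro_steps` are hypothetical helper names, not part of Kedro, and `kedro_steps` assumes the Kedro 0.18.x API (`bootstrap_project`, `KedroSession.create`, `load_context`, `context.catalog`, and the lazy `pipelines` mapping). In that version the config loader is built while the context is loaded, so it is timed together with that step.

```python
import time
from pathlib import Path
from typing import Callable, Dict


def profile_steps(steps: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each zero-argument callable once, in order, and time it in seconds."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


def kedro_steps(project_path: str) -> Dict[str, Callable[[], object]]:
    """Hypothetical helper: wrap the Kedro setup stages as timed steps.

    Assumes the Kedro 0.18.x API. The imports are local so that
    profile_steps stays usable even where Kedro is not installed.
    """
    from kedro.framework.project import pipelines
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    state = {}

    def load_context() -> None:
        # Reads the project metadata and settings, then builds the session,
        # the context and (as part of that) the config loader.
        bootstrap_project(Path(project_path))
        session = KedroSession.create(project_path=project_path)
        state["context"] = session.load_context()

    return {
        "session, context and config loader": load_context,
        # Accessing the property instantiates every dataset in the catalog.
        "DataCatalog": lambda: state["context"].catalog,
        # Iterating the lazy mapping triggers pipeline registration.
        "Pipeline": lambda: dict(pipelines),
    }
```

In a notebook, something like `profile_steps(kedro_steps("<path-to-project>"))` would return a dictionary of elapsed seconds per stage. If the `DataCatalog` step dominates, that would point towards dataset initialisation (for example an early database connection) rather than Kedro itself.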

@noklam self-assigned this on Sep 9, 2024