[KED-1455] Framework on debugging kedro nodes #225

Closed
wants to merge 3 commits into from

Contributor

@MigQ2 MigQ2 commented Feb 18, 2020

Description

I have been using kedro for some time now and I wanted to share some thoughts on how to approach debugging kedro nodes.

Usually when I have to debug a kedro node I take one of the following two approaches:

  1. Use a debugger like pdb and put a breakpoint somewhere in the function of the node I want to debug. The good thing about this is that it's quite simple and I can easily step into any other function called within the node. What I don't like is that some nodes have inputs which take a long time to load (e.g. a big pandas DataFrame), and I need to restart the program whenever an Exception is raised, even if it was something very easy to fix. If my node has many minor bugs, I end up losing too much time loading all the inputs several times.

  2. Open an interactive session (Jupyter Notebook or kedro ipython), manually load the node inputs by calling catalog.load() multiple times, then sequentially feed the lines of my function into the interpreter. What I like about this approach is that if I find an issue with my code I can fix it on the fly and continue execution without loading all inputs again. What I don't like is having to manually load all node inputs via catalog.load().
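To make approach 2 concrete, here is a minimal sketch of the manual workflow. It is only an illustration: `StandInCatalog`, `clean_orders` and the dataset names are hypothetical stand-ins so the snippet runs without a kedro project. In a real `kedro ipython` session the `catalog` object is provided for you and `catalog.load()` is the slow step.

```python
class StandInCatalog:
    """Hypothetical stand-in for the `catalog` of an interactive kedro session."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.load_count = 0

    def load(self, name):
        # In a real project this is where the expensive I/O happens.
        self.load_count += 1
        return self._datasets[name]


def clean_orders(orders, params):
    """Hypothetical node function being debugged."""
    return [o for o in orders if o["amount"] >= params["min_amount"]]


catalog = StandInCatalog(
    {
        "orders": [{"id": 1, "amount": 5}, {"id": 2, "amount": 50}],
        "params:cleaning": {"min_amount": 10},
    }
)

# Load every input once, by hand (the tedious part of approach 2).
orders = catalog.load("orders")
params = catalog.load("params:cleaning")

# Run the function, fix something on the fly, and run it again
# without touching the catalog a second time.
first_try = clean_orders(orders, params)
second_try = clean_orders(orders, {"min_amount": 1})
```

The inputs stay in the session, so each retry only costs the time of the function itself.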

First of all, I would like to ask how you have tackled debugging kedro nodes.

Also, I'd like to share an attempt to automate approach 2 a bit more, so that node inputs are loaded automatically. The idea is to call context.load_node_inputs("my_node_name") within an ipython/jupyter session and get all inputs loaded into it.
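The idea can be sketched roughly as follows. This is not the code from this PR: it is a simplified standalone function (rather than a context method) that assumes only that the pipeline exposes `nodes`, that each node has `name` and `inputs`, and that the catalog has `load()`. The `SimpleNamespace` objects and dataset names below are hypothetical stand-ins so the example runs on its own.

```python
from types import SimpleNamespace


def load_node_inputs(pipeline, catalog, node_name):
    """Return {input_name: loaded_data} for the node called `node_name`."""
    matches = [node for node in pipeline.nodes if node.name == node_name]
    if not matches:
        raise KeyError(f"No node named {node_name!r} in the pipeline")
    node = matches[0]
    # One catalog.load() per declared input, instead of typing each call by hand.
    return {name: catalog.load(name) for name in node.inputs}


# Hypothetical stand-ins for a kedro pipeline and catalog.
data = {"orders": [1, 2, 3], "params:min_amount": 2}
pipeline = SimpleNamespace(
    nodes=[SimpleNamespace(name="clean", inputs=["orders", "params:min_amount"])]
)
catalog = SimpleNamespace(load=data.__getitem__)

loaded = load_node_inputs(pipeline, catalog, "clean")
```

Returning a dictionary keeps the sketch simple; mapping the loaded values back onto the function's positional and keyword arguments is the harder part, and it is not covered here.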

Development notes

I informally tested my code with nodes that contain partial parameters, *args and **kwargs, but I did not test other complex cases like decorated functions or nodes involving Transformers.
I'm leaving this initial approach here, and if you believe it is useful we could work on improving it and making it more formal.
I did not find a neat and tidy way to achieve this, so my code ended up a bit complex and obscure. If you have any suggestions on other ways to implement this, I'd love to hear them!

Please share your thoughts, comment and suggest!

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change and added my name to the list of supporting contributions in the RELEASE.md file
  • Added tests to cover my changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@limdauto limdauto requested review from limdauto and removed request for limdauto February 28, 2020 13:38
@yetudada
Contributor

yetudada commented Mar 9, 2020

@MigQ2 There's some lovely thought in this PR. I'm going to bring it up to the team for a discussion. Thanks for raising this!

@yetudada yetudada changed the title Framework on debugging kedro nodes [KED-1425] Framework on debugging kedro nodes Mar 9, 2020
@lorenabalan
Contributor

Hi @MigQ2 , thanks a lot for the effort that went into this! We understand the need for such a workflow in Jupyter, and there are some good points highlighted in here that we’ll take away with us. Although this particular implementation doesn’t quite fit into the wider story as is, we’ll think about them when we work out how to best address your need.

@lorenabalan lorenabalan changed the title [KED-1425] Framework on debugging kedro nodes [KED-1455] Framework on debugging kedro nodes Apr 22, 2020
@lorenabalan
Contributor

Hi @MigQ2 ! Good news is this has become significantly easier since the introduction of Hooks in versions 0.16.* . @mzjp2 and @limdauto have kindly included a helpful section in our docs with an example workflow, have a look at our latest develop docs. I'll go ahead and close this PR for now but feel free to raise another one / issue if you have additional feedback.
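For readers landing here, a Hook-based debugging workflow might look roughly like the sketch below. It is an assumption-laden illustration rather than a copy of the docs example: the class name and the injectable `post_mortem` argument are made up for this sketch, and the `ImportError` fallback exists only so the snippet can be read and run without kedro installed. It relies on the `on_node_error` hook receiving the raised exception as `error`.

```python
import pdb

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without kedro; not part of a real project.
    def hook_impl(func):
        return func


class PDBNodeDebugHook:
    """Open a post-mortem debugger at the point where a node failed."""

    def __init__(self, post_mortem=pdb.post_mortem):
        # Injectable for testing; defaults to the standard pdb post-mortem.
        self._post_mortem = post_mortem

    @hook_impl
    def on_node_error(self, error):
        # The traceback attached to the exception points at the failing frame,
        # so the node's inputs are still in scope inside the debugger.
        self._post_mortem(error.__traceback__)
```

How the hook gets registered depends on the kedro version, so check the Hooks documentation for the release you are on.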

@WaylonWalker
Contributor

I made a hook that I used to learn a bit more about hooks. I have found it quite handy to use when developing other hooks. This one was simple enough that I posted it as a gist.

https://gist.github.com/WaylonWalker/aa2cb7e06d09513c2864b0425418260d

5 participants