[KED-1455] Framework on debugging kedro nodes #225

MigQ2 · 2020-02-18T21:30:53Z

Description

I have been using kedro for some time now and I wanted to share some thoughts on how to approach debugging kedro nodes.

Usually when I have to debug a kedro node I take one of the following two approaches:

1. Use a debugger like pdb and put a breakpoint somewhere in the function of the node I want to debug. The good thing about this is that it's quite simple and I can easily step into any other function called within the node. What I don't like is that some nodes have inputs that take a long time to load (e.g. a big pandas DataFrame), and I need to restart the program whenever an exception is raised, even if it was something very easy to fix. If my node has many minor bugs, I end up losing too much time loading all the inputs several times.
2. Open an interactive session (Jupyter Notebook or kedro ipython), manually load the node inputs by calling catalog.load() multiple times, and then feed the lines of my function into the interpreter one by one. What I like about this approach is that if I find an issue with my code, I can fix it on the fly and continue execution without loading all the inputs again. What I don't like is having to manually load every node input via catalog.load().

First of all, I would like to ask how you have tackled debugging kedro nodes.

Also, I'd like to share an attempt to automate approach 2 a bit more, so that node inputs can be loaded automatically. The idea is to call context.load_node_inputs("my_node_name") within an ipython/jupyter session and get all of that node's inputs loaded into the session.
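To make the idea concrete, here is a minimal sketch, not the code from this PR. It assumes only that each node exposes a `name` and a list of `inputs` (dataset names) and that the catalog exposes `load(name)`. The free function `load_node_inputs` and the stand-in classes `DictCatalog` and the `SimpleNamespace` nodes are hypothetical, used so the example runs without a kedro project.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable


def load_node_inputs(node_name: str, nodes: Iterable[Any], catalog: Any) -> Dict[str, Any]:
    """Load every input of the node called `node_name` from `catalog`.

    Hypothetical helper: `nodes` is any iterable of objects with `.name`
    and `.inputs`, and `catalog` is any object with a `.load(name)` method.
    """
    for node in nodes:
        if node.name == node_name:
            # One catalog.load() call per input, keyed by dataset name.
            return {name: catalog.load(name) for name in node.inputs}
    raise KeyError(f"No node named {node_name!r}")


class DictCatalog:
    """Stand-in for a data catalog, backed by an in-memory dict (assumption)."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = data

    def load(self, name: str) -> Any:
        return self._data[name]


# Stand-in pipeline with a single node that takes a dataset and a parameter.
nodes = [SimpleNamespace(name="clean", inputs=["raw", "params:threshold"])]
catalog = DictCatalog({"raw": [1, 5, 9], "params:threshold": 4})

inputs = load_node_inputs("clean", nodes, catalog)
```

In an interactive session the returned dict could then be unpacked into the namespace, for example with `globals().update(...)` after mapping dataset names to valid variable names. That mapping step is left out here.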

Development notes

I informally tested my code with nodes that contain partial parameters, *args and **kwargs, but I did not test other complex cases like decorated functions or nodes involving Transformers.
I'm leaving this initial approach here, and if you believe it is useful we could work on improving it and making it more formal.
I did not find a neat and tidy way to achieve this, so my code ended up a bit complex and obscure. If you have any suggestions on other ways to implement this, I'd love to hear them!

Please share your thoughts, comment and suggest!

Checklist

Read the contributing guidelines
Opened this PR as a 'Draft Pull Request' if it is work-in-progress
Updated the documentation to reflect the code changes
Added a description of this change and added my name to the list of supporting contributions in the RELEASE.md file
Added tests to cover my changes

Notice

I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

yetudada · 2020-03-09T10:53:08Z

@MigQ2 There's some lovely thought in this PR. I'm going to bring it up to the team for a discussion. Thanks for raising this!

lorenabalan · 2020-03-09T18:18:42Z

Hi @MigQ2 , thanks a lot for the effort that went into this! We understand the need for such a workflow in Jupyter, and there are some good points highlighted in here that we’ll take away with us. Although this particular implementation doesn’t quite fit into the wider story as is, we’ll think about them when we work out how to best address your need.

lorenabalan · 2020-05-27T10:17:29Z

Hi @MigQ2! The good news is that this has become significantly easier since the introduction of Hooks in versions 0.16.*. @mzjp2 and @limdauto have kindly included a helpful section in our docs with an example workflow; have a look at our latest develop docs. I'll go ahead and close this PR for now, but feel free to raise another PR or an issue if you have additional feedback.
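As a rough illustration of the Hooks-based route, and not the example from the docs, the sketch below keeps the inputs of a failed node so they can be inspected without reloading them. It assumes kedro's `hook_impl` marker and the `on_node_error` hook, accepting only a subset of that hook's arguments. The class name `CaptureFailedInputs` is hypothetical, and registering the hook with a project is left out because it differs between kedro versions.

```python
from types import SimpleNamespace

try:
    from kedro.framework.hooks import hook_impl
except ImportError:
    # Fallback so the sketch runs without kedro installed (assumption:
    # a no-op decorator is enough for calling the method directly).
    def hook_impl(func):
        return func


class CaptureFailedInputs:
    """Hypothetical hook: remember the inputs of the last node that failed."""

    def __init__(self):
        self.last_failure = None

    @hook_impl
    def on_node_error(self, error, node, inputs):
        # Store what is needed to re-run the node function by hand later.
        self.last_failure = {
            "node": node.name,
            "error": error,
            "inputs": dict(inputs),
        }


# Simulate a failure by calling the hook directly with stand-in objects.
hook = CaptureFailedInputs()
hook.on_node_error(
    error=ValueError("boom"),
    node=SimpleNamespace(name="clean"),
    inputs={"raw": [1, 5, 9]},
)
```

After a failed run, `hook.last_failure["inputs"]` would hold the already-loaded data, which addresses the reload cost described in the original post.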

WaylonWalker · 2020-05-27T12:40:11Z

I made a hook to learn a bit more about hooks. I have found it quite handy to use when developing other hooks. It was simple enough that I posted it as a gist.

https://gist.github.com/WaylonWalker/aa2cb7e06d09513c2864b0425418260d

mrg143504 added 2 commits February 18, 2020 17:51

Added method to load node inputs into the namespace

5a54353

Added load_node_inputs() to RELEASE

7381f39

limdauto requested review from limdauto and removed request for limdauto February 28, 2020 13:38

yetudada changed the title ~~Framework on debugging kedro nodes~~ [KED-1425] Framework on debugging kedro nodes Mar 9, 2020

yetudada mentioned this pull request Mar 9, 2020

Incremental runs/"Run only missing" #221

Closed

Bug fixes

b8dabde

lorenabalan changed the title ~~[KED-1425] Framework on debugging kedro nodes~~ [KED-1455] Framework on debugging kedro nodes Apr 22, 2020

lorenabalan mentioned this pull request Apr 22, 2020

[KED-1455] kedro run pdb flag #325

Closed

lorenabalan closed this May 27, 2020
