[KED-1455] Framework on debugging kedro nodes #225
Closed
Description
I have been using kedro for some time now and I wanted to share some thoughts on how to approach debugging kedro nodes.
Usually, when I have to debug a kedro node, I take one of the following two approaches:
1. Use a debugger like `pdb` and put a breakpoint somewhere in the function of the node I want to debug. The good thing about this is that it is quite simple and I can easily step into any other function called within the node. What I don't like is that some nodes have inputs which take a long time to load (e.g. a big pandas `DataFrame`), and I need to restart the program whenever an exception is raised, even if it was something very easy to fix. If my node has many minor bugs, I end up losing a lot of time loading all inputs several times.
2. Open an interactive session (Jupyter Notebook or `kedro ipython`) and manually load the node inputs by calling `catalog.load()` multiple times, then sequentially feed the lines of my function into the interpreter. What I like about this approach is that if I find an issue with my code I can fix it on the fly and continue execution without loading all inputs again. What I don't like is having to manually load all node inputs via `catalog.load()`.
First of all, I would like to ask how you have tackled debugging kedro nodes.
Also, I'd like to share an attempt to automate approach 2 a bit more, so that node inputs can be loaded automatically. The idea is to call `context.load_node_inputs("my_node_name")` within an ipython/jupyter session and get all inputs loaded into it.
Development notes
I informally tested my code with nodes that contain `partial` parameters, `*args` and `**kwargs`, but I did not test other complex cases like decorated functions or nodes involving `Transformers`.
I'm leaving this initial approach here, and if you believe this is useful we could work on improving it and making it more formal.
I did not find a neat and tidy way to achieve this, so my code ended up a bit complex and obscure. If you have any suggestions on other ways to implement this, I'd love to hear them!
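To make the idea concrete for discussion, here is a minimal, simplified sketch. It is not the code in this PR. It assumes a catalog object with a `load(name)` method and nodes that expose `name`, `func` and an input spec given as a string, a list, or a dict mapping argument names to dataset names. `StubCatalog`, `StubNode`, the free function `load_node_inputs` and its `cache` parameter are all hypothetical names used only for illustration; how the raw input spec would be read from a real kedro node is left open. The binding step relies on `inspect.signature`, which handles `functools.partial`, `*args` and `**kwargs`.

```python
import inspect
from functools import partial


class StubCatalog:
    """Hypothetical stand-in for a data catalog: only `load(name)` is used."""

    def __init__(self, data):
        self._data = data
        self.load_calls = []  # records every real load, to show the cache works

    def load(self, name):
        self.load_calls.append(name)
        return self._data[name]


class StubNode:
    """Hypothetical stand-in for a node: a name, a function and an input spec."""

    def __init__(self, name, func, inputs):
        self.name = name
        self.func = func
        self.inputs = inputs  # str, list of str, or dict: arg name -> dataset name


def load_node_inputs(nodes, catalog, node_name, cache=None):
    """Load every input of `node_name` and return them keyed by argument name."""
    cache = {} if cache is None else cache
    target = next(n for n in nodes if n.name == node_name)

    def load(dataset_name):
        # Reuse data loaded earlier in the session instead of reloading it.
        if dataset_name not in cache:
            cache[dataset_name] = catalog.load(dataset_name)
        return cache[dataset_name]

    spec = target.inputs
    if isinstance(spec, dict):
        args, kwargs = [], {arg: load(ds) for arg, ds in spec.items()}
    else:
        names = [spec] if isinstance(spec, str) else list(spec or [])
        args, kwargs = [load(ds) for ds in names], {}

    # Map the loaded data onto the function's own parameter names.
    bound = inspect.signature(target.func).bind(*args, **kwargs)
    return dict(bound.arguments)


def train(features, labels, alpha):
    return len(features) * alpha + len(labels)


catalog = StubCatalog({"features": [1, 2, 3], "labels": [0, 1]})
nodes = [StubNode("train_node", partial(train, alpha=0.5), ["features", "labels"])]

cache = {}
variables = load_node_inputs(nodes, catalog, "train_node", cache)
# A second call with the same cache does not touch the catalog again.
variables_again = load_node_inputs(nodes, catalog, "train_node", cache)
```

In an IPython session, the returned dict could then be pushed into the interactive namespace, for example with `get_ipython().push(variables)`, so that the body of the node function can be run line by line against the loaded inputs. Keeping the cache outside the function is one possible way to avoid reloading slow inputs after fixing a bug.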
Please share your thoughts, comment and suggest!
Checklist
RELEASE.md file
Notice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.