Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip kernel specs from example notebooks #1617

Closed
jlstevens opened this issue Jun 27, 2017 · 16 comments
Closed

Strip kernel specs from example notebooks #1617

jlstevens opened this issue Jun 27, 2017 · 16 comments
Assignees
Labels
type: docs Related to the documentation and examples
Milestone

Comments

@jlstevens
Copy link
Contributor

I would like to throw together a quick script to do this before we release 1.8.

@jlstevens jlstevens added this to the v1.8 milestone Jun 27, 2017
@jlstevens jlstevens self-assigned this Jun 27, 2017
@jlstevens
Copy link
Contributor Author

jlstevens commented Jun 27, 2017

As described in a comment in #1537, I found a quick solution for now:

find . -name '*.ipynb' -exec /tmp/holoviews/examples/strip_kernel.sh {} \;

Where strip_kernel.sh is:

#!/bin/bash
jq --indent 1 \
    '
    (.cells[] | select(has("outputs")) | .outputs) = []
    | (.cells[] | select(has("execution_count")) | .execution_count) = null
    | .metadata = {"language_info": {"name":"python", "pygments_lexer": "ipython3"}}
    | .cells[].metadata = {}
    ' $1 > /tmp/stripped.ipynb

I'll reassigning to the next milestone as we want a more robust thing in future but this will do for 1.8.

Edit: I found out about the appropriate jq command here

@jlstevens jlstevens modified the milestones: v1.9, v1.8, v1.8.1 Jun 27, 2017
@jlstevens jlstevens added the type: docs Related to the documentation and examples label Jul 7, 2017
@jlstevens jlstevens changed the title Strip kernel specs from example notebook Strip kernel specs from example notebooks Jul 7, 2017
@jlstevens jlstevens modified the milestones: v1.8.1, 1.8.2 Jul 7, 2017
@ea42gh
Copy link
Contributor

ea42gh commented Jul 29, 2017

One minor change to strip_kernel.sh is to replace the last line with
' $1 > /tmp/$(basename $1)

@ceball
Copy link
Member

ceball commented Jul 30, 2017

Also, the issue/script name's a little misleading, because as far as I can see (unless I've misread it), the script is removing more than just the kernel spec - it removes lots of other annoying stuff too :)

I think this script would be very useful on any project with jupyter notebooks, e.g. asking people to run notebooks through it before submitting PRs. Obviously it's not the job of holoviews developers to solve this problem everywhere, but is there a way we could easily share it across at least our own projects?

One more thing is that this solution has dependencies other than python (jq, bash, find), whereas there are various solutions out there using python only. Is there a reason you didn't use one of those python solutions? The linked article argues for jq because speed is required, since the author's adding stripping to git filter/smudge (good luck with that...risky and unreliable in my experience!). However, you're using it as a standalone script, so I don't see what would be wrong with python. (Even if you have many notebooks to strip, a python version would only need to start up python once and then strip in bulk, so shouldn't be much slower...)

@jlstevens
Copy link
Contributor Author

jlstevens commented Jul 30, 2017

the script is removing more than just the kernel spec

True. clean_notebook.sh would be a better name.

is there a way we could easily share it across at least our own projects?

Put a Python version of this into param.ipython?

Is there a reason you didn't use one of those python solutions?

Yes. Laziness! I found this jq command online and just used it. There is no reason it can't be done with just Python with a little more effort.

If we agree to put it into param.ipython, then I'm happy to write the Python equivalent (with the expected notebook related dependencies).

@ea42gh
Copy link
Contributor

ea42gh commented Jul 31, 2017

Came across this: https://github.com/michaelaye/nbstripout
@michaelaye does appear on holoviews people list,
maybe asking some questions might be good?

@ceball
Copy link
Member

ceball commented Jul 31, 2017

is there a way we could easily share it across at least our own projects?

Put a Python version of this into param.ipython?

[...]

If we agree to put it into param.ipython, then I'm happy to write the Python equivalent (with the expected notebook related dependencies).

Much as I hate to add non-param stuff (or indeed any stuff...) to param, junk in notebooks is such an annoying issue on every project that I would do it!

It's a shame there isn't a --clean option for jupyter nbconvert --to notebook, or even individual --strip-xyz options. I wonder if a contribution like that would be accepted by nbconvert? I don't have a feel for it.

Came across this: https://github.com/michaelaye/nbstripout

Thanks, yes, nbstripout (https://github.com/kynan/nbstripout, from https://gist.github.com/minrk/6176788) is one of the "various python solutions out there" I was vaguely thinking about.

@jlstevens
Copy link
Contributor Author

It's a shame there isn't a --clean option for jupyter nbconvert --to notebook, or even individual --strip-xyz options.

That would be very useful which is why I don't hold out much hope of it ever happening. Even if nbconvert did accept a PR to support this (which seems unlikely to be quick given how much code seems to be in nbstripout), it might take a while before it would be released and available to us.

I'm open to just using nbstripout although I'm not sure whether it cleans out ipywidget metadata just yet. Partial stripping isn't very useful as you would still need to edit notebooks manually to clear out the remaining gunk.

@michaelaye
Copy link
Contributor

Problem with nbstripout is that it's slow. Have you guys seen this nice blog post?
http://timstaley.co.uk/posts/making-git-and-jupyter-notebooks-play-nice/

@ceball
Copy link
Member

ceball commented Jul 31, 2017

Problem with nbstripout is that it's slow.

I know what you mean if you're talking about for integration with git via smudge/clean filters, but I'm not proposing anything to integrate with git - just something people can run to clean up before submitting a notebook (change), so I don't think it needs to be fast. Even for multiple notebooks, it wouldn't be too bad because python would only need to start up once. (Unless somehow nbstripout is much much slower even not counting the time it takes to start python etc.)

Have you guys seen this nice blog post?

Jean-Luc linked to it in his original script, which is based on that blog post ;)

@jbednar
Copy link
Member

jbednar commented Jul 31, 2017

I'm ok with hiding this functionality in param/ipython.py, not because it belongs there but because I don't think I've ever even loaded that file into an editor, so I'm clearly not going to notice there being cruft there. I do think we need to have a suitable command that we can customize to fight new bogus metadata as it crops up in the wild, so I think we need to put the specification for it somewhere we can all agree to use in our projects.

I'd like to have a simple way to invoke it, i.e. something like nbclean *.ipynb.

@ceball
Copy link
Member

ceball commented Aug 2, 2017

something like nbclean *.ipynb

I agree - I'd like something really easy. My only concern with that is how annoying it would be to remove precious output from a notebook in place, by accident. Maybe it would need at least an --in-place option so the default is to create a new file, or something? I'm not sure.

Also, maybe it wouldn't be too bad to tell people to install something via pip (if python only) or conda (if not just python), so long as what came did the things we wanted by default without commandline options? Since people will already have installed jupyter notebook, they will likely have pip or conda. (E.g. it could be nbstripout, which can be pip installed - but unfortunately nbstripout doesn't remove all the metadata, I think.)

But if it has to go into param, that's ok with me.

And meanwhile, I've filed jupyter/nbconvert#637.

@philippjfr philippjfr modified the milestones: 1.8.4, v1.9 Sep 14, 2017
@philippjfr philippjfr modified the milestones: v1.9, v1.10 Oct 27, 2017
@basnijholt
Copy link
Contributor

basnijholt commented Jan 2, 2018

You can also set a git filter, this one works for me:

git config filter.nbclearoutput.clean "jupyter nbconvert --to notebook --ClearOutputPreprocessor.enabled=True --ClearOutputPreprocessor.remove_metadata_fields='[\"deletable\", \"editable\", \"collapsed\", \"scrolled\"]' --stdin --stdout"

and add a .gitattributes:

*.ipynb filter=nbclearoutput

@jlstevens
Copy link
Contributor Author

PR #2507 referenced above will strip the metadata from examples for 1.10. You can see the jq command I am now using in that PR.

Given all the discussion above, it seems that there is no standard way to generate 'clean' notebooks and as a consequence, supposedly 'cleared' notebooks will contain inconsistent cruft (not written by the notebook author) unless something is done to clear it. In this case, this involves an ad hoc jq command...

@jlstevens
Copy link
Contributor Author

Addressed in the PR merged above.

There is still some general discussion going on but as this particular issue is assigned to 1.10 and the metadata has now been cleared, I will close this issue.

I don't mind if we continue the general discussion here, or if we file a new issue either in this repo or some other (more appropriate) repo.

@philippjfr
Copy link
Member

IMO we should have a release checklist in a wiki or something with this being one of the items. Then we can try our best to check in only cleared notebooks, but at release time we can strip any that made it through somehow.

@ceball
Copy link
Member

ceball commented Apr 6, 2018

In case it helps, the doit/pyct library of tasks will support reporting and clearing extraneous metadata. But right now there's just an issue about choosing the underlying tool to use (https://github.com/pyviz/pyct/issues/3).

Further into the future is the idea of wrapping some of the tasks as web services that interact with github (https://github.com/pyviz/pyct/issues/5).

we should have a release checklist in a wiki or something

One aim of the doit/pyct project is to reduce manual release checklist content wherever possible (or otherwise, at least to share it across projects we maintain), but if you were to write a release checklist for holoviews it would help inform what tasks doit should supply :)

It's also an aim of that project that the tasks just use "standard" underlying tools, so tasks can be run independently by those without doit/pyct (e.g. the test related tasks just use standard tools such as pytest or nose).

(Note: the documentation on the pyct page is highly draft and not yet ready to read, but will be readable soon...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: docs Related to the documentation and examples
Projects
None yet
Development

No branches or pull requests

7 participants