Markdown based notebooks #103

avli · 2023-03-25T13:36:32Z

This PR is an outcome of the Jupyter Notebook workshop. The JEP proposes an alternative Markdown-based serialization syntax for Jupyter notebooks that allows lossless serialization to and from .ipynb, is reasonably human-readable, is interoperable with standard text tools, and is more VCS-friendly.

After discussing it with @fcollonval during the workshop, we skipped the step of creating a GitHub issue to decide whether this should be a JEP in this repository.

Resolve #102

- small wording changes and typos
- shortened and reduced the wordiness of one of the use cases


KathleenDollard · 2023-03-28T14:24:11Z

102-markdown-based-notebooks/markdown-based-notebooks.md

+    ```{jupyter.code-cell metadata={json object}}
+    :tags: [hide-output, show-input]
+
+    print(Hello!")


I am not familiar with this space, but from a simplistic perspective this is identical to the previous block except for the json object indicator. I expected the tag or its contents to look like JSON. Also, it looks like the print line is missing a double quote.

Also, is there a reasoning behind where snake, kebab, and space cases are used? I had anticipated json-object.

Not to be picky, I love this work!

Hi @KathleenDollard, thanks for the comment! Yes, there are multiple ways of expressing the same thing here. The motivation for that came from authoring in plain text rather than from concerns about lossless serialisation. There are already a few different ways for people to author lightweight notebooks (in jupytext and in MyST notebooks, for example), and the variations reflect support for several of the existing styles that people use. @nthiery can maybe comment more on this?

Typo fixed in avli#2 ; thanks!

Indeed, the reasoning is to leave the author free to choose, depending on the
use case, which syntax to use for cell metadata and cell parameters:

- a one-line JSON object when the priority is lossless serialization (without cluttering the text too much)
- YAML when the priority is human readability and writability

Thanks @KathleenDollard for the feedback!

Hi @KathleenDollard! Thank you for the feedback!

To answer your first comment: as @stevejpurves already mentioned, we initially decided to impose as few restrictions as possible on how metadata is written. In principle, this lets users choose their own tradeoff between readability and writability. I'm unsure whether such flexibility is a good idea, so more feedback is welcome to decide whether the syntax should be more opinionated.

About the cases: do you suggest that the identifiers obey the JSON syntax (in other words, kebab-case shouldn't be used)?

jgm · 2023-03-28T16:23:14Z

Note that the syntax

```{jupyter.code-cell}

is incompatible with pandoc's markdown. Ideally, it would be nice if the proposed format could be read and processed by pandoc (and thus doesn't require a custom parser).

Why not use an attribute that is compatible? E.g.

{.jupyter .code-cell}

or

{.jupyter-code-cell}

or even just

{.code-cell}

There is currently no official attribute syntax for commonmark, but if this comes it is likely to be very similar to the pandoc attribute syntax.

See https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/attributes.md

Similar remarks for other uses of {jupyter.XXX}.
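To illustrate the compatibility point, here is a simplified Python sketch of how pandoc-style attribute blocks tokenize: every token must be `#id`, `.class`, or `key=value`. This is an approximation written for this discussion, not pandoc's actual parser (it ignores escapes, raw attributes such as `{=html}`, and names containing periods); it only shows why `{.jupyter-code-cell}` or `{jupyter="code"}` fit the existing syntax while `{jupyter.code-cell}` does not.

```python
import re
import shlex


def parse_attributes(text: str):
    """Approximate pandoc-style `{#id .class key=value}` parsing.

    Returns (identifier, classes, key_values), or None if any token does not
    fit the attribute syntax.
    """
    text = text.strip()
    if not (text.startswith("{") and text.endswith("}")):
        return None
    identifier, classes, key_values = "", [], {}
    for token in shlex.split(text[1:-1]):  # shlex removes quotes around values
        if match := re.fullmatch(r"#([\w:-]+)", token):
            identifier = match.group(1)
        elif match := re.fullmatch(r"\.([\w:-]+)", token):
            classes.append(match.group(1))
        elif match := re.fullmatch(r"([\w-]+)=(.*)", token):
            key_values[match.group(1)] = match.group(2)
        else:
            return None  # e.g. a bare word such as `jupyter.code-cell`
    return identifier, classes, key_values


for candidate in ["{.jupyter-code-cell}", "{.jupyter .code-cell}",
                  "{.jupyter:code-cell}", '{jupyter="code"}', "{jupyter.code-cell}"]:
    print(candidate, "->", parse_attributes(candidate))
```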

stevejpurves · 2023-03-28T17:01:13Z

thanks for the comments @jgm

yes, the syntax

```{jupyter.code-cell}

is aimed at providing concrete "directives" in the document that can be used to specify the various notebook blocks, which go beyond code blocks and also specify output and attachments and other complex/rich types.

So the JEP isn't favoring any existing parser or library: while it isn't currently compatible with pandoc out of the box, it also isn't compatible with jupytext, MyST, or Quarto out of the box, although the syntax currently shares a lot with the Quarto and MyST styles.

A custom parser/serializer, or modifications to existing parsers, are probably going to be needed anyway in order to support the serialisation requirements around output and attachment blocks?

jgm · 2023-03-28T17:10:50Z

Yes, I understand the intent. But that intent can be met without departing from standard attribute syntax.

If you used one of the variants I suggested, or e.g. {.jupyter:code-cell}, which also works, then you'd be able to read one of these md notebooks with pandoc and process it with filters.

With your current syntax suggestion, that wouldn't be possible; you'd be giving up easy-interoperability for no good reason that I can discern.

jgm · 2023-03-28T17:12:10Z

A custom parser/serializer, or modifications to existing parsers, are probably going to be needed anyway in order to support the serialisation requirements around output and attachment blocks?

This could all be handled with filters with the existing pandoc markdown or extended commonmark parser; none of it requires changes to the parser.
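As a rough illustration of the filter-based approach, the sketch below walks a pandoc JSON AST (the structure produced by `pandoc -t json`) and turns code blocks carrying a `jupyter-code-cell` class into nbformat-style code cell dicts. The class name is one of the variants suggested above, the sample AST is hand-written, and the output shape is a simplified assumption rather than a complete nbformat conversion; a real filter would also map Markdown blocks, outputs, and attachments.

```python
def code_cells_from_ast(doc: dict, marker: str = "jupyter-code-cell") -> list:
    """Turn pandoc CodeBlocks carrying `marker` as a class into nbformat-like code cells."""
    cells = []
    for block in doc.get("blocks", []):
        if block.get("t") != "CodeBlock":
            continue  # a fuller filter would also map Para, Header, ... to Markdown cells
        (identifier, classes, key_values), text = block["c"]
        if marker not in classes:
            continue
        cell = {
            "cell_type": "code",
            "metadata": {key: value for key, value in key_values},  # key=value attributes
            "source": text,
            "outputs": [],
            "execution_count": None,
        }
        if identifier:
            cell["id"] = identifier
        cells.append(cell)
    return cells


# Hand-written example mirroring the shape of pandoc's JSON AST.
sample_ast = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [{"t": "Str", "c": "Intro"}]},
        {"t": "CodeBlock",
         "c": [["cell-1", ["python", "jupyter-code-cell"], [["tags", "hide-output"]]],
               'print("hello")']},
    ],
}
cells = code_cells_from_ast(sample_ast)
print(cells)
```

With pandoc installed, the same function could be applied to the parsed output of `pandoc -f markdown -t json`, without any change to pandoc's parser.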

nthiery · 2023-03-30T11:35:07Z

Thanks @jgm for the feedback! The motivation for having jupyter somewhere is
for namespacing. Other than this, we certainly should consider variants of the
proposed syntax if this helps interoperability and increases the odds of being
consistent with whatever standards may emerge in the Markdown world.

Using .jupyter.code instead of jupyter.code seems totally fine to me.

I am not sure about .jupyter .code: on the one hand, it's consistent with
the .code keyword of pandoc. On the other hand, it conveys the idea of
namespacing less clearly.

Presumably a good guideline to follow is what would be customary in the
CSS world. I am far from an expert there!

jonsequitur · 2023-03-30T15:25:25Z

Using Markdown for notebooks that display nicely as READMEs (similar to mwouts/jupytext#220) has been explored for Polyglot Notebooks / Try .NET. One detail from that design that might be of interest here is that we also put cell metadata after the code fence, but always prefixed with the language name in order to leverage existing syntax highlighting features.

Here's an example:

```python {metadata: ...}
x = 1
if x == 1:
    # indented four spaces
    print("x is 1.")
```

This renders with language-specific highlighting without displaying the metadata:

x = 1
if x == 1:
    # indented four spaces
    print("x is 1.")
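A minimal Python sketch of how a reader might split such an info string, assuming (as in the example above) that the language name comes first and everything after it is metadata. CommonMark renderers conventionally use only the first word of the info string as the language, which is why the metadata stays hidden when rendered. The function names are hypothetical, and the fence scanning is deliberately simplified.

```python
def split_info_string(info: str) -> tuple:
    """Split a fence info string into (language, metadata text)."""
    info = info.strip()
    if not info or info.startswith("{"):
        return "", info  # no language prefix, e.g. {jupyter.code-cell}
    language, _, rest = info.partition(" ")
    return language, rest.strip()


def fenced_blocks(markdown: str) -> list:
    """Return (language, metadata, code) for each backtick-fenced block.

    Simplified: ignores tilde fences, indented fences, and longer backtick runs.
    """
    fence = "`" * 3
    blocks, lines, i = [], markdown.splitlines(), 0
    while i < len(lines):
        if lines[i].startswith(fence):
            language, metadata = split_info_string(lines[i][3:])
            j = i + 1
            while j < len(lines) and not lines[j].startswith(fence):
                j += 1
            blocks.append((language, metadata, "\n".join(lines[i + 1:j])))
            i = j + 1
        else:
            i += 1
    return blocks


example = "\n".join([
    "`" * 3 + "python {metadata: ...}",
    "x = 1",
    "if x == 1:",
    '    print("x is 1.")',
    "`" * 3,
])
print(fenced_blocks(example))
```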

jgm · 2023-03-30T17:36:58Z

Using .jupyter.code instead of jupyter.code seems totally fine to me.

Some implementations may take .jupyter.code to be specifying two class names rather than one (and thus to be equivalent to .jupyter .code). And in general, even if implementations supported it, having . or : in class names is not ideal. (Colons need to be escaped in CSS, and periods conflict with the class syntax.)

.jupyter-code or .jupyter_code should be fine.

Another alternative would be to use a key-value pair: jupyter="code", jupyter="output", etc.
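To make the CSS point concrete: in a class selector, characters such as `.` and `:` must be backslash-escaped, whereas hyphens and underscores need no escaping. The helper below is a deliberately simplified Python sketch; browsers expose `CSS.escape()` for the full rules, which also handle leading digits.

```python
import re


def class_selector(class_name: str) -> str:
    """Return a CSS class selector, backslash-escaping characters outside [A-Za-z0-9_-].

    Simplified: does not handle a leading digit, which CSS.escape() would
    encode as a hex escape.
    """
    return "." + re.sub(r"([^A-Za-z0-9_-])", r"\\\1", class_name)


for name in ["jupyter-code", "jupyter_code", "jupyter:code", "jupyter.code"]:
    print(name, "->", class_selector(name))
```

Unescaped, `.jupyter.code` is a compound selector matching elements that carry both the `jupyter` and `code` classes, which is the conflict with the class syntax mentioned above.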

nthiery · 2023-03-30T20:15:11Z

One detail from that design that might be of interest here is that we also put cell metadata after the code fence, but always prefixed with the language name in order to leverage existing syntax highlighting features.

Thanks for the feedback that brings perspective to one of the open points.

I personally lean toward making this a recommended feature: parsers should support it;
writers (including humans!) are encouraged to use it, but don't have to depending on the
use case.

stevejpurves · 2023-03-30T20:33:46Z

On the class attribute syntax: I don't like the idea of a syntax that overloads the class attribute; {.code-cell} essentially equates to <div class="code-cell"></div>, whereas {code-cell} essentially equates to <code></code>, which is semantically stronger.

Speaking from a Jupyter point of view, I think we want strong semantics around what a Jupyter code-cell (or output, or attachment) is (with or without the {}), and around what information should be on it in terms of parameters, attributes, metadata, etc. These are not <div>s of a certain class; they are semantically meaningful elements with a specific representation when serialized, and they are rendered as complex UI fragments in Jupyter clients.

On interoperability: a block syntax of {code-cell} is already compatible with jupytext and MyST notebooks. With the introduction of new block types and the jupyter namespace, {jupyter.code-cell} is still well aligned with the block/directive syntax used by jupytext, MyST and, I think, Quarto; extension should be straightforward there.

jgm · 2023-03-30T21:44:17Z

A block syntax of {code-cell} is already compatible with jupytext and MyST notebooks

My point is that you should care about wider interoperability.

I think quarto - extension should be straightforward there.

Quarto is based on pandoc (it uses pandoc's parsers with a bunch of filters on top to process the AST), so you need to be interoperable with pandoc for that.

stevejpurves · 2023-03-31T08:09:55Z

A block syntax of {code-cell} is already compatible with jupytext and MyST notebooks

My point is that you should care about wider interoperability.

I think we do? I think we're considering and discussing that here. What I'm not clear on is how to weight the multiple possible (and probably conflicting) tools we could be interoperable with. For example, I'm not clear on the extent to which pandoc is actively used alongside Jupyter in the same way that jupytext is (i.e. in a tight loop of notebook development and execution), as opposed to, say, getting notebooks out to other formats to distribute that material outside of Jupyter.

Another big point on interoperability that hasn't been mentioned yet is GFM!

Maybe what the JEP is missing so far are some clearer, requirement-like statements that can be discussed and agreed on, e.g.

- Must render fully on GitHub (GFM)
- Must ____

Currently the "design goals" section is the closest thing to that, but it is still very loose: i.e. "The serialized notebook should be a valid Markdown file.", whatever that means. Clearer requirements could better set the scene for then zeroing in on the syntax.

Quarto is based on pandoc (it uses pandoc's parsers with a bunch of filters on top to process the AST), so you need to be interoperable with pandoc for that.

Ah, OK, I thought it was pandoc-flavored Markdown plus additional extensions. Are you saying that pandoc already supports the Quarto code block syntax, which doesn't use class attributes and is close to the syntax already outlined in the JEP? Or is this special handling of a language attribute by pandoc?

e.g. shown here

jgm · 2023-03-31T15:42:38Z

I suspect that's a documentation bug.
Pandoc allows

``` {.python}

or

``` python

I believe the same is true of Quarto, because they don't use a customized pandoc, just filters on top.

All I'm saying is that if there's any room for a choice between

{.jupyter-code}
{.jupyter:code}
{.jupyter.code}
{jupyter .code}
{jupyter-code}

etc., it would be desirable (in this planning stage) to pick one that pandoc can already handle. This increases interoperability at little cost. (This would have been a good design goal for MyST, too.)

krassowski · 2023-04-09T17:29:25Z

I would love to see a new section addressing the topic of trust and signatures (the Jupyter Notebook security model). In particular: would a signature for the notebook be computed and stored in the Markdown file?

- If yes, how?
- If no, will all cells/outputs always be treated as untrusted when the notebook is opened in .nb.md format?

Please also see #95 (comment).

westurner · 2023-04-09T18:45:57Z

Markdown YAML front matter can contain YAML-LD/JSON-LD front matter
- W3C Verifiable Credentials:
  https://www.w3.org/TR/vc-data-model/#concrete-lifecycle-example
- Add JSONLD @context to the top level .ipynb node nbformat#44 (comment)
- Unintuitive API for trust/signing a notebook nbformat#98 (comment)
- You normalize the graph before signing it
- Do nb.md and .ipynb versions of the notebook parse to the same graph, which is then normalized and hashed and signed?

avli · 2023-04-11T14:07:45Z

@krassowski, thank you for raising this question!

As far as I understand from the documentation, the signature is produced from the outputs. Can we apply the same procedure to the outputs inside the Markdown file?

Most likely I'm oversimplifying things, and you probably see some rough edges. If so, could you share your thoughts?
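One way to frame the question, sketched in Python under stated assumptions: if both the `.ipynb` file and the Markdown file parse to the same in-memory notebook model, a signature computed over a canonical form of that model (rather than over the file bytes) comes out identical for both. The canonicalization below (sorted-key JSON of each cell's type, joined source, and outputs) and the function names are hypothetical; this is not nbformat's actual trust mechanism, which lives in `nbformat.sign.NotebookNotary` and has its own rules.

```python
import hashlib
import hmac
import json


def _canonical_cell(cell: dict) -> dict:
    source = cell["source"]
    if isinstance(source, list):  # .ipynb files may store source as a list of lines
        source = "".join(source)
    return {"cell_type": cell["cell_type"], "source": source,
            "outputs": cell.get("outputs", [])}


def notebook_signature(cells: list, secret: bytes) -> str:
    """HMAC-SHA256 over sorted-key JSON of the cells (hypothetical scheme)."""
    payload = json.dumps([_canonical_cell(c) for c in cells],
                         sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


result = {"output_type": "execute_result", "data": {"text/plain": "1"}}
from_ipynb = [{"cell_type": "code", "source": ["x = 1\n", "x"], "outputs": [result]}]
# Same content as parsed from a Markdown file: different key order, source as one string.
from_markdown = [{"outputs": [{"data": {"text/plain": "1"}, "output_type": "execute_result"}],
                  "source": "x = 1\nx", "cell_type": "code"}]
key = b"example-secret"
print(notebook_signature(from_ipynb, key) == notebook_signature(from_markdown, key))
```

The design choice being illustrated is where the signature is anchored: on the parsed model, the two serializations can share one trust record; on raw bytes, they cannot.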

echarles · 2023-04-15T05:44:40Z

Cell outputs and attachments are mentioned in several places, but it is not clear to me whether there is an option to have a companion file alongside the Markdown to persist those cell outputs and attachments.

nthiery · 2023-04-16T14:44:42Z

Cell outputs and attachments are mentioned in several places, but it is not clear to me whether there is an option to have a companion file alongside the Markdown to persist those cell outputs and attachments.

Thanks for your feedback. Externalising cell outputs and attachments (e.g. in companion files) is indeed a natural feature. During our discussions, various use cases and approaches emerged. To keep the approach incremental, and also because the feature could be just as relevant for traditional ipynb notebooks, we decided to propose treating that feature in a follow-up JEP. See line 580 of: https://github.com/jupyter/enhancement-proposals/pull/103/files#diff-932448845fb9d55aef27789043a371eb872aa644507bf72e049f5ab536428238R580 With the current JEP, cell outputs and attachments can be stored inline only, or not at all.

echarles · 2023-04-16T15:14:31Z

in a followup JEP.

Well, I would feel more comfortable if this important topic were handled in this JEP, to make sure all the bits make sense together. It can make sense to discuss them in separate forums, but giving my +1 to a partial solution that excludes difficult aspects is not appealing to me.

See line 580

oh yes, it was indeed excluded.

westurner · 2023-04-16T15:50:37Z

MHTML: MIME-encapsulated HTML + assets, with URLs rewritten in the resources, and thus different content hashes
https://github.com/WICG/webpackage#specifications
- Web Bundles
  Introducing the Web Bundles API
  
  A Web Bundle is a file format for encapsulating one or more HTTP resources in a single file. It can include one or more HTML files, JavaScript files, images, or stylesheets.
  
  Web Bundles, more formally known as Bundled HTTP Exchanges, are part of the Web Packaging proposal.
  
  [A figure demonstrating that a Web Bundle is a collection of web resources.]
  
  How Web Bundles work
  
  HTTP resources in a Web Bundle are indexed by request URLs, and can optionally come with signatures that vouch for the resources. Signatures allow browsers to understand and verify where each resource came from, and treats each as coming from its true origin. This is similar to how Signed HTTP Exchanges, a feature for signing a single HTTP resource, are handled.
  
  This article walks you through what a Web Bundle is and how to use one.
  
  Explaining Web Bundles
  
  To be precise, a Web Bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type. You can read more about this in the Top-level structure section of the spec draft.
  
  Web Bundles have multiple unique features:
  - Encapsulates multiple pages, enabling bundling of a complete website into a single file
  - Enables executable JavaScript, unlike MHTML
  - Uses HTTP Variants to do content negotiation, which enables internationalization with the Accept-Language header even if the bundle is used offline
  - Loads in the context of its origin when cryptographically signed by its publisher
  - Loads nearly instantly when served locally
  These features open multiple scenarios. One common scenario is the ability to build a self-contained web app that's easy to share and usable without an internet connection. [...]
(more notes at https://westurner.github.io/hnlog/#comment-29296573 )


Any new package format must support cryptographic signatures and ideally WoT identity
- W3C Verifiable Credentials
All of the resources in any new package SHOULD/MUST have URLs/URIs:
- W3C Web Annotations require stable URLs in order to share comments on resources with URIs
  - https://jupyterbook.org/en/stable/interactive/comments.html -> sphinx-comments https://sphinx-comments.readthedocs.io/en/latest/
    - Hypothes.is
    - Utterances
    - Dokie.li
    - "Help compare Comment and Annotation services: moderation, spam, notifications, configurability"
      https://github.com/orgs/executablebooks/discussions/102
Any new package format should support Linked Data bibliographic metadata:
Any new package format should have a declarative manifest with per-file hashes, a VC proof (~GPG .asc) and (bibliographic) metadata
Should this new package format specify dependency edges in any way?
- conda-forge, emscripten-forge
Should the .ipynb be the package manifest?

nthiery · 2023-04-16T15:54:06Z

Well, I would feel more comfortable if this important topic were handled in this JEP, to make sure all the bits make sense together. It can make sense to discuss them in separate forums, but giving my +1 to a partial solution that excludes difficult aspects is not appealing to me.

Thanks for giving us the opportunity to detail and clarify our reasoning. In the use cases we had in mind, the feature did not look difficult, at least as far as the notebook format itself is concerned: one simple solution is to enable metadata on cell outputs and attachments specifying that the data is not provided inline but is to be fetched from a given URL. The feature is relevant for both Markdown and ipynb notebooks, and the above implementation does not depend on the format. Of course, that's not all there is to externalizing data (for example, how you make sure that companion files remain available or URLs remain valid when the notebook is moved around), but these difficulties are about tools and workflows, not the file format of the notebook. Does that sound adequate for the use cases you have in mind?
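A minimal Python sketch of the "fetch from a given URL" idea, assuming a hypothetical output metadata key `external_url` that marks data as not provided inline. The key name and the `fetch` callback are illustrative assumptions, not part of nbformat or the JEP. Injecting `fetch` keeps the format logic independent of how companion files or URLs are actually retrieved, mirroring the split between file format and tooling described above.

```python
from typing import Callable


def resolve_outputs(cell: dict, fetch: Callable[[str], dict]) -> list:
    """Return the cell's outputs, filling in externalized ones via `fetch`."""
    resolved = []
    for output in cell.get("outputs", []):
        url = output.get("metadata", {}).get("external_url")  # hypothetical key
        if url is None:
            resolved.append(output)  # data is stored inline
        else:
            inline = dict(output)  # shallow copy: the stored cell is left untouched
            inline["data"] = fetch(url)  # e.g. read a companion file or do an HTTP GET
            resolved.append(inline)
    return resolved


# A dict stands in for companion files in this example.
store = {"outputs/cell-1.json": {"text/plain": "42"}}
cell = {"outputs": [
    {"output_type": "stream", "name": "stdout", "text": "hi\n"},
    {"output_type": "execute_result", "metadata": {"external_url": "outputs/cell-1.json"}},
]}
resolved = resolve_outputs(cell, store.__getitem__)
print(resolved)
```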

echarles · 2023-04-17T07:56:33Z

how you make sure, e.g., that companion files remain available or urls
remain valid when the notebook is moved around -- but these
difficulties are about tools and workflows, not the file format of the
notebook.

Keeping the companion file with its host is one aspect which is indeed not directly relevant to the file format.

My concern was more about the cell id. With ipynb, a cell has its id, input, and output together in a single JSON object, so it is easy to update them all at the same time. With a companion file, you completely lose that single structure, and something on top needs to keep things in sync. Think of cell deletion, insertion, splitting...: all of these mutate the cell ids in ways that need to be reflected in the companion file. You will reply that this is also part of the tools and workflow, which I agree with, but I don't see in the format definition the concept of a cell id (or code block id), nor the requirements placed on tooling developers to ensure users are safe while editing the content. In other words, this JEP should establish that the proposed format will indeed be usable and will support companion files in some way.
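To illustrate the synchronization concern, here is a small Python sketch assuming a hypothetical companion mapping from each cell's `id` (the per-cell identifier nbformat added in version 4.5) to that cell's outputs. After edits, a tool has to reconcile the mapping with the current cell ids; the function below only covers the easy part (dropping orphaned entries and reporting cells without stored outputs) and is not a proposal for the JEP's actual mechanism.

```python
def reconcile(cell_ids: list, companion: dict) -> tuple:
    """Compare current cell ids with a companion {cell_id: outputs} mapping.

    Returns (kept, orphaned_ids, ids_without_outputs).
    """
    current = set(cell_ids)
    kept = {cid: outputs for cid, outputs in companion.items() if cid in current}
    orphaned = sorted(set(companion) - current)
    without_outputs = [cid for cid in cell_ids if cid not in companion]
    return kept, orphaned, without_outputs


# Before editing, the notebook had cells a, b, c with outputs in the companion file.
companion = {"a": ["out-a"], "b": ["out-b"], "c": ["out-c"]}
# The user deleted b and split c into c and d.
kept, orphaned, without_outputs = reconcile(["a", "c", "d"], companion)
print(kept, orphaned, without_outputs)
```

What ids alone cannot tell you is whether, after the split, `out-c` now belongs partly to `d`; that is the kind of tooling requirement the comment asks the JEP to spell out.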

willingc · 2023-04-27T13:40:48Z

I have mixed feelings on the format proposed for a few reasons:

- The JEP should have a section on "How we communicate to the broader community" if the proposed changes are adopted. This is really important from a messaging standpoint for the role of the .ipynb format going forward.
- While the technical merits seem appealing, will this open the door to further fragmentation of the .ipynb standard for notebooks? While it may not be the most modern approach now, much like PDF (not an ideal technology), it does serve as a standard for notebook sharing.

fcollonval · 2023-06-12T15:38:06Z

We have started looking at this at the SSC meetings. We have decided to give at least another 2 weeks of discussion before moving forward.

allefeld · 2023-10-17T22:57:01Z

I think having a markdown-based alternative format for Jupyter notebooks is a great idea.

But to support and slightly expand on the interoperability issues @jgm raised: just for simplicity's sake, I would also suggest using or adapting an existing format as far as possible, instead of introducing yet another variation.

Since a Quarto qmd file is already a functional alternative representation of a notebook (converted to ipynb for execution and back to md afterwards, including output cell contents), and it is already interoperable with Pandoc, why not build your solution on top of that?

In any case, I think it would be good to actively involve representatives of related projects in this process, e.g. Quarto's @cderv.

echarles · 2023-10-18T03:02:38Z

Since a Quarto qmd file is already a functional alternative representation of a notebook (converted to ipynb for execution and back to md afterwards, including output cell contents), and it is already interoperable with Pandoc, why not build your solution on top of that?

There has been mention of https://github.com/executablebooks/mystmd here, and if I am not mistaken I remember seeing public discussions between MyST and Quarto. What about targeting interoperability between ipynb and MyST, and then between MyST and qmd?

Around ipynb interoperability, a general question is for me "How related/different would it be to https://github.com/mwouts/jupytext?"

cderv · 2023-10-18T10:03:55Z

are you saying that pandoc already supports the quarto code block syntax, which doesn't use class attributes and is close to the syntax already outlined in the JEP? or is this special handling of a language attribute by pandoc?

@stevejpurves @jgm Just chiming in to add some precision about this. The syntax of ```{python} is used for executable code blocks which support is brought by Quarto. #| echo: false inside the block (as on the screenshot shared) is a syntax for options to use for execution. So it is a specific Quarto syntax additional to Pandoc's code block syntax ```{.python} or ``` python, but compatible with the Markdown reader.

In Quarto, computations are handled before the Pandoc conversion by engines, among them the Jupyter engine. The computation stage produces an intermediate .md file containing the source code blocks and their results in Pandoc's Markdown syntax, to be processed by Pandoc.

Hope it helps clarify. Happy to show more if needed.
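For readers unfamiliar with the `#|` option syntax mentioned above, here is a rough Python sketch that separates leading option lines from the rest of a code cell. It only handles simple `key: value` pairs and keeps values as strings; Quarto treats these lines as YAML, so a real implementation would hand them to a YAML parser (where `false` becomes a boolean). The function name is hypothetical.

```python
def split_cell_options(source: str, prefix: str = "#|") -> tuple:
    """Split leading `#| key: value` option lines off a code cell's source."""
    lines = source.splitlines()
    options = {}
    index = 0
    while index < len(lines) and lines[index].startswith(prefix):
        key, sep, value = lines[index][len(prefix):].partition(":")
        if sep:
            options[key.strip()] = value.strip()
        index += 1
    return options, "\n".join(lines[index:])


options, code = split_cell_options("#| echo: false\n#| label: fig-plot\nprint('hello')")
print(options, code)
```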

cscheid · 2023-10-18T15:01:24Z

I suspect that's a documentation bug. Pandoc allows
``` {.python}
or
``` python
I believe the same is true of Quarto, because they don't use a customized pandoc, just filters on top.

Just to clarify a little bit more on the Quarto side: since (I believe) Pandoc 3 we have used a custom Reader, so we're no longer strictly "just filters on top". We did this so that we wouldn't break backwards compatibility for the very common syntax

```{python}
code block
```

As @jgm pointed out, that is indeed not valid syntax for codeblock nodes in pure pandoc:

$ pandoc -f markdown -t native
```{python}
print("hello")
```
^D
[ Para [ Code ( "" , [] , [] ) "{python} print(\"hello\")" ]
]

But in quarto, you get this instead:

$ cat codeblock.qmd
---
engine: markdown # to avoid the execution of the code
---
```{python}
print("hello")
```
$ quarto render codeblock.qmd -t native -o -
pandoc -o /var/folders/nm/m64n9_z9307305n0xtzpp54m0000gn/T/quarto-sessionc91f1714/99369018/548c0fe7.native
  to: native
  standalone: true
  default-image-extension: png

Pandoc
  Meta { unMeta = fromList [] }
  [ CodeBlock ( "" , [ "{python}" ] , [] ) "print(\"hello\")"
  ]

If we request markdown output we don't get precisely the same codeblock, but it's close enough that it roundtrips correctly:

$ quarto render codeblock.qmd -t markdown -o -
pandoc -o /var/folders/nm/m64n9_z9307305n0xtzpp54m0000gn/T/quarto-sessiona858c56a/94c20cae/e83363f1.md
  to: markdown
  standalone: true
  default-image-extension: png

---
toc-title: Table of contents
---

``` {python}
print("hello")
```

minrk · 2023-10-18T15:02:03Z

I do in general think it would be better for everyone if we were to officially adopt (and potentially extend) an existing format, since there are at least three of these now, rather than define another new format for more text-friendly notebook serialization. I think a pretty strong case has to be made that none of these formats can be built on successfully before defining a new format, and I don't feel like that's been done. I'd start from what do myst/quarto/jupytext not do that we need, and how can we fill those gaps (if any) by building on those tools (or not).

allefeld · 2023-10-19T17:47:13Z

Sorry, I claimed that qmd is Pandoc-interoperable, which it is not exactly, the exception being executable code blocks.

I'm not involved in Quarto development, but I have taken part in discussions on Quarto, and from that I know that there are mid-term plans to implement the initial extraction of code also via Pandoc, which needs a custom reader. @cscheid, I'm not sure whether that custom reader would be identical to the one you mentioned as already being used now? Would that mean that through that custom reader Pandoc would take over the complete work of initial qmd → ipynb conversion, before calling NBClient? If yes, that might be a good starting point for something like qmd to take over the role of ipynb, i.e. clients supporting the new notebook format could use the same custom reader.

cscheid · 2023-10-19T19:08:04Z

I'm not involved in Quarto development, but I have taken part in discussions on Quarto, and from that I know that there are mid-term plans to implement the initial extraction of code also via Pandoc, which needs a custom reader.

I'm sorry - I'm not sure what you're referring to here.

westurner · 2023-10-19T19:43:13Z

A combination of MyST-Markdown (Jupyter-book (Sphinx)) and QMD (Quarto, nbdev) would be a great thing.

jupyter/nbformat does not and should not specify docutils or pandoc.

Additional criteria:

Output JSON-LD/YAML-LD from YAML-LD/JSON-LD (1) within the Markdown YAML front matter, and (2) from within syntax blocks in the document
- docutils .rst, MyST-Markdown .myst.md
  - "role" and "directive" declarations in the document
    - these are transformed to HTML, LaTeX by docutils
- quarto RenderScripts
  - these generate HTML on stdout (when they have the .qmd.py file extension)
Register a MIME type // file extension for opening these notebook documents (and their kernels given their kernel parameters)
- IIRC some OSes will not recognize a second file extension component:
  assert os.path.splitext("example.fmt.md") == ("example.fmt", ".md")
Build slides with e.g. jupyter-slides
- To make slides from a notebook, you add per-cell metadata to indicate whether it's a slide, subslide, fragment, etc. (with the Property Inspector on the Right Sidebar with desktop and currently mobile JupyterLab, for example)
- Create a jupyter-slides (name to be discussed ;-) organization to house several Jupyter slideshows projects damianavila/RISE#635
  - damianavila/RISE@bf59efa
  - https://github.com/jupyterlab-contrib/rise
  - https://github.com/deathbeds/jupyterlab-deck#slides
    - list of slide cell types
- What does cell metadata for slides look like with MyST?
  - https://myst-parser.readthedocs.io/en/latest/syntax/roles-and-directives.html
  - https://jupyterbook.org/en/stable/content/myst.html#more-arguments-and-metadata-in-directives
  - Markdown syntax for roles and directives executablebooks/MyST-Parser#63
    - MyST directives CodeBlock nodes normalized to Code nodes jgm/pandoc#7622
      - https://pandoc.org/MANUAL.html#jupyter-notebooks
      - https://pandoc.org/MANUAL.html#markdown-variants
        
        gfm (markdown_github)
        
        commonmark_x
        
        MyST Markdown (markdown_myst, myst)
        
        QMD Markdown (markdown_qmd, qmd)
      - https://pandoc.org/MANUAL.html#options-affecting-specific-writers-1
- What does cell metadata for slides look like with QMD?
  - https://quarto.org/docs/tools/jupyter-lab.html#output-options
  - divs and classes
    - like raw input cells, raw HTML CSS styles don't directly transform to other output formats like LaTeX (and markdown parsers configured to forbid raw HTML, like GitHub/GitLab/etc)
    - https://nbdev.fast.ai/tutorials/qmd_intro.html#divs-and-classes
    - https://quarto.org/docs/authoring/markdown-basics.html#divs-and-spans
- What does cell metadata to show or hide input or output cells look like with MyST/QMD?
  - Hiding or removing inputs, outputs, and entire cells with Jupyter Book and MyST Markdown:
What does the syntax look like when rendered as plain Markdown or better by GitHub?
- https://github.com/github/markup#markups
- What does MyST Markdown look like when rendered e.g. on GitHub?
- What does QMD Markdown look like when rendered e.g. on GitHub?
- https://quarto.org/docs/output-formats/gfm.html
Does the syntax include cell output?
- Hide inputs, include outputs in Markdown format mwouts/jupytext#220
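On the file-extension point in the list above: Python's `os.path.splitext` only splits off the last suffix, so a tool that wants to recognize a compound extension has to look at more than one suffix. A minimal sketch using `pathlib.PurePath.suffixes`; `.nb.md` is used only as an assumed example (it appears earlier in this thread, but no extension is settled here), and the helper name is hypothetical.

```python
import os.path
from pathlib import PurePath


def has_compound_extension(filename: str, extension: str = ".nb.md") -> bool:
    """True if filename ends with every suffix of `extension`, in order."""
    wanted = PurePath("x" + extension).suffixes  # [".nb", ".md"]
    return PurePath(filename).suffixes[-len(wanted):] == wanted


# splitext only splits off the last suffix, so ".nb" stays in the stem.
assert os.path.splitext("example.nb.md") == ("example.nb", ".md")
print(has_compound_extension("example.nb.md"), has_compound_extension("README.md"))
```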

JupyterLab extensions:

Challenges / Opportunities:

Here's what the jupytext docs have for QMD:

Quarto

Quarto is a scientific and technical publishing system built on Pandoc. If you have quarto installed, Jupytext lets you edit .qmd documents as notebooks in Jupyter, and pair .ipynb notebooks with .qmd notebooks.

The conversion from .ipynb to .qmd and back directly calls quarto convert, and consequently requires an installation of Quarto v0.2.134 or higher.

Note that the round trip of .ipynb to .qmd to .ipynb has the effect of concatenating consecutive Markdown cells and turning raw cells into Markdown cells (since .qmd files represent all content as either Markdown or code cells).
- nbformat: https://github.com/jupyter/nbformat/tree/main/nbformat :
  - nbformat cell types:
    - raw
    - code
    - ~~rst~~ (~~nbsphinx~~, MyST Markdown)
    - markdown
  - nbformat output MIME types (IPython.display)
- https://github.com/jupyter/nbformat/network/dependents
  - Existing tool support for nbformat .ipynb jupyter notebook documents
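To see why the round trip quoted above is lossy, here is a small Python sketch that mimics the two documented effects (raw cells becoming Markdown cells, consecutive Markdown cells being concatenated) on plain cell dicts. It models the behavior described in the jupytext documentation; it does not call `quarto convert`, and the blank-line joiner is an assumption.

```python
def simulate_qmd_round_trip(cells: list) -> list:
    """Model the documented .ipynb -> .qmd -> .ipynb effects on a list of cells."""
    result = []
    for cell in cells:
        cell_type = "markdown" if cell["cell_type"] == "raw" else cell["cell_type"]
        if cell_type == "markdown" and result and result[-1]["cell_type"] == "markdown":
            # Consecutive Markdown cells collapse into one.
            result[-1]["source"] += "\n\n" + cell["source"]
        else:
            result.append({"cell_type": cell_type, "source": cell["source"]})
    return result


cells = [
    {"cell_type": "markdown", "source": "# Title"},
    {"cell_type": "raw", "source": "raw text"},
    {"cell_type": "markdown", "source": "Intro"},
    {"cell_type": "code", "source": "x = 1"},
    {"cell_type": "markdown", "source": "Done"},
]
after = simulate_qmd_round_trip(cells)
print(len(cells), "cells before,", len(after), "after")
```

Five cells come back as three, and the raw cell's type cannot be recovered, which is the kind of loss the JEP's lossless-serialization goal is meant to rule out.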

allefeld · 2023-10-20T14:18:12Z

@cscheid:

I'm not involved in Quarto development, but I have taken part in discussions on Quarto, and from that I know that there are mid-term plans to implement the initial extraction of code also via Pandoc, which needs a custom reader.

I'm sorry - I'm not sure what you're referring to here.

I mean the discussion in quarto-dev/quarto-cli#3330:
"first pass of pandoc with a custom writer" quarto-dev/quarto-cli#3330 (reply in thread),
"we need to add an additional Pandoc pass that happens before engines" quarto-dev/quarto-cli#3330 (reply in thread),
use "initial Pandoc pass … not just for preprocessing, but to create the ipynb" quarto-dev/quarto-cli#3330 (reply in thread) by me, but supported by the following comment,
"Definitely agree that we need a parser." quarto-dev/quarto-cli#3330 (reply in thread).

cscheid · 2023-10-20T14:58:42Z

I apologize for further polluting this thread here, but I want to clarify a few points before further confusion sets in.

"first pass of pandoc with a custom writer" quarto-dev/quarto-cli#3330 (reply in thread),

Just to clarify for everyone: the user baptiste is not a quarto developer, and neither is allefeld, for other readers in here. Baptiste was offering a suggestion, and not one we're currently planning on implementing. My full reply was:

we already know we need to add an additional Pandoc pass that happens before engines (so that filters can add code cells that will be executed, and remove code cells as well)

My "remove code cells" comment is not about "extracting code cells" or the Pandoc syntax for code blocks. It is about the ability to identify executable code blocks for processing ahead of the execution engine.

cderv later says:

Definitely agree that we need a parser

In here, the context is that knitr eventually needs a parser in order to be able to detect and handle nested code cells, ultimately reducing the need for hacks like the multiple curly bracket treatment of code cells inside comments.

I appreciate the enthusiasm and energy to participate, but I'd just like to ask folks to try and refrain from stating or implying positions from quarto devs about the quarto project when they lack the appropriate context. If you need more clarification about the goals of the quarto project, please ask us quarto devs directly: that's me, cderv, dragonstyle, jjallaire, and rich-iannone. Thank you!

nthiery · 2023-10-20T19:27:56Z

Dear all,
I am so glad to see active discussions on this JEP! Thanks everyone for contributing.
I am underwater for a couple more days, but will provide feedback soon.

krassowski · 2023-10-20T22:54:48Z

From the standpoint of jupyter-lsp (which does not have an SSC representation), a format which:

- enables encoding the cell type and cell metadata, and
- has a special type of comment that we could use as anchors for transclusions

would be amazing for enabling jupyter-lsp/jupyterlab-lsp#467, quoting:

The Julynter experiment #378 demonstrated how notebook-specific IDE features could work. The ideas would include:

- "empty cell" diagnostic
- "cells execution not in order" diagnostic (obviously optional)
- "cell with comments only could be a markdown cell" diagnostic
- "remove empty cells" action
- and many others (e.g. "ratio of markdown to code cells")

In order to make the language servers (optionally) cell-aware, I propose we embrace the jupytext percent format:
# %% Optional title [cell type] key="value"
e.g. # %% for code cells and # %% [markdown] for markdown cells. We could store the cell execution number in metadata (key="value" part). We would allow the user to disable this feature.
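To illustrate how a cell-aware tool might read the header line described above, here is a minimal Python sketch. It assumes a simplified grammar inferred only from that one-line description: `# %%`, an optional free-text title, an optional `[cell type]` that defaults to code, and `key="value"` pairs without escaped quotes. `parse_header` and `CellHeader` are hypothetical names, not part of jupytext or jupyterlab-lsp, and jupytext's own parser accepts more variants than this.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Hypothetical, simplified grammar inferred from the description above:
#   "# %%" [title] ["[" cell_type "]"] {key="value"}
# jupytext's real parser accepts more variants than this sketch.
HEADER_RE = re.compile(r"^#\s*%%(?P<rest>(?:\s.*)?)$")
TYPE_RE = re.compile(r"\[(?P<type>[^\]]+)\]")
# Values are assumed to be double-quoted strings without escaped quotes.
META_RE = re.compile(r'(?P<key>[A-Za-z_][\w-]*)="(?P<value>[^"]*)"')


@dataclass
class CellHeader:
    title: str
    cell_type: str
    metadata: dict[str, str] = field(default_factory=dict)


def parse_header(line: str) -> CellHeader | None:
    """Parse a percent-format cell marker; return None for ordinary lines."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    rest = match.group("rest")
    # Collect key="value" pairs, then drop them from the remaining text.
    metadata = {m.group("key"): m.group("value") for m in META_RE.finditer(rest)}
    rest = META_RE.sub("", rest)
    # The first bracketed token is taken as the cell type; default to code.
    type_match = TYPE_RE.search(rest)
    cell_type = "code"
    if type_match is not None:
        cell_type = type_match.group("type").strip()
        rest = rest[: type_match.start()] + rest[type_match.end() :]
    # Whatever is left is the optional title, with whitespace normalized.
    title = " ".join(rest.split())
    return CellHeader(title=title, cell_type=cell_type, metadata=metadata)


if __name__ == "__main__":
    print(parse_header("# %% [markdown]"))
    print(parse_header('# %% Load data execution_count="3"'))
```

Keeping metadata values as raw strings is a deliberate simplification here; a real round trip to .ipynb would also need to decide how non-string metadata (lists, numbers) is encoded.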

Now, I am not advocating for any specific format, but it would be amazing if a future "go-to" Markdown format supported this kind of metadata in some way.

Note: for the most part such metadata should not be presented to the user, but it would still be valuable to have a way to achieve a full round trip from ipynb to markdown and back.
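Once cell boundaries are encoded in the text, a diagnostic like the "empty cell" one from the quoted list becomes straightforward to compute. The sketch below assumes the percent-format `# %%` markers quoted above are the only cell delimiter and ignores any text before the first marker; `split_cells` and `empty_cell_diagnostics` are hypothetical helpers, not an existing jupyterlab-lsp API.

```python
from __future__ import annotations

import re

# Assumed delimiter: a line beginning with "# %%" (jupytext-style percent format).
CELL_MARKER = re.compile(r"^#\s*%%(?:\s|$)")


def split_cells(source: str) -> list[tuple[int, list[str]]]:
    """Split text into (1-based marker line number, body lines) pairs.

    Lines before the first marker are ignored in this sketch.
    """
    cells: list[tuple[int, list[str]]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if CELL_MARKER.match(line):
            cells.append((lineno, []))
        elif cells:
            cells[-1][1].append(line)
    return cells


def empty_cell_diagnostics(source: str) -> list[str]:
    """Report every cell whose body contains only whitespace."""
    return [
        f"line {lineno}: empty cell"
        for lineno, body in split_cells(source)
        if not any(line.strip() for line in body)
    ]


if __name__ == "__main__":
    notebook_text = "# %% [markdown]\n# A title\n\n# %%\n\n# %%\nx = 1\n"
    for message in empty_cell_diagnostics(notebook_text):
        print(message)
```

For the sample text in the `__main__` block, only the cell whose marker is on line 4 is reported, since its body is blank.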

avli and others added 7 commits March 22, 2023 16:51

- Add JEP for Markdown-based notebooks (76a7c57)
- Remove empty "Rationale" section (4939c72)
- Update markdown-based-notebooks.md (0182f1c): small wording changes and typos; shortened and reduced the wordiness of one of the use cases
- Replace "JSON blob" with "JSON object" (73354d5)
- Clarify note about empty leading thematic break (10027e9)
- Minor corrections (b741fa4)
- Update issue and pull request numbers (a0089e8)

rowanc1 reviewed Mar 25, 2023


102-markdown-based-notebooks/markdown-based-notebooks.md (outdated, resolved)

westurner mentioned this pull request Mar 27, 2023: Hide inputs, include outputs in Markdown format mwouts/jupytext#220 (open)

KathleenDollard reviewed Mar 28, 2023


Fixed typos spotted by reviewers (0f98181)

tonyfast mentioned this pull request Apr 3, 2023: Add JEP for adding $schema to notebook format #97 (merged)

krassowski mentioned this pull request Apr 16, 2023: Support interactivity for python source files (ipython's cell magics and live execution) jupyterlab/jupyterlab#14386 (open)

westurner mentioned this pull request May 16, 2023: explore "Web Bundles"-like distribution jupyterlite/jupyterlite#1082 (open)

fcollonval mentioned this pull request Jun 5, 2023: SSC meeting minutes 2023 jupyter/software-steering-council-team-compass#2 (open)

echarles mentioned this pull request Jun 27, 2023: Adopting a text-based diagram syntax in Jupyter Markdown #101 (open)

krassowski mentioned this pull request Oct 20, 2023: Break in compatibility with jupyterlab-lsp + python-lsp-ruff since 0.0.285 astral-sh/ruff#6847 (closed)

ivanov mentioned this pull request Dec 16, 2023: Feature request: VCS, metadata friendly file format jupyter/jupyter#102 (closed)

Zsailer mentioned this pull request Mar 4, 2024: A new notebook document format for improved workflow integration #4 (closed)
