Skip to content

Commit

Permalink
ruff server: Support Jupyter Notebook (*.ipynb) files (#11206)
Browse files Browse the repository at this point in the history
## Summary

Closes #10858.

`ruff server` now supports `*.ipynb` (aka Jupyter Notebook) files.
Extensive internal changes have been made to facilitate this, which I've
done some work to contextualize with documentation and an pre-review
that highlights notable sections of the code.

`*.ipynb` cells should behave similarly to `*.py` documents, with one
major exception. The format command `ruff.applyFormat` will only apply
to the currently selected notebook cell - if you want to format an
entire notebook document, use `Format Notebook` from the VS Code context
menu.

## Test Plan

The VS Code extension does not yet have Jupyter Notebook support
enabled, so you'll first need to enable it manually. To do this,
checkout the `pre-release` branch and modify `src/common/server.ts` as
follows:

Before:
![Screenshot 2024-05-13 at 10 59
06 PM](https://github.com/astral-sh/ruff/assets/19577865/c6a3c604-c405-4968-b8a2-5d670de89172)

After:
![Screenshot 2024-05-13 at 10 58
24 PM](https://github.com/astral-sh/ruff/assets/19577865/94ab2e3d-0609-448d-9c8c-cd07c69a513b)

I recommend testing this PR with large, complicated notebook files. I
used notebook files from [this popular
repository](https://github.com/jakevdp/PythonDataScienceHandbook/tree/master/notebooks)
in my preliminary testing.

The main thing to test is ensuring that notebook cells behave the same
as Python documents, besides the aforementioned issue with
`ruff.applyFormat`. You should also test adding and deleting cells (in
particular, deleting all the code cells and ensure that doesn't break
anything), changing the kind of a cell (i.e. from markup -> code or vice
versa), and creating a new notebook file from scratch. Finally, you
should also test that source actions work as expected (and across the
entire notebook).

Note: `ruff.applyAutofix` and `ruff.applyOrganizeImports` are currently
broken for notebook files, and I suspect it has something to do with
#11248. Once this is fixed, I
will update the test plan accordingly.

---------

Co-authored-by: nolan <nolan.king90@gmail.com>
  • Loading branch information
snowsignal and nolanking90 authored May 21, 2024
1 parent 84531d1 commit b0731ef
Show file tree
Hide file tree
Showing 39 changed files with 1,582 additions and 620 deletions.
4 changes: 2 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,7 @@ libc = { version = "0.2.153" }
libcst = { version = "1.1.0", default-features = false }
log = { version = "0.4.17" }
lsp-server = { version = "0.7.6" }
lsp-types = { version = "0.95.0", features = ["proposed"] }
lsp-types = { git="https://github.com/astral-sh/lsp-types.git", rev = "3512a9f", features = ["proposed"] }
matchit = { version = "0.8.1" }
memchr = { version = "2.7.1" }
mimalloc = { version = "0.1.39" }
Expand Down
2 changes: 1 addition & 1 deletion crates/ruff_notebook/src/cell.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ impl fmt::Display for SourceValue {

impl Cell {
/// Return the [`SourceValue`] of the cell.
pub(crate) fn source(&self) -> &SourceValue {
pub fn source(&self) -> &SourceValue {
match self {
Cell::Code(cell) => &cell.source,
Cell::Markdown(cell) => &cell.source,
Expand Down
8 changes: 7 additions & 1 deletion crates/ruff_notebook/src/notebook.rs
Original file line number Diff line number Diff line change
Expand Up @@ -98,7 +98,7 @@ impl Notebook {
reader.read_exact(&mut buf).is_ok_and(|()| buf[0] == b'\n')
});
reader.rewind()?;
let mut raw_notebook: RawNotebook = match serde_json::from_reader(reader.by_ref()) {
let raw_notebook: RawNotebook = match serde_json::from_reader(reader.by_ref()) {
Ok(notebook) => notebook,
Err(err) => {
// Translate the error into a diagnostic
Expand All @@ -113,7 +113,13 @@ impl Notebook {
});
}
};
Self::from_raw_notebook(raw_notebook, trailing_newline)
}

pub fn from_raw_notebook(
mut raw_notebook: RawNotebook,
trailing_newline: bool,
) -> Result<Self, NotebookError> {
// v4 is what everybody uses
if raw_notebook.nbformat != 4 {
// bail because we should have already failed at the json schema stage
Expand Down
1 change: 1 addition & 0 deletions crates/ruff_server/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ ruff_python_codegen = { workspace = true }
ruff_python_formatter = { workspace = true }
ruff_python_index = { workspace = true }
ruff_python_parser = { workspace = true }
ruff_notebook = { path = "../ruff_notebook" }
ruff_source_file = { workspace = true }
ruff_text_size = { workspace = true }
ruff_workspace = { workspace = true }
Expand Down
58 changes: 52 additions & 6 deletions crates/ruff_server/src/edit.rs
Original file line number Diff line number Diff line change
@@ -1,18 +1,20 @@
//! Types and utilities for working with text, modifying source files, and `Ruff <-> LSP` type conversion.

mod document;
mod notebook;
mod range;
mod replacement;

use std::collections::HashMap;
use std::{collections::HashMap, path::PathBuf};

pub use document::Document;
pub(crate) use document::DocumentVersion;
pub use document::TextDocument;
use lsp_types::PositionEncodingKind;
pub(crate) use range::{RangeExt, ToRangeExt};
pub(crate) use notebook::NotebookDocument;
pub(crate) use range::{NotebookRange, RangeExt, ToRangeExt};
pub(crate) use replacement::Replacement;

use crate::session::ResolvedClientCapabilities;
use crate::{fix::Fixes, session::ResolvedClientCapabilities};

/// A convenient enumeration for supported text encodings. Can be converted to [`lsp_types::PositionEncodingKind`].
// Please maintain the order from least to greatest priority for the derived `Ord` impl.
Expand All @@ -29,6 +31,37 @@ pub enum PositionEncoding {
UTF8,
}

/// A unique document ID, derived from a URL passed as part of an LSP request.
/// This document ID can point to either be a standalone Python file, a full notebook, or a cell within a notebook.
#[derive(Clone, Debug)]
pub(crate) enum DocumentKey {
Notebook(PathBuf),
NotebookCell(lsp_types::Url),
Text(PathBuf),
}

impl DocumentKey {
/// Converts the key back into its original URL.
pub(crate) fn into_url(self) -> lsp_types::Url {
match self {
DocumentKey::NotebookCell(url) => url,
DocumentKey::Notebook(path) | DocumentKey::Text(path) => {
lsp_types::Url::from_file_path(path)
.expect("file path originally from URL should convert back to URL")
}
}
}
}

impl std::fmt::Display for DocumentKey {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
match self {
Self::NotebookCell(url) => url.fmt(f),
Self::Notebook(path) | Self::Text(path) => path.display().fmt(f),
}
}
}

/// Tracks multi-document edits to eventually merge into a `WorkspaceEdit`.
/// Compatible with clients that don't support `workspace.workspaceEdit.documentChanges`.
#[derive(Debug)]
Expand Down Expand Up @@ -72,13 +105,25 @@ impl WorkspaceEditTracker {
}
}

/// Sets a series of [`Fixes`] for a text or notebook document.
pub(crate) fn set_fixes_for_document(
&mut self,
fixes: Fixes,
version: DocumentVersion,
) -> crate::Result<()> {
for (uri, edits) in fixes {
self.set_edits_for_document(uri, version, edits)?;
}
Ok(())
}

/// Sets the edits made to a specific document. This should only be called
/// once for each document `uri`, and will fail if this is called for the same `uri`
/// multiple times.
pub(crate) fn set_edits_for_document(
&mut self,
uri: lsp_types::Url,
version: DocumentVersion,
_version: DocumentVersion,
edits: Vec<lsp_types::TextEdit>,
) -> crate::Result<()> {
match self {
Expand All @@ -94,7 +139,8 @@ impl WorkspaceEditTracker {
document_edits.push(lsp_types::TextDocumentEdit {
text_document: lsp_types::OptionalVersionedTextDocumentIdentifier {
uri,
version: Some(version),
// TODO(jane): Re-enable versioned edits after investigating whether it could work with notebook cells
version: None,
},
edits: edits.into_iter().map(lsp_types::OneOf::Left).collect(),
});
Expand Down
6 changes: 3 additions & 3 deletions crates/ruff_server/src/edit/document.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,10 +7,10 @@ use super::RangeExt;

pub(crate) type DocumentVersion = i32;

/// The state for an individual document in the server. Stays up-to-date
/// The state of an individual document in the server. Stays up-to-date
/// with changes made by the user, including unsaved changes.
#[derive(Debug, Clone)]
pub struct Document {
pub struct TextDocument {
/// The string contents of the document.
contents: String,
/// A computed line index for the document. This should always reflect
Expand All @@ -22,7 +22,7 @@ pub struct Document {
version: DocumentVersion,
}

impl Document {
impl TextDocument {
pub fn new(contents: String, version: DocumentVersion) -> Self {
let index = LineIndex::from_source_text(&contents);
Self {
Expand Down
202 changes: 202 additions & 0 deletions crates/ruff_server/src/edit/notebook.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,202 @@
use std::{collections::HashMap, hash::BuildHasherDefault};

use anyhow::Ok;
use lsp_types::{NotebookCellKind, Url};
use rustc_hash::FxHashMap;

use crate::{PositionEncoding, TextDocument};

use super::DocumentVersion;

pub(super) type CellId = usize;

/// The state of a notebook document in the server. Contains an array of cells whose
/// contents are internally represented by [`TextDocument`]s.
#[derive(Clone, Debug)]
pub(crate) struct NotebookDocument {
cells: Vec<NotebookCell>,
metadata: ruff_notebook::RawNotebookMetadata,
version: DocumentVersion,
// Used to quickly find the index of a cell for a given URL.
cell_index: FxHashMap<lsp_types::Url, CellId>,
}

/// A single cell within a notebook, which has text contents represented as a `TextDocument`.
#[derive(Clone, Debug)]
struct NotebookCell {
url: Url,
kind: NotebookCellKind,
document: TextDocument,
}

impl NotebookDocument {
pub(crate) fn new(
version: DocumentVersion,
cells: Vec<lsp_types::NotebookCell>,
metadata: serde_json::Map<String, serde_json::Value>,
cell_documents: Vec<lsp_types::TextDocumentItem>,
) -> crate::Result<Self> {
let mut cell_contents: FxHashMap<_, _> = cell_documents
.into_iter()
.map(|document| (document.uri, document.text))
.collect();

let cells: Vec<_> = cells
.into_iter()
.map(|cell| {
let contents = cell_contents.remove(&cell.document).unwrap_or_default();
NotebookCell::new(cell, contents, version)
})
.collect();

Ok(Self {
version,
cell_index: Self::make_cell_index(cells.as_slice()),
metadata: serde_json::from_value(serde_json::Value::Object(metadata))?,
cells,
})
}

/// Generates a pseudo-representation of a notebook that lacks per-cell metadata and contextual information
/// but should still work with Ruff's linter.
pub(crate) fn make_ruff_notebook(&self) -> ruff_notebook::Notebook {
let cells = self
.cells
.iter()
.map(|cell| match cell.kind {
NotebookCellKind::Code => ruff_notebook::Cell::Code(ruff_notebook::CodeCell {
execution_count: None,
id: None,
metadata: serde_json::Value::Null,
outputs: vec![],
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
}),
NotebookCellKind::Markup => {
ruff_notebook::Cell::Markdown(ruff_notebook::MarkdownCell {
attachments: None,
id: None,
metadata: serde_json::Value::Null,
source: ruff_notebook::SourceValue::String(
cell.document.contents().to_string(),
),
})
}
})
.collect();
let raw_notebook = ruff_notebook::RawNotebook {
cells,
metadata: self.metadata.clone(),
nbformat: 4,
nbformat_minor: 5,
};

ruff_notebook::Notebook::from_raw_notebook(raw_notebook, false)
.unwrap_or_else(|err| panic!("Server notebook document could not be converted to Ruff's notebook document format: {err}"))
}

pub(crate) fn update(
&mut self,
cells: Option<lsp_types::NotebookDocumentCellChange>,
metadata_change: Option<serde_json::Map<String, serde_json::Value>>,
version: DocumentVersion,
encoding: PositionEncoding,
) -> crate::Result<()> {
self.version = version;

if let Some(lsp_types::NotebookDocumentCellChange {
structure,
data,
text_content,
}) = cells
{
if let Some(structure) = structure {
let start = structure.array.start as usize;
let delete = structure.array.delete_count as usize;
if delete > 0 {
for cell in self.cells.drain(start..start + delete) {
self.cell_index.remove(&cell.url);
}
}
for cell in structure.array.cells.into_iter().flatten().rev() {
self.cells
.insert(start, NotebookCell::new(cell, String::new(), version));
}

// register any new cells in the index and update existing ones that came after the insertion
for (i, cell) in self.cells.iter().enumerate().skip(start) {
self.cell_index.insert(cell.url.clone(), i);
}
}
if let Some(cell_data) = data {
for cell in cell_data {
if let Some(existing_cell) = self.cell_by_uri_mut(&cell.document) {
existing_cell.kind = cell.kind;
}
}
}
if let Some(content_changes) = text_content {
for content_change in content_changes {
if let Some(cell) = self.cell_by_uri_mut(&content_change.document.uri) {
cell.document
.apply_changes(content_change.changes, version, encoding);
}
}
}
}
if let Some(metadata_change) = metadata_change {
self.metadata = serde_json::from_value(serde_json::Value::Object(metadata_change))?;
}
Ok(())
}

/// Get the current version of the notebook document.
pub(crate) fn version(&self) -> DocumentVersion {
self.version
}

/// Get the URI for a cell by its index within the cell array.
pub(crate) fn cell_uri_by_index(&self, index: CellId) -> Option<&lsp_types::Url> {
self.cells.get(index).map(|cell| &cell.url)
}

/// Get the text document representing the contents of a cell by the cell URI.
pub(crate) fn cell_document_by_uri(&self, uri: &lsp_types::Url) -> Option<&TextDocument> {
self.cells
.get(*self.cell_index.get(uri)?)
.map(|cell| &cell.document)
}

/// Returns a list of cell URIs in the order they appear in the array.
pub(crate) fn urls(&self) -> impl Iterator<Item = &lsp_types::Url> {
self.cells.iter().map(|cell| &cell.url)
}

fn cell_by_uri_mut(&mut self, uri: &lsp_types::Url) -> Option<&mut NotebookCell> {
self.cells.get_mut(*self.cell_index.get(uri)?)
}

fn make_cell_index(cells: &[NotebookCell]) -> FxHashMap<lsp_types::Url, CellId> {
let mut index =
HashMap::with_capacity_and_hasher(cells.len(), BuildHasherDefault::default());
for (i, cell) in cells.iter().enumerate() {
index.insert(cell.url.clone(), i);
}
index
}
}

impl NotebookCell {
pub(crate) fn new(
cell: lsp_types::NotebookCell,
contents: String,
version: DocumentVersion,
) -> Self {
Self {
url: cell.document,
kind: cell.kind,
document: TextDocument::new(contents, version),
}
}
}
Loading

0 comments on commit b0731ef

Please sign in to comment.