Skip to content

Commit

Permalink
Rollup merge of #93065 - dtolnay:ringbuffer, r=lcnr
Browse files Browse the repository at this point in the history
Pretty printer algorithm revamp step 2

This PR follows #92923 as a second chunk of modernizations backported from https://github.com/dtolnay/prettyplease into rustc_ast_pretty.

I've broken this up into atomic commits that hopefully are sensible in isolation. At every commit, the pretty printer is compilable and has runtime behavior that is identical to before and after the PR. None of the refactoring so far changes behavior.

The general theme of this chunk of commits is: the logic in the old pretty printer is doing some very basic things (pushing and popping tokens on a ring buffer) but expressed in a too-low-level way that I found makes it quite complicated/subtle to reason about. There are a number of obvious invariants that are "almost true" -- things like `self.left == self.buf.offset` and `self.right == self.buf.offset + self.buf.data.len()` and `self.right_total == self.left_total + self.buf.data.sum()`. The reason these things are "almost true" is the implementation tends to put updating one side of the invariant unreasonably far apart from updating the other side, leaving the invariant broken while unrelated stuff happens in between. The following code from master is an example of this:

https://github.com/rust-lang/rust/blob/e5e2b0be26ea177527b60d355bd8f56cd473bd00/compiler/rustc_ast_pretty/src/pp.rs#L314-L317

In this code the `advance_right` is reserving an entry into which to write a next token on the right side of the ring buffer, the `check_stack` is doing something totally unrelated to the right boundary of the ring buffer, and the `scan_push` is actually writing the token we previously reserved space for. Much of what this PR is doing is rearranging code to shrink the amount of stuff in between when an invariant is broken to when it is restored, until the whole thing can be factored out into one indivisible method call on the RingBuffer type.

The end state of the PR is that we can entirely eliminate `self.left` (because it's now just equal to `self.buf.offset` always) and `self.right` (because it's equal to `self.buf.offset + self.buf.data.len()` always) and the whole `Token::Eof` state which used to be the value of tokens that have been reserved space for but not yet written.

I found without these changes the pretty printer implementation to be hard to reason about and I wasn't able to confidently introduce improvements like trailing commas in `prettyplease` until after this refactor. The logic here is 43 years old at this point (Graydon translated it as directly as possible from the 1979 pretty printing paper) and while there are advantages to following the paper as closely as possible, in `prettyplease` I decided if we're going to adapt the algorithm to work better for Rust syntax, it was worthwhile making it easier to follow than the original.
  • Loading branch information
matthiaskrgr authored Jan 19, 2022
2 parents 623791d + 4d3faae commit fe93f08
Show file tree
Hide file tree
Showing 3 changed files with 103 additions and 112 deletions.
158 changes: 61 additions & 97 deletions compiler/rustc_ast_pretty/src/pp.rs
Original file line number Diff line number Diff line change
Expand Up @@ -167,14 +167,9 @@ pub enum Token {
Break(BreakToken),
Begin(BeginToken),
End,
Eof,
}

impl Token {
crate fn is_eof(&self) -> bool {
matches!(self, Token::Eof)
}

pub fn is_hardbreak_tok(&self) -> bool {
matches!(self, Token::Break(BreakToken { offset: 0, blank_space: SIZE_INFINITY }))
}
Expand All @@ -187,7 +182,6 @@ impl fmt::Display for Token {
Token::Break(_) => f.write_str("BREAK"),
Token::Begin(_) => f.write_str("BEGIN"),
Token::End => f.write_str("END"),
Token::Eof => f.write_str("EOF"),
}
}
}
Expand All @@ -212,10 +206,6 @@ pub struct Printer {
margin: isize,
/// Number of spaces left on line
space: isize,
/// Index of left side of input stream
left: usize,
/// Index of right side of input stream
right: usize,
/// Ring-buffer of tokens and calculated sizes
buf: RingBuffer<BufEntry>,
/// Running size of stream "...left"
Expand All @@ -233,6 +223,9 @@ pub struct Printer {
print_stack: Vec<PrintStackElem>,
/// Buffered indentation to avoid writing trailing whitespace
pending_indentation: isize,
/// The token most recently popped from the left boundary of the
/// ring-buffer for printing
last_printed: Option<Token>,
}

#[derive(Clone)]
Expand All @@ -241,39 +234,34 @@ struct BufEntry {
size: isize,
}

impl Default for BufEntry {
fn default() -> Self {
BufEntry { token: Token::Eof, size: 0 }
}
}

impl Printer {
pub fn new() -> Self {
let linewidth = 78;
let mut buf = RingBuffer::new();
buf.advance_right();
Printer {
out: String::new(),
margin: linewidth as isize,
space: linewidth as isize,
left: 0,
right: 0,
buf,
buf: RingBuffer::new(),
left_total: 0,
right_total: 0,
scan_stack: VecDeque::new(),
print_stack: Vec::new(),
pending_indentation: 0,
last_printed: None,
}
}

pub fn last_token(&self) -> Token {
self.buf[self.right].token.clone()
pub fn last_token(&self) -> Option<&Token> {
self.last_token_still_buffered().or_else(|| self.last_printed.as_ref())
}

pub fn last_token_still_buffered(&self) -> Option<&Token> {
self.buf.last().map(|last| &last.token)
}

/// Be very careful with this!
pub fn replace_last_token(&mut self, t: Token) {
self.buf[self.right].token = t;
pub fn replace_last_token_still_buffered(&mut self, t: Token) {
self.buf.last_mut().unwrap().token = t;
}

fn scan_eof(&mut self) {
Expand All @@ -287,89 +275,63 @@ impl Printer {
if self.scan_stack.is_empty() {
self.left_total = 1;
self.right_total = 1;
self.right = self.left;
self.buf.truncate(1);
} else {
self.advance_right();
self.buf.clear();
}
self.scan_push(BufEntry { token: Token::Begin(b), size: -self.right_total });
let right = self.buf.push(BufEntry { token: Token::Begin(b), size: -self.right_total });
self.scan_stack.push_front(right);
}

fn scan_end(&mut self) {
if self.scan_stack.is_empty() {
self.print_end();
} else {
self.advance_right();
self.scan_push(BufEntry { token: Token::End, size: -1 });
let right = self.buf.push(BufEntry { token: Token::End, size: -1 });
self.scan_stack.push_front(right);
}
}

fn scan_break(&mut self, b: BreakToken) {
if self.scan_stack.is_empty() {
self.left_total = 1;
self.right_total = 1;
self.right = self.left;
self.buf.truncate(1);
self.buf.clear();
} else {
self.advance_right();
self.check_stack(0);
}
self.check_stack(0);
self.scan_push(BufEntry { token: Token::Break(b), size: -self.right_total });
let right = self.buf.push(BufEntry { token: Token::Break(b), size: -self.right_total });
self.scan_stack.push_front(right);
self.right_total += b.blank_space;
}

fn scan_string(&mut self, s: Cow<'static, str>) {
if self.scan_stack.is_empty() {
self.print_string(s);
self.print_string(&s);
} else {
self.advance_right();
let len = s.len() as isize;
self.buf[self.right] = BufEntry { token: Token::String(s), size: len };
self.buf.push(BufEntry { token: Token::String(s), size: len });
self.right_total += len;
self.check_stream();
}
}

fn check_stream(&mut self) {
if self.right_total - self.left_total > self.space {
if Some(&self.left) == self.scan_stack.back() {
let scanned = self.scan_pop_bottom();
self.buf[scanned].size = SIZE_INFINITY;
while self.right_total - self.left_total > self.space {
if *self.scan_stack.back().unwrap() == self.buf.index_of_first() {
self.scan_stack.pop_back().unwrap();
self.buf.first_mut().unwrap().size = SIZE_INFINITY;
}
self.advance_left();
if self.left != self.right {
self.check_stream();
if self.buf.is_empty() {
break;
}
}
}

fn scan_push(&mut self, entry: BufEntry) {
self.buf[self.right] = entry;
self.scan_stack.push_front(self.right);
}

fn scan_pop(&mut self) -> usize {
self.scan_stack.pop_front().unwrap()
}

fn scan_top(&self) -> usize {
*self.scan_stack.front().unwrap()
}

fn scan_pop_bottom(&mut self) -> usize {
self.scan_stack.pop_back().unwrap()
}

fn advance_right(&mut self) {
self.right += 1;
self.buf.advance_right();
}

fn advance_left(&mut self) {
let mut left_size = self.buf[self.left].size;
let mut left_size = self.buf.first().unwrap().size;

while left_size >= 0 {
let left = self.buf[self.left].token.clone();
let left = self.buf.first().unwrap().token.clone();

let len = match left {
Token::Break(b) => b.blank_space,
Expand All @@ -385,39 +347,38 @@ impl Printer {

self.left_total += len;

if self.left == self.right {
self.buf.advance_left();
if self.buf.is_empty() {
break;
}

self.buf.advance_left();
self.left += 1;

left_size = self.buf[self.left].size;
left_size = self.buf.first().unwrap().size;
}
}

fn check_stack(&mut self, k: usize) {
if !self.scan_stack.is_empty() {
let x = self.scan_top();
match self.buf[x].token {
fn check_stack(&mut self, mut k: usize) {
while let Some(&x) = self.scan_stack.front() {
let mut entry = &mut self.buf[x];
match entry.token {
Token::Begin(_) => {
if k > 0 {
self.scan_pop();
self.buf[x].size += self.right_total;
self.check_stack(k - 1);
if k == 0 {
break;
}
self.scan_stack.pop_front().unwrap();
entry.size += self.right_total;
k -= 1;
}
Token::End => {
// paper says + not =, but that makes no sense.
self.scan_pop();
self.buf[x].size = 1;
self.check_stack(k + 1);
self.scan_stack.pop_front().unwrap();
entry.size = 1;
k += 1;
}
_ => {
self.scan_pop();
self.buf[x].size += self.right_total;
if k > 0 {
self.check_stack(k);
self.scan_stack.pop_front().unwrap();
entry.size += self.right_total;
if k == 0 {
break;
}
}
}
Expand Down Expand Up @@ -477,7 +438,7 @@ impl Printer {
}
}

fn print_string(&mut self, s: Cow<'static, str>) {
fn print_string(&mut self, s: &str) {
let len = s.len() as isize;
// assert!(len <= space);
self.space -= len;
Expand All @@ -491,21 +452,21 @@ impl Printer {
self.out.reserve(self.pending_indentation as usize);
self.out.extend(std::iter::repeat(' ').take(self.pending_indentation as usize));
self.pending_indentation = 0;
self.out.push_str(&s);
self.out.push_str(s);
}

fn print(&mut self, token: Token, l: isize) {
match token {
Token::Begin(b) => self.print_begin(b, l),
match &token {
Token::Begin(b) => self.print_begin(*b, l),
Token::End => self.print_end(),
Token::Break(b) => self.print_break(b, l),
Token::Break(b) => self.print_break(*b, l),
Token::String(s) => {
let len = s.len() as isize;
assert_eq!(len, l);
self.print_string(s);
}
Token::Eof => panic!(), // Eof should never get here.
}
self.last_printed = Some(token);
}

// Convenience functions to talk to the printer.
Expand Down Expand Up @@ -560,7 +521,10 @@ impl Printer {
}

pub fn is_beginning_of_line(&self) -> bool {
self.last_token().is_eof() || self.last_token().is_hardbreak_tok()
match self.last_token() {
Some(last_token) => last_token.is_hardbreak_tok(),
None => true,
}
}

pub fn hardbreak_tok_offset(off: isize) -> Token {
Expand Down
37 changes: 30 additions & 7 deletions compiler/rustc_ast_pretty/src/pp/ring.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,20 +22,43 @@ impl<T> RingBuffer<T> {
RingBuffer { data: VecDeque::new(), offset: 0 }
}

pub fn advance_right(&mut self)
where
T: Default,
{
self.data.push_back(T::default());
pub fn is_empty(&self) -> bool {
self.data.is_empty()
}

pub fn push(&mut self, value: T) -> usize {
let index = self.offset + self.data.len();
self.data.push_back(value);
index
}

pub fn advance_left(&mut self) {
self.data.pop_front().unwrap();
self.offset += 1;
}

pub fn truncate(&mut self, len: usize) {
self.data.truncate(len);
pub fn clear(&mut self) {
self.data.clear();
}

pub fn index_of_first(&self) -> usize {
self.offset
}

pub fn first(&self) -> Option<&T> {
self.data.front()
}

pub fn first_mut(&mut self) -> Option<&mut T> {
self.data.front_mut()
}

pub fn last(&self) -> Option<&T> {
self.data.back()
}

pub fn last_mut(&mut self) -> Option<&mut T> {
self.data.back_mut()
}
}

Expand Down
20 changes: 12 additions & 8 deletions compiler/rustc_ast_pretty/src/pprust/state.rs
Original file line number Diff line number Diff line change
Expand Up @@ -328,9 +328,9 @@ pub trait PrintState<'a>: std::ops::Deref<Target = pp::Printer> + std::ops::Dere
CommentStyle::BlankLine => {
// We need to do at least one, possibly two hardbreaks.
let twice = match self.last_token() {
pp::Token::String(s) => ";" == s,
pp::Token::Begin(_) => true,
pp::Token::End => true,
Some(pp::Token::String(s)) => ";" == s,
Some(pp::Token::Begin(_)) => true,
Some(pp::Token::End) => true,
_ => false,
};
if twice {
Expand Down Expand Up @@ -686,11 +686,15 @@ pub trait PrintState<'a>: std::ops::Deref<Target = pp::Printer> + std::ops::Dere
fn break_offset_if_not_bol(&mut self, n: usize, off: isize) {
if !self.is_beginning_of_line() {
self.break_offset(n, off)
} else if off != 0 && self.last_token().is_hardbreak_tok() {
// We do something pretty sketchy here: tuck the nonzero
// offset-adjustment we were going to deposit along with the
// break into the previous hardbreak.
self.replace_last_token(pp::Printer::hardbreak_tok_offset(off));
} else if off != 0 {
if let Some(last_token) = self.last_token_still_buffered() {
if last_token.is_hardbreak_tok() {
// We do something pretty sketchy here: tuck the nonzero
// offset-adjustment we were going to deposit along with the
// break into the previous hardbreak.
self.replace_last_token_still_buffered(pp::Printer::hardbreak_tok_offset(off));
}
}
}
}

Expand Down

0 comments on commit fe93f08

Please sign in to comment.