Reorganize the "Source Code Representation" chapters (#2142)
BoxyUwU authored Nov 21, 2024
1 parent ffd9a44 commit 5f5e0b5
Showing 6 changed files with 80 additions and 354 deletions.
28 changes: 15 additions & 13 deletions src/SUMMARY.md
@@ -96,10 +96,6 @@
# Source Code Representation

- [Prologue](./part-3-intro.md)
- [Command-line arguments](./cli.md)
- [rustc_driver and rustc_interface](./rustc-driver/intro.md)
- [Example: Type checking](./rustc-driver/interacting-with-the-ast.md)
- [Example: Getting diagnostics](./rustc-driver/getting-diagnostics.md)
- [Syntax and the AST](./syntax-intro.md)
- [Lexing and Parsing](./the-parser.md)
- [Macro expansion](./macro-expansion.md)
@@ -118,10 +114,22 @@
- [MIR construction](./mir/construction.md)
- [MIR visitor and traversal](./mir/visitor.md)
- [MIR queries and passes: getting the MIR](./mir/passes.md)
- [Identifiers in the Compiler](./identifiers.md)
- [Closure expansion](./closure.md)
- [Inline assembly](./asm.md)

# Supporting Infrastructure

- [Command-line arguments](./cli.md)
- [rustc_driver and rustc_interface](./rustc-driver/intro.md)
- [Example: Type checking](./rustc-driver/interacting-with-the-ast.md)
- [Example: Getting diagnostics](./rustc-driver/getting-diagnostics.md)
- [Errors and Lints](diagnostics.md)
- [Diagnostic and subdiagnostic structs](./diagnostics/diagnostic-structs.md)
- [Translation](./diagnostics/translation.md)
- [`LintStore`](./diagnostics/lintstore.md)
- [Error codes](./diagnostics/error-codes.md)
- [Diagnostic items](./diagnostics/diagnostic-items.md)
- [`ErrorGuaranteed`](./diagnostics/error-guaranteed.md)

# Analysis

- [Prologue](./part-4-intro.md)
@@ -190,13 +198,7 @@
- [Closure constraints](./borrow_check/region_inference/closure_constraints.md)
- [Error reporting](./borrow_check/region_inference/error_reporting.md)
- [Two-phase-borrows](./borrow_check/two_phase_borrows.md)
- [Errors and Lints](diagnostics.md)
- [Diagnostic and subdiagnostic structs](./diagnostics/diagnostic-structs.md)
- [Translation](./diagnostics/translation.md)
- [`LintStore`](./diagnostics/lintstore.md)
- [Error codes](./diagnostics/error-codes.md)
- [Diagnostic items](./diagnostics/diagnostic-items.md)
- [`ErrorGuaranteed`](./diagnostics/error-guaranteed.md)
- [Closure capture inference](./closure.md)
- [Async closures/"coroutine-closures"](coroutine-closures.md)

# MIR to Binaries
236 changes: 15 additions & 221 deletions src/asm.md
@@ -54,111 +54,26 @@ string parsing. The remaining options are mostly passed through to LLVM with lit

## AST

`InlineAsm` is represented as an expression in the AST:

```rust
pub struct InlineAsm {
    pub template: Vec<InlineAsmTemplatePiece>,
    pub template_strs: Box<[(Symbol, Option<Symbol>, Span)]>,
    pub operands: Vec<(InlineAsmOperand, Span)>,
    pub clobber_abi: Option<(Symbol, Span)>,
    pub options: InlineAsmOptions,
    pub line_spans: Vec<Span>,
}

pub enum InlineAsmRegOrRegClass {
    Reg(Symbol),
    RegClass(Symbol),
}

pub enum InlineAsmOperand {
    In {
        reg: InlineAsmRegOrRegClass,
        expr: P<Expr>,
    },
    Out {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: Option<P<Expr>>,
    },
    InOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: P<Expr>,
    },
    SplitInOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        in_expr: P<Expr>,
        out_expr: Option<P<Expr>>,
    },
    Const {
        anon_const: AnonConst,
    },
    Sym {
        expr: P<Expr>,
    },
}
```
`InlineAsm` is represented as an expression in the AST with the [`ast::InlineAsm` type][inline_asm_ast].

The `asm!` macro is implemented in `rustc_builtin_macros` and outputs an `InlineAsm` AST node. The
template string is parsed using `fmt_macros`, and positional and named operands are resolved to
explicit operand indices. Since target information is not available to macro invocations,
validation of the registers and register classes is deferred to AST lowering.
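
The resolution of positional and named operands to explicit indices can be illustrated with a
small standalone model. This is only a sketch under stated assumptions: `Placeholder` and
`resolve_placeholders` are hypothetical names invented for illustration, and the code does not
reproduce what `rustc_builtin_macros` actually does.

```rust
use std::collections::HashMap;

/// Hypothetical model of a template placeholder: `{}`, `{0}` or `{name}`.
pub enum Placeholder<'a> {
    /// `{}`: refers to the next implicit positional operand.
    Next,
    /// `{N}`: refers to an explicit positional operand.
    Index(usize),
    /// `{name}`: refers to a named operand.
    Named(&'a str),
}

/// Resolves every placeholder to an explicit operand index.
/// `named_operands` maps an operand name to its index (an assumption of this sketch).
pub fn resolve_placeholders(
    placeholders: &[Placeholder<'_>],
    operand_count: usize,
    named_operands: &HashMap<&str, usize>,
) -> Result<Vec<usize>, String> {
    let mut next_implicit = 0;
    let mut resolved = Vec::with_capacity(placeholders.len());
    for placeholder in placeholders {
        let index = match placeholder {
            Placeholder::Next => {
                // Implicit placeholders consume positional operands in order.
                let index = next_implicit;
                next_implicit += 1;
                index
            }
            Placeholder::Index(index) => *index,
            Placeholder::Named(name) => *named_operands
                .get(*name)
                .ok_or_else(|| format!("no operand named `{name}`"))?,
        };
        if index >= operand_count {
            return Err(format!("operand index {index} is out of range"));
        }
        resolved.push(index);
    }
    Ok(resolved)
}
```

After this step every placeholder carries a plain index, so later stages in this model never
need to look at operand names again.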

[inline_asm_ast]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_ast/ast/struct.InlineAsm.html

## HIR

`InlineAsm` is represented as an expression in the HIR:

```rust
pub struct InlineAsm<'hir> {
    pub template: &'hir [InlineAsmTemplatePiece],
    pub template_strs: &'hir [(Symbol, Option<Symbol>, Span)],
    pub operands: &'hir [(InlineAsmOperand<'hir>, Span)],
    pub options: InlineAsmOptions,
    pub line_spans: &'hir [Span],
}

pub enum InlineAsmRegOrRegClass {
    Reg(InlineAsmReg),
    RegClass(InlineAsmRegClass),
}

pub enum InlineAsmOperand<'hir> {
    In {
        reg: InlineAsmRegOrRegClass,
        expr: Expr<'hir>,
    },
    Out {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: Option<Expr<'hir>>,
    },
    InOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: Expr<'hir>,
    },
    SplitInOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        in_expr: Expr<'hir>,
        out_expr: Option<Expr<'hir>>,
    },
    Const {
        anon_const: AnonConst,
    },
    Sym {
        expr: Expr<'hir>,
    },
}
```
`InlineAsm` is represented as an expression in the HIR with the [`hir::InlineAsm` type][inline_asm_hir].

AST lowering is where `InlineAsmRegOrRegClass` is converted from `Symbol`s to an actual register or
register class. If any modifiers are specified for a template string placeholder, these are
validated against the set allowed for that operand type. Finally, explicit registers for inputs and
outputs are checked for conflicts (same register used for different operands).
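
The conflict check on explicit registers can be sketched as follows. This is a deliberately
simplified model, not the lowering code itself: `ExplicitOperand` and `find_register_conflicts`
are hypothetical names, registers are plain strings, and the sketch ignores details such as
overlapping registers or operand direction.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for an operand that names an explicit register.
pub struct ExplicitOperand<'a> {
    pub reg: &'a str,
}

/// Returns `(first, second)` index pairs for operands that reuse a register
/// already claimed by an earlier operand.
pub fn find_register_conflicts(operands: &[ExplicitOperand<'_>]) -> Vec<(usize, usize)> {
    let mut first_use: HashMap<&str, usize> = HashMap::new();
    let mut conflicts = Vec::new();
    for (index, operand) in operands.iter().enumerate() {
        match first_use.entry(operand.reg) {
            // The register was already claimed: report both operand indices.
            Entry::Occupied(entry) => conflicts.push((*entry.get(), index)),
            // First time this register is seen: remember who claimed it.
            Entry::Vacant(entry) => {
                entry.insert(index);
            }
        }
    }
    conflicts
}
```

Recording the index of the first use lets an error message in this model point at both
operands involved in the conflict.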

[inline_asm_hir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.InlineAsm.html

## Type checking

Each register class has a whitelist of types that it may be used with. After the types of all
@@ -169,152 +84,29 @@ be used for an operand based on the type that was passed into it.

## THIR

`InlineAsm` is represented as an expression in the THIR:

```rust
crate enum ExprKind<'tcx> {
    // [..]
    InlineAsm {
        template: &'tcx [InlineAsmTemplatePiece],
        operands: Box<[InlineAsmOperand<'tcx>]>,
        options: InlineAsmOptions,
        line_spans: &'tcx [Span],
    },
}
crate enum InlineAsmOperand<'tcx> {
    In {
        reg: InlineAsmRegOrRegClass,
        expr: ExprId,
    },
    Out {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: Option<ExprId>,
    },
    InOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        expr: ExprId,
    },
    SplitInOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        in_expr: ExprId,
        out_expr: Option<ExprId>,
    },
    Const {
        value: &'tcx Const<'tcx>,
        span: Span,
    },
    SymFn {
        expr: ExprId,
    },
    SymStatic {
        def_id: DefId,
    },
}
```
`InlineAsm` is represented as an expression in the THIR with the [`InlineAsmExpr` type][inline_asm_thir].

The only significant change compared to HIR is that `Sym` has been lowered to either a `SymFn`
whose `expr` is a `Literal` ZST of the `fn`, or a `SymStatic` which points to the `DefId` of a
`static`.

[inline_asm_thir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/thir/struct.InlineAsmExpr.html

## MIR

`InlineAsm` is represented as a `Terminator` in the MIR:

```rust
pub enum TerminatorKind<'tcx> {
    // [..]

    /// Block ends with an inline assembly block. This is a terminator since
    /// inline assembly is allowed to diverge.
    InlineAsm {
        /// The template for the inline assembly, with placeholders.
        template: &'tcx [InlineAsmTemplatePiece],

        /// The operands for the inline assembly, as `Operand`s or `Place`s.
        operands: Vec<InlineAsmOperand<'tcx>>,

        /// Miscellaneous options for the inline assembly.
        options: InlineAsmOptions,

        /// Source spans for each line of the inline assembly code. These are
        /// used to map assembler errors back to the line in the source code.
        line_spans: &'tcx [Span],

        /// Destination block after the inline assembly returns, unless it is
        /// diverging (InlineAsmOptions::NORETURN).
        destination: Option<BasicBlock>,
    },
}

pub enum InlineAsmOperand<'tcx> {
    In {
        reg: InlineAsmRegOrRegClass,
        value: Operand<'tcx>,
    },
    Out {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        place: Option<Place<'tcx>>,
    },
    InOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        in_value: Operand<'tcx>,
        out_place: Option<Place<'tcx>>,
    },
    Const {
        value: Box<Constant<'tcx>>,
    },
    SymFn {
        value: Box<Constant<'tcx>>,
    },
    SymStatic {
        def_id: DefId,
    },
}
```
`InlineAsm` is represented as a `Terminator` in the MIR with the [`TerminatorKind::InlineAsm` variant][inline_asm_mir].

As part of THIR lowering, `InOut` and `SplitInOut` operands are lowered to a split form with a
separate `in_value` and `out_place`.

Semantically, the `InlineAsm` terminator is similar to the `Call` terminator except that it has
multiple output places where a `Call` only has a single return place output.
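
The split of `InOut` and `SplitInOut` into a separate `in_value` and `out_place` can be modelled
in a few lines. This is a toy sketch: `ThirOperand`, `MirInOut` and `lower_in_out` are
hypothetical names, expressions and places are reduced to integer IDs, and the `reg` field is
left out for brevity.

```rust
/// Hypothetical stand-in: expressions, values and places are plain integer IDs here.
pub type ExprId = u32;

/// Toy version of the two THIR operand kinds that carry both an input and an output.
#[derive(Debug, PartialEq)]
pub enum ThirOperand {
    InOut { late: bool, expr: ExprId },
    SplitInOut { late: bool, in_expr: ExprId, out_expr: Option<ExprId> },
}

/// Toy version of the single split form used after lowering.
#[derive(Debug, PartialEq)]
pub struct MirInOut {
    pub late: bool,
    pub in_value: ExprId,
    pub out_place: Option<ExprId>,
}

/// Lowers both operand kinds to the split form.
pub fn lower_in_out(op: ThirOperand) -> MirInOut {
    match op {
        // A plain `InOut` reads from and writes to the same expression.
        ThirOperand::InOut { late, expr } => MirInOut {
            late,
            in_value: expr,
            out_place: Some(expr),
        },
        // A `SplitInOut` already has distinct sides; the output may be discarded.
        ThirOperand::SplitInOut { late, in_expr, out_expr } => MirInOut {
            late,
            in_value: in_expr,
            out_place: out_expr,
        },
    }
}
```

Having a single split form means later stages in this model only handle one shape for operands
that are both read and written.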

[inline_asm_mir]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/mir/enum.TerminatorKind.html#variant.InlineAsm

## Codegen

Operands are lowered one more time before being passed to LLVM codegen:

```rust
pub enum InlineAsmOperandRef<'tcx, B: BackendTypes + ?Sized> {
    In {
        reg: InlineAsmRegOrRegClass,
        value: OperandRef<'tcx, B::Value>,
    },
    Out {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        place: Option<PlaceRef<'tcx, B::Value>>,
    },
    InOut {
        reg: InlineAsmRegOrRegClass,
        late: bool,
        in_value: OperandRef<'tcx, B::Value>,
        out_place: Option<PlaceRef<'tcx, B::Value>>,
    },
    Const {
        string: String,
    },
    SymFn {
        instance: Instance<'tcx>,
    },
    SymStatic {
        def_id: DefId,
    },
}
```
Operands are lowered one more time before being passed to LLVM codegen. This is represented by the [`InlineAsmOperandRef` type][inline_asm_codegen] from `rustc_codegen_ssa`.

The operands are lowered to LLVM operands and constraint codes as follows:
- `out` and the output part of `inout` operands are added first, as required by LLVM. Late output
@@ -339,6 +131,8 @@ Note that LLVM is sometimes rather picky about what types it accepts for certain
so we sometimes need to insert conversions to/from a supported type. See the target-specific
ISelLowering.cpp files in LLVM for details of what types are supported for each register class.

[inline_asm_codegen]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/traits/enum.InlineAsmOperandRef.html

## Adding support for new architectures

Adding inline assembly support to an architecture is mostly a matter of defining the registers and
2 changes: 1 addition & 1 deletion src/closure.md
@@ -1,4 +1,4 @@
# Closure Expansion in rustc
# Closure Capture Inference

This section describes how rustc handles closures. Closures in Rust are
effectively "desugared" into structs that contain the values they use (or
36 changes: 29 additions & 7 deletions src/hir.md
@@ -78,18 +78,40 @@ the compiler a chance to observe that you accessed the data for

## Identifiers in the HIR

There are a bunch of different identifiers to refer to other nodes or definitions
in the HIR. In short:
- A [`DefId`] refers to a *definition* in any crate.
- A [`LocalDefId`] refers to a *definition* in the currently compiled crate.
- A [`HirId`] refers to *any node* in the HIR.
The HIR uses a bunch of different identifiers that coexist and serve different purposes.

For more detailed information, check out the [chapter on identifiers][ids].
- A [`DefId`], as the name suggests, identifies a particular definition, or top-level
item, in a given crate. It is composed of two parts: a [`CrateNum`] which identifies
the crate the definition comes from, and a [`DefIndex`] which identifies the definition
within the crate. Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which
makes them more stable across compilations.

- A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate.
This allows us to drop the [`CrateNum`] part, and use the type system to ensure that
only local definitions are passed to functions that expect a local definition.

- A [`HirId`] uniquely identifies a node in the HIR of the current crate. It is composed
of two parts: an `owner` and a `local_id` that is unique within the `owner`. This
combination makes for more stable values which are helpful for incremental compilation.
Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions,
but stays local to the current crate.

- A [`BodyId`] identifies a HIR [`Body`] in the current crate. It is currently only
a wrapper around a [`HirId`]. For more info about HIR bodies, please refer to the
[HIR chapter][hir-bodies].

These identifiers can be converted into one another through the [HIR map][map].
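
The relationship between these identifiers can be sketched with a standalone model. The types
below are simplified stand-ins written for illustration, not the definitions from `rustc_hir`:
field layouts are reduced to plain integers, and `HirId::owner` is modelled as a `LocalDefId`
as an assumption of this sketch.

```rust
/// Simplified stand-in: identifies a crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct CrateNum(pub u32);

/// Simplified stand-in: identifies a definition within one crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DefIndex(pub u32);

/// The crate currently being compiled (modelled here as crate number 0).
pub const LOCAL_CRATE: CrateNum = CrateNum(0);

/// A definition in any crate: a crate number plus an index within that crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct DefId {
    pub krate: CrateNum,
    pub index: DefIndex,
}

/// A definition known to be in the current crate, so no crate number is stored.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct LocalDefId {
    pub local_def_index: DefIndex,
}

/// Any HIR node in the current crate: an owner plus an ID unique within that owner.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct HirId {
    pub owner: LocalDefId,
    pub local_id: u32,
}

impl DefId {
    /// Succeeds only when the definition belongs to the local crate.
    pub fn as_local(self) -> Option<LocalDefId> {
        (self.krate == LOCAL_CRATE).then(|| LocalDefId { local_def_index: self.index })
    }
}

impl LocalDefId {
    /// Always succeeds: re-attaches the local crate number.
    pub fn to_def_id(self) -> DefId {
        DefId { krate: LOCAL_CRATE, index: self.local_def_index }
    }
}
```

In this model, going from `LocalDefId` to `DefId` cannot fail, while the reverse returns an
`Option`. That asymmetry is how the type system keeps foreign definitions out of functions that
expect a local one.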

[`DefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefId.html
[`LocalDefId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.LocalDefId.html
[`HirId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir_id/struct.HirId.html
[ids]: ./identifiers.md#in-the-hir
[`BodyId`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.BodyId.html
[`CrateNum`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.CrateNum.html
[`DefIndex`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def_id/struct.DefIndex.html
[`Body`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Body.html
[hir-map]: ./hir.md#the-hir-map
[hir-bodies]: ./hir.md#hir-bodies
[map]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/hir/map/struct.Map.html

## The HIR Map
