[mlir][docs] Update Bytecode documentation #99854

Merged
merged 1 commit into from
Aug 19, 2024
48 changes: 27 additions & 21 deletions mlir/docs/BytecodeFormat.md
@@ -1,10 +1,10 @@
# MLIR Bytecode Format

This documents describes the MLIR bytecode format and its encoding.
This document describes the MLIR bytecode format and its encoding.
This format is versioned and stable: we don't plan to ever break
compatibility, that is a dialect should be able to deserialize and
older bytecode. Similarly, we support back-deployment we an older
version of the format can be targetted.
compatibility, that is, a dialect should be able to deserialize any
older bytecode. Similarly, we support back-deployment so that an
older version of the format can be targeted.

That said, it is important to realize that the promises of the
bytecode format are made assuming immutable dialects: the format
@@ -19,7 +19,7 @@ information while decoding the input IR, and gives an opportunity
to each dialect for which a version is present to perform IR
upgrades post-parsing through the `upgradeFromVersion` method.
There is no restriction on what kind of information a dialect
is allowed to encode to model its versioning
is allowed to encode to model its versioning.

[TOC]

@@ -172,31 +172,37 @@ dialects that were also referenced.
```
dialect_section {
numDialects: varint,
dialectNames: varint[],
numTotalOpNames: varint,
opNames: op_name_group[]
dialectNames: dialect_name_group[],
opNames: dialect_ops_group[] // ops grouped by dialect
}

op_name_group {
dialect: varint // (dialectID << 1) | (hasVersion),
version : dialect_version_section
numOpNames: varint,
opNames: varint[]
dialect_name_group {
nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion),
version: dialect_version_section // only if versioned
}

dialect_version_section {
size: varint,
version: byte[]
}

dialect_ops_group {
dialect: varint,
numOpNames: varint,
opNames: op_name_group[]
}

op_name_group {
nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp)
}
```

Dialects are encoded as a `varint` containing the index to the name string
within the string section, plus a flag indicating whether the dialect is
versioned. Operation names are encoded in groups by dialect, with each group
containing the dialect, the number of operation names, and the array of indexes
to each name within the string section. The version is encoded as a nested
section.
section for each dialect.
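
The `(id << 1) | flag` packing used by `nameAndIsVersioned` and `nameAndIsRegistered` can be sketched as follows. This is a minimal illustration that works on the already-decoded integer value of the `varint`; the function names are hypothetical and are not part of MLIR.

```python
def pack_index_and_flag(index: int, flag: bool) -> int:
    """Pack a string-section index and a one-bit flag into a single integer.

    The flag occupies the lowest bit and the index the remaining bits,
    mirroring the `(dialectID << 1) | (hasVersion)` comments in the grammar.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return (index << 1) | int(flag)


def unpack_index_and_flag(value: int) -> tuple[int, bool]:
    """Split a packed value back into (index, flag)."""
    return value >> 1, bool(value & 1)


# Hypothetical example: a name at string-section index 5 with the flag set.
packed = pack_index_and_flag(5, True)
index, flag = unpack_index_and_flag(packed)
```

Because the flag lives in the low bit, a reader can test it with a single mask before deciding whether a nested `dialect_version_section` follows.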

### Attribute/Type Sections

@@ -249,19 +255,19 @@ its assembly format, or via a custom dialect defined encoding.

In the case where a dialect does not define a method for encoding the attribute
or type, the textual assembly format of that attribute or type is used as a
fallback. For example, a type of `!bytecode.type` would be encoded as the null
terminated string "!bytecode.type". This ensures that every attribute and type
may be encoded, even if the owning dialect has not yet opted in to a more
fallback. For example, a type `!bytecode.type<42>` would be encoded as the
null-terminated string "!bytecode.type<42>". This ensures that every attribute
and type can be encoded, even if the owning dialect has not yet opted in to a more
efficient serialization.
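
As a rough sketch of the fallback described above, the textual form is written out followed by a single zero byte. The helper names are hypothetical, and the use of UTF-8 for the text is an assumption made for illustration only.

```python
def encode_asm_fallback(asm: str) -> bytes:
    """Encode the textual assembly form as a null-terminated byte string."""
    data = asm.encode("utf-8")  # assumption: the text is stored as UTF-8
    if b"\x00" in data:
        raise ValueError("assembly text must not contain an embedded null byte")
    return data + b"\x00"


def decode_asm_fallback(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read a null-terminated string starting at `offset`.

    Returns the decoded text and the offset of the first byte after the
    terminator, so that a caller can continue reading from there.
    """
    end = buf.index(b"\x00", offset)
    return buf[offset:end].decode("utf-8"), end + 1


encoded = encode_asm_fallback("!bytecode.type<42>")
text, next_offset = decode_asm_fallback(encoded)
```

The terminator costs one byte per entry, and the whole textual form, including the dialect prefix, is stored verbatim, which is the redundancy that the TODO below refers to.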

TODO: We shouldn't redundantly encode the dialect name here, we should use a
reference to the parent dialect instead.

##### Dialect Defined Encoding

In addition to the assembly format fallback, dialects may also provide a custom
encoding for their attributes and types. Custom encodings are very beneficial in
that they are significantly smaller and faster to read and write.
As an alternative to the assembly format fallback, dialects may also provide a
custom encoding for their attributes and types. Custom encodings are very
beneficial in that they are significantly smaller and faster to read and write.

Dialects can opt in to providing custom encodings by implementing the
`BytecodeDialectInterface`. This interface provides hooks, namely
@@ -377,7 +383,7 @@ uselist {

The encoding of an operation is important because this is generally the most
commonly appearing structure in the bytecode. A single encoding is used for
every type of operation. Given this prevelance, many of the fields of an
every type of operation. Given this prevalence, many of the fields of an
operation are optional. The `encodingMask` field is a bitmask which indicates
which of the components of the operation are present.
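
The role of a presence bitmask such as `encodingMask` can be illustrated with a short sketch. The flag names and bit positions below are placeholders chosen for the example; they are not taken from this document and should not be read as the values MLIR uses.

```python
from enum import IntFlag


class OpField(IntFlag):
    """Placeholder flags; one bit per optional operation component."""
    ATTRS = 1 << 0
    RESULTS = 1 << 1
    OPERANDS = 1 << 2


def present_fields(mask: int) -> list[str]:
    """Return the names of the components whose bit is set in `mask`."""
    return [field.name for field in OpField if mask & field.value]


# An operation with attributes and operands, but no results.
mask = int(OpField.ATTRS | OpField.OPERANDS)
fields = present_fields(mask)
```

A reader would check each bit in a fixed order and parse only the components that are marked as present, so an operation with no optional components costs little more than the mask itself.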
