Skip to content

Commit

Permalink
docs(contrib): Start guidelines for schema design
Browse files Browse the repository at this point in the history
This was inspired by a recent Cargo team discussion on whether we should
generally elide default values.
This will also help with https://rust-lang.github.io/rust-project-goals/2025h1/cargo-plumbing.html

Case studies in schema design:
- rust-lang#14506
- rust-lang#10543
  • Loading branch information
epage committed Jan 9, 2025
1 parent c3a2b1a commit b9cad3f
Show file tree
Hide file tree
Showing 2 changed files with 48 additions and 0 deletions.
1 change: 1 addition & 0 deletions src/doc/contrib/src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
- [Architecture](./implementation/architecture.md)
- [New packages](./implementation/packages.md)
- [New subcommands](./implementation/subcommands.md)
- [Data Schemas](./implementation/schemas.md)
- [Console Output](./implementation/console.md)
- [Filesystem](./implementation/filesystem.md)
- [Formatting](./implementation/formatting.md)
Expand Down
47 changes: 47 additions & 0 deletions src/doc/contrib/src/implementation/schemas.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
# Data Schemas

Cargo reads and writes user and machine facing data formats, like
- `Cargo.toml`, read and written on `cargo package`
- `Cargo.lock`, read and written
- `.cargo/config.toml`, read-only
- `cargo metadata` output
- `cargo build --message-format` output

## Schema Design

Generally,
- Fields should be kebab case
- `#[serde(rename_all = "kebab-case")]` should be applied defensively
- Fields should only be present when needed, saving space and parse time
- Also, we can always switch to always outputting the fields but its harder to stop outputting them
- `#[serde(skip_serializing_if = "Default::default")]` should be applied liberally
- For output, prefer [jsonlines](https://jsonlines.org/) as it allows streaming output and flexibility to mix content (e.g. adding diagnostics to output that didn't prevously have it
- `#[serde(deny_unknown_fields)]` should not be used to allow evolution of formats, including feature gating

## Schema Evolution Strategies

When changing a schema for data that is read, some options include:
- Adding new fields is relatively safe
- If the field must not be ignored when present,
have a transition period where it is invalid to use on stable Cargo before stabilizing it or
error if its used before supported within the schema version
(e.g. `edition` requires a minimum `package.rust-version`, if present)
- Adding new values to a field is relatively safe
- Unstable values should fail on stable Cargo
- Version the structure and interpretation of the data (e.g. the `edition` field or `package.resolver` which has an `edition` fallback)

Note: some formats that are read are also written back out
(e.g. `cargo package` generating a `Cargo.toml` file)
and those strategies need to be considered as well.

When changing a schema for data that is written, some options include:
- Add new fields if the presence can be ignored
- Infer permission from the users use of the new schema (e.g. a new alias for an `enum` variant)
- Version the structure and interpretation of the format
- Defaulting to the latest version with a warning that behavior may change (e.g. `cargo metadata --format-version`, `edition` in cargo script)
- Defaulting to the first version, eventually warning the user of the implicit stale behavior (e.g. `package.edition` in `Cargo.toml`)
- Without a default (e.g. `package.rust-version`, or a command-line flag like `--format-version`)

Note: While `serde` makes it easy to support data formats that add new fields,
new data types or supported values for a field are more difficult to future-proof
against.

0 comments on commit b9cad3f

Please sign in to comment.