# Architecture

This document gives you a bird's-eye view of the architecture of Pavex.  
This is an ideal starting point if you want to contribute or gain a deeper understanding of its inner workings.

## How does Pavex work?

A Pavex project goes through three stages in order to generate runnable application code:

1. Build an instance of `Blueprint`, a representation of the desired application behaviour;
2. Serialize `Blueprint`, either to a file or in-memory;
3. Generate the application source code with `pavex_cli`, using the serialized `Blueprint` as input.

In a diagram:

```mermaid
flowchart TB
    subgraph A["Stage 1 / Define behaviour"]
        direction LR
        app_b[Blueprint] -->|Using| pavex_builder[pavex_builder]
    end

    subgraph B["Stage 2 / Serialize the blueprint"]
        direction LR
        app_b_json["app_blueprint.ron"]
    end

    A -->|Serialized to| B

    B -->|Input file for pavex_cli| C
    app_crate[Application library crate] -->|Using| pavex_runtime[pavex_runtime]


    subgraph C["Stage 3 / Generate application source code"]
        direction LR
        app_crate
        pavex_runtime
    end

    C -->|Consumed by| app_binary[Application binary]
    C -->|Consumed by| tests[Black-box tests]
```

As you can see in the diagram, the Pavex project is actually underpinned by three user-facing components:

- `pavex_builder`, where `Blueprint` lives;
- `pavex_runtime`, the "typical" web framework;
- `pavex_cli`, the transpiler.

### `pavex_runtime`

You can put `pavex_runtime` in the same bucket as `axum` or `actix-web`: it exposes the types and abstractions that are
needed at runtime to handle incoming requests.

You will see `pavex_runtime` in two contexts:

- in the signature and implementations of route handlers and type constructors, written by application developers;
- in the source code generated by `pavex_cli`.

```rust
use pavex_runtime::{Body, Response};

// A request handler, returning a response as output.
// The response type (and its body type) live in `pavex_runtime`.
pub fn stream_file(
    inner: std::path::PathBuf,
    logger: crate::Logger,
    http_client: reqwest::Client,
) -> Response<Body> { /* */ }
```

`pavex_runtime` is currently very bare-bones: it re-exports types from `hyper`, `http` and `matchit` (our HTTP router
of choice) without much ceremony.
It will be polished down the line, once the bulk of the work on `pavex_cli` is complete.

### `pavex_builder`

`pavex_builder` is the interface used to craft a `Blueprint` - a specification of how the application is supposed to
behave.

```rust
use pavex_builder::{f, Blueprint, Lifecycle};

/// The blueprint for our application.
/// It lists all its routes and provides constructors for all the types
/// that will be needed to invoke `stream_file`, our request handler.
///
/// This will be turned into a ready-to-run web server by `pavex_cli`.
pub fn blueprint() -> Blueprint {
    Blueprint::new()
        .constructor(f!(crate::load_configuration), Lifecycle::Singleton)
        .constructor(f!(crate::http_client), Lifecycle::Singleton)
        .constructor(f!(crate::extract_path), Lifecycle::RequestScoped)
        .constructor(f!(crate::logger), Lifecycle::Transient)
        .route(GET, "/home", f!(crate::stream_file))
}
```

A `Blueprint` captures two types of information:

- route handlers (e.g. use `my_handler` for all incoming `/home` requests);
- type constructors (e.g. use `my_constructor` every time you need to build an instance of type `MyType`).

For each type constructor, the developer must specify the lifecycle of its output type:

- _singleton_ - an instance is built once, before the application starts, and re-used for all incoming requests;
- _request-scoped_ - a new instance is built for every incoming request and re-used throughout the handling of that
  specific request;
- _transient_ - a new instance is built every time the type is needed, potentially multiple times for each incoming
  request.
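
To make the difference between the three lifecycles concrete, here is a minimal sketch that counts how many times a
constructor would be invoked under each of them. `ConstructorCalls` and `simulate` are hypothetical names used for
illustration only; they are not part of `pavex_builder`.

```rust
/// How many times a constructor gets invoked, split by lifecycle.
#[derive(Debug, PartialEq)]
pub struct ConstructorCalls {
    pub singleton: u32,
    pub request_scoped: u32,
    pub transient: u32,
}

/// Simulates handling `n_requests` requests, where each request needs
/// every type at `uses_per_request` different places.
pub fn simulate(n_requests: u32, uses_per_request: u32) -> ConstructorCalls {
    let mut calls = ConstructorCalls {
        singleton: 0,
        request_scoped: 0,
        transient: 0,
    };

    // Singleton: built once, before any request is handled.
    calls.singleton += 1;

    for _ in 0..n_requests {
        // Request-scoped: built once per request, then re-used.
        calls.request_scoped += 1;
        for _ in 0..uses_per_request {
            // Transient: built from scratch every time it is needed.
            calls.transient += 1;
        }
    }
    calls
}
```

With 3 requests and 2 use sites per request, the singleton constructor runs once, the request-scoped one 3 times and
the transient one 6 times.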

All this information is encoded into a `Blueprint` and passed as input to `pavex_cli` to generate the application's
source code.

### `pavex_cli` and Pavex

`pavex_cli` is our transpiler, the component in charge of transforming a `Blueprint` into a ready-to-run web
server.  
It is packaged as a binary, a thin wrapper over the (internal) Pavex crate.

The transpiler is where most of the complexity lives.  
It must generate:

- a struct representing the application state;
- a function to build an instance of the application state, ahead of launching the web server;
- a function to build the HTTP router;
- a dispatch function (built on top of the HTTP router) to dispatch incoming requests to the correct handlers;
- for each route, a function that takes as input the server state and the incoming request while returning an HTTP
  response as output.

What is `pavex_cli` getting as input?  
Something that looks like this:

```text
(
    constructors: [
        (
            registered_at: "app",
            import_path: "crate :: http_client",
        ),
        (
            registered_at: "app",
            import_path: "crate :: extract_path",
        ),
        (
            registered_at: "app",
            import_path: "crate :: logger",
        ),
    ],
    handlers: [
        (
            registered_at: "app",
            import_path: "crate :: stream_file",
        ),
    ],
    component_lifecycles: {
        (
            registered_at: "app",
            import_path: "crate :: http_client",
        ): Singleton,
        (
            registered_at: "app",
            import_path: "crate :: extract_path",
        ): RequestScoped,
        (
            registered_at: "app",
            import_path: "crate :: logger",
        ): Transient,
    },
    router: {
        "/home": (
            registered_at: "app",
            import_path: "crate :: stream_file",
        ),
    },
    handler_locations: { /* */ },
    constructor_locations: { /* */ }
)
```

We have the raw path of the functions and methods registered by the developer. We need to turn this into working source
code!

To make this happen, we need to turn those strings into structured metadata.  
For each of those functions and methods, we want to know:

- their input parameters;
- their output type.

But Rust does not have reflection, neither at compile-time nor at runtime!  
Luckily enough, there is a feature currently baking in `nightly` that, if you squint hard enough, looks like
reflection: `rustdoc`'s JSON output.

Running

```bash
cargo +nightly rustdoc -p library_name --lib -- -Zunstable-options -wjson
```

gives you a structured representation of all the types in `library_name`.  
This is what Pavex does: for each registered route handler and constructor, it builds the documentation for the crate
it belongs to and extracts the relevant bits of information from `rustdoc`'s output.
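
As a rough sketch of how a tool could assemble that invocation programmatically, the snippet below builds the same
`cargo rustdoc` command with `std::process::Command`, without running it. The function name `rustdoc_json_command`
is hypothetical, and the `+nightly` argument assumes that `cargo` resolves to the `rustup` proxy; this is not
necessarily how Pavex does it internally.

```rust
use std::process::Command;

/// Builds (but does not run) the `cargo rustdoc` invocation that asks
/// `rustdoc` for JSON output for the library target of `package`.
pub fn rustdoc_json_command(package: &str) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("+nightly") // assumes `cargo` is the `rustup` proxy
        .arg("rustdoc")
        .arg("-p")
        .arg(package)
        .arg("--lib")
        .arg("--")
        .arg("-Zunstable-options")
        .arg("-wjson");
    cmd
}
```

Calling `.status()` or `.output()` on the returned `Command` would then actually run it.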

If you are going through the source code, this is the process that converts a `RawCallableIdentifiers` into a
`Callable`, with `ResolvedPath` as an intermediate step.
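
To give a feel for the very first step of that conversion, here is a simplified sketch that turns a raw import path
into a list of path segments. `RawIdentifiers` and `path_segments` are hypothetical stand-ins, and replacing a
leading `crate` with the value of `registered_at` is an assumption made for illustration; the real resolution logic
has to consult `rustdoc`'s output and is considerably more involved.

```rust
/// Hypothetical stand-in for the raw identifiers stored in the blueprint.
pub struct RawIdentifiers<'a> {
    /// The crate where the callable was registered.
    pub registered_at: &'a str,
    /// The path as written by the developer, e.g. `crate :: http_client`.
    pub import_path: &'a str,
}

/// Splits the import path on `::` and replaces a leading `crate` with the
/// name of the crate where the callable was registered.
pub fn path_segments(raw: &RawIdentifiers<'_>) -> Vec<String> {
    raw.import_path
        .split("::")
        .map(str::trim)
        .enumerate()
        .map(|(i, segment)| {
            if i == 0 && segment == "crate" {
                raw.registered_at.to_string()
            } else {
                segment.to_string()
            }
        })
        .collect()
}
```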

`Callable` looks like this:

```rust
struct Callable {
    pub output_fq_path: ResolvedType,
    pub callable_fq_path: ResolvedPath,
    pub inputs: Vec<ResolvedType>,
}

pub struct ResolvedType {
    pub package_id: PackageId,
    pub base_type: Vec<String>,
    pub generic_arguments: Vec<ResolvedType>,
}
```

After this phase, we have a collection of `Callable` instances representing our constructors and handlers.  
It's a puzzle that we need to solve, starting from the handlers: how do we build instances of the types that they take
as inputs?

The framework machinery, as we discussed before, provides the request processing pipeline with two types out of the box:
the incoming request and the application state.  
The constructors registered by the developer can then be used to _transform_ those types and/or _extract_ information
out of them.

For each handler, we try to build a **dependency graph**: we go through the input types of the request handler function
and check if we have a corresponding constructor that returns an instance of that type; if we do, we then recursively
look at the constructor signature to find out what types _the constructor_ needs as inputs; we recurse further, until we
have everything mapped out as a graph with graph edges used to keep track of the "is needed to build" relationship.
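
The traversal described above can be sketched as follows. Types are represented by plain strings and constructors by
a map from output type to input types; both are simplifications chosen for this example, and `Constructors` and
`dependency_edges` are hypothetical names rather than Pavex's actual data structures.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Maps an output type to the input types of its (hypothetical) constructor.
pub type Constructors<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Returns the "is needed to build" edges reachable from the handler.
/// Each edge is a `(dependency, dependent)` pair.
pub fn dependency_edges<'a>(
    handler: &'a str,
    handler_inputs: &[&'a str],
    constructors: &Constructors<'a>,
) -> BTreeSet<(&'a str, &'a str)> {
    let mut edges = BTreeSet::new();
    // Start from the inputs of the handler itself.
    let mut stack: Vec<(&'a str, &'a str)> =
        handler_inputs.iter().map(|input| (*input, handler)).collect();

    while let Some((dependency, dependent)) = stack.pop() {
        // Skip edges that we have already explored.
        if !edges.insert((dependency, dependent)) {
            continue;
        }
        // If there is a constructor for this type, look at *its* inputs.
        // Types without a constructor (e.g. the incoming request) are leaves.
        if let Some(inputs) = constructors.get(dependency) {
            for input in inputs {
                stack.push((*input, dependency));
            }
        }
    }
    edges
}
```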

To put it in a picture, we want to build something like this for each route:

```mermaid
flowchart TB
    handler["app::stream_file(std::path::PathBuf, app::Logger, reqwest::Client)"]
    client[reqwest::Client]
    logger[app::Logger]
    config[app::Config]
    path[std::path::PathBuf]
    request[http::request::Request]

    config --> client
    client --> handler
    logger --> handler
    path --> handler
    request --> path
```

This information is encoded in the `CallableDependencyGraph` struct.  
At this point, we are only looking at types and signatures: we are not taking into account the _lifecycle_ of those
types.  
E.g. is `reqwest::Client` a singleton that needs to be built once and reused? Or a transient type, that must be built
from scratch every time it is needed?

By taking into account these additional pieces of information, we build a `HandlerCallGraph` for each handler function,
starting from its respective `CallableDependencyGraph`. It looks somewhat like this:

```mermaid
flowchart TB
    handler["app::stream_file(std::path::PathBuf, app::Logger, reqwest::Client)"]
    client[reqwest::Client]
    logger[app::Logger]
    state[ServerState]
    path[std::path::PathBuf]
    request[http::request::Request]

    state --> client
    client --> handler
    logger --> handler
    path --> handler
    request --> path
```

You can spot how `reqwest::Client` is now fetched from `app::ServerState` instead of being built from scratch
from `app::Config`.
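
One way to picture the effect of lifecycles on the graph: the traversal stops at singletons and links them to the
pre-built state instead of recursing into their constructors. The sketch below is an illustration under that
assumption; `Constructor`, `call_graph_edges` and the `SERVER_STATE` node name are hypothetical, and the real
`HandlerCallGraph` tracks more than edges between strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Node standing in for the pre-built state (name chosen for this example).
const SERVER_STATE: &str = "ServerState";

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Lifecycle {
    Singleton,
    RequestScoped,
    Transient,
}

/// A (hypothetical) constructor: its input types and the lifecycle of its output.
pub struct Constructor<'a> {
    pub inputs: Vec<&'a str>,
    pub lifecycle: Lifecycle,
}

/// Returns `(dependency, dependent)` edges, fetching singletons from the
/// state instead of building them from their own inputs.
pub fn call_graph_edges<'a>(
    handler: &'a str,
    handler_inputs: &[&'a str],
    constructors: &BTreeMap<&'a str, Constructor<'a>>,
) -> BTreeSet<(&'a str, &'a str)> {
    let mut edges = BTreeSet::new();
    let mut stack: Vec<(&'a str, &'a str)> =
        handler_inputs.iter().map(|input| (*input, handler)).collect();

    while let Some((dependency, dependent)) = stack.pop() {
        if !edges.insert((dependency, dependent)) {
            continue;
        }
        match constructors.get(dependency) {
            // Singletons were built ahead of time: do not recurse further.
            Some(c) if c.lifecycle == Lifecycle::Singleton => {
                edges.insert((SERVER_STATE, dependency));
            }
            // Everything else is built while handling the request.
            Some(c) => {
                for input in &c.inputs {
                    stack.push((*input, dependency));
                }
            }
            // No constructor: provided by the framework (e.g. the request).
            None => {}
        }
    }
    edges
}
```

This sketch does not distinguish between request-scoped and transient types: that difference only shows up later,
when deciding whether a built value is re-used or built again.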

Armed with this representation, Pavex can now generate the source code for the application library crate.  
Using the same example, assuming the application has a single route, we get the following code:

```rust
use pavex_runtime::routing::Router;
use pavex_runtime::hyper::server::{Builder, conn::AddrIncoming};

struct ServerState {
    router: Router<u32>,
    application_state: ApplicationState,
}

pub struct ApplicationState {
    s0: app::HttpClient,
}

/// The entrypoint to build the application state, a pre-requisite to launching the web server.
pub fn build_application_state(v0: app::Config) -> crate::ApplicationState {
    // [...]
}

/// The entrypoint to launch the web server.
pub async fn run(
    server_builder: Builder<AddrIncoming>,
    application_state: ApplicationState,
) -> Result<(), anyhow::Error> {
    // [...]
}

fn route_request(
    request: pavex_runtime::http::Request<pavex_runtime::hyper::body::Body>,
    server_state: std::sync::Arc<ServerState>,
) -> pavex_runtime::http::Response<pavex_runtime::hyper::body::Body> {
    let route_id = server_state
        .router
        .at(request.uri().path())
        .expect("Failed to match incoming request path");
    match route_id.value {
        0u32 => route_handler_0(server_state.application_state.s0.clone(), request),
        _ => panic!("This is a bug, no route registered for a route id"),
    }
}

pub fn route_handler_0(
    v0: app::HttpClient,
    v1: http::request::Request<hyper::body::Body>,
) -> http::response::Response<hyper::body::Body> {
    let v2 = app::extract_path(v1);
    let v3 = app::logger();
    app::stream_file(v2, v3, v0)
}
```
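
To illustrate how a body like the one in `route_handler_0` could be emitted, here is a toy code generator. It
assumes the calls have already been sorted so that every value is built before it is used; `Call` and `emit_body`
are hypothetical names, and plain string formatting is a simplification chosen for this sketch.

```rust
/// A call to emit: the path of the callable and the indices of the
/// variables (`v0`, `v1`, ...) passed to it as arguments.
pub struct Call<'a> {
    pub callable: &'a str,
    pub args: Vec<usize>,
}

/// Emits the body of a route handler function.
/// The first `n_params` variables are the function parameters; each call
/// binds the next free variable, except the last one (the request handler),
/// which is emitted as the tail expression.
pub fn emit_body(n_params: usize, calls: &[Call<'_>]) -> String {
    let mut lines = Vec::new();
    for (i, call) in calls.iter().enumerate() {
        let args: Vec<String> = call.args.iter().map(|a| format!("v{}", a)).collect();
        let invocation = format!("{}({})", call.callable, args.join(", "));
        if i + 1 == calls.len() {
            lines.push(invocation);
        } else {
            lines.push(format!("let v{} = {};", n_params + i, invocation));
        }
    }
    lines.join("\n")
}
```

Feeding it `app::extract_path(v1)`, `app::logger()` and `app::stream_file(v2, v3, v0)`, with two parameters,
produces the same three lines as the body of `route_handler_0` above.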

## Issues, limitations and risks

This section focuses on issues, limitations and risks that sit outside the Pavex project itself: obstacles that we
cannot remove on our own, but that require coordination/collaboration with other projects.

Each risk is classified over two dimensions: impact and resolution likelihood.

For impact, we use the following emojis:

- 😭, severe impact on the developer experience/viability of the project;
- 😢, medium impact on the developer experience/viability of the project.

For resolution likelihood, we use the following emojis:

- 🔴, unlikely to be remediated on a medium time-horizon (>6 months, <2 years);
- 🟡, likely to be remediated on a medium time-horizon.

We do not care about the short term since Pavex itself still requires tons of work to be viable and it's unlikely to
be ready for prime time in less than 6 months.

### `rustdoc`'s JSON output is unstable (🟡😢)

`rustdoc`'s JSON output requires the `nightly` compiler.  
This is not a showstopper for production usage of Pavex since `nightly` is never used to compile
any code that is actually run at runtime; it is only used by the "reflection engine". Nonetheless, `nightly` can cause
breakage and unnecessary disruption due to its instability. `rustdoc`'s JSON output itself is quickly evolving,
including breaking changes that we must keep up with.

_Remediations_:

- Sit and wait. `rustdoc`'s JSON output is likely to be stabilised, therefore we will be able to drop `nightly` not too
  far into the future.

### `rustdoc` is slow (🔴😢)

Generating the JSON representation of `rustdoc`'s output takes time, especially if we need to generate it for several
crates in the dependency tree.

_Remediations_:

- The idea of hosting the JSON version of a crate's docs has
  been [floated around](https://github.com/rust-lang/docs.rs/issues/1285). This would allow us to download the rendered
  JSON instead of having to build it every time from scratch.
- `rustdoc`'s JSON output for third-party dependencies is highly cacheable given the dependency version and the set of
  activated features. Even if `docs.rs` chooses not to host the JSON output, other easy-to-run caching schemes can be
  devised (e.g. a private ready-to-go centralised cache to be hosted by an organization or a team in their private
  network).
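
As an illustration of why this output is easy to cache, the sketch below derives a cache key from a crate's name,
version and activated features. The key format and the `cache_key` function are assumptions made for this example;
given how quickly the JSON format evolves, a real scheme would probably also need to include the toolchain or format
version in the key.

```rust
use std::collections::BTreeSet;

/// Builds a cache key for the JSON docs of a third-party crate.
/// Features are sorted and deduplicated so that the order in which
/// they are listed does not affect the key.
pub fn cache_key(name: &str, version: &str, features: &[&str]) -> String {
    let unique: BTreeSet<&str> = features.iter().copied().collect();
    let sorted: Vec<&str> = unique.into_iter().collect();
    format!("{}@{}[{}]", name, version, sorted.join(","))
}
```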

### `pavex_cli` cannot be run from a build script (🔴😭)

Due to `cargo`'s very coarse locking scheme, it is not possible to invoke `cargo` itself from a `build.rs` script
(see [tracking issue](https://github.com/rust-lang/cargo/issues/6412)).  
Pavex relies on `cargo` commands to:

- build `rustdoc`'s JSON output for local and third-party crates;
- analyze the dependency tree (via `guppy` which in turn relies on `cargo metadata`);
- find the workspace root (via `guppy` which in turn relies on `cargo metadata`).

There seems to be no active effort to remove this limitation.

_Remediations_:

- Pavex will rely on [`cargo-px`](https://github.com/LukeMathWalker/cargo-px) for code generation.