# Architecture
This document gives you a bird's-eye view of the architecture of Pavex.
This is an ideal starting point if you want to contribute or gain a deeper understanding of its inner workings.
## How does Pavex work?
A Pavex project goes through three stages in order to generate runnable application code:
1. Build an instance of `Blueprint`, a representation of the desired application behaviour;
2. Serialize `Blueprint`, either to a file or in-memory;
3. Generate the application source code using `pavex_cli`, with the serialized `Blueprint` as input.
In a diagram:
```mermaid
flowchart TB
subgraph A["Stage 1 / Define behaviour"]
direction LR
app_b[Blueprint] -->|Using| pavex_builder[pavex_builder]
end
subgraph B["Stage 2 / Serialize the blueprint"]
direction LR
app_b_json["app_blueprint.ron"]
end
A -->|Serialized to| B
B -->|Input file for pavex_cli| C
app_crate[Application library crate] -->|Using| pavex_runtime[pavex_runtime]
subgraph C["Stage 3 / Generate application source code"]
direction LR
app_crate
pavex_runtime
end
C -->|Consumed by| app_binary[Application binary]
C -->|Consumed by| tests[Black-box tests]
```
As you can see in the diagram, the Pavex project is actually underpinned by three user-facing components:
- `pavex_builder`, where `Blueprint` lives;
- `pavex_runtime`, the "typical" web framework;
- `pavex_cli`, the transpiler.
### `pavex_runtime`
You can put `pavex_runtime` in the same bucket as `axum` or `actix-web`: it exposes the types and abstractions that are
needed at runtime to handle incoming requests.
You will see `pavex_runtime` in two contexts:
- in the signature and implementations of route handlers and type constructors, written by application developers;
- in the source code generated by `pavex_cli`.
```rust
use pavex_runtime::{Body, Response};

// A request handler, returning a response as output.
// The response type and its body type both live in `pavex_runtime`.
pub fn stream_file(
    inner: std::path::PathBuf,
    http_client: reqwest::Client,
) -> Response<Body> {
    todo!() // Handler logic elided.
}
```
`pavex_runtime` is currently very barebones: it re-exports types from `hyper`, `http` and `matchit` (our HTTP router
of choice) without much ceremony.
It will be polished down the line, once the bulk of the work on `pavex_cli` is complete.
### `pavex_builder`
`pavex_builder` is the interface used to craft a `Blueprint` - a specification of how the application is supposed to
behave.
```rust
use pavex_builder::{f, Blueprint, Lifecycle};

/// The blueprint for our application.
/// It lists all its routes and provides constructors for all the types
/// that will be needed to invoke `stream_file`, our request handler.
///
/// This will be turned into a ready-to-run web server by `pavex_cli`.
pub fn blueprint() -> Blueprint {
    Blueprint::new()
        .constructor(f!(crate::load_configuration), Lifecycle::Singleton)
        .constructor(f!(crate::http_client), Lifecycle::Singleton)
        .constructor(f!(crate::extract_path), Lifecycle::RequestScoped)
        .constructor(f!(crate::logger), Lifecycle::Transient)
        .route(GET, "/home", f!(crate::stream_file))
}
```
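To make the registration step more concrete, here is a std-only toy model of the information a builder like this needs to record. It is a sketch under stated assumptions, not `pavex_builder`'s implementation: `RawCallable`, `Registration`, `ToyBlueprint` and this `f!` macro are hypothetical stand-ins. It assumes that `f!` only needs the callable's path as text (captured here with `stringify!`), and it uses `#[track_caller]` with `std::panic::Location` to remember where each component was registered. That is one plausible way of filling fields like `constructor_locations`.

```rust
use std::panic::Location;

/// Hypothetical stand-in for what `f!` produces: the callable's path, as text.
#[derive(Debug, Clone)]
pub struct RawCallable {
    pub import_path: &'static str,
}

/// Toy `f!`: captures the path as a string without resolving it.
/// The exact spacing (`crate :: x` vs `crate::x`) depends on the compiler version.
macro_rules! f {
    ($p:path) => {
        RawCallable { import_path: stringify!($p) }
    };
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lifecycle {
    Singleton,
    RequestScoped,
    Transient,
}

/// A registered callable, plus the source location of the registration.
#[derive(Debug)]
pub struct Registration {
    pub callable: RawCallable,
    pub location: &'static Location<'static>,
}

#[derive(Debug, Default)]
pub struct ToyBlueprint {
    pub constructors: Vec<(Registration, Lifecycle)>,
    /// (method, path, handler)
    pub routes: Vec<(&'static str, &'static str, Registration)>,
}

impl ToyBlueprint {
    pub fn new() -> Self {
        Self::default()
    }

    /// `#[track_caller]` makes `Location::caller()` report the user's call site.
    #[track_caller]
    pub fn constructor(mut self, callable: RawCallable, lifecycle: Lifecycle) -> Self {
        let location = Location::caller();
        self.constructors.push((Registration { callable, location }, lifecycle));
        self
    }

    #[track_caller]
    pub fn route(mut self, method: &'static str, path: &'static str, callable: RawCallable) -> Self {
        let location = Location::caller();
        self.routes.push((method, path, Registration { callable, location }));
        self
    }
}

fn main() {
    let bp = ToyBlueprint::new()
        .constructor(f!(crate::http_client), Lifecycle::Singleton)
        .route("GET", "/home", f!(crate::stream_file));
    for (registration, lifecycle) in &bp.constructors {
        println!(
            "{} ({:?}) registered at {}",
            registration.callable.import_path, lifecycle, registration.location
        );
    }
}
```

Because the path is only captured as text, nothing is resolved at this stage: making sense of `crate::http_client` is left to `pavex_cli`.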
A `Blueprint` captures two types of information:
- route handlers (e.g. use `my_handler` for all incoming `/home` requests);
- type constructors (e.g. use `my_constructor` every time you need to build an instance of type `MyType`).
For each type constructor, the developer must specify the lifecycle of its output type:
- _singleton_ - an instance is built once, before the application starts, and re-used for all incoming requests;
- _request-scoped_ - a new instance is built for every incoming request and re-used throughout the handling of that
specific request;
- _transient_ - a new instance is built every time the type is needed, potentially multiple times for each incoming
request.
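The difference between the three lifecycles boils down to how many times each constructor runs. The std-only sketch below simulates it by counting constructor invocations over two requests, each of which needs every type twice. `Counters`, `bump` and `handle_request` are hypothetical names: this models the semantics described above, not the code Pavex generates.

```rust
use std::cell::Cell;

/// Counts how many times each (hypothetical) constructor has been invoked.
#[derive(Default)]
struct Counters {
    singleton: Cell<u32>,
    request_scoped: Cell<u32>,
    transient: Cell<u32>,
}

/// Simulates a constructor call by incrementing its counter.
fn bump(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

/// One incoming request that needs each type twice.
fn handle_request(counters: &Counters, singleton: &u32) {
    // Request-scoped: built once for this request, then reused within it.
    let request_scoped = bump(&counters.request_scoped);
    for _ in 0..2 {
        let _s = singleton; // reused from the application state
        let _r = request_scoped; // reused within this request
        let _t = bump(&counters.transient); // built from scratch at every use
    }
}

fn main() {
    let counters = Counters::default();
    // Singleton: built once, before the application starts serving requests.
    let singleton = bump(&counters.singleton);
    for _ in 0..2 {
        handle_request(&counters, &singleton);
    }
    println!(
        "singleton: {}, request-scoped: {}, transient: {}",
        counters.singleton.get(),
        counters.request_scoped.get(),
        counters.transient.get()
    );
}
```

Running it prints `singleton: 1, request-scoped: 2, transient: 4`.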
All this information is encoded into a `Blueprint` and passed as input to `pavex_cli` to generate the application's
source code.
### `pavex_cli` and Pavex
`pavex_cli` is our transpiler, the component in charge of transforming a `Blueprint` into a ready-to-run web
server.
It is packaged as a binary, a thin wrapper over the (internal) Pavex crate.
The transpiler is where most of the complexity lives.
It must generate:
- a struct representing the application state;
- a function to build an instance of the application state, ahead of launching the web server;
- a function to build the HTTP router;
- a dispatch function (built on top of the HTTP router) to route incoming requests to the correct handlers;
- for each route, a function that takes as input the server state and the incoming request while returning an HTTP
response as output.
What does `pavex_cli` get as input?
Something that looks like this:
```text
(
    constructors: [
        (
            registered_at: "app",
            import_path: "crate :: http_client",
        ),
        (
            registered_at: "app",
            import_path: "crate :: extract_path",
        ),
        (
            registered_at: "app",
            import_path: "crate :: logger",
        ),
    ],
    handlers: [
        (
            registered_at: "app",
            import_path: "crate :: stream_file",
        ),
    ],
    component_lifecycles: {
        (
            registered_at: "app",
            import_path: "crate :: http_client",
        ): Singleton,
        (
            registered_at: "app",
            import_path: "crate :: extract_path",
        ): RequestScoped,
        (
            registered_at: "app",
            import_path: "crate :: logger",
        ): Transient,
    },
    router: {
        "/home": (
            registered_at: "app",
            import_path: "crate :: stream_file",
        ),
    },
    handler_locations: { /* */ },
    constructor_locations: { /* */ }
)
```
We have the raw paths of the functions and methods registered by the developer, and we need to turn them into working
source code!
To make this happen, we need to turn those strings into structured metadata.
For each of those functions and methods, we want to know:
- their input parameters;
- their output type.
But Rust does not have reflection, neither at compile time nor at runtime!
Luckily enough, there is a feature currently baking in `nightly` that, if you squint hard enough, looks like
reflection: `rustdoc`'s JSON output.
By running

```bash
cargo +nightly rustdoc -p library_name --lib -- -Zunstable-options -wjson
```

you can get a structured representation of all the types in `library_name`.
This is what Pavex does: for each registered route handler and constructor, it builds the documentation for the crate
it belongs to and extracts the relevant bits of information from `rustdoc`'s output.
If you are going through the source code, this is the process that converts a `RawCallableIdentifiers` into a
`Callable`, with `ResolvedPath` as an intermediate step.
`Callable` looks like this:
```rust
struct Callable {
    pub output_fq_path: ResolvedType,
    pub callable_fq_path: ResolvedPath,
    pub inputs: Vec<ResolvedType>,
}

pub struct ResolvedType {
    pub package_id: PackageId,
    pub base_type: Vec<String>,
    pub generic_arguments: Vec<ResolvedType>,
}
```
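The first, purely syntactic, part of turning a raw identifier into a resolved path can be illustrated in isolation. The serialized blueprint stores import paths as strings such as `"crate :: http_client"`; before any lookup in `rustdoc`'s output can happen, such a string has to be split into segments, and `crate` has to be replaced by the name of the crate where the component was registered. The sketch assumes that is what `registered_at` is for, and `parse_import_path` is a hypothetical helper; the actual resolution against `rustdoc`'s JSON is far more involved and is not modelled here.

```rust
/// Split a raw import path such as `"crate :: http_client"` into its segments,
/// replacing a leading `crate` with the name of the crate that registered it.
/// Returns `None` if any segment is empty (e.g. `"crate ::"`).
fn parse_import_path(raw: &str, registered_at: &str) -> Option<Vec<String>> {
    let mut segments = Vec::new();
    for (i, segment) in raw.split("::").map(str::trim).enumerate() {
        if segment.is_empty() {
            return None;
        }
        if i == 0 && segment == "crate" {
            segments.push(registered_at.to_string());
        } else {
            segments.push(segment.to_string());
        }
    }
    Some(segments)
}

fn main() {
    // Prints the segments of the path, with `crate` rewritten to `app`.
    println!("{:?}", parse_import_path("crate :: http_client", "app"));
}
```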
After this phase, we have a collection of `Callable` instances representing our constructors and handlers.
It's a puzzle that we need to solve, starting from the handlers: how do we build instances of the types that they take
as inputs?
The framework machinery, as we discussed before, provides the request processing pipeline with two types out of the box:
the incoming request and the application state.
The constructors registered by the developer can then be used to _transform_ those types and/or _extract_ information
out of them.
For each handler, we try to build a **dependency graph**: we go through the input types of the request handler function
and check if we have a corresponding constructor that returns an instance of that type; if we do, we then recursively
look at the constructor signature to find out what types _the constructor_ needs as inputs. We keep recursing until we
have everything mapped out as a graph, with edges tracking the "is needed to build" relationship.
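The recursive lookup described above can be sketched with nothing more than a map from each output type to the input types of its constructor. This is a simplified, std-only model that uses type names as plain strings; `build_dependency_graph` is a hypothetical helper, not Pavex's `CallableDependencyGraph`. Framework-provided types (here, the incoming request) are treated as leaves, and a missing constructor is reported as an error. The example wires up the types of `stream_file`.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// An edge `(a, b)` means "`a` is needed to build `b`".
type Edges = BTreeSet<(String, String)>;

/// Walk the handler's inputs recursively, looking up a constructor for each type.
fn build_dependency_graph(
    handler: &str,
    handler_inputs: &[&str],
    constructors: &HashMap<&str, Vec<&str>>,
    framework_provided: &HashSet<&str>,
) -> Result<Edges, String> {
    let mut edges = Edges::new();
    let mut visited = HashSet::new();
    // Stack of (type to build, node that needs it).
    let mut stack: Vec<(&str, &str)> = handler_inputs.iter().map(|t| (*t, handler)).collect();
    while let Some((ty, needed_by)) = stack.pop() {
        edges.insert((ty.to_string(), needed_by.to_string()));
        if !visited.insert(ty) || framework_provided.contains(ty) {
            continue; // already expanded, or provided by the framework
        }
        let inputs = constructors
            .get(ty)
            .ok_or_else(|| format!("no constructor registered for `{ty}`"))?;
        for input in inputs {
            stack.push((*input, ty));
        }
    }
    Ok(edges)
}

fn main() {
    let constructors = HashMap::from([
        ("reqwest::Client", vec!["app::Config"]),
        ("app::Config", vec![]),
        ("app::Logger", vec![]),
        ("std::path::PathBuf", vec!["http::request::Request"]),
    ]);
    let framework_provided = HashSet::from(["http::request::Request"]);
    let graph = build_dependency_graph(
        "app::stream_file",
        &["std::path::PathBuf", "app::Logger", "reqwest::Client"],
        &constructors,
        &framework_provided,
    );
    for (from, to) in graph.expect("all types should be constructible") {
        println!("{from} --> {to}");
    }
}
```

A real implementation also has to deal with generics, references and cycles between constructors, none of which this toy handles.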
Visually, we want to build something like this for each route:
```mermaid
flowchart TB
handler["app::stream_file(std::path::PathBuf, app::Logger, reqwest::Client)"]
client[reqwest::Client]
logger[app::Logger]
config[app::Config]
path[std::path::PathBuf]
request[http::request::Request]
config --> client
client --> handler
logger --> handler
path --> handler
request --> path
```
This information is encoded in the `CallableDependencyGraph` struct.
At this point, we are only looking at types and signatures: we are not taking into account the _lifecycle_ of those
types.
E.g. is `reqwest::Client` a singleton that needs to be built once and reused? Or a transient type, that must be built
from scratch every time it is needed?
By taking into account these additional pieces of information, we build a `HandlerCallGraph` for each handler function,
starting from its respective `CallableDependencyGraph`. It looks somewhat like this:
```mermaid
flowchart TB
handler["app::stream_file(std::path::PathBuf, app::Logger, reqwest::Client)"]
client[reqwest::Client]
logger[app::Logger]
state[ServerState]
path[std::path::PathBuf]
request[http::request::Request]
state --> client
client --> handler
logger --> handler
path --> handler
request --> path
```
Notice how `reqwest::Client` is now fetched from `ServerState` instead of being built from scratch from `app::Config`.
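One way to picture the step from `CallableDependencyGraph` to `HandlerCallGraph`: every singleton becomes a leaf fetched from the server state, and whatever was only needed to build singletons drops out of the per-request graph. The std-only sketch below applies that rule to an edge list. It is a simplified illustration under that assumption, not Pavex's actual algorithm; `prune_singletons` and the `"ServerState"` node name are hypothetical.

```rust
use std::collections::{BTreeSet, HashSet};

/// Rewrite a dependency graph (edges `(a, b)` = "`a` is needed to build `b`") so that
/// singletons are read from the server state instead of being built per request.
fn prune_singletons(
    edges: &BTreeSet<(String, String)>,
    handler: &str,
    singletons: &HashSet<&str>,
) -> BTreeSet<(String, String)> {
    // Drop the incoming edges of singletons: they are built once, at startup.
    let kept: BTreeSet<(String, String)> = edges
        .iter()
        .filter(|(_, to)| !singletons.contains(to.as_str()))
        .cloned()
        .collect();
    // Keep only the nodes still reachable (walking backwards) from the handler.
    let mut reachable: HashSet<&str> = HashSet::from([handler]);
    let mut frontier = vec![handler];
    while let Some(node) = frontier.pop() {
        for (from, to) in &kept {
            if to == node && reachable.insert(from.as_str()) {
                frontier.push(from.as_str());
            }
        }
    }
    let mut result: BTreeSet<(String, String)> = kept
        .iter()
        .filter(|(from, _)| reachable.contains(from.as_str()))
        .cloned()
        .collect();
    // Every singleton still in use is fetched from the server state.
    for &singleton in singletons {
        if reachable.contains(singleton) {
            result.insert(("ServerState".to_string(), singleton.to_string()));
        }
    }
    result
}

fn main() {
    let edge = |a: &str, b: &str| (a.to_string(), b.to_string());
    let handler = "app::stream_file";
    let dependency_graph = BTreeSet::from([
        edge("app::Config", "reqwest::Client"),
        edge("reqwest::Client", handler),
        edge("app::Logger", handler),
        edge("std::path::PathBuf", handler),
        edge("http::request::Request", "std::path::PathBuf"),
    ]);
    let singletons = HashSet::from(["reqwest::Client", "app::Config"]);
    for (from, to) in prune_singletons(&dependency_graph, handler, &singletons) {
        println!("{from} --> {to}");
    }
}
```

With the `stream_file` example, `app::Config` disappears from the per-request graph and `reqwest::Client` gains a `ServerState` edge, matching the second diagram.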
Armed with this representation, Pavex can now generate the source code for the application library crate.
Using the same example, assuming the application has a single route, we get the following code:
```rust
use pavex_runtime::routing::Router;
use pavex_runtime::hyper::server::{Builder, conn::AddrIncoming};

struct ServerState {
    router: Router<u32>,
    application_state: ApplicationState,
}

pub struct ApplicationState {
    s0: app::HttpClient,
}

/// The entrypoint to build the application state, a pre-requisite to launching the web server.
pub fn build_application_state(v0: app::Config) -> crate::ApplicationState {
    // [...]
}

/// The entrypoint to launch the web server.
pub async fn run(
    server_builder: Builder<AddrIncoming>,
    application_state: ApplicationState,
) -> Result<(), anyhow::Error> {
    // [...]
}

fn route_request(
    request: pavex_runtime::http::Request<pavex_runtime::hyper::body::Body>,
    server_state: std::sync::Arc<ServerState>,
) -> pavex_runtime::http::Response<pavex_runtime::hyper::body::Body> {
    let route_id = server_state
        .router
        .at(request.uri().path())
        .expect("Failed to match incoming request path");
    match route_id.value {
        0u32 => route_handler_0(server_state.application_state.s0.clone(), request),
        _ => panic!("This is a bug, no route registered for a route id"),
    }
}

pub fn route_handler_0(
    v0: app::HttpClient,
    v1: http::request::Request<hyper::body::Body>,
) -> http::response::Response<hyper::body::Body> {
    let v2 = app::extract_path(v1);
    let v3 = app::logger();
    app::stream_file(v2, v3, v0)
}
```
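The body of `route_handler_0` follows from the call graph: every value must be computed after all of its inputs, so emitting one `let` binding per node in topological order yields valid code. The std-only sketch below illustrates that idea with Kahn's algorithm over a hypothetical `Node` list; the `Kind`/`Node` types, the `v{index}` naming and the simplified call syntax are assumptions for illustration, not Pavex's code generator.

```rust
use std::collections::VecDeque;

/// How a node's value is obtained (hypothetical, simplified).
enum Kind {
    /// Passed in as a function parameter (e.g. the request or server state).
    Param,
    /// Produced by calling the function at this path.
    Call(&'static str),
}

struct Node {
    kind: Kind,
    /// Indices of the nodes passed as arguments, in order.
    inputs: Vec<usize>,
}

/// Emit one `let` binding per call, ordered so that each value is defined
/// before it is used (Kahn's topological sort). Variables are named `v{index}`.
fn emit_body(nodes: &[Node]) -> Result<Vec<String>, String> {
    let mut in_degree: Vec<usize> = nodes.iter().map(|n| n.inputs.len()).collect();
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); nodes.len()];
    for (i, node) in nodes.iter().enumerate() {
        for &input in &node.inputs {
            dependents[input].push(i);
        }
    }
    let mut queue: VecDeque<usize> = (0..nodes.len()).filter(|&i| in_degree[i] == 0).collect();
    let mut lines = Vec::new();
    let mut visited = 0;
    while let Some(i) = queue.pop_front() {
        visited += 1;
        if let Kind::Call(path) = &nodes[i].kind {
            let args: Vec<String> = nodes[i].inputs.iter().map(|j| format!("v{j}")).collect();
            lines.push(format!("let v{i} = {path}({});", args.join(", ")));
        }
        for &d in &dependents[i] {
            in_degree[d] -= 1;
            if in_degree[d] == 0 {
                queue.push_back(d);
            }
        }
    }
    if visited == nodes.len() {
        Ok(lines)
    } else {
        Err("the call graph contains a cycle".to_string())
    }
}

fn main() {
    // Mirrors `route_handler_0`: v0 = HTTP client from the state, v1 = the request.
    let nodes = vec![
        Node { kind: Kind::Param, inputs: vec![] },
        Node { kind: Kind::Param, inputs: vec![] },
        Node { kind: Kind::Call("app::extract_path"), inputs: vec![1] },
        Node { kind: Kind::Call("app::logger"), inputs: vec![] },
        Node { kind: Kind::Call("app::stream_file"), inputs: vec![2, 3, 0] },
    ];
    for line in emit_body(&nodes).expect("the call graph should be acyclic") {
        println!("{line}");
    }
}
```

This prints `let v3 = app::logger();`, then `let v2 = app::extract_path(v1);`, then `let v4 = app::stream_file(v2, v3, v0);`. The order of the first two lines differs from the snippet above, but any topological order is equally valid.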
## Issues, limitations and risks
This section focuses on issues, limitations and risks that sit outside the Pavex project itself: obstacles that we
cannot remove on our own, but require coordination/collaboration with other projects.
Each risk is classified over two dimensions: impact and resolution likelihood.
For impact, we use the following emojis:
- 😭, severe impact on the developer experience/viability of the project;
- 😢, medium impact on the developer experience/viability of the project.
For resolution likelihood, we use the following emojis:
- 🔴, unlikely to be remediated on a medium time-horizon (>6 months, <2 years);
- 🟡, likely to be remediated on a medium time-horizon.
We do not care about the short term since Pavex itself still requires tons of work to be viable and it's unlikely to
be ready for prime time in less than 6 months.
### `rustdoc`'s JSON output is unstable (🟡😢)
`rustdoc`'s JSON output requires the `nightly` compiler.
This is not a showstopper for production usage of Pavex: `nightly` is never used to compile any code that actually
runs at runtime; it is only used by the "reflection engine". Nonetheless, `nightly` can cause breakage and unnecessary
disruption due to its instability. `rustdoc`'s JSON output itself is evolving quickly, including breaking changes that
we must keep up with.
_Remediations_:
- Sit and wait. `rustdoc`'s JSON output is likely to be stabilised, so we should be able to drop `nightly` in the
  not-too-distant future.
### `rustdoc` is slow (🔴😢)
Generating the JSON representation of `rustdoc`'s output takes time, especially if we need to generate it for several
crates in the dependency tree.
_Remediations_:
- The idea of hosting the JSON version of a crate's docs has
been [floated around](https://github.com/rust-lang/docs.rs/issues/1285). This would allow us to download the rendered
JSON instead of having to build it every time from scratch.
- `rustdoc`'s JSON output for third-party dependencies is highly cacheable given the dependency version and the set of
activated features. Even if `docs.rs` chooses not to host the JSON output, other easy-to-run caching schemes can be
devised (e.g. a private ready-to-go centralised cache to be hosted by an organization or a team in their private
network).
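A minimal sketch of the cache key such a scheme could use, assuming (as the paragraph above does) that a crate's JSON docs are determined by its version and its set of activated features. The key also includes the toolchain, an assumption of ours motivated by the breaking format changes mentioned earlier. `docs_cache_key` is a hypothetical helper and the values in `main` are made up.

```rust
use std::collections::BTreeSet;

/// Build a cache key for a crate's `rustdoc` JSON output (hypothetical scheme).
/// Features are deduplicated and sorted, so activation order does not matter.
fn docs_cache_key(toolchain: &str, name: &str, version: &str, features: &[&str]) -> String {
    let features: BTreeSet<&str> = features.iter().copied().collect();
    let features: Vec<&str> = features.into_iter().collect();
    format!("{toolchain}/{name}-{version}/{}", features.join(","))
}

fn main() {
    let key = docs_cache_key("nightly-2023-01-01", "reqwest", "0.11.13", &["json", "blocking"]);
    // Prints "nightly-2023-01-01/reqwest-0.11.13/blocking,json".
    println!("{key}");
}
```

Normalising the feature set is what makes the output "highly cacheable": two projects that enable the same features in a different order hit the same entry.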
### `pavex_cli` cannot be run from a build script (🔴😭)
Due to `cargo`'s very coarse locking scheme, it is not possible to invoke `cargo` itself from a `build.rs` script
(see [tracking issue](https://github.com/rust-lang/cargo/issues/6412)).
Pavex relies on `cargo` commands to:
- build `rustdoc`'s JSON output for local and third-party crates;
- analyze the dependency tree (via `guppy` which in turn relies on `cargo metadata`);
- find the workspace root (via `guppy` which in turn relies on `cargo metadata`).
There seems to be no active effort to remove this limitation.
_Remediations_:
- Pavex will rely on [`cargo-px`](https://github.com/LukeMathWalker/cargo-px) for code generation.