
Commit

Permalink
Begin work on a guide (#1402)
Stubbing out mostly.
workingjubilee authored Nov 28, 2023
1 parent 6b5da2b commit 4cacae3
Showing 18 changed files with 190 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
book
6 changes: 6 additions & 0 deletions docs/book.toml
@@ -0,0 +1,6 @@
[book]
authors = ["Jubilee Young"]
language = "en"
multilingual = false
src = "src"
title = "Building Postgres Extensions with Rust"
17 changes: 17 additions & 0 deletions docs/src/SUMMARY.md
@@ -0,0 +1,17 @@
# Working with Postgres Extensions

- [Working with PGRX](./extension/README.md)
    - [Building Extensions with PGRX](./extension/build.md)
        - [Cross Compiling](./extension/build/cross-compile.md)
    - [Writing Extensions with PGRX](./extension/write.md)
    - [Testing Extensions with PGRX](./extension/test.md)
        - [Memory Checking](./extension/test/memory-checking.md)
- [Basics of Postgres Internals](./pg-internal.md)
    - [Pass-By-Datum](./pg-internal/datum.md)
    - [Memory Contexts](./pg-internal/memory-context.md)
    - [Varlena Types](./pg-internal/varlena.md)
    - [`sigsetjmp` & `siglongjmp`](./pg-internal/setjmp-longjmp.md)
- [Contributing](./contributing.md)
    - [PGRX Internals](./contributing/pgrx-internal.md)
    - [Releases](./contributing/release.md)
- [Design Decisions](./design-decisions.md)
File renamed without changes.
1 change: 1 addition & 0 deletions docs/src/contributing/pgrx-internal.md
@@ -0,0 +1 @@
# PGRX Internals
File renamed without changes.
File renamed without changes.
33 changes: 33 additions & 0 deletions docs/src/extension/README.md
@@ -0,0 +1,33 @@
# Working with PGRX

pgrx exists because writing Postgres extensions with `pgxs.mk` requires:
- writing a bunch of C code that must manually handle many Postgres invariants
- writing SQL that then loads the extension properly, including many
  [CREATE FUNCTION] and [CREATE TYPE] declarations

This requires programmers who wish to write Postgres extensions to become
experts in C, SQL, and the inner workings of Postgres, on top of having useful
domain knowledge for the actual extension.

Alternatively, with Rust, safe abstractions can be designed to encode the
invariants that Postgres requires in types. Powerful procedural macros can
generate the code to handle the Postgres function ABI, or even write the
needed SQL declarations! This is what pgrx does, with the intent to allow
writing extensions correctly while only being familiar with a single language:
Rust.

...and pgrx. While the annotations that pgrx requires are easy to write, they
aren't necessarily automatic. And even if it is simpler than `pgxs.mk`, the
pgrx build system, primarily wielded through `cargo pgrx`, sometimes needs user
intervention to fix problems. Most of this is in service of allowing you to
adjust exactly how much pgrx assists you, so that it doesn't get in the way
if you need to force something inside it.

<!-- the following may currently be aspirational rather than actual -->
This guide assumes that you, the reader, are a Rust programmer, but it does not
expect you to be intimately familiar with Postgres, nor deeply versed in FFI.
It may assume familiarity with C and SQL work, but as long as you have written
`extern "C"` or `LEFT JOIN` before, you should be fine.

[CREATE FUNCTION]: https://www.postgresql.org/docs/current/sql-createfunction.html
[CREATE TYPE]: https://www.postgresql.org/docs/current/sql-createtype.html
2 changes: 2 additions & 0 deletions docs/src/extension/build.md
@@ -0,0 +1,2 @@
# Building Extensions with PGRX
<!-- TODO: explain the build system more -->
File renamed without changes.
5 changes: 5 additions & 0 deletions docs/src/extension/test.md
@@ -0,0 +1,5 @@
# Testing Extensions with PGRX

Both `cargo test` and `cargo pgrx test` can be used to run tests using the `pgrx-tests` framework.
Tests annotated with `#[pg_test]` will be run inside a Postgres database.
<!-- TODO: explain the test framework more -->
File renamed without changes.
3 changes: 3 additions & 0 deletions docs/src/extension/write.md
@@ -0,0 +1,3 @@
# Writing Extensions with PGRX

<!-- TODO: write all of this -->
1 change: 1 addition & 0 deletions docs/src/pg-internal.md
@@ -0,0 +1 @@
# Basics of Postgres Internals
32 changes: 32 additions & 0 deletions docs/src/pg-internal/datum.md
@@ -0,0 +1,32 @@
# Pass-By-Datum

Postgres functions primarily pass values that can hypothetically have any type to one another
using the "Datum" type. The declaration in the source code reads:
```c
typedef uintptr_t Datum;
```

However, Postgres uses Datum more like a sort of union, which might be logically described as:
```rust
#[repr(magic)] // This is not actually ABI-conformant
union Datum {
    bool,
    i8,
    i16,
    i32,
    i64,
    f32,
    f64,
    Oid,
    *mut varlena,
    *mut c_char, // null-terminated cstring
    *mut c_void,
}
```

Thus, sometimes it is a raw pointer, and sometimes it is a value that can fit in a pointer.
This causes it to incur several of the hazards of being a raw pointer, likely to a lifetime-bound
allocation, yet be copied around with the gleeful abandon that one reserves for ordinary bytes.
The only way to determine which variant is in actual usage is to have some other contextual data.
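
To make the hazard concrete, here is a minimal sketch of a Datum-like word in plain Rust. `ToyDatum` and its methods are hypothetical names invented for this illustration and are not pgrx's actual `Datum` API. The sketch only assumes what is stated above: one pointer-sized word, no type tag.

```rust
use std::ffi::{c_char, CStr};

/// Toy stand-in for a Datum: one pointer-sized word with no type tag.
/// (Hypothetical model for illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyDatum(usize);

impl ToyDatum {
    /// Pass-by-value: the integer itself is stored in the word.
    pub fn from_i32(v: i32) -> Self {
        ToyDatum(v as isize as usize)
    }

    /// Pass-by-value: `false` is 0 and `true` is 1.
    pub fn from_bool(v: bool) -> Self {
        ToyDatum(v as usize)
    }

    /// Pass-by-reference: only the address is stored; the bytes live elsewhere.
    pub fn from_cstr(p: *const c_char) -> Self {
        ToyDatum(p as usize)
    }

    /// Reinterpret the low 32 bits as an integer. Nothing checks that this is right.
    pub fn as_i32(self) -> i32 {
        self.0 as i32
    }

    /// Reinterpret the word as a boolean. Nothing checks that this is right either.
    pub fn as_bool(self) -> bool {
        self.0 != 0
    }

    /// # Safety
    /// The word must really hold a pointer to a live, nul-terminated string,
    /// and the caller picks the lifetime `'a` with no help from the compiler.
    pub unsafe fn as_cstr<'a>(self) -> &'a CStr {
        unsafe { CStr::from_ptr(self.0 as *const c_char) }
    }
}
```

Because `ToyDatum` is `Copy`, a pointer-carrying datum can be duplicated freely even though the allocation behind it cannot, and `from_bool(true)` is indistinguishable from `from_i32(1)` once stored. Only the caller's knowledge of the expected type tells the two apart.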

<!-- TODO: finish out the Datum<'_> drafts and provide alternatives to worrying about pointers -->
61 changes: 61 additions & 0 deletions docs/src/pg-internal/memory-context.md
@@ -0,0 +1,61 @@
# Memory Contexts

Postgres uses a set of "memory contexts" in order to manage memory and prevent leakage, despite
the fact that Postgres code may churn through tables with literally millions of rows. Most of the
memory contexts that an extension's code is likely to be invoked in are transient contexts that
will not outlive the current transaction. These memory contexts will be freed, including all of
their contents, at the end of that transaction. This means that allocations using memory contexts
will quickly be cleaned up, even in C extensions that don't have the power of Rust's compile-time
memory management. However, this is incompatible with certain assumptions Rust makes about safety,
thus making it tricky to correctly bind this code.
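
As a rough mental model, a memory context can be pictured as an owner of every allocation made inside it. The sketch below is a toy written for this guide, not the Postgres or pgrx implementation: `ToyContext`, its index handles, and `reset` are all hypothetical names.

```rust
/// Toy model of a memory context: it owns every chunk allocated in it.
/// (Hypothetical illustration only.)
#[derive(Default)]
pub struct ToyContext {
    chunks: Vec<Box<[u8]>>,
}

impl ToyContext {
    pub fn new() -> Self {
        Self::default()
    }

    /// Allocate `size` bytes owned by this context and return a handle to the chunk.
    pub fn alloc(&mut self, size: usize) -> usize {
        self.chunks.push(vec![0u8; size].into_boxed_slice());
        self.chunks.len() - 1
    }

    /// Look up a chunk by handle; `None` if the chunk no longer exists.
    pub fn get(&self, handle: usize) -> Option<&[u8]> {
        self.chunks.get(handle).map(|chunk| &**chunk)
    }

    /// Total bytes currently owned by this context.
    pub fn live_bytes(&self) -> usize {
        self.chunks.iter().map(|chunk| chunk.len()).sum()
    }

    /// Models what happens at the end of a transaction: everything is freed at once.
    pub fn reset(&mut self) {
        self.chunks.clear();
    }
}
```

In this toy, a stale handle simply yields `None` after `reset`. A raw pointer handed out by a real context has no such check and would dangle instead, which is one way to see why binding this pattern safely takes care.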

<!-- TODO: finish out `MemCx` drafts and provide alternatives to worrying about allocations -->

## What `palloc` calls
In extension code, especially that written in C, you may notice calls to the following functions
for allocation and deallocation, instead of the usual `malloc` and `free`:

```c
typedef size_t Size;

extern void *palloc(Size size);
extern void *palloc0(Size size);
extern void *palloc_extended(Size size, int flags);

extern void pfree(void *pointer);
```
<!--
// Only in Postgres 16+
extern void *palloc_aligned(Size size, Size alignto, int flags);
-->
When combined with appropriate type definitions, calling the `palloc` family of functions is
identical to calling the following functions and passing `CurrentMemoryContext` as the first argument:
```c
typedef struct MemoryContextData *MemoryContext;
#define PGDLLIMPORT
extern PGDLLIMPORT MemoryContext CurrentMemoryContext;
extern void *MemoryContextAlloc(MemoryContext context, Size size);
extern void *MemoryContextAllocZero(MemoryContext context, Size size);
extern void *MemoryContextAllocExtended(MemoryContext context,
                                        Size size, int flags);
```
<!--
// Only in Postgres 16+
extern void *MemoryContextAllocAligned(MemoryContext context,
Size size, Size alignto, int flags);
-->
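
The relationship between the two families can be sketched in plain Rust. Everything here is a hypothetical model: `ToyContextId`, `toy_palloc`, `toy_context_alloc`, and the log are invented for illustration, and `toy_switch_to` stands in for Postgres's `MemoryContextSwitchTo`, which this page does not otherwise cover.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical stand-in for `MemoryContext`: just an id in this toy model.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ToyContextId(pub u32);

thread_local! {
    /// Stand-in for the `CurrentMemoryContext` global.
    static CURRENT: Cell<ToyContextId> = Cell::new(ToyContextId(0));
    /// Record of (context, size) for each allocation, so the calls can be compared.
    static LOG: RefCell<Vec<(ToyContextId, usize)>> = RefCell::new(Vec::new());
}

/// Models `MemoryContextAlloc(context, size)`: the context is an explicit argument.
pub fn toy_context_alloc(cx: ToyContextId, size: usize) -> Vec<u8> {
    LOG.with(|log| log.borrow_mut().push((cx, size)));
    vec![0u8; size]
}

/// Models `palloc(size)`: the same call, with the context read from the global.
pub fn toy_palloc(size: usize) -> Vec<u8> {
    toy_context_alloc(CURRENT.with(|current| current.get()), size)
}

/// Sets the current context and returns the previous one.
pub fn toy_switch_to(cx: ToyContextId) -> ToyContextId {
    CURRENT.with(|current| current.replace(cx))
}

/// Snapshot of every allocation recorded so far on this thread.
pub fn toy_log() -> Vec<(ToyContextId, usize)> {
    LOG.with(|log| log.borrow().clone())
}
```

The point of the model is that `toy_palloc` has a hidden input: the same call allocates in a different context depending on what was last passed to `toy_switch_to`.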

Notice that `pfree` only takes the pointer as an argument, effectively meaning every allocation
must know what context it belongs to in some way.
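
One way to picture "knowing what context it belongs to" is per-chunk bookkeeping that records the owner at allocation time. The sketch below is an assumption-laden toy: `ToyAllocator` is a hypothetical name, the addresses are fake, and the side table stands in for whatever bookkeeping the real allocator keeps for each chunk.

```rust
use std::collections::HashMap;

/// Toy allocator where freeing needs only the address, as with `pfree`.
/// (Hypothetical illustration only; addresses are fake counters.)
#[derive(Default)]
pub struct ToyAllocator {
    next_addr: usize,
    /// Per-chunk "header": address -> (owning context id, size in bytes).
    headers: HashMap<usize, (u32, usize)>,
    /// Live bytes per context id.
    live: HashMap<u32, usize>,
}

impl ToyAllocator {
    /// Allocate in context `cx`, recording the owner alongside the chunk.
    pub fn alloc(&mut self, cx: u32, size: usize) -> usize {
        self.next_addr += 1;
        let addr = self.next_addr;
        self.headers.insert(addr, (cx, size));
        *self.live.entry(cx).or_insert(0) += size;
        addr
    }

    /// Free by address alone; the owning context is recovered from the header.
    /// Returns the context the chunk belonged to, or `None` for an unknown address.
    pub fn free(&mut self, addr: usize) -> Option<u32> {
        let (cx, size) = self.headers.remove(&addr)?;
        if let Some(bytes) = self.live.get_mut(&cx) {
            *bytes -= size;
        }
        Some(cx)
    }

    /// Bytes currently allocated in context `cx`.
    pub fn live_bytes(&self, cx: u32) -> usize {
        self.live.get(&cx).copied().unwrap_or(0)
    }
}
```

Here `free` never takes a context argument, yet the right context's accounting is updated, because the chunk itself carries that information.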

### `CurrentMemoryContext` makes `impl Deref` hard

<!-- TODO: this segment. -->

### Assigning lifetimes to `palloc` is hard

<!-- TODO: this segment. -->
27 changes: 27 additions & 0 deletions docs/src/pg-internal/setjmp-longjmp.md
@@ -0,0 +1,27 @@
# `sigsetjmp` & `siglongjmp`

In order to handle errors that may be distributed widely across the database and deeply nested,
Postgres uses `sigsetjmp` and `siglongjmp` in a certain "calling convention" to handle a stack
of error-handling steps. At a "try-catch" site, `sigsetjmp` is called, and at an error site,
`siglongjmp` is called, each time manipulating a global stack of error contexts to allow nested
try-catches. To address the fact that Rust code is preferably not jumped over, instead properly
handling its destructors via unwinding, pgrx guards calls into C with a function that handles the
global state and then panics. Likewise, Rust panics are hooked in ways that then propagate into
errors in Postgres.
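
The Rust half of this arrangement can be sketched with only the standard library. This is not how pgrx implements its guard: `toy_boundary` and `Noisy` are hypothetical names, and no `sigsetjmp` is involved. The sketch only shows that a panic unwinds through Rust frames, runs their destructors, and can be caught at a boundary and turned into an ordinary error value.

```rust
use std::cell::Cell;
use std::panic::{self, AssertUnwindSafe};

/// Sets a flag when dropped, so it is observable that the destructor ran.
pub struct Noisy<'a>(pub &'a Cell<bool>);

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

/// Hypothetical boundary: run `f`, converting a panic into an error message
/// instead of letting the unwind continue into the caller.
pub fn toy_boundary<F: FnOnce() -> i32>(f: F) -> Result<i32, String> {
    panic::catch_unwind(AssertUnwindSafe(f)).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            (*msg).to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown panic".to_string()
        }
    })
}
```

Calling `toy_boundary` with a closure that creates a `Noisy` and then panics returns an `Err` carrying the panic message, and the flag shows that the destructor ran on the way out. A jump over the same frame would have skipped it.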

<!--
TODO: Make the next statement slightly untrue by making it easier to call functions unsoundly so
that we can call certain functions in tight loops with only a single guard on the inner loop.
-->
The functions normally accessed via `pgrx::pg_sys` are `unsafe`, but are less unsafe than some C
functions because of this guard. You do not need to worry about `siglongjmp` when calling those.
However, if you define your own `extern "C" fn` for *Postgres* to call, you may need to apply
`#[pg_guard]` to handle such deep nesting between Rust and C calling contexts.

If you do, try to limit the amount of code that lies within the scope of that guard, as it is easy
to make a mistake that makes this guard useless. Any code that is part of the guarded scope should
not have any destructors, because it runs *after* `sigsetjmp` is called. Thus, destructors
in that scope will be skipped over! The mentioned FFI functions which are already guarded by pgrx
each wrap only one call, which is the most appropriate scope in the majority of cases.

<!-- TODO: Provide more context on appropriate code, explain C-unwind a bit -->
1 change: 1 addition & 0 deletions docs/src/pg-internal/varlena.md
@@ -0,0 +1 @@
# Varlena Types
