Incrementally introducing MemCx and DetoastDatum #1337

Open
workingjubilee opened this issue Oct 12, 2023 · 1 comment

Comments

@workingjubilee
Member

workingjubilee commented Oct 12, 2023

The Lifetimes of Data in Context

When an arbitrary pointer is initially yielded from Postgres, we get what is effectively:

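```rust
enum Datum<'dat, V, R>
where
    V: 'static,
{
    Null,                    // SQL NULL: there is no payload at all
    Value(V),                // a pass-by-value payload stored in the word itself
    Ref(DatumRef<'dat, R>),  // a pointer to data we may only read, valid for 'dat
    Mut(DatumMut<'dat, R>),  // a pointer to data we may also mutate, valid for 'dat
}
```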

This pleasant Rust representation is, unfortunately, not something that Postgres meaningfully exposes. The type "exists" only in the virtual logic, not in the code: instead, you learn which variant you are dealing with through side-channel information. In many cases, you are simply presumed to already know which variant you will receive.
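To make the side-channel point concrete, here is a minimal sketch. RawDatum and SideChannel are hypothetical names, not pgrx APIs; the SideChannel fields stand in for facts Postgres supplies separately from the word (an isnull flag, the type's typbyval, and a made-up writable flag). The sketch only classifies the word into the variants of the enum above and never dereferences anything.

```rust
/// Hypothetical stand-in for pg_sys::Datum: nothing but a pointer-sized word.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawDatum(pub usize);

/// Hypothetical bundle of the side-channel facts that arrive separately.
#[derive(Clone, Copy, Debug)]
pub struct SideChannel {
    pub is_null: bool,  // e.g. the isnull flag accompanying an argument
    pub by_value: bool, // e.g. the type's typbyval
    pub writable: bool, // hypothetical: whether we may mutate the pointee
}

/// Which variant of the "virtual" Datum enum a word belongs to.
#[derive(Debug, PartialEq, Eq)]
pub enum Variant {
    Null,
    Value(usize),
    Ref(usize),
    Mut(usize),
}

/// The word alone cannot tell us any of this; only the side channel can.
pub fn classify(raw: RawDatum, side: SideChannel) -> Variant {
    if side.is_null {
        Variant::Null
    } else if side.by_value {
        Variant::Value(raw.0)
    } else if side.writable {
        Variant::Mut(raw.0)
    } else {
        Variant::Ref(raw.0)
    }
}

fn main() {
    let word = RawDatum(42);
    let by_val = SideChannel { is_null: false, by_value: true, writable: false };
    let null = SideChannel { is_null: true, ..by_val };
    let by_ref = SideChannel { by_value: false, ..by_val };
    // The same word means three different things.
    assert_eq!(classify(word, by_val), Variant::Value(42));
    assert_eq!(classify(word, null), Variant::Null);
    assert_eq!(classify(word, by_ref), Variant::Ref(42));
}
```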

When we then operate on either DatumRef or DatumMut, we usually need to detoast it, which requires an operation that abstractly looks like:

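```rust
fn detoast_datum<'dat, 'mcx, T>(dat: Datum<'dat>, mcx: &'mcx MemCx<'mcx>) -> Detoast<'dat, 'mcx, T> {
    // Detoasting may allocate a fresh copy inside the memory context.
    let detoast = mcx.detoast_datum(dat);
    if dat.ptr_eq(&detoast) {
        // Same pointer back: nothing was detoasted, so the data still lives for 'dat.
        Detoast::Not(T::from_datum(detoast))
    } else {
        // A new allocation: the detoasted data lives in, and for, 'mcx.
        Detoast::Was(T::from_detoast(detoast))
    }
}
```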

In some cases detoasting does not occur, but detoasting is by far the dominant case, and it is the source of many bugs in the codebase. Data in Postgres can even require recursive detoasting: a toasted varlena may be detoasted into data that itself contains multiple toasted varlenas, each of which has to be detoasted in turn. Even when a given compound structure does not nest like this, the actual "leaf" type we want to interact with will sometimes require detoasting. Because we want to handle all of these somewhat uniformly, the core abstraction that everything else builds on must be able to capture the gnarliest cases, which means correctly representing the lifetimes of the resulting slices of toast.
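A toy, self-contained model of the Not/Was split might look like the following. Everything in it is hypothetical: MemCx is an append-only arena (in the style of the typed-arena pattern) standing in for a Postgres memory context, a "datum" is a byte slice whose first byte is a tag, and tag 1 marks a run-length-compressed payload standing in for toast. Where Postgres signals "nothing was detoasted" by handing back the same pointer, this sketch decides from the tag instead, and it does not model recursive detoasting. The point is the return type: Detoast::Not borrows from the original datum for 'dat, while Detoast::Was borrows from the context for 'mcx.

```rust
use std::cell::RefCell;
use std::marker::PhantomData;

/// Hypothetical memory context: an append-only arena of byte buffers.
pub struct MemCx<'mcx> {
    chunks: RefCell<Vec<Vec<u8>>>,
    _lt: PhantomData<&'mcx ()>,
}

impl<'mcx> MemCx<'mcx> {
    pub fn new() -> Self {
        MemCx { chunks: RefCell::new(Vec::new()), _lt: PhantomData }
    }

    /// Moves `bytes` into the context and lends them out for 'mcx.
    pub fn alloc(&'mcx self, bytes: Vec<u8>) -> &'mcx [u8] {
        let mut chunks = self.chunks.borrow_mut();
        chunks.push(bytes);
        let ptr: *const [u8] = chunks.last().expect("just pushed").as_slice();
        drop(chunks);
        // SAFETY: pushed buffers are never mutated or removed before the
        // arena is dropped; growing the outer Vec moves only the Vec headers,
        // not their heap buffers; and the result cannot outlive `&'mcx self`.
        unsafe { &*ptr }
    }
}

/// Where the usable bytes live after (maybe) detoasting.
#[derive(Debug, PartialEq)]
pub enum Detoast<'dat, 'mcx> {
    Not(&'dat [u8]), // untouched: still borrowed from the original datum
    Was(&'mcx [u8]), // expanded: borrowed from the memory context
}

impl<'dat, 'mcx> Detoast<'dat, 'mcx> {
    /// Either way, a caller can read the bytes for the shorter of both lifetimes.
    pub fn bytes(&self) -> &[u8] {
        match *self {
            Detoast::Not(b) => b,
            Detoast::Was(b) => b,
        }
    }
}

/// Toy format: [0, payload..] is plain; [1, byte, count] is "toasted".
pub fn detoast<'dat, 'mcx>(dat: &'dat [u8], mcx: &'mcx MemCx<'mcx>) -> Detoast<'dat, 'mcx> {
    match dat {
        [0, payload @ ..] => Detoast::Not(payload),
        [1, byte, count] => Detoast::Was(mcx.alloc(vec![*byte; *count as usize])),
        _ => panic!("not a toy varlena"),
    }
}

fn main() {
    let plain = [0u8, b'h', b'i'];
    let toasted = [1u8, b'z', 4];
    let mcx = MemCx::new();
    let a = detoast(&plain, &mcx);
    let b = detoast(&toasted, &mcx);
    assert_eq!(a, Detoast::Not(&b"hi"[..]));
    assert_eq!(b.bytes(), b"zzzz");
}
```

Keeping both lifetimes in Detoast, rather than collapsing them to the shorter one, is what lets an untoasted value keep living after the context is reset.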

Required Types

The example code above mentions some hypothetical types. The following types, however, are actually required.

Datum<'lt>

As distinct from pg_sys::Datum, henceforth known as RawDatum, this is a datum with a lifetime bound that reflects the limited lifetime of its origin before that origin is deallocated. Copying values out into types without lifetime bounds remains possible, but most of the complicated uses will also require the next type.
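One plausible way to attach 'lt to a raw word without storing a real reference is PhantomData. This is a sketch under assumptions: RawDatum(usize) and Datum::new are hypothetical, with new tying 'lt to whatever borrowed origin it is given. It shows that a copied-out i32 carries no lifetime, while the Datum itself cannot outlive its origin.

```rust
use std::marker::PhantomData;

/// Hypothetical stand-in for pg_sys::Datum ("RawDatum").
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawDatum(pub usize);

/// A datum that may not outlive the allocation it came from.
/// PhantomData<&'lt ()> carries the lifetime without storing a reference.
#[derive(Clone, Copy, Debug)]
pub struct Datum<'lt> {
    raw: RawDatum,
    _origin: PhantomData<&'lt ()>,
}

impl<'lt> Datum<'lt> {
    /// Hypothetical constructor: 'lt is borrowed from `_origin`.
    pub fn new<T: ?Sized>(raw: RawDatum, _origin: &'lt T) -> Self {
        Datum { raw, _origin: PhantomData }
    }

    /// Copying a by-value payload out yields a plain i32 with no lifetime bound.
    pub fn to_i32(self) -> i32 {
        self.raw.0 as i32
    }
}

fn main() {
    let escaped: i32;
    {
        let origin = vec![0u8; 8]; // stands in for memory Postgres will free
        let d = Datum::new(RawDatum(7), &origin);
        escaped = d.to_i32(); // fine: an i32 borrows nothing
        // Moving `d` itself out of this block would be rejected by the
        // compiler, because `d` is still bound to the borrow of `origin`.
    }
    assert_eq!(escaped, 7);
}
```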

&'mcx MemCx<'mcx>

Almost always obtained as a borrow, this is fundamentally a MemoryContext, but it also needs lifetime bounds (and thus gets a shorter name, since another 4 to 8 characters of lifetime annotation are going to be dotting every usage).

Usage of this is tricky. In some cases it may be integrated into pg_extern functions by obtaining it inside the wrapping function and passing it in as an argument, so that, as far as the inner Rust function can discern, the lifetime reflects the duration of the function's own execution. In other cases, usage may revolve around closures (akin to Spi::connect, but less broken).
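Both usage patterns can be sketched with a higher-ranked closure, the same technique std::thread::scope uses for its Scope handle. In this sketch MemCx, with_memcx, inner, and wrapper are all hypothetical names: with_memcx plays the role of the wrapping function (or of a closure-based entry point), and because the closure must accept a context of any lifetime 'mcx, its return type cannot mention 'mcx, so nothing borrowed from the context can escape.

```rust
use std::marker::PhantomData;

/// Hypothetical memory-context handle; a name is its only "allocation".
pub struct MemCx<'mcx> {
    name: String,
    _lt: PhantomData<&'mcx ()>,
}

impl<'mcx> MemCx<'mcx> {
    /// Anything handed out is bounded by 'mcx.
    pub fn name(&'mcx self) -> &'mcx str {
        &self.name
    }
}

/// Runs `f` with a fresh context, then tears the context down.
/// `f` must work for every 'mcx, so `R` cannot borrow from the context.
pub fn with_memcx<R>(name: &str, f: impl for<'mcx> FnOnce(&'mcx MemCx<'mcx>) -> R) -> R {
    let mcx = MemCx { name: name.to_owned(), _lt: PhantomData };
    let result = f(&mcx);
    drop(mcx); // in Postgres, the context would be reset or deleted here
    result
}

/// The "inner" function of a hypothetical pg_extern: to it, 'mcx is just
/// a parameter standing for the duration of its own execution.
fn inner<'mcx>(mcx: &'mcx MemCx<'mcx>) -> usize {
    mcx.name().len()
}

/// The "wrapper" obtains the context and passes it in.
fn wrapper() -> usize {
    with_memcx("per-call", |mcx| inner(mcx))
}

fn main() {
    // Returning owned data is fine...
    let owned: String = with_memcx("scratch", |mcx| mcx.name().to_owned());
    // ...but `with_memcx("scratch", |mcx| mcx.name())` would not compile,
    // because the closure's return type would have to name 'mcx.
    assert_eq!(owned, "scratch");
    assert_eq!(wrapper(), 8);
}
```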

Required Traits

Obviously, this is entirely incompatible with trait FromDatum as it currently stands, since FromDatum has no lifetime bounds at all, so that trait needs to be reworked.

trait DetoastDatum<'dat, 'mcx>

A tentative name for the trait that can unpack data into decompressed, direct forms; it contains the necessary fn detoast_datum.
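A tentative shape for such a trait, sketched under the assumption that a datum is handed over as a plain &'dat [u8] and that MemCx is a zero-sized marker (both hypothetical, not pgrx's current API). Self is the output type, so a by-value type such as i32 mentions neither lifetime, while an untoasted &'dat str borrows straight from the datum. A type whose detoasted form is allocated in the context would instead name 'mcx in Self. The little-endian i32 encoding is a toy choice.

```rust
use std::marker::PhantomData;

/// Hypothetical zero-sized memory-context marker.
pub struct MemCx<'mcx>(PhantomData<&'mcx ()>);

/// Hypothetical trait shape: both lifetimes are available to the output type.
pub trait DetoastDatum<'dat, 'mcx>: Sized {
    fn detoast_datum(dat: &'dat [u8], mcx: &'mcx MemCx<'mcx>) -> Option<Self>;
}

// By-value: the result owns its data, so neither lifetime shows up in it.
impl<'dat, 'mcx> DetoastDatum<'dat, 'mcx> for i32 {
    fn detoast_datum(dat: &'dat [u8], _mcx: &'mcx MemCx<'mcx>) -> Option<Self> {
        if dat.len() != 4 {
            return None;
        }
        Some(i32::from_le_bytes([dat[0], dat[1], dat[2], dat[3]]))
    }
}

// Untoasted text: borrows directly from the datum for 'dat.
impl<'dat, 'mcx> DetoastDatum<'dat, 'mcx> for &'dat str {
    fn detoast_datum(dat: &'dat [u8], _mcx: &'mcx MemCx<'mcx>) -> Option<Self> {
        std::str::from_utf8(dat).ok()
    }
}

/// Generic entry point a caller might use.
fn get<'dat, 'mcx, T: DetoastDatum<'dat, 'mcx>>(dat: &'dat [u8], mcx: &'mcx MemCx<'mcx>) -> Option<T> {
    T::detoast_datum(dat, mcx)
}

fn main() {
    let mcx = MemCx(PhantomData);
    let n: Option<i32> = get(&7i32.to_le_bytes(), &mcx);
    let s: Option<&str> = get(b"toast", &mcx);
    assert_eq!(n, Some(7));
    assert_eq!(s, Some("toast"));
}
```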

So... PR?

Unfortunately, the work to integrate just these components into the codebase is necessarily going to be massive, and this design is itself the result of discarding about half a dozen other designs, starting with "the codebase as-is". Hence this issue, as a touchstone for "where are we?"

@eeeebbbbrrrr
Contributor

eeeebbbbrrrr commented Oct 12, 2023

200w

(you can delete this annoying comment after you've giggled a little bit)
