Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Postgres types, guarantee resolution before usage #1810

Closed
wants to merge 8 commits into from

Conversation

demurgos
Copy link
Contributor

I am sharing my progress while refactoring the Postgres types (see #1797).

The main idea is to split PgType as it's too general. The new model has the following hierarchy:

  • PgBuiltinType: Any type from the default catalog, natively known by SQLx
  • PgType: A fully resolved type, corresponds to builtin types + resolved custom types
  • PgLazyType: A type that may not be resolved yet, corresponds to fully resolved types + DeclaredBy{Name|Oid}

I also started implementing the TypeRegistry struct to replace the ad-hoc per-connection cache. Since it contains all resolved types, it should provide PgType known to be fully resolved (including subtypes).

@demurgos demurgos force-pushed the refactor-pg-types branch from a384267 to c61947a Compare April 17, 2022 20:25
@demurgos
Copy link
Contributor Author

demurgos commented Apr 21, 2022

I was able to advance on this PR and the hardest part should be done.

The PgTypeRegistry struct keeps track of the current state of all the known types. In particular, it knows if a type is fully resolved (i.e. including all its transitive dependencies). This is achieved by implementing a depth-first search through the dependencies. This search is represented with the struct PgTypeResolution and can be paused and resumed (it follows the nightly Generator trait).

This aspect is important: it allows to insert types in any order into the registry. When a type is added, we create a PgTypeResolution to check its dependencies. It runs until the check completes, or until we find a not-yet-resolved dependency. In this situation, the search is paused and stored in the registry. At a later point, when we get data about this missing dependency, we can resume the search from where we stopped.

This also means that at every point in time, we know which dependencies need to be resolved: this is what drives queries to the database. I still need to implement this part, but it should be relatively simple. I also need to add more tests and improve diagnostics.

Note: a previous version performed the checks when reading the type information, but the current design with generators allows to compute the check once at insertion. I expect way more reads than type insertions so this is the better trade-off.

@demurgos
Copy link
Contributor Author

demurgos commented Apr 24, 2022

I started the work in my PR by focusing on types. However, by working on it, it seems that the core of issue is synchronization of the catalog between the local program and remote database. Caching and analyzing type information is only part of it.

Following this, I am renaming the main struct from TypeRegistry to LocalPgCatalog: this is more general. The idea is that it may be extended in the future, for example to cache namespaces (it may required to improve type resolution by name).

@demurgos demurgos force-pushed the refactor-pg-types branch from c83c1b3 to bdd485c Compare April 24, 2022 19:35
@demurgos demurgos force-pushed the refactor-pg-types branch from bdd485c to 2e659e4 Compare April 27, 2022 21:02
@demurgos
Copy link
Contributor Author

demurgos commented May 1, 2022

Some updates:
I added checks to detect inconsistent DB responses. For example when a type kind changed or when the status present/missing changed. In practice we cache everything forever and only fetch each type once so these errors should not happen. But it still provide nice sanity checks for non-hot paths.

I also started work on integrating my changes into the rest of SQLx by replacing the fields cache_type_info and cache_type_oid from PgConnection. The biggest challenge was that my API returned views for resolved types, using a shared lifetime. However, most of SQLx assumes that types are static. Changing this assumption is out-of-scope here (but should probably be explored), so I added a new API based on Arc and Weak. The types are rooted in the catalog, and internal references use Weak to avoid memory leaks.

My current challenge regarding the integration with SQLx is somewhat related to the previous one. There is a single TypeInfo associated type, without any distinction between fetched and declared types. It makes it hard to staticly track if the type is resolved or not. I think I'll add a new associated type to represent "resolved" types (fetched with all transitive dependencies).

@abonander
Copy link
Collaborator

@demurgos it's been a while, do you have any interest in finishing this? We can earmark it for 0.7.0 so you can make breaking changes if necessary.

@abonander abonander closed this Sep 13, 2022
@demurgos
Copy link
Contributor Author

Hey 👋
Since I opened this MR, life got in the way and I could not manage it until completion. The design in this draft was solid though and I see that there are still issues with custom type support for Postgres. I'll open a dedicated issue and rebase my changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants