Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify type autoloading with pgxpool #2048

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

nicois
Copy link
Contributor

@nicois nicois commented Jun 17, 2024

Provide some backwards-compatible configuration options for pgxpool
which streamlines the use of the bulk loading of custom types:

  • AutoLoadTypes: a list of type (or class) names to automatically
    load for each connection, automatically also loading any other
    types these depend on.
  • ReuseTypeMaps: if enabled, pgxpool will cache the typemap information,
    avoiding the need to perform any further queries as new connections
    are created.

ReuseTypeMaps is disabled by default as in some situations, a
connection string might resolve to a pool of servers which do not share
the same type name -> OID mapping.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from 8714ac7 to c98f597 Compare June 17, 2024 13:40
@nicois
Copy link
Contributor Author

nicois commented Jun 17, 2024

A recommended way of using this would be something like:

        config.AutoLoadTypes = []string{"my_type", "another_type"}
        config.ReuseTypeMaps = true
        config.MinConns = 10

Which, compared to previously, would:

  • make a single SQL call as soon as the pgxpool was created (as MinConns > 0, the pool will immediately begin filling)
  • the type map information will be loaded, without needing to worry about explicitly naming other custom types; they will be automatically detected
  • subsequent connections will not need to perform any SQL queries to load custom types while being initialised

Together this should result in a significant improvement both to startup times and whenever new connections are being added to the connection pool, whenever custom types are being registered.

@nicois nicois marked this pull request as ready for review June 17, 2024 13:48
@jackc
Copy link
Owner

jackc commented Jun 18, 2024

I think I'd like to wait and see what happens with #2046 before looking too much into this. But there are a few things that come to mind.

  1. How does AutoLoadTypes work with non-derived types. e.g. I have to manually register hstore. How do I register _hstore? A []string of types to load and register is really convenient, but I'm not sure it would work in every case.
  2. That might affect how the types are shared in pgxpool.
  3. If these can be resolved perhaps some of this logic could / should be pushed down into something usable from

@nicois
Copy link
Contributor Author

nicois commented Jun 18, 2024

Agreed, let's get the other PR sorted before finessing this PR.

Related array types should be handled by the new loader. I haven't experimented with this in depth, as I haven't run into any problems with the code, but if there is an example/test where array types aren't supported, I can remedy that, probably in the other PR.

@jackc
Copy link
Owner

jackc commented Jun 19, 2024

Related array types should be handled by the new loader.

Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the AutoLoadTypes would be handled first, leaving it unusable in those cases.

@nicois
Copy link
Contributor Author

nicois commented Jun 19, 2024

Related array types should be handled by the new loader.

Right, I expect it would work - once the underlying type is loaded. But I wasn't sure if the AutoLoadTypes would be handled first, leaving it unusable in those cases.

If it works correctly, this line should collect any dependent array types. I can look at adding a test in the other PR to explicitly ensure this works.

@jackc
Copy link
Owner

jackc commented Jun 20, 2024

What I mean is, what happens when it tries to load _hstore when hstore is not registered yet?

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch 2 times, most recently from 90826ee to a57930b Compare June 21, 2024 13:33
@nicois
Copy link
Contributor Author

nicois commented Jun 21, 2024

I see what you mean. If hstore can only be manually registered, we need to provide the means to do so before the autoloader.
The autoloader is set to execute after the afterConnect hook has finished. This can be used to perform any manual registration which is required beforehand.
Of course, whatever happens in afterConnect will not be cached; if the user want to avoid unnecessary database calls, they would have to implement something equivalent to reuseTypeMap.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from a57930b to 241ed08 Compare June 21, 2024 21:44
@nicois
Copy link
Contributor Author

nicois commented Jun 22, 2024

Thinking a bit more about manual registration of certain types, and how to reduce the number of queries there: what do you think of adding something like a map[string] func(c conn.Conn, oid uint32) error field to the pgxpool config. Then if they want to register say hstore on each connection, they can assign their custom registration function to this map against the "hstore" key, and the pgxpool will retrieve the OIDs for each of these types, then call the functions with that information?
Not only would this be cleaner, but if the reuse... configuration boolean was set, pgxpool will be able to internally cache the type name -> OID mapping, and also only need to make a database call when the first connection is created.
Perhaps the pgxpool config struct could be given a method, something like

func (c *Config) WithManualTypeRegistration(typeName string, f func(c conn.Conn, oid uint32) error) *Config {
   ...
}

(or perhaps passing in the typemap object instead of the connection, if you prefer)
If this makes some sense to you, I can create a separate PR to discuss it further, once this PR is merged.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from 241ed08 to f5bcb7b Compare June 23, 2024 00:17
@nicois
Copy link
Contributor Author

nicois commented Jun 23, 2024

I have rebased this on top of the other PR, and added a comment relating to the autoload config settings to clarify that custom type registration done in AfterConnect will be respected.
Discussion about further optimising the loading of the custom registration can be talked about if/when this PR is merged.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from f5bcb7b to 2538c40 Compare June 23, 2024 00:37
@nicois nicois changed the title Simplify custom type autoloading with pgxpool Simplify type autoloading with pgxpool Jun 23, 2024
@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch 4 times, most recently from b8b9930 to 59927e7 Compare June 27, 2024 12:19
@nicois
Copy link
Contributor Author

nicois commented Jun 29, 2024

Thinking about https://pkg.go.dev/sync#OnceValue : I can see how this can replace the use of a mutex. Were you thinking that the pgxpool's struct would contain a function used to register the types? And that if the reuse flag was set, that function would be replaced by the OnceValue version of it?

The primary question is whether the user provides the autoload information via the pgxpool's config, or via some other mechanism.

@jackc
Copy link
Owner

jackc commented Jun 29, 2024

Were you thinking that the pgxpool's struct would contain a function used to register the types?

Yes, something like this. Though since pgxpool already has a AfterConnect hook it might just be using AfterConnect in a specific way.

And that if the reuse flag was set, that function would be replaced by the OnceValue version of it?

The reuse flag wouldn't have to exist. The function would use OnceValue internally. From pgxpool's point of view, it has a function it calls to get (or get and register) the type information. This technique could be documented, or maybe even a new standalone function that encapsulates this concept and returns a type loading function that internally uses OnceValue.

Potentially, this lets the user control autoload and type reuse without pgxpool needing any explicit support.

@nicois
Copy link
Contributor Author

nicois commented Jun 29, 2024

Without reuse being optional, how would heterogeneous servers be supported, where oids could differ?

Whether it's via a config setting or environment variable, if this is a possible situation, we should support it, shouldn't we?

If not, I certainly agree that it would be nice to eliminate the need to support such a workflow. This was based on what you suggested earlier.

@jackc
Copy link
Owner

jackc commented Jun 29, 2024

Reuse would still be optional. It would be up to the caller to implement.

Here is what I'm thinking:

	makeLoadTypesOnce := func() func(ctx context.Context, conn *pgx.Conn) error {
		var mux sync.Mutex
		var loaded bool
		var types []*pgtype.Type

		return func(ctx context.Context, conn *pgx.Conn) error {
			mux.Lock()
			defer mux.Unlock()

			if loaded {
				return types, nil
			}

			var err error
			types, err = conn.LoadTypes(ctx, "mytype", "myothertype")
			if err != nil {
				return err
			}

			loaded = true
			return nil
		}
	}

	dbconfig.AfterConnect = makeLoadTypesOnce()

I haven't actually tested it, but I think it should work. I wasn't able to figure out a way to get sync.OnceValue to work as it needs the ctx and conn from the first call. But the sync.Mutex works fine.

Also, prototyping this code made me even more in favor of this logic being activated through the existing AfterConnect hook. The project that I was working in had this AfterConnect hook already installed:

	dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		pgxuuid.Register(conn.TypeMap())
		return nil
	}

It's using github.com/jackc/pgx-gofrs-uuid to integrate with github.com/gofrs/uuid. It is important that this type be registered before the any composites that have a UUID field or the wrong UUID type could be used.

It could be modified as follows to load types once after registering the custom UUID type:

	loadTypesOnce := makeLoadTypesOnce()
	dbconfig.AfterConnect = func(ctx context.Context, conn *pgx.Conn) error {
		pgxuuid.Register(conn.TypeMap())
		err := loadTypesOnce(ctx, conn)
		if err != nil {
			return err
		}
		return nil
	}

Not sure if we would want a function that creates / simplifies the load once logic or not. It would be more convenient. But I can imagine different applications having slightly different preferences so it might be easier to document the pattern instead.

@nicois
Copy link
Contributor Author

nicois commented Jun 30, 2024

We can make this something where each user needs to write/adapt their AfterConnect to do this. I just thought that type registration was such a common pattern that we could ensure it would done in a consistent way through providing helpers to manage the autoloading, and to keep AfterConnect free for anything special they want done apart from type registration.
The other benefit to reuse being a configuration option is that it is more likely to be exposed to end-users via a DSN, so if when all they have is a precompiled binary and the author didn't consider supporting the reuse of type registration, those end-users can still opt in.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from 59927e7 to d3dbf83 Compare June 30, 2024 22:13
@nicois
Copy link
Contributor Author

nicois commented Jun 30, 2024

@jackc I have made a helper which can be used to build an AfterCommit hook. I have looked at the uuid code yet, but wanted to share what I've got now, as it's already fairly compact.

I think this strikes a reasonable balance, where this helper can keep things sufficiently DRY while still allowing people to decide whether to chain this with other AfterConnect logic.

I'm also interested in what you think of the environment variable. I do think that since we aren't using the pgxpool config for this any more, it wouldn't be possible for a DSN to control the reuse of type information, so an optional environment variable seems like a good way to let this setting be controlled by end users.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from d3dbf83 to bcee690 Compare June 30, 2024 22:50
@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch 2 times, most recently from f366731 to e5ee693 Compare July 1, 2024 12:02
@nicois
Copy link
Contributor Author

nicois commented Jul 2, 2024

If the approach shown here is one you are happy with, it might actually make sense for me to integrate this with #2049 , as the helper function's signature will change when support for custom registration functions is added.

Copy link
Owner

@jackc jackc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually not in favor of the environment variable for two reasons.

  1. The caller could read an environment variable if they wanted to. This could be as simple as AutoloadAfterConnect([]string{"mytype", "myothertype"}, os.Getenv("PGXPOOL_REUSE_TYPEMAP") == "y"). I don't think we gain anything by having it internal to this function.
  2. pgx's use of environment variables is limited to when creating / parsing the connection / pool config. An environment variable is a global variable, and isolating their usage to a single point makes the code easier to reason about.

If the approach shown here is one you are happy with, it might actually make sense for me to integrate this with #2049 , as the helper function's signature will change when support for custom registration functions is added.

I do think we will want to wait to merge until we also have figured out how to handle custom registration for types like hstore and PostGIS types.

I think the custom registration functions are worth looking into.

I also think it's worth considering letting LoadTypes load base types and have it leave the Codec empty. The caller could populate the Codec for base types. This would allow a single query to suffice for all type registration.

It also might be worth considering doing nothing. There are relatively few base types that will be used by any given project relative to all the derived types. A few extra queries might be inconsequential relative to the complexity of optimizing them away.

pgxpool/pool.go Outdated Show resolved Hide resolved
pgxpool/pool.go Outdated Show resolved Hide resolved
@nicois
Copy link
Contributor Author

nicois commented Jul 3, 2024

The reason for wanting the environment variable is because I expect there are a significant number of projects which use pgx. Many of them will not bother exposing the reuse setting, either disabling it for safety, or enabling it as 99% of users will benefit from the extra speed, particularly with large numbers of connections.
Because the DSN doesn't provide the ability to control this boolean (and as it's not part of ConnConfig, but only defined in a bundled function), an environment variable would avoid the situation where people are trying to use an pgx-based application and are otherwise unable to access this setting.
This is the time when we have the opportunity to define a consistent method to control this flag. But I can remove it if you don't like it; this is your project.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from e5ee693 to 089aee3 Compare July 3, 2024 06:35
@jackc
Copy link
Owner

jackc commented Jul 4, 2024

I'd prefer to hold off on the environment variable. We can always add it later, but once it's there we can't remove it.

@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from 089aee3 to e6e5981 Compare July 4, 2024 05:30
@nicois
Copy link
Contributor Author

nicois commented Jul 4, 2024

OK, I've removed the environment variable reference.

@jackc
Copy link
Owner

jackc commented Jul 16, 2024

@nicois

There is still the race condition mentioned above.

But there is one other thing that needs to be figured out that was brought to my attention by #2089. EnumCodec has mutable state. The Type and Codec interfaces require that they are not modified after registration, but I suppose it is unclear whether they should be able to modify themselves. EnumCodec modifies itself, so it would not be safe to share between multiple connections. Not sure what the best solution is here.

@nicois
Copy link
Contributor Author

nicois commented Jul 16, 2024

I'll take a look at that race condition. I didn't notice your comment, sorry.

Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy() member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection.
Potentially Copy() could return a pointer to itself, if the type/codec is one which is safe to reuse.

While LoadTypes is powerful on its own, retriving all salient type
information in a single query, it is particularly powerful and useful
when users of pgxpool can use this.

To streamline its use, provide a helper function in pgxpool suitable for
plugging directly into the AfterConnect configuration settings. It will
load the types and register them, also reusing them if appropriate.
@nicois nicois force-pushed the nicois/pgxpool-autoload-types branch from e6e5981 to 5124091 Compare July 17, 2024 00:27
@jackc
Copy link
Owner

jackc commented Jul 22, 2024

Relating to state mutation: if types and codecs shouldn't be mutated, should they have a Copy() member which produces an equivalent object with a separate internal state? If so, we could then iterate over these items using this method to generate copies for the new connection.

Seems reasonable.

Potentially Copy() could return a pointer to itself, if the type/codec is one which is safe to reuse.

I would make it an optional additional interface a Codec could also implement. So it would be assumed safe to use concurrently if it was not implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants