Skip to content

Kwil Oracles

Brennan Lamey edited this page Jan 8, 2024 · 1 revision

I'm putting this up to begin the discussion about what we should be working towards now, and why. Most of this has been discussed in some form or another, but I want to make sure we get on the same page about what the top priorities should be right now.

Kwil v0.7.0 should be about one major thing: composability.

It seems quite obvious that users are deeply interested in various sorts of "composability". The crux of the (some of) the biggest problems faced right now by Fractal and Truflation is that they need to include data from other sources within their schema(s):

  • Truflation wants to compose data across their schemas, such that a user can create a new schema based on the contents of several others.
  • Fractal wants to compose with data from other chains. In the case of NEAR, they want to tap into NEAR on-chain account information in real time. In the case of other chains they support, they want to reflect the state of access grants within the database to provide access control logic.
  • Both of these users, as well as several others, want to accrue value to their token using their Kwil data. Concretely, this means they need access to state about a token (living on Eth) within their database.

This involves solving some very hard problems:

  1. Getting "off-chain" data into the Kwil chain.
  2. Maintaining transactional consistency across database schemas.
  3. Syntax for using data from other schemas.

Getting "off-chain" data into Kwil

As discussed, the preferred way of getting non-Kwil data into Kwil is using oracles that submit CometBFT vote extensions. For users, there should be an implementable interface for how they can define what an oracle does.

Vote Extensions

As discussed, vote extensions should be kept fairly general throughout our system, similar to transactions. I'm sure that features like the token bridge and general oracles will iterated on, and so having as little logic in ABCI as possible is probably ideal here.

Essentially, we want to

  • Have a way for validators to submit pre-consensus logs (as vote extensions)
  • Expect that those logs will get processed in a block was a supermajority is met

All of the other nuances of what any such vote extension / log might do should probably be handled elsewhere. This is what we have done with transactions, and it has served us very well for rapidly implementing and changing things (a great example of this was how quickly we were able to turn around account transfers).

Oracles

Oracles will be compile time code that implement an interface for Kwil to understand. They should:

  • Be started when a node is brought up
  • Be stopped when a node is stopped
  • Be configurable via normal Kwil config

I have a basic example of this here, but it is by no means an implementation suggestion: https://github.com/kwilteam/kwil-db/wiki/Oracles

Oracles -> Event Store <- ABCI

The relationship between oracles and abci should be as follows:

  • Oracles are polling / listening to their respective sources, and can at any point in time write event to an event store
  • Events in the event store are given a unique ID based on content (hash of written data maybe)? This creates a unique identifier for nodes to refer to.
  • ABCI will pull from the event store in ExtendVote. This allows abci to pull events from the store as needed.

The Oracle Interface

Let's assume this is a basic starting point for our oracle interface:

// Oracle is the interface for the oracle.
type Oracle interface {
	// Start starts the oracle.
	// This would be run in a goroutine when the node starts.
	Start(ctx context.Context, rec OracleReceiver, config map[string]string) error

	// Stop stops the oracle.
	// It is called when a node is shutting down.
	Stop(ctx context.Context) error
}

// OracleReceiver is the interface that receives oracle data.
// It is implemented by Kwil, and is used by the oracle to send data to Kwil.
type OracleReceiver interface {
	// Submit submits the oracle data to Kwil.
	Submit(ctx context.Context, data *Submission) error
}

// Submission is a submission to the oracle.
type Submission struct {
	DBID string
	Action string
	Inputs []string
}

we would likely want to support configurations and such, but it is a starting point

An example implementation might look like

func (e *exampleOracle) Start(ctx context.Context, rec OracleReceiver, config map[string]string) error {
	e.close = make(chan struct{})
	
	// do some configuration

	go func() {
		for {
			select {
			case <-e.close:
				return
			default:
				// do stuff
				// submit to Kwil
				err := rec.Submit(ctx, &Submission{
					DBID: "xabc",
					Action: "my_action",
					Inputs: []string{"input1", "input2"},
				})
				if err != nil {
					fmt.Println(err)
				}
			}
		}
	}()
	
	return nil
}

  

func (e *exampleOracle) Stop(ctx context.Context) error {
	close(e.close)

	return nil
}

Schemas

Schemas would exist likely separately from oracles. We would certainly create ways to have "genesis" schemas that are built into the network, but by and large an oracle can target whatever schema and action it wants. A single oracle could even target many actions across many databases within the same event, if it so wanted.

action submit_from_oracle($input1, $input2) oracle {
	// do something
}

Configuration

Oracles could be given configurations by node operators. This would enable passing of environment variables, such as infura providers, api keys, etc.

config.toml:

[log]  
[app]  
[oracles]
[oracles.oracle1]
rpc_url = "google.com"
[oracles.oracle2]
api_key = "abc123"

Configs would be scoped to each oracle, as so:

Within oracle1:

func (oracle1) Start(ctx context.Context, rec OracleReceiver, config map[string]string) error {
	rpc := config["rpc_url"]
	// rpc = "google.com"

	api_key = config["api_key"]
	// api_key = nil, since we are in oracle1 and not oracle2
}

Examples

Example 1: Deposit Account Store

Deposit Account Store writes deposits to the account store.

Oracle

type depositOracle struct {
	close chan struct{}
}

func (e *depositOracle) Start(ctx context.Context, rec OracleReceiver, config map[string]string) error {
	e.close = make(chan struct{})

	chain, ok := config["chain"]
	if !ok {
		return fmt.Errorf("no chain configured for oracle")
	}

	rpc, ok := config["rpc"]
	if !ok {
		return fmt.Errorf("no rpc configured for oracle")
	}
	_ = chain
	_ = rpc

	go func() {
		for {
			select {
			case <-e.close:
				return
			default:
				// poll for new deposits

				// once received, submit to Kwil
				err := rec.Submit(ctx, &Submission{
					DBID:   "accounts",
					Action: "deposit",
					Inputs: []string{"0xdeadbeef", "100"},
				})
				if err != nil {
					fmt.Println(err)
				}
			}
		}
	}()

	return nil
}

Example 2: idOS NEAR Addresses

The idOS NEAR Addresses oracle keeps a database updated with the mappings of NEAR accounts.

In addition to the oracle being included at compile time, the schema would probably either be embedded at compile time or set at runtime, uniformly across the network, for the lifetime of the network.

Schema

database near_accounts;

table accounts {
	id text primary,
	implicit_address text
}

table addresses {
	id text primary,
	name text unique,
	account_id text,
	foreign_key(account_id) references accounts(id)
}

table access_keys {
	id text primary,
	public_key text unique,
	account_id text,
	foreign_key(account_id) references accounts(id)
}

// add_account gets called when a new account is used on the NEAR blockchain
action add_account($id, $id2, $addr) oracle {
	INSERT INTO accounts (id, implicit_address)
	VALUES ($id, $addr);

	// implicit addresses are also valid identifiers
	INSERT INTO addresses (id, name, account_id)
	VALUES ($id2, $addr, $id);
}

// add_address gets called each time a named address is given to an account
action add_address($id, $name, $account_id) oracle {
	INSERT INTO addresses (id, name, account_id)
	VALUES ($id, $name, $account_id);
}

// and more for removing addresses, adding access keys, etc.

Oracle

The oracle might look something like:

type nearOracle struct {
	close chan struct{}
}

func (e *nearOracle) Start(ctx context.Context, rec OracleReceiver, config map[string]string) error {
	e.close = make(chan struct{})

	rpc, ok := config["rpc"]
	if !ok {
		return fmt.Errorf("no rpc configured for oracle")
	}
	_ = rpc

	go func() {
		for {
			select {
			case <-e.close:
				return
			default:
				// poll for new events
				// from the chain:
				var e eventType
				var submission *Submission

				// switch based on event type
				switch e {
				case eventTypeNewAccount:
					submission = &Submission{
						DBID:   "near_accounts",
						Action: "add_account",
						Inputs: []string{"account_id_1", "address_id_1", "implicit_address"},
					}
				case eventTypeNewAddress:
					submission = &Submission{
						DBID:   "near_accounts",
						Action: "add_address",
						Inputs: []string{"address_id_2", "0xdeadbeef", "account_id_1"},
					}
				}

				// once received, submit to Kwil
				err := rec.Submit(ctx, submission)
				if err != nil {
					fmt.Println(err)
				}
			}
		}
	}()

	return nil
}

Composing Data Within Kwil

Assuming we have the above implemented correctly, we now get posed with a new question: How do I use this data within Kwil?

This is another feature of Kwil that we have been talking about quite literally since we started, and are now in a position to build. There should be a way for people to import schemas on their network, just like they import a package. Users should be able to compose queries with the imported schemas.

I would imagine that users would also create oracles and schemas as standards, and make them reusable across networks.

Example 1: ENS

A network with the ENS oracle would also have the following schema:

database ens;

table names {
	id text primary,
	name text unique not_null,
	wallet_id text not_null,
	foreign_key(wallet_id) references wallets(id)
}

table wallets {
	id string primary,
	address string not_null unique
}

action create_wallet($id, $address) oracle {
	INSERT INTO wallets (id, address)
	VALUES ($id, $address);
}

action create_name($id, $name, $address) oracle {
	INSERT INTO names (id, name, wallet_id)
	VALUES ($id, $name, (
		SELECT id
		FROM wallets
		WHERE address = $address
	));
}

action transfer_name($name, $recipient) oracle {
	UPDATE names
	SET wallet_id = (
		SELECT id
		FROM wallets
		WHERE address = $recipient
	)
	WHERE name = $name;
}

// assert_can_use_name will error if the caller
// does not own an ENS name
action assert_owns_name($name) public view {
	SELECT CASE
		WHEN NOT EXISTS (
			SELECT n.id
			FROM names AS n
			INNER JOIN wallets AS w
			ON w.id = n.wallet_id
			WHERE n.name = $name
			AND w.address = @caller
		)
		THEN ERROR('caller does not own name')
	END;
}

// get_ownership returns a mapping of
// ENS names to wallet addresses
action get_ownership() public view {
	SELECT n.name AS name, w.address AS address
	FROM wallets AS w
	LEFT JOIN names AS n
	ON w.id = n.wallet_id;
}

They could then use this data in their own schemas. For "network-level" schemas, I propose that we alter the DBID generation process, such that if a DBID only has a name and no owner, then its DBID is simply its name.

Mutative (INSERT, UPDATE, DELETE) actions can be called like an extensions. SELECTs will be called as common table expressions:

database social_network;

use ens as ens;

// table posts has social media posts that are owned by an ENS name
// posts are ownable assets. Users can transfer them between authors
table posts {
	id text primary,
	text text not_null,
	author text not_null // author would be the ENS name
}

action make_post($id, $title, $name) public {
	ens.assert_owns_name($name);

	INSERT INTO posts (id, text, author)
	VALUES ($id, $title, $name);
}

// get_author_posts gets all posts from a wallet,
// across all of their ENS names.  It takes one
// argument, which is their wallet address.
action get_author_posts($address) public view {
	WITH ens_names AS ens.get_ownership()
	SELECT p.text, p.author
	FROM posts AS p
	INNER JOIN ens_names AS e
	ON e.name = p.author
	WHERE e.address = address;
}

Postgres

While there are a few things I still need to figure out regarding the logic for composability, the biggest (and most concerning change) is that we would need true cross-db atomicity. Right now, we use idempotency, however this would not work if data from different databases relied on each other.

There are two ways we can achieve this:

  1. Going back to a 2pc protocol, where we can identify the end state before any idempotent commits occur.
  2. Turn off SQLite WAL mode, which removes concurrent readers (but allows for linked databases to be atomic).
  3. Use Postgres.

I am heavily inclined to use Postgres, for several reasons:

  1. We have found this stuff is really really hard to get right, so I'd rather not reinvent what Postgres already does.
  2. SQLite can only be configured to support up to 125 linked databases. While this is a good amount, this would likely not fill Truflation's requirement. Their idea is for users to be able to deploy their own schemas, composing on others. We would hit this limit quite quickly.
  3. Postgres supports TOAST, which is something that Fractal really needs. We otherwise will have to implement it ourselves.
  4. Postgres has PL/PGSQL. We have talked a lot about extending Kuneiform to handle logic. PL/PGSQL does this. Essentially, all we would have to do is define our own (more restrictive) syntax for it and then rewrite it to PL/PGSQL in the engine, just like we do with SQL. While this probably isn't something we will handle this release, it is still very much a top of the bucket list item.
  5. Migrations would be easier with Postgres. Postgres's improved locking mechanisms means that we can do long-running migrations while the database is still running. SQLite's primitive locking mechanisms would make long-running migrations very hard, if not impossible. Also not something we are handling this version, but a top priority nonetheless.

There are some downsides to Postgres:

  1. It makes deployments more complex, since there is an extra service to run, instead of a single binary. We will likely look much more seriously at containers as a distribution method since we can package Postgres in a container with Kwild.
  2. It makes testing more complex.

This is obviously a really big change, but the way I see things going, I think we should really consider it.