Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: updating Schema and system tables #1662

Closed
wants to merge 2 commits into from
Closed

Conversation

kazimuth
Copy link
Contributor

@kazimuth kazimuth commented Sep 3, 2024

Description of Changes

Sketch of updating the system tables + updating ModuleDef to be a little easier to add constraints to, as suggested by Mazdak a while back.

I will probably split off the ModuleDef update into its own PR, since it is fairly contained and can be merged in the next day or two.
That is blocked on merging #1658 and #1661.

This code is full of errors, the draft is just here to allow review of the new Schema definitions.

API and ABI breaking changes

This breaks the ABI and API, it rearranges system tables.

Expected complexity level and risk

4, this is going to reverberate across the entire codebase.

Testing

Describe any testing you've done, and any testing you'd like your reviewers to do,
so that you're confident that all the changes work as expected!

  • Write a test you've completed here.
  • Write a test you want a reviewer to do here, so they can check it off when they're satisfied.

@kazimuth
Copy link
Contributor Author

kazimuth commented Sep 3, 2024

Key questions:

  • Can the system tables store enums (with data) at all?
  • Can we add variants to an enum stored in the system tables without it being a breaking change?
  • Can we add a new system table without it being a breaking change?

@gefjon
Copy link
Contributor

gefjon commented Sep 3, 2024

First let me say I'm addressing this from the perspective of: we maintain a known schema for each system table, supplied by the host, and the data in the snapshot and/or log must conform to that schema. That may not always be the case, but it is the case now, and frankly I don't think we have time to change it before 1.0.

Can the system tables store enums (with data) at all?

Sure. Why wouldn't they be able to?

Can we add variants to an enum stored in the system tables without it being a breaking change?

First of all, obviously you can't take a newer module, commitlog or snapshot and run it on an older host if you do that. The old host will see what it believes to be an invalid discriminant and likely panic.

Second of all, you can only take an older module, commitlog or snapshot and run it on a newer host if the new variant doesn't change the BFLATN layout of the sum type. This means that the new variant's payload must have size and alignment less than or equal to the largest and most-aligned previously-existing variant(s).

Can we add a new system table without it being a breaking change?

Again only in the new-host-old-data sense, and not the old-host-new-data sense, but sure, I don't see why not. I would expect a not-present table to behave the same as an empty table in either a snapshot or a commitlog. 'Course, we should test that, because there's a good chance I made an incorrect assumption somewhere in the snapshot-restoration code, but I don't think there's a fundamental problem or even a significant barrier to this working.

// the schema? Does this need to store a byte array instead?
"data", ConstraintData = 3,
}
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this to mirror how constraints are represented in ModuleDef. If we're going to have an enum here, that's what we should do in ModuleDef as well.

Notably postgres appears to put them all in a single table, but I would like you to confirm that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah JK you seem to have done that below.

@@ -173,26 +173,30 @@ macro_rules! st_fields_enum {
// WARNING: For a stable schema, don't change the field names and discriminants.
st_fields_enum!(enum StTableFields {
"table_id", TableId = 0,
"table_name", TableName = 1,
"name", TableName = 1,
"table_type", TableType = 2,
"table_access", TablesAccess = 3,
});
Copy link
Contributor

@cloutiertyler cloutiertyler Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to rename table_name to name, we should probably do table_type and table_access. Incidentally, I don't know what table_type even is.

Better yet, I would just leave these names as is to reducer code churn and errors and then we can do a final pass at the end. So leave it as:

st_fields_enum!(enum StTableFields {
    "table_id", TableId = 0,
    "table_name", TableName = 1,
    "table_name", TableName = 1,
    "table_type", TableType = 2,
    "table_access", TablesAccess = 3,
});

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep the prefix for the columns, that helps with debugging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah and joins.

// TODO(jgilles): can a system table store an enum?
// What if we need to add new variants to the enum? Will that break
// the schema? Does this need to store a byte array instead?
"index_algorithm", IndexAlgorithm = 3,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unfortunately doesn't tell me what the type of this field is.

Could you put a schema for all the tables in a comment like this:

| table_id | table_name  | table_type | foobar          |
|----------|-------------|------------|-----------------|
| u32      | String      | u32        | IndexAlgorithm  |

enum IndexAlgorithm {
    Foobar(String)
}

Or something similar? It's just hard to comment on the changes without the type info.

@kazimuth
Copy link
Contributor Author

kazimuth commented Sep 6, 2024

One other questions from Tyler:

Should we store primary_key in the schema? I realize we don't currently use it and that it's not meaningful except to determine what should be an update vs an insert/delete, but how do you compare two ModuleDefs which differ by PrimaryKey in ponderautomigrate? I guess it just ignores them?

primary_key can always be extracted by loading the ModuleDef, but I'm not sure if this is the best idea. I guess we need to decide if you need to be able to exactly reconstruct the ModuleDef from the system tables. Choices:

  1. Keep information that only matters to client codegen in ModuleDef, not in the system tables.
    We'll need to read the ModuleDefs from the loaded module to run ponder_automigrate.
  2. Add client codegen information (Type names, reducer names, misc exports, the typespace) to the system tables.
    Then we can exactly reconstruct the input ModuleDef as needed.

I'm not sure which is the best option here. 2 leaks some "nominal" information into the system tables, but it gives them a more complete view of the system. We have to keep the module around anyway (so that we can run it!) so I don't think 1 is the worst idea.

"index_name", IndexName = 2,

// TODO(jgilles): can a system table store an enum?
// What if we need to add new variants to the enum? Will that break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, system tables can store any SATN type.

@@ -173,26 +173,30 @@ macro_rules! st_fields_enum {
// WARNING: For a stable schema, don't change the field names and discriminants.
st_fields_enum!(enum StTableFields {
"table_id", TableId = 0,
"table_name", TableName = 1,
"name", TableName = 1,
"table_type", TableType = 2,
"table_access", TablesAccess = 3,
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree to keep the prefix for the columns, that helps with debugging.

@kazimuth kazimuth closed this Sep 20, 2024
@kazimuth
Copy link
Contributor Author

Closed in favor of #1697

@kazimuth kazimuth deleted the jgilles/system_update branch September 20, 2024 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants