-
Notifications
You must be signed in to change notification settings - Fork 284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Relationships selected in SQL-based datastores now elide columns that have static values #2096
base: main
Are you sure you want to change the base?
Conversation
d6104da
to
44f2cbb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments. What I saw looked good but I wouldn't mind having another set of eyes on it.
} | ||
|
||
// QueryRelationships queries relationships for the given query and transaction. | ||
func QueryRelationships[R Rows, C ~map[string]any](ctx context.Context, queryInfo QueryInfo, sqlStatement string, args []any, span trace.Span, tx Querier[R], withIntegrity bool) (datastore.RelationshipIterator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does the ~
mean in this context?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~T means the set of all types whose underlying type is T
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you did something like type MyType map[string]any
, doing ~map[string]any
will also accept that and not just explicitly map[string]any
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is exactly what we do
colIntegrityKeyID, | ||
colIntegrityHash, | ||
colTimestamp, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For myself: this was moved into extraFields
on the common.SchemaInformation struct.
type wrappedTX struct { | ||
tx querier | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What role does this play?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It translates the interface
internal/datastore/common/sql.go
Outdated
ColCaveatName string | ||
ColCaveatContext string | ||
PaginationFilterType PaginationFilterType | ||
PlaceholderFolder sq.PlaceholderFormat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Folder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also is this in here because this allows you to push more logic up into the common sql logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
var resourceObjectType string | ||
var resourceObjectID string | ||
var relation string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean anything that these declarations moved out of the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means they are on the heap, which isn't good, but it also means we don't need to recalculate them every iteration, which is good
pkg/datastore/test/relationships.go
Outdated
@@ -1573,10 +1573,7 @@ func ConcurrentWriteSerializationTest(t *testing.T, tester DatastoreTester) { | |||
<-waitToFinish | |||
return err | |||
}) | |||
if err != nil { | |||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this is unnecessary when we're using MustBugF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in a test, so I changed it to just return the error
@@ -562,17 +654,57 @@ func (tqs QueryExecutor) ExecuteQuery( | |||
limit = *queryOpts.Limit | |||
} | |||
|
|||
toExecute := query.limit(limit) | |||
if limit < math.MaxInt64 { | |||
query = query.limit(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this conditional just to reduce what's going over the wire?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, yes
44f2cbb
to
aa6f111
Compare
Updated |
b45f0a7
to
65b4e07
Compare
aaad075
to
334d8ea
Compare
334d8ea
to
228953f
Compare
228953f
to
a904d41
Compare
a904d41
to
75efa5a
Compare
96bcf17
to
d98d7d2
Compare
columns that have static values This vastly reduces data over the wire, as well as deserialization time and memory usage
This will allow us to centrally register additional datastore validation that only runs at test time
This validation test acts as a proxy in the memdb testing datastore and validates that the column elision code (which *isn't* used in memdb) matches the static fields to the values returned for all relationships loaded
d98d7d2
to
a833e25
Compare
Rebased |
|
||
// StaticValueOrAddColumnForSelect adds a column to the list of columns to select if the value | ||
// is not static, otherwise it sets the value to the static value. | ||
func StaticValueOrAddColumnForSelect(colsToSelect []any, queryInfo QueryInfo, colName string, field *string) []any { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are these []any
? aren't columns always []string
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are actually string reference, and it returns an []any
because that's what the rows.Scan
requires.
This isn't adding the static value to the slice, its adding the reference to the string value for collecting the non-static value
// and injects additional proxies for validation at test time. | ||
// NOTE: These additional proxies are not performant for use in production (but then, | ||
// neither is memdb) | ||
func NewMemDBDatastoreForTesting( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we keep this in the memdb
package? This just feels hard to find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want the memdb package to have requirements on any of the common SQL code, since it is an odd mixture
|
||
// WithAdditionalFilter returns a new SchemaQueryFilterer with an additional filter applied to the query. | ||
func (sqf SchemaQueryFilterer) WithAdditionalFilter(filter func(original sq.SelectBuilder) sq.SelectBuilder) SchemaQueryFilterer { | ||
return SchemaQueryFilterer{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, what about a Clone()
method (with a test), and then modify the clone? Then you can't accidentally omit a field in these With
methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. This is also why I want to have the linter check these, but I'll add a Clone
|
||
colsToSelect = append(colsToSelect, &expiration) | ||
|
||
if withIntegrity { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it make sense to have determine this from queryinfo instead of a seperate boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are injected at different places. To add it to QueryInfo, we'd need a Clone or a WithIntegrity(bool) on QueryInfo. I'm happy to make that change, if you like
} | ||
|
||
// QueryRelationships queries relationships for the given query and transaction. | ||
func QueryRelationships[R Rows, C ~map[string]any](ctx context.Context, queryInfo QueryInfo, sqlStatement string, args []any, span trace.Span, tx Querier[R], withIntegrity bool) (datastore.RelationshipIterator, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be simpler to have a QueryRelationshipsNoElision
(or whatever name) insted of all of the queryInfo.Schema.ColumnOptimization == ColumnOptimizationOptionNone
throughout. Then the query constructors can just pick the implementation on startup and not check at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was all intended to be temporary until we end the experiment and remove the flag. I can change it to use a different query function as you suggest, but I'm not sure its worth the effort for a temp flag
@@ -580,19 +702,73 @@ func (tqs QueryExecutor) ExecuteQuery( | |||
limit = *queryOpts.Limit | |||
} | |||
|
|||
toExecute := query.limit(limit) | |||
if limit < math.MaxInt64 { | |||
query = query.limit(limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the impact of no longer always having a limit clause?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, nothing. It just means sending less bytes over the wire
internal/datastore/common/sql.go
Outdated
// Set the column names to select. | ||
columnNamesToSelect := make([]string, 0, 8+len(query.extraFields)) | ||
|
||
columnNamesToSelect = checkColumn(columnNamesToSelect, query.schema.ColumnOptimization, query.filteringColumnTracker, query.schema.ColNamespace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic is essentially replicating the logic in QueryRelationships - there's no way to unify those two things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for spanner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not easily; it is because of Spanner that its hard to do: Spanner needs its own executor code (it can't use the one relationships.go because it has different handling of types for caveats and expiration), but by putting the column selection here, we do share at least that portion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can column selection be factored out and called from both places? this seems easy to get out of sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could push it down to the use sites, but it would be fairly similar.
Out of sync is fairly evident: every single read test immediately fails
@@ -417,7 +435,24 @@ type querier interface { | |||
QueryContext(context.Context, string, ...interface{}) (*sql.Rows, error) | |||
} | |||
|
|||
func newMySQLExecutor(tx querier) common.ExecuteQueryFunc { | |||
type wrappedTX struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it wrapped? can we have a better name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It rewrites it to match the interface expected by the common executor. Any suggestions? wrappedForCommon
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah anything that will indicate the reason for the wrapping, like withCommonTx
or asCommonTx
or commonTx
0d16bd1
to
e0692f6
Compare
This moves the behavior out of Spanner datastore and into a common lib where possible
} | ||
|
||
toExecute.queryBuilder = toExecute.queryBuilder.From(from) | ||
columnNamesToSelect = append(columnNamesToSelect, b.Schema.ColExpiration) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be a way to skip expiration fields if we know the rel can't have expiration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was going to do it a followup PR, but I'm happy to do so here. Let me know which you prefer
This vastly reduces data over the wire, as well as deserialization time and memory usage
Fixes #59