-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial introduction of providers to the database #955
Conversation
b61edbf
to
86fad62
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two quick comments, I need to wrap my head around where are you going with this. I /really/ like that we would get rid of referencing GitHub everywhere with a constant though!
|
||
-- Create default GitHub provider | ||
INSERT INTO providers (name, group_id, definition) | ||
VALUES ('github', 1, '{}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not strictly related to what this work is trying to accomplish, but shouldn't we create higher numbered migrations from this point on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of patches like this (work that still needs to be done) is why I was not too keen on freezing the initial migration and starting to work on higher numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change from group IDs to projects will be equally disruptive.
18e5ea5
to
41e65b0
Compare
provider_id UUID NOT NULL REFERENCES providers(id) ON DELETE CASCADE, | ||
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If providers are nested under group_id
s, do we need an explicit group_id
on this table and the following ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by this comment, there is a group I'd in this table and most of the ones I've changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was my point -- if provider_id
is unique by group, then if you know the provider_id
, you know the corresponding group. My suggestion was to remove the group_id
field here, since it was redundant with the provider. (Alternatively, since providers are unique by group_id
and name
, you could have a provider_name
here along with the group_id
and use that to index into the Providers
table -- possibly even as a primary index rather than the UUID.)
I think part of this is that my brain is used to thinking about partitioning SQL data stores across shards, and I'm trying to latch onto those sharding keys as an organizing mechanism even though we don't need it for scale at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! I see the misunderstanding. No, there could be several providers in one group. The name is just unique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There can be several providers in one group, but we want to maintain the constraint that <this row>.group_id == <this row>.provider_id.<providers row>.group_id
, right? I'm not sure there's a good way to express that in SQL; I was suggesting that we could remove the <this row>.group_id
, or use group_id
and and provider_name
to avoid possible future database inconsistencies, e.g.
CREATE TABLE provider_access_tokens (
id SERIAL PRIMARY KEY,
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE,
provider TEXT NOT NULL,
owner_filter TEXT,
encrypted_token TEXT NOT NULL,
expiriation_time TIMESTAMP NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
FOREIGN KEY (group_id, provider) REFERENCES providers(group_id, name) ON DELETE CASCADE
);
@@ -154,7 +168,7 @@ CREATE TABLE artifact_versions ( | |||
|
|||
CREATE TABLE session_store ( | |||
id SERIAL PRIMARY KEY, | |||
provider TEXT NOT NULL, | |||
provider UUID NOT NULL REFERENCES providers(id) ON DELETE CASCADE, | |||
grp_id INTEGER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this group_id
as we'd generally conceive of it? (Maybe I shouldn't poke that sleeping bear...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, which in the near future will have to change to projects
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I specifically meant the spelling was different than elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, yeah, I don´t know why that is. We should probably change this in a separate PR.
internal/providers/providers.go
Outdated
@@ -32,13 +34,13 @@ import ( | |||
// instead of a concrete github client. | |||
func BuildClient( | |||
ctx context.Context, | |||
prov string, | |||
prov uuid.UUID, | |||
groupID int32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need groupID
here anymore.
Do we have a guess as to how many more database drops we have coming up? Should we plan for weekly drops until the second week of October, or is this the last one we forsee? |
Two drops based on providers, one drop based.on group to projects, then one on IAM. That's what I foresee. |
Thanks for the forecast on database drops. A few clarifications, but I'm willing to drop my required change. |
Agreed on number of times we drop the database.
a5c215e
to
53d5a0c
Compare
53d5a0c
to
2166962
Compare
I talked with @yrobla and ended up changing the logic for the RevokeOAuthTokens call to what it intended to be with this providers concept. |
a1de46c
to
7392ff8
Compare
acf4172
to
cb3e901
Compare
(Let me know when you want another review) |
Now 😃 |
@jhrozek could you also take a look at this one when you get a chance? Would really use as many eyes as possible here. |
Sure, thanks for the reminder! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really find any issues during code review. I only left one comment.
|
||
-- Create default GitHub provider | ||
INSERT INTO providers (name, group_id, implements, definition) | ||
VALUES ('github', 1, ARRAY ['github', 'git', 'rest']::provider_type[], '{}'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question and something that can be fixed in a migration later on - why did you choose this particular set for GitHub? I understand github
and git
, but what would we use rest
for here? Also, did you consider adding oci
since ghcr.io is tied to GitHub and we already support it (although in kind of a tacky way) for artifacts?
Not saying this is wrong, I'm just trying to picture how would other providers look like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't really see full-straight forward support for an oci
provider from github. I can add it once I work on separating implementations from interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, but LGTM
@@ -115,7 +132,7 @@ CREATE TABLE signing_keys ( | |||
CREATE TABLE repositories ( | |||
id SERIAL PRIMARY KEY, | |||
provider TEXT NOT NULL, | |||
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to keep this REFERENCES
foreign key constraint? I suppose it's not strictly needed...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's part of the group_id
+ provider
foreign key. Are you suggesting we re-add it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also a foreign key into the groups
table, but we can rely on the transitive ON DELETE CASCADE
from providers
if we prefer.
@@ -316,20 +306,27 @@ func (s *Server) parseGithubEventForProcessing( | |||
return nil | |||
} | |||
|
|||
return s.parseRepoEvent(context.Background(), ghclient.Github, payload, msg) | |||
return s.parseRepoEvent(context.Background(), payload, msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm -- the way we've done this right now, we don't pass pull_request
or package
type events through to s.parseRepoEvent
. Not to fix here, but it feels like we should be using watermill's built-in fanout, as tested here:
https://github.com/stacklok/mediator/blob/main/internal/events/eventer_test.go#L88
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, we probably need to revisit this and reverify the logic. Right now what we do in each function is build a message with appropriate metadata for the event handler. Each branch will have a little more metadata needed, but I guess most of this could be refactored into pieces to be more readable.
|
||
// need to read all tokens from the provider and revoke them | ||
tokens, err := s.store.GetAccessTokenByProvider(ctx, ghclient.Github) | ||
// This is in case of a security breach, where we need to revoke all tokens |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this make more sense to execute as raw SQL if that happens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that you raise this, I think I would prefer this to be part of an administrator's playbook instead of a function an admin can do via the control plane. This merely documented what the feature was intended for and made it work with the newer logic. Let me raise a bug to remove this altogether in further iterations.
This does indeed remove the tokens from the DB, but it also calls the GH API to invalidate the token, so that part might need a nice utility. We could have it as a separate sub-command or a separate container an admin could leverage.
if err != nil { | ||
return nil, status.Errorf(codes.InvalidArgument, "cannot infer group id") | ||
} | ||
in.GroupId = group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You could put in.GroupId
up in the assignment on line 263 if you want, since changes to the request message will be ignored when you return an error.
if existingRepo.GroupID != groupId { | ||
fmt.Println("got request to sync repository of different group. Skipping") | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At some point, we should record a metric here, but feel free to hold off for a future PR.
This simply replaces the usages of the `github` provider checks and instead provides a simple implementation of providers that will be expanded upon. The intent is to leave the same functionality as-is, but instead rely on a provider that's installed on every organization by default. I've left traces of how the implementation will evolve by adding a definition and a `providers_type` column to the providers database. These will serve as the means of instantiating providers dynamically, instead of always building the github one as we do today. But, to keep things short, I've left that out of this PR.
This removes the group information from entities in favor of relying on providers for data integrity.
This reverts commit 84ce748.
Co-Authored-By: Evan Anderson <evan@stacklok.com>
These prevent us from creating multiple objects which is not the desired outcome.
cb3e901
to
43ced75
Compare
Rebased and addressed some of the comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nit, but noticed that on some places you missed using the providerError
(and used its code representation) when searching for a provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most if not all comments have been addressed and the PR is now large enough that it makes sense to just keep on going with incremental PRs.
@@ -115,7 +132,7 @@ CREATE TABLE signing_keys ( | |||
CREATE TABLE repositories ( | |||
id SERIAL PRIMARY KEY, | |||
provider TEXT NOT NULL, | |||
group_id INTEGER NOT NULL REFERENCES groups(id) ON DELETE CASCADE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's also a foreign key into the groups
table, but we can rely on the transitive ON DELETE CASCADE
from providers
if we prefer.
This simply replaces the usages of the
github
provider checks and insteadprovides a simple implementation of providers that will be expanded upon.
The intent is to leave the same functionality as-is, but instead rely on a
provider that's installed on every organization by default.
I've left traces of how the implementation will evolve by adding a definition
and a
providers_type
column to the providers database. These will serve as the meansof instantiating providers dynamically, instead of always building the github one
as we do today. But, to keep things short, I've left that out of this PR.
Related-to: stacklok/epics#34