Add a namespace field to tuf metadata storage #1110
Conversation
@@ -0,0 +1,18 @@
CREATE TABLE `namespaces` (
Overall I think this looks pretty good. I have two thoughts, one of which may not be immediately relevant:
Working on the RDB tests and will update!
What about changing the table name from
How about
I changed the constraints to include the
We might be talking about different things, but the plan was to make the For example:
That's the rough plan for pending metadata for thresholding, were you thinking of a different type of "pending" metadata?
Ah thanks for the explanation. I thought the "namespaces" (I'll stick with the term for now) would be more like "pending_signatures", "pending" (has enough signatures but maybe needs a final OK or for, say, a periodic publish to happen) and "published", rather than referencing a signer by one means or another (name/keyID).

If that's the plan, I don't think it even makes sense to have a table for them. Would we write every new "namespace" into that table to satisfy the foreign key relationship? What's the point of that (and of having the foreign key relationship) if new "namespaces" can be dynamically created (by creating new delegations with thresholds) and we'll just update the table when we see them? I don't think it would actually provide any meaningful guarantees.

I just re-read your initial comments and I think it might be worth addressing some of the ideas you had for future uses of namespaces:
Notary doesn't care if you're signing docker images or python packages, it's just a TUF repo. GUNs are already meant to indicate ownership/context, hence all docker hub images are signed under
The ORM (gorm) already supports soft deletes. It's only our uniqueness constraints that are preventing us from using it. Additionally, one would still run into the problem if, say, they deleted the same role/version twice at different points in time, so this only half-solves the problem in its most basic form.
Those views are achieved client side. To make it work as you describe, I believe it would require multiple root files referencing different targets keys. PEP 480 doesn't describe this structure as far as I remember (and having taken another brief scan); instead it would be on a user to say "I know packages in the 'claimed' bucket are more secure so I only want to use those" (something the user would still have to do even if we maintained a special view server side).
That pretty much has to be the plan - it's been a while, but if you read back through #841 (and related linked discussions: https://github.com/theupdateframework/tuf/issues/339), we hashed out how the coordination would have to work to avoid DOSing and/or bad things in the case of a single key compromise. In the case where you have only 1 staged copy at a time, the compromise of a single delegation key allows someone to continually clobber the staged metadata. (e.g. if you, cyli and I all sign for some metadata, I can continue to push garbage "staged" metadata and the two of you would have to race me to actually hit a 2/3 threshold!)

As for utility, I was thinking having a separate table + fk constraint would be useful for garbage collecting? You can construct a list of candidate namespaces by querying the namespace table. I'm not 100% sold on the utility so I can drop it for now, we can always add it in the future if needed.

I'll update to address all of this.
@ecordell sorry, maybe I wasn't clear. My issue isn't with the process, only the implementation (the utility, as you put it). It doesn't make sense to have the extra table unless the foreign key constraint is enforceable, but for it to be enforceable in a meaningful way, it also needs to be a static set of possibilities. If we're just creating namespaces dynamically, there's no point in having it; it's just not giving us anything.
@endophage updated so that there's simply a field with no fk, and no extra table.
Other than still really disliking the name of the new
working on a new design that should address both the naming concerns and also make this more flexible (realized that this one won't easily support some of the queries we'd want to do, e.g. "if there's staged metadata, give me the version with the most signatures")
Okay so this took a bit longer than planned thanks to some fun gorm issues, but here's the revised design:
This lets us do nice queries like "staged metadata for GUN/Role with the most signatures" or "staged metadata signed by X but not yet signed by Y or Z", which we will need when presenting these states, eventually, to the user. You'll notice that compared to the

I checked all of the queries being generated in SQL and they look good. The joins are minimal and use indexes.

I've also updated gorm to v1.0 and made the necessary changes. The older version that we were using before was generating bad SQL when using a join table.

What is the status of RethinkDB? I was going to add support for that but noticed the changefeed isn't implemented there either. Is it deprecated?
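(Purely as an illustration, not code from this PR: roughly the kind of SQL the join-table design enables, assuming the tuf_files / channels / channels_tuf_files tables from the migrations here, a name column on channels, and treating each non-published channel a file sits in as one signer's staged copy; the GUN and role literals are placeholders.)

-- Sketch only: the staged copy of a role carrying the most signer channels.
SELECT tf.id, tf.version, COUNT(DISTINCT ctf.channel_id) AS signer_channels
FROM tuf_files tf
JOIN channels_tuf_files ctf ON ctf.tuf_file_id = tf.id
JOIN channels c ON c.id = ctf.channel_id
WHERE tf.gun = 'example.com/placeholder'
  AND tf.role = 'targets'
  AND c.name <> 'published'
GROUP BY tf.id, tf.version
ORDER BY signer_channels DESC
LIMIT 1;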
cmd/notary-server/main_test.go
@@ -407,7 +407,7 @@ func TestGetGUNPRefixes(t *testing.T) {
 // For sanity, make sure we can always parse the sample config
 func TestSampleConfig(t *testing.T) {
 	var registerCalled = 0
-	_, _, err := parseServerConfig("../../fixtures/server-config.json", fakeRegisterer(&registerCalled), false)
+	_, _, err := parseServerConfig("../../fixtures/server-config.sqlite.json", fakeRegisterer(&registerCalled), false)
gorm 1.0 errors if it can't establish a connection immediately, so I switched to sqlite for testing config parsing.
I only consider the check in isValidTargetsStructure to be a necessary change. The rest are questions/future minor polish, but I don't consider them blockers.
 // GetChecksum returns the given TUF role file and creation date for the
 // GUN with the provided checksum. If the given (gun, role, checksum) are
 // not found, it returns storage.ErrNotFound
-GetChecksum(gun data.GUN, tufRole data.RoleName, checksum string) (created *time.Time, data []byte, err error)
+GetChecksum(gun data.GUN, tufRole data.RoleName, checksum string, channels ...*Channel) (created *time.Time, data []byte, err error)
Checksum is sufficiently specific that I wonder if it's necessary to also pass a channel?
The same checksum can live in multiple channels when staging metadata
Ah of course!
@@ -33,17 +33,17 @@ type MetaStore interface {
 // GetCurrent returns the modification date and data part of the metadata for
 // the latest version of the given GUN and role. If there is no data for
 // the given GUN and role, an error is returned.
-GetCurrent(gun data.GUN, tufRole data.RoleName) (created *time.Time, data []byte, err error)
+GetCurrent(gun data.GUN, tufRole data.RoleName, channels ...*Channel) (created *time.Time, data []byte, err error)
Should GetCurrent implicitly mean published, and maybe there should be a new method GetLatest that takes a channel?
The current behavior is that GetCurrent(gun, role) uses the published channel, and GetCurrent(gun, role, channel) uses the specified channel. I'm happy to add another wrapper method but that seems not dissimilar from your suggestion.
Ahhh, yeah, this is fine. I had a moment of forgetfulness that a ... can also be left empty by the caller.
It seems like in all the implementations, if multiple channels are passed, only the data from the first channel in which the data is found is returned (the channels are checked in order of preference). Should this behavior be documented as a requirement in the interface, and a general test be added (maybe in server/storage/storage_test.go) to make sure that even if a TUF file with the same GUN and role is in multiple channels, this function will only return the data from the first?

Similarly for GetVersion. I guess for GetChecksum it doesn't really matter: if it's the same checksum, it doesn't matter which channel it's in because all the data will be the same, although the creation time may be different. If the creation time matters, then perhaps a similar test can be added for GetChecksum?
server/storage/memory.go
}

func checksumKey(gun data.GUN) string {
	return fmt.Sprintf("%s", gun)
If we really need this function at all it should just be return gun.String()
server/storage/sql_models.go
}

// Published is the channel all fully signed, validated metadata lives in
var Published = Channel{
use a var block:

var (
	Published = ...
	Staged    = ...
)
tuf/data/targets.go
@@ -43,7 +42,7 @@ func isValidTargetsStructure(t Targets, roleName RoleName) error {
 }

 for _, roleObj := range t.Delegations.Roles {
-	if !IsDelegation(roleObj.Name) || path.Dir(roleObj.Name.String()) != roleName.String() {
+	if !IsDelegation(roleObj.Name) {
The longer check should stay in. It's checking that all the delegations are direct children of the parent, i.e. if the targets.json file contains a "targets/foo/bar" delegation, that would be invalid.
I was trying to get away from the strict hierarchy, looking forward to multi-role delegation. The DFS for delegations just needs to check for a cycle and otherwise doesn't really need to care about the names. Is there a reason for enforcing the hierarchy that I'm missing?
Strict hierarchy is required by the spec, I believe. They simplified it for the 1.0 release such that the delegation entry now only contains its own segment and not the whole path. To implement that, though (and I really want to see that change get in), I'd rather we approach it specifically rather than get there piecemeal.
LGTM!
@ecordell Looking now, although apologies, it will take me a little bit to re-familiarize myself :)
Apologies, I haven't finished reviewing everything yet - I got sidetracked. I just wanted to include a couple of comments first - and I'll do another pass tomorrow.
I had a non-blocking question about garbage collection, should we choose to implement it - how do we want to implement that? It can totally be done in a different PR in the future, but I just wanted to make sure we considered it so that we didn't make a change that we'd have to undo in order to support it.
Would it be like "find every tuf_file ID that is not matched to the Published channel, and for each one, delete every other entry in tuf_file with the same gun, role, and version but a different ID"?
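(For illustration only, a rough SQL sketch of one simple form of the garbage-collection idea discussed here - not part of this PR. It assumes the channels_tuf_files table from the migrations in this PR, a created_at column on tuf_files, a Published channel whose id is 1, and MySQL date syntax; the cutoff matches the "configurable age" idea mentioned below.)

-- Delete any tuf_file that is not in the Published channel and is older than some cutoff.
DELETE FROM tuf_files
WHERE id NOT IN (
    SELECT tuf_file_id FROM channels_tuf_files WHERE channel_id = 1
)
AND created_at < NOW() - INTERVAL 30 DAY;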
@@ -0,0 +1,25 @@
ALTER TABLE "tuf_files" DROP CONSTRAINT "tuf_files_gun_role_version_key";
Dumb SQL question: is this constraint automatically created in 0001_initial.up.sql when we specify UNIQUE ("gun", "role", "version")? I didn't see this particular name anywhere in our repo, so was just wondering if this was something postgres-specific.
Yep, exactly. It's created by UNIQUE in postgres.
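(A simplified illustration, not the actual 0001_initial.up.sql, which has more columns: Postgres derives a default name of the form <table>_<columns>_key for a table-level UNIQUE constraint, so a definition like this produces the constraint being dropped above.)

CREATE TABLE "tuf_files" (
    "id" serial PRIMARY KEY,
    "gun" varchar(255) NOT NULL,
    "role" varchar(255) NOT NULL,
    "version" integer NOT NULL,
    UNIQUE ("gun", "role", "version")
);
-- implicitly creates the unique constraint named "tuf_files_gun_role_version_key"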
Cool thanks for explaining!
Role     data.RoleName
Version  int
Data     []byte
Channels []*Channel
It looks like, from the memory implementation (I am not very familiar with advanced SQL syntax like joins and grouping, so I hadn't gotten to looking at the SQL implementation yet :)), that if multiple channels are passed, the same data will be added to all the channels.

Out of curiosity, what is the use case for staging the same data to multiple channels at once? In the case of published and staging, you probably don't need to write to both. In the case where each channel is a particular user or key ID, it also seems strange to write to multiple key IDs. Is the use case writing to both staging and a particular key ID? (I was a little fuzzy on how that would work for threshold signing.)

If that's the case, and adding the same update should apply to every channel in the channel list (as opposed to the Get* functions, for which if multiple channels are passed only the data from the first channel that gun and role are in will be returned), should this be documented in the storage interface's UpdateCurrent function declaration? And in server/storage/storage_test.go's testUpdateCurrentInChannel, can we update multiple channels at the same time and ensure that getting for both channels returns the same data?
Is the use case writing to both staging and a particular key ID? (I was a little fuzzy on how that would work for threshold signing.)

Yeah, that was the plan. You upload metadata and it gets put into channels based on the keys it's signed with, then users can coordinate about which ones to promote by picking the versions they want to sign. Then you can get "all staged" by just querying the staging channel, or a particular staged version by querying for (staged, key_id).

(as opposed to the Get* functions, for which if multiple channels are passed only the data from the first channel that gun and role are in will be returned)

I think you're just seeing the deficiencies of the in-memory implementation. For testing purposes I didn't think it was necessary to try to replicate everything that SQL gives you, but maybe it is worth the effort?

The SQL implementation gives you what you would expect from the interfaces: when you Get from a set of channels, the file must exist in all of them, and if you Update to a set of channels, the metadata will be placed in all of them.
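(As a concrete, hypothetical illustration of the "all staged" lookup described above - assuming the tables from this PR's migrations, a name column on channels, and placeholder GUN/role values.)

-- All staged copies of a role for a GUN, regardless of which signer channels they are also in.
SELECT tf.id, tf.version, tf.data
FROM tuf_files tf
JOIN channels_tuf_files ctf ON ctf.tuf_file_id = tf.id
JOIN channels c ON c.id = ctf.channel_id
WHERE tf.gun = 'example.com/placeholder'
  AND tf.role = 'targets'
  AND c.name = 'staged';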
(The issue with the in-memory implementation is the answer to your other comment as well, for some reason GH won't let me reply to it)
for some reason GH won't let me reply to it

I think that's because I started a review, but in my "review" I replied to someone else's comment, and github posts my response both as part of my review and as part of the original comment. :| You might be able to respond in the original comment.

The SQL implementation gives you what you would expect from the interfaces: when you Get from a set of channels, the file must exist in all of them, and if you Update to a set of channels, the metadata will be placed in all of them.

Ah ok, thanks for explaining. Apologies, I had misunderstood the intended behavior of Get* in the SQL implementation due to the fact that I missed the PRIMARY KEY (tuf_file_id, channel_id) in the channel table creation SQL, and because I just saw Where("channel_id IN (?)", channelIDs) in TufFilesInChannels without reading further.

I think you're just seeing the deficiencies of the in-memory implementation. For testing purposes I didn't think it was necessary to try to replicate everything that SQL gives you, but maybe it is worth the effort?

I think the intention of the in-memory implementation is to provide basically a verified fake for others to use, and as a reference implementation for how everything else works. I'm ok with leaving it for now and we can address that in a future PR, so long as we document the deficiency.

I do think we should add more test cases for getting data from multiple channels, though, to be shared across the implementations of the storage interface (e.g. server/storage/storage_test.go, although the in-memory implementation tests can just not call that particular function), and document this expected behavior in the interface, just because it can be easier to see what the intended behavior is if someone (like me :D) isn't super familiar with SQL joins. I'd also be afraid of making a change in that SQL implementation code and breaking it!
That's what I was thinking, probably with a configurable age for deleting them.
@@ -115,7 +115,7 @@ func (rdb RethinkDB) UpdateCurrent(gun data.GUN, update MetaUpdate) error {

 // UpdateCurrentWithTSChecksum adds new metadata version for the given GUN with an associated
 // checksum for the timestamp it belongs to, to afford us transaction-like functionality
-func (rdb RethinkDB) UpdateCurrentWithTSChecksum(gun, tsChecksum string, update MetaUpdate) error {
+func (rdb RethinkDB) UpdateCurrentWithTSChecksum(gun data.GUN, tsChecksum string, update MetaUpdate) error {
Thanks for catching this! 👍
// Find tuf files matching all passed channels
return func(db *gorm.DB) *gorm.DB {
	return db.Joins("INNER JOIN channels_tuf_files ON tuf_files.id = channels_tuf_files.tuf_file_id").
		Where("channel_id IN (?)", channelIDs).
I am likely just misunderstanding how this works due to lack of familiarity, but would this only enforce that the rows were in the right number of channels, as opposed to the channels specified? For instance, if there were 3 channels (Alice, Bob, and Carol), and I wanted to get the latest metadata for GUN and Timestamp that was in both Alice and Carol's channels, but the database had:

- id=5, GUN, Timestamp, channel=Alice (via the channel_tuf_files table)
- id=5, GUN, Timestamp, channel=Bob (via the channel_tuf_files table)

Would the scope consider that good because the id=5 is in one of (Alice, Carol), and it is also in 2 distinct channels (Alice, Bob)?
Sorry, gorm makes this harder to read than it should be.

The scope says:

- Ignore any roles that aren't in the set of requested channels
- Count how many distinct channels a tuf file is in
- Because (tuf_file_id, channel_id) is unique, if the number of unique channels == the number of requested channels, the tuf file is in all requested channels.

For your specific example it would be something like this:

- Query by GUN, TS in the Alice and Carol channels
- tuf_files is joined with channel_tuf_files
- All rows in the joined table with channel_id != Alice or Carol are discarded
  - this is the part that specifically answers your question - there is no longer a (5, GUN, Timestamp, Bob) row in the result set
- The result set only has (5, Carol)
- COUNT of that set is 1
- 2 channels were requested, so the result set returns empty
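(In plain SQL, the scope described above is roughly equivalent to the following hand-written approximation - not the exact gorm output; the channel ids and GUN/role values are placeholders.)

SELECT tf.id
FROM tuf_files tf
JOIN channels_tuf_files ctf ON ctf.tuf_file_id = tf.id
WHERE ctf.channel_id IN (10, 12)              -- Alice and Carol; rows for Bob's channel are dropped here
  AND tf.gun = 'GUN' AND tf.role = 'timestamp'
GROUP BY tf.id
HAVING COUNT(DISTINCT ctf.channel_id) = 2;    -- must appear in every requested channel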
Ah ok, thanks for explaining. I didn't realize it ran the query for every channel in the Where() and then joined.
}

// The Channel is added separately from the TUFFile record to prevent gorm from issuing an Update
if err = tx.Model(&tufFile).Association("Channels").Append(update.Channels).Error; err != nil {
Apologies, the Append implementation in the Gorm code is kinda opaque - I think this will not error if a relationship already exists, since I think it eventually does an INSERT ... WHERE NOT EXISTS ..., but would it be possible to add a test case for when the metadata already exists in one of the channels specified, but not in the others?

Just to make sure this behavior is consistent should we ever choose to switch ORMs, or upgrade, etc.
Good point, I will add a test for it.
Getting gorm to generate the right SQL for this PR took me the better part of a day; I was almost ready to rip it out and replace it...but of course that would've made this much harder to review.
@@ -127,8 +164,9 @@ func (db *SQLStorage) UpdateMany(gun data.GUN, updates []MetaUpdate) error {
 // called, version ordering in the updates list must be enforced
Totally non-blocking observation, unrelated to this PR - can we just sort the updates and then call UpdateCurrent? cc @endophage
	},
).First(&row)
}).
	Order("version desc").Limit(1).First(&row)
Non-blocking: I don't think it hurts anything, but is ordering necessary here? If the version is different, won't the checksum be different?
I agree, I see no reason this is needed and can't remember why I changed it. It's potentially just a copy/paste error.
@@ -241,6 +289,7 @@ func (db *SQLStorage) Delete(gun data.GUN) error {
 }
 // if there weren't actually any records for the GUN, don't write
 // a deletion change record.
+// don't write a change if this isn't published yet
Was this supposed to correspond to some channel logic? (e.g. don't delete if the data isn't in the published channel?) If we're deleting everything in a GUN, should even staged metadata be deleted (since it probably won't be published in the future?)
heh, yes this is a straggling comment - originally I scoped the delete but realized, as you point out, that when we delete a GUN we want to delete all staged copies as well
-exists := db.Where("gun = ? and role = ? and version >= ?",
-	gun.String(), update.Role.String(), update.Version).First(&TUFFile{})
+exists := db.Scopes(TufFilesInChannels(update.Channels...)).
Non-blocking observations - I don't think any of these things are problems if we have garbage collection, but I just wanted to note them in case I missed something.

If we're updating the non-published channels, it's possible to add much older staged versions of metadata (if there was no staged metadata for that version or higher). Any attempt to publish this old staged metadata will fail anyway, so not an issue.

If someone publishes really high version metadata in a non-published channel, that will only prevent users from staging thresholded metadata for a limited amount of time, because it does not necessarily correspond to something in published and will be garbage collected eventually.
I think these are very good points, and we should add additional checks for those cases when actually implementing the threshold support.
CREATE TABLE `channels_tuf_files` (
	`channel_id` INT(11) NOT NULL,
	`tuf_file_id` INT(11) NOT NULL,
	FOREIGN KEY (channel_id) REFERENCES channels(`id`) ON DELETE CASCADE,
Out of curiosity, does gorm automatically add this to both the mysql and postgres table generation? (didn't see something in the model that corresponded to this directive)
I don't think so, I didn't use gorm to generate the table or migration
Sorry for taking so long to review this @ecordell! This looks pretty good to me - my only 2 concerns are:
I also had a question for the future thresholding implementations: will there be an API to create more channels? To delete channels? If channels correspond to key IDs, will they be auto-reaped if the key is removed?
That's open for discussion, but IMO if possible we should keep the channels as an implementation detail and hide them from the user. Exposing them might require more careful consideration and possibly a TAP?
Makes sense - seems like hiding from the user makes the CLI/API simpler, too.
This PR adds “channels” to tuf metadata. A Published and Staged channel are added, and the migration puts all existing metadata into the Published channel.

Motivation

The primary motivation for introducing namespaces is to support “staged” views of metadata, so that a notary server can coordinate threshold signing between remote users.

As discussed in #841, it’s insufficient to have a single “staged” copy of metadata due to race conditions. There will be a maximum of N possible staged copies, where N is the number of users allowed to sign a particular metadata file. The limit on those users also serves as protection against DOSing by uploading multiple staged versions (each signer gets a max of 1 staged copy).

This is just the first step towards that, to keep things manageable.

Design decisions

This is not user-facing; users do not get to decide what channel their metadata lives in. For this reason the namespace is not exposed to the client. Channel decisions will be made on the server during the update process.

A new channels table is added to allow channels to be easily added/removed. The expectation is that a few (e.g. Published and Staged) will be fairly static, but additional channels will come and go. For example, there may be a new channel for each key that can sign metadata (and it will only exist during the staging process).

The Changefeed only publishes changes in the Published channel for now. It may make sense to include other channels at some point, or to provide channel-specific changefeeds, but this first version preserves the existing behavior.
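(A condensed, illustrative sketch of what the migration described above does - MySQL flavor; the actual migration files in this PR differ in details such as column sizes, the specific channel ids/names, and additional indexes.)

CREATE TABLE `channels` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(255) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE (`name`)
);

INSERT INTO `channels` (`id`, `name`) VALUES (1, 'published'), (2, 'staged');

CREATE TABLE `channels_tuf_files` (
    `channel_id` INT(11) NOT NULL,
    `tuf_file_id` INT(11) NOT NULL,
    PRIMARY KEY (`tuf_file_id`, `channel_id`),
    FOREIGN KEY (`channel_id`) REFERENCES `channels` (`id`) ON DELETE CASCADE,
    FOREIGN KEY (`tuf_file_id`) REFERENCES `tuf_files` (`id`) ON DELETE CASCADE
);

-- Existing metadata all becomes "published".
INSERT INTO `channels_tuf_files` (`channel_id`, `tuf_file_id`)
SELECT 1, `id` FROM `tuf_files`;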