
Add a namespace field to tuf metadata storage #1110

Closed
wants to merge 7 commits into from

Conversation

ecordell
Contributor

@ecordell ecordell commented Mar 1, 2017

This PR adds “channels” to tuf metadata. A Published and Staged channel are added and the migration puts all existing metadata into the Published channel.

Motivation

The primary motivation for introducing namespaces is to support “staged” views of metadata, so that a notary server can coordinate threshold signing between remote users.

As discussed in #841, a single “staged” copy of metadata is insufficient due to race conditions. There will instead be a maximum of N staged copies, where N is the number of users allowed to sign a particular metadata file. Limiting staged copies to those users also protects against DoS via uploading many staged versions (each signer gets at most one staged copy).

This is just the first step towards that, to keep things manageable.

Design decisions

This is not user-facing; users do not get to decide which channel their metadata lives in. For this reason the namespace is not exposed to the client. Channel decisions will be made on the server during the update process.

A new channels table is added to allow channels to be easily added/removed. The expectation is that a few (e.g. Published and Staged) will be fairly static, but additional Channels will come and go. For example, there may be a new channel for each key that can sign metadata (which will only exist during the staging process).

The Changefeed only publishes changes in the Published channel for now. It may make sense to include other channels at some point, or to provide channel-specific changefeeds, but this first version preserves the existing behavior.
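For reference, here is a minimal sketch of what the channel model might look like in Go. The struct fields, tags, and string values below are illustrative assumptions; only the existence of the Published and Staged channels is established by this PR.

package storage

// Channel is a named view of TUF metadata (sketch; the real fields may differ).
type Channel struct {
	ID   uint   `gorm:"primary_key"`
	Name string `sql:"type:varchar(255);not null;unique"`
}

// Default channels created by the migration: existing metadata is moved into
// Published, and Staged is reserved for the future threshold-signing work.
var (
	Published = Channel{ID: 1, Name: "published"}
	Staged    = Channel{ID: 2, Name: "staged"}
)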

@ecordell ecordell force-pushed the namespaced-metadata branch from 5c25351 to a1e369a on March 1, 2017 16:38
@ecordell ecordell force-pushed the namespaced-metadata branch 2 times, most recently from 3e7c864 to 5615777 on March 1, 2017 16:53
@@ -0,0 +1,18 @@
CREATE TABLE `namespaces` (
Contributor

@endophage endophage Mar 1, 2017

YAAAAASSSS! I was just explaining this pattern to @riyazdf and @cyli a month or so ago :-D

(the pattern being a single column table rather than an enum)

@endophage
Contributor

Overall I think this looks pretty good. I have two thoughts, one of which may not be immediately relevant:

  1. is "namespace" the right terminology? As a precursor towards thresholding and staged metadata, it's really a "state" or "stage" of the publishing process.
    a. a sub point, I think we should be more explicit about the consts we use for the value. "default" doesn't tell me anything. Is "default" staged or published. I know which from reading the code, but I had to read more code than should be necessary to determine it. I'd say the default should be "published" and the const itself should be something like "FooPublished" where "Foo" is whatever we settle on as far as "Namespace", "Stage", "State", or something else.
  2. (possibly for later) We need to take another look at the DB indices. I think the current implementation will prevent multiple iterations of the same role/version existing at the same time. This is desirable for published metadata, I can't remember what/if we settled on what would happen with pending metadata (I think I would be OK with only allowing one staged thing but want to make sure we've hashed out that experience.

@ecordell
Contributor Author

ecordell commented Mar 1, 2017

Working on the RDB tests and will update!

is "namespace" the right terminology? As a precursor towards thresholding and staged metadata, it's really a "state" or "stage" of the publishing process.

What about changing the table name from namespaces to labels, and changing the namespace field to state_label? Then additional labels can be added in the future to address some of the other use-cases of namespaces I suggested.

a sub point, I think we should be more explicit about the consts we use for the value. "default" doesn't tell me anything.

How about published as you suggested, or perhaps canonical or official?

We need to take another look at the DB indices. I think the current implementation will prevent multiple iterations of the same role/version existing at the same time.

I changed the constraints to include the namespace field, which means you can have the same role/version in multiple records as long as they have different namespaces.

This is desirable for published metadata; I can't remember what/if we settled on what would happen with pending metadata (I think I would be OK with only allowing one staged thing, but want to make sure we've hashed out that experience).

We might be talking about different things, but the plan was to make the namespace (or state_label) be used for pending metadata.

For example:

  • You and I are both signers for targets/releases.json with a threshold of 2.
  • I sign a new version and push it up
  • The notary server detects that the threshold is <2 and stashes it with namespace = ecordell (or perhaps namespace = <my_kid> would make more sense)
  • If you push up conflicting metadata, it would get stashed under namespace = endophage
  • Or if you pull my staged metadata, sign it, and push it up, it would be promoted to the default/canonical namespace and the stashed namespace = ecordell copy would be garbage collected.

That's the rough plan for pending metadata for thresholding, were you thinking of a different type of "pending" metadata?

@ecordell ecordell force-pushed the namespaced-metadata branch 2 times, most recently from 823d196 to 011654d on March 2, 2017 13:11
@endophage
Contributor

Ah thanks for the explanation. I thought the "namespaces" (I'll stick with the term for now) would be more like "pending_signatures", "pending" (has enough signatures but maybe needs a final OK or for, say, a periodic publish to happen) and "published", rather than referencing a signer by one means or another (name/keyID). If that's the plan, I don't think it even makes sense to have a table for them. Would we write every new "namespace" into that table to satisfy the foreign key relationship? What's the point of that (and of having the foreign key relationship) if new "namespaces" can be dynamically created (by creating new delegations with thresholds) and we'll just update the table when we see them? I don't think it would actually provide any meaningful guarantees.

I just re-read your initial comments and I think it might be worth addressing some of the ideas you had for future uses of namespaces:

Namespaces could be used to share a single notary server between multiple classes of objects. For example, a “container” namespace and a “python package” namespace.

Notary doesn't care if you're signing Docker images or Python packages; it's just a TUF repo. GUNs are already meant to indicate ownership/context, hence all Docker Hub images are signed under docker.io/* and I would expect Python packages to be signed under something like pypi.org/*.

Could be used to “stash” a set of metadata when a user performs a potentially destructive operation (e.g. deleting a root) so that it can be recovered later.

The ORM (gorm) already supports soft deletes. It's only our uniqueness constraints that are preventing us from using it. Additionally, one would still run into the problem if, say, they deleted the same role/version twice at different points in time, so this only half-solves the problem in its most basic form.

Could be used to store different “views” of metadata a la the PyPI maximum security model. (A user could decide to look for targets only in the tree rooted at the offline key, or they could request from the online key as well).

Those views are achieved client side. To make it work as you describe, I believe it would require multiple root files referencing different targets keys. PEP 480 doesn't describe this structure as far as I remember (and having taken another brief scan); instead it would be on the user to say "I know packages in the 'claimed' bucket are more secure so I only want to use those" (something the user would still have to do even if we maintained a special view server side).

@ecordell
Contributor Author

ecordell commented Mar 2, 2017

If that's the plan, I don't think it even makes sense to have a table for them. Would we write every new "namespace" into that table to satisfy the foreign key relationship? What's the point of that (and of having the foreign key relationship) if new "namespaces" can be dynamically created (by creating new delegations with thresholds) and we'll just update the table when we see them? I don't think it would actually provide any meaningful guarantees.

That pretty much has to be the plan - it's been a while, but if you read back through #841 (and related linked discussions: https://github.com/theupdateframework/tuf/issues/339), we hashed out how the coordination would have to work to avoid DoSing and/or bad things in the case of a single key compromise. In the case where you have only 1 staged copy at a time, the compromise of a single delegation key allows someone to continually clobber the staged metadata. (e.g. if you, cyli, and I all sign for some metadata, I can continue to push garbage "staged" metadata and the two of you would have to race me to actually hit a 2/3 threshold!)

As for utility, I was thinking having a separate table + fk constraint would be useful for garbage collection? You can construct a list of candidate namespaces by querying the namespace table. I'm not 100% sold on the utility, so I can drop it for now; we can always add it in the future if needed.

I'll update to address all of this.

@endophage
Contributor

@ecordell sorry, maybe I wasn't clear. My issue isn't with the process, only the implementation (the utility, as you put it). It doesn't make sense to have the extra table unless the foreign key constraint is enforceable, but for it to be enforceable in a meaningful way, it also needs to be a static set of possibilities. If we're just creating namespaces dynamically, there's no point in having it; it's just not giving us anything.

@ecordell
Contributor Author

ecordell commented Mar 3, 2017

@endophage updated so that there's simply a field with no fk, and no extra table.

@ecordell ecordell force-pushed the namespaced-metadata branch 2 times, most recently from 7db65dc to a95d5fa on March 3, 2017 14:22
@endophage
Contributor

endophage commented Mar 4, 2017

Other than still really disliking the name of the new Namespace type, I think I'm good to go on this. @cyli @riyazdf @HuKeping any thoughts on the new type name? My concerns are that I don't think it really represents what the new field is about (i.e. the publish state, and somewhat which signer proposed the pending version), and also that "Namespace" gets thrown around so much I wouldn't want anybody to confuse it with some other use in a different Docker or associated project. For example, I think containerd is officially adopting our "canonical" GUNs and calling them "Namespaces".

@ecordell
Contributor Author

ecordell commented Mar 7, 2017

Working on a new design that should address both the naming concerns and also make this more flexible (I realized that this one won't easily support some of the queries we'd want to do, e.g. "if there's staged metadata, give me the version with the most signatures").

@ecordell
Contributor Author

ecordell commented Mar 14, 2017

Okay so this took a bit longer than planned thanks to some fun gorm issues, but here's the revised design:

  1. Instead of namespaces, I've added Channels which are the "alternate realities" of a TUF repository.
  2. There are two default Channels: Published and Staged
  3. Everything is transitioned to Published currently, and Staged is unused for now.
  4. Channels are many-to-many with TUFFiles, so when we add threshold signing, each signer will get their own channel.

This lets us do nice queries like "staged metadata for GUN/Role with the most signatures" or "staged metadata signed by X but not yet signed by Y or Z", which we will need when eventually presenting these states to the user. You'll notice that compared to the namespace design, the APIs for channels are much nicer (IMO).

I checked all of the queries being generated in SQL and they look good. The joins are minimal and use indexes. I've also updated gorm to v1.0 and made the necessary changes. The older version that we were using before was generating bad SQL when using a join table.
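For illustration, a channel-scoped lookup along the lines of the TufFilesInChannels scope used in this PR might be issued like this with gorm. This is a sketch, not the PR's exact code: the column names follow the migration shown further down, and Staged.ID is an assumed field.

// Latest staged version of a role for a GUN (sketch).
var row TUFFile
err := db.
	Joins("INNER JOIN channels_tuf_files ON tuf_files.id = channels_tuf_files.tuf_file_id").
	Where("channels_tuf_files.channel_id = ?", Staged.ID).
	Where("gun = ? AND role = ?", gun.String(), role.String()).
	Order("version desc").
	Limit(1).
	First(&row).Error
if err == gorm.ErrRecordNotFound {
	// no staged metadata exists for this GUN/role
}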

What is the status of RethinkDB? I was going to add support for that but noticed the changefeed isn't implemented there either. Is it deprecated?

@@ -407,7 +407,7 @@ func TestGetGUNPRefixes(t *testing.T) {
// For sanity, make sure we can always parse the sample config
func TestSampleConfig(t *testing.T) {
var registerCalled = 0
_, _, err := parseServerConfig("../../fixtures/server-config.json", fakeRegisterer(&registerCalled), false)
_, _, err := parseServerConfig("../../fixtures/server-config.sqlite.json", fakeRegisterer(&registerCalled), false)
Contributor Author

gorm 1.0 errors if it can't establish a connection immediately, so I switched to sqlite for testing config parsing.

@ecordell ecordell force-pushed the namespaced-metadata branch 2 times, most recently from 7ce2685 to 9527b8c on March 14, 2017 17:55
@ecordell ecordell force-pushed the namespaced-metadata branch from e36c6b7 to 184c434 on March 15, 2017 15:04
Contributor

@endophage endophage left a comment

I only consider the check in isValidTargetsStructure to be a necessary change. The rest are questions/future minor polish but I don't consider them blockers.


// GetChecksum returns the given TUF role file and creation date for the
// GUN with the provided checksum. If the given (gun, role, checksum) are
// not found, it returns storage.ErrNotFound
GetChecksum(gun data.GUN, tufRole data.RoleName, checksum string) (created *time.Time, data []byte, err error)
GetChecksum(gun data.GUN, tufRole data.RoleName, checksum string, channels ...*Channel) (created *time.Time, data []byte, err error)
Contributor

Checksum is sufficiently specific that I wonder if it's necessary to also pass a channel?

Contributor Author

The same checksum can live in multiple channels when staging metadata

Contributor

Ah of course!

@@ -33,17 +33,17 @@ type MetaStore interface {
// GetCurrent returns the modification date and data part of the metadata for
// the latest version of the given GUN and role. If there is no data for
// the given GUN and role, an error is returned.
GetCurrent(gun data.GUN, tufRole data.RoleName) (created *time.Time, data []byte, err error)
GetCurrent(gun data.GUN, tufRole data.RoleName, channels ...*Channel) (created *time.Time, data []byte, err error)
Contributor

Should GetCurrent implicitly mean published and maybe there should be a new method GetLatest that takes a channel?

Contributor Author

@ecordell ecordell Jun 7, 2017

The current behavior is that GetCurrent(gun, role) uses the published channel, and GetCurrent(gun, role, channel) uses the specified channel. I'm happy to add another wrapper method, but that seems not dissimilar from your suggestion.
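A minimal sketch of how that default is presumably applied inside each implementation (assumed, not taken from the PR):

// If the caller passes no channels, fall back to the Published channel.
func getChannels(channels ...*Channel) []*Channel {
	if len(channels) == 0 {
		return []*Channel{&Published}
	}
	return channels
}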

Contributor

Ahhh, yeah, this is fine. I had a moment of forgetfulness that a variadic ... can also be left empty by the caller.

Contributor

It seems like in all the implementations, if multiple channels are passed, only the data from the first channel in which it is found is returned (and the channels are checked in order of preference). Should this behavior be documented as a requirement in the interface, and a general test be added (maybe in server/storage/storage_test.go) to make sure that even if a TUF file with the same GUN and role is in multiple channels, this function will only return the data from the first?

Similarly for GetVersion. I guess for GetChecksum it doesn't really matter, since if it's the same checksum, the data will be the same no matter which channel it's in, although the creation time may differ. If the creation time matters, then perhaps a similar test can be added for GetChecksum?

}

func checksumKey(gun data.GUN) string {
return fmt.Sprintf("%s", gun)
Contributor

If we really need this function at all it should just be return gun.String()

}

// Published is the channel all fully signed, validated metadata lives in
var Published = Channel{
Contributor

use a var block

var (
    Published = ...
    Staged = ...
)

@@ -43,7 +42,7 @@ func isValidTargetsStructure(t Targets, roleName RoleName) error {
}

for _, roleObj := range t.Delegations.Roles {
if !IsDelegation(roleObj.Name) || path.Dir(roleObj.Name.String()) != roleName.String() {
if !IsDelegation(roleObj.Name) {
Contributor

the longer check should stay in. It's checking that all the delegations are direct children of the parent. i.e. if the targets.json file contains a "targets/foo/bar" delegation, that would be invalid.

Contributor Author

I was trying to get away from the strict hierarchy, looking forward to multi-role delegation. The DFS for delegations just needs to check for a cycle and otherwise doesn't really need to care about the names. Is there a reason for enforcing the hierarchy that I'm missing?

@endophage
Contributor

endophage commented Jun 6, 2017 via email

ecordell added 6 commits June 7, 2017 09:45
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
@ecordell ecordell force-pushed the namespaced-metadata branch from ea277f8 to fe55a08 on June 7, 2017 13:45
@ecordell ecordell force-pushed the namespaced-metadata branch from fe55a08 to e407047 on June 7, 2017 13:51
Signed-off-by: Evan Cordell <cordell.evan@gmail.com>
Contributor

@endophage endophage left a comment

LGTM!

@ecordell
Contributor Author

@cyli @riyazdf @HuKeping anyone else have some time to review? I will rebase on Monday!

@cyli
Contributor

cyli commented Jun 15, 2017

@ecordell Looking now, although apologies, it will take me a little bit to re-familiarize myself :)

Contributor

@cyli cyli left a comment

Apologies, I haven't finished reviewing everything yet - I got sidetracked. I just wanted to include a couple of comments first - and I'll do another pass tomorrow.

I had a non-blocking question about garbage collection, should we choose to implement it - how do we want to implement that? It can totally be done in a different PR in the future, but I just wanted to make sure we considered it so that we didn't make a change that we'd have to undo in order to support it.

Would it be like "find every tuf_file ID that is not matched to the Published channel, and for each one, delete every other entry in tuf_file with the same gun, role, and version but different ID"?

@@ -0,0 +1,25 @@
ALTER TABLE "tuf_files" DROP CONSTRAINT "tuf_files_gun_role_version_key";
Contributor

Dumb SQL question: Is this constraint automatically created in 0001_initial.up.sql when we specify UNIQUE ("gun", "role", "version")? I didn't see this particular name anywhere in our repo, so was just wondering if this was something Postgres-specific.

Contributor Author

Yep, exactly. It's created by UNIQUE in postgres.

Contributor

Cool thanks for explaining!


Role data.RoleName
Version int
Data []byte
Channels []*Channel
Contributor

It looks like, from the in-memory implementation (I am not very familiar with advanced SQL syntax like joins and grouping, so I hadn't gotten to looking at the SQL implementation yet :)), that if multiple channels are passed, the same data will be added to all the channels.

Out of curiosity, what is the use case for staging the same data to multiple channels at once? In the case of published and staging, you probably don't need to write to both. In the case where each channel is a particular user or key ID, it also seems strange to write to multiple key IDs. Is the use case writing to both staging and a particular key ID? (I was a little fuzzy on how that would work for threshold signing.)

If that's the case, and adding the same update should apply to every channel in the channel list (as opposed to the Get* functions, for which if multiple channels are passed only the data from the first channel that gun and role are in will be returned), should this be documented in the storage interface's UpdateCurrent function declaration? And in server/storage/storage_test.go's testUpdateCurrentInChannel, can we update multiple channels at the same time and ensure that getting for both channels returns the same data?

Contributor Author

Is the case for writing to both staging and a particular key ID? (I was a little fuzzy on how that would work for threshold signing).

Yeah, that was the plan. You upload metadata and it gets put into channels based on the keys it's signed with, then users can coordinate about which ones to promote by picking the versions they want to sign. Then you can get "all staged" by just querying the staging channel, or a particular staged version by querying for (staged, key_id).

(as opposed to the Get* functions, for which if multiple channels are passed only the data from the first channel that gun and role are in will be returned),

I think you're just seeing the deficiencies of the in-memory implementation. For testing purposes I didn't think it was necessary to try to replicate everything that sql gives you, but maybe it is worth the effort?

The SQL implementation gives you what you would expect from the interfaces: when you Get from a set of channels, the file must exist in all of them, and if you Update to a set of channels, the metadata will be placed in all of them.
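Putting those two behaviors together, here is a rough sketch of staging and then reading back a per-signer copy through the MetaStore interface. The per-key channel name and the surrounding variables (store, gun, signedJSON) are hypothetical.

// Hypothetical per-signer channel used while a threshold is not yet met.
signerChannel := &Channel{Name: "key:abcd1234"}

// Update places the metadata in every listed channel.
err := store.UpdateCurrent(gun, MetaUpdate{
	Role:     data.CanonicalTargetsRole,
	Version:  5,
	Data:     signedJSON,
	Channels: []*Channel{&Staged, signerChannel},
})
if err != nil {
	// handle the update error
}

// Get only succeeds if the file exists in all of the requested channels.
_, stagedBytes, err := store.GetCurrent(gun, data.CanonicalTargetsRole, &Staged, signerChannel)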

Contributor Author

(The issue with the in-memory implementation is the answer to your other comment as well, for some reason GH won't let me reply to it)

Contributor

for some reason GH won't let me reply to it

I think that's because I started a review, but in my "review" I replied to someone else's comment, and github posts my response both as part of my review and as part of the original comment. :| You might be able to respond in the original comment.

The SQL implementation gives you what you would expect from the interfaces: when you Get from a set of channels, the file must exist in all of them, and if you Update to a set of channels, the metadata will be placed in all of them.

Ah ok, thanks for explaining. Apologies, I had misunderstood the intended behavior of Get* in the SQL implementation because I missed the PRIMARY KEY (tuf_file_id, channel_id) in the channel table creation SQL, and because I just saw Where("channel_id IN (?)", channelIDs) in TufFilesInChannels without reading further.

I think you're just seeing the deficiencies of the in-memory implementation. For testing purposes I didn't think it was necessary to try to replicate everything that sql gives you, but maybe it is worth the effort?

I think the intention of the in-memory implementation is to provide basically a verified fake for others to use, and as a reference implementation for how everything else works. I'm ok with leaving it for now and we can address that in a future PR, so long as we document the deficiency.

I do think we should add more test cases for getting data from multiple channels, though, to be shared across the implementations of the storage interface (e.g. server/storage/storage_test.go, although in the in-memory implementation tests can just not call that particular function), and document this expected behavior in the interface, just because it can be easier to see what the intended behavior is if someone (like me :D) isn't super familiar with SQL joins. I'd also be afraid of making a change in that SQL implementation code and breaking it!

@ecordell
Contributor Author

Would it be like "find every tuf_file ID that is not matched to the Published channel, and for each one, delete every other entry in tuf_file with the same gun, role, and version but different ID"?

That's what I was thinking, probably with a configurable age for deleting them
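A sketch of what that collection pass could look like as a raw query; this is entirely hypothetical, and the cutoff and created_at column are illustrative assumptions.

// Hypothetical GC: delete rows that are not in the Published channel and are
// older than a configurable cutoff. Assumes tuf_files has a created_at column.
cutoff := time.Now().Add(-30 * 24 * time.Hour)
err := db.Exec(`DELETE FROM tuf_files
	WHERE created_at < ?
	AND id NOT IN (
		SELECT tuf_file_id FROM channels_tuf_files WHERE channel_id = ?
	)`, cutoff, Published.ID).Error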

@@ -115,7 +115,7 @@ func (rdb RethinkDB) UpdateCurrent(gun data.GUN, update MetaUpdate) error {

// UpdateCurrentWithTSChecksum adds new metadata version for the given GUN with an associated
// checksum for the timestamp it belongs to, to afford us transaction-like functionality
func (rdb RethinkDB) UpdateCurrentWithTSChecksum(gun, tsChecksum string, update MetaUpdate) error {
func (rdb RethinkDB) UpdateCurrentWithTSChecksum(gun data.GUN, tsChecksum string, update MetaUpdate) error {
Contributor

Thanks for catching this! 👍

// Find tuf files matching all passed channels
return func(db *gorm.DB) *gorm.DB {
return db.Joins("INNER JOIN channels_tuf_files ON tuf_files.id = channels_tuf_files.tuf_file_id").
Where("channel_id IN (?)", channelIDs).
Contributor

@cyli cyli Jun 16, 2017

I am likely just misunderstanding how this works due to lack of familiarity, but would this only enforce that the rows were in the right number of channels, as opposed to the channels specified? For instance, if there were 3 channels (Alice, Bob, and Carol), and I wanted to get the latest metadata for GUN and Timestamp that was in both Alice and Carol's channels, but the database had:

  1. id=5, GUN, Timestamp, channel=Alice (via the channel_tuf_files table)
  2. id=5, GUN, Timestamp, channel=Bob (via the channel_tuf_files table)

Would the scope consider that good because id=5 is in one of (Alice, Carol) and it is also in 2 distinct channels (Alice, Bob)?

Contributor Author

Sorry, gorm makes this harder to read than it should be.

The scope says:

  • Ignore any rows that aren't in the set of requested channels
  • Count how many distinct channels a tuf file is in
  • Because (tuf_file_id, channel_id) is unique, if number of unique channels == number of requested channels, the tuf file is in all requested channels.

For your specific example it would be something like this:

  • Query by GUN, TS in Alice and Carol channels
  • tuf_files is joined with channel_tuf_files
  • All rows in the joined table with channel_id != Alice or Carol are discarded
    • this is the part that specifically answers your question - there is no longer a (5, GUN, Timestamp, Bob) row in the result set
  • The result set only has (5, Carol)
  • COUNT of that set is 1
  • 2 channels were requested, so the result set returns empty
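In other words, the full scope is presumably along these lines (reconstructed from the description above; the actual code may differ in its details):

// TufFilesInChannels restricts a query to tuf_files present in all given channels.
func TufFilesInChannels(channels ...*Channel) func(*gorm.DB) *gorm.DB {
	channelIDs := make([]uint, 0, len(channels))
	for _, c := range channels {
		channelIDs = append(channelIDs, c.ID)
	}
	return func(db *gorm.DB) *gorm.DB {
		return db.Joins("INNER JOIN channels_tuf_files ON tuf_files.id = channels_tuf_files.tuf_file_id").
			Where("channel_id IN (?)", channelIDs).
			Group("tuf_files.id").
			// (tuf_file_id, channel_id) is unique, so this count equals the number
			// of requested channels only when the file is in every one of them.
			Having("COUNT(channel_id) = ?", len(channelIDs))
	}
}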

Contributor

Ah ok, thanks for explaining. I didn't realize it ran the query for every channel in the Where() and then joined.

}

// The Channel is added separately from the TUFFile record to prevent gorm from issuing an Update
if err = tx.Model(&tufFile).Association("Channels").Append(update.Channels).Error; err != nil {
Contributor

Apologies, the Append implementation in the Gorm code is kinda opaque - I think this will not error if a relationship already exists, since I think it eventually does an INSERT ... WHERE NOT EXISTS ..., but would it be possible to add a test case for when the metadata already exists in one of the channels specified, but not in the others?

Just to make sure this behavior is consistent should we ever choose to switch ORMs, or upgrade, etc.

Contributor Author

Good point, I will add a test for it.

Getting gorm to generate the right SQL for this PR took me the better part of a day; I was almost ready to rip it out and replace it... but of course that would've made this much harder to review.
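For reference, a rough shape for that test against the storage interface; the function name and fixture values here are made up, and the real test would live alongside testUpdateCurrentInChannel in server/storage/storage_test.go.

// Sketch: re-adding metadata that already exists in one of the requested
// channels should succeed and leave it retrievable from all of them.
func testUpdateCurrentExistingChannel(t *testing.T, store MetaStore) {
	var gun data.GUN = "docker.io/example/repo"
	update := MetaUpdate{
		Role:     data.CanonicalTargetsRole,
		Version:  1,
		Data:     []byte(`{}`),
		Channels: []*Channel{&Staged},
	}
	require.NoError(t, store.UpdateCurrent(gun, update))

	// Same gun/role/version, now associated with both Staged and Published;
	// the pre-existing Staged association should not cause an error.
	update.Channels = []*Channel{&Staged, &Published}
	require.NoError(t, store.UpdateCurrent(gun, update))

	_, _, err := store.GetCurrent(gun, data.CanonicalTargetsRole, &Staged, &Published)
	require.NoError(t, err)
}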

@@ -127,8 +164,9 @@ func (db *SQLStorage) UpdateMany(gun data.GUN, updates []MetaUpdate) error {
// called, version ordering in the updates list must be enforced
Contributor

Totally non-blocking observation, unrelated to this PR - can we just sort the updates and then call UpdateCurrent? cc @endophage

},
).First(&row)
}).
Order("version desc").Limit(1).First(&row)
Contributor

Non-blocking: I don't think it hurts anything, but is ordering necessary here? If the version is different, won't the checksum be different?

Contributor Author

I agree, I see no reason this is needed and can't remember why I changed it. It's potentially just a copy/paste error.

@@ -241,6 +289,7 @@ func (db *SQLStorage) Delete(gun data.GUN) error {
}
// if there weren't actually any records for the GUN, don't write
// a deletion change record.
// don't write a change if this isn't published yet
Contributor

@cyli cyli Jun 16, 2017

Was this supposed to correspond to some channel logic? (e.g. don't delete if the data isn't in the published channel?) If we're deleting everything in a GUN, should even staged metadata be deleted (since it probably won't be published in the future?)

Contributor Author

heh, yes this is a straggling comment - originally I scoped the delete but realized, as you point out, that when we delete a GUN we want to delete all staged copies as well

exists := db.Where("gun = ? and role = ? and version >= ?",
gun.String(), update.Role.String(), update.Version).First(&TUFFile{})

exists := db.Scopes(TufFilesInChannels(update.Channels...)).
Contributor

Non-blocking observations - I don't think any of these things are problems if we have garbage collection, but I just wanted to note them in case I missed something.

If we're updating the non-published channels, it's possible to add much older staged versions of metadata (if there was no staged metadata at that version or higher). Any attempt to publish this old staged metadata will fail anyway, so it's not an issue.

If someone publishes metadata with a really high version in a non-published channel, that will only prevent users from staging thresholded metadata for a limited amount of time: since it does not necessarily correspond to something in published, it will be garbage collected eventually.

Contributor Author

I think these are very good points, and we should add additional checks for those cases when actually implementing the threshold support.

CREATE TABLE `channels_tuf_files` (
`channel_id` INT(11) NOT NULL,
`tuf_file_id` INT(11) NOT NULL,
FOREIGN KEY (channel_id) REFERENCES channels(`id`) ON DELETE CASCADE,
Contributor

Out of curiosity, does gorm automatically add this to both the mysql and postgres table generation? (I didn't see anything in the model that corresponded to this directive.)

Contributor Author

I don't think so, I didn't use gorm to generate the table or migration

@cyli
Contributor

cyli commented Jun 16, 2017

Sorry for taking so long to review this @ecordell! This looks pretty good to me - my only 2 concerns are:

  1. Add a namespace field to tuf metadata storage #1110 (comment)
  2. Could we add the following test cases?
    • UpdateCurrent in storage_test.go (general behavior test):
      1. Adding to multiple channels at once, if the metadata doesn't exist in those channels, succeeds and Get* succeeds for the metadata we just updated if we specify all of those channels or one of those channels
      2. Adding to multiple channels at once, if the metadata exists in only some of those channels, succeeds and Get* succeeds for the metadata we just updated if we specify all of those new channels + all of the old channels, or just a subset of the channels
      3. Adding to one or multiple channels at once fails if the metadata already exists in all the channels specified
    • Get* in storage_test.go (general behavior test):
      1. Specifying Staging when we get will fail if the metadata is only in Published
      2. Test for Add a namespace field to tuf metadata storage #1110 (comment) - if there are 3 channels, and the metadata is in the first two, and we want to get metadata that is in the last two channels, getting fails.

I also had a question for the future thresholding implementations: Will there be an API to create more channels? Deleting channels? If channels correspond to key IDs, will they be auto-reaped if the key is removed?

@ecordell
Contributor Author

I also had a question for the future thresholding implementations: Will there be an API to create more channels? Deleting channels? If channels correspond to key IDs, will they be auto-reaped if the key is removed?

That's open for discussion, but IMO if possible we should keep the channels as an implementation detail and hide them from the user. Exposing them might require more careful consideration and possibly a TAP?

@cyli
Contributor

cyli commented Jun 19, 2017

That's open for discussion, but IMO if possible we should keep the channels as an implementation detail and hide them from the user. Exposing them might require more careful consideration and possibly a TAP?

Makes sense - seems like hiding from the user makes the CLI/API simpler, too.

@ecordell ecordell closed this Nov 17, 2023