
[WIP] Locksmith: Distributed Key Generation frame #27

Closed
wants to merge 4 commits into from

Conversation

Shadowfiend
Contributor

Commits need cleanup, files need cleanup, interfaces need cleanup, but…………

main.go here fires up a 7-member group with threshold 4, initializes all of them,
runs a “distributed” key generation in a “broadcast channel”, and then does a threshold
signature with a random subset of 4 members and verifies it.
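
For anyone skimming without running the branch, here's a condensed, runnable sketch of the same flow. Caveats: the import path and curve constant are guesses, and the GJKR DKG is replaced by a dealer-generated master key split purely to keep the sketch short - main.go derives the same shape of shares without any dealer.

```go
package main

import (
	"fmt"

	"github.com/herumi/bls/ffi/go/bls" // import path assumed; use whatever binding the repo vendors
)

func main() {
	if err := bls.Init(bls.CurveFp254BNb); err != nil { // curve choice is an assumption
		panic(err)
	}

	const groupSize, threshold = 7, 4

	// Stand-in for the DKG: a dealer-generated master key split into
	// threshold shares. main.go reaches the same end state via GJKR rounds.
	var masterSecret bls.SecretKey
	masterSecret.SetByCSPRNG()
	msk := masterSecret.GetMasterSecretKey(threshold)
	groupPub := masterSecret.GetPublicKey()

	// Every member gets an ID and a secret-key share evaluated at that ID.
	ids := make([]bls.ID, groupSize)
	shares := make([]bls.SecretKey, groupSize)
	for i := 0; i < groupSize; i++ {
		ids[i].SetDecString(fmt.Sprintf("%d", i+1))
		shares[i].Set(msk, &ids[i])
	}

	// Any 4 members sign; their partial signatures recover the group signature.
	msg := "hello threshold world"
	subsetIDs := ids[:threshold]
	partials := make([]bls.Sign, threshold)
	for i := range subsetIDs {
		partials[i] = *shares[i].Sign(msg)
	}

	var groupSig bls.Sign
	groupSig.Recover(partials, subsetIDs)
	fmt.Println("verified:", groupSig.Verify(groupPub, msg))
}
```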

There's a commented walkthrough of how this works in a very simple case with
three players, some questions in a few places, and references to the
GJKR paper.

Feast yer eyes.

@mhluongo
Member

mhluongo commented Feb 2, 2018

Rebase?

@Shadowfiend
Contributor Author

This branch probably won't end up being the final say on this matter, which is why I didn't do too much commit cleanup.

// member.
accusedIDs map[bls.ID]bool
disqualifiedPlayers map[bls.ID]bool // all players disqualified during justification
qualifiedPlayers []bls.ID // a list of ids of all qualified players
Contributor Author

This probably needs to be more than one struct, btw. I kind of threw everything in here for now.

Contributor

My understanding is that the bls.ID is a 4 or 6 * 64-bit binary value - is it a public key?

Contributor

My interest in the size/shape of the bls.ID is that it has to be passed to the Solidity contracts. This means we need to understand the underlying type.

Contributor Author

Believe the default build of libmcl that we use is 4 64-bit uints, so it's packable into 1 256-bit uint.
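
If that's right, packing is mechanical. A minimal sketch of the packing (limb order is an assumption to verify against libmcl's actual serialization; this doesn't touch the bls API at all):

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math/big"
)

// packID packs four 64-bit limbs (assumed little-endian, least-significant
// limb first) into a single 32-byte big-endian value suitable for a uint256
// contract argument.
func packID(limbs [4]uint64) [32]byte {
	var out [32]byte
	for i, limb := range limbs {
		// limb 0 is least significant, so it lands at the end of the
		// big-endian byte array.
		binary.BigEndian.PutUint64(out[32-8*(i+1):32-8*i], limb)
	}
	return out
}

func main() {
	packed := packID([4]uint64{0xdeadbeef, 0, 0, 1})
	fmt.Printf("uint256: %s\n", new(big.Int).SetBytes(packed[:]).String())
}
```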

memberIds = append(memberIds, &member.ID)
}
return memberIds
}
Contributor Author

Yah, this is not a thing anymore…

@mhluongo mhluongo self-requested a review February 2, 2018 20:48
// Want to be able to:
// - Use bls.ID for everything (e.g., C_ij, i and j are actually bls.ID).
/// - Alternatively, everyone needs to keep an ordered list of members,
// the output of our sortition.
Contributor Author

This is actually done now. Was worried about ordering, but ordering only matters during the signature verification step, and there it only matters that the right member ids are lined up with the right partial signatures.

// -> Aggregate signature from the group of the public key message?
// -> Each member generates the public key, and broadcasts their
// -> sig of the key.
// -> Key publishing publishes key + group aggregated signature of key.
Contributor Author

This is where not having the chain mediate the keygen gets painful… I'm not sure we have a good way of proving that a submitted public key was the correct one, unless we require all members to submit their view of the group public key and check them all for equality. That raises the cost of the group formation, since everyone needs a chain call and each call will need to cross-check with the key that's already on chain. It also may require more blocks to finalize the group, since for a group size of 250 we'd need 250 public key submissions.

If we're okay with losing a group if any of its members got the wrong public key, we can simply keep the latest copy of the key in the registry, and have the registration call from each member increment a counter if the key matches what's already there. If it doesn't match, we drop the group entry in the registry and emit an event that the group registration is invalid. If it does, we wait until the counter equals the expected number of group members and then consider the group registered.

Other ideas welcome, as I don't like the everyone-submits plan…

Contributor

I may be confused but my understanding is that the validation with each key generation node takes place on the p2p network and could be done without hitting the blockchain. Let's talk about this on Monday.

Member

The chain needs to validate that a group's public key actually came from the nodes the tx author claims it did at group creation - otherwise I could submit a fake public key for any new group, seize "control" of the group on chain, and disrupt the relay.

@Shadowfiend easy fix- each participating client has to sign the resulting group public key with their staking pub key. It's bigger but there you go. Actually, if we go all-in on BLS the signatures can be aggregated as well, so it won't grow the size beyond 1 signature.
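
For concreteness, roughly what the aggregation looks like with the bls binding we already vendor (method names from memory - double-check against the vendored version; in practice the message would be the serialized group public key, and the single verification would happen on-chain):

```go
package main

import (
	"fmt"

	"github.com/herumi/bls/ffi/go/bls" // import path assumed
)

func main() {
	if err := bls.Init(bls.CurveFp254BNb); err != nil { // curve is an assumption
		panic(err)
	}

	groupPubKeyHex := "…serialized group public key…" // placeholder message

	// Each member signs the same message (the group key); signatures and
	// public keys both collapse into one element via addition.
	var aggSig bls.Sign
	var aggPub bls.PublicKey
	for i := 0; i < 5; i++ {
		var sec bls.SecretKey
		sec.SetByCSPRNG()
		sig := sec.Sign(groupPubKeyHex)
		pub := sec.GetPublicKey()
		if i == 0 {
			aggSig, aggPub = *sig, *pub
		} else {
			aggSig.Add(sig)
			aggPub.Add(pub)
		}
	}

	// One verification covers all signers, so the payload stays at one signature.
	fmt.Println("aggregate verifies:", aggSig.Verify(&aggPub, groupPubKeyHex))
}
```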

Member

Aggregating keys will also save a ton of gas on new group creation / key attestation.

Contributor Author

Some relevant thoughts from today's session:

  • The contract can generate the members of the group via the same algorithm the members themselves used (gas cost is a concern).
  • Once you have group members, you can verify the transaction sender against your member list.

I'm not sure I follow the signing bit. Say we require each participant to sign the group's public key… And maybe we require t signatures for a successful submission? So basically I generate the pubkey after I have enough shares, then I sign it, then I broadcast the key + signature on the broadcast channel. I wait to receive enough signatures from everyone else before publishing to the chain an aggregate signature of t signatures of the group public key.

If that understanding is right:

  • We maybe need to implement additional BLS signature logic on-chain.
  • We've added some serious broadcast channel communication overhead, since now we need two rounds to complete (one for the initial signature shares, one for the signatures of the group public key).
  • We still need to deal with the fact that anyone with t signatures of the public key will be publishing to the chain. More to the point, that everyone with t signatures will be publishing.

Since we've got this multi-publishing problem, my suggestion was we leverage it as a solution instead. Transactions on chain are signed by the sender anyway, so when a member m_0 of the group submits a key to the chain, we can tell whether or not the sender really is a member of that group. The contract can then stash the key on chain, associated with the group. When member m_1 submits it, it gets cross-checked against the existing one. If it disagrees, we have a few approaches we can take, including just immediately aborting group formation.

The basic idea is we store once, and ideally only once, but we have everyone submit their key and show that they got the same one. Once a key hits the threshold, any disagreeing submissions are punished.

Downside: if we get more than one key, we'll need to track all the variations until we've seen all the members' views.
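
To make the cross-check concrete, here's the registry logic sketched in plain Go - the abort-on-first-mismatch variant from my earlier comment; the threshold-then-punish variant only changes the mismatch branch. Names are hypothetical and the real thing would be contract code; this is just to pin down the state we'd need.

```go
package main

import (
	"bytes"
	"fmt"
)

// groupRegistration mirrors the proposed contract state for one forming group.
type groupRegistration struct {
	expectedMembers int
	key             []byte // first submitted view of the group public key
	matches         int    // how many members have confirmed that view
	aborted         bool
}

// submitKey is what each member's registration call would do: the first
// submission stores the key; later ones either increment the counter or
// abort the group on disagreement.
func (g *groupRegistration) submitKey(key []byte) {
	switch {
	case g.aborted:
		return
	case g.key == nil:
		g.key = key
		g.matches = 1
	case bytes.Equal(g.key, key):
		g.matches++
	default:
		// Disagreement: drop the group entry and emit an "invalid
		// registration" event (here, just a flag).
		g.aborted = true
	}
}

func (g *groupRegistration) registered() bool {
	return !g.aborted && g.matches == g.expectedMembers
}

func main() {
	g := &groupRegistration{expectedMembers: 3}
	g.submitKey([]byte("pubkey-A"))
	g.submitKey([]byte("pubkey-A"))
	g.submitKey([]byte("pubkey-A"))
	fmt.Println("registered:", g.registered()) // true; any mismatch would have aborted instead
}
```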

Member

> I'm not sure I follow the signing bit. Say we require each participant to sign the group's public key… And maybe we require t signatures for a successful submission? So basically I generate the pubkey after I have enough shares, then I sign it, then I broadcast the key + signature on the broadcast channel. I wait to receive enough signatures from everyone else before publishing to the chain an aggregate signature of t signatures of the group public key.

If we don't do something like this, there's a key spoofing attack on generation: you come up with a group public key, claim it represents another 199 people, submit it, and control the group - and those 199 get slashed when the group never actually signs.

When I mentioned that here, I meant something like each participant signs off on the output - either via e.g. BLS, checked with signature aggregation on-chain, or some other method using their staking public key. I've looked into this with James though, and verifying an aggregated signature on-chain will cost 1 pairing per signature - likely too expensive.

On the other hand, if we do it with Ethereum txs- where each provider agrees on-chain- that means 200+ txs. Also very expensive.

This sounds like a problem for state channels. Chatting with some folks about it.

Member

Interesting dispute resolution idea though... Frankly, many of these work- it's just about gas optimization right now 😞

Contributor Author

Yah, definitely expensive, though it's worth noting the expense is spread around all the group members at least. That brings up an interesting question on cost optimizations… Is our goal to minimize cost for each user first, and for the system second, or is the goal to minimize for the system first, and each user second?

Say we want to use state channels, which presumably will require further learning/work/etc. Would it be viable/reasonable to implement a more expensive resolution mechanism for a first pass, then optimize? This also circles back to the upgradeability question, of course.

Member
@mhluongo mhluongo Feb 6, 2018

I think the cost of the system should be optimal vs individual providers, as it represents a tax on usage. Less clear when it comes to consumers. Basically, it's our job to convince all parties to do what's best for the system heh

Contributor
@rargulati rargulati left a comment

I'm reviewing this again. This is a prelim look.

/// - Alternatively, everyone needs to keep an ordered list of members,
// the output of our sortition.

threshold := 4
Contributor

Make this a const up top with some documentation, as I imagine this is something we'll want stapled down and ideally not configured by the user unless they're doing Malicious Things™ or research?

Contributor Author

We'll need this to be something that the chain and the Go code agree on. And we'll also, of course, have to make sure that it doesn't explode in our face if they don't agree for some reason.

Member

Idea- discover it on-chain

Member

It's consensus-critical, after all, and we might need to change it with a contract update in the future.

Contributor Author

I dig… We'd basically just do a local contract call on startup to fetch critical constants? e.g., group size, threshold, .... possibly other things like BLS initialization settings?
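
Something like this on the Go side, presumably - the interface and method names below are hypothetical placeholders, not an existing binding:

```go
package config

import "fmt"

// consensusConfig holds the consensus-critical constants the client must
// agree on with the chain.
type consensusConfig struct {
	GroupSize int
	Threshold int
}

// chainReader is a hypothetical read-only handle on the deployed contract.
type chainReader interface {
	GroupSize() (int, error)
	Threshold() (int, error)
}

// loadConsensusConfig does the local (gas-free) contract calls at startup and
// refuses to run if the values are obviously unusable.
func loadConsensusConfig(chain chainReader) (*consensusConfig, error) {
	size, err := chain.GroupSize()
	if err != nil {
		return nil, err
	}
	threshold, err := chain.Threshold()
	if err != nil {
		return nil, err
	}
	if threshold <= 0 || threshold > size {
		return nil, fmt.Errorf("bad on-chain config: threshold %d, group size %d", threshold, size)
	}
	return &consensusConfig{GroupSize: size, Threshold: threshold}, nil
}
```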

Member

Yep. That will be another piece that needs some major thought around upgradeability... Eg a new client might need to know what the last setting was to fully validate a relay.

Contributor

Hm, on startup gets tricky, as does a background loop.

On startup: group A fetches the threshold from the chain at startup (i.e. 4). Say some (t/2) - 1 members restart and fetch the new threshold (i.e. 10), but this exceeds the size of the group (or the available threshold) - does the group dissolve without penalty because of the divergence? What if they are fulfilling a relay? (I'd guess they fulfill the relay and then exit.)

BG loop: probably better, because we can buffer the new change until after events that take precedence (fulfilling a relay entry > updating the threshold). Unless there's a race between a relay entry and upgrading a contract.

Contributor

Actually, since the relay entry is tied to the state of the contract at that block, it's fine. Members can reference information from the state of the contract at that point in time. Event precedence will still be important (always finish your relay entry).

Contributor Author

Yeah, I think there'll need to be a switchover window defined when some core config change occurs. e.g. say we change the group size… All existing groups will have the old size, and we'll need to migrate them over. We could define that in such cases group creation accelerates (i.e., more than one group is allowed to be created for a given relay entry) for a certain number of entries.

var key *bls.PublicKey
for _, member := range membersByID {
	if key == nil {
		key = &member.GroupPublicKey
Contributor

I'm a fan of new(member.GroupPublicKey), but if you think stylistically this is better, let's do that. I like your consistent use of &type and then type{} though I'm used to new(type) and make(type).

Contributor Author

I’m down either way. If there’s a preferred approach in the core golang codebase I think it’s worth doing that. Ironically to my eye new is better than & but {} is better than make :P
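
For reference, the three spellings in play (plain Go facts, nothing project-specific; `publicKey` below is just a stand-in type):

```go
package main

import "fmt"

type publicKey struct{ X, Y uint64 } // stand-in for bls.PublicKey

func main() {
	pk1 := new(publicKey)    // *publicKey pointing at a zeroed value
	pk2 := &publicKey{}      // identical result, composite-literal spelling
	ids := make([]int, 0, 8) // make is only for slices, maps, and channels
	fmt.Println(*pk1 == *pk2, len(ids))
}
```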

if key == nil {
	key = &member.GroupPublicKey
} else if !key.IsEqual(&member.GroupPublicKey) {
	panic(fmt.Sprintf("Public key for %v is bad.", key))
Contributor

How do we tell the other GroupMembers that we ran across a bad key - should we broadcast an accusation of the member before panicking? Or is the idea that everyone will blow up here?

Contributor Author

See line 383 above and my remarks about public key submission in a prior comment. Definitely some questions remain around that in my mind.

qualifiedPlayers []bls.ID // a list of ids of all qualified players
}

func idsFromGroupMembers(members []GroupMember) []*bls.ID {
Contributor

Is this used anywhere?

Contributor Author

See my comment above. Spoiler alert: no :)

Member
@mhluongo mhluongo left a comment

Good progress. Personally I vote we all leave thoughts, tweak it a bit, and get it merged - until we have a running prototype, keeping all these PRs open is a drag.

An update of the glossary with DKG stuff would also be a great addition to this PR!

return memberIds
}

// Note: Each player generates as a set of shares. Each player's secret key
Member

@Shadowfiend you're my favorite, never change ❤️

Seriously, prose for crypto is really helpful

return share
}

broadcast := func(senderID bls.ID, msg interface{}) {
Member

I see your thoughts on type switches have evolved

Contributor Author

I mean I just wrote an inline local anonymous function to pretend to broadcast… So pretty much if I had any misgivings about type switches, they're eclipsed by what I'm doing here anyway 😆

@Shadowfiend
Contributor Author

This branch isn't really meant to be merged. I'll start extracting some useful stuff (e.g., the structures for commitment, share, accusation, and justification messages) into a PR that is.

@Shadowfiend
Contributor Author

Ok, my current next steps here for a new PR:

  • Split out structs into one or more separate files.
  • Update the glossary a bit.
  • Start breaking this down into something that looks more like something that would take place over the network.

Of note, while discussing distributed keygen, I hit on the idea of defining each phase of the DKG as a certain number of blocks. We would set a max number of blocks for group formation, and then divide that by the number of phases (possibly divvied up unevenly based on communication cost per phase), such that the sum of the phase timeouts ~= the group formation timeout. If a node reached a phase timeout without having seen all of the information it needs to see (e.g., it doesn't see commitments from all other nodes), it could broadcast an abort, either in the channel or on chain. This breaks down somewhat in the accusation phase, because you don't know if you've seen all possible accusations. We could fix that by having a positive ack, but that adds significant comms overhead since you'd need n^2 acks/nacks. Haven't thought through that bit all the way.
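
A quick sketch of the block-budget arithmetic, to make the idea concrete - the phase names and weights below are placeholders, nothing is settled:

```go
package main

import "fmt"

// phase is a DKG phase with a relative communication-cost weight used to
// split the overall group-formation block budget unevenly.
type phase struct {
	name   string
	weight int
}

// phaseDeadlines splits maxBlocks across the phases proportionally to their
// weights and returns the absolute block height at which each phase times out.
func phaseDeadlines(startBlock, maxBlocks int, phases []phase) map[string]int {
	totalWeight := 0
	for _, p := range phases {
		totalWeight += p.weight
	}
	deadlines := make(map[string]int, len(phases))
	elapsed := 0
	for _, p := range phases {
		elapsed += maxBlocks * p.weight / totalWeight
		deadlines[p.name] = startBlock + elapsed
	}
	return deadlines
}

func main() {
	phases := []phase{
		{"commitments", 2},
		{"shares", 3},
		{"accusations", 2}, // the phase where "have I seen everything?" is unclear
		{"justifications", 1},
	}
	// e.g. group formation started at block 1000 with a 120-block budget.
	for name, deadline := range phaseDeadlines(1000, 120, phases) {
		fmt.Printf("%-14s must complete by block %d\n", name, deadline)
	}
	// A node that reaches a deadline without the data it expects would
	// broadcast an abort (channel vs. chain still an open question).
}
```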

Anyway, thoughts?

@Shadowfiend
Contributor Author

Important thing I haven't figured out yet: what bls.ID actually represents in this whole shebang. It depends on the curve initialization, so it's clearly more than just some id that happens to be encoded in a way that works with the other BLS stuff. Interestingly the underlying C data structure for bls.SecretKey and bls.ID is the same...

@mhluongo
Member

mhluongo commented Feb 8, 2018

Thought- let's use the chain less, if possible, not more. I think it's fine to expect that everyone should agree on the latest block, and use that for timeouts and phase definition, but using it as a comm mechanism is going to get very expensive- and it doesn't have a paired reward.

@Shadowfiend
Contributor Author

Shadowfiend commented Feb 8, 2018

Right, the suggestion re: timeouts doesn't use the chain, and the added overhead of positive acks isn't on chain, but rather over the broadcast channel; however, it does add potential lag, meaning we'd require higher timeouts.

EDIT: worth noting, it's unclear what “higher timeouts” actually corresponds to in practice right now.

@mhluongo
Member

mhluongo commented Feb 8, 2018

> either in the channel or on chain.

I was responding to that :)

@mhluongo
Member

mhluongo commented Feb 8, 2018

I'm going to do a deep-dive on this BLS / BN implementation today in pursuit of #31. I have a sneaking suspicion that https://github.com/dfinity/bn doesn't support the curve we need for pre-compiled verification on the EVM (ethereum/EIPs#197).

Hoping that dive makes bls.ID clear.

@Shadowfiend
Contributor Author

Ah, the in channel or on chain piece is about aborting the group creation… We'll need to analyze if we can safely announce that over the channel, in the sense that we can prevent group formation even in cases of network issues. I think that'll just require writing down all the phases and how we go through rounds and then carefully looking at where failures can happen.

@Shadowfiend Shadowfiend closed this Feb 9, 2018
@Shadowfiend
Contributor Author

Closing this for now, in favor of some cleaner PRs that are incoming.
