Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate tob-5 by making a v2 of client_propose_tx endpoint #2670

Closed
wants to merge 11 commits into from

Conversation

cbeck88
Copy link
Contributor

@cbeck88 cbeck88 commented Oct 6, 2022

TOB-MCCT-5 is about a "token id oracle" described by TOB during review of confidential tokens design. The idea is that the consensus enclave fee map can be changed by the untrusted side, by restarting the node. If the attacker controls a node and is able to get the user to submit their Tx once with one fee map, and once with another, then the priority value leaks information about what the token id of the Tx is.

In principle they could do this without the user knowing -- they would restart the node and it would not be able to attest to the peers after changing the fee map, but it could still accept Tx proposals and see the WellFormedTxContext which conatins the priority number. Then they turn the node back to normal. The user's app retries when their transaction doesn't land, and the same Tx comes back to their node, now with the regular fee map.

This requires having root on a consensus node, but the threat model is that attackers with root on the consensus node should not be able to see the user's token id.

We view this as a time-of-check-vs-time-of-use (TOCTOU) bug, and we try to fix it a way that those bugs are sometimes fixed. The user, who checked the fee map, also makes an assertion about what the fee-map is when they try to use the system (by submitting a Tx). Before acting on the data (which the attacker has potentially tampered with), the enclave checks if the user's assertion is correct, and rejects the Tx without creating a WellFormedTxContext if the assertion is not correct. (This is sort of like dealing with races when manipulating atomic variables by using compare-and-swap.)

It should be straightforward to add this check, but unfortunately, the status quo client_propose_tx has a design flaw. It does not have a Request protobuf as the RPC argument, the argument it takes is directly an (encrypted) Tx. So, we cannot add any additional data to that protobuf in a non-breaking way without adding things to Tx. And that is untenable in this case because it adds complexity in sensitive areas of the code. We really just want to add this check next to the Tx.

To fix this, we introduce a ClientProposeTxRequest object, which contains a Tx, and the client's minimum fee map digest. We introduce client_propose_tx_v2 which encrypts this Request object instead of a naked Tx. This will come in handy if we want to add anything else like this in the future.

Clients will not be able to use client_propose_tx_v2 until after the V3 service is live, because existing consensus nodes don't have this endpoint. However, in the next release of clients after the v3 release is live, they could all start using client_propose_tx_v2 and we could deprecate / remove client_propose_tx.

I made the fog-sample-paykit exercise the client_propose_tx_v2 route, since the test client uses that and so we can see that it is working in CD without making mobilecoind or anything else use it yet.

I am not sure if this change requires an MCIP. It is a protocol / api change, but not a big one, and users are unlikely to notice. If it does require an MCIP we can write one pretty easily. Let me know what you think.

Also, lmk if you think there's a better approach than the route I took here.

@@ -246,6 +247,22 @@ pub struct FormBlockInputs {
pub mint_txs_with_config: Vec<(MintTx, MintConfigTx, MintConfig)>,
}

/// The schema expected to be encrypted for client_tx_propose_v2
#[derive(Clone, Message)]
pub struct ClientProposeTxRequest {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the request object that clients use to propose Tx's in the new client_propose_tx_v2.

/// metadata, or
/// * A serialization / cryptography error, or another mismatch error
/// e.g. minimum_fee_map_hash or (some-day) chain-id error
fn client_tx_propose_v2(&self, msg: EnclaveMessage<ClientSession>) -> Result<TxContext>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can't just fold these changes into client_tx_propose because we need to make a breaking change to the schema in order to insert another layer where we can add things. Making a new v2 version means that the old API continues to work and support clients when we drop in the new server.

.blockchain_config
.get()
.ok_or(Error::NotInitialized)?.canonical_fee_map_digest()[..] {
return Err(Error::FeeMapDigestMismatch);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is where we actually check if the hash matches

@cbeck88 cbeck88 requested a review from wjuan-mob October 6, 2022 22:30
Copy link
Contributor

@eranrund eranrund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, that looks like it's going in the right direction.
However, it appears there are zero tests for this. Would it be possible to test at least the enclave part?
We likely need to incorporate this into mobilecoind, integration tests, etc as well...

connection/test-utils/src/blockchain.rs Outdated Show resolved Hide resolved
connection/src/thick.rs Outdated Show resolved Hide resolved
consensus/enclave/api/src/lib.rs Outdated Show resolved Hide resolved
fog/sample-paykit/src/client.rs Outdated Show resolved Hide resolved
@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

The other thing I'm not sure about is, maybe I should try to categorize this as a transaction validation error rather than an enclave error. It may be easier for the client to properly handle it if that is done.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

We likely need to incorporate this into mobilecoind, integration tests, etc as well...

I think we can do this in a separate PR, that can land after branching for v3 perhaps?

If we try to make mobilecoind use this in the v3 release then the mobilecoind for that release can't be in customer hands before the v3 servers are live, which is not usually how we do releases. I guess maybe mobilecoind could switch between calling client_tx_propose and client_tx_propose_v2 based on block version? I didn't think about this before now.

Depending on how much other stuff comes up before we are scheduled to branch for v3 I am willing to try to do that.

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

Here's what I suggest -- we should do a follow-on PR for mobilecoind that uses this, and see that it passes tests, to build confidence in this change.

But we should only merge that for release/v3.1 and not to release/v3

@jcape
Copy link
Contributor

jcape commented Oct 7, 2022

It's an eventually breaking API change for users, this will require an MCIP.

@jcape
Copy link
Contributor

jcape commented Oct 7, 2022

More to the point, what's wrong with just renaming Tx to TxRequest and adding the TOB-5 mitigations required?

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

More to the point, what's wrong with just renaming Tx to TxRequest and adding the TOB-5 mitigations required?

This would be the same as adding the fee map hash to the Tx.

This would put additional complexity into the Tx and transaction-core, make the Tx hash dependent on this data, and increase the complexity of transaction validation. That's a much larger change that requires touching more sensitive code. Currently the fee-map is not a transaction-core concept.

It will also mean that all clients will be forced to plumb the fee map into the transaction builder in order to ship v3. For example, copper will now need to know the current fee map in order to build a Tx. The approach I took seems more realistic with current timelines.

I think if you want to do it that way, we will have to delay the release of v3, or punt on doing this mitigation.


// Cache the transaction. This performs the well-formedness checks.
let tx_hash = self.tx_manager.insert(tx_context).map_err(|err| {
if let TxManagerError::TransactionValidation(cause) = &err {
counters::TX_VALIDATION_ERROR_COUNTER.inc(&format!("{:?}", cause));
let result = ProposeTxResult::from(cause.clone());
response.set_result(result);
response.set_err_msg(cause.to_string());
Copy link
Contributor Author

@cbeck88 cbeck88 Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these lines are removed because this is actually dead code.

this was kind of a head scratcher, but I think this is dead code introduced 2 years ago here: 894dd54

when we finally return this error, the response object will be discarded, so these changes to it have no effect.

@@ -139,6 +139,7 @@ impl From<ConsensusGrpcError> for Result<ProposeTxResponse, RpcStatus> {
match src {
ConsensusGrpcError::TransactionValidation(err) => {
let mut resp = ProposeTxResponse::new();
resp.set_err_msg(err.to_string());
Copy link
Contributor Author

@cbeck88 cbeck88 Oct 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a line that I failed to add to an earlier PR: #2501

I think because I got confused by the aforementioned dead code

@eranrund
Copy link
Contributor

eranrund commented Oct 7, 2022

We likely need to incorporate this into mobilecoind, integration tests, etc as well...

I think we can do this in a separate PR, that can land after branching for v3 perhaps?

If we try to make mobilecoind use this in the v3 release then the mobilecoind for that release can't be in customer hands before the v3 servers are live, which is not usually how we do releases. I guess maybe mobilecoind could switch between calling client_tx_propose and client_tx_propose_v2 based on block version? I didn't think about this before now.

Depending on how much other stuff comes up before we are scheduled to branch for v3 I am willing to try to do that.

Good point about the sequencing. But we do need to make sure this API works and for that we need clients that support it, so the block version switching logic might be required. We just need to make sure it really works, so maybe it should assert if block version is 0 since that should no longer happen (my concern is places where we might not be properly including the block version - like the issue we found in the full service validator earlier this week).

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

We likely need to incorporate this into mobilecoind, integration tests, etc as well...

I think we can do this in a separate PR, that can land after branching for v3 perhaps?
If we try to make mobilecoind use this in the v3 release then the mobilecoind for that release can't be in customer hands before the v3 servers are live, which is not usually how we do releases. I guess maybe mobilecoind could switch between calling client_tx_propose and client_tx_propose_v2 based on block version? I didn't think about this before now.
Depending on how much other stuff comes up before we are scheduled to branch for v3 I am willing to try to do that.

Good point about the sequencing. But we do need to make sure this API works and for that we need clients that support it, so the block version switching logic might be required. We just need to make sure it really works, so maybe it should assert if block version is 0 since that should no longer happen (my concern is places where we might not be properly including the block version - like the issue we found in the full service validator earlier this week).

The thing is that we cannot do this block version switching in thick client because thick client has no idea what the block version is. Tx's are structures that may conform to one or more sets of block version rules, and the block version they target is not recorded.

What do you think about this approach: #2670 (comment)

So, no clients would be using this new pathway in the wild until 3.1 release, and no block version switching is required, the 3.1 client release would assume this endpoint exists.

I think that's less work overall, since we would not need to write all the block-version branching code. We can develop the PR to make those clients do that separate from this one, and when that PR is passing CD we can be sure this PR is working and merge it confidently for v3. But that follow-on PR would not actually merge until v3.1

@cbeck88
Copy link
Contributor Author

cbeck88 commented Oct 7, 2022

My plan right now is to add a unit test that exercises this functionality in consensus-enclave-impl, and then do any subsequent work in a follow on PR for mobilecoind.

@eranrund
Copy link
Contributor

eranrund commented Oct 7, 2022

What do you think about this approach: #2670 (comment)

I think that's an acceptable compromise but maybe would be better if we merge that to master and revert it on release/v3?

@cbeck88 cbeck88 added the old-v4.0.0 Blocker for v4 label Oct 9, 2022
@cbeck88 cbeck88 self-assigned this Oct 9, 2022
@cbeck88 cbeck88 added the consensus Related only to the consensus protocol or service label Oct 10, 2022
connection/src/thick.rs Show resolved Hide resolved
connection/src/thick.rs Show resolved Hide resolved
connection/src/traits.rs Show resolved Hide resolved
fog/sample-paykit/src/client.rs Outdated Show resolved Hide resolved
TOB-5 is about a "token id oracle" described by TOB during review
of confidential tokens design. The idea is that the consensus enclave
fee map can be changed by the untrusted side, by restarting the node.
If the attacker controls a node and is able to get the user to submit
their Tx once with one fee map, and once with another, then the
priority value leaks information about what the token id of the Tx is.

In principle they could do this without the user knowing -- they would
restart the node and it would not be able to attest to the peers after
changing the fee map, but it could still accept Tx proposals and see
the WellFormedTxContext which conatins the priority number.
Then they turn the node back to normal. The user's app retries when
their transaction doesn't land, and the same Tx comes back to their node,
now with the regular fee map.

This requires having root on a consensus node, but the threat model is
that attackers with root on the consensus node should not be able to
see the user's token id.

We view this as a time-of-check-vs-time-of-use (TOCTOU) bug, and we
try to fix it a way that those bugs are sometimes fixed. The user,
who checked the fee map, also makes an assertion about what the fee-map
is when they try to use the system (by submitting a Tx). Before acting
on the data (which the attacker has potentially tampered with), the
enclave checks if the user's assertion is correct, and rejects the Tx
without creating a WellFormedTxContext if the assertion is not correct.

This should be straightforward to add this check, but unfortunately,
the status quo `client_propose_tx` has a design flaw. It does not
have a `Request` protobuf as the RPC argument, the argument it
takes is directly an (encrypted) Tx. So, we cannot add any additional
data to that protobuf in a non-breaking way without adding things to
`Tx`. And that is untenable in this case because it adds complexity
in sensitive areas of the code. We really just want to add this check
next to the `Tx`.

To fix this, we introduce a `ClientProposeTxRequest` object, which
contains a `Tx`, and the client's minimum fee map digest. We introduce
`client_propose_tx_v2` which encrypts this `Request` object instead
of a naked `Tx`. This will come in handy if we want to add anything
else like this.

Clients will not be able to use `client_propose_tx_v2` until after the
V3 service is live, because existing consensus nodes don't have this
endpoint. However, in the next release of clients after the v3 release
is live, they could all start using `client_propose_tx_v2` and we could
deprecate / remove `client_propose_tx`.

I made the fog-sample-paykit exercise the `client_propose_tx_v2` route,
since the test client uses that and so we can see that it is working in
CD without making `mobilecoind` or anything else use it yet.
This makes it easier for clients to handle this error appropriately.

This also removes some dead code and bugs in the consensus grpc
error handling.
the fee map cache should actually be a block info cache, and
this should be integrated with the existing block info caching
stuff.

then, factor out some duplicated fee determination code in the
test client.
still WIP, the test is failing due to the enclave not having a
valid report yet.
/// Test that we can exercise client_tx_propose_v2 and that it passes and fails
/// as expected.
#[test_with_logger]
fn consensus_enclave_client_tx_propose_v2(logger: Logger) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to create a proper test of the consensus enclave client_propose_tx_v2 this way

@eranrund
Copy link
Contributor

eranrund commented Oct 18, 2022

I'm still reviewing this, but I think adding the fee map digest into the Tx struct (as an Option<Vec<u8>>) is a simpler change. I actually think that if TxPrefix was named Tx and Tx was named ProposeTxRequest (and contained a Tx and a signature), we wouldn't be having a discussion around this, it would just sit inside ProposeTxRequest alongside the actual Tx and its signature.

Sticking it in Tx is not perfect, but here are the pros and cons as I see them:

Pros:

  • Less changes to clients: clients can now start including the fee_map_digest when they are ready to do so, and if it is there consensus will validate it. That means we can go and immediately add it to all the clients we control, without worrying about what consensus version we are running (since this new field will get ignored by current consensus). With the v2 API, we can't update the clients until we have deployed a version of consensus that has the v2 API, or we need to add forks in the client code (if block version is X then use V1, otherwise use V2). This is the main reason to consider this approach imo.
  • By adding it to the Tx, we are going to need to pass it to TxBuilder, which means that clients building against the newer code will be forced to learn about this change. Otherwise, we can only hope they start using the V2 API (or learn about it the hard way when V1 is deprecated)
  • Less code changes overall (at least I think that would be the outcome): No new GRPC API call, no new enclave API
  • Fee map hash validation can become part of the Tx validation, and doesn't need a separate path of "in addition to validating the Tx data, here is some other stuff to validate separately but only validate once when we receive the Tx". Where this plays into account is not relevant right now, but imagine if we ever made fees dynamic, or stored the pending transaction queue. If we don't validate the fee map at the time we are considering the transaction for a block, it means that we have possibly validated it against a stale fee map. I don't know how much that matters (but in general I think this whole attack is pretty insignificant)

Cons:

  • Including the fee map digest in the Tx means the Tx gets bigger, and this structure gets propagated to all consensus nodes, stored in precious enclaves memory, etc.
  • Putting stuff inside the Tx structure is currently considered controversial because it's nice that it only contains a signature and signed data (see initial argument as to why I think these names are just poorly chosen)
  • Clients wont explicitly know that they are calling a version of ProposeTx that validates the fee map digest (unless they rely on figuring that out based on MRENCLAVE or a block version change that happened at the same time this got deployed), and are also not forced to include this, so if they are doing their own transaction building, they might never start including it. This can be mitigated by making it required at some point (which is equivalent to removing the deprecated V1 api)

I don't feel strongly about either path, but figured I will put my thoughts out there for others to consider in case someone thinks one path is significantly superior to the other. It's also quite possible that it is not a good use of time to debate this, and we should just move forward with what was already built here since it works...

Copy link
Contributor

@eranrund eranrund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting aside the discussion on whether this is the right approach or whether the fee_map_digest should be added to Tx, I think the code here is fine if we choose to go this route and not the other.

@@ -1980,9 +1980,9 @@ checksum = "b294d6fa9ee409a054354afc4352b0b9ef7ca222c69b8812cbea9e7d2bf3783f"

[[package]]
name = "libc"
version = "0.2.135"
version = "0.2.134"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some weird downgrades here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, will fix

cbeck88 added a commit that referenced this pull request Oct 30, 2022
comments from first part:

> these lines are removed because this is actually dead code.
>
> this was kind of a head scratcher, but I think this is dead code introduced 2 years ago here: 894dd54
>
> when we finally return this error, the response object will be discarded, so these changes to it have no effect.

comments from second part:

> this is a line that I failed to add to an earlier PR: #2501
>
> I think because I got confused by the aforementioned dead code
cbeck88 added a commit that referenced this pull request Oct 31, 2022
* add bugfix from #2670

comments from first part:

> these lines are removed because this is actually dead code.
>
> this was kind of a head scratcher, but I think this is dead code introduced 2 years ago here: 894dd54
>
> when we finally return this error, the response object will be discarded, so these changes to it have no effect.

comments from second part:

> this is a line that I failed to add to an earlier PR: #2501
>
> I think because I got confused by the aforementioned dead code

* add tests of consensus enclave propose_tx API

* linting

* Update consensus/enclave/tests/enclave_api_tests.rs
dolanbernard pushed a commit to dolanbernard/mobilecoin that referenced this pull request Nov 2, 2022
* add bugfix from mobilecoinfoundation#2670

comments from first part:

> these lines are removed because this is actually dead code.
>
> this was kind of a head scratcher, but I think this is dead code introduced 2 years ago here: 894dd54
>
> when we finally return this error, the response object will be discarded, so these changes to it have no effect.

comments from second part:

> this is a line that I failed to add to an earlier PR: mobilecoinfoundation#2501
>
> I think because I got confused by the aforementioned dead code

* add tests of consensus enclave propose_tx API

* linting

* Update consensus/enclave/tests/enclave_api_tests.rs
@cbeck88
Copy link
Contributor Author

cbeck88 commented Nov 15, 2022

This is now moot, Eran fixed this a different way in a series of PRs linked from: #2611

@cbeck88 cbeck88 closed this Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus Related only to the consensus protocol or service old-v4.0.0 Blocker for v4
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants