Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure limit on tx proposals #3296

Open
wants to merge 50 commits into
base: main
Choose a base branch
from

Conversation

NotGyro
Copy link
Contributor

@NotGyro NotGyro commented Mar 28, 2023

  • Detects when a client has submitted excessive invalid tx proposals within a configurable rolling window
  • Adds tx_failure_window and tx_failure_limit fields to the consensus service config Rust structure (for use with clap).
  • Adds tx_failure_window_seconds and tx_failure_limit, as well as previously-missed client_tracking_capacity to the ConsensusNodeConfig protobuf, and amends get_node_config to add those values to that configuration export.

Motivation

This will be used to track how many failed tx proposals have been sent recently, so that bad connections can be dropped, to prevent attacks and harmful behavior from buggy clients. See: #2977

Future Work

  • Test case to ensure client_close() actually prevents additional wasted work (i.e. breaks auth for the client such that they need to re-authenticate before any other work can be passed to the enclave).
  • Slam-testing to determine reasonable defaults for tx_failure_window and tx_failure_limit

@NotGyro
Copy link
Contributor Author

NotGyro commented Mar 28, 2023

This stack of pull requests is managed by Graphite. Learn more about stacking.

@NotGyro NotGyro force-pushed the milliec/03-07-Track_failed_tx_proposals_in_consensus branch from cf0b095 to ecd5836 Compare April 5, 2023 03:37
NotGyro and others added 5 commits April 6, 2023 21:03
Co-authored-by: Sam Dealy <33067698+samdealy@users.noreply.github.com>
Bumps [syn](https://github.com/dtolnay/syn) from 1.0.109 to 2.0.11.
- [Release notes](https://github.com/dtolnay/syn/releases)
- [Commits](dtolnay/syn@1.0.109...2.0.11)

---
updated-dependencies:
- dependency-name: syn
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@NotGyro NotGyro force-pushed the milliec/03-07-Track_failed_tx_proposals_in_consensus branch from c9ecddf to 13376da Compare April 8, 2023 02:01
Base automatically changed from milliec/03-07-Track_failed_tx_proposals_in_consensus to master April 8, 2023 02:45
@NotGyro NotGyro added the consensus Related only to the consensus protocol or service label Apr 10, 2023
@NotGyro NotGyro marked this pull request as ready for review April 10, 2023 21:49
@NotGyro NotGyro requested a review from jcape April 10, 2023 21:49
@NotGyro NotGyro self-assigned this Apr 10, 2023
dependabot bot and others added 9 commits April 10, 2023 18:43
* chore(deps): bump bitflags from 1.3.2 to 2.0.1

Bumps [bitflags](https://github.com/bitflags/bitflags) from 1.3.2 to 2.0.1.
- [Release notes](https://github.com/bitflags/bitflags/releases)
- [Changelog](https://github.com/bitflags/bitflags/blob/main/CHANGELOG.md)
- [Commits](bitflags/bitflags@1.3.2...2.0.1)

---
updated-dependencies:
- dependency-name: bitflags
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

* Update bitflag usage for the 1.0->2.0 changes

Per <https://github.com/bitflags/bitflags/releases/tag/2.0.0> this
changes the serialized format of items containing bitflags.

* Update enclave lock files

* Rust fmt

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nick Santana <nick@mobilecoin.com>
* refactor mobilecoind api to not depend on consensus-api

I am having a lot of build problems in another project due to
mc-mobilecoind-api pulling in mbedtls, which exports some symbols
that conflicts with another C library that some rust code depends on.

I determined that the only reason I have mbedtls in my build is
that `mobilecoind-api` depends on `consensus-api` and this pulls
in `mc-attest-core` etc.

This change allows me to break this dependency.

* remove stuff from build.rs that isn't needed anymore

* cargo lock

* fixup

* fixup

* restore deprecated field (avoid compatibility break)

* add changelog entry

* fixup
Copy link
Collaborator

@nick-mobilecoin nick-mobilecoin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ this looks like it might have a merge issue, it seems to be diffing against other changes that are present in the mainline branch master.

@NotGyro
Copy link
Contributor Author

NotGyro commented Apr 11, 2023

x this looks like it might have a merge issue, it seems to be diffing against other changes that are present in the mainline branch master.

Yes, I'm not sure how best to resolve that.

The only files modified are consensus_config.proto, consensus/service/config/src/lib.rs, consensus/service/src/api/client_api_service.rs, and that is correct.

However, I do see all of those unrelated commits. I may have messed up rebasing to keep up-to-date with the main branch.

@@ -69,4 +69,24 @@ message ConsensusNodeConfig {

// SCP message signing key.
external.Ed25519Public scp_message_signing_key = 8;

// Maximum number of client session tracking structures to retain in
// a least-recently-used cache.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙃 Don't think "least recently used" is a hyphenated word

Suggest looking at my comment on "denial-of-service" before deciding

Suggested change
// a least-recently-used cache.
// a least recently used cache.

// a least-recently-used cache.
//
// This corresponds to Config::client_tracking_capacity
uint64 client_tracking_capacity = 9;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ I don't think it matters as they're all dynamically sized, but why uint64 here? Considering tx_failure_limit is uint32

// failures, per-client. This is used to implement DOS-protection,
// protecting against clients who make too many failed
// transaction proposals within this span.
uint64 tx_failure_window_seconds = 10;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Like the call out of the units in the name since there is no type safety.

uint64 tx_failure_window_seconds = 10;

// How many tx proposal failures within the rolling window are required
// before it's treated as concerning, thereby tripping denial-of-service
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 interestingly "denial-of-service" is often hyphenated on the interwebs. It looks like the rules are pretty loose when to hyphenate, some say since "denial-of-service" is a compound adjective it can be hyphenated.

@@ -55,6 +58,17 @@ impl ClientSessionTracking {
}
}

pub fn get_proposetx_failures(&self) -> usize {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Is this name correct? Also getter setter methods should avoid get_ prefix

Suggested change
pub fn get_proposetx_failures(&self) -> usize {
pub fn num_tx_proposal_failures(&self) -> usize {

While the usize uses communicates the type, I used "num" to indicate it wasn't the actual field itself.

let result = self.handle_proposed_tx(msg);
// The block present below rate-limits suspicious behavior.
if let Err(_err) = &result {
let mut tracker = self.tracked_sessions.lock().expect("Mutex poisoned");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 I usually lean on the 3 strike rule:

  1. you're writing code
  2. It might be coincidence that the logic is duplicate
  3. You're out, refactor.

This is the 3rd time this patter of if get() else put() && get() has been present. This should have method which does this for you.
Unfortunately due to the mutex this will be a little bit more work as it will need to return a session guard that implements drop, to free up the mutex.

};
record.fail_tx_proposal(Instant::now(), self.config.tx_failure_window);
}
result.or_else(ConsensusGrpcError::into)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ Thoughts on using a match statement here?
Something about seeing is_err logic and then in a separate block modifying that error.

.or_else(ConsensusGrpcError::into)
let result = self.handle_proposed_tx(msg);
// The block present below rate-limits suspicious behavior.
if let Err(_err) = &result {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 Thinking is_err() would work here since the contents don't appear to be looked at

Suggested change
if let Err(_err) = &result {
if result.is_err() {

Comment on lines +1987 to +1997
assert!(propose_tx_response.is_err());

match propose_tx_response {
Err(grpcio::Error::RpcFailure(rpc_status)) => {
assert_eq!(rpc_status.code(), RpcStatusCode::RESOURCE_EXHAUSTED);
}
_ => panic!(
"Unexpected response upon continuing to use\
a rate-limited session: {propose_tx_response:?}"
),
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 it might lose a little bit of info on failures but perhaps

Suggested change
assert!(propose_tx_response.is_err());
match propose_tx_response {
Err(grpcio::Error::RpcFailure(rpc_status)) => {
assert_eq!(rpc_status.code(), RpcStatusCode::RESOURCE_EXHAUSTED);
}
_ => panic!(
"Unexpected response upon continuing to use\
a rate-limited session: {propose_tx_response:?}"
),
}
assert!(
matches!(propose_tx_response, Err(grpcio::Error::RpcFailure(failure)) if failure.code() == RpcStatusCode::RESOURCE_EXHAUSTED)
);

Comment on lines +2004 to +2006

// Because of the behavior of Mockall, if this returns without calling
// client_close() exactly once, it will panic and the test will fail.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔧 it doesn't seem like this comment is needed since the tracker will close it.

Suggested change
// Because of the behavior of Mockall, if this returns without calling
// client_close() exactly once, it will panic and the test will fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
consensus Related only to the consensus protocol or service
Projects
Status: Changes Requested
Development

Successfully merging this pull request may close these issues.

8 participants