Bad blocks RPC + reporting #9433

tomusdrw · 2018-08-29T11:52:10Z

Prints a verbose error message in case bad block is detected.
Stores bad blocks in a cache, the cache is limited to take at most 8MBs (should we rather have a count limit?)
Exposes bad blocks over RPC attempting to maintain geth's compatibility, but also returns all the same fields we return for regular block RPCs.
Adds reason to the RPC response to see internal error message that occurred during import.

sorpaas

I still worry a little about the design around badblocks. Using RPC methods doesn't really feel elegant, and in some cases if under attacks, it may not work well.

@holiman would it work if those bad blocks are printed out as tracing info instead?

We are not limited to 8mb bad blocks. For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.
For cases where you actually want to debug bad blocks, you always have direct node access. So it makes this RPC indirection a little bit unnecessary.

sorpaas · 2018-08-29T12:35:05Z

ethcore/src/client/bad_blocks.rs

+
+impl BadBlocks {
+	/// Reports given RLP as invalid block.
+	pub  fn report(&self, raw: Bytes, message: String) {


Extra whitespace.

sorpaas · 2018-08-29T12:36:21Z

ethcore/src/client/client.rs

+						}
+					},
+					Err(err) => {
+						self.bad_blocks.report(raw, format!("{:?}", err));


Can we make this configurable? Such as only enable bad blocks reporting when the debug namespace RPC is enabled, or just use another cli flag.

I think the whole point is to have it running by default, so that anyone (that have seen such message) can be asked to get bad blocks from RPC and then report them for the analysis.

sorpaas · 2018-08-29T12:38:46Z

rpc/src/v1/impls/debug.rs

+use v1::types::{Block, Bytes, RichBlock, BlockTransactions, Transaction};
+
+/// Debug rpc implementation.
+pub struct DebugClient<C> {


I think this mod needs to be enabled in parity/rpc_apis.rs? Or did I missed that?

Oh, that's 100% true. Haven't tested it properly, will add to CLI and actually check if it's queryable through RPC.

holiman · 2018-08-29T13:19:41Z

The reason to have this functionality is so that we can run one geth node and one parity node in paralell, and detect immediately if a consensus fork has occurred (one rejecting a block that the other has accepted).

The later steps of analyzing the error is not very difficult, although it definitely helps if the client that rejecteded the block also prints out the rlp, for analysis.

We are not limited to 8mb bad blocks. For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.

So, of the cases where real-world consensus issues have happened, none of them have been at that level of sophistication -- afiak all of them have been incidental. But even so -- how could they be flooded with non-consensus-critical bad blocks? Those would be rejected due to low difficulty even before going into actual block execution.

tomusdrw

I think the general idea is that logs are easily lost, and accessing over RPC makes it easy for anyone to fetch the details and report back for analysis.

For sophisticated consensus attackers, they may find vulnerabilities in networking/importing first, and flood the 8mb struct with non-consensus-critical bad blocks, forcing the consensus-critical bad blocks to be dropped out of the 8mb cache.

That would require a targeted attack to your particular machine, other bad blocks would most likely be propagated by consensus-affected peers anyway and re-imported in the future, so they would show up in the cache again.

tomusdrw · 2018-08-29T13:16:28Z

ethcore/src/client/client.rs

+						}
+					},
+					Err(err) => {
+						self.bad_blocks.report(raw, format!("{:?}", err));


I think the whole point is to have it running by default, so that anyone (that have seen such message) can be asked to get bad blocks from RPC and then report them for the analysis.

tomusdrw · 2018-08-29T13:16:56Z

rpc/src/v1/impls/debug.rs

+use v1::types::{Block, Bytes, RichBlock, BlockTransactions, Transaction};
+
+/// Debug rpc implementation.
+pub struct DebugClient<C> {


Oh, that's 100% true. Haven't tested it properly, will add to CLI and actually check if it's queryable through RPC.

sorpaas · 2018-08-29T13:22:01Z

Another idea is to move this outside of parity-ethereum binary -- create a utility bin crate that reads parity-ethereum's tracing log, collect bad blocks there, and expose debug_badblocks rpc method in that utility bin. In that way we won't need to worry about badblocks struct creating any extra attack vector on the main parity-ethereum client.

tomusdrw · 2018-08-29T13:24:10Z

Those would be rejected due to low difficulty even before going into actual block execution.

@holiman What's the definition of a "bad block" then? In this PR we are reporting any block that fails the validation (even before execution) - so even low difficulty, invalid seal, etc. blocks would be reported here.

holiman · 2018-08-29T13:24:29Z

I don't understand how a small cache of blocks can be a potential attack target? Am I missing something?

holiman · 2018-08-29T13:32:37Z

@holiman What's the definition of a "bad block" then? In this PR we are reporting any block that fails the validation (even before execution) - so even low difficulty, invalid seal, etc. blocks would be reported here.

Actually, I'm not quite certain -- it's possible we do too. However, as you've pointed out -- an attacker needs to specifically target the node, in order to get his bad blocks propagated.

holiman · 2018-08-29T13:35:08Z

We report bad blocks here:https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L1136 (pre-check) and here: https://github.com/ethereum/go-ethereum/blob/master/core/blockchain.go#L1154 (processing)

holiman · 2018-08-30T07:13:56Z

Some more context: we've had 'reporting' of bad blocks now for quite some time. By 'reporting' I mean printing out a warning about a bad block being encountered, and some details, in the log stream.

This has triggered only a couple of times over the last couple of years, and each time it has been very good info. IIRC, that's what made people notice when Parity mined bad blocks on Ropsten due to the consensus-bug with having EIP-86 enabled.

The added ability to actually get the same info over RPC will make it possible to get instant alerts immediately if a fork occurs. If the fork-monitor needs to parse log output for the same info, it will be quite messy.

If you see fit to limit the badblock-cache even more, then please do so, but I'd really love to have this feature in, even if it's somewhat limited.

Right now, our fork monitor (http://mon04.ethdevops.io/forkmon) would immediately detect if parity head contains something that geth rejects, but the opposite scenario would not be detected until later.

5chdn · 2018-09-04T09:50:29Z

How does this relate to #9207 ?

tomusdrw · 2018-09-04T10:56:34Z

@5chdn #9207 is around for more than a month and it's nowhere close to be neither correct nor finished - it stores bad_blocks hashes in the Blockchain, and the RPC attempts to retrieve them from the database, but those blocks are bad so they never even get to the db. While I appreciate the contribution, reviewing the PR is pretty much implementing that stuff in comments.

andresilva

LGTM

andresilva · 2018-09-05T18:13:40Z

parity/rpc_apis.rs

@@ -647,6 +658,7 @@ impl ApiSet {
 				public_list
 			},
 			ApiSet::SafeContext => {
+				public_list.insert(Api::Debug);


Why enable this only for safe context (given that bad blocks cache is enabled by default)? I understand that the debug namespace might have other methods in the future which are unsafe to expose.

Didn't enable for two reasons:

debug is potentially unsafe

It should never be used by regular dapps

We have this implemented to support detecting forks, I don't want to make it a first-class namespace where we implement all the methods, cause that most likely won't happen - the methods are not going through EIPs, and they are pretty much geth-specific debug RPCs.

andresilva · 2018-09-05T18:14:10Z

util/memory_cache/src/lib.rs

+	pub fn backstore(&self) -> &LruCache<K, V> {
+		&self.inner
+	}
+


Unnecessary newline?

lexfrl · 2018-09-06T12:48:38Z

ethcore/src/client/bad_blocks.rs

+
+/// Recently seen bad blocks.
+pub struct BadBlocks {
+	last_blocks: RwLock<MemoryLruCache<H256, (Unverified, String)>>,


How about to add the raw_block data (Vec) into the tuple also, to be able to save "undecodable" BadBlocks? Is that makes any sense?

Yeah, I though about it actually. It doesn't allow us to return detailed block in the RPC response though. I think currently it makes sense to have something with the same results as geth to start detecting issues asap.

That said, feel free to log it as a separate issue.

I also thought about this and checked the geth implementation and it seems they also don't store block rlp's that aren't decodable.

tomusdrw · 2018-09-06T13:17:56Z

To follow up on @sorpaas comment. We spoke through a different channel about this and agreed that it doesn't necessarily suits the core client.

The idea that came up was to expose structured logs via RPC (we already to that for recent logs, we could add a subscription), where one would be able to subscribe to some particular events happening in the core client.
Using events data we could build a separate binary that would be able to expose this (and similar) RPCs. Event sourcing would also be useful for all other stats and attaching to events at the runtime would also be quite useful for debugging purposes - imagine attaching to a running node to inspect incoming transactions or networking communication (currently it requires restarting with different logging flag).

lexfrl

LGTM, but CI is failing..

tomusdrw added 5 commits August 27, 2018 12:25

Bad blocks RPC.

c96ee74

Return bad blocks via RPC.

eaf9928

Fix test.

b22388d

Merge branch 'master' into td-badblocks

e67f5ce

More verbose bad block message.

6d2bd28

tomusdrw added A0-pleasereview 🤓 Pull request needs code review. M4-core ⛓ Core client code / Rust. M6-rpcapi 📣 RPC API. labels Aug 29, 2018

sorpaas reviewed Aug 29, 2018

View reviewed changes

tomusdrw added A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. and removed A0-pleasereview 🤓 Pull request needs code review. labels Aug 29, 2018

tomusdrw commented Aug 29, 2018

View reviewed changes

tomusdrw added 2 commits August 30, 2018 10:37

Expose via CLI.

aa46cd9

Remove stray whitespace.

1686698

tomusdrw added A0-pleasereview 🤓 Pull request needs code review. and removed A3-inprogress ⏳ Pull request is in progress. No review needed at this stage. labels Aug 30, 2018

5chdn added this to the 2.1 milestone Aug 30, 2018

andresilva approved these changes Sep 5, 2018

View reviewed changes

5chdn added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Sep 6, 2018

lexfrl reviewed Sep 6, 2018

View reviewed changes

Remove stray newline.

395ec14

lexfrl reviewed Sep 6, 2018

View reviewed changes

Fix tests.

023da5b

lexfrl approved these changes Sep 7, 2018

View reviewed changes

5chdn merged commit 61bd47c into master Sep 8, 2018

5chdn deleted the td-badblocks branch September 8, 2018 02:04

5chdn mentioned this pull request Sep 8, 2018

Add debug getbadblocks #9207

Closed

debris mentioned this pull request Sep 24, 2018

fix bad-block reporting no reason #9638

Merged

ordian mentioned this pull request Oct 2, 2018

Backports for stable 2.0.7 #9648

Merged

13 tasks

ordian mentioned this pull request Oct 23, 2018

Backports: parity stable 2.0.9 #9786

Merged

11 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bad blocks RPC + reporting #9433

Bad blocks RPC + reporting #9433

tomusdrw commented Aug 29, 2018

sorpaas left a comment

sorpaas Aug 29, 2018

sorpaas Aug 29, 2018

tomusdrw Aug 29, 2018

sorpaas Aug 29, 2018

tomusdrw Aug 29, 2018

holiman commented Aug 29, 2018

tomusdrw left a comment

tomusdrw Aug 29, 2018

tomusdrw Aug 29, 2018

sorpaas commented Aug 29, 2018

tomusdrw commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 30, 2018

5chdn commented Sep 4, 2018

tomusdrw commented Sep 4, 2018

andresilva left a comment

andresilva Sep 5, 2018

tomusdrw Sep 6, 2018

andresilva Sep 5, 2018

tomusdrw Sep 6, 2018

lexfrl Sep 6, 2018

tomusdrw Sep 6, 2018

andresilva Sep 6, 2018

tomusdrw commented Sep 6, 2018

lexfrl left a comment

Bad blocks RPC + reporting #9433

Bad blocks RPC + reporting #9433

Conversation

tomusdrw commented Aug 29, 2018

sorpaas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

holiman commented Aug 29, 2018

tomusdrw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sorpaas commented Aug 29, 2018

tomusdrw commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 29, 2018

holiman commented Aug 30, 2018

5chdn commented Sep 4, 2018

tomusdrw commented Sep 4, 2018

andresilva left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tomusdrw commented Sep 6, 2018

lexfrl left a comment

Choose a reason for hiding this comment