overview paper: deprecate sentry nodes - see also paritytech/substrate#6845
w3f · Jan 21, 2021 · 0c3c006
1 parent 56dde38
commit 0c3c006
Showing 4 changed files with 45 additions and 59 deletions.
diff --git a/docs/polkadot/Polkadot-OverviewPaper/background.tex b/docs/polkadot/Polkadot-OverviewPaper/background.tex
@@ -50,7 +50,6 @@ \section{Glossary}
     %PoW & \emph{Proof-of-Work} - Mechanism where parties vote with processing power. && \\
     Relay\newline- Chain & Ensures global consensus among parachains. && \ref{sec:relaychain} \\
     Runtime & The Wasm blob which contains the state transition functions, including other core operations required by Polkadot. && \ref{par:state_transition} \\
-    Sentry\newline- nodes & Specialized proxy server which forward traffic to/from the validator. && \\
     Session & A session is a period of time that has a constant set of validators. Validators can only join or exit the validator set at a session change. && \\
     STVF & \emph{State-Transition-Validation-Function} - A function of the Runtime to verify the PoV. && \ref{sec:parachainblockproduction} \\
     Validator & The elected and highest in charge party who has a chance of being selected by BABE to produce a block. A set of candidate validators is defined as \Can . The number of validators to elect is defined as \nval . & \val (\Val)& \ref{par:validators} \\

diff --git a/docs/polkadot/Polkadot-OverviewPaper/crypto.tex b/docs/polkadot/Polkadot-OverviewPaper/crypto.tex
@@ -4,26 +4,26 @@ \subsection{Cryptography}\label{sec:crypto}
 
 \subsubsection{Account keys}
 
-Account keys have an associated balance of which portions can be {\em locked} to play roles in staking, resource rental, and governance, including waiting out a couple types of unlocking period.  We allow several locks of varying duration, both because these roles impose different restrictions, and for multiple unlocking periods running concurrently. 
+Account keys have an associated balance of which portions can be {\em locked} to play roles in staking, resource rental, and governance, including waiting out a couple types of unlocking period.  We allow several locks of varying duration, both because these roles impose different restrictions, and for multiple unlocking periods running concurrently.
 
 We encourage active participation in all these roles, but they all require occasional signatures from accounts.  At the same time, account keys have better physical security when kept in inconvenient locations, like safety deposit boxes, which makes signing arduous.  We avoid this friction for users as follows.
 
-Accounts that lock funds for staking are called {\em stash accounts}.  All stash accounts register a certificate on-chain that delegates all validator operation and nomination powers to some {\em controller account}, and also designates some {\em proxy key} for governance votes.  In this state, the controller and proxy accounts can sign for the stash account in staking and governance functions respectively, but not transfer funds.  
+Accounts that lock funds for staking are called {\em stash accounts}.  All stash accounts register a certificate on-chain that delegates all validator operation and nomination powers to some {\em controller account}, and also designates some {\em proxy key} for governance votes.  In this state, the controller and proxy accounts can sign for the stash account in staking and governance functions respectively, but not transfer funds.
 
 \smallskip
 
-At present, we support both ed25519 \cite{ed25519} and Schnorrkel/sr25519 \cite{schnorrkel} for account keys.  These are both Schnorr-like signatures implemented using the Ed25519 curve, so both offer extremely similar security.  We recommend ed25519 keys for users who require Hardware Security Module (HSM) support or other external key management solution, while Schnorrkel/sr25519 provides more blockchain-friendly functionality like Hierarchical Deterministic Key Derivation (HDKD) and multi-signatures.  
+At present, we support both ed25519 \cite{ed25519} and Schnorrkel/sr25519 \cite{schnorrkel} for account keys.  These are both Schnorr-like signatures implemented using the Ed25519 curve, so both offer extremely similar security.  We recommend ed25519 keys for users who require Hardware Security Module (HSM) support or other external key management solution, while Schnorrkel/sr25519 provides more blockchain-friendly functionality like Hierarchical Deterministic Key Derivation (HDKD) and multi-signatures.
 
 In particular, Schnorrkel/sr25519 uses the Ristretto implementation \cite{Ristretto} of Mike Hamburg's Decaf \cite[\S7]{Decaf}, which provide the 2-torsion free points of the Ed25519 curve as a prime order group.  Avoiding the cofactor like this means Ristretto makes implementing more complex protocols significantly safer.  We employ Blake2b for most conventional hashing in Polkadot, but Schnorrkel/sr25519 itself uses STROBE128 \cite{STROBE}, which is based on Keccak-f(1600) and provides a hashing interface well suited to signatures and non-interactive zero-knowledge proofs (NIZKs).
 % See https://github.com/w3f/schnorrkel/blob/master/annoucement.md for more detailed design notes.
 
 \subsubsection{Session keys}\label{sec:session_keys}
 
-Session keys each fill roughly one particular role in consensus or security.  As a rule, session keys gain authority only from a session certificate, signed by some controller key, that delegates appropriate stake.  
+Session keys each fill roughly one particular role in consensus or security.  As a rule, session keys gain authority only from a session certificate, signed by some controller key, that delegates appropriate stake.
 
-At any time, the controller key can pause or revoke this session certificate and/or issue replacement with new session keys.  All new session keys can be registered in advance, and most must be, so validators can cleanly transition to new hardware by issuing session certificates that only become valid after some future session.  We suggest using pause mechanism for emergency maintenance and using revocation if a session key might be compromised.  
+At any time, the controller key can pause or revoke this session certificate and/or issue replacement with new session keys.  All new session keys can be registered in advance, and most must be, so validators can cleanly transition to new hardware by issuing session certificates that only become valid after some future session.  We suggest using pause mechanism for emergency maintenance and using revocation if a session key might be compromised.
 
-We prefer if session keys remain tied to one physical machine because doing so minimises the risk of accidental equivocation.  We ask validator operators to issue session certificates using an RPC protocol, not to handle the session secret keys themselves.  
+We prefer if session keys remain tied to one physical machine because doing so minimises the risk of accidental equivocation.  We ask validator operators to issue session certificates using an RPC protocol, not to handle the session secret keys themselves.
 
 Almost all early proof-of-stake networks have a negligent public key infrastructure that encourages duplicating session secret keys across machines, and thus reduces security and leads to pointless slashing.
 % TODO: I'd meant to cite this somewhere, but not sure where now.  It's easy to cite the slashing to part to cosmos.  Thoughts?
@@ -32,21 +32,21 @@ \subsubsection{Session keys}\label{sec:session_keys}
 
 We impose no prior restrictions on the cryptography employed by specific components or their associated session keys types.\footnote{We always implement cryptography for Polkadot in native code, not just because the runtime suffers from WASM's performance penalties, but because all of Polkadot's consensus protocols are partially implemented outside the runtime in Substrate modules.}
 
-In BABE \ref{sec:babe}, validators use Schnorrkel/sr25519 keys both for regular Schnorr signatures, as well as for a verifiable random function (VRF) based on NSEC5 \cite{NSEC5}.  
+In BABE \ref{sec:babe}, validators use Schnorrkel/sr25519 keys both for regular Schnorr signatures, as well as for a verifiable random function (VRF) based on NSEC5 \cite{NSEC5}.
 
 A VRF is the public-key analog of a pseudo-random function (PRF), aka cryptographic hash function with a distinguished key, such as many MACs.  We award block production slots when the block producer scores a low enough VRF output $\mathtt{VRF}_{\sk}(r_e || \mathtt{slot\_number} )$, so anyone with the VRF public keys can verify that blocks were produced in the correct slot, but only the block producers know their slots in advance via their VRF secret key.
 
 As in \cite{Praos}, we provide a source of randomness $r_e$ for the VRF inputs by hashing together all VRF outputs from the previous session, which requires that BABE keys be registered at least two full epochs before being used.
 
-We reduce VRF output malleability by hashing the signer's public key alongside the input, which dramatically improves security when used with HDKD.  We also hash the VRF input and output together when providing output used elsewhere, which improves composability when used as a random oracle in security proofs.  See the 2Hash-DH construction from Theorem 2 on page 32 in appendix C of \cite{Praos}.  
+We reduce VRF output malleability by hashing the signer's public key alongside the input, which dramatically improves security when used with HDKD.  We also hash the VRF input and output together when providing output used elsewhere, which improves composability when used as a random oracle in security proofs.  See the 2Hash-DH construction from Theorem 2 on page 32 in appendix C of \cite{Praos}.
 
-In GRANDPA \ref{sec:grandpa}, validators shall vote using BLS signatures, which supports convenient signature aggregation and select ZCash's BLS12-381 curve for performance.  There is a risk that BLS12-381 might drop significantly below 128 bits of security, due to number field sieve advancements.  If and when this happens, we expect upgrading GRANDPA to another curve to be straightforward. 
+In GRANDPA \ref{sec:grandpa}, validators shall vote using BLS signatures, which support convenient signature aggregation, and we select ZCash's BLS12-381 curve for performance.  There is a risk that BLS12-381 might drop significantly below 128 bits of security, due to number field sieve advancements.  If and when this happens, we expect upgrading GRANDPA to another curve to be straightforward.
 % https://mailarchive.ietf.org/arch/msg/cfrg/eAn3_8XpcG4R2VFhDtE_pomPo2Q
 
 % TODO: ImOnline
 % ref. https://github.com/paritytech/substrate/issues/3546
 
-We treat libp2p's transport keys roughly like session keys too, but they include the transport keys for sentry nodes, not just for the validator itself.  As such, the operator interacts slightly more with these.
+We treat libp2p's transport keys roughly like session keys too.  As such, the operator interacts slightly more with these.
 
 % As mentioned above, we permit controller keys to revoke session key validity of course, but controllers could pause operation for shorter periods.  We similarly permit controllers to register new session keys in advance, which enables a clean handover between validator machines.
 
diff --git a/docs/polkadot/Polkadot-OverviewPaper/networking.tex b/docs/polkadot/Polkadot-OverviewPaper/networking.tex
@@ -88,12 +88,10 @@ \subsubsection{Gossiping} \label{sec:gossiping}
 
 There are also more specific constraint rules applied to artefacts belonging to the various higher-level subprotocols using the gossip protocol, to avoid broadcasting obsolete or otherwise unneeded artefacts. For example, for GRANDPA we only allow two votes being received for each type of vote, round number, and voter; any further votes will be ignored. For block production only valid block producers are allowed to produce one block per round; any further blocks will be ignored.
 
-There is basic support for \emph{sentry nodes}, proxy servers that are essentially the only neighbour of a private server, running more security-critical operations such the validator role.
-
 The network topology is a weak point currently; nodes connect to each other on an ad-hoc basis by performing random lookups in the \hyperref[sec:net_lowlevel]{address book}. Further work will proceed along two fronts:
 
 \begin{enumerate}
-\item Trusted nodes will reserve a portion of their bandwidth and connection resources, to form a structured overlay with a deterministic but unpredictable topology that rotates every era. For nodes running behind sentries, this effectively means that their sentry nodes instead participate in this topology.
+\item Trusted nodes will reserve a portion of their bandwidth and connection resources, to form a structured overlay with a deterministic but unpredictable topology that rotates every era.
 
 \item For the remainder of trusted nodes' resource capacity, and for the whole of untrusted nodes' resource capacity, they will select neighbours via a scheme based on latency measurements, with the details to be decided. Notably, for good security properties we want a scheme that does not simply choose "closest first", but also some far links as well.
 \end{enumerate}
@@ -126,7 +124,6 @@ \subsubsection{Storage and availability} \label{sec:net_storage}
 
 \begin{itemize}
 \item With both distribution and retrieval, the set of recipients is known. Therefore, pieces can be pre-emptively pushed from validators that already have the piece, in addition to bittorrent's pull semantics.
-\item Validators behind sentry nodes will use these as proxies, rather than directly sending.
 \item Instead of a centralised tracker, tracker-like information such as who has what piece, is broadcast via the relay chain \hyperref[sec:gossiping]{gossip network}.
 \end{itemize}
 
@@ -196,47 +193,6 @@ \subsubsection{Cross-chain messaging} \label{sec:net_crosschain}
 must all be distributed reliably. Applying erasure coding to these as well, is
 a straightforward and obvious solution, but we will also explore alternatives.
 
-\subsubsection{Sentry nodes} \label{sec:net_sentry}
-
-Sometimes, network operators want to arrange aspects of their physical network
-for operational security reasons. Some of these arrangements are independent
-and compatible with the design of any decentralised protocol, which typically
-works in the layer above. However some other arrangements need special
-treatment by the decentralised protocol, in particular arrangements affecting
-the reachability of nodes.
-
-For such use-cases, Polkadot supports running full-nodes as the sentry nodes of
-another full-node that is only reachable by these sentry nodes. This works best
-when one runs several sentry nodes for a single private full-node. Protocol
-wise, briefly, sentry nodes are regular neighbours of their private node, with
-some additional metadata to tell others to communicate this private node via
-its sentry nodes. In direct-sending mode, they act similarly to TURN servers,
-without any resource bottleneck constraints since every sentry node is serving
-only one private node. These additions are fairly straightforward and more
-details are available elsewhere.
-
-It is not required to run sentry nodes, for example if you believe the
-aforementioned security benefits are not worth the added latency cost.
-
-A brief discussion about the security tradeoffs of this approach follows. One
-benefit of a restricted physical topology, is to support load-balancing and DoS
-protection across a number of gateway proxies. The indirection can also help to
-protect the private node if there are exploits in the software - although note
-this does not cover the most severe exploits that give arbitrary execution
-rights on the sentry node, which can then be used as a launching pad for
-attacks on the private node. So, while we don't believe that a public address
-is itself a security liability when the serving code is written well, sentry
-nodes can help to mitigate these other scenarios.
-
-(An alternative possibility is for the network operator to run lower-level
-proxies, such as IP or TCP proxies, for their private node. This certainly can
-be done without any protocol support from Polkadot. One advantage of sentry
-nodes compared to this scenario is that the traffic coming from a sentry node
-has been through some level of verification and sanitisation as part of the
-Polkadot protocol, which would be missing for a lower-level proxy. Of course
-there can be exploits against this, but these are dealt with with high priority
-since they are amplification vectors usable against the whole network.)
-
 \subsubsection{Authentication, transport, and discovery} \label{sec:net_lowlevel}
 
 In secure protocols in general, and likewise with Polkadot, entities refer to each other by their cryptographic public keys. There is no strong security association with weak references such as IP addresses since those are typically controlled not by the entities themselves but by their communications provider.
@@ -248,3 +204,37 @@ \subsubsection{Authentication, transport, and discovery} \label{sec:net_lowlevel
 Further work will decouple the discovery mechanism from the address book, as described in \nameref{sec:gossiping}, resulting in a more secure network topology. Part of this will require some fraction of transport-level connections be authenticated against the currently-trusted validator set. However we also need to retain the ability to accept incoming connections from unauthenticated entities, and this needs to be restricted on a resource basis, without starving the authenticated entities.
 
 Further work will also decouple the implementation of the address book from its interface, so that e.g. we can put part of it on-chain. This has different security tradeoffs from a Kademlia-based address book, some of which are outside of the current scope of Polkadot, such as location privacy. By offering different possibilities, we hope to satisfy a diverse set of nodes having different security requirements.
+
+\subsubsection{Deprecated: Sentry nodes} \label{sec:net_sentry}
+
+An earlier version of Polkadot included the feature of \emph{sentry nodes}.
+These were essentially proxies for validators that did not want their nodes to
+be reachable from the public internet for security reasons.
+
+We have deprecated these in the latest version moving forward. The reason is a
+complexity-benefit tradeoff:
+
+\begin{itemize}
+\item As implemented, multiple sentry nodes could be shared between multiple
+  validators on a general n-to-m basis. This has very few consequences in a
+  gossip network where everything is broadcast to everyone, however it greatly
+  increases the complexity of designing a protocol where certain artefacts must
+  be sent directly to particular validators, such as in
+  \nameref{sec:net_storage} or \nameref{sec:net_crosschain}.
+\item The security benefit of a sentry node was to provide a defense-in-depth
+  (i.e. secondary) level of protection against DoS attacks and exploits less
+  severe than arbitrary execution, such as memory leak exploits. (Arbitrary
+  execution can be used to launch an attack via the sentry node, so sentries
+  are not effective anyway in this case.) These can be achieved via other means
+  - (1) For low-level DoS attacks node operators can add physical-network-level
+  protections in conjunction with their ISP; for high-level DoS attacks we will
+  add additional protection into Polkadot itself. (2) For exploits less severe
+  than arbitrary execution, these can be mitigated by configuring a validator
+  to use remote signing capabilities (under development), performed either on
+  another machine or on a separate process or VM on the same machine depending
+  on your threat model.
+\end{itemize}
+
+If node operators wish to recover the secondary security protections of sentry
+nodes, they may now instead implement the primary security protections as
+mentioned above.
diff --git a/docs/polkadot/Polkadot-OverviewPaper/summary.tex b/docs/polkadot/Polkadot-OverviewPaper/summary.tex
@@ -23,9 +23,6 @@ \subsection{Nodes and roles}
 \begin{enumerate}
 \item Light client - retrieves certain user-relevant data from the network. The availability of light clients is irrelevant - they don't perform a service for others.
 \item Full node - retrieves all types of data, stores it long-term, and propagates it to others. Must be highly available.
-  \begin{enumerate}
-  \item \hyperref[sec:net_sentry]{Sentry node} - publicly-reachable full nodes that perform trusted proxying services for a private full node, run by the same operator.
-  \end{enumerate}
 Sometimes we refer to a \emph{full node} of a parachain. In the abstract sense for non-blockchain parachains, this means that they participate in it to a sufficient degree that they can verify all data passing through it.
 \end{enumerate}