Hardware Requirements
Gatekeeper has been tested and deployed on bare-metal Linux servers running Ubuntu. While Gatekeeper may work on other platforms, entities interested in deploying it elsewhere must be willing to take on the effort of making Gatekeeper work on their chosen platform, since our team does not currently have the resources to support these efforts. This page centralizes information on the hardware that has been successfully used to deploy Gatekeeper, so that deployers can shorten their bootstrap process.
Our team understands that being able to test Gatekeeper on virtual machines would lower its learning curve, but we have run into several issues with virtual machines, so, at this point, we consider them unsupported. We have the milestone Cloud deployment to address these issues, but this milestone is not high on our list of priorities.
In order to keep up with floods of packets, Gatekeeper banks on the availability of modern multi-/manycore CPUs. Moreover, in order to have faster access to the RAM, it helps to have multi-socket mainboards, that is, mainboards with at least two physical processors. The reason for this is that each socket has independent access to the memory bank attached to it. A physical processor and its respective memory bank form what is called a NUMA node. Gatekeeper leverages NUMA nodes to process packets faster. Thus, a greater number of NUMA nodes enables faster processing. One should evenly spread the memory modules on the memory banks of a mainboard to make the NUMA nodes equally capable.
While the exact number of needed cores varies with the capacity of each individual core and the speed of the NICs used to connect a Gatekeeper server to the network, one should expect to dedicate a number of cores to running Gatekeeper. Four of these cores run the functional blocks GGU, LLS, CPS, and Dynamic configuration. The GK and SOL blocks take a variable number of cores. On a server with n NUMA nodes, the typical configuration has n instances of the SOL block and 2*n instances of the GK block. Thus, on a server with 2 NUMA nodes, Gatekeeper takes 4 + 2 + 2*2 = 10 cores. One must have additional cores to run the operating system and other applications such as Bird. Grantor servers are less demanding: 3 cores for the LLS, CPS, and Dynamic configuration blocks, and n cores for the GT blocks. One can run more GT blocks, and doing so can even be advisable, but one must make sure that the server is not overloaded.
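The core budget described above can be sketched as a small calculation. This is only an illustration of the typical configuration; the function names `gatekeeper_cores` and `grantor_cores` are hypothetical and are not part of Gatekeeper. The result excludes the cores needed for the operating system and other applications.

```python
def gatekeeper_cores(numa_nodes: int) -> int:
    """Cores dedicated to Gatekeeper in the typical configuration."""
    if numa_nodes < 1:
        raise ValueError("a server has at least one NUMA node")
    fixed_blocks = 4             # GGU, LLS, CPS, Dynamic configuration
    sol_blocks = numa_nodes      # one SOL instance per NUMA node
    gk_blocks = 2 * numa_nodes   # two GK instances per NUMA node
    return fixed_blocks + sol_blocks + gk_blocks


def grantor_cores(gt_blocks: int) -> int:
    """Cores dedicated to a Grantor server running gt_blocks GT instances."""
    if gt_blocks < 1:
        raise ValueError("a Grantor server runs at least one GT block")
    fixed_blocks = 3             # LLS, CPS, Dynamic configuration
    return fixed_blocks + gt_blocks


if __name__ == "__main__":
    print(gatekeeper_cores(2))   # 4 + 2 + 2*2 = 10
    print(grantor_cores(2))      # 3 + 2 = 5
```

When sizing a server, add the cores for the operating system and for applications such as Bird on top of these numbers.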
Another feature that has a high impact on Gatekeeper's performance is the ability of the processors to receive packets from NICs directly into their caches. This feature enables Gatekeeper to process packets while bypassing the RAM entirely. With this feature, the normal workflow goes as follows: packets are delivered to the processors' caches, Gatekeeper processes these packets, and sends them back to the NICs. Thus, packets are typically not written to the RAM. This is essential because not only does Gatekeeper have other demands for the RAM, but the RAM would quickly become the bottleneck if it received all packets going through a Gatekeeper server. On Intel processors, this feature is called Data Direct I/O (Intel DDIO). Intel DDIO is only available on Xeon processors of the E5 family, the E7 v2 family, and later.
Since most, if not all, I/O in a Gatekeeper server goes through the caches of the processors, having large caches can make a difference in the overall performance as well. If possible, disabling hyperthreading is advisable because virtual cores add stress to the cache of real cores.
Although Gatekeeper's code is mostly lock-free, it employs hardware transactional memory (HTM) when available. Thus, HTM is a nice-to-have feature.
Due to all the considerations above, production Gatekeeper servers have been deployed on Xeon processors with support for Intel DDIO. It is likely that some of AMD's processors are also a good fit for Gatekeeper, but there is no information on them yet.
Since Gatekeeper relies on DPDK, one can only use NICs that DPDK supports. Moreover, NICs must support multiple queues for receiving and transmitting packets and support Receive Side Scaling (RSS). Without these essential features, Gatekeeper would spend most of its time on lock contention instead of pushing packets forward.
For the best performance, NICs should support the following features in hardware:
- L2 Ethertype Filters: these filters identify packets by their L2 Ethertype and assign them to receive queues;
- N-tuple filter: these filters identify packets by specific sets of L3/L4 fields and assign them to receive queues;
- Hardware offloading of IP and UDP header checksums.
There is no requirement for the front and back NICs of a Gatekeeper server to be of the same speed. If the expected clean traffic is less than 1Gbps, it would be perfectly fine to have a 10Gbps NIC as the front NIC to withstand attacks and a 1Gbps NIC as the back NIC. NICs of 1Gbps (Intel X550), 10Gbps (Intel X520), 40Gbps (Intel XL710), and 100Gbps (Intel 810) have been deployed in production.
More than half of the RAM of a Gatekeeper server is dedicated to the flow table. In large deployments, this percentage may go much higher, such as 90%. The reason is that the flow table is what enables Gatekeeper to filter multi-vector attacks at a fine granularity in seconds without human intervention. So it is advisable to err on the side of an oversized flow table. Gatekeeper defines a flow as the pair of source and destination IP addresses. A Gatekeeper server with 256GB of RAM supports more than 1 billion flows. Gatekeeper servers scale horizontally; thus, two Gatekeeper servers, each with 256GB of RAM, support more than 2 billion flows.
While there is nothing preventing one from going lower than 256GB of RAM, this is the recommended amount for deployments in which the deployer cannot measure or estimate the flow demand. If multiple Gatekeeper servers are being deployed, this amount is likely safe, but if a single Gatekeeper server is being deployed, the deployer may want to go higher, such as 512GB.
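For deployers who can estimate their flow demand, the horizontal scaling described above suggests a rough server count. The sketch below assumes that capacity adds up linearly across servers and uses 1 billion flows per 256GB server as a conservative floor, since the text only states "more than 1 billion". The name `servers_needed` and its parameters are hypothetical and are not part of Gatekeeper.

```python
# Conservative floor taken from the text: a Gatekeeper server with
# 256GB of RAM supports more than 1 billion flows.
FLOWS_PER_256GB_SERVER = 1_000_000_000


def servers_needed(expected_flows: int,
                   flows_per_server: int = FLOWS_PER_256GB_SERVER) -> int:
    """Rough number of servers for expected_flows, assuming linear scaling."""
    if expected_flows < 0:
        raise ValueError("expected_flows must not be negative")
    if flows_per_server < 1:
        raise ValueError("flows_per_server must be positive")
    # Ceiling division without floating point; always at least one server.
    return max(1, -(-expected_flows // flows_per_server))


if __name__ == "__main__":
    print(servers_needed(2_500_000_000))
```

This is only a starting point: it ignores redundancy and attack headroom, both of which argue for rounding up further.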
Grantor servers are much less demanding. Their memory essentially depends on how much information the policy holds in memory. Once a ballpark estimate of this memory is obtained, one should multiply it by the number of instances of the GT block plus one. For example, if a policy takes 1GB of memory and 2 GT instances are in use, the server needs 1GB * (2 + 1) = 3GB.
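The rule of thumb for Grantor memory can be written as a one-line calculation. This is a minimal sketch; the function name `grantor_memory_gb` is hypothetical, and the policy footprint is whatever ballpark the deployer has measured.

```python
def grantor_memory_gb(policy_gb: float, gt_instances: int) -> float:
    """Estimated policy memory on a Grantor server, in GB.

    Follows the rule of thumb: the policy footprint multiplied by the
    number of GT instances plus one.
    """
    if policy_gb < 0:
        raise ValueError("policy_gb must not be negative")
    if gt_instances < 1:
        raise ValueError("a Grantor server runs at least one GT block")
    return policy_gb * (gt_instances + 1)


if __name__ == "__main__":
    print(grantor_memory_gb(1.0, 2))  # 1GB * (2 + 1) = 3.0
```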
The only demand that Gatekeeper servers put on disks is logging, and, in general, it is a low demand. Grantor servers add to that demand the need to maintain threat intelligence to feed the policy, but this is typically small as well. Thus, small but fast disks suffice.