What is a GPU?
Previous: What is Parallel Computing?
The GPU, or Graphics Processing Unit, was initially intended to be a secondary device that the CPU could offload the increasingly complex task of rendering (representing 3D meshes in one or more 2D images) onto. Because of this, GPUs had to be designed to perform many calculations very quickly in order to keep up with the stream of images needing to be displayed. This was accomplished by making the GPU highly parallel, allowing for multiple parts of an image to be rendered at once and thus reducing the total amount of time it took to render an image compared to the CPU.
To get an idea of the vast difference in computing that GPUs brought, check out this awesome visual demonstration by Mythbusters duo Adam Savage and Jamie Hyneman.
Early on, GPUs were highly specialized for rendering, meaning it was difficult, if not impossible, to harness their capabilities for other compute tasks. Eventually GPU architectures were modified slightly and GPU programming languages were developed that enabled GPUs to be used for a more generic workload, allowing them to be used for complex, time-consuming tasks such as machine learning and physics simulations, among many more.
Before getting to how GPUs work, it is handy to know how a CPU works for comparison, because both use the same types of internal components and largely follow the same processing pattern. At a high level, a typical CPU consists of 4 main components:
- The control unit, a microprocessor within the CPU that determines what the other components in the CPU should do based on the instructions it receives
- The Arithmetic Logic Unit (ALU) which is responsible for carrying out the operations done by the CPU
- Registers which store the operands and results of each ALU computation
- Cache memory which stores data on the chip for high-speed access after it has initially been loaded from slower, but much larger Random Access Memory (RAM), also called main memory.
The typical CPU processing cycle looks something like this:
- The CPU receives an instruction
- The instruction is decoded by the control unit
- The control unit retrieves the necessary data, first checking the cache and then loading it from slower memory if it is not in the cache.
- The data is loaded into the appropriate registers and the ALU is signaled to run.
- Output is retrieved from the appropriate ALU output register and either stored in cache or routed to the appropriate destination outside the CPU.
This process, known as the fetch-decode-execute cycle, is repeated billions of times per second on a modern CPU and is what makes up modern-day computing.
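As a rough illustration (the register-level details vary by CPU architecture), here is a single C++ statement annotated with how the steps above might play out for it:

```cpp
#include <cstdio>

int main() {
    int a = 2, b = 3;

    // For the statement below, the CPU roughly:
    //   1. fetches the "add" instruction, which the control unit decodes
    //   2. loads a and b into registers, from cache or (on a miss) from RAM
    //   3. signals the ALU to add the two register values
    //   4. writes the result from the ALU's output register back to cache/memory as c
    int c = a + b;

    printf("c = %d\n", c);
    return 0;
}
```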
GPUs also have 4 main component types, each of which has a CPU counterpart (see the short sketch after this list for how they appear in code):
- Execution cores are the GPU counterpart of the ALU. The main difference is that the GPU holds a large number of cores so that it can execute many operations in parallel rather than doing them sequentially as the CPU does.
- Streaming Multiprocessors (SMs) are the GPU solution for controlling groups of execution cores. Just as the CPU control unit directs the ALU, each SM directs a group of execution cores and handles routing of data and operation signaling for all of the cores within its group. This group is uncreatively referred to as an SM group. In a standard GPU there will be multiple SM groups which can all run independently of one another.
- Register banks which store the operands and output of each execution core and are not shared between cores.
- Cache memory which stores frequently used values and can be accessed by every core in the SM group. Additionally, the GPU has a larger (but slower) device-wide memory bank that is shared by all of the SM groups.
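To see how these components show up from the programmer's side, here is a minimal CUDA sketch (an illustration written for this comparison, not code from this series): local variables live in per-thread registers, a `__shared__` array lives in the fast per-SM memory visible to the whole thread block, and pointer arguments refer to the large device-wide memory.

```cuda
#include <cstdio>

__global__ void scale_with_shared(const float *in, float *out, float factor, int n) {
    __shared__ float staging[256];                    // fast per-SM memory, shared by the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;    // this thread's index, held in a register
    if (i < n)
        staging[threadIdx.x] = in[i];                 // global (device-wide) memory -> shared memory
    __syncthreads();                                  // wait for every core working on this block
    if (i < n)
        out[i] = staging[threadIdx.x] * factor;       // compute, then write back to global memory
}

int main() {
    const int n = 256;
    float host_in[n], host_out[n];
    for (int i = 0; i < n; ++i) host_in[i] = static_cast<float>(i);

    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in,  n * sizeof(float));
    cudaMalloc(&dev_out, n * sizeof(float));
    cudaMemcpy(dev_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);

    scale_with_shared<<<1, 256>>>(dev_in, dev_out, 2.0f, n);   // one block of 256 threads
    cudaMemcpy(host_out, dev_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("host_out[10] = %.1f\n", host_out[10]);             // expect 20.0
    cudaFree(dev_in);
    cudaFree(dev_out);
    return 0;
}
```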
As we will learn later, GPU instructions include the number of cores intended to carry out that instruction, often split up between multiple SM groups. Each SM group receives the instruction and number of cores to run the instruction on, and from there the same fetch-decode-execute cycle used by the CPU occurs, just at a higher degree of parallelism: data is routed to each core's registers from memory, the cores are all signaled to perform the operation, and the results are stored either in the SM group's cache or in shared memory.
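Here is a minimal sketch of what that looks like in practice, assuming a simple element-wise kernel (the kernel name and sizes are illustrative): the host asks for enough threads to cover n elements, groups them into blocks, and the hardware distributes those blocks across whichever SMs are free.

```cuda
#include <cstdio>

__global__ void add_one(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // globally unique thread index
    if (i < n)                                       // spare threads in the last block do nothing
        data[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;                           // about a million elements
    float *dev_data = nullptr;
    cudaMalloc(&dev_data, n * sizeof(float));
    cudaMemset(dev_data, 0, n * sizeof(float));

    int threadsPerBlock = 256;                                    // threads in each block
    int numBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // round up to cover every element

    // The <<<numBlocks, threadsPerBlock>>> launch configuration is how the host states
    // how many threads should run this kernel; the hardware schedules the blocks
    // onto whichever SMs are available.
    add_one<<<numBlocks, threadsPerBlock>>>(dev_data, n);
    cudaDeviceSynchronize();

    float first = 0.0f;
    cudaMemcpy(&first, dev_data, sizeof(float), cudaMemcpyDeviceToHost);
    printf("data[0] = %.1f\n", first);               // expect 1.0
    cudaFree(dev_data);
    return 0;
}
```

A block size of 256 threads is just a common default here; notably, the programmer only says how the work is divided into blocks, while the hardware decides which SM runs each one.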
In the next article, we'll look at what GPU execution looks like on the software side by digging into the GPU programming model and some basic CUDA syntax.