Skip to content

BPU model

Pavel I. Kryukov edited this page Oct 1, 2017 · 17 revisions

This page is maintained by George Korepanov

Introduction

The Branch Prediction Unit is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure, thus enabling the fetch unit to continue fetching instructions without stalling the pipeline.

For more detailed description please refer to the lectures.

Implementation

The operation of this unit is simulated inside BP class.

Interface

The BaseBP class provides very common interface of BPU as follows:

class BaseBP
{
    virtual bool is_taken( Addr PC) = 0;
    virtual Addr get_target( Addr PC) = 0;
    virtual void update( bool is_taken,
                         Addr branch_ip,
                         Addr target) = 0;
}
  • The first two methods are called by fetch stage, whenever it checks whether the next instruction is a branch to be taken. It uses get_target() to determine the IP to be fetched next cycle, and is_taken to store BPU prediction so that later memory stage could check if misprediction took place.
  • As mentioned, on memory stage the BPU prediction is compared with condition calculated by ALU and in case of misprediction, updates BPU information by calling update method.

bool is_taken( Addr PC)

Lookups the BP and if an entry holds info about requested IP, retrieves that info from it. Otherwise we have nothing else but return false (if we return true, we won't be able to return the target as well).

Addr get_target( Addr PC)

Lookups the BP entry and if it is TAKEN, returns predicted target. If there is no entry or it is predicted as NOT TAKEN, it returns PC + 4.

void update(bool is_taken, Addr branch_ip, Addr target)

Touches the existing BP entry in cache or creates a new one if it is not present (in the latter case we also have to reset the new entry with target to make it valid).

Internals

The implementation of described interface is very straight-forward. We just create and serve the cache holding BP entries, passing all the requests to corresponding entries (thus the real logic of operation is comprised within BP entries, not the BP class itself). In hardware BPU has it's own independent cache (BTB) for storing data, but in simulator we are free to reuse existing generic CacheTagArray implementation:

template<typename T>
class BP final: public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;

Constructor

BP class is templated with a type of different entries. To create different modes of BP, you may use BPFactory class:

class BPFactory {
     BaseBP* create( std::string mode,
                     uint32 size,
                     uint32 ways,
                     uint32 branch_ip_size_in_bits = 32);

Tuning in command line

Please, describe configuration variables here

Predictor types

All of implemented predictors store IP's (instruction pointers) of branches (both conditional & unconditional) to be able to retrieve address of next instruction without decoding the current one. Thus the basic class BP implements common BTB (branch target buffer, practically the cache keeping IP's), encapsulating all the differences of implementation of specific predictor inside BPentry.

Static predictor

Please, describe the behavior here

"Smart" static predictor

Please, describe the behavior here

Dynamic predictor

Please, describe the behavior here

Saturating predictor

Please, describe the behavior here. You may add a picture from the slides.

Clone this wiki locally