Skip to content

BPU model

Pavel I. Kryukov edited this page Oct 2, 2017 · 17 revisions

This page is maintained by George Korepanov

Introduction

The Branch Prediction Unit is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure, thus enabling the fetch unit to continue fetching instructions without stalling the pipeline.

For more detailed description please refer to the latest BPU lecture.

Implementation

The operation of this unit is simulated inside BP class.

Interface

The BaseBP class provides very common interface of BPU as follows:

class BaseBP
{
    virtual bool is_taken( Addr PC) = 0;
    virtual Addr get_target( Addr PC) = 0;
    virtual void update( bool is_taken,
                         Addr branch_ip,
                         Addr target) = 0;
}
  • The first two methods are called by fetch stage, whenever it checks whether the next instruction is a branch to be taken. It uses get_target() to determine the IP to be fetched next cycle, and is_taken to store BPU prediction so that later memory stage could check if misprediction took place.
  • As mentioned, on memory stage the BPU prediction is compared with condition calculated by ALU and in case of misprediction, updates BPU information by calling update method.

bool is_taken( Addr PC)

Lookups the BP and if an entry holds info about requested IP, retrieves that info from it. Otherwise we have nothing else but return false (if we return true, we won't be able to return the target as well).

Addr get_target( Addr PC)

Lookups the BP entry and if it is TAKEN, returns predicted target. If there is no entry or it is predicted as NOT TAKEN, it returns PC + 4.

void update(bool is_taken, Addr branch_ip, Addr target)

Touches the existing BP entry in cache or creates a new one if it is not present (in the latter case we also have to reset the new entry with target to make it valid).

Internals

The implementation of described interface is very straight-forward. We just create and serve the cache holding BP entries, passing all the requests to corresponding entries (thus the real logic of operation is comprised within BP entries, not the BP class itself). In hardware BPU has it's own independent cache (BTB) for storing data, but in simulator we are free to reuse existing generic CacheTagArray implementation:

template<typename T>
class BP : public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;

Constructor

BP class is templated with a type of different entries. To create different modes of BP, you may use BPFactory class:

class BPFactory {
     BaseBP* create( std::string mode,
                     uint32 size,
                     uint32 ways,
                     uint32 branch_ip_size_in_bits = 32);

Tuning in command line

Please, describe configuration variables here

Predictor types

All of implemented predictors store IP's (instruction pointers) of branches (both conditional & unconditional) to be able to retrieve address of next instruction without decoding the current one. Thus the basic class BP implements common BTB (branch target buffer, practically the cache keeping IP's), encapsulating all the differences of implementation of specific predictor inside BPentry.

Static predictor

Static predictor simply predicts that jump will always be taken:

Please, do not put C++ code here. It may be clear for you to read the code, but not for somebody else. In this case — what value does this code add? What is BPEntryAlwaysTaken and what is BPEntryStatic? They are internals which shouldn't be visible to a person who wants to try different branch predictors

class BPEntryAlwaysTaken final : public BPEntryStatic
{
public:
    bool is_taken( Addr /* unused */) const { return true; }
};

You may wonder, why static prediction actually works? The answer is simple. For example, loops are usually created via backward jumps, like

li   $t0, 10         # t0 is a constant 10
li   $t1, 0          # t1 is our counter (i)
loop:
beq  $t1, $t0, end   # if t1 == 10 wear done
addi $t1, $t1, 1     # add 1 to t1
   ... # loop body
j loop               # jump back to the top
end:

In given example fetch stage after fetching j loop instruction does not know which instrcution to fetch next. However, when we enable static predictor, it predicts that jump will be taken and will also provide next IP (IP of loop label). And next cycle the right beq $t1, $t0, end instruction will be fetched.

"Smart" static predictor

Above example also illustrates the flaw of plain static predictor: each time the beq $t1, $t0, end instruction is fetched, the predictor will wrongly tell that the branch will be taken and thus the pipeline will be flushed each time the loop is executed.

Please, do not put C++ code here. It may be clear for you to read the code, but not for somebody else The simple solution is to presume that backward branches will be taken and that forward branches will not:

class BPEntryBackwardJumps final : public BPEntryStatic
{
public:
    /* prediction */
    bool is_taken( Addr PC) const { return _target < PC ; }
};

You may easily see that in the given loop example the "smart" static preictor would usually predict right direction.

Static prediction is the simplest branch prediction technique because it does not rely on information about the dynamic history of code executing. Instead it predicts the outcome of a branch based solely on the branch instruction.

Dynamic predictor

When code structure is more complicated (e.g. nested if-then-else blocks within loops), the static predictor makes too many mistakes. The more intelligent solution is to gather and keep statistics at run-time. The simplest variant is to remember the last outcome of the branch, and presume that next time the direction would be the same:

Please, do not put C++ code here. It may be clear for you to read the code, but not for somebody else

private:
    enum class State
    {
        NT = 0,
        T = 1
    };

public:
    bool is_taken( Addr /* unused */) const { return state == State::T; }
    void update( bool is_taken, Addr target)
    {
        if ( is_taken && _target != target) {
            /* if the address has somehow changed, we should update it appropriately */
            reset();
            update_target(target);
        }

        state = static_cast<State>( is_taken);
    }

Saturating predictor

Please, describe the behavior here. You may add a picture from the slides.

Clone this wiki locally