Skip to content

BPU model

Pavel I. Kryukov edited this page Oct 1, 2017 · 17 revisions

The Branch Prediction Unit is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure, thus enabling the fetch unit to continue fetching instructions without stalling the pipeline.

For more detailed description please refer to the lectures.

Implementation

The operation of this unit is simulated inside BP class.

Interface

The BaseBP class provides very common interface of BPU as follows:

class BaseBP
{
    virtual bool is_taken( Addr PC) = 0;
    virtual Addr get_target( Addr PC) = 0;
    virtual void update( bool is_taken,
                         Addr branch_ip,
                         Addr target) = 0;
}

bool is_taken( Addr PC)

Lookups the BP and if an entry holds info about requested IP, retrieves that info from it. Otherwise we have nothing else but return false (if we return true, we won't be able to return the target as well).

Addr get_target( Addr PC)

Lookups the BP entry and if it is TAKEN, returns predicted target. If there is no entry or it is predicted as NOT TAKEN, it returns PC + 4.

void update(bool is_taken, Addr branch_ip, Addr target)

Touches the existing BP entry in cache or creates a new one if it is not present (in the latter case we also have to reset the new entry with target to make it valid).

Internals

The implementation of described interface is very straight-forward. We just create and serve the cache holding BP entries, passing all the requests to corresponding entries (thus the real logic of operation is comprised within BP entries, not the BP class itself). In hardware BPU has it's own independent cache (BTB) for storing data, but in simulator we are free to reuse existing generic CacheTagArray implementation:

template<typename T>
class BP final: public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;

Constructor

BP class is a templated with a type of different entries. To create different

Examples

The idea is to start with a very simple model, developing it into more mature one. The simplest examples are:

  • Static predictor (tells that branch is always taken)
  • "Smart" static predictor (backward branches are always taken, while forward ones are never taken)
  • Dynamic predictor (keep statistics and remember last variant)
  • Saturating predictor (the same as previous, but also keeps level of confidence)
  • A lof of more complicated ones
Any of above predictors have to remember the IP's (instruction pointers) of branches (both conditional & unconditional) to be able to retrieve address of next instruction without decoding the current one.

Thus the basic class BP implements common BTB (branch target buffer, practically the cache keeping IP's), encapsulating all the differences of implementation of specific predictor inside BPentry.

  • The first two methods are called by fetch stage, whenever it checks whether the next instruction is a branch to be taken. It uses get_target() to determine the IP to be fetched next cycle, and is_taken to store BPU prediction so that later memory stage could check if misprediction took place.
  • As mentioned, on memory stage the BPU prediction is compared with condition calculated by ALU and in case of misprediction, updates BPU information by calling update method.

BP entry implementation

The BP entries do actually contain the main predictor's logic. Though as far as all preliminary actions are alresy taken by BP class, the code is very simple:

class BPEntry
{
protected:
    Addr _target = NO_VAL32;

public:
    Addr getTarget() const { return _target; }
    void reset( Addr PC = NO_VAL32) { _target = PC; }
};

The entry stores the target and should provide following methods, as described above:

  • bool is_taken( PC)
  • void update( bool is_taken, Addr target)

For example, for static backward-jumps predictor, we should return TAKEN only on backward jumps and update target to keep it valid:

class BPEntryStatic : public BPEntry
{
public:
    void update( bool is_taken, Addr target) {if (is_taken) _target = target; }
};


class BPEntryBackwardJumps final : public BPEntryStatic
{
public:
    /* prediction */
    bool is_taken( Addr PC) const { return _target < PC ; }
};

Please, look at bpu/bpentry.h to see how different types of basic predictors are implemented. Pay attention to how different types of predictors form hierarchy, and try to reuse the existing functionality in the same way when implementing the new types of predictors.

Next steps

What might be done to improve current state of affairs:

  • Split code into .h/.cpp
  • Implement new types of predictors
  • Study the perfomance gained from using different types of predictors (see next point as well)
  • Current set of traces does not reflect the BPU influence on IPC. Try to find/generate/compile (the latter is a hard one since some instructions might not be implemented in simulator yet) representative traces to evaluate perfomance gains
  • Switch to ports instead of fucntion calls (it's likely to be within more global changes than just BPU)
  • Think over how to make current code more consize and clear (the possible problems are: 1) superfluous verboseness of BP/BPFactory pair; 2) tangled implemention of connection between target and state inside predictor)
Clone this wiki locally