BPU model
This page is maintained by George Korepanov
The Branch Prediction Unit is a digital circuit that tries to guess which way a branch (e.g. an if-then-else structure) will go before this is known for sure, thus enabling the fetch unit to continue fetching instructions without stalling the pipeline.
For a more detailed description, please refer to the latest BPU lecture.
The operation of this unit is simulated inside the `BP` class. The `BaseBP` class provides the common interface of the BPU as follows:
```cpp
class BaseBP
{
public:
    virtual bool is_taken( Addr PC) = 0;
    virtual Addr get_target( Addr PC) = 0;
    virtual void update( bool is_taken,
                         Addr branch_ip,
                         Addr target) = 0;
    virtual ~BaseBP() = default;
};
```
- The first two methods are called by the fetch stage whenever it checks whether the next instruction is a branch to be taken. It uses `get_target()` to determine the IP to be fetched next cycle, and `is_taken()` to store the BPU prediction so that the memory stage can later check whether a misprediction took place.
- As mentioned, at the memory stage the BPU prediction is compared with the condition calculated by the ALU, and in case of a misprediction the BPU information is updated by calling the `update()` method.
`is_taken()` looks up the BP and, if an entry holds information about the requested IP, retrieves the prediction from it. Otherwise, there is nothing left to do but return `false` (if we returned `true`, we would not be able to return the target as well).
`get_target()` looks up the BP entry and, if the branch is predicted as taken, returns the predicted target. If there is no entry, or the entry is predicted as not taken, it returns `PC + 4`.
`update()` touches the existing BP entry in the cache, or creates a new one if it is not present (in the latter case we also have to reset the new entry with the target to make it valid).
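The three behaviors above can be sketched with a toy, map-based predictor. This is a simplification for illustration only: the real `BP` keeps its entries in `CacheTagArray`, and `ToyBP` is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

using Addr = std::uint64_t; // assumption: Addr is an integer address type

// Toy, map-based BP illustrating the interface semantics described above.
class ToyBP
{
    struct Entry { bool taken; Addr target; };
    std::unordered_map<Addr, Entry> entries;
public:
    // No entry for this IP: nothing else to do but return false.
    bool is_taken( Addr PC) const
    {
        auto it = entries.find( PC);
        return it != entries.end() && it->second.taken;
    }
    // No entry, or predicted not taken: fall through to PC + 4.
    Addr get_target( Addr PC) const
    {
        auto it = entries.find( PC);
        if ( it != entries.end() && it->second.taken)
            return it->second.target;
        return PC + 4;
    }
    // Touch the existing entry, or create (and initialize) a new one.
    void update( bool is_taken, Addr branch_ip, Addr target)
    {
        entries[branch_ip] = Entry{ is_taken, target};
    }
};
```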
The implementation of the described interface is very straightforward. We just create and serve the cache holding BP entries, passing all requests to the corresponding entries (thus the real logic of operation resides within the BP entries, not in the `BP` class itself). In hardware the BPU has its own independent cache (the BTB) for storing data, but in the simulator we are free to reuse the existing generic `CacheTagArray` implementation:
```cpp
template<typename T>
class BP : public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;
    ...
};
```
The `BP` class is templated with the type of its entries. To create different modes of BP, you may use the `BPFactory` class:
```cpp
class BPFactory {
public:
    BaseBP* create( std::string mode,
                    uint32 size,
                    uint32 ways,
                    uint32 branch_ip_size_in_bits = 32);
};
```
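As a sketch of the factory idea, dispatching on a mode string could look as follows. The mode strings, class names, and the use of `std::unique_ptr` below are illustrative assumptions, not the simulator's actual ones.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

using Addr = std::uint64_t; // assumption: Addr is an integer address type

// Common interface, as declared earlier on this page.
struct BaseBP
{
    virtual bool is_taken( Addr PC) = 0;
    virtual Addr get_target( Addr PC) = 0;
    virtual void update( bool is_taken, Addr branch_ip, Addr target) = 0;
    virtual ~BaseBP() = default;
};

// Two trivial predictors, used here only to demonstrate the dispatch.
struct AlwaysTakenBP : BaseBP
{
    Addr last_target = 0;
    bool is_taken( Addr /* unused */) override { return true; }
    Addr get_target( Addr /* unused */) override { return last_target; }
    void update( bool /* unused */, Addr /* unused */, Addr target) override
    {
        last_target = target;
    }
};

struct AlwaysNotTakenBP : BaseBP
{
    bool is_taken( Addr /* unused */) override { return false; }
    Addr get_target( Addr PC) override { return PC + 4; }
    void update( bool, Addr, Addr) override { }
};

// Factory sketch: select the predictor implementation by mode string.
std::unique_ptr<BaseBP> create_bp( const std::string& mode)
{
    if ( mode == "always_taken")
        return std::make_unique<AlwaysTakenBP>();
    return std::make_unique<AlwaysNotTakenBP>();
}
```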
All of the implemented predictors store IPs (instruction pointers) of branches (both conditional and unconditional) to be able to retrieve the address of the next instruction without decoding the current one. Thus the base class `BP` implements a common BTB (branch target buffer, practically a cache keeping IPs), encapsulating all implementation differences of a specific predictor inside `BPEntry`.
The static predictor simply predicts that a jump will always be taken:
```cpp
class BPEntryAlwaysTaken final : public BPEntryStatic
{
public:
    bool is_taken( Addr /* unused */) const { return true; }
};
```
You may wonder why static prediction actually works. The answer is simple: loops are usually created via backward jumps, like
```asm
    li   $t0, 10        # t0 is a constant 10
    li   $t1, 0         # t1 is our counter (i)
loop:
    beq  $t1, $t0, end  # if t1 == 10 we're done
    addi $t1, $t1, 1    # add 1 to t1
    ...                 # loop body
    j    loop           # jump back to the top
end:
```
In the given example, after fetching the `j loop` instruction the fetch stage does not know which instruction to fetch next. However, when we enable the static predictor, it predicts that the jump will be taken and also provides the next IP (the IP of the `loop` label), so on the next cycle the right `beq $t1, $t0, end` instruction is fetched.
The example above also illustrates the flaw of the plain static predictor: each time the `beq $t1, $t0, end` instruction is fetched, the predictor wrongly tells that the branch will be taken, and thus the pipeline is flushed on every iteration of the loop.
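To see the cost, one can count the flushes for the loop above: `beq` falls through on each of the 10 iterations but is predicted taken every time, and is actually taken only once, on exit. A minimal sketch (the function name is made up):

```cpp
#include <cassert>
#include <vector>

// The loop above executes beq 11 times: not taken on 10 iterations,
// taken once on exit. Counting how often the always-taken prediction
// disagrees with the actual outcome gives the number of pipeline flushes.
int count_mispredictions_always_taken()
{
    std::vector<bool> beq_outcomes( 10, false); // 10 fall-throughs
    beq_outcomes.push_back( true);              // taken on exit
    int flushes = 0;
    for ( bool taken : beq_outcomes)
        if ( taken != true) // always-taken prediction
            ++flushes;
    return flushes;
}
```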
The simple solution is to presume that backward branches will be taken and that forward branches will not:
```cpp
class BPEntryBackwardJumps final : public BPEntryStatic
{
public:
    /* prediction */
    bool is_taken( Addr PC) const { return _target < PC; }
};
```
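A quick way to check the rule on the loop example is to plug in some addresses (the addresses and function names below are made up purely for illustration): `j loop` jumps backward and is predicted taken, while `beq ... end` branches forward and is predicted not taken, so only the final exit of the loop is mispredicted.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t; // assumption: Addr is an integer address type

// "Backward taken, forward not taken" rule from the entry above.
bool predict( Addr target, Addr PC) { return target < PC; }

// Illustrative addresses: beq at 0x08 targets end at 0x14 (forward),
// j at 0x10 targets loop at 0x08 (backward).
int count_mispredictions_backward_rule( int iterations)
{
    const Addr beq_pc = 0x08, beq_target = 0x14;
    const Addr j_pc   = 0x10, j_target   = 0x08;
    int flushes = 0;
    for ( int i = 0; i < iterations; ++i)
    {
        if ( predict( beq_target, beq_pc) != false) ++flushes; // fall-through
        if ( predict( j_target, j_pc) != true) ++flushes;      // taken
    }
    if ( predict( beq_target, beq_pc) != true) ++flushes; // exit: beq taken
    return flushes;
}
```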
You may easily see that in the given loop example, the "smart" static predictor would usually predict the right direction.
Static prediction is the simplest branch prediction technique, because it does not rely on information about the dynamic history of the executing code. Instead, it predicts the outcome of a branch based solely on the branch instruction itself.
When the code structure is more complicated (e.g. nested `if-then-else` blocks within loops), the static predictor makes too many mistakes. A more intelligent solution is to gather and keep statistics at run time. The simplest variant is to remember the last outcome of the branch and presume that next time the direction will be the same:
```cpp
private:
    enum class State
    {
        NT = 0,
        T = 1
    };

public:
    bool is_taken( Addr /* unused */) const { return state == State::T; }

    void update( bool is_taken, Addr target)
    {
        if ( is_taken && _target != target) {
            /* if the address has somehow changed, we should update it appropriately */
            reset();
            update_target( target);
        }
        state = static_cast<State>( is_taken);
    }
```
Such an entry simply remembers the last outcome of the branch: `update()` stores the actual direction, and `is_taken()` predicts that the next execution will go the same way. Note that this one-bit scheme still mispredicts twice per loop run: once on the loop exit, and once again when the loop is entered the next time.
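The state machine of this one-bit scheme can be isolated into a small sketch (the names and the initial state below are assumptions) and run over a branch-outcome sequence to count mispredictions:

```cpp
#include <cassert>
#include <vector>

// One-bit "last outcome" predictor, reduced to its state machine.
struct OneBit
{
    bool state = true; // assume the entry initially predicts taken
    bool predict() const { return state; }
    void update( bool taken) { state = taken; }
};

// Count mispredictions over a sequence of actual branch outcomes.
int count_mispredictions_one_bit( const std::vector<bool>& outcomes)
{
    OneBit p;
    int mispredictions = 0;
    for ( bool taken : outcomes)
    {
        if ( p.predict() != taken)
            ++mispredictions;
        p.update( taken);
    }
    return mispredictions;
}
```

Running it on a loop branch that is taken three times and then falls through, twice in a row (i.e. the loop is entered twice), shows the two mispredictions per run: one on each exit, one on each re-entry.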