Decouple BTB and BP #613

pavelkryukov · 2018-10-07T20:46:04Z

The current implementation mixes two entities: the BP (branch predictor, which predicts taken/not taken) and the BTB (branch target buffer, which predicts the target address of the branch).

There are 3 types of branches:

1. Direct unconditional branches (j, jal): always taken by definition; the target is known at the ID stage.
2. Direct conditional branches (beq, bne, etc.): need to be predicted; the target is known at the ID stage.
3. Indirect unconditional branches (jr, jalr): always taken by definition; the target is known at the MEM stage.

Only the second type needs a taken/not-taken (T/NT) prediction, as the others are taken by definition.

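All three types need the BTB to predict the target. The penalty for a BTB misprediction is small for the first two types, whose target is known at the ID stage, and higher for the last type, whose target is only known at the MEM stage.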
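To make the classification concrete, here is a minimal C++ sketch under assumed names (BranchKind, Stage and the helper functions are hypothetical, not simulator code). It encodes which branch kinds need a direction prediction and at which stage each target is resolved, which is what drives the different BTB misprediction penalties:

```cpp
#include <cassert>

// Hypothetical encoding of the three branch kinds listed above.
enum class BranchKind {
    DirectUnconditional,   // j, jal
    DirectConditional,     // conditional branches (beq, bne, ...)
    IndirectUnconditional  // jr, jalr
};

// Stage at which the actual target is known, as stated in the issue.
enum class Stage { ID, MEM };

// Only direct conditional branches need a taken/not-taken prediction;
// the other kinds are taken by definition.
constexpr bool needs_direction_prediction(BranchKind kind) {
    return kind == BranchKind::DirectConditional;
}

// All three kinds use the BTB to predict the target at fetch.
constexpr bool needs_target_prediction(BranchKind) { return true; }

constexpr Stage target_known_at(BranchKind kind) {
    return kind == BranchKind::IndirectUnconditional ? Stage::MEM : Stage::ID;
}

// A BTB miss costs more when the correct target appears late (MEM vs ID).
constexpr bool has_high_btb_mispredict_penalty(BranchKind kind) {
    return target_known_at(kind) == Stage::MEM;
}
```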

The idea is to keep the current BP modes purely as direction predictors (taken/not taken) and to create a separate buffer for branch targets. It should be implemented as a separate cache structure with interfaces similar to those of the BP.
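A minimal sketch of such a standalone target buffer, assuming a direct-mapped organization and 4-byte aligned instructions. The class name BTB and the get_target/update methods are illustrative only; the real structure would presumably reuse the simulator's cache and tag-array classes and mirror the BP interfaces more closely:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical stand-alone BTB: a direct-mapped cache of branch targets.
class BTB {
public:
    explicit BTB(std::size_t size_in_entries) : entries(size_in_entries) {
        assert(size_in_entries > 0);
    }

    // Returns the predicted target, or std::nullopt on a BTB miss.
    std::optional<Addr> get_target(Addr pc) const {
        const Entry& e = entries[index(pc)];
        if (e.valid && e.tag == pc)
            return e.target;
        return std::nullopt;
    }

    // Records the actual target once the branch is resolved.
    void update(Addr pc, Addr target) {
        entries[index(pc)] = Entry{true, pc, target};
    }

private:
    struct Entry {
        bool valid = false;
        Addr tag = 0;     // full PC kept as the tag for simplicity
        Addr target = 0;
    };

    std::size_t index(Addr pc) const {
        // Drop the 2 low bits: instructions are assumed to be 4-byte aligned.
        return static_cast<std::size_t>((pc >> 2) % entries.size());
    }

    std::vector<Entry> entries;
};
```

On a miss (std::nullopt), fetch would simply fall through to the next sequential PC; two branches mapping to the same index evict each other.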

The roadmap is as follows:

1. Separate the BTB and put it inside the BP class; separate the BTB entry from the BP entry as well. Unit tests should not be affected, as the BP keeps the same interfaces.
2. Separate the BTB and BP tests.
3. Extract the BTB into a new class and integrate it into the pipeline.
4. Tune the configurations (some branches have to update the BP, some do not).
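The last step could boil down to a per-kind update policy. The sketch below is hypothetical (UpdateMask and update_mask are not existing names) and assumes the classification from the issue: only conditional branches train the direction predictor, while every branch trains the BTB:

```cpp
#include <cassert>

// Same three kinds as in the issue description (hypothetical encoding).
enum class BranchKind { DirectUnconditional, DirectConditional, IndirectUnconditional };

// Which structures a resolved branch should train (hypothetical names).
struct UpdateMask {
    bool bp;   // direction predictor (taken/not taken)
    bool btb;  // branch target buffer
};

constexpr UpdateMask update_mask(BranchKind kind) {
    // Unconditional branches are taken by definition, so feeding them to the
    // direction predictor adds no information and may evict useful entries.
    // Every branch kind, however, benefits from a learned target.
    return UpdateMask{kind == BranchKind::DirectConditional, true};
}
```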

pavelkryukov · 2019-03-19T21:41:28Z

@yanlo In my opinion, it would be natural for you to continue with this task. In the end, you'll understand the whole BPU pipeline and will hopefully describe it on the wiki.

YanLogovskiy · 2019-03-25T10:55:32Z

Hi, I am back! Here is what I've found out:

Separate BTB and put it inside BP class, separate BTB entry out of BP entry as well. Unit tests should not be affected, as BP shall have the same interfaces.

We need to create a btbentry.h file describing the methods that maintain target prediction.
The target-related methods stay in the BP class, but their logic moves into a separate module.

As for the virtual void update( const BPInterface& bp_upd) method: it is currently used to update both the BP and the BTB prediction information. So it would be logical to create separate BPInterface and BTBInterface types with corresponding update methods.
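One possible shape for the split, sketched with hypothetical names: BPUpdate, BTBUpdate and JointUpdate stand in for the proposed BPInterface, BTBInterface and today's joint record, and their fields are assumptions. The toy predictors use std::map and are unbounded, so they only illustrate the interfaces, not a realistic cache:

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Addr = std::uint64_t;

// Hypothetical split of a joint update record into two narrower ones.
struct BPUpdate  { Addr pc; bool is_taken; };  // direction only
struct BTBUpdate { Addr pc; Addr target; };    // target only

// Joint record resembling what a single update() receives today (assumed fields).
struct JointUpdate { Addr pc; bool is_taken; Addr target; };

constexpr BPUpdate  to_bp_update(const JointUpdate& u)  { return {u.pc, u.is_taken}; }
constexpr BTBUpdate to_btb_update(const JointUpdate& u) { return {u.pc, u.target}; }

// Toy direction predictor: remembers the last outcome per PC.
class DirectionPredictor {
public:
    void update(const BPUpdate& u) { last_outcome[u.pc] = u.is_taken; }
    bool is_taken(Addr pc) const {
        auto it = last_outcome.find(pc);
        return it != last_outcome.end() && it->second;
    }
private:
    std::map<Addr, bool> last_outcome;
};

// Toy target buffer: remembers the last target per PC.
class TargetBuffer {
public:
    void update(const BTBUpdate& u) { targets[u.pc] = u.target; }
    bool is_hit(Addr pc) const { return targets.count(pc) != 0; }
    // Precondition: is_hit(pc); std::map::at throws otherwise.
    Addr get_target(Addr pc) const { return targets.at(pc); }
private:
    std::map<Addr, Addr> targets;
};
```

Each structure now depends only on the fields it actually uses, so, for example, the BTB can be updated for jumps that never reach the direction predictor.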

pavelkryukov · 2019-03-25T11:04:31Z

My strategy would be to separate things in many small steps.

For instance, to separate the BTB and BP entries, I would start by turning this:

template<typename T> // T is a joined BTB and BP entry
class BP final: public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;
    // ...
};

to this:

template<typename T> // T is BP entry
class BP final: public BaseBP
{
    struct BTBEntry {
        bool valid;
        Addr target;
    };
    struct Entry {
        T direction;
        BTBEntry target;
    };
    std::vector<std::vector<Entry>> data;
    CacheTagArray tags;
    // ...
};

Then you'll naturally update all BP methods to operate on the new arrays, which will simplify the next steps.
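To see how the methods look after the change, here is a self-contained miniature of the proposed layout. Assumptions: a direct-mapped std::vector replaces std::vector<std::vector<Entry>> plus CacheTagArray (so there is no tag check and aliasing branches share an entry), TwoBitCounter is a hypothetical stand-in for a real BP entry, and the target is recorded only for taken branches:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Addr = std::uint64_t;

// Hypothetical direction-only entry: a 2-bit saturating counter, no target.
class TwoBitCounter {
public:
    bool is_taken() const { return state >= 2; }
    void update(bool taken) {
        if (taken && state < 3) ++state;
        if (!taken && state > 0) --state;
    }
private:
    unsigned state = 1;  // start in "weakly not taken"
};

// T predicts the direction; BTBEntry holds the target.
template<typename T>
class BP {
public:
    explicit BP(std::size_t size_in_entries) : data(size_in_entries) {
        assert(size_in_entries > 0);
    }
    bool is_taken(Addr pc) const { return data[index(pc)].direction.is_taken(); }
    bool is_hit(Addr pc) const { return data[index(pc)].target.valid; }
    Addr get_target(Addr pc) const { return data[index(pc)].target.target; }

    void update(Addr pc, bool taken, Addr target) {
        Entry& e = data[index(pc)];
        e.direction.update(taken);             // direction logic lives in T
        if (taken)
            e.target = BTBEntry{true, target}; // target logic lives in BTBEntry
    }

private:
    struct BTBEntry {
        bool valid = false;
        Addr target = 0;
    };
    struct Entry {
        T direction;
        BTBEntry target;
    };
    std::size_t index(Addr pc) const {
        return static_cast<std::size_t>((pc >> 2) % data.size());
    }
    std::vector<Entry> data;
};
```

The point is the division of labor: T never touches the target and BTBEntry never touches the direction, which should make extracting the BTB into its own class a mechanical step.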

YanLogovskiy · 2019-03-26T10:09:35Z

Could you please explain the general purpose of the entries to me?

As I understand it:

bpu.cpp and bpu.h contain the cache structure implementation.
bp_interface.h handles communication with other modules via ports.
bpentry.h describes different types of predictors, but why should it be separated from the cache structure implementation?

I see this mapping in bpu.cpp:

    static Map generate_map() {
        Map my_map;
        my_map.emplace("always_taken", std::make_unique<BPCreator<BPEntryAlwaysTaken>>());
        my_map.emplace("always_not_taken", std::make_unique<BPCreator<BPEntryAlwaysNotTaken>>());
        my_map.emplace("backward_jumps", std::make_unique<BPCreator<BPEntryBackwardJumps>>());
        my_map.emplace("saturating_one_bit", std::make_unique<BPCreator<BPEntryOneBit>>());
        my_map.emplace("saturating_two_bits", std::make_unique<BPCreator<BPEntryTwoBit>>());
        my_map.emplace("adaptive_two_levels", std::make_unique<BPCreator<BPEntryAdaptive<2>>>());
        return my_map;
    }

Why do we need to map here?

YanLogovskiy · 2019-03-26T10:11:36Z

I've read the wiki, where it says:

We just create and serve the cache holding BP entries, passing all the requests to corresponding entries (thus the real logic of operation is comprised within BP entries, not the BP class itself).

But it still doesn't help me.

pavelkryukov · 2019-03-26T10:16:28Z

why should it be separated from cache structure implementation?

To develop things separately. When you update the cache implementation, you do not care about what that cache holds (data, tags, a simple BP entry, a complicated BP entry, etc.). Conversely, if you wish to add a new branch prediction algorithm (and you did that once), you do not have to change the cache structure.

Otherwise, the code would be extremely tangled and hard to test and maintain.

Why do we need to map here?

That's an example of the Abstract Factory pattern. We need it to hide the internals of BP implementations from their users: they just pass a string, and the factory creates an instance of the BP cache holding the entries for the particular branch prediction mode.
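A stripped-down, self-contained version of the same pattern, for illustration only: BaseBP, BP<T> and the creators are simplified (no cache, no constructor arguments), and AlwaysTaken, AlwaysNotTaken and BPFactory are hypothetical names. Callers see only BaseBP and a mode string:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Minimal abstract product handed out by the factory.
struct BaseBP {
    virtual ~BaseBP() = default;
    virtual bool is_taken() const = 0;
};

// Two toy entry types; real entries carry per-branch state.
struct AlwaysTaken    { static bool predict() { return true; } };
struct AlwaysNotTaken { static bool predict() { return false; } };

// Concrete product: the cache is omitted, it just delegates to T.
template<typename T>
struct BP final : BaseBP {
    bool is_taken() const final { return T::predict(); }
};

// Abstract creator plus one templated concrete creator per entry type.
struct BaseBPCreator {
    virtual ~BaseBPCreator() = default;
    virtual std::unique_ptr<BaseBP> create() const = 0;
};

template<typename T>
struct BPCreator final : BaseBPCreator {
    std::unique_ptr<BaseBP> create() const final { return std::make_unique<BP<T>>(); }
};

// Factory: callers pass a mode string and never name BP<T> directly.
class BPFactory {
public:
    BPFactory() {
        creators.emplace("always_taken", std::make_unique<BPCreator<AlwaysTaken>>());
        creators.emplace("always_not_taken", std::make_unique<BPCreator<AlwaysNotTaken>>());
    }
    std::unique_ptr<BaseBP> create(const std::string& mode) const {
        auto it = creators.find(mode);
        if (it == creators.end())
            throw std::invalid_argument("unknown BP mode: " + mode);
        return it->second->create();
    }
private:
    std::map<std::string, std::unique_ptr<BaseBPCreator>> creators;
};
```

Adding a new prediction algorithm then means writing a new entry type and registering one more creator; the cache code and its callers stay untouched.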

pavelkryukov · 2019-03-27T12:52:48Z

Great. Now you can remove the target variable from BPEntry, so that all targets are fetched from BTBEntry.

YanLogovskiy · 2019-03-28T18:57:08Z

As I see it, the direction field in the Entry structure has type T, which allows functions to fetch both the direction and the target regardless of the BTBEntry target field.

Which type T is used when creating a BP object?
Should we change this type in the future?

pavelkryukov · 2019-03-29T08:03:32Z

template<typename T> // T is BP entry
class BP final: public BaseBP
{

T is the name of the template parameter. Please check what we use as the actual template arguments for BP.

YanLogovskiy · 2019-03-29T08:14:38Z

Yes, I see. As I understand it, I should find something like BP<actual_type> wherever a BP object is actually used. But I can't find this in the repository.

pavelkryukov · 2019-03-29T08:37:57Z

Here it is

mipt-mips/simulator/modules/fetch/bpu/bpu.cpp, lines 112 to 137 in 7912021

    
    template<typename T>
    struct BPCreator : BaseBPCreator {
        std::unique_ptr<BaseBP> create(uint32 size_in_entries, uint32 ways,
                                       uint32 branch_ip_size_in_bits) const final
        {
            return std::make_unique<BP<T>>( size_in_entries,
                                            ways,
                                            branch_ip_size_in_bits);
        }
        BPCreator() = default;
    };
    using Map = std::map<std::string, std::unique_ptr<BaseBPCreator>>;
    const Map map;
    // Use old-fashioned generation since initializer-lists don't work with unique_ptrs
    static Map generate_map() {
        Map my_map;
        my_map.emplace("always_taken", std::make_unique<BPCreator<BPEntryAlwaysTaken>>());
        my_map.emplace("always_not_taken", std::make_unique<BPCreator<BPEntryAlwaysNotTaken>>());
        my_map.emplace("backward_jumps", std::make_unique<BPCreator<BPEntryBackwardJumps>>());
        my_map.emplace("saturating_one_bit", std::make_unique<BPCreator<BPEntryOneBit>>());
        my_map.emplace("saturating_two_bits", std::make_unique<BPCreator<BPEntryTwoBit>>());
        my_map.emplace("adaptive_two_levels", std::make_unique<BPCreator<BPEntryAdaptive<2>>>());
        return my_map;
    }

pavelkryukov mentioned this issue Oct 8, 2018

Add unit tests for all BP modes #506 (closed)

pavelkryukov added label 4 (features of medium complexity which usually require infrastructure enhancements) and removed label 5 (same as 4, but requires good understanding of CPU microarchitecture) on Mar 4, 2019

This was referenced Mar 4, 2019:

Introduce "short" pipeline flush #91 (closed)
Implement delayed branches in PerfSim #639 (open)

YanLogovskiy assigned YanLogovskiy and unassigned YanLogovskiy Mar 20, 2019

YanLogovskiy removed their assignment Mar 31, 2019

pavelkryukov self-assigned this Apr 1, 2019

pavelkryukov added label 0 (this task has an owner who does not participate in the scoring system) and removed label 4 (features of medium complexity which usually require infrastructure enhancements) on Apr 1, 2019

pavelkryukov mentioned this issue Apr 1, 2019

Decouple BTB and BP, part 1 #944 (merged)

pavelkryukov closed this as completed in #944 Apr 1, 2019
