Skip to content

Decouple BTB and BP #613

Closed
pavelkryukov opened this issue Oct 7, 2018 · 11 comments · Fixed by #944
Closed

Decouple BTB and BP #613

pavelkryukov opened this issue Oct 7, 2018 · 11 comments · Fixed by #944
Assignees
Labels
0 This task has the owner who does not participate in scoring system. enhancement Adds a new feature to simulation. S1 — Branch prediction To solve the issue, you need knowledge about branch prediction

Comments

@pavelkryukov
Copy link
Member

pavelkryukov commented Oct 7, 2018

Current implementation mixes two entities: BP (which predicts taken/not taken) and BTB (branch target buffer, predicts address of the branch).

There are 3 types of branches:

  • Direct unconditional branches (j, jal) — always taken by definition, target is known at ID stage,
  • Direct conditional branches (branches) — need to be predicted, target is known at ID stage.
  • Indirect unconditional branches (jr, jalr) — always taken by definition, target is known at MEM stage.

Only 2nd need T/NT branch prediction, as others are taken by definition.

All of three need BTB to predict target, and penalty for BTB mispredict is small for first two types, and higher for the last type.

The idea is to leave current BP modes only as BP modes (taken/not taken) and create a separate buffer for branch targets. It should be made as a separate cache structure with interfaces similar to the BP.

The road map goes as following:

  1. Separate BTB and put it inside BP class, separate BTB entry out of BP entry as well. Unit tests should not be affected, as BP shall have the same interfaces.
  2. Separate BTB and BP tests.
  3. Now, extract BTB to a new class and intergrate it to the pipeline
  4. Tune configurations (some branches have to update BP, some does not).
@pavelkryukov pavelkryukov added enhancement Adds a new feature to simulation. 5 Same as 4, but requires good understanding of CPU microarchitecture. S1 — ISA To solve the issue, you need knowledge about MIPS or RISC-V ISA S1 — Branch prediction To solve the issue, you need knowledge about branch prediction 4 Features of medium complexity which usually require infrastructure enhancements. and removed S1 — ISA To solve the issue, you need knowledge about MIPS or RISC-V ISA 5 Same as 4, but requires good understanding of CPU microarchitecture. 4 Features of medium complexity which usually require infrastructure enhancements. labels Oct 7, 2018
@pavelkryukov pavelkryukov added 4 Features of medium complexity which usually require infrastructure enhancements. and removed 5 Same as 4, but requires good understanding of CPU microarchitecture. labels Mar 4, 2019
@pavelkryukov
Copy link
Member Author

@yanlo In my opinion, it would be natural for you to continue by doing this. In the end, you'll have understanding of all BPU pipeline and hopefully describe it on Wiki.

@YanLogovskiy
Copy link
Contributor

Hi, I am back! There is what I've found out:

  1. Separate BTB and put it inside BP class, separate BTB entry out of BP entry as well. Unit tests should not be affected, as BP shall have the same interfaces.
  1. We need to create btbentry.h file where methods that maintain target prediction will be described.

  2. In BP class leave methods corresponding to target in BP class but move them to a separate module.

What concerns virtual void update( const BPInterface& bp_upd) method. Now it is used for updating both BP and BTB prediction information. So, it will be logical to create separate BPInterface and BTBInterface and corresponding update methods.

@pavelkryukov
Copy link
Member Author

My strategy would be to separate things by doing a lot of small steps.

For instance, to separate BTB and BP entry I would start by morphing this:

template<typename T> // T is a joined BTB and BP entry
class BP final: public BaseBP
{
    std::vector<std::vector<T>> data;
    CacheTagArray tags;

to this:

template<typename T> // T is BP entry
class BP final: public BaseBP
{
    struct BTBEntry {
        bool valid;
        Addr target;
    };
    struct Entry {
        T direction;
        BTBEntry target;
    };
    std::vector<std::vector<Entry>> data;
    CacheTagArray tags;

Then you'll naturaly fix all the BP methods to operate with new arrays, and it would simplify the next steps.

@YanLogovskiy
Copy link
Contributor

Could you please explain me the general purpose of Entries?

As I understand:

  1. In bpu.cpp, bpu.h we have cache structure implementation.
  2. bp_interface.h corresponds to communication with other modules using ports.
  3. bpentry.h describes different types of predictors, but why should it be separated from cache structure implementation?

I see in bpu.cpp mapping:

    static Map generate_map() {
        Map my_map;
        my_map.emplace("always_taken", std::make_unique<BPCreator<BPEntryAlwaysTaken>>());
        my_map.emplace("always_not_taken", std::make_unique<BPCreator<BPEntryAlwaysNotTaken>>());
        my_map.emplace("backward_jumps", std::make_unique<BPCreator<BPEntryBackwardJumps>>());
        my_map.emplace("saturating_one_bit", std::make_unique<BPCreator<BPEntryOneBit>>());
        my_map.emplace("saturating_two_bits", std::make_unique<BPCreator<BPEntryTwoBit>>());
        my_map.emplace("adaptive_two_levels", std::make_unique<BPCreator<BPEntryAdaptive<2>>>());
        return my_map;
    }

Why do we need to map here?

@YanLogovskiy
Copy link
Contributor

I've read wiki where it is written:

We just create and serve the cache holding BP entries, passing all the requests to corresponding entries (thus the real logic of operation is comprised within BP entries, not the BP class itself).

But still it doesn't help me

@pavelkryukov
Copy link
Member Author

why should it be separated from cache structure implementation?

To deploy things separately. When you update the cache implementation, you do not care about what that cache holds (data, tags, simple BP entry, complicated BP entry etc.). In opposite, if you wish to add a new branch prediction algorithm — and you did that once — you do not have change cache structure.

If it was done otherwise, code would be extremely tangled, hardly testable and maintainable.

Why do we need to map here?

That's an example of Abstract Factory pattern. We need it to hide the internals of BP implementations from its users — they just have to pass a string, and factory creates an instance of BP cache holding the entries for the particular branch prediction mode.

@pavelkryukov
Copy link
Member Author

Great. Now you can remove target variable from BPEntry, so all targets should be fetched from BTEntry.

@YanLogovskiy
Copy link
Contributor

As I see direction field in Entry structure has type T, which allows functions fetch direction and target both regardless to the field BTBEntry target.

What type T is used while creating BP class object?
Should we change this type in the future?

@pavelkryukov
Copy link
Member Author

template<typename T> // T is BP entry
class BP final: public BaseBP
{

T is a name for template parameter. Please check what we use as the actual template parameters for BP.

@YanLogovskiy
Copy link
Contributor

Yes, I see. As I understand I must see something like BP<actual_type> when we actually use BP class object. But I can't find this in repository.

@pavelkryukov
Copy link
Member Author

Here it is

template<typename T>
struct BPCreator : BaseBPCreator {
std::unique_ptr<BaseBP> create(uint32 size_in_entries, uint32 ways,
uint32 branch_ip_size_in_bits) const final
{
return std::make_unique<BP<T>>( size_in_entries,
ways,
branch_ip_size_in_bits);
}
BPCreator() = default;
};
using Map = std::map<std::string, std::unique_ptr<BaseBPCreator>>;
const Map map;
// Use old-fashioned generation since initializer-lists don't work with unique_ptrs
static Map generate_map() {
Map my_map;
my_map.emplace("always_taken", std::make_unique<BPCreator<BPEntryAlwaysTaken>>());
my_map.emplace("always_not_taken", std::make_unique<BPCreator<BPEntryAlwaysNotTaken>>());
my_map.emplace("backward_jumps", std::make_unique<BPCreator<BPEntryBackwardJumps>>());
my_map.emplace("saturating_one_bit", std::make_unique<BPCreator<BPEntryOneBit>>());
my_map.emplace("saturating_two_bits", std::make_unique<BPCreator<BPEntryTwoBit>>());
my_map.emplace("adaptive_two_levels", std::make_unique<BPCreator<BPEntryAdaptive<2>>>());
return my_map;
}

@YanLogovskiy YanLogovskiy removed their assignment Mar 31, 2019
@pavelkryukov pavelkryukov self-assigned this Apr 1, 2019
@pavelkryukov pavelkryukov added 0 This task has the owner who does not participate in scoring system. and removed 4 Features of medium complexity which usually require infrastructure enhancements. labels Apr 1, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
0 This task has the owner who does not participate in scoring system. enhancement Adds a new feature to simulation. S1 — Branch prediction To solve the issue, you need knowledge about branch prediction
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants