Implement delayed branches in Functional simulation #626

pavelkryukov · 2018-10-09T13:53:27Z

Delayed branch is one of the (obsolete) ways to fix problem of control hazards, but, however, it exists in MIPS and cannot be removed — there is a lot of binary files compatible with it.

The idea is to put an instruction independent from branch to be executed right after the branch. Then, in a case of branch misprediction, that instruction is not flushed from the pipeline:

Let's assume there is a following control flow:

if (branch)
   then instructions
else
   else instruction
independent instructions

Pipeline without delayed branches:

       0 1 2 3 4 5 6 7 8 9 10
branch F D E M W
then-1   F D E x
then-2     F D x
then-3       F x
else-1         F D E M W
independent-1    F D E M W
independent-2      F D E M W
       0 1 2 3 4 5 6 7 8 9 10

We save one cycle with a single branch delay slot:

       0 1 2 3 4 5 6 7 8 9 10
branch F D E M W
ind-1    F D E M W
then-1     F D x
then-2       F x
else-1         F D E M W
independent-2    F D E M W
       0 1 2 3 4 5 6 7 8 9 10

We save two cycles with two delay slots:

       0 1 2 3 4 5 6 7 8 9 10
branch F D E M W
ind-1    F D E M W
ind-2      F D E M W
then-1       F x
else-1         F D E M W
       0 1 2 3 4 5 6 7 8 9 10

How can we support branch delay slot in micro-architecture? There are straightforward steps:

Add configuration option for delay slots (0, 1, 2 or more). Currently we implement 0.
Build MIPS tests with delayed slots. Currently they are disable by .set noreorder directive in MIPS assembly files.
Implement delayed slots in FuncSim.

The text was updated successfully, but these errors were encountered:

agrachiv · 2018-11-28T15:47:41Z

It is not clear, where exactly i should add this option for delay slots.

pavelkryukov · 2018-11-28T15:51:16Z

I don't know either. I suggest to start with implementing it in FuncSim.

agrachiv · 2018-11-28T16:29:08Z

So i should add bool enabe_delayed_branches argument here?
Trap run(uint64 instrs_to_run, int enabled_delayed_slots) final;

Or in the constructor?
explicit FuncSim( bool log = false, int enabled_delayed_slots);

pavelkryukov · 2018-11-28T17:20:02Z

Usually constructor is preferrable, as you are able to look configuration in all class methods.

agrachiv · 2018-11-28T19:30:11Z

So I should add function, that replaces instructions in instruction list, when delayed slots option is turned on?

pavelkryukov · 2018-11-28T19:51:07Z

First of all, I advise to get sure you have the test trace which is failing without delayed branches.

that replaces instructions in instruction list, when delayed slots option is turned on?

What is the "instruction list"?

agrachiv · 2018-11-29T00:15:14Z

What is the "instruction list"?

List of instructions that run method need to proceed. I mean, changing instructions with each other shouldn't be the work of simulator, should it? In my vision order of the instructions should be changed in constructor, as it would be done before by compiler. Is my vision proper?

P.s: Or I have to work in model where simulator knows only instructions it already decoded? But then it is impossible to use delayed brunch method. On lecture we've been told that finding independent instructions is done with the compiler not with the pipeline. So is this issue's point is figuring out some algorithm to find independent instructions?

agrachiv · 2018-11-29T00:25:22Z

And is there any branch prediction functionality in functional simulator now?

pavelkryukov · 2018-11-29T07:27:54Z

And is there any branch prediction functionality in functional simulator now?

Yes, it is clearly stated in README file

List of instructions that run method need to proceed.

No, we do not have such a list. We fetch instructions one-by-one from memory, as the next instruction program counter is dependent on previous instruction result.

On lecture we've been told that finding independent instructions is done with the compiler

Right. But our simulator does not assume compiler may do this at the moment, so instead of this:

       0 1 2 3 4 5 6 7 8 9 10
branch F D E M W
ind-1    F D E M W
then-1     F D x
then-2       F x
else-1         F D E M W
independent-2    F D E M W
       0 1 2 3 4 5 6 7 8 9 10

we get this:

       0 1 2 3 4 5 6 7 8 9 10
branch F D E M W
ind-1    F D E x
then-1     F D x
then-2       F x
else-1         F D E M W
independent-2    F D E M W
       0 1 2 3 4 5 6 7 8 9 10

which is not what compiler wanted us to do.

So is this issue's point is figuring out some algorithm to find independent instructions?

The point is to execute 1 or 2 instruction after the branch independently from branch condition, i.e. the program counter change should be effective with a delay.

agrachiv · 2018-11-29T09:15:48Z

Build MIPS tests with delayed slots. Currently they are disable by .set noreorder directive in MIPS assembly files.

Sorry for stupid questions, but i can't find anything to understand what should i do here. There are no tests for delayed slots in traces directory and i don't understand what set noreorder directive is.

pavelkryukov · 2018-11-29T09:38:39Z

GitHub has a nice feature called "search":
https://github.com/MIPT-ILab/mips-traces/search?q=set+noreorder&unscoped_q=set+noreorder

By default, MIPS compiler/assembler assumes the CPU (or simulator) supports branch delays and therefore it finds independent instructions to fill the delay slot. Since MIPT-MIPS does not support it, we had to disable them by setting the set noreorder directive to assembler files. If this directive is removed, MIPS assembler will reorder instructions to fill the branch delay slot, and simulation will fail.

pavelkryukov · 2018-11-29T09:45:59Z

More detailed explanation how compiler/assembler works:

https://stackoverflow.com/questions/3807480/weird-mips-assembler-behavior-with-jump-and-link-instruction

agrachiv · 2018-11-29T10:24:49Z

GitHub has a nice feature called "search":

I was searching in this repository, so that i couldn't find anything, sorry.

So after I remove all .set noreorder detectives all the reorders I will be able to cover all the reorders by implementing delayed branches?

pavelkryukov · 2018-11-29T10:37:28Z

Actually, I never investigated deeply how assembler works and what brake in torture tests — I just disabled reordering and TT started to work,

Let's start with simpler test provided on SO;

.set noreorder
.section .text
.globl __start

__start:
   addi $a0, $0, 100
   addi $a1, $0, 200 
   jal test
   jr $zero # Required to halt simulation
test:
   add $v0, $a0, $a1
   jr $ra

If we build it with current flow:
https://github.com/MIPT-ILab/mips-traces/blob/0c8d51184a9918cdc80e68c6355deec07092fc5f/Makefile#L21-L27

we should get something like the following code (you can check it with mips-linux-gnu-objdump -EL -D -b test.out, not sure about exact options), which is perfectly executed on MIPT-MIPS at the moment.

Disassembly of section .text:

00000000 <__start>:
   0:   20040064    addi    a0,zero,100
   4:   200500c8    addi    a1,zero,200
   8:   0c000004    jal 10 <test>
   c:   00000000    nop
   10:   00000008    jr  zero
   14:   00000000    nop

00000010 <test>:
  18:   00851020    add v0,a0,a1
  1c:   03e00008    jr  ra
  20:   00000000    nop
  24:   00000000    nop

We need to get something like SO author had:

Disassembly of section .text:

00000000 <__start>:
   0:   20040064    addi    a0,zero,100
   4:   0c000003    jal c <test>    <--- Why is jal coming before addi?
   8:   200500c8    addi    a1,zero,200
   c:   00000008    jr  zero
   10: 00000000    nop

0000000c <test>:
   14:   03e00008    jr  ra  <--- Why is jr coming before add?
   18:   00851020    add v0,a0,a1
    ...

agrachiv · 2018-11-29T11:30:19Z

mips-linux-gnu-objdump -EL -D -b test.out

So in order to test it I need to use MIPS binutils or something else?

pavelkryukov · 2018-11-29T11:43:19Z

So in order to test it I need to use MIPS binutils or something else?

Yes, that's the only way to build ELF files from MIPS assembler.
You should have installed it to run simulation unit tests, right?

agrachiv · 2018-11-30T09:45:38Z

I guess I should copy of all tests with noreorder because both functional simulator and performance simulator use same tests, and paste new path here:

TEST_CASE( "Torture_Test: Func_Sim")
{
    CHECK_NOTHROW( get_simulator_with_test<MIPS32>( TEST_PATH "/tt.core.universal.out")->run_no_limit() );
    CHECK_NOTHROW( get_simulator_with_test<MIPS32>( TEST_PATH "/tt.core32.le.out")->run_no_limit() );
    CHECK_NOTHROW( get_simulator_with_test<MIPS64>( TEST_PATH "/tt.core64.le.out")->run_no_limit() );
}

pavelkryukov · 2018-11-30T09:50:18Z

If you want to run these tests with ctest, yes, but at the beginning you may run them just from shell with mipt-mips command.

agrachiv · 2018-11-30T11:08:48Z

That is how disassembly looks like:

004000d0 <__start>:
  4000d0:	20040064 	addi	a0,zero,100
  4000d4:	0c100039 	jal	4000e4 <test>
  4000d8:	200500c8 	addi	a1,zero,200
  4000dc:	00000008 	jr	zero
  4000e0:	00000000 	nop

004000e4 <test>:
  4000e4:	03e00008 	jr	ra
  4000e8:	00851020 	add	v0,a0,a1
  4000ec:	00000000 	nop

There are 2 instructions after the jump, and these 2 are always filled with nops or independent instructions, so simulation should work without any pipeline flushing or whatever, am I right?

pavelkryukov · 2018-11-30T11:16:06Z

We do not touch pipeline at the moment, we are talking only about functional simulation (single-cycle implementation, func_sim.cpp).

simulation should work without any pipeline flushing or whatever, am I right?

Please run functional simulation at your own and see what it does.

agrachiv · 2018-11-30T11:35:00Z

I've just understood that functional simulator is single-cycled, but then it can't face any branches hazards. So what does 'implementing delayed branches' stands for if we talk about single-cycled simulator?

pavelkryukov · 2018-11-30T11:42:08Z

Please run functional simulation at your own and see what it does.

Please, do this and share what instructions are executed by default (./mipt-mips -b test.out -d -f).

agrachiv · 2018-11-30T17:30:29Z

I see the problem. So as soon as simulator meets branch condition it should execute 2 more instructions before jumping?

pavelkryukov · 2018-11-30T17:33:10Z

Right, but we should optionally select 0, 1 or 2.

That's the problem with delayed branches method Oleg explained on the lecture — whereas it works well in pipelined microarchitecture, it still should be supported in other implementations, like single-cycle or more complicated.

agrachiv · 2018-11-30T17:49:41Z

That's the problem with delayed branches method Oleg explained on the lecture — whereas it works well in pipelined microarchitecture, it still should be supported in other implementations, like single-cycle or more complicated.

Why do we need to optionally select 0, 1 or 2 if compiler always put 2 instructions after the branch condition

pavelkryukov · 2018-11-30T17:53:34Z

compiler always put 2 instructions after the branch condition

So far I don't see 2 instructions case, I see always 1 — please explain if I'm wrong

Why do we need to optionally select 0, 1 or 2?

RISC-V has no branch delay option, thus we'll always choose 0 option
MARS and SPIM has an option to execute without branch delay — we need to have it as well
If compiler is instructed not to reorder instructions, we must have 0 (as we have at the moment).

pavelkryukov · 2018-11-30T17:57:56Z

And, finally, we won't have a branch delay slot in PerfSim for some time, so we have to have no slots to ensure both simulators behave similarly.

agrachiv · 2018-11-30T21:48:52Z

This delayed option slots should be one of the command line parameters ideally, or I should just add it into constructor parameter as you said?

pavelkryukov · 2018-11-30T22:30:18Z

Do as best as you can, we'll refactor that later.

agrachiv · 2018-12-02T13:47:36Z

Is alu.h related only to the functional simulator?

pavelkryukov · 2018-12-02T16:29:24Z

No.

pavelkryukov · 2018-12-20T14:57:27Z

Committing 1 point as something like a solution was proposed

pavelkryukov · 2019-02-27T14:26:28Z

Closing as duplicate of #737

pavelkryukov added enhancement Adds a new feature to simulation. 4 Features of medium complexity which usually require infrastructure enhancements. S1 — Branch prediction To solve the issue, you need knowledge about branch prediction labels Oct 9, 2018

pavelkryukov changed the title ~~Implement Delayed Branches~~ Implement delayed branches Oct 9, 2018

pavelkryukov added this to the Run DOOM2 on Nintendo64 model milestone Oct 11, 2018

pavelkryukov added 5 Same as 4, but requires good understanding of CPU microarchitecture. bug Fixes a bug or potential bug in simulation. and removed 4 Features of medium complexity which usually require infrastructure enhancements. enhancement Adds a new feature to simulation. labels Oct 12, 2018

pavelkryukov changed the title ~~Implement delayed branches~~ Implement delayed branches in Functional simulation Oct 12, 2018

pavelkryukov added 4 Features of medium complexity which usually require infrastructure enhancements. and removed 5 Same as 4, but requires good understanding of CPU microarchitecture. labels Oct 12, 2018

pavelkryukov mentioned this issue Oct 12, 2018

Implement delayed branches in PerfSim #639

Open

3 tasks

agrachiv self-assigned this Nov 28, 2018

pavelkryukov added 3 Features of medium complexity or infrastructure enhancements and removed 4 Features of medium complexity which usually require infrastructure enhancements. labels Dec 20, 2018

pavelkryukov pinned this issue Feb 6, 2019

pavelkryukov closed this as completed Feb 27, 2019

pavelkryukov unpinned this issue Feb 27, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement delayed branches in Functional simulation #626

Implement delayed branches in Functional simulation #626

pavelkryukov commented Oct 9, 2018 •

edited by agrachiv

Loading

agrachiv commented Nov 28, 2018

pavelkryukov commented Nov 28, 2018 •

edited

Loading

agrachiv commented Nov 28, 2018 •

edited

Loading

pavelkryukov commented Nov 28, 2018

agrachiv commented Nov 28, 2018

pavelkryukov commented Nov 28, 2018

agrachiv commented Nov 29, 2018 •

edited

Loading

agrachiv commented Nov 29, 2018

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 29, 2018

pavelkryukov commented Nov 29, 2018 •

edited

Loading

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 29, 2018 •

edited

Loading

pavelkryukov commented Nov 29, 2018 •

edited

Loading

agrachiv commented Nov 29, 2018 •

edited

Loading

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 30, 2018 •

edited

Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 •

edited

Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 •

edited

Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018 •

edited

Loading

agrachiv commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 •

edited

Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Dec 2, 2018 •

edited

Loading

pavelkryukov commented Dec 2, 2018

pavelkryukov commented Dec 20, 2018

pavelkryukov commented Feb 27, 2019

Implement delayed branches in Functional simulation #626

Implement delayed branches in Functional simulation #626

Comments

pavelkryukov commented Oct 9, 2018 • edited by agrachiv Loading

agrachiv commented Nov 28, 2018

pavelkryukov commented Nov 28, 2018 • edited Loading

agrachiv commented Nov 28, 2018 • edited Loading

pavelkryukov commented Nov 28, 2018

agrachiv commented Nov 28, 2018

pavelkryukov commented Nov 28, 2018

agrachiv commented Nov 29, 2018 • edited Loading

agrachiv commented Nov 29, 2018

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 29, 2018

pavelkryukov commented Nov 29, 2018 • edited Loading

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 29, 2018 • edited Loading

pavelkryukov commented Nov 29, 2018 • edited Loading

agrachiv commented Nov 29, 2018 • edited Loading

pavelkryukov commented Nov 29, 2018

agrachiv commented Nov 30, 2018 • edited Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 • edited Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 • edited Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018 • edited Loading

agrachiv commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018

pavelkryukov commented Nov 30, 2018

agrachiv commented Nov 30, 2018 • edited Loading

pavelkryukov commented Nov 30, 2018

agrachiv commented Dec 2, 2018 • edited Loading

pavelkryukov commented Dec 2, 2018

pavelkryukov commented Dec 20, 2018

pavelkryukov commented Feb 27, 2019

pavelkryukov commented Oct 9, 2018 •

edited by agrachiv

Loading

pavelkryukov commented Nov 28, 2018 •

edited

Loading

agrachiv commented Nov 28, 2018 •

edited

Loading

agrachiv commented Nov 29, 2018 •

edited

Loading

pavelkryukov commented Nov 29, 2018 •

edited

Loading

agrachiv commented Nov 29, 2018 •

edited

Loading

pavelkryukov commented Nov 29, 2018 •

edited

Loading

agrachiv commented Nov 29, 2018 •

edited

Loading

agrachiv commented Nov 30, 2018 •

edited

Loading

agrachiv commented Nov 30, 2018 •

edited

Loading

agrachiv commented Nov 30, 2018 •

edited

Loading

pavelkryukov commented Nov 30, 2018 •

edited

Loading

agrachiv commented Nov 30, 2018 •

edited

Loading

agrachiv commented Dec 2, 2018 •

edited

Loading