Skip to content

Fix factorial.out #45

Closed
pavelkryukov opened this issue Mar 10, 2017 · 21 comments · Fixed by #68
Closed

Fix factorial.out #45

pavelkryukov opened this issue Mar 10, 2017 · 21 comments · Fixed by #68
Assignees
Labels
1 Usually one-liner tasks, but may require some deep into infrastructure. bug Fixes a bug or potential bug in simulation.

Comments

@pavelkryukov
Copy link
Member

Problem with Factorial program.

To reproduce:

mipt-mips/tests/samples$ make build_all
mipt-mips/perf_sim$./perf_sim ../tests/samples/factorial.out 100 -d

Error message:

Executed instructions: 5

    fetch	cycle 15:  0x23bdfff4
    decode	cycle 15:  bubble
    execute	cycle 15:  or $t0, $v0, $zero	 [ $t0 = 0x4]
    memory	cycle 15:  syscall
ERROR: Writing to valid register!
@pavelkryukov pavelkryukov added 1 Usually one-liner tasks, but may require some deep into infrastructure. bug Fixes a bug or potential bug in simulation. labels Mar 10, 2017
@gkorepanov gkorepanov self-assigned this Apr 4, 2017
@gkorepanov
Copy link
Contributor

gkorepanov commented Apr 4, 2017

There are a few problems at once.

  1. Ideologically the syscall instruction depends somehow on the values of registers (first of all, of course, on v0. As we don't take it in account anyhow, we practically have two subsequent loads in asm code:

     execute        cycle 13:  addiu $v0, $zero, 0x5	 [ $v0 = 0x5]
     memory	        cycle 13:  syscall
     writeback	cycle 13:  addiu $v0, $zero, 0x4	 [ $v0 = 0x4]
    
  2. It leads us to the core: the code doesn't allow two subsequent writes to register to exist. yeah, in real code it would be useless, but nothing prohibits one two write two subsequent instructions.

  3. The last problem is that when the first adds instruction is executed, it makes the register to be "valid", which deludes the following dependant instruction (or $t0, $v0, $zero in that particular place), and prevents it from stalling despite the fact that there is another addition instruction pending.

@gkorepanov
Copy link
Contributor

gkorepanov commented Apr 4, 2017

Thus the solution might be as one of the following:

  1. Assuming that there can't be a situation of multiple instructions writing to the same register in a row, we can just make the syscall depend on all the registers it might use (or probably somehow emulate it's work...). Actually I don't understand at all why do we need `syscall§ in perfomance simulator. It's work is defined by OS kernel, the way it's is implemented, but not the ISA indeed.

@gkorepanov
Copy link
Contributor

  1. We might want not just to store the register validity (btw, the terminology is somewhat strange... Even the error message Writing to valid register! sounds completely odd), but to do some more complicated stuff. I have some ideas, but I don't know whether they are adequate from hardware point of view.

@pavelkryukov
Copy link
Member Author

Let's assume that syscall instruction is equivalent for nop for now. We will implement it according to MARS conventions later.

@pavelkryukov
Copy link
Member Author

pavelkryukov commented Apr 5, 2017

The problem is that instruction or $t0, $v0, $zero starts execution before instruction addiu $v0, $zero, 0x5 performed writeback with correct value of $v0.

@gkorepanov
Copy link
Contributor

That is pretty clear. I'm wondering which way should I choose to resolve the problem.

To clarify I get it right:

Now the only way of determining the RAW and WAR hazards is keeping the "state" of a register. When the write to register happen, it's considered to be valid. Then it becomes invalid after the subsequent read operation.

The scheme fails, when the reads and writes are not in that strict order, like in our case:

read    ^
nop     |
write   |
nop     |
write   |

Possible workarounds:

I haven't found any hardware implementation of plain stalling algorithm, everywhere only the bypassing unit is described (and more complex pipelining, e.g. with score-boarding).
So I came up with these:

  1. Evident, but senseless. The easiest way is just to stall the pipeline in case write goes after write, just as it happens with RAW hazard. Thus, the latter write instruction will be decoded only after the writeback of the former is finished, and the problem will go away:
read       ^
stalls...  |
write      |
stalls...  |
nop        |
write      |
  1. More adequate, more complicated. We might keep something like a table keeping number of cycles till writeback for each register:
Instruction Number of cycles till WB ^
read 0 |
stall 1 |
stall 2 |
stall 3 |
nop 4 |
write 5 |
nop 4 |
write 5 |

That solves a problem but introduces a kind of scoreboard just for almost nothing.
Also we can just create a special unit which will control somehow the process. That seems to be even greater overhead.

  1. Bypassing. Probably It's better for me not to stick with that minor problem, but just proceed to creating full-fledged forwarding and hazard detection units. Once implemented, they would make this problem go away, as in plain pipelines there are no problems with such hazards (they only might arise later after addition of non-unified pipeline, or probably out-of-order execution or something).

@pavelkryukov
Copy link
Member Author

Please try to merge this version: viktor-prutyanov@ec693c5, it looks better, maybe it solves all the issues.

@gkorepanov
Copy link
Contributor

Well, tried to run it, and unfortunately it failed. I will try to deal with it and find an error, maybe it is worth to find some small mistake.

(Now it falls into endless loop right from the beginning, so something very mere error is likely to occur)

@gkorepanov
Copy link
Contributor

Found an error. The variable is_stall was not initialised with false value and thus got arbitrary value leading to infinite loop. Unfortunately, running factorial.out fails with exactly the same error:

$ ./perf_sim  ../tests/samples/factorial.out 10
lui $t0, 0x41	 [ $t0 = 0x410000]
addiu $t0, $t0, 0x1d0	 [ $t0 = 0x4101d0]
lw $a0, 0x0($t0)	 [ $a0 = 0x4101dc]
addiu $v0, $zero, 0x4	 [ $v0 = 0x4]
syscall
Assertion failed: (array[(size_t)num].is_valid == false), function write, file ./rf.h, line 66.
Abort trap: 6

That was predictable though, as double write without reads among them should be specially handled in some of the ways.

@gkorepanov
Copy link
Contributor

But in general the code is somewhat clearer and more readable. Probably I should enhance it, taking best from different variants? Or I may try to look for other assignment 4 accomplishments.

@pavelkryukov
Copy link
Member Author

Unfortunately, running factorial.out fails with exactly the same error:

If valid was not a boolean variable, but integer, then each decoded register write would increase it, while each writeback would decrease — I think it solves the problem.

@pavelkryukov
Copy link
Member Author

I'll proceed with merge, please wait until I finish it

@pavelkryukov
Copy link
Member Author

Done, you may proceed with your changes

@gkorepanov
Copy link
Contributor

Thank you, I will proceed.

@pavelkryukov
Copy link
Member Author

The variable is_stall was not initialised with false value and thus got arbitrary value leading to infinite loop.

It is the reason I like to set the most strict compiling rules :-)

@gkorepanov
Copy link
Contributor

Yeah, it could have been incomparably easier to find the error source if those options had been enabled!

@pavelkryukov
Copy link
Member Author

pavelkryukov commented Apr 10, 2017

I see some random bug, it occurs sometimes with different instruction opcodes:

make perf_sim && ./perf_sim -b ../tests/samples/fib.out -n 100 -d
make: 'perf_sim' is up to date.
wb      cycle 0:bubble
decode  cycle 0:ERROR.Incorrect instruction: 0x7331f860	Unknown

@gkorepanov
Copy link
Contributor

I will review it.

@pavelkryukov
Copy link
Member Author

pavelkryukov commented Apr 10, 2017

I found the reason, but I want you to find it too.

@gkorepanov
Copy link
Contributor

Well, this is really intriguing. I have spent a couple of hours trying to reproduce this bug, but didn't manage to get such an error. Thus just started to examine the code... That's something tricky :-)

I'm wondering how such an instruction, 0x7331f860, could be loaded... There definitely should be some unclosed file or uninitialised variable or something which may produce undefined result in some very rare conditions.

I will postpone the investigation a little bit, if you don't mind, as it is very time-consuming and won't probably give a wished result.

@pavelkryukov
Copy link
Member Author

pavelkryukov commented Apr 11, 2017

There definitely should be some unclosed file or uninitialised variable or something which may produce undefined result in some very rare conditions.

Valgrind finds the problem instantly: decode_data is not initialized.

==7932== Conditional jump or move depends on uninitialised value(s)
==7932==    at 0x40C12F: FuncInstr::initFormat() (func_instr.cpp:154)
==7932==    by 0x40BEFC: FuncInstr::FuncInstr(unsigned int, unsigned int) (func_instr.cpp:119)
==7932==    by 0x40FCB4: PerfMIPS::clock_decode(int) (perf_sim.cpp:105)
==7932==    by 0x40F8B8: PerfMIPS::run(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) (perf_sim.cpp:53)
==7932==    by 0x431E62: main (main.cpp:29)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
1 Usually one-liner tasks, but may require some deep into infrastructure. bug Fixes a bug or potential bug in simulation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants