- Operating Systems
- Lecture 2: History
- Lecture 3: Concurrency: Processes and Threads
- Concurrency
- The Basic Problem of Concurrency
- How can we give the illusion of multiple processors?
- Properties of this simple multiprogramming technique
- Modern Technique: SMT/Hyperthreading
- How to protect threads from one another?
- Recall: Program’s Address Space
- Traditional UNIX Process
- How do we multiplex processes?
- CPU Switch From Process to Process
- Diagram of Process State
- Process Scheduling
- What does it take to create a process?
- Process =? Program
- Multiple Processes Collaborate on a Task
- Modern “Lightweight” Process with Threads
- Thread State
- Summary
- Lecture 4: Thread Dispatching
- MIPS: Software conventions for Registers
- Single-Threaded Example
- Use of Threads
- Memory Footprint of Two-Thread Example
- Per Thread State
- Lifecycle of a Thread (or Process)
- Ready Queue And Various I/O Device Queues
- Dispatch Loop
- Running a thread
- Stack for Yielding Thread
- What do the stacks look like?
- Saving/Restoring state (often called “Context Switch”)
- Switch Details
- What happens when thread blocks on I/O?
- External Events
- Use of Timer Interrupt to Return Control
- Choosing a Thread to Run
- Summary
- Lecture 5: Cooperating Threads
- Interrupt Controller
- Review: Timer Interrupt
- ThreadFork(): Create a New Thread
- How does Thread get started?
- What does ThreadFinish() do?
- Additional Detail
- Parent-Child relationship
- ThreadJoin() system call
- Use of Join for Traditional Procedure Call
- Kernel versus User-Mode threads
- Threading models mentioned by book
- Multiprocessing vs Multiprogramming
- Correctness for systems with concurrent threads
- Interactions Complicate Debugging
- Summary
https://cs162.eecs.berkeley.edu/
https://people.eecs.berkeley.edu/~kubitron/cs162/
- Software Engineering Problem:
- Turn hardware/software quirks => what programmers want/need
- Optimize for convenience, utilization, security, reliability, etc…
- For Any OS area (e.g. file systems, virtual memory, networking, scheduling):
- What’s the hardware interface? (physical reality)
- What’s the application interface? (nicer abstraction)
- Problem: Run multiple applications in such a way that they are protected from one another
- Goal:
- Keep User Programs from Crashing OS
- Keep User Programs from Crashing each other
- [Keep Parts of OS from crashing other parts?]
- (Some of the required) Mechanisms:
- Address Translation
- Dual Mode Operation
- Simple Policy:
- Programs are not allowed to read/write memory of other Programs or of Operating System
- Address Space
- A group of memory addresses usable by something
- Each program (process) and kernel has potentially different address spaces.
- Address Translation:
Here program 1 has its address space. The addresses in its code segment, which might run from 0 up to some point (3F433F, whatever), get translated to this part of the DRAM.
The sets of addresses that different programs translate to in the physical space don't overlap.
There is a simple problem here: that mapping needs to somehow be out of the realm of what the process can change.
- Hardware provides at least two modes:
- “Kernel” mode (or “supervisor” or “protected”)
- “User” mode: Normal programs executed
- Some instructions/ops prohibited in user mode:
- Example: cannot modify page tables in user mode
- Attempt to modify => Exception generated
- Transitions from user mode to kernel mode:
- System Calls, Interrupts, Other exceptions
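To make the user-to-kernel transition concrete, here is a minimal C sketch (my example, assuming a POSIX system, not code from the lecture): even an ordinary write() crosses the boundary.

#include <unistd.h>

int main(void) {
    /* write() is a system call: executing it traps from user mode into
       kernel mode, the kernel performs the privileged I/O on our behalf,
       and control then returns to user mode. */
    write(STDOUT_FILENO, "hello\n", 6);
    return 0;
}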
- Process Management
- Main-Memory Management
- I/O System management
- File Management
- Networking
- User Interfaces
- Services that (more-or-less) map onto components
- Program execution
- How do you execute concurrent sequences of instructions?
- I/O operations
- Standardized interfaces to extremely diverse devices
- File system manipulation
- How do you read/write/preserve files?
- Looming concern: How do you even find files???
- Communications
- Networking protocols/Interface with CyberSpace?
- Cross-cutting capabilities
- Error detection & recovery
- Resource allocation
- Accounting
- Protection
- Simple
- Only one or two levels of code
- Layered
- Lower levels independent of upper levels
- Microkernel
- OS built from many user-level processes
- Modular
- Core kernel with Dynamically loadable modules
- “Thread” of execution
- Independent Fetch/Decode/Execute loop
- Operating in some Address space
- Uniprogramming: one thread at a time
- MS/DOS, early Macintosh, Batch processing
- Easier for operating system builder
- Get rid of concurrency by defining it away
- Does this make sense for personal computers?
- Multiprogramming: more than one thread at a time
- Multics, UNIX/Linux, OS/2, Windows NT/2000/XP, Mac OS X
- Often called “multitasking”, but multitasking has other meanings (talk about this later)
- The basic problem of concurrency involves resources:
- Hardware: single CPU, single DRAM, single I/O devices
- Multiprogramming API: users think they have exclusive access to shared resources
- OS Has to coordinate all activity
- Multiple users, I/O interrupts, …
- How can it keep all these things straight?
- Basic Idea: Use Virtual Machine abstraction
- Decompose hard problem into simpler ones
- Abstract the notion of an executing program
- Then, worry about multiplexing these abstract machines
- Dijkstra did this for the “THE system”
- Few thousand lines vs 1 million lines in OS 360 (1K bugs)
- Assume a single processor. How do we provide the illusion of multiple processors?
- Multiplex in time!
- Each virtual “CPU” needs a structure to hold:
- Program Counter (PC), Stack Pointer (SP)
- Registers (Integer, Floating point, others…?)
- How switch from one CPU to the next?
- Save PC, SP, and registers in current state block
- Load PC, SP, and registers from new state block
- What triggers switch?
- Timer, voluntary yield, I/O, other things
- All virtual CPUs share same non-CPU resources
- I/O devices the same
- Memory the same
- Consequence of sharing:
- Each thread can access the data of every other thread (good for sharing, bad for protection)
- Threads can share instructions (good for sharing, bad for protection)
- Can threads overwrite OS functions?
- This (unprotected) model common in:
- Embedded applications
- Windows 3.1/Macintosh (switch only with yield)
- Windows 95—ME? (switch with both yield and timer)
- Hardware technique
- Exploit natural properties of superscalar processors to provide illusion of multiple processors
- Higher utilization of processor resources
- Can schedule each thread as if it were a separate CPU
- However, not linear speedup!
- If have multiprocessor, should schedule each processor first
- Original technique called “Simultaneous Multithreading”
- Alpha, SPARC, Pentium 4 (“Hyperthreading”), Power 5
- modern processors all execute more than one thing at a time
- more than one instruction per cycle
- what the picture is showing you here is
- time is going down
- each row represents a cycle, each column represents a functional unit
- so there are 3 functional units
- in the 1st cycle
- a) functional units 1 and 3 are busy (colored), functional unit 2 is not busy
- b) you might put two of these together, and each of them has idle slots
- c) you put two threads together into the same pipeline and they share the slots; as a result you end up filling up more of the empty slots
- this means that the hardware itself has got 2 sets of registers, 2 Program Counters, 2 branch predictors, and 2 of a few other things ...
Need three important things:
- Protection of memory
- Every task does not have access to all memory
- Protection of I/O devices
- Every task does not have access to every device
- Protection of Access to Processor: Preemptive switching from task to task
- Use of timer
- Must not be possible to disable timer from user code
- addresses run from zero up to whatever the max address is
- we have the text area which is where the instructions are
- we have data which is static data that's allocated at the beginning of the program
- heap : dynamically allocated memory
- stack: local variables
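A quick way to see these four regions from C (my sketch, not from the lecture; the exact addresses vary per run, especially with address-space randomization):

#include <stdio.h>
#include <stdlib.h>

int global = 1;                      /* data: static data allocated at program start */

int main(void) {
    int local = 2;                   /* stack: local variable */
    int *dyn = malloc(sizeof *dyn);  /* heap: dynamically allocated memory */
    printf("text:  %p\n", (void *)main);    /* instructions live in the text area */
    printf("data:  %p\n", (void *)&global);
    printf("heap:  %p\n", (void *)dyn);
    printf("stack: %p\n", (void *)&local);
    free(dyn);
    return 0;
}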
- Address space => the set of accessible addresses + state associated with them:
- For a 32-bit processor there are 2³² ≈ 4 billion addresses
- What happens when you read or write to an address?
- Perhaps Nothing
- if you write data in the middle of an empty space, usually it will cause a segmentation fault because it is not real memory
- Perhaps acts like regular memory
- Perhaps ignores writes
- if you write data to a section marked read-only, e.g. the text segment, the writes are ignored or cause another segmentation fault
- Perhaps causes I/O operation
- (Memory-mapped I/O)
- Perhaps causes exception (fault)
- Process: Operating system abstraction to represent what is needed to run a single program
- Often called a “HeavyWeight Process”
- Formally: a single, sequential stream of execution in its own address space
- Two parts:
- Sequential Program Execution Stream
- Code executed as a single, sequential stream of execution
- Includes State of CPU registers
- Protected Resources:
- Main Memory State (contents of Address Space)
- I/O state (i.e. file descriptors)
- Important: There is no concurrency in a heavyweight process
- which means there's only one thread
- The current state of process held in a process control block (PCB):
- This is a “snapshot” of the execution and protection environment
- Only one PCB active at a time
- we're not quite in the multi-core world yet; only one process can be running at a time in this mode, because we only have 1 CPU
- Give out CPU time to different processes (Scheduling):
- Only one process “running” at a time
- Give more time to important processes
- Give pieces of resources to different processes (Protection):
- Controlled access to non-CPU resources
- Sample mechanisms:
- Memory Mapping: Give each process their own address space
- Kernel/User duality: Arbitrary multiplexing of I/O through system calls
- This is also called a “context switch”
- executing instructions of p₀ and p₁ do not overlap, because there is only 1 CPU
- Code executed in kernel above is overhead
- no process is making useful progress while OS is saving / reloading PCB, etc...
- Overhead sets minimum practical switching time
- Less overhead with SMT/hyperthreading, but… contention for resources instead
- with SMT you could actually have 2 processes loaded at the same time and do the switching in the hardware; then there is pretty much no overhead, because the hardware is doing it for you
- As a process executes, it changes state
- new: The process is being created
- ready: The process is waiting to run
- running: Instructions are being executed
- waiting: The process is waiting for some event to occur
- terminated: The process has finished execution
- PCBs move from queue to queue as they change state
- Decisions about which order to remove from queues are Scheduling decisions
- Many algorithms possible (few weeks from now)
- Must construct new PCB
- Inexpensive
- Must set up new page tables for address space
- More expensive
- Copy data from parent process? (Unix fork() )
- Semantics of Unix fork() are that the child process gets a complete copy of the parent memory and I/O state
- Originally very expensive
- Much less expensive with “copy on write”
- Copy I/O state (file handles, etc)
- Medium expense
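A small C example of those fork() semantics (a sketch of the standard POSIX behavior, not code from the lecture): the child gets a complete copy of the parent's memory, so a write in the child is invisible to the parent.

#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    int x = 42;                 /* allocated before the fork */
    pid_t pid = fork();         /* child gets a copy of memory and I/O state */
    if (pid == 0) {
        x = 7;                  /* modifies only the child's copy */
        printf("child:  x = %d\n", x);
    } else {
        wait(NULL);             /* parent waits for child to finish */
        printf("parent: x = %d\n", x);  /* still 42: address spaces are separate */
    }
    return 0;
}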
- More to a process than just a program:
- Program is just part of the process state
- I run emacs on lectures.txt, you run it on homework.java – Same program, different processes
- Less to a process than a program:
- A program can invoke more than one process
- cc starts up cpp, cc1, cc2, as, and ld
- this is kind of the beginnings of parallelism
- here there are 3 processes that want to collaborate on a task, so they've got to talk to each other
- High Creation/memory Overhead
- (Relatively) High Context-Switch Overhead
- Need Communication mechanism:
- Separate Address Spaces Isolates Processes
- it's not like you can write to the memory of one process and read it from the memory of the other
- Shared-Memory Mapping ( one way to talk to each other )
- Accomplished by mapping addresses to common DRAM
- Read and Write through memory
- Message Passing ( another one )
- send() and receive() messages
- Works across network
- Communication occurs by “simply” reading/writing to shared address page
- Really low overhead communication
- Introduces complex synchronization problems
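A minimal sketch of shared-memory mapping using POSIX mmap (my example; MAP_ANONYMOUS is a Linux/BSD extension): one page is mapped into both address spaces, so the processes really do communicate by reading and writing memory.

#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    /* One shared page mapped into the parent; fork() leaves it shared. */
    int *shared = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    *shared = 0;
    if (fork() == 0) {
        *shared = 162;                     /* write through memory... */
        _exit(0);
    }
    wait(NULL);
    printf("parent sees %d\n", *shared);   /* ...and read it in the other process */
    return 0;
}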
- Mechanism for processes to communicate and to synchronize their actions
- Message system – processes communicate with each other without resorting to shared variables
- IPC facility provides two operations:
- send(message) – message size fixed or variable
- receive(message)
- If P and Q wish to communicate, they need to:
- establish a communication link between them
- exchange messages via send/receive
- Implementation of communication link
- physical (e.g., shared memory, hardware bus, syscall/trap)
- logical (e.g., logical properties)
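A sketch of the send()/receive() pattern using a UNIX pipe as the communication link (my example; the slide's send/receive map onto write/read here, and the pipe is one possible physical implementation of the link):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    int fd[2];
    pipe(fd);                                  /* establish the communication link */
    if (fork() == 0) {                         /* child is the sender, P */
        const char *msg = "hello";
        write(fd[1], msg, strlen(msg) + 1);    /* send(message) */
        _exit(0);
    }
    char buf[16];                              /* parent is the receiver, Q */
    read(fd[0], buf, sizeof buf);              /* receive(message): blocks until data */
    printf("received: %s\n", buf);
    return 0;
}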
A modern process has more than one thread. The idea is that the process still has one address space, but it has multiple threads in it.
- Thread: a sequential execution stream within process (Sometimes called a “Lightweight process”)
- Process still contains a single Address Space
- No protection between threads
- threads in the same process all share the same memory
- Multithreading: a single program made up of a number of different concurrent activities
- Sometimes called multitasking, as in Ada…
- Why separate the concept of a thread from that of a process?
- Discuss the “thread” part of a process (concurrency)
- Separate from the “address space” (Protection)
- Heavyweight Process == Process with one thread
- each thread has its own registers and stack.
- Threads encapsulate concurrency: “Active” component
- Address spaces encapsulate protection: “Passive” part
- Keeps buggy program from trashing the system
- Why have multiple threads per address space?
- Embedded systems
- Elevators, Planes, Medical systems, Wristwatches
- Single Program, concurrent operations
- Most modern OS kernels
- Internally concurrent because have to deal with concurrent requests by multiple users
- But no protection needed within kernel
- Database Servers
- Access to shared data by many concurrent users
- Also background utility processing must be done
- Network Servers
- Concurrent requests from network
- Again, single program, multiple concurrent operations
- File server, Web server, and airline reservation systems
- Parallel Programming (More than one physical CPU)
- Split program into multiple threads for parallelism
- This is called Multiprocessing
- Some multiprocessors are actually uniprogrammed:
- Multiple threads in one address space but one program at a time
- State shared by all threads in process/addr space
- Contents of memory (global variables, heap)
- I/O state (file system, network connections, etc)
- State “private” to each thread
- Kept in TCB -- Thread Control Block
- CPU registers (including, program counter)
- Execution stack – what is this?
- Execution Stack
- Parameters, Temporary variables
- return PCs are kept while called procedures are executing
- Stack holds temporary results
- Permits recursive execution
- Crucial to modern languages
- Processes have two parts
- Threads (Concurrency)
- Address Spaces (Protection)
- Concurrency accomplished by multiplexing CPU Time:
- Unloading current thread (PC, registers)
- Loading new thread (PC, registers)
- Such context switching may be voluntary (yield(), I/O operations) or involuntary (timer, other interrupts)
- Protection accomplished restricting access:
- Memory mapping isolates processes from each other
- Dual-mode for isolating I/O, other resources
- Book talks about processes
- When this concerns concurrency, really talking about thread portion of a process
- When this concerns protection, talking about address space portion of a process
- 0 register is always 0
- if you write to it nothing happens; if you read from it you always get 0
- 4-7 used for argument passing
- 8-15 caller saves
- 16-23 callee saves
- 24,25 another couple of temporaries which are actually caller saves
- 26,27 reserved for kernel
- 28-30 gp, sp, fp (global pointer, stack pointer, frame pointer)
- 31 ra (return address)
- Before calling procedure:
- Save caller-saves regs
- Save v0, v1
- Save ra
- After return, assume
- Callee-saves reg OK
- gp,sp,fp OK (restored!)
- Other things trashed
- Imagine the following C program:
main() {
    ComputePI("pi.txt");
    PrintClassList("clist.text");
}
- What is the behavior here?
- Program would never print out class list
- because ComputePI would never finish
- Version of program with Threads:
main() {
    CreateThread(ComputePI("pi.txt"));
    CreateThread(PrintClassList("clist.text"));
}
- What does “CreateThread” do?
- Start independent thread running given procedure
- What is the behavior here?
- If we stopped this program and examined it with a debugger, we would see
- Two sets of CPU registers
- Two sets of Stacks
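For comparison, here is a runnable version of the two-thread example using POSIX threads, with pthread_create standing in for CreateThread (my translation, not the lecture's code):

#include <pthread.h>
#include <stdio.h>

void *ComputePI(void *arg) {
    (void)arg;
    for (;;) ;                     /* computes digits forever, never finishes */
}

void *PrintClassList(void *arg) {
    printf("class list from %s\n", (const char *)arg);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, ComputePI,      "pi.txt");     /* start independent thread */
    pthread_create(&t2, NULL, PrintClassList, "clist.text");
    pthread_join(t2, NULL);   /* class list prints even though ComputePI spins forever */
    return 0;
}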
- threads can basically trash each other pretty much any way they want, but because of the way the abstraction works it's very hard for them to really corrupt each other's registers, which are saved away in memory; they can, however, pretty much screw each other up by messing with their heaps anyway
- Questions:
- How do we position stacks relative to each other?
- What maximum size should we choose for the stacks?
- What happens if threads violate this?
- How might you catch violations?
- Each Thread has a Thread Control Block (TCB)
- Execution State: CPU registers, program counter, pointer to stack
- Scheduling info: State (more later), priority, CPU time
- Accounting Info
- Various Pointers (for implementing scheduling queues)
- Pointer to enclosing process? (PCB)?
- Etc (add stuff as you find a need)
- In Nachos: “Thread” is a class that includes the TCB
- OS Keeps track of TCBs in protected memory
- In Array, or Linked List, or …
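A hypothetical C layout for such a TCB, just to make the field list concrete (the names are made up; Nachos wraps the same information in its Thread class):

struct pcb;                            /* enclosing process, defined elsewhere */

typedef struct tcb {
    /* Execution state */
    unsigned long regs[32];            /* saved CPU registers */
    unsigned long pc;                  /* program counter */
    unsigned long *sp;                 /* pointer to this thread's stack */
    /* Scheduling info */
    enum { NEW, READY, RUNNING, WAITING, TERMINATED } state;
    int priority;
    long cpu_time;                     /* accounting info */
    /* Various pointers */
    struct tcb *next;                  /* linkage for scheduler queues */
    struct pcb *process;               /* pointer to enclosing process (PCB) */
} tcb_t;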
- As a thread executes, it changes state:
- new: The thread is being created
- ready: The thread is waiting to run
- running: Instructions are being executed
- waiting: Thread waiting for some event to occur
- terminated: The thread has finished execution
- “Active” threads are represented by their TCBs
- TCBs organized into queues based on their state
- Thread not running => TCB is in some scheduler queue
- Separate queue for each device/signal/condition
- Each queue can have a different scheduler policy
- Conceptually, the dispatching loop of the operating system looks as follows:
Loop {
    RunThread();
    ChooseNextThread();
    SaveStateOfCPU(curTCB);
    LoadStateOfCPU(newTCB);
}
- This is an infinite loop
- One could argue that this is all that the OS does
- Should we ever exit this loop???
- When would that be? Shutting down, blue screen, ...
Consider first portion: RunThread()
- How do I run a thread?
- Load its state (registers, PC, stack pointer) into CPU
- Load environment (virtual memory space, etc)
- Jump to the PC
- How does the dispatcher get control back?
- Internal events: thread returns control voluntarily
- External events: thread gets preempted
- Blocking on I/O
- The act of requesting I/O implicitly yields the CPU
- Waiting on a “signal” from other thread
- Thread asks to wait and thus yields the CPU
- Thread executes a yield()
- Thread volunteers to give up CPU
computePI() {
    while(TRUE) {
        ComputeNextDigit();
        yield();
    }
}
- the suggestion here is that you never get much work done if you're mostly yielding all the time
- Why must we trap to the OS?
- It's the kernel that is actually multiplexing the threads. The TCBs are stored in kernel space.
- But you can do threading entirely at user level if you want.
- kernel mode vs user mode
- Nothing involved in the switching of the threads themselves has to be in kernel mode, because it just involves registers.
- Once we start switching processes, that has to be in the kernel, because we're changing address protection.
- How do we run a new thread?
run_new_thread() {
    newThread = PickNewThread();
    switch(curThread, newThread);
    ThreadHouseKeeping(); /* next Lecture */
}
- How does dispatcher switch to a new thread?
- Save anything next thread may trash: PC, regs, stack
- Maintain isolation for each thread
- Consider the following code blocks:
proc A() {
    B();
}
proc B() {
    while(TRUE) {
        yield();
    }
}
- Suppose we have 2 threads:
- Threads S and T
- we have 2 threads, both of the threads are running this code
Switch(tCur,tNew) {
    /* Unload old thread */
    TCB[tCur].regs.r7 = CPU.r7;
    …
    TCB[tCur].regs.r0 = CPU.r0;
    TCB[tCur].regs.sp = CPU.sp;
    TCB[tCur].regs.retpc = CPU.retpc; /* return addr */
    /* Load and execute new thread */
    CPU.r7 = TCB[tNew].regs.r7;
    …
    CPU.r0 = TCB[tNew].regs.r0;
    CPU.sp = TCB[tNew].regs.sp;
    CPU.retpc = TCB[tNew].regs.retpc;
    return; /* Return to CPU.retpc */
}
- How many registers need to be saved/restored?
- MIPS 4k: 32 Int(32b), 32 Float(32b)
- Pentium: 14 Int(32b), 8 Float(80b), 8 SSE(128b),…
- Sparc(v7): 8 Regs(32b), 16 Int regs (32b) * 8 windows = 136 (32b)+32 Float (32b)
- Itanium: 128 Int (64b), 128 Float (82b), 19 Other(64b)
- retpc is where the return should jump to.
- In reality, this is implemented as a jump
- There is a real implementation of switch in Nachos
- See switch.s
- Normally, switch is implemented as assembly
- Of course, it’s magical!
- But you should be able to follow it!
- What if you make a mistake in implementing switch?
- Suppose you forget to save/restore register 4
- Get intermittent failures depending on when context switch occurred and whether new thread uses register 4
- System will give wrong result without warning
- Can you devise an exhaustive test to test switch code?
- No! Too many combinations and inter-leavings
- Cautionary tale:
- For speed, Topaz kernel saved one instruction in switch()
- Carefully documented!
- Only works as long as kernel size < 1MB
- What happened?
- Time passed, People forgot
- Later, they added features to kernel (no one removes features!)
- Very weird behavior started happening
- Moral of story: Design for simplicity
- What happens when a thread requests a block of data from the file system?
- User code invokes a system call
- Read operation is initiated
- Run new thread/switch
- Thread communication similar
- Wait for Signal/Join
- Networking
- What happens if thread never does any I/O, never waits, and never yields control?
- Could the ComputePI program grab all resources and never release the processor?
- What if it didn’t print to console?
- Must find way that dispatcher can regain control!
- Answer: Utilize External Events
- Interrupts: signals from hardware or software that stop the running code and jump to kernel
- Timer: like an alarm clock that goes off every some many milliseconds
- If we make sure that external events occur frequently enough, can ensure dispatcher runs
Network interrupt is an external interrupt.
- An interrupt is a hardware-invoked context switch
- No separate step to choose what to run next
- Always run the interrupt handler immediately
- network interrupts run every time a packet is received
- Solution to our dispatcher problem
- Timer Interrupt routine:
TimerInterrupt() {
    DoPeriodicHouseKeeping();
    run_new_thread();
}
- I/O interrupt: same as timer interrupt except that DoHousekeeping() replaced by ServiceIO().
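A user-level analogue of this, using a POSIX interval timer (my sketch; SIGALRM plays the role of the timer interrupt and the handler plays TimerInterrupt()): the signal regains control every 10 ms even though the loop never yields.

#include <signal.h>
#include <stdio.h>
#include <sys/time.h>

static volatile sig_atomic_t ticks = 0;

static void on_timer(int sig) {        /* stand-in for TimerInterrupt() */
    (void)sig;
    ticks++;
}

int main(void) {
    signal(SIGALRM, on_timer);
    struct itimerval it = { {0, 10000}, {0, 10000} };  /* fire every 10 ms */
    setitimer(ITIMER_REAL, &it, NULL);
    while (ticks < 100)
        ;                              /* "ComputePI": never yields voluntarily */
    printf("regained control %d times\n", (int)ticks);
    return 0;
}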
- How does Dispatcher decide what to run?
- Zero ready threads – dispatcher loops
- Alternative is to create an “idle thread”
- Can put machine into low-power mode
- Exactly one ready thread – easy
- More than one ready thread: use scheduling priorities
- Possible priorities:
- LIFO (last in, first out):
- put ready threads on front of list, remove from front
- Pick one at random
- FIFO (first in, first out):
- Put ready threads on back of list, pull them from front
- This is fair and is what Nachos does
- Priority queue:
- keep ready list sorted by TCB priority field
- The state of a thread is contained in the TCB
- Registers, PC, stack pointer
- States: New, Ready, Running, Waiting, or Terminated
- Multithreading provides simple illusion of multiple CPUs
- Switch registers and stack to dispatch new thread
- Provide mechanism to ensure dispatcher regains control
- Switch routine
- Can be very expensive if many registers
- Must be very carefully constructed!
- Many scheduling options
- Decision of which thread to run complex enough for complete lecture
- the yellow block in the figure represents the interrupt controller, a chip talking to the CPU
- there are also software interrupts: if I set some bits in my interrupt controller, they essentially assert an interrupt just like anybody else, but these bits get turned on by software
- Interrupts invoked with interrupt lines from devices
- Interrupt controller chooses interrupt request to honor
- Mask enables/disables interrupts
- Priority encoder picks highest enabled interrupt
- and turns that into an ID, e.g. 02 may be the printer
- Software Interrupt Set/Cleared by Software
- Interrupt identity specified with ID line
- CPU can disable all interrupts with internal flag
- Non-maskable interrupt line (NMI) can’t be disabled
- Disable/Enable All Ints => Internal CPU disable bit
- RTI (interrupt return instruction) reenables interrupts, returns to user mode
- the only way to prevent an interrupt from recursively interrupting itself forever is to turn all interrupts off while you correct something
- Raise/lower priority: change interrupt mask
- in software what we do is we raise the priority
- which means we change the mask to reflect that we're running at some priority; all devices of that priority or lower will no longer interrupt, and only the higher ones will
- so it's purely a software concept
- Software interrupts can be provided entirely in software at priority switching boundaries
- software interrupts don't have to be done in hardware, because at the point where we change the priority by changing the mask back we can check whether a software interrupt is supposed to occur
- This is often called preemptive multithreading, since threads are preempted for better scheduling
- Solves problem of user who doesn’t insert yield();
- ThreadFork() is a user-level procedure that creates a new thread and places it on the ready queue.
- We called this CreateThread() earlier
- cannot be entirely user-level, because the notion of "placing on the ready queue" is a kernel-level concept: the ready queue is inside the kernel
- Arguments to ThreadFork()
- Pointer to application routine (fcnPtr)
- tell it what routine to run
- Pointer to array of arguments (fcnArgPtr)
- quite possible that the thing you're going to run needs arguments
- Size of stack to allocate
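As a rough user-level analogue of these three arguments, here is a hypothetical wrapper over pthread_create (my sketch; POSIX exposes the stack size through a thread attribute rather than a direct parameter):

#include <pthread.h>
#include <stddef.h>

/* Hypothetical ThreadFork-like wrapper: fcnPtr, fcnArgPtr, stack size. */
int thread_fork(void *(*fcnPtr)(void *), void *fcnArgPtr,
                size_t stackSize, pthread_t *tid) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    /* stackSize must be at least PTHREAD_STACK_MIN on real systems */
    pthread_attr_setstacksize(&attr, stackSize);
    int rc = pthread_create(tid, &attr, fcnPtr, fcnArgPtr); /* kernel allocates TCB/stack */
    pthread_attr_destroy(&attr);
    return rc;
}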
- Implementation
- Sanity Check arguments
- optional, helpful with debugging
- Enter Kernel-mode and Sanity Check arguments again
- Why check again? The kernel has to make sure that the user thread creating this thread really has permissions for all the arguments.
- Allocate new Stack and TCB
- Initialize TCB and place on ready list (Runnable).
- Initialize Register fields of TCB
- Stack pointer made to point at stack
- PC return address => OS (asm) routine called ThreadRoot()
- Two arg registers (a0 and a1) initialized to fcnPtr and fcnArgPtr, respectively (assume MIPS)
- Initialize stack data?
- Here is a thread that's been running for a while.
- A, which was the first procedure called by the thread, was actually called by this ThreadRoot procedure
- ThreadRoot is really the root of the thread's call stack
- Eventually, run_new_thread() will select this TCB and return into beginning of ThreadRoot()
- This really starts the new thread
- ThreadRoot() is the root for the thread routine:
ThreadRoot() {
    DoStartupHousekeeping();
    UserModeSwitch(); /* enter user mode */
    Call fcnPtr(fcnArgPtr);
    ThreadFinish();
}
- Startup Housekeeping
- Includes things like recording start time of thread
- Other Statistics
- Stack will grow and shrink with execution of thread
- Final return from thread returns into ThreadRoot() which calls ThreadFinish()
- ThreadFinish() will start at user-level
- it's very important that we get into user mode before we start running
- once we call fcnPtr(fcnArgPtr), we are running arbitrary code that was given to us when the thread was created, and we really don't want to do that in kernel mode
- Needs to re-enter kernel mode (system call)
- “Wake up” (place on ready queue) threads waiting for this thread
- Threads (like the parent) may be on a wait queue waiting for this thread to finish
- Can’t deallocate thread yet
- We are still running on its stack!
- Instead, record thread as “waitingToBeDestroyed”
- Call run_new_thread() to run another thread
- ThreadHouseKeeping() notices waitingToBeDestroyed and deallocates the finished thread’s TCB and stack
- Thread Fork is not the same thing as UNIX fork
- UNIX fork creates a new process so it has to create a new address space
- For now, don’t worry about how to create and switch between address spaces
- Thread fork is very much like an asynchronous procedure call
- Runs procedure in separate thread
- Calling thread doesn’t wait for finish
- What if thread wants to exit early?
- ThreadFinish() and exit() are essentially the same procedure entered at user level
- Every thread (and/or Process) has a parentage
- A “parent” is a thread that creates another thread
- A child of a parent was created by that parent
- One thread can wait for another to finish with the ThreadJoin(tid) call
- Calling thread will be taken off run queue and placed on waiting queue for thread tid
- Where is a logical place to store this wait queue?
- Similar to wait() system call in UNIX
- Lets parents wait for child processes
- ThreadJoin is one kind of synchronization
- A traditional procedure call is logically equivalent to doing a ThreadFork followed by ThreadJoin
A() { B(); }
B() { /* Do interesting, complex stuff */ }
A'() {
    tid = ThreadFork(B, null);
    ThreadJoin(tid);
}
- The procedure A() is equivalent to A'()
- Why not do this for every procedure?
- Context Switch Overhead
- Memory Overhead for Stacks
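The same equivalence in POSIX-thread form (a sketch; pthread_create/pthread_join stand in for ThreadFork/ThreadJoin):

#include <pthread.h>

void B(void) { /* do interesting, complex stuff */ }

static void *B_thread(void *arg) {   /* trampoline so B fits the thread API */
    (void)arg;
    B();
    return NULL;
}

void A(void)       { B(); }                      /* traditional procedure call */
void A_prime(void) {                             /* logically equivalent fork + join */
    pthread_t tid;
    pthread_create(&tid, NULL, B_thread, NULL);  /* ThreadFork(B, null) */
    pthread_join(tid, NULL);                     /* ThreadJoin(tid) */
}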
- We have been talking about Kernel threads
- Native threads supported directly by the kernel
- Every thread can run or block independently
- One process may have several threads waiting on different things
- Downside of kernel threads: a bit expensive
- Need to make a crossing into kernel mode to schedule
- Even lighter weight option: User Threads
- User program provides scheduler and thread package
- May have several user threads per kernel thread
- User threads may be scheduled non-preemptively relative to each other (only switch on yield())
- Cheap
- Downside of user threads:
- When one thread blocks on I/O, all threads block
- Kernel cannot adjust scheduling among all threads
- Option: Scheduler Activations
- Have kernel inform user level when thread blocks…
- the idea is that when you're doing user-level scheduling and one of your threads blocks, the first thing the kernel does is give you back a virtual processor via another upcall, so that you can continue to do the lightweight stuff; the kernel has to participate
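A minimal sketch of pure user-level switching using the legacy POSIX ucontext API (my example; works on Linux/glibc): two contexts are multiplexed on one kernel thread, and the kernel never sees the switches.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, worker_ctx;
static char worker_stack[64 * 1024];           /* per-user-thread stack */

static void worker(void) {
    printf("worker: running at user level\n");
    swapcontext(&worker_ctx, &main_ctx);       /* voluntary yield(): no kernel crossing */
    printf("worker: resumed, finishing\n");    /* returning follows uc_link back to main */
}

int main(void) {
    getcontext(&worker_ctx);
    worker_ctx.uc_stack.ss_sp   = worker_stack;
    worker_ctx.uc_stack.ss_size = sizeof worker_stack;
    worker_ctx.uc_link          = &main_ctx;
    makecontext(&worker_ctx, worker, 0);

    swapcontext(&main_ctx, &worker_ctx);       /* user-level "Switch()" */
    printf("main: worker yielded\n");
    swapcontext(&main_ctx, &worker_ctx);       /* resume worker until it finishes */
    printf("main: worker done\n");
    return 0;
}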
- Remember Definitions:
- Multiprocessing == Multiple CPUs
- Multiprogramming == Multiple Jobs or Processes
- Multithreading == Multiple threads per Process
- What does it mean to run two threads “concurrently”?
- Scheduler is free to run threads in any order and interleaving: FIFO, Random, …
- Dispatcher can choose to run each thread to completion or time-slice in big chunks or small chunks
- If dispatcher can schedule threads in any way, programs must work under all circumstances
- Can you test for this?
- How can you know if your program works?
- Independent Threads:
- No state shared with other threads
- Deterministic => Input state determines results
- Reproducible => Can recreate Starting Conditions, I/O
- Scheduling order doesn’t matter (if switch() works!!!)
- Cooperating Threads:
- Shared State between multiple threads
- Non-deterministic
- Non-reproducible
- Non-deterministic and Non-reproducible means that bugs can be intermittent
- Sometimes called “Heisenbugs”
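A classic way to see such a Heisenbug (my example, using POSIX threads): two cooperating threads share a counter, the unsynchronized increment is a read-modify-write, and the final value changes from run to run depending on the interleaving.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;                 /* shared state between cooperating threads */

static void *bump(void *arg) {
    (void)arg;
    for (int i = 0; i < 1000000; i++)
        counter++;                       /* NOT atomic: load, add, store */
    return NULL;
}

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, bump, NULL);
    pthread_create(&b, NULL, bump, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    printf("counter = %ld (expected 2000000)\n", counter);  /* varies run to run */
    return 0;
}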
- Is any program truly independent?
- Every process shares the file system, OS resources, network, etc
- Extreme example: buggy device driver causes thread A to crash “independent thread” B
- You probably don’t realize how much you depend on reproducibility:
- Example: Evil C compiler
- Modifies files behind your back by inserting errors into C program unless you insert debugging code
- Example: Debugging statements can overrun stack
- Non-deterministic errors are really difficult to find
- Example: Memory layout of kernel+user programs
- depends on scheduling, which depends on timer/other things
- Original UNIX had a bunch of non-deterministic errors
- Example: Something which does interesting I/O
- User typing of letters used to help generate secure keys
- Interrupts: hardware mechanism for returning control to operating system
- Used for important/high-priority events
- Can force dispatcher to schedule a different thread (preemptive multithreading)
- New Threads Created with ThreadFork()
- Create initial TCB and stack to point at ThreadRoot()
- ThreadRoot() calls thread code, then ThreadFinish()
- ThreadFinish() wakes up waiting threads then prepares TCB/stack for destruction
- Threads can wait for other threads using ThreadJoin()
- Threads may be at user-level or kernel level
- Cooperating threads have many potential advantages
- But: introduces non-reproducibility and non-determinism
- Need to have Atomic operations