Introduction
To begin with a short definition we’ll introduce the technique as a programming paradigm to allow a program to perform multiple tasks concurrently. With the ever increasing speed of the CPUs, longer instruction pipelines and better compiler optimizations and given limitations of the hardware peripherals we needed a technique similar to preemptive multitasking (It is a the feature that allows your computer to run two or more programs concurrently) employed for processes (A program in execution is called a process) to be applicable at thread (A thread is a dispatch able unit of executable code which shares the address space, resources and kernel objects of a process) level to allow it to perform multiple tasks simultaneously. This lead to the advent of multithreading. In this technique we can have more than one thread of execution of a single process scheduled by the CPU to perform an independent task.
How is multithreading achieved?
Multithreading on a single processor system generally occurs by time slicing, wherein the processor switches between different threads and this happens fast enough to give the end user an illusion of simultaneity. On a multiprocessor or multi-core system, now coming into general use, threading can be achieved via multiprocessing wherein different threads and processes can run literally simultaneously on different processors or cores. Many known multithreading standards exist as of now. Win32 Multithreading is the standard on Windows platform, other multithreading standards do exist like OS/2, POSIX Threads known as pthreads and Java threads. There is no in built support for POSIX threads or pthreads on Windows but there are Open Source libraries that give you pthreads like interface on Windows.
Threading Basics
· A thread is an independent path of execution through a program.
· It has two components: a kernel object that the OS uses to manage the thread and its stack which it uses to maintain the method parameters and the local variables as it executes the code.
· This independent flow of control is accomplished because a thread maintains its own stack pointer, registers, scheduling properties (such as policy or thread priority), set of pending and blocked signals (A thread is said to be signaled when it is terminated, when active its non signaled) and other thread specific data.
· Any thread can be in one of the following states during its lifetime: 1) Active(Running) 2) Ready(Schedulable) 3) Waiting(Blocked) 4) Terminated
· There are various APIs to manage threads varying according to standards, just to give an example to create a thread using Win32 threading model we need to call Win32 API CreateThread or CRT API _beginthreadex. The same can be accomplished using pthread API pthread_create. Java thread can be created instantiating class object which either extends Thread class or implements Runnable interface (We need to call its start or run method in respective case to start the thread execution).
Threading Issues
All’s well if our threads didn’t have to communicate with each other, but this an ideal assumption which rarely happens practically. Our threads might need to communicate in certain scenarios like
a) When we have multiple threads accessing a shared resource in such a way that the resource does not become corrupt avoiding any race condition. (This requires atomic access to the resource)
b) They the threads are not in a cyclic deadlock condition accessing the resource already acquired by the other thread.
c) When one thread needs to notify one or more other threads that a specific task has been completed.
Thread Synchronization
Thread synchronization is the means to organize the processes’ threads so that multiple threads do no leave resources in an inconsistent state, thread dependency order is respected and thread communication is systematic. All thread synchronization does is to provide atomic access to a piece of code. There are various means to achieve the same depending upon our requirements and the threading model being used.
1. Need an atomic access for a single shared global.
Win32 provides us with InterLocked family of functions to achieve the same:Interlocked function Description InterlockedIncrement Increments variable by 1 atomically InterlockedDecrement Decrements variable by 1 atomically InterlockedExchangeAdd Increments the variable by specified amount InterlockedCompareExchange Compares and exchanges variables atomically InterlockedDecrement64 Decrements 64 bit variable by 1 atomically InterlockedExchangePointer Atomically exchanges a pair of pointer values
They are real fast as they do not put the thread to the waiting state, but are only good enough for elementary value variables. We need an entirely different solution for protecting a data structure.
2. Waiting on a thread to finish its execution.
Win32 has Wait functions for the sameWait function Description WaitForSingleObject Waits for a single object to become signaled (terminate/relinquish control) WaitForMultipleObjects Waits simultaneously on 64 objects, and can track each one independently
The wait functions have a simple interface. A single wait function can be used to wait on objects of varied types viz. a thread, mutex, semaphore, event etc. Using these APIs, it may not be readily apparent if the code is expecting a thread termination, an event to be signaled, or a mutex to be released. PThreads on the other hand have different functions to manage each kind of object
pthread_mutex_lock, pthread_mutex_unlock, pthread_cond_signal, pthread_cond_wait etc.
3. Need to guard atomic update to more sophisticated data structures
We have multiple solutions to this problem. On Win32 we have a choice of usingUse Spin locks, Critical Sections, Mutex and Events, Semaphores, Wait functions. We’ll discuss Critical Section as a solution here
Spin Locks
Spin locks can be implemented using Interlocked family function InterlockedExchange. All it does is poll for a particular value unless it is set.
It is expensive as this does not put the thread to wait and loops idly as in poling hence wasting CPU cycles.BOOL gResourceInUse = FALSE;
void SomeFunc(void)
{
// Waiting for the resource.
while (InterlockedExchange(&gResourceInUse, TRUE) == TRUE)
Sleep(0);
// Access the resource
…
// Release the Access to the resource
InterlockedExchange(&gResourceInUse, FALSE);
}
A critical section is a section of code that must be allowed to complete atomically with no interruption that affects its completion. Different from Spin locks, other threads which want to execute same piece of code are put in wait state. But here we can specify how many times we need to loop as in InterlockedExchange before putting the thread to wait state (as putting the thread to wait state involves context switching and might be heavier still), and this can be done by setting the spint count for the Critical Section. Various APIs to manipulate Critical Section are as follows.Critical Section function Description InitializeCriticalSection Initializes the CRITICAL_SECTION object EnterCriticalSection Puts a lock around critical code section LeaveCriticalSection Removes the lock DeleteCriticalSection Deinitializes the CRITICAL_SECTION object TryEnterCriticalSection If not granted access, immediately returns InitializeCriticalSectionAndSpinCount Initializes the critical section and specifies a spin count to be executed before moving to a wait state SetCriticalSectionSpinCount Changes Spin count at run time
A con of Critical Sections is that we may be caught up in deadlocks as it is hard to specify timeout for wait state.
4. We need protection from threads across processes (Kernel Mode Synchronization)
Spin locks, Critical Section are means to achieve User Mode Synchronization (where only user mode threads are involved). When we need to involve creation of kernel object (A kernel object is simply a memory block allocated by the kernel and is accessible only by the kernel. This memory block is a data structure whose members maintain information about the object. The are several types of kernel objects, such as event objects, file objects, file-mapping objects, mutex objects, process objects, semaphore objects, thread objects, waitable timer objects etc) to manage communication across processes the same is said to be Kernel Mode Synchronization.
A kernel object may be in two states a) Signaled b) Non Signaled. A thread and for that matter a process is created in nonsignaled state and it is marked as signaled when it finishes. So all we need is to create the kind of kernel object we want, around the code we want to protect and let the other thread wait on it until the kernel object is turned to signaled state.
4.1. When you want to wait on a thread/process to complete
Use Wait Functions (discussed above). Wait functions cause a thread to voluntarily place itself into a wait state until specific kernel object becomes signaled.
4.2. When you want to ensure that a thread has mutually exclusive excess to a resource.
Use Mutex Kernel Objects. They work pretty much like critical section with the difference that Critical Section are user mode objects while mutex are kernel mode objects. They are slower than their counterparts. Win32 APIs for manipulating mutex are
CreateMutex, OpenMutex, ReleaseMutex while PThreads have pthread_mutex_init, pthread_mutex_destroy, pthread_mutex_lock, pthread_mutex_trylock, thread_mutex_unlock for the same.
4.3. When you may be interested to form a listener kind of mechanism between two threads
Use Event Kernel Objects on Windows. An event can be in signaled or nonsignaled state. When signaled, threads waiting on event become schedulable, when nonsignaled, threads waiting keep waiting. Further more an event can be a manual reset or auto reset event. When a manual reset event kernel object becomes signaled, all the threads waiting on the event become schedulable. When an auto reset event becomes signaled, only one of the threads becomes schedulable. The Win32 APIs for the same are CreateEvent, OpenEvent, ResetEvent,
SetEvent, SetEventData. There is no direct equivalent to the Win32 event in pthreads. Rather, there are condition variables (data type pthread_cond_t), and there is no distinction between auto and manual reset. Condition variables most closely resemble auto-reset events, but there are some differences. The APIs are pthread_cond_signal, pthread_cond_wait, pthread_cond_timedwait.
4.4. When you want to synchronize the access in a way where you can provide simultaneous access certain number of times.
Use Semaphore Kernel Objects. For windows a semaphore consists of a maximum resource count (The maximum number allowed to access the resource) and current resource count (The available resources that can be allocated). If the current resource count is greater than 0, the semaphore is signaled, else it’s nonsignaled. The Win32 APIs for the same are CreateSemaphore,
OpenSemaphore, ReleaseSemaphore. Posix semaphore APIs are sem_init, sem_wait, sem_post, sem_getvalue, sem_trywait, destroy.
There are still other synchronization mechanisms like Critical Regions, Monitors, Waitable Timers, Named Semaphores which I haven’t taken up here. If need be they can be discussed. Hope this is useful to get you started on multithreading.
Tuesday, March 18, 2008
Multithreading - An Introduction
Labels:
Multithreading
Subscribe to:
Post Comments (Atom)

What people said... (0)
Post a Comment