Title: COMP309/509 - Lecture 7
COMP309/509 - Parallel and Distributed Computing
Lecture 7 - Multi-Threaded Programming
By Mitchell Welch
University of New England
Reading
- Chapters 11 and 10 (just section 10.6) from Advanced Programming in the UNIX Environment
- Assignment 1 Description
Summary
- Overview of Threads
- Processes vs Threads
- Thread Creation
- Reentrant Code
- Multi-threaded Examples
Overview of Threads
- This lecture & most of the next will be aimed at giving you some idea of what threads are.
- The topics we will cover are:
- What are threads?
- Why are they different from processes?
- What is the motivating example for threads?
- How do you create a single thread?
- How do you create a bevy of threads?
Overview of Threads
- What are threads? A CPU executes a program by:
- executing a sequence of instructions within the executable (text) of the corresponding process.
- The CPU uses the value of the process’s program counter to determine which instruction to execute next.
- This sequence of instructions is called the program’s or process’s thread of execution.
- Essentially, the threads of execution allow for concurrent tasks within the environment of a single process.
Overview of Threads
for(expression1; expression2; expression3)
    expression4;
- Executing this statement will create the following thread of execution (assuming expression2 remains true):
expression1;
expression2;
expression4;
expression3;
expression2;
expression4;
expression3;
:
- until either
- expression2 becomes false, or
- expression4 executes a break, or ...
Overview of Threads
- So from the view of the process executing the for loop:
- the thread of execution is a continuous sequence of instructions:
1, 2, 4, 3, 2, 4, 3, ...
- However, from the point of view of the CPU:
- this thread of execution will be intertwined with the threads of execution of all the other processes running at the time.
- Each sequence being interrupted at each context switch.
- Indeed this thread may even be interrupted within this process if the process receives a signal for which it has a handler installed.
- The obvious extension of this model of computation is for a single process to have multiple threads of execution.
Overview of Threads
- Threads are a programming tool for creating parallelism in programs
- Processes are also a programming tool for creating parallelism in programs.
- One spawns a process to complete a task in parallel and uses inter-process communication (e.g. pipes, FIFOs or mapped memory) to synchronise & communicate.
- One spawns a thread to complete a task in parallel and uses shared memory to synchronise & communicate.
- The main difference is that threads share much much more!
- Threading on Linux has a bit of a history
Overview of Threads
- The UNIX standard for threading is called pthreads.
- It’s not the greatest standard in the world, but it is the only one.
- It is only quite recently that Linux has had an implementation that comes close to conforming:
* The new [Native POSIX Thread Library for Linux (NPTL)](http://people.redhat.com/drepper/nptl-design.pdf)
- This library was introduced around the transition from the 2.4 to the 2.5 kernel, depending on the Linux distribution.
Overview of Threads
#include <sys/types.h> /* for pid_t */
#include <unistd.h> /* for getppid */
#include <pthread.h> /* for threads */
#include <stdio.h> /* for fprintf, printf */
#include <stdlib.h> /* for exit */
#include <string.h> /* for strerror */
static pid_t mainThread;
static pid_t childThread;
/* Records the parent pid as seen from inside the new thread. */
void *child(void *arg){
    childThread = getppid();
    return arg;
}
int main(){
    pthread_t tid;
    int err;
    mainThread = getppid();
    /* pthread routines return an error number; they do not set errno */
    if((err = pthread_create(&tid, NULL, child, NULL)) != 0){
        fprintf(stderr, "Could not create thread: %s\n", strerror(err));
        exit(EXIT_FAILURE);
    } else {
        pthread_join(tid, NULL);
    }
    /* Under NPTL every thread of a process reports the same parent pid */
    if(childThread == mainThread){
        printf("Your system is using NPTL\n");
    } else {
        printf("Your system is using LinuxThreads\n");
    }
    exit(EXIT_SUCCESS);
}
Processes vs Threads
- Once upon a time, Unix processes were heavy-weight processes:
- Their creation was relatively expensive (it required copying memory and executables, allocating pids, scheduling etc).
- The parallelism was reasonably coarse-grained: a process executes for a while and is then suspended by a context switch, and its next turn depends on how many other processes are running.
- However, on Linux, fork is implemented using copy-on-write pages.
- The only penalty incurred by fork is the time and memory required to duplicate the parent’s page tables, and to create a unique task structure for the child.
- In fact both fork and thread creation in LinuxThreads and NPTL are implemented by the same Linux system call clone, differing only in the flags used!
Processes vs Threads
- A thread can be thought of as a type of light-weight process:
- their creation is cheap (perhaps cheaper than fork);
- the parallelism could be extremely fine (compared to a fork);
- A thread is simply a particular sequence of execution steps through a single process’s executable.
- So two threads within a process may execute in parallel without being interrupted by a context switch.
- Separate threads within a process share the same memory (& the same executable).
- Hence two threads that use strtok (which keeps its parsing position in hidden static storage), for example, can cause havoc.
- This hints at the idea of thread-safe functions.
Processes vs Threads
- A thread is small & cheap, consisting merely of:
- a program counter
- register set
- stack
- state
- Consequently creating them is cheap.
- On the other hand, since they share memory & executable,
- programming with them can be a subtle business, similar to programming with signals.
Processes vs Threads
- A typical problem well solved by threads is monitoring several file descriptors.
- There are two classic approaches in UNIX:
- Method A: Use non-blocking I/O
- By setting a flag using fcntl we can force read to return immediately if there is nothing to read (it returns -1 and sets errno to be EAGAIN)
- Thus it is a simple matter to continuously monitor several file descriptors by repeatedly checking them. i.e. busy waiting.
- Method B:
- Use the select operation to block until one of the descriptors is ready.
Processes vs Threads
- Other possibilities
- Method C:
- Use Asynchronous I/O with SIGPOLL
- This is not part of POSIX though.
- Method D:
- Use POSIX.1b Asynchronous I/O
- Method E:
- Use poll to block
- Again this was historically not part of base POSIX (poll was an XSI extension until POSIX.1-2008)
- Another classic use would be a multi-threaded server
- We’ll find other cool uses of threads.
Thread creation
- Include the appropriate header:
#include <pthread.h>
- Write the entry point to the thread:
void * thread_entry(void *arg);
- Create space for the thread’s id:
pthread_t my_thread;
Thread creation
- Create the data required by the thread:
void *thread_arg = (void *)<some arbitrary expression>;
- Create the thread:
pthread_create(&my_thread, NULL, thread_entry, thread_arg);
Thread creation
- Compile with gcc using the flag -pthread when both compiling and linking!
- When compiling, -pthread tells the compiler to use reentrant versions of the C library routines, so it is needed when producing library object files,
- even when no pthread routines appear in the source of the *.o file!
- When linking, -pthread tells the linker to link with the pthread library.
Reentrant Code
- In the previous slide I mentioned the use of reentrant versions of functions from the C libraries.
- A function is reentrant if it can be interrupted mid-execution and safely be called again before the previous invocation has completed.
- The following is an example of non-reentrant code:
int g_var = 1;
int f(){
    g_var = g_var + 2;
    return g_var;
}
int g(){
    return f() + 2;
}
(Source: http://en.wikipedia.org/wiki/Reentrancy_%28computing%29)
Reentrant Code
- Slight modifications to make everything safe:
int f(int i){
    return i + 2;
}
int g(int i){
    return f(i) + 2;
}
Reentrant Code
- Basic principles:
- Reentrant code may not hold any static or global non-constant data
- Reentrant code may not modify its own code
- Reentrant code may not call non-reentrant code
Reentrant Code
- Final points - it is possible for a function to be thread-safe but not reentrant:
int myFunction(){
    mutex_lock();
    // ...
    // function body code
    // ...
    mutex_unlock();
    return 0;
}
- If this function is used as part of a reentrant interrupt handler and a second call is made while the first is within the mutual exclusion block, a deadlock may occur in the second call.
- This situation will arise if a second interrupt occurs while the first is executing the mutual exclusion block.
Multi-threaded Examples
- To illustrate these ideas, we first present a simple library of procedures.
- process.c is the implementation, and
- process.h is the corresponding header file
- process.h declares three routines:
void *process_fd(void *);
void process_command(char *, int);
void *pr_msg_fn(void *ptr);
- the first monitors a file descriptor,
- the second echoes its input to standard output, and
- the third just prints a string to stderr.
- Note how (in process.c) the (void *) arg must be recast prior to processing.
Multi-threaded Examples
void *process_fd(void *);
void process_command(char *, int);
void *pr_msg_fn(void *ptr);
Multi-threaded Examples
headers.h
- Packages up all header includes for convenience
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
Multi-threaded Examples
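#include "process.h"
#include "headers.h"
#define BUFSIZE 1024
/* Reads from the descriptor pointed to by arg until EOF or error,
   echoing each chunk through process_command. */
void *process_fd(void *arg){
    int fd = *((int *)(arg)); /* recast the generic argument */
    ssize_t nbytes;
    char buf[BUFSIZE];
    while(1){
        nbytes = read(fd, buf, BUFSIZE);
        if(nbytes == -1){
            if(errno == EINTR) continue; /* interrupted by a signal: retry */
            break; /* genuine read error */
        }
        if(nbytes == 0) break; /* end of file */
        process_command(buf, (int)nbytes);
    }
    return NULL;
}
/* Echoes bytes characters from buffy to standard output. */
void process_command(char *buffy, int bytes){
    if(write(STDOUT_FILENO, buffy, bytes) < bytes)
        fprintf(stderr, "write of buffy failed\n");
}
/* Prints the string passed as arg to stderr. */
void *pr_msg_fn(void *arg){
    fprintf(stderr, "%s ", (char *)arg);
    return NULL;
}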
Multi-threaded Examples
- exampl01.c - A simple single-threaded function call
#include "process.h"
#include "headers.h"
int main(){
    int fd_1 = open("makefile", O_RDONLY);
    if(fd_1 == -1)
        perror("Could not open makefile");
    else
        process_fd(&fd_1);
    return 0;
}
Multi-threaded Examples
- exampl02.c - The same call, now made in a separate thread
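#include "process.h"
#include "headers.h"
#include <string.h> /* for strerror */
int main(){
    pthread_t tid;
    int err;
    int fd = open("makefile", O_RDONLY);
    /* open sets errno, so perror is appropriate here */
    if(fd == -1)
        perror("Could not open makefile");
    /* pthread routines return the error number instead of setting errno */
    else if((err = pthread_create(&tid, NULL, process_fd, (void *)&fd)) != 0)
        fprintf(stderr, "Could not create thread: %s\n", strerror(err));
    else if((err = pthread_join(tid, NULL)) != 0)
        fprintf(stderr, "Could not join with thread: %s\n", strerror(err));
    else exit(EXIT_SUCCESS);
    exit(EXIT_FAILURE);
}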
Multi-threaded Examples
- Compile all multithreaded code with the -pthread flag.
- It affects include files in three ways:
- The include files define prototypes for the reentrant variants of some of the standard library functions, e.g. strtok_r() as a reentrant equivalent to strtok().
- When -pthread is given, some <stdio.h> functions are no longer defined as macros, e.g. getc() and putc(). In a multithreaded program, such functions require additional locking, which the macros don’t perform, so we must call functions instead.
- More importantly, <errno.h> redefines errno so that errno refers to the thread-specific errno location, rather than the global errno variable.
- Why would a shared errno be a bad idea?
- Race conditions: Which thread is producing the error?
Multi-threaded Examples
- The following operations are the basic POSIX thread management operations.
- They require #include <pthread.h> and must be compiled (and linked) using gcc with the flag -pthread.
- They are:
pthread_create
pthread_join
pthread_exit
- Most pthread routines return zero on success, and a non-zero error code on failure.
- Some, pthread_exit for example, do not return to their caller at all.
Summary
- Overview of Threads
- Processes vs Threads
- Thread Creation
- Reentrant Code
- Multi-threaded Examples
Questions?