All work in this tutorial must be done on a node of the cluster. Start by logging on to bourbaki:
turing.comp309> ssh bourbaki
Now log on to a node of the cluster:
turing.comp309> ssh b1
Now copy the entire contents of the Lecture_13 Examples directory to the directory that you are doing this tutorial in:
b1.comp309> cp ~comp309/markdown_lectures/lecture_13/Examples/* .
Take a look at the makefile. Make sure you pay particular attention to the host target, and try to follow what it does.
OK, let's try compiling all the programs you just copied:
b1.comp309> make
gcc -Wall hello.c -lpvm3 -o hello
gcc -Wall hello_other.c -lpvm3 -o hello_other
Now let's install those binaries on the host we are running on (here, b1):
b1.comp309> make host
cp -f hello hello_other ~/pvm3/bin/LINUXI386/
This copies the binaries to the ~/pvm3/bin/LINUXI386/ directory. PVM programs should be placed in this directory so that the PVM daemon can find them when it needs to spawn copies of a particular program.
Now change into the directory you copied the binaries to:
b1.comp309> cd ~/pvm3/bin/LINUXI386/
b1.LINUXI386>
Now try running the hello program:
b1.LINUXI386> ./hello
You should get a heap of errors, because no PVM daemon is running yet. Time to start one up:
b1.LINUXI386> pvm
pvm>
This will give you a console where you can control your own PVM daemon. The console allows you to list running processes, kill processes, etc. For now we will just quit the console, leaving the PVM daemon running:
pvm> quit
quit
Console: exit handler called
pvmd still running.
Now try running the hello program again:
b1.LINUXI386> ./hello
i'm t40002
from t40003: hello, world from b1
Success! Now run the console again and shut down PVM by typing halt.
Log in to one of the other nodes using rlogin or ssh:
b1.Examples> rlogin b2
Last login: Wed Sep 4 12:32:41 from b1
[comp309@b2 ~]$
Change to the ~/pvm3/bin/LINUXI386/ directory and make sure the binaries are there. Run the PVM console the same way you did earlier on b1, and quit. Run the hello program again; it works as it did on the first node, but reports the new node’s host name instead:
[comp309@b2 LINUXI386]$ ./hello
i'm t40002
from t40003: hello, world from b2
Now let's do something a little more interesting. Go back into the PVM console:
[comp309@b2 LINUXI386]$ pvm
pvmd already running.
pvm>
We will now add some extra hosts for the daemon to spawn jobs on, using the add command:
pvm> add b3
add b3
1 successful
HOST DTID
b3 80000
pvm> add b4
add b4
1 successful
HOST DTID
b4 c0000
We can now see what hosts are available to the PVM daemon:
pvm> conf
conf
3 hosts, 1 data format
HOST DTID ARCH SPEED DSIG
b2 40000 LINUXI386 1000 0x00408841
b3 80000 LINUXI386 1000 0x00408841
b4 c0000 LINUXI386 1000 0x00408841
Quit the console again, and run the hello program a few times:
[comp309@b2 LINUXI386]$ ./hello
i'm t40005
from t40006: hello, world from b2
[comp309@b2 LINUXI386]$ ./hello
i'm t40007
from t80001: hello, world from b3
[comp309@b2 LINUXI386]$ ./hello
i'm t40008
from tc0001: hello, world from b4
[comp309@b2 LINUXI386]$ ./hello
i'm t40009
from t4000a: hello, world from b2
As you can see, the hello message comes from different hosts. How does this happen? The pvm_spawn call in hello.c does not specify a particular host to spawn jobs on, so the PVM daemon simply chooses one of the hosts available to it.
The PVM daemon writes its log file to /tmp on the host where it runs. The log for your daemon is named pvml.<uid>, where <uid> is your numeric user ID, which the id command will show:
b1.tmp> id
uid=3191(comp309) gid=6310(admin) groups=6310(admin)
For comp309, the user id is 3191, so the log file should be pvml.3191:
b1.tmp> ls -l pvml.3191
-rw------- 1 comp309 admin 125 Sep 4 13:26 pvml.3191
Sure enough, the owner of the file is comp309. Take a look at your log file; you probably won’t see much information in it after what we’ve been running today. It will become useful later on when you write larger programs spanning the nodes, because PVM captures program output (for example, anything printed to stderr) and logs it. In most cases, this will be the only way of seeing what your programs are actually doing.
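You will most likely also notice a file named pvmd.3191 (in general, pvmd.<uid>) in /tmp. If your daemon exits without being halted properly, this file can be left behind, and the console will then fail to connect to a daemon that no longer exists: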
b1.tmp> pvm
libpvm [pid6089] mksocs() connect: Connection refused
libpvm [pid6089] socket address tried: /tmp/pvmtmp005572.0
pvmd already running.
You can fix this by removing the stale pvmd file that belongs to you:
b1.tmp> rm pvmd.3191
rm: remove `pvmd.3191'? y