The Duke Compute Cluster (DCC) has a new login node, dscr-slogin-03, dedicated to running jobs on the Open Science Grid (OSG). This capability was previewed during the OSG Software Carpentry Workshop last October:
SWC-OSG Workshop at Duke University, October 27-29th 2015
Running jobs on the Open Science Grid requires an OSG account, so please sign up for one first. Complete instructions for running OSG jobs can be found in "Job Scheduling with HTCondor" and "Connecting the Campus to Grid Resources". Below is a terminal session that runs the first OSG tutorial job on the DCC. Give this a try and send any questions to rescomputing@duke.edu.
First, ssh to dscr-slogin-03:
tm103@dscr-slogin-02 ~ $ ssh dscr-slogin-03
################################################################################
## You are about to access a Duke University computer network that is intended #
## for authorized users only. You should have no expectation of privacy in     #
## your use of this network. Use of this network constitutes consent to        #
## monitoring, retrieval, and disclosure of any information stored within the  #
## network for any purpose including criminal prosecution.                     #
################################################################################
tm103@dscr-slogin-03's password:
Last login by user tm103: Mon Jan 25 10:24 - 10:42 (00:17) from: dscr-slogin-02.oit.duke.edu
-bash-4.1$
Set up the OSG Connect client (this only needs to be done once):
-bash-4.1$ connect setup
Please enter the user name that you created during Connect registration.
Note that it consists only of letters and numbers, with no @ symbol.
You will be connecting via the login.duke.ci-connect.net server.
Enter your Connect username: tm103
Password for tm103@login.duke.ci-connect.net:
notice: Ongoing client access has been authorized at login.duke.ci-connect.net.
notice: Use "connect test" to verify access.
Test the connect client:
-bash-4.1$ connect test
Success! Your client access to login.duke.ci-connect.net is working.
Create the tutorial files:
-bash-4.1$ tutorial quickstart
Installing quickstart (master)...
Tutorial files installed in ./tutorial-quickstart.
Running setup in ./tutorial-quickstart...
Change to the tutorial-quickstart directory:
-bash-4.1$ cd tutorial-quickstart
-bash-4.1$ pwd
/dscrhome/tm103/tutorial-quickstart
Look at the tutorial files:
-bash-4.1$ ls -l
total 454
-rw-r--r--. 1 tm103 scsc     0 Oct 29 13:16 job.error
-rw-r--r--. 1 tm103 scsc 28240 Oct 29 13:22 job.log
-rw-r--r--. 1 tm103 scsc   273 Oct 29 13:22 job.output
drwxrwxr-x. 3 tm103 scsc  6797 Dec  8 14:59 log
-rw-rw-r--. 1 tm103 scsc  1204 Oct 28 21:14 osg-template-job.submit
-rw-rw-r--. 1 tm103 scsc 12938 Oct 28 21:14 README.md
-rwxrwxr-x. 1 tm103 scsc   296 Oct 28 21:14 short.sh
-rw-r--r--. 1 tm103 scsc     0 Dec  8 14:29 testjob.error
-rw-r--r--. 1 tm103 scsc  6127 Jan 25 10:28 testjob.log
-rw-r--r--. 1 tm103 scsc   250 Jan 25  2016 testjob.output
-rw-rw-r--. 1 tm103 scsc   800 Dec  8 14:28 tutorial01.submit
-rw-rw-r--. 1 tm103 scsc   204 Oct 28 21:14 tutorial02.submit
-rw-rw-r--. 1 tm103 scsc   237 Oct 28 21:14 tutorial03.submit
drwxrwxr-x. 3 tm103 scsc   220 Jan  7 10:28 tutorial-quickstart
Edit the first sample submit file, tutorial01.submit, and change the project name from +ProjectName="ConnectTrain" to +ProjectName="duke-campus". Use plain double quotes, not typographic ones.
-bash-4.1$ vim tutorial01.submit
-bash-4.1$ cat tutorial01.submit
# The UNIVERSE defines an execution environment. You will almost always use VANILLA.
Universe = vanilla
# EXECUTABLE is the program your job will run It's often useful
# to create a shell script to "wrap" your actual work.
Executable = short.sh
Arguments = 10
# ERROR and OUTPUT are the error and output channels from your job
# that HTCondor returns from the remote host.
Error = testjob.error
Output = testjob.output
# The LOG file is where HTCondor places information about your
# job's status, success, and resource consumption.
Log = testjob.log
# +ProjectName is the name of the project reported to the OSG accounting system
# +ProjectName="ConnectTrain"
+ProjectName="duke-campus"
# QUEUE is the "start button" - it launches any jobs that have been
# specified thus far.
Queue 1
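The same edit can be made without opening an editor. The sketch below is one possible approach and is not part of the OSG tutorial: set_project_name is a hypothetical helper that rewrites the active +ProjectName line with sed and leaves commented-out copies alone. It assumes the submit file already contains an uncommented +ProjectName line, as the tutorial's does, and that the project name contains no "/" or "&" characters.

```shell
#!/bin/bash
# Hypothetical helper (not part of the Connect client): set the value of the
# active +ProjectName line in an HTCondor submit file.
# Assumption: PROJECT contains no '/' or '&', which sed would treat specially.
# Usage: set_project_name FILE PROJECT
set_project_name() {
    local file="$1" project="$2" tmp
    tmp="$(mktemp)" || return 1
    # Match only lines that begin with +ProjectName, so commented-out
    # copies (lines starting with '#') are left untouched.
    sed "s/^+ProjectName[[:space:]]*=.*/+ProjectName=\"${project}\"/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"   # write back in place, keeping file permissions
    local status=$?
    rm -f "$tmp"
    return "$status"
}
```

For the tutorial file this would be called as: set_project_name tutorial01.submit duke-campus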
Look at the short.sh script:
-bash-4.1$ cat short.sh
#!/bin/bash
# short.sh: a short discovery job
printf "Start time: "; /bin/date
printf "Job is running on node: "; /bin/hostname
printf "Job running as user: "; /usr/bin/id
printf "Job is running in directory: "; /bin/pwd
echo
echo "Working hard..."
sleep ${1-15}
echo "Science complete!"
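The sleep ${1-15} line relies on shell parameter expansion: the script sleeps for the number of seconds given as its first argument and falls back to 15 when no argument is supplied. Because tutorial01.submit sets Arguments = 10, this job sleeps for 10 seconds. A minimal illustration, where sleep_seconds is a made-up function name used only for this example:

```shell
#!/bin/bash
# Illustrative only: print the value that "sleep ${1-15}" would receive.
sleep_seconds() {
    # ${1-15} expands to $1 if it is set (even to an empty string),
    # otherwise to the default 15.
    echo "${1-15}"
}

sleep_seconds        # no argument: prints 15
sleep_seconds 10     # argument given: prints 10
```

Note that ${1-15} differs from the more common ${1:-15}: the latter also substitutes the default when the argument is set but empty.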
Submit the sample job:
-bash-4.1$ connect submit tutorial01.submit ..............+..+++++++++........................................................................................................................................................................................................... 10 objects sent; 219 objects up to date; 0 errors Submitting job(s). 1 job(s) submitted to cluster 123956.
Check the progress of the job:
-bash-4.1$ connect q
-- Submitter: duke-login.osgconnect.net : <192.170.227.203:60920> : duke-login.osgconnect.net
 ID        OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
123956.0   tm103   1/25 09:38   0+00:00:00 I  0   0.0  short.sh 10
1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
-bash-4.1$ connect q
-- Submitter: duke-login.osgconnect.net : <192.170.227.203:60920> : duke-login.osgconnect.net
 ID        OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
123956.0   tm103   1/25 09:38   0+00:00:05 R  0   0.0  short.sh 10
1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
-bash-4.1$ connect q
-- Submitter: duke-login.osgconnect.net : <192.170.227.203:60920> : duke-login.osgconnect.net
 ID        OWNER   SUBMITTED    RUN_TIME   ST PRI SIZE CMD
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
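Rather than rerunning connect q by hand, the summary line can be parsed in a loop. The following is a rough sketch under stated assumptions: it assumes the summary line always has the form "N jobs; ..." as in the session above, and jobs_in_queue and wait_for_queue are hypothetical helper names, not Connect client commands. Since connect q lists every job shown for your account, the loop waits for all of them, not just the most recent submission.

```shell
#!/bin/bash
# Hypothetical helper: read "connect q" output on stdin and print the job
# count from the summary line, e.g. "1 jobs; 0 completed, ..." prints 1.
# Exits non-zero if no summary line is found.
jobs_in_queue() {
    awk '$2 == "jobs;" { print $1; found = 1 } END { exit !found }'
}

# Hypothetical helper: poll "connect q" until the queue is empty.
# Usage: wait_for_queue [SECONDS_BETWEEN_POLLS]   (default: 30)
wait_for_queue() {
    local interval="${1:-30}" count
    while true; do
        # Stop with an error if the summary line cannot be found, rather
        # than looping forever on unexpected output.
        count="$(connect q | jobs_in_queue)" || return 1
        [ "$count" -eq 0 ] && return 0
        sleep "$interval"
    done
}
```

A typical use would be: wait_for_queue 60 && connect pull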
Retrieve the output files (important!):
-bash-4.1$ connect pull +.......++...+.................................................................................................................................................................................................................. 4 objects retrieved; 220 objects up to date; 0 errors
Look at the job output:
-bash-4.1$ cat testjob.output
Start time: Mon Jan 25 09:38:38 CST 2016
Job is running on node: iut2-c085.iu.edu
Job running as user: uid=21039(osg) gid=21000(osgvo) groups=21000(osgvo)
Job is running in directory: /var/lib/condor/execute/dir_1093243/glide_6VG8SA/execute/dir_1099394
Working hard...
Science complete!
-bash-4.1$
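A quick sanity check on the retrieved files can also be scripted. This sketch assumes a successful run looks like the one above: the output file contains the "Science complete!" line that short.sh prints last, and the error file is empty. job_looks_ok is a hypothetical name chosen for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the output file contains the final
# marker line from short.sh and the error file is empty or absent.
# Usage: job_looks_ok OUTPUT_FILE ERROR_FILE
job_looks_ok() {
    local out="$1" err="$2"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF 'Science complete!' "$out" || return 1
    # -s is true for a non-empty file; any stderr content is suspicious.
    [ ! -s "$err" ]
}
```

For the tutorial job this would be run as: job_looks_ok testjob.output testjob.error && echo "job finished cleanly"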