Login to TRITON:
ssh -l your_user_name tscc-login.sdsc.edu

How to generate SSH keys:
Windows:
http://kb.site5.com/shell-access-ssh/how-to-generate-ssh-keys-and-connect-to-your-account-with-putty/
http://wiki.joyent.com/wiki/display/jpc2/Manually+Generating+Your+SSH+Key+in+Windows
Mac OS X:
http://wiki.joyent.com/wiki/display/jpc2/Manually+Generating+your+SSH+Key+in+Mac+OS+X
After generating the SSH keys, please send the PUBLIC KEY to stefan@ucsb.edu

X Window System Server for Windows:
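On Linux or Mac OS X the same key pair can be generated from a terminal with ssh-keygen. A minimal sketch (the file name demo_key is only for illustration; by default the key goes to ~/.ssh/id_rsa):

```shell
# Generate a 4096-bit RSA key pair non-interactively.
# -N "" sets an empty passphrase for the demo; use a real passphrase in practice.
ssh-keygen -t rsa -b 4096 -N "" -f ./demo_key -q

# Two files are created: demo_key (PRIVATE - never share it) and
# demo_key.pub (PUBLIC - this is the file to send).
cat demo_key.pub
```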
http://sourceforge.net/projects/xming/
Xming is the leading X Window System Server for Microsoft Windows 8/7/Vista/XP. It is fully featured, small, fast, and simple to install, and because it is a standalone native Windows program it is easily made portable (not needing a machine-specific installation).

WinSCP:
http://winscp.net/eng/index.php
WinSCP is a free, open-source SFTP, FTP, WebDAV, and SCP client for Windows. Its main function is file transfer between a local and a remote computer. Beyond this, WinSCP offers scripting and basic file-manager functionality.

Accounting on TRITON:
gbalance -u username

File transfer to/from TRITON:
To copy the file "pi.c" from an ENGR machine to TRITON:
scp pi.c your_SDSC_username@tscc-login.sdsc.edu:pi.c
To copy the file "pi.c" from TRITON to the ENGR domain machines:
scp pi.c your_ENGR_username@linux.engr.ucsb.edu:pi.c

Modules
Here are some common module commands and their descriptions:
module list - List the modules that are currently loaded
module avail - List the modules that are available
module display "module name" - Show the environment variables used by "module name" and how they are affected
module unload "module name" - Remove "module name" from the environment
module load "module name" - Load "module name" into the environment
module switch "module 1 name" "module 2 name" - Replace "module 1 name" with "module 2 name" in the environment

Compiling
In general, the login node should be used only to edit and compile software and to submit jobs to the scheduler.
NEVER RUN A JOB ON THE LOGIN NODE. Jobs should be run only on Triton's compute nodes.
Serial program

Compile your programs with pgcc, pgf77, and pgf90 (Portland Group compilers); icc and ifort (Intel compilers); or gcc, g77, and gfortran (GNU compilers).

icc [options] file.c     C
ifort [options] file.f   Fortran

Example:
% icc -o serial serial.c

MPI program

MPI source codes should be recompiled for the Triton system with the following compiler commands:

mpicc [options] file.c      C & C++     [myrinet/mx switch & Portland compiler]
mpif77 [options] file.f     Fortran 77  [myrinet/mx switch & Portland compiler]
mpif90 [options] file.f90   Fortran 90  [myrinet/mx switch & Portland compiler]

OPENMP program

OPENMP source codes should be recompiled for the Triton system with the following commands:

module purge
module load intel
module load openmpi_mx
icpc -openmp -o execfile file.c

To run:
export OMP_NUM_THREADS=8
./execfile

OPENMP-HYBRID program

Hybrid OPENMP/MPI source codes should be recompiled for the Triton system with the following commands:

module purge
module load intel
module load openmpi_mx
mpicc -openmp -o execfile file.c

To run:
export OMP_NUM_THREADS=8
mpirun -machinefile $PBS_NODEFILE -np 2 ./execfile

CILK
by Veronica Strnadova - Computer Science Department

The Intel Cilk Plus SDK, which provides the Cilkscreen race detector and the Cilkview scalability analyzer, can be downloaded from the Intel Cilk Plus SDK page. Here is an example Cilk program: simplecilkprogram.cpp

To compile, use icc, like this:
icc simplecilkexample.cpp -o simplecilkexample
Then, to run, type:
./simplecilkexample
You should see output that looks like this:
result=832040

Now, to run the Cilkscreen race detector, you just need to know where the cilkscreen executable is, and run it with "simplecilkprogram" as an argument. If cilkscreen is under /opt/cilkutil/bin/, then we can run it like this:
/opt/cilkutil/bin/cilkscreen simplecilkprogram
We get the following output:
Cilkscreen Race Detector V2.0.0, Build 3229
result=832040
No errors found by Cilkscreen

Similarly, we can run the Cilkview scalability analyzer like this:
/opt/cilkutil/bin/cilkview simplecilkprogram
And we get output that starts with:
Cilkview: Generating scalability data
Cilkview Scalability Analyzer V2.0.0, Build 3229
result=832040
The output goes on to report a "Parallelism Profile" and a "Speedup Estimate" for both the program as a whole and the "parallel region" of the program.

Here is a link to a short summary of the cilkscreen and cilkview tools: Cilk Tools Tutorial. And here is a link to the Cilk++ SDK Programmer's Guide, although it is for Cilk++ and not Cilk Plus. I haven't been able to find an equivalent version of a programmer's guide for Cilk Plus, but I'll keep looking: Intel Cilk++ Programmers Guide. Finally, I think this e-book is very helpful as an introduction to Cilk for anyone who wants to read it (off of Prof. John Gilbert's web page): CilkBook

Running
When you have a job running, you are allocated the nodes requested. At that time, a PBS prologue script runs that allows you direct ssh access to your nodes. At the conclusion of your job, that privilege is removed.

Interactive

You can use "qsub -I" to get exclusive access to a set of nodes, where you can perform interactive analyses. If you need one processor:
qsub -I -l walltime=00:10:00

Example: To run an interactive job with a wall clock limit of 30 minutes, using two nodes and two processors per node:
$ qsub -I -l walltime=00:30:00 -l nodes=2:ppn=2
qsub: waiting for job 75.tscc-mgr.local to start
qsub: job 75.tscc-mgr.local ready
$ echo $PBS_NODEFILE
/var/spool/torque/aux//1083840.tscc-mgr.local

Then you can use "more" or an editor such as "vi" to see the information contained in the file. For example, in this particular case, four processors were allocated as requested. Two of those are located on node 39, and the other two on node 36.
$ more /var/spool/torque/aux//1083840.tscc-mgr.local
tscc-0-39
tscc-0-39
tscc-0-36
tscc-0-36

To run a job:
$ mpirun -machinefile $PBS_NODEFILE -np 4 execfile

Batch

See: Running Batch Jobs http://idi.ucsd.edu/computing/jobs/index.html
Example: Script file for the HOTEL queue:
#!/bin/csh
#PBS -q hotel
#PBS -N hello
#PBS -l nodes=1:ppn=4
#PBS -l walltime=0:05:00
#PBS -o hello-out
#PBS -e hello-err
#PBS -V
cd /home/u4078/cs140/compile-run
mpirun -v -machinefile $PBS_NODEFILE -np 4 mpi_hello > h-out

Numerical Libraries & Performance Tools
NUMERICAL LIBRARIES
The Portland Group compilers come with the optimized ACML library (LAPACK/BLAS/FFT). The ACML user guide is in the following location:
/opt/pgi/linux86-64/8.0-6/doc/acml.pdf
Example BLAS, LAPACK, and FFT codes are in:
/home/diag/examples/ACML
Compile and link as follows:
pgf90 dzfft_example.f -L/opt/pgi/linux86-64/8.0-6/lib -lacml
pgcc -L/opt/pgi/linux86-64/8.0-6/lib lapack_dgesdd.c -lacml -lm -lpgftnrtl -lrt
pgcc -L/opt/pgi/linux86-64/8.0-6/lib blas_cdotu.c -lacml -lm -lpgftnrtl -lrt
Intel

Intel has developed the Math Kernel Library (MKL), which contains many linear algebra, FFT, and other useful numerical routines:
* Basic linear algebra subprograms (BLAS), with additional sparse routines
* Fast Fourier Transforms (FFT) in 1 and 2 dimensions, complex and real
* The linear algebra package, LAPACK
* A C interface to BLAS
* Vector Math Library (VML)
* Vector Statistical Library (VSL)
* Multi-dimensional Discrete Fourier Transforms (DFTs)

To link the MKL libraries, please refer to the Intel MKL Link Line Advisor Web page. This tool accepts inputs for several variables based on your environment and automatically generates a link line for you. When using the output generated by this site, substitute the Triton path of the Intel MKL for the value $MKLPATH in the generated script. That value is ${MKL_ROOT}/lib/em64t.

Examples are in the following directory:
/home/diag/examples/MKL

LAPACK example using MKL. Compile as follows:
export MKLPATH=/opt/intel/Compiler/11.1/072/mkl
ifort dgebrdx.f -I$MKLPATH/include $MKLPATH/lib/em64t/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/lib/em64t/libmkl_intel_lp64.a $MKLPATH/lib/em64t/libmkl_sequential.a $MKLPATH/lib/em64t/libmkl_core.a -Wl,--end-group libaux_em64t_intel.a -lpthread
Output:
./a.out < dgebrdx.d

ScaLAPACK example using MKL. A sample test case (from the MKL examples) is in:
/home/diag/examples/scalapack
The make file is set up to compile all the tests.
Procedure:
module purge
module load intel
module load openmpi_mx
make libem64t compiler=intel mpi=openmpi LIBdir=/opt/intel/Compiler/11.1/072/mkl/lib/em64t
Sample link line (to illustrate how to link for ScaLAPACK):
/opt/openmpi/bin/mpicc -o mm_pblas mm_pblas.c -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_scalapack_lp64.a /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_blacs_openmpi_lp64.a -L/opt/intel/Compiler/11.1/072/mkl/lib/em64t /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_intel_lp64.a -Wl,--start-group /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_sequential.a /opt/intel/Compiler/11.1/072/mkl/lib/em64t/libmkl_core.a -Wl,--end-group -lpthread
mm_pblas.c
GPROF
GPROF is the GNU Project profiler. It requires recompilation of the code: compiler options and libraries provide wrappers for each routine call and periodic sampling of the program. A default gmon.out file is produced with the function-call information. GPROF links the symbol list in the executable with the data in gmon.out.

Types of profiles:
Flat profile
- CPU time spent in each function (self and cumulative)
- Number of times a function is called
- Useful to identify the most expensive routines
Call graph
- Number of times a function was called by other functions
- Number of times a function called other functions
- Useful to identify function relations
- Suggests places where function calls could be eliminated

Use the -pg flag during compilation:
% gcc -g -pg ./srcFile.c
% icc -g -p ./srcFile.c
% pgcc -g -pg ./srcFile.c
Run the executable. An output file gmon.out will be generated with the profiling information. Execute gprof and redirect the output to a file:
% gprof ./exeFile gmon.out > profile.txt
% gprof -l ./exeFile gmon.out > profile_line.txt
IPM
IPM is a portable profiling infrastructure for parallel codes. It provides a low-overhead performance profile of the performance aspects and resource utilization in a parallel program. On TRITON the library is located in:
/opt/ipm
To run:
module unload openmpi_ib
module load mvapich2_ib
module load ipm
module load papi
qsub -I -l walltime=00:30:00 -l nodes=1:ppn=2
mpicc mpi_hello.c -L$IPMHOME/lib -L$PAPIHOME/lib -lipm -lpapi
mpirun_rsh -np 4 -hostfile $PBS_NODEFILE ./a.out

FPMPI
FPMPI is a simple MPI profiling library. It is intended as a first step towards understanding the nature of the communication patterns and potential bottlenecks in existing applications. Applications linked to FPMPI will generate an output file, fpmpi_profile.txt, which contains:
* description: A brief description of the fpmpi_profile.txt format.
* synchronization data: A listing of the synchronizing routines used and some related profile data.
* asynchronous communication data: A listing of the asynchronous communication routines used and some related profile data.
* topology data: A brief output of the communication topology.
On TRITON the library is located in:
/opt/openmpi/intel/ib/lib
To run, just relink with the library. For example:
/opt/openmpi/intel/ib/bin/mpicc -o trap-fpmpi trap.c -L/opt/openmpi/intel/ib/lib -lfpmpi
qsub -I -l walltime=00:20:00 -l nodes=1:ppn=4
mpirun -machinefile $PBS_NODEFILE -np 4 trap-fpmpi
The profile is written to fpmpi_profile.txt.
TAU/PDT
TAU Performance System is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, Java, and Python. TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. New users may find the TAU-workshop tutorial helpful, which also includes the following lab exercises.

TAU location: /opt/tau/
PAPI location: /opt/papi/
PAPI (Performance Application Programming Interface) provides the tool designer and application engineer with a consistent interface and methodology for using the performance-counter hardware found in most major microprocessors. PAPI enables software engineers to see, in near real time, the relation between software performance and processor events.

Load the TAU environment:
module load tau
module load papi
export PATH=/opt/tau/intel/openmpi/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/opt/tau/intel/openmpi/x86_64/lib:$LD_LIBRARY_PATH

Select the appropriate TAU makefile based on your choices. For example:
/opt/tau/intel/openmpi_ib/x86_64/lib/Makefile.tau-icpc-mpi-pdt
So, we set it up:
% export TAU_MAKEFILE=/opt/tau/intel/openmpi_ib/x86_64/lib/Makefile.tau-icpc-mpi-pdt
And we compile using the wrapper provided by TAU:
% tau_cc.sh trap.c
or, for makefiles, edit the makefile and change mpif90/mpicc to tau_f90.sh/tau_cc.sh. Run the job through the queue normally. We obtain the following profile files [on 4 processors]: profile.0.0.0, profile.1.0.0, profile.2.0.0 & profile.3.0.0

Analyze performance data:
pprof - for text-based display - output of PPROF
paraprof - for GUI
jumpshot - for GUI - Using Jumpshot-4

GUI environment:
a. On PC systems [PuTTY]: select X11 forwarding. On Linux & Mac OS: ssh -X ...
b. On TRITON:
- Connect to the compute nodes, with X forwarding:
qsub -I -X -l walltime=00:20:00 -l nodes=1:ppn=4
- Go to the directory where the "profile.0.0.0, etc." files are stored.
- Set the TAU path:
module load tau
module load papi
export PATH=/opt/tau/intel/openmpi_ib/x86_64/bin:$PATH
export LD_LIBRARY_PATH=/opt/tau/intel/openmpi_ib/x86_64/lib:$LD_LIBRARY_PATH
Use 'paraprof' to analyze performance data:
paraprof

To use the trace option:
- after compiling, set the environment variable:
export TAU_TRACE=1
- run the code:
mpirun -machinefile $PBS_NODEFILE -np 4 ./execfile
- execute the following commands:
tau_treemerge.pl
tau2slog2 tau.trc tau.edf -o app.slog2
- run jumpshot:
jumpshot app.slog2
HADOOP on Triton
Here is some more info on what the scripts do and the setup involved before the run:

#1 Setup
In persistent mode, myHadoop's configure script needs the location to use, with subdirectories named by numbers. For example, in my test above I chose the following location:
/oasis/triton/scratch/diag/hadoop/data
and made the following 4 directories in this location:
[diag@tcc-3-43 data]$ mkdir 1 2 3 4
[diag@tcc-3-43 data]$ ls
1 2 3 4

#2 First run of the persistent setup (myhadoop_persistent_setup.cmd)
This looks just like the normal example, except the configure.sh script is given the persistent option:
$MY_HADOOP_HOME/bin/configure.sh -n 4 -c $HADOOP_CONF_DIR -p -d /oasis/triton/scratch/diag/hadoop/data
(it points to the base location we created above)
In the example we copy in the .bashrc file, which we will look for in the second run to make sure we still have the data from the first Hadoop run. In my example the job ran on the following nodes:
tcc-3-45
tcc-3-51
tcc-3-52
tcc-3-53

#3 The above job completed, and now we will check whether we can spin up the same Hadoop cluster using a second job and a potentially different set of compute nodes (myhadoop_persistent_restart.cmd). The changes we make in the script:
(a) *Do not* format the HDFS; we have this line commented out. This will enable us to keep the data from the previous run.
(b) After cluster start-up, move dfs out of safemode (to allow for writes on the second run):
$HADOOP_HOME/bin/hadoop dfsadmin -safemode leave
Note that we are still running the configure script, because the new compute nodes need to be in the configuration files for Hadoop. In this example I list the contents of the test HDFS directory to verify that the old data is still there, and then copy in another file.
Sample output of dfs ls:
Found 1 items
-rw-r--r-- 3 diag supergroup 878 2013-02-27 03:16 /user/diag/Test/.bashrc
Found 2 items
-rw-r--r-- 3 diag supergroup 16450 2013-02-27 03:26 /user/diag/Test/.bash_history
-rw-r--r-- 3 diag supergroup 878 2013-02-27 03:16 /user/diag/Test/.bashrc
(The first ls shows we still have the .bashrc from the first run, and the second shows the newly copied-in .bash_history file.)

#4 If you check the directories in /oasis/triton/scratch/diag/hadoop/data, you can see the data from HDFS in the numbered directories (1, 2, 3, 4). For example:
$ ls /oasis/triton/scratch/diag/hadoop/data/2
dfs mapred

The only note of caution is to restrict this to small job sizes (4 nodes is fine), as Lustre has metadata limitations that might show up if you try a large Hadoop job.
Notes & Examples - Stampede
Stampede User's Guide
Sample Programs
Serial MPI OPENMP
OPENMP-MPI Hybrid
CILK
CILKPLUS
CUDA
PAPI and TAU