mirror of https://github.com/xcat2/xcat-core.git
synced 2025-05-22 03:32:04 +00:00

Refactor the GPU CUDA documentation. Organized for both RHEL and Ubuntu sharing common pages. Made sure the documentation reflects the changes made for diskless support in Issue #314 and Issue #316.

This commit is contained in:
parent 0d023445d0
commit 6b54a9dd76
@ -1,307 +0,0 @@
CUDA Installation
=================

Overview
--------

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It can be used by graphics processing units (GPUs) for general-purpose processing.

xCAT supports CUDA installation for Ubuntu and RHEL 7.2 on PowerNV (p8le nodes with NVIDIA GPU support) for both diskless and diskful nodes. The CUDA packages provided by NVIDIA include both the runtime libraries for computing and development tools for programming and monitoring. The full package set is very large, so in xCAT it is suggested that the packages be split into two package sets:

#. **cudaruntime** package set
#. **cudafull** package set

It is suggested to install only the **cudaruntime** package set on the Compute Nodes (CNs), and the **cudafull** package set on the Management Node or the monitor/development nodes.

This documentation covers CUDA installation based on RHEL 7.2 Power 8 firestone nodes. Users can find the Ubuntu CUDA installation document here: http://sourceforge.net/p/xcat/wiki/xCAT_P8LE_cuda_installing/

CUDA Repository
---------------

The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads. Users can download the CUDA Toolkit for the target platform.

Prepare a local repository directory which contains all the CUDA packages and repository metadata. Users can either:

* Create a repository on the Management Node by installing the CUDA toolkit: ::

    mkdir -p /install/cuda-repo/cuda-7-5
    rpm -ivh cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm
    cp /var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
    cd /install/cuda-repo/cuda-7-5
    createrepo .

* Create a repository on the Management Node without installing the CUDA toolkit: ::

    mkdir -p /tmp/cuda
    cd /tmp/cuda
    rpm2cpio /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm | cpio -i -d
    cp /tmp/cuda/var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
    cd /install/cuda-repo/cuda-7-5
    createrepo .

The NVIDIA driver RPM packages depend on other external packages, such as DKMS and possibly EPEL packages (firestone nodes do not need the EPEL package). Users need to download those packages into a dependency directory and create a repository from it: ::

    mkdir -p /install/cuda-repo/cuda-deps
    ls -ltr /install/cuda-repo/cuda-deps
    -rw-r--r-- 1 root root 79048 Oct  5 10:58 dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch.rpm
    cd /install/cuda-repo/cuda-deps
    createrepo .

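Before pointing an osimage at these directories, it can help to confirm that each one actually looks like a usable yum repository (RPMs present, plus the ``repodata/repomd.xml`` index that ``createrepo`` writes). A minimal sketch, with directory names taken from the commands above and the ``repo_ok`` helper being a hypothetical convenience function, not an xCAT tool:

```shell
#!/bin/sh
# Print whether a directory looks like a usable createrepo-style repository:
# it needs RPM packages plus the repodata/repomd.xml index written by createrepo.
repo_ok() {
    if [ ! -f "$1/repodata/repomd.xml" ]; then
        echo "$1: missing repodata (run createrepo?)"
    elif ! ls "$1"/*.rpm >/dev/null 2>&1; then
        echo "$1: no RPM packages found"
    else
        echo "$1: ok"
    fi
}

repo_ok /install/cuda-repo/cuda-7-5
repo_ok /install/cuda-repo/cuda-deps
```

If either line reports a problem, re-run the corresponding ``createrepo`` step before generating osimages against the directory.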
CUDA osimage
------------

Users can generate a new CUDA osimage object based on another osimage definition, or just modify an existing osimage. xCAT provides some sample CUDA pkglist files:

* diskful provisioning in ``/opt/xcat/share/xcat/install/rh/`` for ``cudafull`` and ``cudaruntime``: ::

    #cat /opt/xcat/share/xcat/install/rh/cudafull.rhels7.pkglist
    #INCLUDE:compute.rhels7.pkglist#
    #For Cuda 7.5
    kernel-devel
    gcc
    pciutils
    dkms
    cuda

    #cat /opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.pkglist
    #INCLUDE:compute.rhels7.pkglist#
    #For Cuda 7.5
    kernel-devel
    gcc
    pciutils
    dkms
    cuda-runtime-7-5

* diskless provisioning in ``/opt/xcat/share/xcat/netboot/rh`` for ``cudafull`` and ``cudaruntime``: ::

    #cat /opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.pkglist
    #INCLUDE:compute.rhels7.ppc64.pkglist#
    #For Cuda 7.5
    kernel-devel
    gcc
    pciutils
    dkms
    cuda

    #cat /opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.pkglist
    #INCLUDE:compute.rhels7.ppc64.pkglist#
    #For Cuda 7.5
    kernel-devel
    gcc
    pciutils
    dkms
    cuda-runtime-7-5

**NOTE: After CUDA is installed, the nodes require a reboot.**

* For a diskful installation, the CUDA packages should be included in the ``pkglist`` field so a reboot happens automatically after the OS is installed.
* For a diskless installation, the CUDA packages can be included in either the ``otherpkglist`` or ``pkglist`` field.

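xCAT resolves the ``#INCLUDE:<file>#`` directives in the pkglists above at provisioning time, splicing the referenced pkglist in place. Purely to illustrate those semantics (this is a toy sketch, not xCAT's implementation), a one-level expansion can be written as:

```shell
#!/bin/sh
# Toy one-level expansion of "#INCLUDE:<file>#" lines in a pkglist,
# to show what the directive resolves to -- NOT xCAT's own code.
expand_pkglist() {
    while IFS= read -r line; do
        case "$line" in
            '#INCLUDE:'*'#')
                inc=${line#\#INCLUDE:}   # strip leading "#INCLUDE:"
                inc=${inc%\#}            # strip trailing "#"
                cat "$inc"               # splice the referenced pkglist in place
                ;;
            *) printf '%s\n' "$line" ;;
        esac
    done < "$1"
}
```

Running it over ``cudafull.rhels7.pkglist`` would print the contents of ``compute.rhels7.pkglist`` followed by the CUDA-specific packages.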
The following are some sample osimage definitions:

* The diskful cudafull installation osimage object: ::

    #lsdef -t osimage rhels7.2-ppc64le-install-cudafull
    Object name: rhels7.2-ppc64le-install-cudafull
        imagetype=linux
        osarch=ppc64le
        osdistroname=rhels7.2-ppc64le
        osname=Linux
        osvers=rhels7.2
        otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le
        pkgdir=/install/rhels7.2/ppc64le,/install/cuda-repo
        pkglist=/opt/xcat/share/xcat/install/rh/cudafull.rhels7.pkglist
        profile=compute
        provmethod=install
        template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl

* The diskful cudaruntime installation osimage object: ::

    #lsdef -t osimage rhels7.2-ppc64le-install-cudaruntime
    Object name: rhels7.2-ppc64le-install-cudaruntime
        imagetype=linux
        osarch=ppc64le
        osdistroname=rhels7.2-ppc64le
        osname=Linux
        osvers=rhels7.2
        otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le
        pkgdir=/install/rhels7.2/ppc64le,/install/cuda-repo
        pkglist=/opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.pkglist
        profile=compute
        provmethod=install
        template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl

* The diskless cudafull installation osimage object: ::

    #lsdef -t osimage rhels7.2-ppc64le-netboot-cudafull
    Object name: rhels7.2-ppc64le-netboot-cudafull
        imagetype=linux
        osarch=ppc64le
        osdistroname=rhels7.2-ppc64le
        osname=Linux
        osvers=rhels7.2
        otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le
        permission=755
        pkgdir=/install/rhels7.2/ppc64le,/install/cuda-repo
        pkglist=/opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.pkglist
        postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall
        profile=compute
        provmethod=netboot
        rootimgdir=/install/netboot/rhels7.2/ppc64le/compute

* The diskless cudaruntime installation osimage object: ::

    #lsdef -t osimage rhels7.2-ppc64le-netboot-cudaruntime
    Object name: rhels7.2-ppc64le-netboot-cudaruntime
        imagetype=linux
        osarch=ppc64le
        osdistroname=rhels7.2-ppc64le
        osname=Linux
        osvers=rhels7.2
        otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le
        permission=755
        pkgdir=/install/rhels7.2/ppc64le,/install/cuda-repo
        pkglist=/opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.pkglist
        postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall
        profile=compute
        provmethod=netboot
        rootimgdir=/install/netboot/rhels7.2/ppc64le/compute

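Rather than typing a definition like the samples above by hand, an existing stanza can be cloned and renamed. A sketch, assuming a base ``rhels7.2-ppc64le-install-compute`` osimage already exists and using the pkglist and pkgdir values from the sample definitions (the ``clone_cuda_osimage`` wrapper is a hypothetical helper, not an xCAT command):

```shell
#!/bin/sh
# Dump an existing osimage as a stanza, rename it, re-create it under the
# new name, then point the copy at the CUDA pkglist and repositories.
clone_cuda_osimage() {
    base="$1"    # e.g. rhels7.2-ppc64le-install-compute
    clone="$2"   # e.g. rhels7.2-ppc64le-install-cudafull
    lsdef -t osimage "$base" -z | sed "s/$base/$clone/" | mkdef -z
    chdef -t osimage "$clone" \
        pkglist=/opt/xcat/share/xcat/install/rh/cudafull.rhels7.pkglist \
        pkgdir=/install/rhels7.2/ppc64le,/install/cuda-repo
}

# Example invocation (requires a working xCAT Management Node):
# clone_cuda_osimage rhels7.2-ppc64le-install-compute rhels7.2-ppc64le-install-cudafull
```

The ``-z`` stanza format round-trips cleanly through ``lsdef``/``mkdef``, which is what makes the ``sed`` rename safe here.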
Deployment of CUDA node
-----------------------

* To provision diskful nodes: ::

    nodeset <node> osimage=rhels7.2-ppc64le-install-cudafull
    rsetboot <node> net
    rpower <node> boot

* To provision diskless nodes: ::

    genimage rhels7.2-ppc64le-netboot-cudafull
    packimage rhels7.2-ppc64le-netboot-cudafull
    nodeset <node> osimage=rhels7.2-ppc64le-netboot-cudafull
    rsetboot <node> net
    rpower <node> boot

Verification of CUDA Installation
---------------------------------

**NOTE:** A ``cudaruntime`` installation only provides the basic libraries that can be used by other applications that work with the GPU. The following verification does not apply to ``cudaruntime`` installations.

After the compute node has booted, environment variables must be set in order to use the CUDA toolkit. The ``PATH`` variable needs to include ``/usr/local/cuda-7.5/bin``, and the ``LD_LIBRARY_PATH`` variable needs to contain ``/usr/local/cuda-7.5/lib64`` on a 64-bit system, or ``/usr/local/cuda-7.5/lib`` on a 32-bit system.

* To change the environment variables on 64-bit operating systems: ::

    export PATH=/usr/local/cuda-7.5/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH

* To change the environment variables on 32-bit operating systems: ::

    export PATH=/usr/local/cuda-7.5/bin:$PATH
    export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib:$LD_LIBRARY_PATH

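The exports above only affect the current shell. To make the settings persistent for every login shell, they can be dropped into a profile snippet; a sketch using the common ``/etc/profile.d`` convention (this file is an assumption of this document, not something xCAT creates for you):

```shell
#!/bin/sh
# Install a login-shell profile snippet so every user picks up the CUDA
# paths automatically; /etc/profile.d is sourced by login shells on RHEL.
mkdir -p /etc/profile.d
cat > /etc/profile.d/cuda.sh <<'EOF'
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
EOF
```

For diskless images, a file like this can be placed into the image (for example via the ``postinstall`` script or a synclist) so it is present on every booted node.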
After the environment variables are set correctly, users can verify the CUDA installation as follows.

* Verify the driver version: ::

    #cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX ppc64le Kernel Module  352.39  Fri Aug 14 17:10:41 PDT 2015
    GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)

* Verify the version of the CUDA Toolkit: ::

    #nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2015 NVIDIA Corporation
    Built on Tue_Aug_11_14:31:50_CDT_2015
    Cuda compilation tools, release 7.5, V7.5.17

* Compile the examples, then run ``deviceQuery``, ``bandwidthTest``, or other commands under the bin directory to ensure the system and the CUDA-capable device are able to communicate correctly: ::

    # mkdir -p /tmp/cuda
    # cuda-install-samples-7.5.sh /tmp/cuda
    # cd /tmp/cuda/NVIDIA_CUDA-7.5_Samples
    # make
    # cd bin/ppc64le/linux/release
    # ./deviceQuery
    ./deviceQuery Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 4 CUDA Capable device(s)
    Device 0: "Tesla K80"
      CUDA Driver Version / Runtime Version          7.5 / 7.5
      CUDA Capability Major/Minor version number:    3.7
      Total amount of global memory:                 11520 MBytes (12079136768 bytes)
      (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
      GPU Max Clock rate:                            824 MHz (0.82 GHz)
      Memory Clock rate:                             2505 Mhz
      Memory Bus Width:                              384-bit
      L2 Cache Size:                                 1572864 bytes
      ............
    deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 4, Device0 = Tesla K80, Device1 = Tesla K80, Device2 = Tesla K80, Device3 = Tesla K80
    Result = PASS

    # ./bandwidthTest
    [CUDA Bandwidth Test] - Starting...
    Running on...
    Device 0: Tesla K80
    Quick Mode
    Host to Device Bandwidth, 1 Device(s)
    PINNED Memory Transfers
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                7765.1
    Device to Host Bandwidth, 1 Device(s)
    PINNED Memory Transfers
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                7759.6
    Device to Device Bandwidth, 1 Device(s)
    PINNED Memory Transfers
      Transfer Size (Bytes)   Bandwidth(MB/s)
      33554432                141485.3
    Result = PASS

    NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

* The ``nvidia-smi`` tool provided by the NVIDIA driver can be used for GPU management and monitoring: ::

    #nvidia-smi -q
    ==============NVSMI LOG==============
    Timestamp                           : Mon Oct  5 13:43:39 2015
    Driver Version                      : 352.39
    Attached GPUs                       : 4
    GPU 0000:03:00.0
        Product Name                    : Tesla K80
        Product Brand                   : Tesla
    ...........................
@ -1,337 +0,0 @@
CUDA Installation Based on Ubuntu
=================================

Overview
--------

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It can be used by graphics processing units (GPUs) for general-purpose processing.

xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.2 LE on PowerNV (Non-Virtualized) for both diskless and diskful nodes. The CUDA packages provided by NVIDIA include both the runtime libraries for computing and development tools for programming and monitoring. The full package set is very large, so in xCAT it is suggested that the packages be split into two package sets:

#. **cudaruntime** package set
#. **cudafull** package set

It is suggested to install only the **cudaruntime** package set on the Compute Nodes (CNs), and the **cudafull** package set on the Management Node or the monitor/development nodes.

This documentation covers CUDA installation based on Ubuntu 14.04.3 running on IBM Power Systems S822LC nodes.

CUDA Repository
---------------

Currently, there are two types of Ubuntu repos for installing cuda-7-0 on p8LE hardware: `the online repo <http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/ppc64el/cuda-repo-ubuntu1404_7.0-28_ppc64el.deb>`_ and `the local package repo <http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/rpmdeb/cuda-repo-ubuntu1404-7-0-local_7.0-28_ppc64el.deb>`_.

**The online repo**

The online repo provides a sources.list entry which includes the URL with the location of the cuda packages. The online repo can be used directly by Compute Nodes. The sources.list entry will be similar to: ::

    deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /

**The local package repo**

A local package repo contains all of the cuda packages. The admin can either simply install the local repo (the whole /var/cuda-repo-7-0-local/ needs to be copied to /install/cuda-repo/), or extract the cuda packages into the local repo with the following command: ::

    dpkg -x cuda-repo-ubuntu14xx-7-0-local_7.0-28_ppc64el.deb /install/cuda-repo/

The following repos will be used in the test environment:

* ``/install/ubuntu14.04.2/ppc64el``: the OS image package directory
* ``http://ports.ubuntu.com/ubuntu-ports``: the internet mirror; if a local mirror is available, it can be used instead
* ``http://10.3.5.10/install/cuda-repo/var/cuda-repo-7-0-local /``: the repo for cuda; you can replace it with the online cuda repo

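After extracting the .deb, it is worth confirming that the directory apt will be pointed at actually contains packages and an index. A small sketch, assuming the flat-repo layout produced by the commands above (the ``deb_repo_ok`` helper is a hypothetical convenience function):

```shell
#!/bin/sh
# Print whether a directory looks like a usable flat apt repository:
# it needs .deb files plus a Packages (or Packages.gz) index apt can read.
deb_repo_ok() {
    if ! ls "$1"/*.deb >/dev/null 2>&1; then
        echo "$1: no .deb packages"
    elif [ ! -e "$1/Packages" ] && [ ! -e "$1/Packages.gz" ]; then
        echo "$1: missing Packages index"
    else
        echo "$1: ok"
    fi
}

deb_repo_ok /install/cuda-repo/var/cuda-repo-7-0-local
```

The local repo shipped by NVIDIA includes its own Packages index; if the check reports it missing, the extraction step above likely did not complete.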
CUDA osimage
------------

Users can generate a new CUDA osimage object based on another osimage definition, or just modify an existing osimage. xCAT provides some sample CUDA pkglist files:

* diskful provisioning in ``/opt/xcat/share/xcat/install/ubuntu/`` for ``cudafull`` and ``cudaruntime``: ::

    #cat /opt/xcat/share/xcat/install/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist
    #INCLUDE:compute.ubuntu14.04.3.ppc64el.pkglist#
    linux-headers-generic-lts-utopic
    build-essential
    dkms
    zlib1g-dev
    cuda

    #cat /opt/xcat/share/xcat/install/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist
    #INCLUDE:compute.ubuntu14.04.3.ppc64el.pkglist#
    linux-headers-generic-lts-utopic
    build-essential
    dkms
    zlib1g-dev
    cuda-runtime-7-0

* diskless provisioning in ``/opt/xcat/share/xcat/netboot/ubuntu`` for ``cudafull`` and ``cudaruntime``: ::

    #cat /opt/xcat/share/xcat/netboot/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist
    #INCLUDE:compute.ubuntu14.04.3.ppc64el.pkglist#
    linux-headers-generic-lts-utopic
    build-essential
    zlib1g-dev
    dkms

    #cat /opt/xcat/share/xcat/netboot/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist
    #INCLUDE:compute.ubuntu14.04.3.ppc64el.pkglist#
    linux-headers-generic-lts-utopic
    build-essential
    zlib1g-dev
    dkms

The following are some sample osimage definitions:

* The diskful cudafull installation osimage object: ::

    #lsdef -t osimage ubuntu14.04.3-ppc64el-install-cudafull
    Object name: ubuntu14.04.3-ppc64el-install-cudafull
        imagetype=linux
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.04.3
        otherpkgdir=/install/post/otherpkgs/ubuntu14.04.3/ppc64el
        pkgdir=http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main,http://10.3.5.10/install/cuda-repo/var/cuda-repo-7-0-local /,/install/ubuntu14.04.3/ppc64el
        pkglist=/opt/xcat/share/xcat/install/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist
        profile=cudafull
        provmethod=install
        template=/opt/xcat/share/xcat/install/ubuntu/cudafull.tmpl

* The diskful cudaruntime installation osimage object: ::

    #lsdef -t osimage ubuntu14.04.3-ppc64el-install-cudaruntime
    Object name: ubuntu14.04.3-ppc64el-install-cudaruntime
        imagetype=linux
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.04.3
        otherpkgdir=/install/post/otherpkgs/ubuntu14.04.3/ppc64el
        pkgdir=http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main,http://10.3.5.10/install/cuda-repo/var/cuda-repo-7-0-local /,/install/ubuntu14.04.3/ppc64el
        pkglist=/opt/xcat/share/xcat/install/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist
        profile=cudaruntime
        provmethod=install
        template=/opt/xcat/share/xcat/install/ubuntu/cudaruntime.tmpl

* The diskless cudafull installation osimage object: ::

    #Object name: ubuntu14.04.3-ppc64el-netboot-cudafull
        imagetype=linux
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.04.3
        otherpkgdir=http://10.3.5.10/install/cuda-repo/var/cuda-repo-7-0-local /
        otherpkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudafull.otherpkgs.pkglist
        permission=755
        pkgdir=http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main,/install/ubuntu14.04.3/ppc64el
        pkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist
        profile=cudafull
        provmethod=netboot
        rootimgdir=/install/netboot/ubuntu14.04.3/ppc64el/cudafull

* The diskless cudaruntime installation osimage object: ::

    #Object name: ubuntu14.04.3-ppc64el-netboot-cudaruntime
        imagetype=linux
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.04.3
        otherpkgdir=http://10.3.5.10/install/cuda-repo/var/cuda-repo-7-0-local /
        otherpkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudaruntime.otherpkgs.pkglist
        permission=755
        pkgdir=http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main,/install/ubuntu14.04.3/ppc64el
        pkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist
        profile=cudaruntime
        provmethod=netboot
        rootimgdir=/install/netboot/ubuntu14.04.3/ppc64el/cudaruntime

**Use the addcudakey postscript to install the GPG key for cuda packages**

In order to access the cuda repo and authorize it, you will need to import the cuda GPG key into the apt key trust list. The following command can be used to add a postscript for a node that will install cuda: ::

    chdef <node> -p postscripts=addcudakey

**Install NVML (optional, for nodes which need to compile cuda related applications)**

The NVIDIA Management Library (NVML) is a C-based programmatic interface for monitoring and managing various states within NVIDIA Tesla GPUs. It is intended to be a platform for building 3rd-party applications.

NVML can be downloaded from http://developer.download.nvidia.com/compute/cuda/7_0/Prod/local_installers/cuda_346.46_gdk_linux.run.

After downloading NVML and putting it under /install/postscripts on the MN, the following steps can be used to have NVML installed after the node is installed and rebooted, if needed: ::

    chmod +x /install/postscripts/cuda_346.46_gdk_linux.run
    chdef <node> -p postbootscripts="cuda_346.46_gdk_linux.run --silent --installdir=<your_desired_dir>"

Deployment of CUDA node
-----------------------

* To provision diskful nodes: ::

    nodeset <node> osimage=<diskful_osimage_object_name>
    rsetboot <node> net
    rpower <node> boot

* To provision diskless nodes:

  To generate a stateless image for a diskless installation, the acpid package needs to be installed on the MN or on the host where you generate the stateless image: ::

    apt-get install -y acpid

  Then, use the following commands to generate the stateless image and pack it: ::

    genimage <diskless_osimage_object_name>
    packimage <diskless_osimage_object_name>
    nodeset <node> osimage=<diskless_osimage_object_name>
    rsetboot <node> net
    rpower <node> boot

Verification of CUDA Installation
---------------------------------

The command below can be used to display GPU or Unit info on the node: ::

    nvidia-smi -q

Verify the driver version: ::

    # cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX ppc64le Kernel Module  346.46  Tue Feb 17 17:18:33 PST 2015
    GCC version:  gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04)

**GPU management and monitoring**

The ``nvidia-smi`` tool provided by the NVIDIA driver can be used for GPU management and monitoring, but it can only be run on the host where the GPU hardware, CUDA, and the NVIDIA driver are installed. The ``xdsh`` command can be used to run ``nvidia-smi`` on a GPU host remotely from the xCAT Management Node. For example: ::

    # xdsh p8le-42l "nvidia-smi -i 0 --query-gpu=name,serial,uuid --format=csv,noheader"
    p8le-42l: Tesla K40m, 0324114102927, GPU-8750df00-40e1-8a39-9fd8-9c29905fa127

Some useful nvidia-smi commands for monitoring and managing GPUs are listed below; for more information, please read the nvidia-smi manpage.

* For monitoring: ::

    *The number of NVIDIA GPUs in the system
    nvidia-smi --query-gpu=count --format=csv,noheader

    *The version of the installed NVIDIA display driver
    nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader

    *The BIOS of the GPU board
    nvidia-smi -i 0 --query-gpu=vbios_version --format=csv,noheader

    *Product name, serial number and UUID of the GPU
    nvidia-smi -i 0 --query-gpu=name,serial,uuid --format=csv,noheader

    *Fan speed
    nvidia-smi -i 0 --query-gpu=fan.speed --format=csv,noheader

    *The compute mode flag indicates whether individual or multiple compute applications may run on the GPU (also known as exclusivity modes)
    nvidia-smi -i 0 --query-gpu=compute_mode --format=csv,noheader

    *Percent of time over the past sample period during which one or more kernels was executing on the GPU
    nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader

    *Total errors detected across entire chip; sum of device_memory, register_file, l1_cache, l2_cache and texture_memory
    nvidia-smi -i 0 --query-gpu=ecc.errors.corrected.aggregate.total --format=csv,noheader

    *Core GPU temperature, in degrees C
    nvidia-smi -i 0 --query-gpu=temperature.gpu --format=csv,noheader

    *The ECC mode that the GPU is currently operating under
    nvidia-smi -i 0 --query-gpu=ecc.mode.current --format=csv,noheader

    *The power management status
    nvidia-smi -i 0 --query-gpu=power.management --format=csv,noheader

    *The last measured power draw for the entire board, in watts
    nvidia-smi -i 0 --query-gpu=power.draw --format=csv,noheader

    *The minimum and maximum value in watts that the power limit can be set to
    nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv

* For management: ::

    *Set persistence mode; when persistence mode is enabled the NVIDIA driver remains loaded even when there are no active clients (DISABLED by default)
    nvidia-smi -i 0 -pm 1

    *Toggle ECC support, a flag that indicates whether ECC support is enabled; use --query-gpu=ecc.mode.pending to check (reboot required)
    nvidia-smi -i 0 -e 0

    *Reset the ECC volatile/aggregate error counters for the target GPUs
    nvidia-smi -i 0 -p 0/1

    *Set MODE for compute applications; query with --query-gpu=compute_mode
    nvidia-smi -i 0 -c 0/1/2/3

    *Trigger reset of the GPU
    nvidia-smi -i 0 -r

    *Enable or disable Accounting Mode, so statistics can be calculated for each compute process running on the GPU; query with --query-gpu=accounting.mode
    nvidia-smi -i 0 -am 0/1

    *Specify the maximum power management limit in watts; query with --query-gpu=power.limit
    nvidia-smi -i 0 -pl 200

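When the monitoring queries above are fanned out with ``xdsh``, each output line comes back prefixed with the node name, as in the earlier ``xdsh p8le-42l`` example. A small parsing sketch that flags hot GPUs; the node names and temperatures are invented for illustration, and ``hot_nodes`` is a hypothetical helper:

```shell
#!/bin/sh
# Filter "node: <temperature>" lines, as produced by
#   xdsh <noderange> "nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader"
# and print the names of nodes whose GPU temperature exceeds a limit.
hot_nodes() {
    awk -v limit="$1" -F': *' '$2+0 > limit { print $1 }'
}

printf 'node01: 62\nnode02: 88\nnode03: 71\n' | hot_nodes 80   # -> node02
```

The same pattern works for any of the csv,noheader queries above: the colon-separated node prefix added by ``xdsh`` is the only extra field to strip.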
**Installing CUDA example applications**

The cuda-samples-7-0 package includes some CUDA examples which can help users learn how to use cuda. For a node with only the cuda runtime libraries installed, the following command can be used to install the cuda-samples package: ::

    apt-get install cuda-samples-7-0 -y

After cuda-samples-7-0 has been installed, go to /usr/local/cuda-7.0/samples to build the examples. See https://developer.nvidia.com/ for more information. Or, you can simply run the make command under the directory /usr/local/cuda-7.0/samples to build all the tools.

The following command can be used to build the deviceQuery tool in the cuda samples directory: ::

    # pwd
    /usr/local/cuda-7.0/samples
    # make -C 1_Utilities/deviceQuery
    make: Entering directory `/usr/local/cuda-7.0/samples/1_Utilities/deviceQuery'
    /usr/local/cuda-7.0/bin/nvcc -ccbin g++ -I../../common/inc -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery.o -c deviceQuery.cpp
    /usr/local/cuda-7.0/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52 -o deviceQuery deviceQuery.o
    mkdir -p ../../bin/ppc64le/linux/release
    cp deviceQuery ../../bin/ppc64le/linux/release
    make: Leaving directory `/usr/local/cuda-7.0/samples/1_Utilities/deviceQuery'

The verification results from this example on a test node were: ::

    # pwd
    /usr/local/cuda-7.0/samples
    # bin/ppc64le/linux/release/deviceQuery
    bin/ppc64le/linux/release/deviceQuery Starting...
    CUDA Device Query (Runtime API) version (CUDART static linking)
    Detected 4 CUDA Capable device(s)
    Device 0: "Tesla K80"
      CUDA Driver Version / Runtime Version          7.0 / 7.0
      CUDA Capability Major/Minor version number:    3.7
      Total amount of global memory:                 11520 MBytes (12079136768 bytes)
      (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
      GPU Max Clock rate:                            824 MHz (0.82 GHz)
      Memory Clock rate:                             2505 Mhz
      Memory Bus Width:                              384-bit
      L2 Cache Size:                                 1572864 bytes
      Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
      Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
      Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
      Total amount of constant memory:               65536 bytes
      Total amount of shared memory per block:       49152 bytes
      Total number of registers available per block: 65536
      Warp size:                                     32
      Maximum number of threads per multiprocessor:  2048
      Maximum number of threads per block:           1024
      Max dimension size of a thread block (x,y,z):  (1024, 1024, 64)
      Max dimension size of a grid size (x,y,z):     (2147483647, 65535, 65535)
      Maximum memory pitch:                          2147483647 bytes
      Texture alignment:                             512 bytes
      Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
      Run time limit on kernels:                     No
      Integrated GPU sharing Host Memory:            No
      Support host page-locked memory mapping:       Yes
      Alignment requirement for Surfaces:            Yes
      Device has ECC support:                        Enabled
      Device supports Unified Addressing (UVA):      Yes
      Device PCI Domain ID / Bus ID / location ID:   0 / 3 / 0
      Compute Mode:
        < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
    Device 1: "Tesla K80"
      CUDA Driver Version / Runtime Version          7.0 / 7.0
    ......
@ -4,5 +4,4 @@ GPUs
.. toctree::
   :maxdepth: 2

   cuda_rhel.rst
   cuda_ubuntu.rst
   nvidia/index.rst
22 docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst Normal file
@ -0,0 +1,22 @@
Deploy CUDA nodes
=================

Diskful
-------

* To provision diskful nodes using osimage ``rhels7.2-ppc64le-install-cudafull``: ::

    nodeset <noderange> osimage=rhels7.2-ppc64le-install-cudafull
    rsetboot <noderange> net
    rpower <noderange> boot

Diskless
--------

* To provision diskless nodes using osimage ``rhels7.2-ppc64le-netboot-cudafull``: ::

    nodeset <noderange> osimage=rhels7.2-ppc64le-netboot-cudafull
    rsetboot <noderange> net
    rpower <noderange> boot
19 docs/source/advanced/gpu/nvidia/index.rst Normal file
@ -0,0 +1,19 @@
NVIDIA CUDA
===========

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It can be used to increase computing performance by leveraging the Graphics Processing Units (GPUs).

For more information, see NVIDIA's website: https://developer.nvidia.com/cuda-zone

xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.2 LE on PowerNV (Non-Virtualized) for both diskful and diskless nodes.

Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``. The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs. If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package.

.. toctree::
   :maxdepth: 2

   repo/index.rst
   osimage/index.rst
   deploy_cuda_node.rst
   verify_cuda_install.rst
   management.rst
107  docs/source/advanced/gpu/nvidia/management.rst  Normal file
@ -0,0 +1,107 @@

GPU Management and Monitoring
=============================

The ``nvidia-smi`` command provided by NVIDIA can be used to manage and monitor GPU enabled Compute Nodes. In conjunction with the xCAT ``xdsh`` command, you can easily manage and monitor the entire set of GPU enabled Compute Nodes remotely from the Management Node.

Example: ::

    # xdsh <noderange> "nvidia-smi -i 0 --query-gpu=name,serial,uuid --format=csv,noheader"
    node01: Tesla K80, 0322415075970, GPU-b4f79b83-c282-4409-a0e8-0da3e06a13c3
    ...
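Output in this form is easy to post-process on the Management Node. The following is a minimal, hypothetical sketch that counts GPU models across the cluster; the ``sample`` variable stands in for real captured ``xdsh`` output:

```shell
# Hypothetical sketch: count GPU models across nodes from captured
# "xdsh <noderange> nvidia-smi --query-gpu=name,serial,uuid ..." output.
# "sample" stands in for real xdsh output.
sample='node01: Tesla K80, 0322415075970, GPU-b4f79b83
node02: Tesla K80, 0322415075971, GPU-c2824409
node03: Tesla K40, 0322415075972, GPU-a0e80da3'

# Strip the "nodeXX: " prefix, then count occurrences of each model name
printf '%s\n' "$sample" \
  | awk -F', ' '{sub(/^[^:]+: /, "", $1); count[$1]++}
                END {for (m in count) printf "%s: %d\n", m, count[m]}' \
  | sort
```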

**Note: The following commands are provided as a convenience.** *Always consult the nvidia-smi manpage for the latest supported functions.*

Management
----------

Some useful ``nvidia-smi`` example commands for management.

* Enable persistence mode (DISABLED by default). When persistence mode is enabled, the NVIDIA driver remains loaded even when there are no active clients: ::

    nvidia-smi -i 0 -pm 1

* Toggle ECC support (0 to disable, 1 to enable); query the pending setting with ``--query-gpu=ecc.mode.pending`` [Reboot required]: ::

    nvidia-smi -i 0 -e 0

* Reset the ECC volatile/aggregate error counters for the target GPUs: ::

    nvidia-smi -i 0 -p 0/1

* Set the compute mode for compute applications; query with ``--query-gpu=compute_mode``: ::

    nvidia-smi -i 0 -c 0/1/2/3

* Trigger a reset of the GPU: ::

    nvidia-smi -i 0 -r

* Enable or disable Accounting Mode, which allows statistics to be calculated for each compute process running on the GPU; query with ``--query-gpu=accounting.mode``: ::

    nvidia-smi -i 0 -am 0/1

* Set the maximum power management limit in watts; query with ``--query-gpu=power.limit``: ::

    nvidia-smi -i 0 -pl 200
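These settings apply per GPU index. To apply a site standard to every GPU in a node, a loop like the following sketch can be used. It is shown as a dry run with ``echo``; the GPU count and the 175 W limit are assumptions for illustration:

```shell
# Hedged sketch: apply the same settings to every GPU index in a node.
# GPU_COUNT would normally come from "nvidia-smi --query-gpu=count";
# it is hard-coded here, and "echo" makes this a dry run.
GPU_COUNT=4
i=0
while [ "$i" -lt "$GPU_COUNT" ]; do
    echo nvidia-smi -i "$i" -pm 1     # enable persistence mode
    echo nvidia-smi -i "$i" -pl 175   # assumed 175 W power limit
    i=$((i + 1))
done
```

Drop the ``echo`` to actually apply the settings on a GPU node.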

Monitoring
----------

Some useful ``nvidia-smi`` example commands for monitoring.

* The number of NVIDIA GPUs in the system: ::

    nvidia-smi --query-gpu=count --format=csv,noheader

* The version of the installed NVIDIA display driver: ::

    nvidia-smi -i 0 --query-gpu=driver_version --format=csv,noheader

* The VBIOS version of the GPU board: ::

    nvidia-smi -i 0 --query-gpu=vbios_version --format=csv,noheader

* Product name, serial number and UUID of the GPU: ::

    nvidia-smi -i 0 --query-gpu=name,serial,uuid --format=csv,noheader

* Fan speed: ::

    nvidia-smi -i 0 --query-gpu=fan.speed --format=csv,noheader

* The compute mode flag, indicating whether individual or multiple compute applications may run on the GPU (also known as exclusivity modes): ::

    nvidia-smi -i 0 --query-gpu=compute_mode --format=csv,noheader

* Percent of time over the past sample period during which one or more kernels was executing on the GPU: ::

    nvidia-smi -i 0 --query-gpu=utilization.gpu --format=csv,noheader

* Total ECC errors detected across the entire chip, the sum of device_memory, register_file, l1_cache, l2_cache and texture_memory: ::

    nvidia-smi -i 0 --query-gpu=ecc.errors.corrected.aggregate.total --format=csv,noheader

* Core GPU temperature, in degrees C: ::

    nvidia-smi -i 0 --query-gpu=temperature.gpu --format=csv,noheader

* The ECC mode that the GPU is currently operating under: ::

    nvidia-smi -i 0 --query-gpu=ecc.mode.current --format=csv,noheader

* The power management status: ::

    nvidia-smi -i 0 --query-gpu=power.management --format=csv,noheader

* The last measured power draw for the entire board, in watts: ::

    nvidia-smi -i 0 --query-gpu=power.draw --format=csv,noheader

* The minimum and maximum values in watts that the power limit can be set to: ::

    nvidia-smi -i 0 --query-gpu=power.min_limit,power.max_limit --format=csv
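Combined with ``xdsh``, these queries support simple cluster-wide health checks. A hedged sketch that flags nodes whose GPU temperature exceeds a threshold; the ``sample`` variable stands in for live ``xdsh <noderange> nvidia-smi -i 0 --query-gpu=temperature.gpu --format=csv,noheader`` output, and 80 C is an arbitrary threshold:

```shell
# Hedged sketch: flag nodes whose GPU temperature exceeds a threshold,
# using captured xdsh output. "sample" stands in for live data.
THRESHOLD=80
sample='node01: 62
node02: 85
node03: 71'

printf '%s\n' "$sample" \
  | awk -v t="$THRESHOLD" -F': ' '$2 + 0 > t {print $1 " is hot: " $2 "C"}'
```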
11  docs/source/advanced/gpu/nvidia/osimage/index.rst  Normal file
@ -0,0 +1,11 @@

Create osimage definitions
==========================

Generate ``osimage`` definitions to provision the compute nodes with the NVIDIA CUDA toolkit installed.

.. toctree::
   :maxdepth: 2

   rhels.rst
   ubuntu.rst
   postscripts.rst
47  docs/source/advanced/gpu/nvidia/osimage/postscripts.rst  Normal file
@ -0,0 +1,47 @@

Postscripts
===========

The following sections demonstrate how to use xCAT to configure post-installation steps.

Setting PATH and LD_LIBRARY_PATH
--------------------------------

NVIDIA recommends various post-installation actions that should be performed to properly configure the nodes. A sample script ``config_cuda`` is provided by xCAT for this purpose and can be modified to fit your specific installation.

The ``config_cuda`` script sets the PATH and LD_LIBRARY_PATH: ::

    #!/bin/sh

    # set the paths required for cuda7.5
    CUDA_VER="cuda-7.5"
    FILENAME="/etc/profile.d/xcat-${CUDA_VER}.sh"

    echo "export PATH=/usr/local/${CUDA_VER}/bin:\$PATH" > ${FILENAME}
    echo "export LD_LIBRARY_PATH=/usr/local/${CUDA_VER}/lib64:\$LD_LIBRARY_PATH" >> ${FILENAME}

Add this script to your node object using the ``chdef`` command: ::

    chdef -t node -o <noderange> -p postscripts=/install/postscripts/config_cuda


Setting GPU Configurations
--------------------------

NVIDIA allows changing GPU attributes using the ``nvidia-smi`` command. These settings do not persist when a compute node is rebooted. One way to set these attributes is to use an xCAT postscript to set the values every time the node is rebooted.

* Set the power limit to 175W: ::

    # set the power limit to 175W
    nvidia-smi -pl 175

* Set the GPUs to persistence mode to increase performance: ::

    # nvidia-smi -pm 1
    Enabled persistence mode for GPU 0000:03:00.0.
    Enabled persistence mode for GPU 0000:04:00.0.
    Enabled persistence mode for GPU 0002:03:00.0.
    Enabled persistence mode for GPU 0002:04:00.0.
    All done.
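The two settings above can be combined into a single postscript. The following is a hypothetical sketch (the name ``config_gpu`` and the 175 W limit are assumptions, not an xCAT-shipped script); the guard lets it exit quietly on nodes without the NVIDIA tools:

```shell
#!/bin/sh
# Hypothetical xCAT postscript (e.g. /install/postscripts/config_gpu)
# that re-applies non-persistent GPU settings on every boot.
# Shown as an illustration only; not shipped with xCAT.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -pm 1     # enable persistence mode
    nvidia-smi -pl 175   # assumed 175 W power limit
else
    echo "config_gpu: nvidia-smi not found, skipping"
fi
```

It would be registered the same way as ``config_cuda``, via the node's ``postscripts`` attribute.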

115  docs/source/advanced/gpu/nvidia/osimage/rhels.rst  Normal file
@ -0,0 +1,115 @@

RHEL 7.2 LE
===========

Diskful images
--------------

The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.2-ppc64le-install-compute`` osimage.

xCAT provides sample package list files for CUDA. You can find them at:

* ``/opt/xcat/share/xcat/install/rh/cudafull.rhels7.ppc64le.pkglist``
* ``/opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.ppc64le.pkglist``

**[diskful note]**: The machine must be rebooted after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is installed via the ``pkglist`` attribute of the osimage definition, so that the reboot happens after the Operating System is installed.

cudafull
^^^^^^^^

#. Create a copy of the ``install-compute`` image and label it ``cudafull``: ::

    lsdef -t osimage -z rhels7.2-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudafull:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: ::

    chdef -t osimage -o rhels7.2-ppc64le-install-cudafull -p pkgdir=/install/cuda-repo

#. Use the provided ``cudafull`` pkglist to install the CUDA packages: ::

    chdef -t osimage -o rhels7.2-ppc64le-install-cudafull \
      pkglist=/opt/xcat/share/xcat/install/rh/cudafull.rhels7.ppc64le.pkglist

cudaruntime
^^^^^^^^^^^

#. Create a copy of the ``install-compute`` image and label it ``cudaruntime``: ::

    lsdef -t osimage -z rhels7.2-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudaruntime:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: ::

    chdef -t osimage -o rhels7.2-ppc64le-install-cudaruntime -p pkgdir=/install/cuda-repo

#. Use the provided ``cudaruntime`` pkglist to install the CUDA packages: ::

    chdef -t osimage -o rhels7.2-ppc64le-install-cudaruntime \
      pkglist=/opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.ppc64le.pkglist

Diskless images
---------------

The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.2-ppc64le-netboot-compute`` osimage.

xCAT provides sample package list files for CUDA. You can find them at:

* ``/opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist``
* ``/opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist``

**[diskless note]**: For diskless images, the reboot requirement does not apply because the image is loaded on each reboot. The CUDA packages must be installed via the ``otherpkglist`` attribute, **NOT** the ``pkglist`` attribute.

cudafull
^^^^^^^^

#. Create a copy of the ``netboot-compute`` image and label it ``cudafull``: ::

    lsdef -t osimage -z rhels7.2-ppc64le-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudafull:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``otherpkgdir`` attribute: ::

    chdef -t osimage -o rhels7.2-ppc64le-netboot-cudafull otherpkgdir=/install/cuda-repo

#. Add the provided ``cudafull`` otherpkgs pkglist file to install the CUDA packages: ::

    chdef -t osimage -o rhels7.2-ppc64le-netboot-cudafull \
      otherpkglist=/opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist

#. Generate the image: ::

    genimage rhels7.2-ppc64le-netboot-cudafull

#. Package the image: ::

    packimage rhels7.2-ppc64le-netboot-cudafull

cudaruntime
^^^^^^^^^^^

#. Create a copy of the ``netboot-compute`` image and label it ``cudaruntime``: ::

    lsdef -t osimage -z rhels7.2-ppc64le-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudaruntime:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``otherpkgdir`` attribute: ::

    chdef -t osimage -o rhels7.2-ppc64le-netboot-cudaruntime otherpkgdir=/install/cuda-repo

#. Add the provided ``cudaruntime`` otherpkgs pkglist file to install the CUDA packages: ::

    chdef -t osimage -o rhels7.2-ppc64le-netboot-cudaruntime \
      otherpkglist=/opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist

#. Generate the image: ::

    genimage rhels7.2-ppc64le-netboot-cudaruntime

#. Package the image: ::

    packimage rhels7.2-ppc64le-netboot-cudaruntime
158  docs/source/advanced/gpu/nvidia/osimage/ubuntu.rst  Normal file
@ -0,0 +1,158 @@

Ubuntu 14.04.3
==============

Diskful images
--------------

The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``ubuntu14.04.3-ppc64el-install-compute`` osimage.

xCAT provides sample package list files for CUDA. You can find them at:

* ``/opt/xcat/share/xcat/install/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist``
* ``/opt/xcat/share/xcat/install/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist``

**[diskful note]**: The machine must be rebooted after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is installed via the ``pkglist`` attribute of the osimage definition, so that the reboot happens after the Operating System is installed.

cudafull
^^^^^^^^

#. Create a copy of the ``install-compute`` image and label it ``cudafull``: ::

    lsdef -t osimage -z ubuntu14.04.3-ppc64el-install-compute \
      | sed 's/install-compute:/install-cudafull:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute.

   If your Management Node IP is 10.0.0.1, the URL for the repo would be ``http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local``; add it to the ``pkgdir``: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-install-cudafull \
      -p pkgdir=http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local

   **TODO:** Need to add Ubuntu Port? "http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main"

#. Use the provided ``cudafull`` pkglist to install the CUDA packages: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-install-cudafull \
      pkglist=/opt/xcat/share/xcat/install/ubuntu/cudafull.ubuntu14.04.3.ppc64el.pkglist

cudaruntime
^^^^^^^^^^^

#. Create a copy of the ``install-compute`` image and label it ``cudaruntime``: ::

    lsdef -t osimage -z ubuntu14.04.3-ppc64el-install-compute \
      | sed 's/install-compute:/install-cudaruntime:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute.

   If your Management Node IP is 10.0.0.1, the URL for the repo would be ``http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local``; add it to the ``pkgdir``: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-install-cudaruntime \
      -p pkgdir=http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local

   **TODO:** Need to add Ubuntu Port? "http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main"

#. Use the provided ``cudaruntime`` pkglist to install the CUDA packages: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-install-cudaruntime \
      pkglist=/opt/xcat/share/xcat/install/ubuntu/cudaruntime.ubuntu14.04.3.ppc64el.pkglist

Diskless images
---------------

The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``ubuntu14.04.3-ppc64el-netboot-compute`` osimage.

xCAT provides sample package list files for CUDA. You can find them at:

* ``/opt/xcat/share/xcat/netboot/ubuntu/cudafull.otherpkgs.pkglist``
* ``/opt/xcat/share/xcat/netboot/ubuntu/cudaruntime.otherpkgs.pkglist``

**[diskless note]**: For diskless images, the reboot requirement does not apply because the image is loaded on each reboot. The CUDA packages must be installed via the ``otherpkglist`` attribute, **NOT** the ``pkglist`` attribute.

cudafull
^^^^^^^^

#. Create a copy of the ``netboot-compute`` image and label it ``cudafull``: ::

    lsdef -t osimage -z ubuntu14.04.3-ppc64el-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudafull:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``otherpkgdir`` attribute.

   If your Management Node IP is 10.0.0.1, the URL for the repo would be ``http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local``; add it to the ``otherpkgdir``: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-netboot-cudafull \
      otherpkgdir=http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local

#. Add the provided ``cudafull`` otherpkgs pkglist file to install the CUDA packages: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-netboot-cudafull \
      otherpkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudafull.otherpkgs.pkglist

   **TODO:** Need to add Ubuntu Port? "http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main"

#. Verify that ``acpid`` is installed on the Management Node or on the Ubuntu host where you are generating the diskless image: ::

    apt-get install -y acpid

#. Generate the image: ::

    genimage ubuntu14.04.3-ppc64el-netboot-cudafull

#. Package the image: ::

    packimage ubuntu14.04.3-ppc64el-netboot-cudafull

cudaruntime
^^^^^^^^^^^

#. Create a copy of the ``netboot-compute`` image and label it ``cudaruntime``: ::

    lsdef -t osimage -z ubuntu14.04.3-ppc64el-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudaruntime:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``otherpkgdir`` attribute.

   If your Management Node IP is 10.0.0.1, the URL for the repo would be ``http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local``; add it to the ``otherpkgdir``: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-netboot-cudaruntime \
      otherpkgdir=http://10.0.0.1/install/cuda-repo/ppc64le/var/cuda-repo-7-5-local

#. Add the provided ``cudaruntime`` otherpkgs pkglist file to install the CUDA packages: ::

    chdef -t osimage -o ubuntu14.04.3-ppc64el-netboot-cudaruntime \
      otherpkglist=/opt/xcat/share/xcat/netboot/ubuntu/cudaruntime.otherpkgs.pkglist

   **TODO:** Need to add Ubuntu Port? "http://ports.ubuntu.com/ubuntu-ports trusty main,http://ports.ubuntu.com/ubuntu-ports trusty-updates main"

#. Verify that ``acpid`` is installed on the Management Node or on the Ubuntu host where you are generating the diskless image: ::

    apt-get install -y acpid

#. Generate the image: ::

    genimage ubuntu14.04.3-ppc64el-netboot-cudaruntime

#. Package the image: ::

    packimage ubuntu14.04.3-ppc64el-netboot-cudaruntime


Install NVIDIA Management Library (optional)
--------------------------------------------

See https://developer.nvidia.com/nvidia-management-library-nvml for more information.

The ``.run`` file can be downloaded from NVIDIA's website and placed into the ``/install/postscripts`` directory on the Management Node.

To enable installation of the management library after the node is installed, add the run file to the ``postbootscripts`` attribute for the nodes: ::

    chmod +x /install/postscripts/<gpu_deployment_kit>.run
    chdef -t node -o <noderange> -p postbootscripts=<gpu_deployment_kit>.run \
      --silent --installdir=<your_desired_install_dir>
12  docs/source/advanced/gpu/nvidia/repo/index.rst  Normal file
@ -0,0 +1,12 @@

Create CUDA software repository
===============================

The NVIDIA CUDA Toolkit is available to download at http://developer.nvidia.com/cuda-downloads.

Download the toolkit and prepare the software repository on the xCAT Management Node to serve the NVIDIA CUDA files.

.. toctree::
   :maxdepth: 2

   rhels.rst
   ubuntu.rst
31  docs/source/advanced/gpu/nvidia/repo/rhels.rst  Normal file
@ -0,0 +1,31 @@

RHEL 7.2 LE
===========

#. Create a repository on the Management Node after downloading the CUDA Toolkit: ::

    # For cuda toolkit name: /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm
    # extract the contents from the rpm
    mkdir -p /tmp/cuda
    cd /tmp/cuda
    rpm2cpio /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm | cpio -i -d

    # Create the repo directory under the xCAT /install dir
    mkdir -p /install/cuda-repo/ppc64le
    cp -r /tmp/cuda/var/cuda-repo-7-5-local /install/cuda-repo/ppc64le/

    # Create the yum repo files
    createrepo /install/cuda-repo/ppc64le

#. The NVIDIA CUDA Toolkit contains rpms that have dependencies on other external packages (such as ``DKMS``). These are provided by EPEL. It is up to the system administrator to obtain the dependency packages and add them to the ``cuda-deps`` directory: ::

    mkdir -p /install/cuda-repo/cuda-deps
    cd /install/cuda-repo/cuda-deps

    # Copy the DKMS rpm to this directory
    ls -ltr /install/cuda-repo/cuda-deps
    -rw-r--r-- 1 root root 79048 Oct  5 10:58 dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch.rpm

    # Execute createrepo in this directory
    createrepo .
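Before pointing osimage definitions at these directories, a quick sanity check (a hypothetical convenience, not an xCAT step) is to confirm ``createrepo`` produced metadata in each one:

```shell
# Hedged sketch: verify each repo directory has yum metadata
# (repodata/repomd.xml) after running createrepo. Paths follow the
# example above.
for d in /install/cuda-repo/ppc64le /install/cuda-repo/cuda-deps; do
    if [ -f "$d/repodata/repomd.xml" ]; then
        echo "OK: $d"
    else
        echo "MISSING: $d/repodata/repomd.xml"
    fi
done
```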

36  docs/source/advanced/gpu/nvidia/repo/ubuntu.rst  Normal file
@ -0,0 +1,36 @@

Ubuntu 14.04.3
==============

NVIDIA supports two types of debian repositories that can be used to install the CUDA Toolkit: **local** and **network**. You can download the installers from https://developer.nvidia.com/cuda-downloads.

local
-----

A local package repo contains all of the CUDA packages.

Extract the CUDA packages into ``/install/cuda-repo/ppc64le``: ::

    # For cuda toolkit name: /root/cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb

    # Create the repo directory under the xCAT /install dir
    mkdir -p /install/cuda-repo/ppc64le
    dpkg -x /root/cuda-repo-ubuntu1404-7-5-local_7.5-18_ppc64el.deb /install/cuda-repo/ppc64le


network
-------

The network package repo provides a ``sources.list`` entry pointing to a URL containing the CUDA packages. This can be used directly on the Compute Nodes.

The ``sources.list`` entry may look similar to: ::

    deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /


Authorize the CUDA repo
-----------------------

In order to access the CUDA repository, you must import the CUDA GPG key into the ``apt`` trusted key list. xCAT provides a sample postscript ``/install/postscripts/addcudakey`` to help with this task: ::

    chdef -t node -o <noderange> -p postscripts=addcudakey
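For reference, the general shape of such a key-import step is sketched below. The ``*.pub`` location is an assumption that varies by CUDA release; consult the ``addcudakey`` sample for xCAT's actual implementation:

```shell
# Hedged sketch of a key-import step: add the GPG key shipped inside the
# local CUDA repo to apt's trusted keys. The key file location is an
# assumption and varies by CUDA release.
KEYFILE=$(ls /var/cuda-repo-7-5-local/*.pub 2>/dev/null | head -n 1)
if [ -n "$KEYFILE" ]; then
    apt-key add "$KEYFILE"
else
    echo "no CUDA repo key found"
fi
```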

80  docs/source/advanced/gpu/nvidia/verify_cuda_install.rst  Normal file
@ -0,0 +1,80 @@

Verify CUDA Installation
========================

**The following verification steps only apply to the cudafull installations.**

#. Verify the driver version by looking at ``/proc/driver/nvidia/version``: ::

    # cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX ppc64le Kernel Module  352.39  Fri Aug 14 17:10:41 PDT 2015
    GCC version:  gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC)

#. Verify the CUDA Toolkit version: ::

    # nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2015 NVIDIA Corporation
    Built on Tue_Aug_11_14:31:50_CDT_2015
    Cuda compilation tools, release 7.5, V7.5.17

#. Verify running CUDA GPU jobs by compiling the samples and executing the ``deviceQuery`` or ``bandwidthTest`` programs.

   * Compile the samples:

     **[RHEL]:** ::

        cd ~/
        cuda-install-samples-7.5.sh .
        cd NVIDIA_CUDA-7.5_Samples
        make

     **[Ubuntu]:** ::

        cd ~/
        apt-get install cuda-samples-7-0 -y
        cd /usr/local/cuda-7.0/samples
        make

   * Run the ``deviceQuery`` sample: ::

        # ./bin/ppc64le/linux/release/deviceQuery
        ./deviceQuery Starting...
        CUDA Device Query (Runtime API) version (CUDART static linking)
        Detected 4 CUDA Capable device(s)
        Device 0: "Tesla K80"
          CUDA Driver Version / Runtime Version          7.5 / 7.5
          CUDA Capability Major/Minor version number:    3.7
          Total amount of global memory:                 11520 MBytes (12079136768 bytes)
          (13) Multiprocessors, (192) CUDA Cores/MP:     2496 CUDA Cores
          GPU Max Clock rate:                            824 MHz (0.82 GHz)
          Memory Clock rate:                             2505 Mhz
          Memory Bus Width:                              384-bit
          L2 Cache Size:                                 1572864 bytes
          ............
        deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 4, Device0 = Tesla K80, Device1 = Tesla K80, Device2 = Tesla K80, Device3 = Tesla K80
        Result = PASS

   * Run the ``bandwidthTest`` sample: ::

        # ./bin/ppc64le/linux/release/bandwidthTest
        [CUDA Bandwidth Test] - Starting...
        Running on...
        Device 0: Tesla K80
        Quick Mode
        Host to Device Bandwidth, 1 Device(s)
        PINNED Memory Transfers
          Transfer Size (Bytes)    Bandwidth(MB/s)
          33554432                 7765.1
        Device to Host Bandwidth, 1 Device(s)
        PINNED Memory Transfers
          Transfer Size (Bytes)    Bandwidth(MB/s)
          33554432                 7759.6
        Device to Device Bandwidth, 1 Device(s)
        PINNED Memory Transfers
          Transfer Size (Bytes)    Bandwidth(MB/s)
          33554432                 141485.3
        Result = PASS

   NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
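To script the toolkit-version check across many nodes (for example via ``xdsh``), the release number can be extracted from the ``nvcc -V`` output. A minimal sketch, with sample output standing in for a live run:

```shell
# Hedged sketch: pull the CUDA release out of "nvcc -V" output so the
# verification can be automated. "nvcc_out" stands in for a real nvcc run.
nvcc_out='nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:31:50_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17'

# Capture the number following "release " on the last line
release=$(printf '%s\n' "$nvcc_out" | sed -n 's/.*release \([0-9.]*\),.*/\1/p')
echo "CUDA release: $release"
```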