mirror of
https://github.com/xcat2/xcat-core.git
synced 2025-07-08 13:55:37 +00:00
modified doc from victor's comments
@ -7,96 +7,104 @@ Overview
|
||||
|
||||
CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model created by NVIDIA. It can be used by the graphics processing units (GPUs) for general purpose processing.
|
||||
|
||||
xCAT supports CUDA installation for Ubuntu and rhel7.2 on PowerNV (p8le node with Nvidia GPU Support) for both diskless and diskfull nodes. The cuda packages provided by nvidia include both the runtime libraries for computing and dev tools for programming and monitoring. In xCAT, we split the packages into 2 groups: the cudaruntime package set and the cudafull package set. Since the full package set is so large. xCAT suggests only installing the runtime libraries on the Compute Nodes (CNs), and the full cuda package set on the Management Node or the monitor/development nodes.
|
||||
xCAT supports CUDA installation for Ubuntu and rhel7.2 on PowerNV (p8le node with NVIDIA GPU Support) for both diskless and diskful nodes. The CUDA packages provided by NVIDIA include both the runtime libraries for computing and development tools for programming and monitoring. The full package set is very large, so in xCAT, it's suggested that the packages be split into two package sets:
|
||||
|
||||
In this documentation, xCAT will provide only installation for CUDA on rhel7.2 power 8 firestone nodes. User can find Ubuntu cuda installation document here: http://sourceforge.net/p/xcat/wiki/xCAT_P8LE_cuda_installing/
|
||||
#. **cudaruntime** package set
|
||||
#. **cudafull** package set
|
||||
|
||||
It's suggested to install only the **cudaruntime** package set on the Compute Nodes (CNs), and the **cudafull** package set on the Management Node or the monitor/development nodes.
|
||||
|
||||
Install xCAT MN
|
||||
---------------
|
||||
This document describes CUDA installation on rhel7.2 Power 8 firestone nodes. Users can find the Ubuntu CUDA installation document here: http://sourceforge.net/p/xcat/wiki/xCAT_P8LE_cuda_installing/
|
||||
|
||||
Follow the instructions in ``XCAT_P8LE_Hardware_Management`` to install xCAT Management Node and do hardware discovery for p8le nodes.
|
||||
|
||||
CUDA Repository
|
||||
---------------
|
||||
|
||||
The Nividia CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads. User can download the CUDA Toolkit based on the target platform. Currently, this download site didn't provide the CUDA Toolkit for rhel7.2 ppc64le. The Users need to pull the package from Nvidia partner site with user id and passwords.
|
||||
The NVIDIA CUDA Toolkit is available at http://developer.nvidia.com/cuda-downloads. Users can download the CUDA Toolkit based on the target platform.
|
||||
|
||||
Prepare a local repo directory which contains all the CUDA packages and repository meta-data. ::
|
||||
Prepare a local repository directory which contains all the CUDA packages and repository meta-data. Users can either:
|
||||
|
||||
* Create a repository on the Management node installing the CUDA toolkit: ::
|
||||
|
||||
#mkdir -p /install/cuda-repo/cuda-7-5
|
||||
#rpm -ivh cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm
|
||||
#cp /var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
|
||||
#cd /install/cuda-repo/cuda-7-5
|
||||
#createrepo .
|
||||
|
||||
|
||||
or if don't want to install cuda on the MN node, do ::
|
||||
mkdir -p /install/cuda-repo/cuda-7-5
|
||||
rpm -ivh cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm
|
||||
cp /var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
|
||||
cd /install/cuda-repo/cuda-7-5
|
||||
createrepo .
|
||||
|
||||
* Create a repository on the Management node without installing the CUDA toolkit: ::
|
||||
|
||||
#mkdir -p /tmp/cuda
|
||||
#cd /tmp/cuda
|
||||
#rpm2cpio /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm | cpio -i -d
|
||||
#cp /tmp/cuda/var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
|
||||
#cd /install/cuda-repo/cuda-7-5
|
||||
#createrepo .
|
||||
mkdir -p /tmp/cuda /install/cuda-repo/cuda-7-5
|
||||
cd /tmp/cuda
|
||||
rpm2cpio /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm | cpio -i -d
|
||||
cp /tmp/cuda/var/cuda-repo-7-5-local/* /install/cuda-repo/cuda-7-5
|
||||
cd /install/cuda-repo/cuda-7-5
|
||||
createrepo .
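The two variants above differ only in how the RPMs are unpacked; the copy-and-index step is the same in both. A minimal sketch of that shared step follows. The helper name ``stage_cuda_rpms`` is hypothetical (it is not part of xCAT), and the sketch assumes ``createrepo`` is on the PATH of the Management Node; when it is not, the helper only prints a reminder. Unlike the ``cp`` commands above, it copies ``*.rpm`` files only.

```shell
#!/bin/sh
# Hypothetical helper (not part of xCAT): copy every RPM from an unpacked
# CUDA repo directory into the directory served by xCAT, then index it.
stage_cuda_rpms() {
    src=$1    # e.g. /tmp/cuda/var/cuda-repo-7-5-local
    dest=$2   # e.g. /install/cuda-repo/cuda-7-5
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    count=0
    for rpm in "$src"/*.rpm; do
        [ -e "$rpm" ] || continue   # the glob matched nothing
        cp "$rpm" "$dest"/ || return 1
        count=$((count + 1))
    done
    if [ "$count" -eq 0 ]; then
        echo "no RPMs found in $src" >&2
        return 1
    fi
    # Build the repository metadata if createrepo is installed.
    if command -v createrepo >/dev/null 2>&1; then
        createrepo "$dest" >/dev/null || return 1
    else
        echo "createrepo not found; run it manually in $dest" >&2
    fi
    echo "$count"   # number of RPMs staged
}
```

For example, ``stage_cuda_rpms /tmp/cuda/var/cuda-repo-7-5-local /install/cuda-repo/cuda-7-5`` would stage the packages unpacked with ``rpm2cpio``.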
|
||||
|
||||
|
||||
The NVIDIA driver RPM packages depend on other external packages, such as DKMS and possibly EPEL (firestone nodes do not need this package). Users need to download those packages into a local directory and create a repository for them. ::
|
||||
|
||||
#mkdir -p /install/cuda-repo/cuda-deps
|
||||
#ls -ltr /install/cuda-repo/cuda-deps
|
||||
mkdir -p /install/cuda-repo/cuda-deps
|
||||
ls -ltr /install/cuda-repo/cuda-deps
|
||||
-rw-r--r-- 1 root root 79048 Oct 5 10:58 dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch.rpm
|
||||
#cd /install/cuda-repo/cuda-deps
|
||||
#createrepo .
|
||||
|
||||
cd /install/cuda-repo/cuda-deps
|
||||
createrepo .
|
||||
|
||||
|
||||
CUDA osimage
|
||||
------------
|
||||
User can generate own CUDA osimage object based on the definition or just modify existing osimage. xCAT provides sample cuda pkglist file which defined in the pkgdir for CUDA osimage. ::
|
||||
|
||||
# cat /opt/xcat/share/xcat/install/rh/cudafull.rhels7.pkglist
|
||||
#INCLUDE:compute.rhels7.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda
|
||||
# cat /opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.pkglist
|
||||
#INCLUDE:compute.rhels7.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda-runtime-7-5
|
||||
|
||||
Similar pkglist files are created for diskless osimage also in the ``/opt/xcat/share/xcat/netboot/rh`` directory ::
|
||||
|
||||
# cat /opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.pkglist
|
||||
#INCLUDE:compute.rhels7.ppc64.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda
|
||||
# cat /opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.pkglist
|
||||
#INCLUDE:compute.rhels7.ppc64.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda-runtime-7-5
|
||||
Users can generate a new CUDA osimage object based on another osimage definition or just modify an existing osimage. xCAT provides some sample CUDA pkglist files:
|
||||
|
||||
|
||||
After installed cuda, the nodes have to be reboot, so for the diskfull installation, the cuda packages should be included in the pkglist. for the diskless installation, the cuda package can be put either in otherpkglist or pktlist. Following shows the definition for each cuda osimage object.
|
||||
* diskful provisioning in ``/opt/xcat/share/xcat/install/rh/`` for ``cudafull`` and ``cudaruntime``: ::
|
||||
|
||||
|
||||
#cat /opt/xcat/share/xcat/install/rh/cudafull.rhels7.pkglist
|
||||
#INCLUDE:compute.rhels7.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda
|
||||
#cat /opt/xcat/share/xcat/install/rh/cudaruntime.rhels7.pkglist
|
||||
#INCLUDE:compute.rhels7.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda-runtime-7-5
|
||||
|
||||
|
||||
* diskless provisioning in ``/opt/xcat/share/xcat/netboot/rh`` for ``cudafull`` and ``cudaruntime``: ::
|
||||
|
||||
#cat /opt/xcat/share/xcat/netboot/rh/cudafull.rhels7.ppc64le.pkglist
|
||||
#INCLUDE:compute.rhels7.ppc64.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda
|
||||
#cat /opt/xcat/share/xcat/netboot/rh/cudaruntime.rhels7.ppc64le.pkglist
|
||||
#INCLUDE:compute.rhels7.ppc64.pkglist#
|
||||
#For Cuda 7.5
|
||||
kernel-devel
|
||||
gcc
|
||||
pciutils
|
||||
dkms
|
||||
cuda-runtime-7-5
|
||||
|
||||
|
||||
**NOTE: After CUDA is installed, the nodes require a reboot**
|
||||
|
||||
* For the diskful installation, the CUDA packages should be included in the ``pkglist`` field so a reboot happens automatically after the OS is installed.
|
||||
* For the diskless installation, the CUDA packages can be included in either the ``otherpkglist`` or the ``pkglist`` field.
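As a rough illustration of wiring a pkglist and the CUDA repositories into an existing osimage, the sketch below only prints a ``chdef`` command for review; it does not run it. The helper name ``cuda_osimage_cmd`` is hypothetical, and the comma-separated ``pkgdir`` value built from the repository directories created earlier is an assumption that should be checked against the actual osimage definition before use.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a chdef command that points an
# osimage at a CUDA pkglist and appends the CUDA repositories to pkgdir.
# The attribute values are assumptions based on the paths in this document.
cuda_osimage_cmd() {
    image=$1       # osimage name, e.g. rhels7.2-ppc64le-install-cudafull
    pkglist=$2     # pkglist file to use
    os_pkgdir=$3   # OS package directory already defined for the osimage
    printf 'chdef -t osimage -o %s pkglist=%s pkgdir=%s,%s,%s\n' \
        "$image" "$pkglist" "$os_pkgdir" \
        /install/cuda-repo/cuda-7-5 /install/cuda-repo/cuda-deps
}
```

Printing the command first makes it easy to compare the values with the ``lsdef -t osimage`` output before changing anything.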
|
||||
|
||||
The following are some sample osimage definitions:
|
||||
|
||||
* The diskfull cudafull installation osimage object. ::
|
||||
* The diskful cudafull installation osimage object. ::
|
||||
|
||||
#lsdef -t osimage rhels7.2-ppc64le-install-cudafull
|
||||
Object name: rhels7.2-ppc64le-install-cudafull
|
||||
@ -113,7 +121,7 @@ After installed cuda, the nodes have to be reboot, so for the diskfull installat
|
||||
template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl
|
||||
|
||||
|
||||
* The diskfull cudaruntime installation osimage object. ::
|
||||
* The diskful cudaruntime installation osimage object. ::
|
||||
|
||||
#lsdef -t osimage rhels7.2-ppc64le-install-cudaruntime
|
||||
Object name: rhels7.2-ppc64le-install-cudaruntime
|
||||
@ -172,44 +180,43 @@ After installed cuda, the nodes have to be reboot, so for the diskfull installat
|
||||
Deployment of CUDA node
|
||||
-----------------------
|
||||
|
||||
Follow the instructions in ``xCAT_p8LE_Hardware_Management`` to perform OS provisioning for the p8le compute nodes, using the osimage objects generated above.
|
||||
|
||||
* For diskfull nodes: ::
|
||||
* To provision diskful nodes: ::
|
||||
|
||||
|
||||
#nodeset <node> osimage=rhels7.2-ppc64le-install-cudafull
|
||||
#rsetboot <node> net
|
||||
#rpower <node> boot
|
||||
nodeset <node> osimage=rhels7.2-ppc64le-install-cudafull
|
||||
rsetboot <node> net
|
||||
rpower <node> boot
|
||||
|
||||
|
||||
* For diskless nodes: ::
|
||||
* To provision diskless nodes: ::
|
||||
|
||||
|
||||
#genimage rhels7.2-ppc64le-netboot-cudafull
|
||||
#packimage rhels7.2-ppc64le-netboot-cudafull
|
||||
#nodeset <node> osimage=rhels7.2-ppc64le-netboot-cudafull
|
||||
#rsetboot <node> net
|
||||
#rpower <node> boot
|
||||
genimage rhels7.2-ppc64le-netboot-cudafull
|
||||
packimage rhels7.2-ppc64le-netboot-cudafull
|
||||
nodeset <node> osimage=rhels7.2-ppc64le-netboot-cudafull
|
||||
rsetboot <node> net
|
||||
rpower <node> boot
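The two sequences above share their last three commands; diskless provisioning additionally needs the image to be generated and packed first. A small sketch that prints the sequence for either mode, so it can be reviewed before being run, might look like this. The helper name ``cuda_provision_cmds`` and its ``diskful``/``diskless`` argument are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the xCAT command sequence for a
# node. Diskless images must be generated and packed before nodeset.
cuda_provision_cmds() {
    mode=$1    # "diskful" or "diskless"
    node=$2    # node name or noderange
    image=$3   # osimage name
    case $mode in
        diskful) ;;
        diskless)
            echo "genimage $image"
            echo "packimage $image"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    echo "nodeset $node osimage=$image"
    echo "rsetboot $node net"
    echo "rpower $node boot"
}
```

For example, ``cuda_provision_cmds diskless <node> rhels7.2-ppc64le-netboot-cudafull`` prints the five commands listed for diskless nodes.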
|
||||
|
||||
|
||||
|
||||
Verification of CUDA Installation
|
||||
---------------------------------
|
||||
|
||||
**NOTE** A ``cudaruntime`` installation only provides the basic libraries that can be used by other applications that work with the GPU. The following verification does not apply to ``cudaruntime`` installations.
|
||||
|
||||
After the compute node has booted, environment variables have to be set in order to use the CUDA toolkit. The PATH variable needs to include ``/usr/local/cuda-7.5/bin``, and the LD_LIBRARY_PATH variable needs to contain ``/usr/local/cuda-7.5/lib64`` on a 64-bit system or ``/usr/local/cuda-7.5/lib`` on a 32-bit system.
|
||||
|
||||
* To change the environment variables for 64-bit operating systems ::
|
||||
|
||||
#export PATH=/usr/local/cuda-7.5/bin:$PATH
|
||||
#export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
|
||||
export PATH=/usr/local/cuda-7.5/bin:$PATH
|
||||
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
|
||||
|
||||
|
||||
* To change the environment variables for 32-bit operating systems ::
|
||||
|
||||
#export PATH=/usr/local/cuda-7.5/bin:$PATH
|
||||
#export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib:$LD_LIBRARY_PATH
|
||||
export PATH=/usr/local/cuda-7.5/bin:$PATH
|
||||
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib:$LD_LIBRARY_PATH
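If these exports are placed in a login script, they are re-applied on every login, and an unset LD_LIBRARY_PATH leaves a trailing colon, which the dynamic loader treats as the current directory. A minimal sketch that avoids both, for the 64-bit case, is shown below. The helper name ``prepend_once`` and the ``CUDA_HOME`` variable are assumptions introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to a colon-separated path value
# only if it is not already listed. Prints the resulting value.
prepend_once() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*) printf '%s\n' "$current" ;;          # already listed
        *) printf '%s\n' "$dir${current:+:$current}" ;;  # no trailing colon
    esac
}

CUDA_HOME=/usr/local/cuda-7.5   # install prefix used in this document
PATH=$(prepend_once "$CUDA_HOME/bin" "$PATH")
LD_LIBRARY_PATH=$(prepend_once "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

On a 32-bit system, ``lib64`` would be replaced with ``lib``.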
|
||||
|
||||
After Environment variables are set correctly, user can verify the cuda installation by
|
||||
After the environment variables are set correctly, users can verify the CUDA installation by:
|
||||
|
||||
* Verify the Driver Version ::
|
||||
|
||||
|