mirror of https://github.com/xcat2/xcat-core.git, synced 2025-05-30 01:26:38 +00:00

Remove tailing spaces

parent dcb3ea4270
commit 7e6290fb7d
@ -1,15 +1,14 @@
Deploy CUDA nodes
=================

Diskful
-------

* To provision diskful nodes using osimage ``rhels7.5-ppc64le-install-cudafull``: ::

    nodeset <noderange> osimage=rhels7.5-ppc64le-install-cudafull
    rsetboot <noderange> net
    rpower <noderange> boot

Diskless
--------
@ -18,5 +17,4 @@ Diskless

    nodeset <noderange> osimage=rhels7.5-ppc64le-netboot-cudafull
    rsetboot <noderange> net
    rpower <noderange> boot
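The diskful and diskless flows above run the same three commands and differ only in the osimage name. The sketch below prints that sequence for a given noderange and osimage without executing anything, which may help when reviewing a deployment before it is run. ``deploy_plan`` and the noderange ``cn01-cn04`` are hypothetical names used for illustration only.

```shell
#!/bin/sh
# Dry run: print the xCAT commands used above, do not execute them.
# deploy_plan is a hypothetical helper, not part of xCAT.
deploy_plan() {
    noderange=$1
    osimage=$2
    printf 'nodeset %s osimage=%s\n' "$noderange" "$osimage"
    printf 'rsetboot %s net\n' "$noderange"
    printf 'rpower %s boot\n' "$noderange"
}

# Example: plan a diskless deployment for a hypothetical noderange.
deploy_plan cn01-cn04 rhels7.5-ppc64le-netboot-cudafull
```

Swapping the second argument for ``rhels7.5-ppc64le-install-cudafull`` gives the diskful plan.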
@ -7,7 +7,7 @@ For more information, see NVIDIAs website: https://developer.nvidia.com/cuda-zon

xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.5 on PowerNV (Non-Virtualized) for both diskful and diskless nodes.

Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package installs both the ``cuda-runtime`` and the ``cuda-toolkit``. The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs. If your installation only needs to run GPU jobs, install only the ``cuda-runtime`` package.

.. toctree::
   :maxdepth: 2
@ -1,7 +1,7 @@
RHEL 7.5
========

xCAT provides sample package list (pkglist) files for CUDA. You can find them in:

* Diskful: ``/opt/xcat/share/xcat/install/rh/cuda*``
* Diskless: ``/opt/xcat/share/xcat/netboot/rh/cuda*``
@ -9,7 +9,7 @@ xCAT provides a sample package list (pkglist) files for CUDA. You can find them:
Diskful images
--------------

The following examples create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions are created from the base ``rhels7.5-ppc64le-install-compute`` osimage.

**[Note]**: The machine must be rebooted after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is listed in the ``pkglist`` attribute of the osimage definition, so that it is installed with the operating system and picked up by the reboot that follows the operating system installation.

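The derived osimage definitions in this document are produced by piping ``lsdef -z`` output through ``sed`` into ``mkdef -z``. The sketch below shows only the ``sed`` step on a sample stanza, so it can be tried without an xCAT installation. The stanza is abbreviated and illustrative (real ``lsdef -z`` output carries more attributes), and ``rename_osimage`` is a hypothetical helper name.

```shell
#!/bin/sh
# Abbreviated, illustrative stanza; not real output from a management node.
base_stanza='rhels7.5-ppc64le-install-compute:
    objtype=osimage
    provmethod=install'

# rename_osimage OLD NEW: rewrite the object-name line of a stanza on stdin.
# The trailing ":" in the pattern restricts the match to the object-name
# line, so attribute values are left alone.
rename_osimage() {
    sed "s/$1:/$2:/"
}

printf '%s\n' "$base_stanza" | rename_osimage install-compute install-cudafull
```

Only the first line of the stanza changes. The attribute lines are passed through untouched and would then be adjusted with ``chdef``.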
@ -20,7 +20,7 @@ cudafull

    lsdef -t osimage -z rhels7.5-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudafull:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: ::

@ -39,7 +39,7 @@ cudaruntime

    lsdef -t osimage -z rhels7.5-ppc64le-install-compute \
      | sed 's/install-compute:/install-cudaruntime:/' \
      | mkdef -z

#. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: ::

@ -54,9 +54,9 @@ cudaruntime
Diskless images
---------------

The following examples create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions are created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage.

**[Note]**: For diskless nodes, the CUDA packages MUST be installed through the ``otherpkglist`` and **NOT** through the ``pkglist`` as with diskful nodes. The reboot requirement does not apply to diskless nodes because the image is loaded on each reboot.

cudafull
^^^^^^^^
@ -65,16 +65,16 @@ cudafull

    lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \
      | sed 's/netboot-compute:/netboot-cudafull:/' \
      | mkdef -z

#. Verify that the CUDA repo created in the previous step is available in the directory specified by the ``otherpkgdir`` attribute.

   The ``otherpkgdir`` directory can be obtained by running ``lsdef`` on the osimage: ::

        # lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir
        Object name: rhels7.5-ppc64le-netboot-cudafull
            otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le

   Create a symbolic link to the CUDA repository in the directory specified by ``otherpkgdir``: ::

        ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2
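The ``otherpkgdir`` lookup and the symbolic link above can be chained so the path is not retyped by hand. The sketch below parses the attribute from the sample ``lsdef`` output shown above and prints the resulting ``ln -s`` command rather than running it. ``get_attr`` is a hypothetical helper, and the sample text stands in for a live ``lsdef`` call.

```shell
#!/bin/sh
# Sample text copied from the lsdef output shown above.
lsdef_sample='Object name: rhels7.5-ppc64le-netboot-cudafull
    otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le'

# get_attr NAME: print the value of "NAME=value" found in lsdef-style
# text on stdin (hypothetical helper, not an xCAT command).
get_attr() {
    sed -n "s/^[[:space:]]*$1=//p"
}

otherpkgdir=$(printf '%s\n' "$lsdef_sample" | get_attr otherpkgdir)

# Print the link command instead of executing it.
printf 'ln -s /install/cuda-9.2 %s/cuda-9.2\n' "$otherpkgdir"
```

On a management node, the sample text would be replaced by the output of ``lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir``.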
@ -84,7 +84,7 @@ cudafull
    chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \
        rootimgdir=/install/netboot/rhels7.5/ppc64le/cudafull

#. Create a custom pkglist file to install additional operating system packages for your CUDA node.

#. Copy the default compute pkglist file as a starting point: ::

@ -111,7 +111,7 @@ cudafull
#. Create the ``otherpkgs.pkglist`` file for cudafull: ::

    vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist
    # add the following packages
    cuda-9.2/ppc64le/cuda-deps/dkms
    cuda-9.2/ppc64le/cuda-core/cuda

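A mistyped entry in the otherpkgs pkglist is easy to miss until the image build fails. The sketch below checks that the directory part of each entry exists under ``otherpkgdir``. It assumes, based on the symbolic link and the entries shown above, that each entry has the form ``<path relative to otherpkgdir>/<package name>``. ``check_otherpkgs`` is a hypothetical helper, and the demonstration uses a temporary directory rather than ``/install``.

```shell
#!/bin/sh
# check_otherpkgs OTHERPKGDIR PKGLIST
# Report entries whose directory part is missing under OTHERPKGDIR.
# Returns 0 when every entry resolves, 1 otherwise.
check_otherpkgs() {
    rc=0
    while IFS= read -r entry; do
        # Skip blank lines and comment lines.
        case $entry in
            ''|'#'*) continue ;;
        esac
        dir=${entry%/*}    # strip the trailing package name
        if [ ! -d "$1/$dir" ]; then
            printf 'missing: %s/%s\n' "$1" "$dir"
            rc=1
        fi
    done < "$2"
    return $rc
}

# Demonstration in a scratch directory: cuda-deps exists, cuda-core does not.
demo=$(mktemp -d)
mkdir -p "$demo/cuda-9.2/ppc64le/cuda-deps"
printf '%s\n' '# add the following packages' \
    'cuda-9.2/ppc64le/cuda-deps/dkms' \
    'cuda-9.2/ppc64le/cuda-core/cuda' > "$demo/otherpkgs.pkglist"
check_otherpkgs "$demo" "$demo/otherpkgs.pkglist" || echo "repository layout is incomplete"
rm -rf "$demo"
```

The check only looks at directories. It does not confirm that the named package is actually present in the repository.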
@ -137,7 +137,7 @@ cudaruntime
      | sed 's/netboot-compute:/netboot-cudaruntime:/' \
      | mkdef -z

#. Verify that the CUDA repo created previously is available in the directory specified by the ``otherpkgdir`` attribute.

#. Obtain the ``otherpkgdir`` directory using the ``lsdef`` command: ::

@ -177,7 +177,6 @@ cudaruntime

    packimage rhels7.5-ppc64le-netboot-cudaruntime

POWER9 Setup
------------

@ -1,11 +1,10 @@
RHEL 7.5
========

#. Create a repository on the management node (MN) for installing the CUDA Toolkit: ::

    # Assuming the CUDA Toolkit rpm is: /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm
    # extract the contents from the rpm
    mkdir -p /tmp/cuda
    cd /tmp/cuda
    rpm2cpio /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm | cpio -i -d
@ -14,15 +13,15 @@ RHEL 7.5
    mkdir -p /install/cuda-9.2/ppc64le/cuda-core
    cp /tmp/cuda/var/cuda-repo-9-2-local/*.rpm /install/cuda-9.2/ppc64le/cuda-core

    # Create the yum repo files
    createrepo /install/cuda-9.2/ppc64le/cuda-core

#. The NVIDIA CUDA Toolkit contains rpms that depend on external packages (such as ``DKMS``), which are provided by EPEL. It is up to the system administrator to obtain the dependency packages and add them to the ``cuda-deps`` directory: ::

    mkdir -p /install/cuda-9.2/ppc64le/cuda-deps

    # Copy the DKMS rpm to this directory
    cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps

    # Execute createrepo in this directory
    createrepo /install/cuda-9.2/ppc64le/cuda-deps
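After both ``createrepo`` runs, a quick check can confirm that each directory holds rpms and repository metadata. The sketch below relies on ``createrepo`` writing ``repodata/repomd.xml`` into the target directory. ``check_repo`` is a hypothetical helper, and the loop simply reports ``incomplete`` on a machine where the directories do not exist.

```shell
#!/bin/sh
# check_repo DIR: succeed when DIR contains at least one rpm and the
# repodata/repomd.xml file written by createrepo.
check_repo() {
    # Expand the glob into positional parameters; when nothing matches,
    # $2 stays as the literal pattern and the -e test fails.
    set -- "$1" "$1"/*.rpm
    [ -e "$2" ] && [ -f "$1/repodata/repomd.xml" ]
}

for repo in /install/cuda-9.2/ppc64le/cuda-core /install/cuda-9.2/ppc64le/cuda-deps; do
    if check_repo "$repo"; then
        echo "ok: $repo"
    else
        echo "incomplete: $repo"
    fi
done
```

This checks presence only. It does not validate that the metadata matches the rpms in the directory.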
@ -3,7 +3,7 @@ Update NVIDIA Driver

To update to a newer NVIDIA driver, follow the :doc:`Create CUDA software repository </advanced/gpu/nvidia/repo/index>` document to create another repository for the new driver.

The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``.

Diskful
-------
@ -13,33 +13,26 @@ Diskful
    chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \
        pkgdir=/install/cuda-9.2/ppc64le/nvidia_new,/install/cuda-9.2/ppc64le/cuda-deps

#. Use the ``xdsh`` command to remove all the NVIDIA rpms: ::

    xdsh <noderange> "yum remove *nvidia* -y"

#. Run the ``updatenode`` command to update the NVIDIA driver on the compute nodes: ::

    updatenode <noderange> -S

#. Reboot the compute nodes: ::

    rpower <noderange> off
    rpower <noderange> on

#. Verify the newer driver level: ::

    nvidia-smi | grep Driver

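The verification step above prints the whole ``nvidia-smi`` header line. When the version has to be compared against an expected value, it may be easier to extract just the number. The sketch below does that with ``sed`` on a sample header line. The sample line and its version number are illustrative, not captured from a real node, and ``driver_version`` is a hypothetical helper.

```shell
#!/bin/sh
# driver_version: read nvidia-smi output on stdin and print the value
# that follows "Driver Version:" on the first matching line.
driver_version() {
    sed -n 's/.*Driver Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Illustrative header line in the layout nvidia-smi uses.
sample='| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |'

printf '%s\n' "$sample" | driver_version
```

On a compute node the input would come from ``nvidia-smi`` itself, for example ``nvidia-smi | driver_version``.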
Diskless
--------

To update the NVIDIA driver on diskless compute nodes, regenerate the osimage so that it points to the new NVIDIA driver repository, then reboot the nodes to load the new diskless image.

Refer to :doc:`Create osimage definitions </advanced/gpu/nvidia/osimage/index>` for specific instructions.