mirror of https://github.com/xcat2/xcat-core.git, synced 2025-10-26 17:05:33 +00:00
Remove trailing spaces
		| @@ -1,15 +1,14 @@ | ||||
| Deploy CUDA nodes | ||||
| ================= | ||||
|  | ||||
| Diskful  | ||||
| Diskful | ||||
| ------- | ||||
|  | ||||
| * To provision diskful nodes using osimage ``rhels7.5-ppc64le-install-cudafull``: :: | ||||
|  | ||||
|     nodeset <noderange> osimage=rhels7.5-ppc64le-install-cudafull | ||||
|     rsetboot <noderange> net | ||||
|     rpower <noderange> boot  | ||||
|  | ||||
|     rpower <noderange> boot | ||||
|  | ||||
| Diskless | ||||
| -------- | ||||
| @@ -18,5 +17,4 @@ Diskless | ||||
|  | ||||
|     nodeset <noderange> osimage=rhels7.5-ppc64le-netboot-cudafull | ||||
|     rsetboot <noderange> net | ||||
|     rpower <noderange> boot  | ||||
|  | ||||
|     rpower <noderange> boot | ||||
|   | ||||
| @@ -7,7 +7,7 @@ For more information, see NVIDIAs website: https://developer.nvidia.com/cuda-zon | ||||
|  | ||||
| xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.5 on PowerNV (Non-Virtualized) for both diskful and diskless nodes. | ||||
|  | ||||
| Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``.  The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs.  If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package.  | ||||
| Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``.  The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs.  If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package. | ||||
|  | ||||
| .. toctree:: | ||||
|    :maxdepth: 2 | ||||
|   | ||||
| @@ -1,7 +1,7 @@ | ||||
| RHEL 7.5 | ||||
| ======== | ||||
|  | ||||
| xCAT provides sample package list (pkglist) files for CUDA. You can find them at:  | ||||
| xCAT provides sample package list (pkglist) files for CUDA. You can find them at: | ||||
|  | ||||
|     * Diskful: ``/opt/xcat/share/xcat/install/rh/cuda*`` | ||||
|     * Diskless: ``/opt/xcat/share/xcat/netboot/rh/cuda*`` | ||||
| @@ -9,7 +9,7 @@ xCAT provides sample package list (pkglist) files for CUDA. You can find them at: | ||||
| Diskful images | ||||
| -------------- | ||||
|  | ||||
| The following examples will create diskful images for ``cudafull`` and ``cudaruntime``.  The osimage definitions will be created from the base ``rhels7.5-ppc64le-install-compute`` osimage.  | ||||
| The following examples will create diskful images for ``cudafull`` and ``cudaruntime``.  The osimage definitions will be created from the base ``rhels7.5-ppc64le-install-compute`` osimage. | ||||
|  | ||||
| **[Note]**: There is a requirement to reboot the machine after the CUDA drivers are installed.  To satisfy this requirement, the CUDA software is installed in the ``pkglist`` attribute of the osimage definition where a reboot will happen after the Operating System is installed. | ||||
|  | ||||
| @@ -20,7 +20,7 @@ cudafull | ||||
|  | ||||
|     lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | ||||
|       | sed 's/install-compute:/install-cudafull:/' \ | ||||
|       | mkdef -z  | ||||
|       | mkdef -z | ||||
|  | ||||
| #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: | ||||
|  | ||||
| @@ -39,7 +39,7 @@ cudaruntime | ||||
|  | ||||
|     lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | ||||
|       | sed 's/install-compute:/install-cudaruntime:/' \ | ||||
|       | mkdef -z  | ||||
|       | mkdef -z | ||||
|  | ||||
| #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: | ||||
|  | ||||
| @@ -54,9 +54,9 @@ cudaruntime | ||||
| Diskless images | ||||
| --------------- | ||||
|  | ||||
| The following examples will create diskless images for ``cudafull`` and ``cudaruntime``.  The osimage definitions will be created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage.  | ||||
| The following examples will create diskless images for ``cudafull`` and ``cudaruntime``.  The osimage definitions will be created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage. | ||||
|  | ||||
| **[Note]**: For diskless, the install of the CUDA packages MUST be done in the ``otherpkglist`` and **NOT** the ``pkglist`` as with diskful.  The requirement for rebooting the machine is not applicable in diskless nodes because the image is loaded on each reboot.  | ||||
| **[Note]**: For diskless, the install of the CUDA packages MUST be done in the ``otherpkglist`` and **NOT** the ``pkglist`` as with diskful.  The requirement for rebooting the machine is not applicable in diskless nodes because the image is loaded on each reboot. | ||||
|  | ||||
| cudafull | ||||
| ^^^^^^^^ | ||||
| @@ -65,16 +65,16 @@ cudafull | ||||
|  | ||||
|     lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \ | ||||
|       | sed 's/netboot-compute:/netboot-cudafull:/' \ | ||||
|       | mkdef -z  | ||||
|       | mkdef -z | ||||
|  | ||||
| #. Verify that the CUDA repo created in the previous step is available in the directory specified by the ``otherpkgdir`` attribute.   | ||||
| #. Verify that the CUDA repo created in the previous step is available in the directory specified by the ``otherpkgdir`` attribute. | ||||
|  | ||||
|    The ``otherpkgdir`` directory can be obtained by running lsdef on the osimage: :: | ||||
|  | ||||
|        # lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir | ||||
|        Object name: rhels7.5-ppc64le-netboot-cudafull | ||||
|            otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le | ||||
|          | ||||
|  | ||||
|    Create a symbolic link of the CUDA repository in the directory specified by ``otherpkgdir`` :: | ||||
|  | ||||
|        ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2 | ||||
| @@ -84,7 +84,7 @@ cudafull | ||||
|     chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \ | ||||
|        rootimgdir=/install/netboot/rhels7.5/ppc64le/cudafull | ||||
|  | ||||
| #. Create a custom pkglist file to install additional operating system packages for your CUDA node.  | ||||
| #. Create a custom pkglist file to install additional operating system packages for your CUDA node. | ||||
|  | ||||
|     #. Copy the default compute pkglist file as a starting point: :: | ||||
|  | ||||
| @@ -111,7 +111,7 @@ cudafull | ||||
|     #. Create the otherpkg.pkglist file for cudafull: :: | ||||
|  | ||||
|         vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist | ||||
|         # add the following packages  | ||||
|         # add the following packages | ||||
|         cuda-9.2/ppc64le/cuda-deps/dkms | ||||
|         cuda-9.2/ppc64le/cuda-core/cuda | ||||
|  | ||||
| @@ -137,7 +137,7 @@ cudaruntime | ||||
|       | sed 's/netboot-compute:/netboot-cudaruntime:/' \ | ||||
|       | mkdef -z | ||||
|  | ||||
| #. Verify that the CUDA repo created previously is available in the directory specified by the ``otherpkgdir`` attribute.   | ||||
| #. Verify that the CUDA repo created previously is available in the directory specified by the ``otherpkgdir`` attribute. | ||||
|  | ||||
|     #. Obtain the ``otherpkgdir`` directory using the ``lsdef`` command: :: | ||||
|  | ||||
| @@ -177,7 +177,6 @@ cudaruntime | ||||
|  | ||||
|     packimage rhels7.5-ppc64le-netboot-cudaruntime | ||||
|  | ||||
|  | ||||
| POWER9 Setup | ||||
| ------------ | ||||
|  | ||||
|   | ||||
| @@ -1,11 +1,10 @@ | ||||
| RHEL 7.5 | ||||
| ======== | ||||
|  | ||||
|  | ||||
| #. Create a repository on the management node (MN) for installing the CUDA Toolkit: :: | ||||
|  | ||||
|     # For cuda toolkit name: /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm | ||||
|     # extract the contents from the rpm  | ||||
|     # extract the contents from the rpm | ||||
|     mkdir -p /tmp/cuda | ||||
|     cd /tmp/cuda | ||||
|     rpm2cpio /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm | cpio -i -d | ||||
| @@ -14,15 +13,15 @@ RHEL 7.5 | ||||
|     mkdir -p /install/cuda-9.2/ppc64le/cuda-core | ||||
|     cp /tmp/cuda/var/cuda-repo-9-2-local/*.rpm /install/cuda-9.2/ppc64le/cuda-core | ||||
|  | ||||
|     # Create the yum repo files  | ||||
|     # Create the yum repo files | ||||
|     createrepo /install/cuda-9.2/ppc64le/cuda-core | ||||
|      | ||||
|  | ||||
| #. The NVIDIA CUDA Toolkit contains rpms that have dependencies on other external packages (such as ``DKMS``).  These are provided by EPEL.  It's up to the system administrator to obtain the dependency packages and add those to the ``cuda-deps`` directory: :: | ||||
|  | ||||
|     mkdir -p /install/cuda-9.2/ppc64le/cuda-deps | ||||
|  | ||||
|     # Copy the DKMS rpm to this directory  | ||||
|     cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps   | ||||
|     # Copy the DKMS rpm to this directory | ||||
|     cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps | ||||
|  | ||||
|     # Execute createrepo in this directory  | ||||
|     # Execute createrepo in this directory | ||||
|     createrepo /install/cuda-9.2/ppc64le/cuda-deps | ||||
|   | ||||
| @@ -3,7 +3,7 @@ Update NVIDIA Driver | ||||
|  | ||||
| To update to a newer NVIDIA driver on the system, follow the :doc:`Create CUDA software repository </advanced/gpu/nvidia/repo/index>` document to create another repository for the new driver. | ||||
|  | ||||
| The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``.   | ||||
| The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``. | ||||
|  | ||||
| Diskful | ||||
| ------- | ||||
| @@ -13,33 +13,26 @@ Diskful | ||||
|       chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \ | ||||
|         pkgdir=/install/cuda-9.2/ppc64le/nvidia_new,/install/cuda-9.2/ppc64le/cuda-deps | ||||
|  | ||||
|  | ||||
| #.  Use xdsh command to remove all the NVIDIA rpms: :: | ||||
|      | ||||
|       xdsh <noderange> "yum remove *nvidia* -y" | ||||
|  | ||||
|       xdsh <noderange> "yum remove *nvidia* -y" | ||||
|  | ||||
| #.  Run updatenode command to update NVIDIA driver on the compute node: :: | ||||
|  | ||||
|       updatenode <noderange> -S | ||||
|  | ||||
|  | ||||
| #.  Reboot compute node: :: | ||||
|  | ||||
|       rpower <noderange> off | ||||
|       rpower <noderange> on | ||||
|  | ||||
|  | ||||
| #.  Verify the newer driver level: :: | ||||
|  | ||||
|       nvidia-smi | grep Driver | ||||
|  | ||||
|  | ||||
|  | ||||
|  | ||||
| Diskless | ||||
| -------- | ||||
|  | ||||
| To update to a new NVIDIA driver on diskless compute nodes, regenerate the osimage so that it points to the new NVIDIA driver repository, then reboot the node to load the diskless image.   | ||||
| To update to a new NVIDIA driver on diskless compute nodes, regenerate the osimage so that it points to the new NVIDIA driver repository, then reboot the node to load the diskless image. | ||||
|  | ||||
| Refer to :doc:`Create osimage definitions </advanced/gpu/nvidia/osimage/index>` for specific instructions.  | ||||
| Refer to :doc:`Create osimage definitions </advanced/gpu/nvidia/osimage/index>` for specific instructions. | ||||
|   | ||||