From dcb3ea4270ef8e0cbc6126131aff142de0e0560e Mon Sep 17 00:00:00 2001 From: GONG Jie Date: Thu, 22 Mar 2018 13:54:37 +0800 Subject: [PATCH 1/3] Update the CUDA installation document to CUDA 9.2 and RHEL 7.5 --- .../advanced/gpu/nvidia/deploy_cuda_node.rst | 8 +- docs/source/advanced/gpu/nvidia/index.rst | 2 +- .../advanced/gpu/nvidia/osimage/rhels.rst | 110 ++++++++++++------ .../source/advanced/gpu/nvidia/repo/rhels.rst | 25 ++-- .../gpu/nvidia/update_nvidia_driver.rst | 6 +- 5 files changed, 91 insertions(+), 60 deletions(-) diff --git a/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst b/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst index f20a6505c..b705e85d5 100644 --- a/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst +++ b/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst @@ -4,9 +4,9 @@ Deploy CUDA nodes Diskful ------- -* To provision diskful nodes using osimage ``rhels7.2-ppc64le-install-cudafull``: :: +* To provision diskful nodes using osimage ``rhels7.5-ppc64le-install-cudafull``: :: - nodeset osimage=rhels7.2-ppc64le-install-cudafull + nodeset osimage=rhels7.5-ppc64le-install-cudafull rsetboot net rpower boot @@ -14,9 +14,9 @@ Diskful Diskless -------- -* To provision diskless nodes using osimage ``rhels7.2-ppc64le-netboot-cudafull``: :: +* To provision diskless nodes using osimage ``rhels7.5-ppc64le-netboot-cudafull``: :: - nodeset osimage=rhels7.2-ppc64le-netboot-cudafull + nodeset osimage=rhels7.5-ppc64le-netboot-cudafull rsetboot net rpower boot diff --git a/docs/source/advanced/gpu/nvidia/index.rst b/docs/source/advanced/gpu/nvidia/index.rst index ea9459018..cf673106c 100644 --- a/docs/source/advanced/gpu/nvidia/index.rst +++ b/docs/source/advanced/gpu/nvidia/index.rst @@ -5,7 +5,7 @@ CUDA (Compute Unified Device Architecture) is a parallel computing platform and For more information, see NVIDIAs website: https://developer.nvidia.com/cuda-zone -xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.2LE on PowerNV 
(Non-Virtualized) for both diskful and diskless nodes. +xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.5 on PowerNV (Non-Virtualized) for both diskful and diskless nodes. Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``. The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs. If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package. diff --git a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst index a39030f7a..163581ba6 100644 --- a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst +++ b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst @@ -1,5 +1,5 @@ -RHEL 7.2 LE -=========== +RHEL 7.5 +======== xCAT provides a sample package list (pkglist) files for CUDA. You can find them: @@ -7,9 +7,9 @@ xCAT provides a sample package list (pkglist) files for CUDA. You can find them: * Diskless: ``/opt/xcat/share/xcat/netboot/rh/cuda*`` Diskful images ---------------- +-------------- -The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.2-ppc64le-install-compute`` osimage. +The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-install-compute`` osimage. **[Note]**: There is a requirement to reboot the machine after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is installed in the ``pkglist`` attribute of the osimage definition where a reboot will happen after the Operating System is installed. @@ -18,18 +18,18 @@ cudafull #. 
Create a copy of the ``install-compute`` image and label it ``cudafull``: :: - lsdef -t osimage -z rhels7.2-ppc64le-install-compute \ + lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | sed 's/install-compute:/install-cudafull:/' \ | mkdef -z #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: - chdef -t osimage -o rhels7.2-ppc64le-install-cudafull -p \ - pkgdir=/install/cuda-7.5/ppc64le/cuda-core,/install/cuda-7.5/ppc64le/cuda-deps + chdef -t osimage -o rhels7.5-ppc64le-install-cudafull -p \ + pkgdir=/install/cuda-9.2/ppc64le/cuda-core,/install/cuda-9.2/ppc64le/cuda-deps #. Use the provided ``cudafull`` pkglist to install the CUDA packages: :: - chdef -t osimage -o rhels7.2-ppc64le-install-cudafull \ + chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \ pkglist=/opt/xcat/share/xcat/install/rh/cudafull.rhels7.ppc64le.pkglist cudaruntime @@ -37,24 +37,24 @@ cudaruntime #. Create a copy of the ``install-compute`` image and label it ``cudaruntime``: :: - lsdef -t osimage -z rhels7.2-ppc64le-install-compute \ + lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | sed 's/install-compute:/install-cudaruntime:/' \ | mkdef -z #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: - chdef -t osimage -o rhels7.2-ppc64le-install-cudaruntime -p \ - pkgdir=/install/cuda-7.5/ppc64le/cuda-core,/install/cuda-7.5/ppc64le/cuda-deps + chdef -t osimage -o rhels7.5-ppc64le-install-cudaruntime -p \ + pkgdir=/install/cuda-9.2/ppc64le/cuda-core,/install/cuda-9.2/ppc64le/cuda-deps #. Use the provided ``cudaruntime`` pkglist to install the CUDA packages: :: - chdef -t osimage -o rhels7.2-ppc64le-install-cudaruntime \ + chdef -t osimage -o rhels7.5-ppc64le-install-cudaruntime \ pkglist=/opt/xcat/share/xcat/instal/rh/cudaruntime.rhels7.ppc64le.pkglist Diskless images --------------- -The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. 
The osimage definitions will be created from the base ``rhels7.2-ppc64le-netboot-compute`` osimage. +The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage. **[Note]**: For diskless, the install of the CUDA packages MUST be done in the ``otherpkglist`` and **NOT** the ``pkglist`` as with diskful. The requirement for rebooting the machine is not applicable in diskless nodes because the image is loaded on each reboot. @@ -63,7 +63,7 @@ cudafull #. Create a copy of the ``netboot-compute`` image and label it ``cudafull``: :: - lsdef -t osimage -z rhels7.2-ppc64le-netboot-compute \ + lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \ | sed 's/netboot-compute:/netboot-cudafull:/' \ | mkdef -z @@ -71,18 +71,18 @@ cudafull The ``otherpkgdir`` directory can be obtained by running lsdef on the osimage: :: - # lsdef -t osimage rhels7.2-ppc64le-netboot-cudafull -i otherpkgdir - Object name: rhels7.2-ppc64le-netboot-cudafull - otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le + # lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir + Object name: rhels7.5-ppc64le-netboot-cudafull + otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le Create a symbolic link of the CUDA repository in the directory specified by ``otherpkgdir`` :: - ln -s /install/cuda-7.5 /install/post/otherpkgs/rhels7.2/ppc64le/cuda-7.5 + ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2 #. Change the ``rootimgdir`` for the cudafull osimage: :: - chdef -t osimage -o rhels7.2-ppc64le-netboot-cudafull \ - rootimgdir=/install/netboot/rhels7.2/ppc64le/cudafull + chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \ + rootimgdir=/install/netboot/rhels7.5/ppc64le/cudafull #. Create a custom pkglist file to install additional operating system packages for your CUDA node. @@ -102,7 +102,7 @@ cudafull #. 
Set the new file as the ``pkglist`` attribute for the cudafull osimage: :: - chdef -t osimage -o rhels7.2-ppc64le-netboot-cudafull \ + chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \ pkglist=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.pkglist @@ -112,28 +112,28 @@ cudafull vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist # add the following packages - cuda-7.5/ppc64le/cuda-deps/dkms - cuda-7.5/ppc64le/cuda-core/cuda + cuda-9.2/ppc64le/cuda-deps/dkms + cuda-9.2/ppc64le/cuda-core/cuda #. Set the ``otherpkg.pkglist`` attribute for the cudafull osimage: :: - chdef -t osimage -o rhels7.2-ppc64le-netboot-cudafull \ + chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \ otherpkglist=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist #. Generate the image: :: - genimage rhels7.2-ppc64le-netboot-cudafull + genimage rhels7.5-ppc64le-netboot-cudafull #. Package the image: :: - packimage rhels7.2-ppc64le-netboot-cudafull + packimage rhels7.5-ppc64le-netboot-cudafull cudaruntime ^^^^^^^^^^^ #. Create a copy of the ``netboot-compute`` image and label it ``cudaruntime``: :: - lsdef -t osimage -z rhels7.2-ppc64le-netboot-compute \ + lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \ | sed 's/netboot-compute:/netboot-cudaruntime:/' \ | mkdef -z @@ -141,18 +141,18 @@ cudaruntime #. Obtain the ``otherpkgdir`` directory using the ``lsdef`` command: :: - # lsdef -t osimage rhels7.2-ppc64le-netboot-cudaruntime -i otherpkgdir - Object name: rhels7.2-ppc64le-netboot-cudaruntime - otherpkgdir=/install/post/otherpkgs/rhels7.2/ppc64le + # lsdef -t osimage rhels7.5-ppc64le-netboot-cudaruntime -i otherpkgdir + Object name: rhels7.5-ppc64le-netboot-cudaruntime + otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le #. 
Create a symbolic link to the CUDA repository in the directory specified by ``otherpkgdir`` :: - ln -s /install/cuda-7.5 /install/post/otherpkgs/rhels7.2/ppc64le/cuda-7.5 + ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2 #. Change the ``rootimgdir`` for the cudaruntime osimage: :: - chdef -t osimage -o rhels7.2-ppc64le-netboot-cudaruntime \ - rootimgdir=/install/netboot/rhels7.2/ppc64le/cudaruntime + chdef -t osimage -o rhels7.5-ppc64le-netboot-cudaruntime \ + rootimgdir=/install/netboot/rhels7.5/ppc64le/cudaruntime #. Create the ``otherpkg.pkglist`` file to do the install of the CUDA runtime packages: @@ -161,19 +161,53 @@ cudaruntime vi /install/custom/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist # Add the following packages: - cuda-7.5/ppc64le/cuda-deps/dkms - cuda-7.5/ppc64le/cuda-core/cuda-runtime-7-5 + cuda-9.2/ppc64le/cuda-deps/dkms + cuda-9.2/ppc64le/cuda-core/cuda-runtime-9-2 #. Set the ``otherpkg.pkglist`` attribute for the cudaruntime osimage: :: - chdef -t osimage -o rhels7.2-ppc64le-netboot-cudaruntime \ + chdef -t osimage -o rhels7.5-ppc64le-netboot-cudaruntime \ otherpkglist=/install/custom/netboot/rh/cudaruntime.rhels7.ppc64le.otherpkgs.pkglist #. Generate the image: :: - genimage rhels7.2-ppc64le-netboot-cudaruntime + genimage rhels7.5-ppc64le-netboot-cudaruntime #. Package the image: :: - packimage rhels7.2-ppc64le-netboot-cudaruntime + packimage rhels7.5-ppc64le-netboot-cudaruntime + +POWER9 Setup +------------ + +The NVIDIA POWER9 CUDA driver needs some additional setup. Refer to the URL below for details. + +http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup + +xCAT includes a script, `cuda_power9_setup` as example, to help user handle this situation. + +Diskful osimage +^^^^^^^^^^^^^^^ + +For diskful deployment, there is no need to change the osimage definition. Instead, add this postscript to the ``postbootscripts`` list of your compute node: ::
+ + chdef p9compute -p postbootscripts=cuda_power9_setup + +Diskless osimage +^^^^^^^^^^^^^^^^ + +For diskless deployment, the script needs to be added to the postinstall script of the osimage and run in the chroot environment. Refer to the following commands as an example: :: + + mkdir -p /install/custom/netboot/rh + cp /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall /install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall + + cat >>/install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall <<-EOF + + cp /install/postscripts/cuda_power9_setup /install/netboot/rhels7.5/ppc64le/compute/rootimg/tmp/cuda_power9_setup + chroot /install/netboot/rhels7.5/ppc64le/compute/rootimg /tmp/cuda_power9_setup + + rm -f /install/netboot/rhels7.5/ppc64le/compute/rootimg/tmp/cuda_power9_setup + EOF + + chdef -t osimage rhels7.5-ppc64le-netboot-cudafull postinstall=/install/custom/netboot/rh/cudafull.rhels7.ppc64le.postinstall diff --git a/docs/source/advanced/gpu/nvidia/repo/rhels.rst b/docs/source/advanced/gpu/nvidia/repo/rhels.rst index 7cedc2701..056ac682a 100644 --- a/docs/source/advanced/gpu/nvidia/repo/rhels.rst +++ b/docs/source/advanced/gpu/nvidia/repo/rhels.rst @@ -1,31 +1,28 @@ -RHEL 7.2 LE -=========== +RHEL 7.5 +======== #. 
Create a repository on the MN node installing the CUDA Toolkit: :: - # For cuda toolkit name: /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm + # For cuda toolkit name: /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm # extract the contents from the rpm mkdir -p /tmp/cuda cd /tmp/cuda - rpm2cpio /root/cuda-repo-rhel7-7-5-local-7.5-18.ppc64le.rpm | cpio -i -d + rpm2cpio /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm | cpio -i -d - # Create the repo directory under xCAT /install dir for cuda 7.5 - mkdir -p /install/cuda-7.5/ppc64le/cuda-core - cp -r /tmp/cuda/var/cuda-repo-7-5-local/* /install/cuda-7.5/ppc64le/cuda-core + # Create the repo directory under xCAT /install dir for cuda 9.2 + mkdir -p /install/cuda-9.2/ppc64le/cuda-core + cp /tmp/cuda/var/cuda-repo-9-2-local/*.rpm /install/cuda-9.2/ppc64le/cuda-core # Create the yum repo files - createrepo /install/cuda-7.5/ppc64le/cuda-core + createrepo /install/cuda-9.2/ppc64le/cuda-core #. The NVIDIA CUDA Toolkit contains rpms that have dependencies on other external packages (such as ``DKMS``). These are provided by EPEL. 
It's up to the system administrator to obtain the dependency packages and add those to the ``cuda-deps`` directory: :: - mkdir -p /install/cuda-7.5/ppc64le/cuda-deps - cd /install/cuda-7.5/ppc64le/cuda-deps + mkdir -p /install/cuda-9.2/ppc64le/cuda-deps # Copy the DKMS rpm to this directory - ls - dkms-2.2.0.3-30.git.7c3e7c5.el7.noarch.rpm + cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps # Execute createrepo in this directory - createrepo /install/cuda-7.5/ppc64le/cuda-deps - + createrepo /install/cuda-9.2/ppc64le/cuda-deps diff --git a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst index d7b726c05..5d2753e2d 100644 --- a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst +++ b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst @@ -3,15 +3,15 @@ Update NVIDIA Driver If the user wants to update the newer NVIDIA driver on the system, follow the :doc:`Create CUDA software repository ` document to create another repository for the new driver. -The following example assumes the new driver is in ``/install/cuda-7.5/ppc64le/nvidia_new``. +The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``. Diskful ------- #. Change pkgdir for the cuda image: :: - chdef -t osimage -o rhels7.2-ppc64le-install-cudafull \ - pkgdir=/install/cuda-7.5/ppc64le/nvidia_new,/install/cuda-7.5/ppc64le/cuda-deps + chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \ + pkgdir=/install/cuda-9.2/ppc64le/nvidia_new,/install/cuda-9.2/ppc64le/cuda-deps #. 
Use xdsh command to remove all the NVIDIA rpms: :: From 7e6290fb7d658e8b70da1491cbac18118963f684 Mon Sep 17 00:00:00 2001 From: GONG Jie Date: Thu, 22 Mar 2018 14:00:30 +0800 Subject: [PATCH 2/3] Remove tailing spaces --- .../advanced/gpu/nvidia/deploy_cuda_node.rst | 8 +++--- docs/source/advanced/gpu/nvidia/index.rst | 2 +- .../advanced/gpu/nvidia/osimage/rhels.rst | 25 +++++++++---------- .../source/advanced/gpu/nvidia/repo/rhels.rst | 13 +++++----- .../gpu/nvidia/update_nvidia_driver.rst | 15 +++-------- 5 files changed, 26 insertions(+), 37 deletions(-) diff --git a/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst b/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst index b705e85d5..2ea89f7ae 100644 --- a/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst +++ b/docs/source/advanced/gpu/nvidia/deploy_cuda_node.rst @@ -1,15 +1,14 @@ Deploy CUDA nodes ================= -Diskful +Diskful ------- * To provision diskful nodes using osimage ``rhels7.5-ppc64le-install-cudafull``: :: nodeset osimage=rhels7.5-ppc64le-install-cudafull rsetboot net - rpower boot - + rpower boot Diskless -------- @@ -18,5 +17,4 @@ Diskless nodeset osimage=rhels7.5-ppc64le-netboot-cudafull rsetboot net - rpower boot - + rpower boot diff --git a/docs/source/advanced/gpu/nvidia/index.rst b/docs/source/advanced/gpu/nvidia/index.rst index cf673106c..d4306710e 100644 --- a/docs/source/advanced/gpu/nvidia/index.rst +++ b/docs/source/advanced/gpu/nvidia/index.rst @@ -7,7 +7,7 @@ For more information, see NVIDIAs website: https://developer.nvidia.com/cuda-zon xCAT supports CUDA installation for Ubuntu 14.04.3 and RHEL 7.5 on PowerNV (Non-Virtualized) for both diskful and diskless nodes. -Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``. The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs. 
If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package. +Within the NVIDIA CUDA Toolkit, installing the ``cuda`` package will install both the ``cuda-runtime`` and the ``cuda-toolkit``. The ``cuda-toolkit`` is intended for developing CUDA programs and monitoring CUDA jobs. If your particular installation requires only running GPU jobs, it's recommended to install only the ``cuda-runtime`` package. .. toctree:: :maxdepth: 2 diff --git a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst index 163581ba6..5a3e59ffe 100644 --- a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst +++ b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst @@ -1,7 +1,7 @@ RHEL 7.5 ======== -xCAT provides a sample package list (pkglist) files for CUDA. You can find them: +xCAT provides a sample package list (pkglist) files for CUDA. You can find them: * Diskful: ``/opt/xcat/share/xcat/install/rh/cuda*`` * Diskless: ``/opt/xcat/share/xcat/netboot/rh/cuda*`` @@ -9,7 +9,7 @@ xCAT provides a sample package list (pkglist) files for CUDA. You can find them: Diskful images -------------- -The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-install-compute`` osimage. +The following examples will create diskful images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-install-compute`` osimage. **[Note]**: There is a requirement to reboot the machine after the CUDA drivers are installed. To satisfy this requirement, the CUDA software is installed in the ``pkglist`` attribute of the osimage definition where a reboot will happen after the Operating System is installed. 
@@ -20,7 +20,7 @@ cudafull lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | sed 's/install-compute:/install-cudafull:/' \ - | mkdef -z + | mkdef -z #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: @@ -39,7 +39,7 @@ cudaruntime lsdef -t osimage -z rhels7.5-ppc64le-install-compute \ | sed 's/install-compute:/install-cudaruntime:/' \ - | mkdef -z + | mkdef -z #. Add the CUDA repo created in the previous step to the ``pkgdir`` attribute: :: @@ -54,9 +54,9 @@ cudaruntime Diskless images --------------- -The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage. +The following examples will create diskless images for ``cudafull`` and ``cudaruntime``. The osimage definitions will be created from the base ``rhels7.5-ppc64le-netboot-compute`` osimage. -**[Note]**: For diskless, the install of the CUDA packages MUST be done in the ``otherpkglist`` and **NOT** the ``pkglist`` as with diskful. The requirement for rebooting the machine is not applicable in diskless nodes because the image is loaded on each reboot. +**[Note]**: For diskless, the install of the CUDA packages MUST be done in the ``otherpkglist`` and **NOT** the ``pkglist`` as with diskful. The requirement for rebooting the machine is not applicable in diskless nodes because the image is loaded on each reboot. cudafull ^^^^^^^^ @@ -65,16 +65,16 @@ cudafull lsdef -t osimage -z rhels7.5-ppc64le-netboot-compute \ | sed 's/netboot-compute:/netboot-cudafull:/' \ - | mkdef -z + | mkdef -z -#. Verify that the CUDA repo created in the previous step is available in the directory specified by the ``otherpkgdir`` attribute. +#. Verify that the CUDA repo created in the previous step is available in the directory specified by the ``otherpkgdir`` attribute. 
The ``otherpkgdir`` directory can be obtained by running lsdef on the osimage: :: # lsdef -t osimage rhels7.5-ppc64le-netboot-cudafull -i otherpkgdir Object name: rhels7.5-ppc64le-netboot-cudafull otherpkgdir=/install/post/otherpkgs/rhels7.5/ppc64le - + Create a symbolic link of the CUDA repository in the directory specified by ``otherpkgdir`` :: ln -s /install/cuda-9.2 /install/post/otherpkgs/rhels7.5/ppc64le/cuda-9.2 @@ -84,7 +84,7 @@ cudafull chdef -t osimage -o rhels7.5-ppc64le-netboot-cudafull \ rootimgdir=/install/netboot/rhels7.5/ppc64le/cudafull -#. Create a custom pkglist file to install additional operating system packages for your CUDA node. +#. Create a custom pkglist file to install additional operating system packages for your CUDA node. #. Copy the default compute pkglist file as a starting point: :: @@ -111,7 +111,7 @@ cudafull #. Create the otherpkg.pkglist file for cudafull: :: vi /install/custom/netboot/rh/cudafull.rhels7.ppc64le.otherpkgs.pkglist - # add the following packages + # add the following packages cuda-9.2/ppc64le/cuda-deps/dkms cuda-9.2/ppc64le/cuda-core/cuda @@ -137,7 +137,7 @@ cudaruntime | sed 's/netboot-compute:/netboot-cudaruntime:/' \ | mkdef -z -#. Verify that the CUDA repo created previously is available in the directory specified by the ``otherpkgdir`` attribute. +#. Verify that the CUDA repo created previously is available in the directory specified by the ``otherpkgdir`` attribute. #. Obtain the ``otherpkgdir`` directory using the ``lsdef`` command: :: @@ -177,7 +177,6 @@ cudaruntime packimage rhels7.5-ppc64le-netboot-cudaruntime - POWER9 Setup ------------ diff --git a/docs/source/advanced/gpu/nvidia/repo/rhels.rst b/docs/source/advanced/gpu/nvidia/repo/rhels.rst index 056ac682a..ec98e9490 100644 --- a/docs/source/advanced/gpu/nvidia/repo/rhels.rst +++ b/docs/source/advanced/gpu/nvidia/repo/rhels.rst @@ -1,11 +1,10 @@ RHEL 7.5 ======== - #. 
Create a repository on the MN node installing the CUDA Toolkit: :: # For cuda toolkit name: /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm - # extract the contents from the rpm + # extract the contents from the rpm mkdir -p /tmp/cuda cd /tmp/cuda rpm2cpio /path/to/cuda-repo-rhel7-9-2-local-9.2.64-1.ppc64le.rpm | cpio -i -d @@ -14,15 +13,15 @@ RHEL 7.5 mkdir -p /install/cuda-9.2/ppc64le/cuda-core cp /tmp/cuda/var/cuda-repo-9-2-local/*.rpm /install/cuda-9.2/ppc64le/cuda-core - # Create the yum repo files + # Create the yum repo files createrepo /install/cuda-9.2/ppc64le/cuda-core - + #. The NVIDIA CUDA Toolkit contains rpms that have dependencies on other external packages (such as ``DKMS``). These are provided by EPEL. It's up to the system administrator to obtain the dependency packages and add those to the ``cuda-deps`` directory: :: mkdir -p /install/cuda-9.2/ppc64le/cuda-deps - # Copy the DKMS rpm to this directory - cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps + # Copy the DKMS rpm to this directory + cp /path/to/dkms-2.4.0-1.20170926git959bd74.el7.noarch.rpm /install/cuda-9.2/ppc64le/cuda-deps - # Execute createrepo in this directory + # Execute createrepo in this directory createrepo /install/cuda-9.2/ppc64le/cuda-deps diff --git a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst index 5d2753e2d..ad6a31a63 100644 --- a/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst +++ b/docs/source/advanced/gpu/nvidia/update_nvidia_driver.rst @@ -3,7 +3,7 @@ Update NVIDIA Driver If the user wants to update the newer NVIDIA driver on the system, follow the :doc:`Create CUDA software repository ` document to create another repository for the new driver. -The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``. +The following example assumes the new driver is in ``/install/cuda-9.2/ppc64le/nvidia_new``. 
Diskful ------- @@ -13,33 +13,26 @@ Diskful chdef -t osimage -o rhels7.5-ppc64le-install-cudafull \ pkgdir=/install/cuda-9.2/ppc64le/nvidia_new,/install/cuda-9.2/ppc64le/cuda-deps - #. Use xdsh command to remove all the NVIDIA rpms: :: - - xdsh "yum remove *nvidia* -y" + xdsh "yum remove *nvidia* -y" #. Run updatenode command to update NVIDIA driver on the compute node: :: updatenode -S - #. Reboot compute node: :: rpower off rpower on - #. Verify the newer driver level: :: nvidia-smi | grep Driver - - - Diskless -------- -To update a new NVIDIA driver on diskless compute nodes, re-generate the osimage pointing to the new NVIDIA driver repository and reboot the node to load the diskless image. +To update a new NVIDIA driver on diskless compute nodes, re-generate the osimage pointing to the new NVIDIA driver repository and reboot the node to load the diskless image. -Refer to :doc:`Create osimage definitions ` for specific instructions. +Refer to :doc:`Create osimage definitions ` for specific instructions. From f7cf702cfbfe4d61fa356b79f88184d81df9a1d4 Mon Sep 17 00:00:00 2001 From: GONG Jie Date: Thu, 22 Mar 2018 14:01:26 +0800 Subject: [PATCH 3/3] Mionr tweak text format --- docs/source/advanced/gpu/nvidia/osimage/rhels.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst index 5a3e59ffe..77dee5f3f 100644 --- a/docs/source/advanced/gpu/nvidia/osimage/rhels.rst +++ b/docs/source/advanced/gpu/nvidia/osimage/rhels.rst @@ -184,7 +184,7 @@ NVIDIA POWER9 CUDA driver need some additional setup. Refer the URL below for de http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#power9-setup -xCAT includes a script, `cuda_power9_setup` as example, to help user handle this situation. +xCAT includes a script, ``cuda_power9_setup`` as example, to help user handle this situation. Diskful osimage ^^^^^^^^^^^^^^^
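
The diskless POWER9 steps that patch 1/3 adds to ``osimage/rhels.rst`` can be sketched as a small shell helper. This is only an illustration and not part of the patch series: the function name ``build_postinstall``, its three arguments, and the ``cudafull/rootimg`` path are assumptions, and the dry run below works in a scratch directory so nothing under ``/install`` is touched.

```shell
#!/bin/sh
# Sketch only: assemble a custom postinstall script for a diskless osimage.
# build_postinstall and its arguments are hypothetical names, not xCAT APIs.
set -eu

build_postinstall() {
    template=$1   # stock postinstall script to start from
    custom=$2     # custom copy that the osimage would point at
    rootimg=$3    # root image directory of the osimage (assumed layout)

    mkdir -p "$(dirname "$custom")"
    cp "$template" "$custom"

    # Unquoted heredoc: $rootimg is expanded now, so the generated
    # script contains literal paths.
    cat >>"$custom" <<EOF

cp /install/postscripts/cuda_power9_setup $rootimg/tmp/cuda_power9_setup
chroot $rootimg /tmp/cuda_power9_setup
rm -f $rootimg/tmp/cuda_power9_setup
EOF
}

# Dry run: a one-line stand-in template in a scratch directory.
scratch=$(mktemp -d)
printf '#!/bin/sh\n' >"$scratch/compute.postinstall"
build_postinstall "$scratch/compute.postinstall" \
    "$scratch/custom/cudafull.postinstall" \
    /install/netboot/rhels7.5/ppc64le/cudafull/rootimg
cat "$scratch/custom/cudafull.postinstall"
```

On a real management node the template would be the stock ``compute.rhels7.ppc64le.postinstall`` file, and the generated file would then be set as the ``postinstall`` attribute of the osimage with ``chdef``, as in the patch.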