Mirror of https://github.com/xcat2/xcat-core.git, synced 2025-07-30 08:11:20 +00:00
Cleanup service node setup docs
@@ -1,66 +1,18 @@
Appendix B: Diagnostics
=======================

* **root ssh keys not setup** -- If you are prompted for a password when you ssh to
  the service node, check whether /root/.ssh has authorized_keys. If
  the directory does not exist or has no keys, run xdsh service -K on the MN
  to exchange the ssh keys for root. You will be prompted for the root
  password, which should be the password you set for key=system in the
  passwd table.
* **xCAT rpms not on SN** -- On the SN, run rpm -qa | grep xCAT and make sure
  the appropriate xCAT rpms are installed on the service node. See the list of
  xCAT rpms in :ref:`setup_service_node_stateful_label`. If rpms are
  missing, check your install setup as outlined in Build the Service Node
  Stateless Image for diskless or :ref:`setup_service_node_stateful_label` for
  diskful installs.
* **otherpkgs (including xCAT rpms) installation failed on the SN** -- The OS
  repository is not created on the SN. When the "yum" command resolves
  dependencies, it cannot find the rpm packages required by xCATsn (such as
  expect, nmap, and httpd). In this case, check whether the
  ``/install/postscripts/repos/<osver>/<arch>/`` directory exists on the MN.
  If it does not, re-run the "copycds" command, which creates files under the
  ``/install/postscripts/repos/<osver>/<arch>`` directory on the MN. Then
  re-install the SN, and this issue should be gone.
* **Error finding the database/starting xcatd** -- If you run tabdump site on the
  service node and get "Connection failure: IO::Socket::SSL:
  connect: Connection refused at ``/opt/xcat/lib/perl/xCAT/Client.pm``", then
  restart the xcatd daemon and see if it passes by running the command
  service xcatd restart. If it fails with the same error, check whether the
  ``/etc/xcat/cfgloc`` file exists. It should exist and be the same as
  ``/etc/xcat/cfgloc`` on the MN. If it is not there, copy it from the MN to
  the SN, then run service xcatd restart. A missing cfgloc indicates that the
  service node postscripts did not complete successfully. Check that the
  service node postscripts were added to the postscripts table as described in
  :ref:`add_service_node_postscripts_label`.
* **Error accessing database/starting xcatd credential failure** -- If you run
  tabdump site on the service node and get "Connection failure:
  IO::Socket::SSL: SSL connect attempt failed because of handshake
  problems error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
  at ``/opt/xcat/lib/perl/xCAT/Client.pm``", check ``/etc/xcat/cert``. The
  directory should contain the files ca.pem and server-cred.pem. These were
  supposed to be transferred from the MN ``/etc/xcat/cert`` directory during the
  install. Also check the ``/etc/xcat/ca`` directory. This directory should
  contain most files from the ``/etc/xcat/ca`` directory on the MN. You can
  manually copy them from the MN to the SN, recursively. Missing files indicate
  that the service node postscripts did not complete successfully. Check that
  the service node postscripts were added to the postscripts table as described in
  :ref:`add_service_node_postscripts_label`. Then run
  service xcatd restart again and retry tabdump site.
* **Missing ssh hostkeys** -- Check whether ``/etc/xcat/hostkeys`` on the SN
  has the same files as ``/etc/xcat/hostkeys`` on the MN. These are the ssh
  keys that will be installed on the compute nodes, so that root can ssh between
  compute nodes without a password prompt. If they are not there, copy them
  from the MN to the SN. Again, these should have been set up by the
  service node postscripts.
* **root ssh keys not setup** -- If you are prompted for a password when you ssh to the service node, check whether the ``/root/.ssh`` directory on the MN contains an ``authorized_keys`` file. If the directory or the keys do not exist, run ``xdsh service -K`` to exchange the ssh keys for root, where ``service`` is the node group of your service nodes. You will be prompted for the root password, which should be the password you set for ``key=system`` in the passwd table.

* **Errors running hierarchical commands such as xdsh** -- xCAT has a number of
  commands that run hierarchically. That is, the commands are sent from xcatd
  on the management node to the correct service node xcatd, which in turn
  processes the command and sends the results back to xcatd on the management
  node. If a hierarchical command such as xdsh fails with something like
  "Error: Permission denied for request", check ``/var/log/messages`` on the
  management node for errors. One such error is "Request matched no policy
  rule", which may mean that you need to add policy table entries for your
  xCAT management node and service node:
* **xCAT rpms not on SN** -- On the SN, run ``rpm -qa | grep xCAT`` and make sure the appropriate xCAT rpms are installed on the service node. See the list of xCAT rpms in :ref:`setup_service_node_stateful_label`. If rpms are missing, check your install setup as outlined in :ref:`setup_service_node_stateless_label` for diskless or :ref:`setup_service_node_stateful_label` for diskful installs.

* **otherpkgs (including xCAT rpms) installation failed on the SN** -- The OS repository is not created on the SN. When the ``yum`` command resolves dependencies, it cannot find the rpm packages required by xCATsn (such as expect, nmap, and httpd). In this case, check whether the ``/install/postscripts/repos/<osver>/<arch>/`` directory exists on the MN. If it does not, re-run the ``copycds`` command, which creates files under the ``/install/postscripts/repos/<osver>/<arch>`` directory on the MN. Then re-install the SN.

* **Error finding the database/starting xcatd** -- If you run ``tabdump site`` on the service node and get "Connection failure: IO::Socket::SSL: connect: Connection refused at ``/opt/xcat/lib/perl/xCAT/Client.pm``", restart the xcatd daemon with ``service xcatd restart`` and see if the error clears. If it fails with the same error, check whether the ``/etc/xcat/cfgloc`` file exists. It should exist and be identical to ``/etc/xcat/cfgloc`` on the MN. If it is not there, copy it from the MN to the SN, then run ``service xcatd restart``. A missing ``cfgloc`` indicates that the service node postscripts did not complete successfully. Run ``lsdef <service node> -i postscripts -c`` and verify that the ``servicenode`` postscript appears in the list.

* **Error accessing database/starting xcatd credential failure** -- If you run ``tabdump site`` on the service node and get "Connection failure: IO::Socket::SSL: SSL connect attempt failed because of handshake problems error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca at ``/opt/xcat/lib/perl/xCAT/Client.pm``", check ``/etc/xcat/cert``. The directory should contain the files ``ca.pem`` and ``server-cred.pem``, which are supposed to be transferred from the MN ``/etc/xcat/cert`` directory during the install. Also check the ``/etc/xcat/ca`` directory, which should contain most files from the ``/etc/xcat/ca`` directory on the MN. You can copy them recursively from the MN to the SN manually. Missing files indicate that the service node postscripts did not complete successfully. Run ``lsdef <service node> -i postscripts -c`` and verify that the ``servicenode`` postscript appears in the list. Then run ``service xcatd restart`` again and retry ``tabdump site``.

* **Missing ssh hostkeys** -- Check whether ``/etc/xcat/hostkeys`` on the SN has the same files as ``/etc/xcat/hostkeys`` on the MN. These are the ssh keys that will be installed on the compute nodes, so that root can ssh between compute nodes without a password prompt. If they are not there, copy them from the MN to the SN. Again, these should have been set up by the service node postscripts.

* **Errors running hierarchical commands such as xdsh** -- xCAT has a number of commands that run hierarchically. That is, the commands are sent from xcatd on the management node to the correct service node xcatd, which in turn processes the command and sends the results back to xcatd on the management node. If a hierarchical command such as ``xdsh`` fails with something like "Error: Permission denied for request", check ``/var/log/messages`` on the management node for errors. One such error is "Request matched no policy rule", which may mean that you need to add policy table entries for your xCAT management node and service node.

* **/install is not mounted on the service node from the management node** -- If the service node does not have the ``/install`` directory mounted from the management node, run ``lsdef -t site clustersite -i installloc`` and verify that ``installloc`` is set to ``/install``.
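The file checks in the items above (cert files, ``cfgloc``, and ``hostkeys``) can be scripted on the SN. The following is a minimal sketch, assuming bash and that the MN copies of the relevant files have already been fetched to a local path on the SN (for example with ``scp``); the function names ``check_cert_dir``, ``missing_files`` and ``same_cfgloc`` are hypothetical and not part of xCAT.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of xCAT) for the file checks above.
# Each returns non-zero when the check fails.

# Verify that a cert directory (e.g. /etc/xcat/cert) holds ca.pem and
# server-cred.pem; print each missing file.
check_cert_dir() {
    local dir="$1" rc=0 f
    for f in ca.pem server-cred.pem; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            rc=1
        fi
    done
    return "$rc"
}

# Print names found in the MN copy of a directory (e.g. /etc/xcat/hostkeys)
# but absent from the SN copy. Assumes both copies are local paths.
missing_files() {
    local mn_dir="$1" sn_dir="$2"
    comm -23 <(ls -1 "$mn_dir" | sort) <(ls -1 "$sn_dir" | sort)
}

# Succeed only if two copies of cfgloc are byte-for-byte identical.
same_cfgloc() {
    cmp -s "$1" "$2"
}

# Example usage on the SN, assuming the MN copies were fetched to /tmp/mn:
#   check_cert_dir /etc/xcat/cert
#   missing_files /tmp/mn/hostkeys /etc/xcat/hostkeys
#   same_cfgloc /tmp/mn/cfgloc /etc/xcat/cfgloc || echo "cfgloc differs"
```

Only file names are compared in ``missing_files``; a file that exists on both sides but with different content would need ``cmp`` per file, as ``same_cfgloc`` does for ``cfgloc``.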
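The two ``lsdef`` checks in the diagnostics above (the ``servicenode`` postscript and the ``installloc`` site attribute) can also be automated by piping ``lsdef`` output into a small filter. This is only a sketch: it assumes the ``<node>: postscripts=a,b,c`` line printed by ``lsdef -c`` and the ``installloc=...`` line printed by ``lsdef -t site``, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical filters (not part of xCAT) that read lsdef output on stdin:
#   lsdef <service node> -i postscripts -c | has_servicenode_postscript
#   lsdef -t site clustersite -i installloc | installloc_is_install

# Succeed if "servicenode" is a whole entry of the comma-separated
# postscripts list (assumes a "<node>: postscripts=a,b,c" line).
has_servicenode_postscript() {
    local line list
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *postscripts=*)
                list="${line#*postscripts=}"
                # Wrap in commas so "servicenodes" or "myservicenode" do not match.
                case ",$list," in
                    *,servicenode,*) return 0 ;;
                esac
                ;;
        esac
    done
    return 1
}

# Succeed if installloc is set to /install; double quotes and anything
# after the first whitespace character are ignored.
installloc_is_install() {
    local line value
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *installloc=*)
                value="${line#*installloc=}"
                value="${value//\"/}"
                value="${value%%[[:space:]]*}"
                [ "$value" = "/install" ] && return 0
                ;;
        esac
    done
    return 1
}
```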
@@ -6,33 +6,51 @@ Diskful (Stateful) Installation

Any cluster using statelite compute nodes must use stateful (diskful) Service Nodes.

**Note: All xCAT Service Nodes must be at the exact same xCAT version as the xCAT Management Node**. Copy the files to the Management Node (MN) and untar them in the appropriate sub-directory of ``/install/post/otherpkgs``.
**Note:** All xCAT Service Nodes must be at the exact same xCAT version as the xCAT Management Node.

**Note for the appropriate directory below, check the ``otherpkgdir=/install/post/otherpkgs/rhels7/x86_64`` attribute of the osimage defined for the servicenode.**

For example, for osimage rhels7-x86_64-install-service ::
Configure ``otherpkgdir`` and ``otherpkglist`` for service node osimage
-----------------------------------------------------------------------

    mkdir -p /install/post/otherpkgs/**rhels7**/x86_64/xcat
    cd /install/post/otherpkgs/**rhels7**/x86_64/xcat
* Create a subdirectory ``xcat`` under the path specified by the ``otherpkgdir`` attribute of the service node osimage selected during the :doc:`../define_service_nodes` step.

  For example, for osimage *rhels7-x86_64-install-service* ::

      [root@fs4 xcat]# lsdef -t osimage rhels7-x86_64-install-service -i otherpkgdir
      Object name: rhels7-x86_64-install-service
          otherpkgdir=/install/post/otherpkgs/rhels7/x86_64
      [root@fs4 xcat]# mkdir -p /install/post/otherpkgs/rhels7/x86_64/xcat

* Download or copy the ``xcat-core`` and ``xcat-dep`` .bz2 files into that ``xcat`` directory ::

      cd /install/post/otherpkgs/<os>/<arch>/xcat
      wget https://xcat.org/files/xcat/xcat-core/<version>_Linux/xcat-core/xcat-core-<version>-linux.tar.bz2
      wget https://xcat.org/files/xcat/xcat-dep/<version>_Linux/xcat-dep-<version>-linux.tar.bz2

* Untar the ``xcat-core`` and ``xcat-dep`` .bz2 files ::

      cd /install/post/otherpkgs/<os>/<arch>/xcat
      tar jxvf xcat-core-*.tar.bz2
      tar jxvf xcat-dep-*.tar.bz2

Next, add rpm names into your own version of the service.<osver>.<arch>.otherpkgs.pkglist file. In most cases, you can find an initial copy of this file under ``/opt/xcat/share/xcat/install/<platform>``. Or copy one from another similar platform. ::
* Verify that the following entries are included in the package list file specified by the ``otherpkglist`` attribute of the service node osimage ::

    mkdir -p /install/custom/install/rh
    cp /opt/xcat/share/xcat/install/rh/service.rhels7.x86_64.otherpkgs.pkglist \
       /install/custom/install/rh
    vi /install/custom/install/rh/service.rhels7.x86_64.otherpkgs.pkglist
      xcat/xcat-core/xCATsn
      xcat/xcat-dep/<os>/<arch>/conserver-xcat
      xcat/xcat-dep/<os>/<arch>/perl-Net-Telnet
      xcat/xcat-dep/<os>/<arch>/perl-Expect

Make sure the following entries are included in the
/install/custom/install/rh/service.rhels7.x86_64.otherpkgs.pkglist: ::
  For example, for the osimage *rhels7-x86_64-install-service* ::

    xCATsn
    conserver-xcat
    perl-Net-Telnet
    perl-Expect
      [root@fs4 ~]# lsdef -t osimage rhels7-x86_64-install-service -i otherpkglist
      Object name: rhels7-x86_64-install-service
          otherpkglist=/opt/xcat/share/xcat/install/rh/service.rhels7.x86_64.otherpkgs.pkglist
      [root@fs4 ~]# cat /opt/xcat/share/xcat/install/rh/service.rhels7.x86_64.otherpkgs.pkglist
      xcat/xcat-core/xCATsn
      xcat/xcat-dep/rh7/x86_64/conserver-xcat
      xcat/xcat-dep/rh7/x86_64/perl-Net-Telnet
      xcat/xcat-dep/rh7/x86_64/perl-Expect
      [root@fs4 ~]#
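The path of the ``xcat`` subdirectory created in the first step can be derived directly from the ``lsdef`` output instead of being typed by hand. A minimal sketch, assuming bash and the ``otherpkgdir=<path>`` line format shown in the example; ``xcat_pkg_dir`` is a hypothetical helper, not an xCAT command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xCAT): read the output of
#   lsdef -t osimage <osimage> -i otherpkgdir
# on stdin and print the "xcat" subdirectory to create under otherpkgdir.
xcat_pkg_dir() {
    local line dir=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *otherpkgdir=*) dir="${line#*otherpkgdir=}" ;;
        esac
    done
    # Fail when no otherpkgdir attribute was found in the input.
    [ -n "$dir" ] || return 1
    # Strip one trailing slash so the result does not contain "//".
    printf '%s/xcat\n' "${dir%/}"
}

# Example usage on the MN:
#   dir=$(lsdef -t osimage rhels7-x86_64-install-service -i otherpkgdir | xcat_pkg_dir) \
#       && mkdir -p "$dir" && cd "$dir"
```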
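The "verify the following entries" step for the ``otherpkglist`` file can be checked mechanically with ``grep``. A sketch, assuming bash and that each package entry sits on its own line in the package list file; ``missing_pkglist_entries`` is a hypothetical helper that prints every required entry not found as an exact line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xCAT): print each required entry that is
# not present as an exact line in the given otherpkglist file.
# Returns 0 when nothing is missing, 1 otherwise.
missing_pkglist_entries() {
    local pkglist="$1" entry rc=0
    shift
    for entry in "$@"; do
        if ! grep -Fxq -- "$entry" "$pkglist"; then
            echo "$entry"
            rc=1
        fi
    done
    return "$rc"
}

# Example usage with the rhels7 x86_64 entries from the example above:
#   missing_pkglist_entries \
#       /opt/xcat/share/xcat/install/rh/service.rhels7.x86_64.otherpkgs.pkglist \
#       xcat/xcat-core/xCATsn \
#       xcat/xcat-dep/rh7/x86_64/conserver-xcat \
#       xcat/xcat-dep/rh7/x86_64/perl-Net-Telnet \
#       xcat/xcat-dep/rh7/x86_64/perl-Expect
```

Exact whole-line matching (``grep -Fx``) is used so that an entry such as ``perl-Expect`` is not reported as present merely because a longer name contains it.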

**Note: you will be installing the xCAT Service Node rpm xCATsn meta-package on the Service Node, not the xCAT Management Node meta-package. Do not install both.**
**Note:** You will be installing the xCAT Service Node meta-package xCATsn on the Service Node, not the xCAT Management Node meta-package. Do not install both.

Update the rhels6 RPM repository (rhels6 only)
----------------------------------------------
@@ -71,12 +89,12 @@ Update the rhels6 RPM repository (rhels6 only)

**Note:** You should use comps-rhel6-Server.xml with its key as the group file.

Set the node status to ready for installation
---------------------------------------------
Set the node boot state to ready for installation
-------------------------------------------------

Run nodeset to the osimage name defined in the provmethod attribute on your Service Node. ::
Run the **nodeset** command to set your Service Node to the osimage named in its ``provmethod`` attribute. ::

    nodeset service osimage="<osimagename>"
    nodeset <service_node> osimage="<osimagename>"
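To avoid retyping the osimage name, the ``nodeset`` command line can be built from the service node's ``provmethod`` attribute. A sketch, assuming bash and the ``<node>: provmethod=<osimage>`` format printed by ``lsdef <service_node> -i provmethod -c``; the helper name is hypothetical, and it only prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xCAT): read the output of
#   lsdef <service_node> -i provmethod -c
# on stdin (assumed format "<node>: provmethod=<osimage>") and print the
# matching nodeset command. It only prints; review it before running it.
nodeset_cmd_from_provmethod() {
    local line node="" image=""
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            *": provmethod="*)
                node="${line%%:*}"
                image="${line#*provmethod=}"
                ;;
        esac
    done
    [ -n "$node" ] && [ -n "$image" ] || return 1
    printf 'nodeset %s osimage="%s"\n' "$node" "$image"
}
```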

For example ::
@@ -1,3 +1,5 @@

.. _setup_service_node_stateless_label:

Diskless (Stateless) Installation
=================================
@@ -2,10 +2,10 @@ Verify Service Node Installation
================================

* ssh to the service nodes. You should not be prompted for a password.
* Check that the xcat daemon xcatd is running.
* Run some database command on the service node, e.g. tabdump site or nodels,
  and verify that the database can be accessed from the service node.
* Check that ``/install`` and ``/tftpboot`` are mounted on the service node
  from the Management Node, if appropriate.
* Make sure that the Service Node has name resolution for all nodes it will
  service.
* Check that the xcat daemon ``xcatd`` is running.
* Run some database command on the service node, e.g. ``tabdump site`` or ``nodels``, and verify that the database can be accessed from the service node.
* Check that ``/install`` and ``/tftpboot`` are mounted on the service node from the Management Node, if appropriate.
* Make sure that the Service Node has name resolution for all nodes it will service.
* Run ``updatenode <compute node> -V -s`` on the management node and verify that the output contains ``Running command on <service node>``, which indicates that the management node sent the command to the service node to run against the compute node.
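The ``updatenode`` check in the last item can be scripted by filtering the command's output. A sketch, assuming bash and awk, and that the service node name directly follows the text ``Running command on`` as quoted above; ``dispatched_via_sn`` is a hypothetical helper. It compares the whole name, so that, for example, ``sn1`` does not match ``sn10``.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of xCAT): read updatenode output on stdin
# and succeed if a line contains "Running command on <sn>", where the word
# after that text must equal the service node name exactly.
dispatched_via_sn() {
    awk -v sn="$1" '
        {
            prefix = "Running command on "
            i = index($0, prefix)
            if (i > 0) {
                rest = substr($0, i + length(prefix))
                # Take the name up to the first character that cannot
                # appear in a host name.
                split(rest, w, /[^A-Za-z0-9._-]/)
                if (w[1] == sn) found = 1
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Example usage on the MN:
#   updatenode <compute node> -V -s 2>&1 | dispatched_via_sn <service node>
```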

See :doc:`Appendix B <../appendix/appendix_b_diagnostics>` for possible solutions.