mirror of https://github.com/xcat2/xcat-core.git
synced 2025-05-22 11:42:05 +00:00

Add hierarchy doc

Add hierarchy documents including introduction, database management, defining
node, service node setup, test service node and appendix. TODO: resolve the
fake_todo link reference in the following patch.

This commit is contained in:
parent 4bbde50119
commit 57c7dca6da

docs/source/advanced/hierarchy/appendix_a_setup_backup_service_nodes.rst
@ -0,0 +1,167 @@

Appendix A: Setup backup Service Nodes
======================================

For reliability, availability, and serviceability purposes you may wish to
designate backup service nodes in your hierarchical cluster. The backup
service node will be another active service node that is set up to easily
take over from the original service node if a problem occurs. This is not an
automatic failover feature. You will have to initiate the switch from the
primary service node to the backup manually. xCAT will handle most of the
setup and the transfer of the nodes to the new service node. This procedure
can also be used to simply switch some compute nodes to a new service node,
for example, for planned maintenance.

Abbreviations used below:

* MN - management node.
* SN - service node.
* CN - compute node.

Initial deployment
------------------

Integrate the following steps into the hierarchical deployment process
described above.

#. Make sure both the primary and backup service nodes are installed,
   configured, and can access the MN database.
#. When defining the CNs, add the necessary service node values to the
   "servicenode" and "xcatmaster" attributes of the `node definitions
   <http://localhost/fake_todo>`_.
#. (Optional) Create an xCAT group for the nodes that are assigned to each SN.
   This will be useful when setting node attributes as well as providing an
   easy way to switch a set of nodes back to their original server.

To specify a backup service node you must specify a comma-separated list of
two **service nodes** for the servicenode value of the compute node. The first
one is the primary and the second is the backup (or new SN) for that node.
Use the hostnames of the SNs as known by the MN.

For the **xcatmaster** value you should only include the primary SN, as known
by the compute node.

In most hierarchical clusters, the networking is such that the name of the
SN as known by the MN is different than the name as known by the CN (if
they are on different networks).

The following example assumes the SN interface to the MN is on the "a"
network and the interface to the CN is on the "b" network. To set the
attributes you would run a command similar to the following: ::

    chdef <noderange> servicenode="xcatsn1a,xcatsn2a" xcatmaster="xcatsn1b"

The process can be simplified by creating xCAT node groups to use as the
<noderange> in the `chdef <http://localhost/fake_todo>`_ command. To create an
xCAT node group named sn1group containing all the nodes that have the service
node "sn1", you could run a command similar to the following: ::

    mkdef -t group sn1group members=node[01-20]

**Note: Normally backup service nodes are the primary SNs for other compute
nodes. So, for example, if you have 2 SNs, configure half of the CNs to use
the 1st SN as their primary SN, and the other half of CNs to use the 2nd SN
as their primary SN. Then each SN would be configured to be the backup SN
for the other half of CNs.**

When you run `makedhcp <http://localhost/fake_todo>`_, it will configure dhcp
and tftp on both the primary and backup SNs, assuming they both have network
access to the CNs. This will make it possible to do a quick SN takeover
without having to wait for replication when you need to switch.

xdcp Behaviour with backup servicenodes
---------------------------------------

The xdcp command in a hierarchical environment must first copy (scp) the
files to the service nodes so that they are available to scp to the node from
the service node that is its master. The files are placed in the
``/var/xcat/syncfiles`` directory by default, or in the directory set in the
site table SNsyncfiledir attribute. If the node has multiple service nodes
assigned, then xdcp will copy the file to each of the service nodes assigned
to the node. For example, here the files will be copied (scp) to both
service1 and rhsn: ::

    lsdef cn4 | grep servicenode
        servicenode=service1,rhsn
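
If you want the files staged somewhere other than the default, the
SNsyncfiledir site attribute can be changed; a sketch (the path here is just
an example): ::

    chdef -t site clustersite SNsyncfiledir=/data/xcat/syncfiles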

If a service node is offline (e.g. service1), then you will see errors on
your xdcp command, and yet if rhsn is online then the xdcp will actually
work. This may be a little confusing. For example, here service1 is offline,
but we are able to use rhsn to complete the xdcp: ::

    xdcp cn4 /tmp/lissa/file1 /tmp/file1

    service1: Permission denied (publickey,password,keyboard-interactive).
    service1: Permission denied (publickey,password,keyboard-interactive).
    service1: lost connection
    The following servicenodes: service1, have errors and cannot be updated
    Until the error is fixed, xdcp will not work to nodes serviced by these service nodes.

    xdsh cn4 ls /tmp/file1
    cn4: /tmp/file1

Switch to the backup SN
-----------------------

When an SN fails, or you want to bring it down for maintenance, use this
procedure to move its CNs over to the backup SN.

Move the nodes to the new service nodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the xCAT `snmove <http://localhost/fake_todo>`_ command to make the
database updates necessary to move a set of nodes from one service node to
another, and to make configuration modifications to the nodes.

For example, if you want to switch all the compute nodes that use service
node "sn1" to the backup SN (sn2), run: ::

    snmove -s sn1

Modified database attributes
""""""""""""""""""""""""""""

The **snmove** command will check and set several node attribute values.

**servicenode**: This will be set to either the second server name in the
servicenode attribute list or the value provided on the command line.

**xcatmaster**: Set with either the value provided on the command line or it
will be automatically determined from the servicenode attribute.

**nfsserver**: If the value is set with the source service node then it will
be set to the destination service node.

**tftpserver**: If the value is set with the source service node then it will
be reset to the destination service node.

**monserver**: If set to the source service node then reset it to the
destination servicenode and xcatmaster values.

**conserver**: If set to the source service node then reset it to the
destination servicenode and run **makeconservercf**.

Run postscripts on the nodes
""""""""""""""""""""""""""""

If the CNs are up at the time the snmove command is run, then snmove will run
postscripts on the CNs to reconfigure them for the new SN. The "syslog"
postscript is always run. The "mkresolvconf" and "setupntp" scripts will be
run if they were included in the nodes' postscript list.

You can also specify an additional list of postscripts to run.

Modify system configuration on the nodes
""""""""""""""""""""""""""""""""""""""""

If the CNs are up, the snmove command will also perform some configuration on
the nodes, such as setting the default gateway and modifying some
configuration files used by xCAT.

Switching back
--------------

The process for switching nodes back will depend on what must be done to
recover the original service node. If the SN needed to be reinstalled, you
need to set it up as an SN again and make sure the CN images are replicated
to it. Once you've done this, or if the SN's configuration was not lost,
then follow these steps to move the CNs back to their original SN:

* Use snmove: ::

    snmove sn1group -d sn1

docs/source/advanced/hierarchy/appendix_b_diagnostics.rst
@ -0,0 +1,66 @@

Appendix B: Diagnostics
=======================

* **root ssh keys not setup** -- If you are prompted for a password when you
  ssh to the service node, check whether /root/.ssh has authorized_keys. If
  the directory does not exist or has no keys, run ``xdsh service -K`` on the
  MN to exchange the ssh keys for root. You will be prompted for the root
  password, which should be the password you set for key=system in the
  passwd table.
* **xCAT rpms not on SN** -- On the SN, run ``rpm -qa | grep xCAT`` and make
  sure the appropriate xCAT rpms are installed on the service node. See the
  list of xCAT rpms in :ref:`setup_service_node_stateful_label`. If rpms are
  missing, check your install setup as outlined in Build the Service Node
  Stateless Image for diskless installs, or
  :ref:`setup_service_node_stateful_label` for diskful installs.
* **otherpkgs (including xCAT rpms) installation failed on the SN** -- The OS
  repository is not created on the SN. When the "yum" command is processing
  the dependencies, the rpm packages (including expect, nmap, httpd, etc.)
  required by xCATsn can't be found. In this case, check whether the
  ``/install/postscripts/repos/<osver>/<arch>/`` directory exists on the MN.
  If it is not on the MN, you need to re-run the "copycds" command, which
  will create some files under the
  ``/install/postscripts/repos/<osver>/<arch>`` directory on the MN. Then
  re-install the SN, and this issue should be gone.
* **Error finding the database/starting xcatd** -- If, on the service node,
  running tabdump site gives "Connection failure: IO::Socket::SSL:
  connect: Connection refused at ``/opt/xcat/lib/perl/xCAT/Client.pm``",
  restart the xcatd daemon and see if it passes by running the command
  ``service xcatd restart``. If it fails with the same error, check whether
  the ``/etc/xcat/cfgloc`` file exists. It should exist and be the same as
  ``/etc/xcat/cfgloc`` on the MN. If it is not there, copy it from the MN to
  the SN, then run ``service xcatd restart``. This indicates the servicenode
  postscripts did not complete successfully. Check that your postscripts
  table was set up correctly, as described in
  :ref:`add_service_node_postscripts_label`.
* **Error accessing database/starting xcatd credential failure** -- If you run
  tabdump site on the service node and you get "Connection failure:
  IO::Socket::SSL: SSL connect attempt failed because of handshake
  problems error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
  at ``/opt/xcat/lib/perl/xCAT/Client.pm``", check ``/etc/xcat/cert``. The
  directory should contain the files ca.pem and server-cred.pem. These were
  supposed to be transferred from the MN ``/etc/xcat/cert`` directory during
  the install. Also check the ``/etc/xcat/ca`` directory. This directory
  should contain most files from the ``/etc/xcat/ca`` directory on the MN.
  You can manually copy them from the MN to the SN, recursively. This
  indicates that the servicenode postscripts did not complete successfully.
  Check that your postscripts table was set up correctly, as described in
  :ref:`add_service_node_postscripts_label`. Then run
  ``service xcatd restart`` and try the tabdump site again.
* **Missing ssh hostkeys** -- Check whether ``/etc/xcat/hostkeys`` on the SN
  has the same files as ``/etc/xcat/hostkeys`` on the MN. These are the ssh
  keys that will be installed on the compute nodes, so root can ssh between
  compute nodes without password prompting. If they are not there, copy them
  from the MN to the SN. Again, these should have been set up by the
  servicenode postscripts.

* **Errors running hierarchical commands such as xdsh** -- xCAT has a number
  of commands that run hierarchically. That is, the commands are sent from
  xcatd on the management node to the correct service node xcatd, which in
  turn processes the command and sends the results back to xcatd on the
  management node. If a hierarchical command such as xdsh fails with something
  like "Error: Permission denied for request", check ``/var/log/messages`` on
  the management node for errors. One error might be "Request matched no
  policy rule". This may mean you will need to add policy table entries for
  your xCAT management node and service node:
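
  For example, a sketch (the hostnames are placeholders, and the priority
  values just need to be unused in your policy table): ::

      mkdef -t policy -o 1.4 name=<your MN hostname> rule=trusted
      mkdef -t policy -o 1.5 name=<your SN hostname> rule=trusted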

docs/source/advanced/hierarchy/appendix_c_migrating_mn_to_sn.rst
@ -0,0 +1,14 @@

Appendix C: Migrating a Management Node to a Service Node
=========================================================

If you find you want to convert an existing Management Node to a Service
Node, you need to work with the xCAT team. It is recommended, for now, to
back up your database, set up your new Management Server, and restore your
database into it. Take the old Management Node and remove xCAT, all xCAT
directories, and your database. See `Uninstalling_xCAT
<http://localhost/fake_todo>`_ and then follow the process for setting up an
SN as if it is a new node.
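
A sketch of the backup/restore step (``/tmp/xcatdb-backup`` is just an example
path; dumpxCATdb and restorexCATdb are the xCAT table export/import
commands): ::

    # on the old Management Node
    dumpxCATdb -p /tmp/xcatdb-backup

    # on the new Management Server, after xCAT is installed
    restorexCATdb -p /tmp/xcatdb-backup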

docs/source/advanced/hierarchy/appendix_d_set_up_hierarchical_conserver.rst
@ -0,0 +1,13 @@

Appendix D: Set up Hierarchical Conserver
=========================================

To allow you to open the rcons from the Management Node using the
conserver daemon on the Service Nodes, do the following:

* Set nodehm.conserver to be the service node (using the IP that faces the
  management node): ::

    chdef <noderange> conserver=<servicenodeasknownbytheMN>
    makeconservercf
    service conserver stop
    service conserver start

docs/source/advanced/hierarchy/configure_dhcp.rst
@ -0,0 +1,11 @@

Configure DHCP
==============

Add the relevant networks into the DHCP configuration, refer to:
`XCAT_pLinux_Clusters/#setup-dhcp <http://localhost/fake_todo>`_

Add the defined nodes into the DHCP configuration, refer to:
`XCAT_pLinux_Clusters/#configure-dhcp <http://localhost/fake_todo>`_
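
A minimal sketch of the commands behind those two steps (makenetworks and
makedhcp; see the links above for the full procedure): ::

    makenetworks      # (re)create the networks table entries
    makedhcp -n       # generate a new dhcpd configuration
    makedhcp -a       # add the defined nodes into the dhcp configuration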

@ -0,0 +1,39 @@

Define and install your Compute Nodes
=====================================

Make /install available on the Service Nodes
--------------------------------------------

Note that all of the files and directories pointed to by your osimages should
be placed under the directory referred to in site.installdir (usually
/install), so they will be available to the service nodes. The installdir
directory is mounted or copied to the service nodes during the hierarchical
installation.

If you are not using the NFS-based statelite method of booting your compute
nodes and you are not using service node pools, set the installloc attribute
to "/install". This instructs the service node to mount /install from the
management node. (If you don't do this, you have to manually sync /install
between the management node and the service nodes.) ::

    chdef -t site clustersite installloc="/install"

Make compute node syncfiles available on the service nodes
----------------------------------------------------------

If you are not using the NFS-based statelite method of booting your compute
nodes, and you plan to use the syncfiles postscript to update files on the
nodes during install, you must ensure that those files are synced to the
service nodes before the install of the compute nodes. To do this, after your
nodes are defined, you will need to run the following whenever the files in
your synclist change on the Management Node: ::

    updatenode <computenoderange> -f

At this point you can return to the documentation for your cluster environment
to define and deploy your compute nodes.

docs/source/advanced/hierarchy/define_service_node.rst
@ -0,0 +1,345 @@

Define the service nodes in the database
========================================

This document assumes that you have previously **defined** your compute nodes
in the database. It is also possible at this point that you have generic
entries in your db for the nodes you will use as service nodes, as a result
of the node discovery process. We are now going to show you how to add all
the relevant database data for the service nodes (SN) such that the SN can be
installed and managed from the Management Node (MN). In addition, you will
be adding the information to the database that will tell xCAT which service
nodes (SN) will service which compute nodes (CN).

For this example, we have two service nodes: **sn1** and **sn2**. We will call
our Management Node: **mn1**. Note: service nodes are, by convention, in a
group called **service**. Some of the commands in this document will use the
group **service** to update all service nodes.

Note: a service node's service node is the Management Node, so a service node
must have a direct connection to the management node. The compute nodes do not
have to be directly attached to the Management Node, only to their service
node. This will all have to be defined in your networks table.

Add Service Nodes to the nodelist Table
---------------------------------------

Define your service nodes (if not defined already), and by convention put
them in a **service** group. We usually have a group compute for our compute
nodes, to distinguish between the two types of nodes. (If you want to use your
own group name for service nodes, rather than service, you need to change some
defaults in the xCAT db that use the group name service. For example, in the
postscripts table there is by default a group entry for service, with the
appropriate postscripts to run when installing a service node. Also, the
default ``kickstart/autoyast`` template, pkglist, etc. that will be used have
file names based on the profile name service.) ::

    mkdef sn1,sn2 groups=service,ipmi,all

Add OS and Hardware Attributes to Service Nodes
-----------------------------------------------

When you ran copycds, it created several osimage definitions, including some
appropriate for SNs. Display the list of osimages and choose one with
"service" in the name: ::

    lsdef -t osimage

For this example, let's assume you chose the stateful osimage definition for
rhels 6.3: rhels6.3-x86_64-install-service. If you want to modify any of the
osimage attributes (e.g. ``kickstart/autoyast`` template, pkglist, etc.),
make a copy of the osimage definition and also copy to ``/install/custom``
any files it points to that you are modifying.

Now set some of the common attributes for the SNs at the group level: ::

    chdef -t group service arch=x86_64 \
        os=rhels6.3 \
        nodetype=osi \
        profile=service \
        netboot=xnba installnic=mac \
        primarynic=mac \
        provmethod=rhels6.3-x86_64-install-service

Add Service Nodes to the servicenode Table
------------------------------------------

An entry must be created in the servicenode table for each service node or for
the service group. This table describes all the services you would like xCAT
to set up on the service nodes. (Even if you don't want xCAT to set up any
services -- unlikely -- you must define the service nodes in the servicenode
table with at least one attribute set (you can set it to 0), otherwise it will
not be recognized as a service node.)

When the xcatd daemon is started or restarted on the service node, it will
make sure all of the requested services are configured and started. (To
temporarily avoid this when restarting xcatd, use "service xcatd reload"
instead.)

To set up the minimum recommended services on the service nodes: ::

    chdef -t group -o service setupnfs=1 \
        setupdhcp=1 setuptftp=1 \
        setupnameserver=1 \
        setupconserver=1

.. TODO

See the setup* attributes in the `node object definition man page
<http://localhost/fake_todo>`_ for the services available. (The HTTP server
is also started when setupnfs is set.)

If you are using the setupntp postscript on the compute nodes, you should also
set setupntp=1. For clusters with subnetted management networks (i.e. the
network between the SN and its compute nodes is separate from the network
between the MN and the SNs) you might want to also set setupipforward=1.
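
For example: ::

    chdef -t group -o service setupntp=1 setupipforward=1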

.. _add_service_node_postscripts_label:

Add Service Node Postscripts
----------------------------

By default, xCAT defines the service node group to have the "servicenode"
postscript run when the SNs are installed or diskless booted. This
postscript sets up the xcatd credentials and installs the xCAT software on
the service nodes. If you have your own postscript that you want run on the
SN during deployment of the SN, put it in ``/install/postscripts`` on the MN
and add it to the service node postscripts or postbootscripts. For example: ::

    chdef -t group -p service postscripts=<mypostscript>

Notes:

* For Red Hat type distros, the postscripts will be run before the reboot
  of a kickstart install, and the postbootscripts will be run after the
  reboot.
* Make sure that the servicenode postscript is set to run before the
  otherpkgs postscript or you will see errors during the service node
  deployment.
* The -p flag automatically adds the specified postscript at the end of the
  comma-separated list of postscripts (or postbootscripts).

If you are running additional software on the service nodes that needs **ODBC**
to access the database (e.g. LoadLeveler or TEAL), use this command to add
the xCAT supplied postbootscript called "odbcsetup": ::

    chdef -t group -p service postbootscripts=odbcsetup

Assigning Nodes to their Service Nodes
--------------------------------------

The node attributes **servicenode** and **xcatmaster** define which SN
services this particular node. The servicenode attribute for a compute node
defines which SN the MN should send a command to (e.g. xdsh), and should be
set to the hostname or IP address of the service node as the management node
knows it. The xcatmaster attribute of the compute node defines which SN the
compute node should boot from, and should be set to the hostname or IP
address of the service node as the compute node knows it. Unless you are
using service node pools, you must set the xcatmaster attribute for a node
when using service nodes, even if it contains the same value as the node's
servicenode attribute.

Host name resolution must have been set up in advance, with ``/etc/hosts``,
DNS or dhcp, to ensure that the names put in this table can be resolved on
the Management Node, service nodes, and the compute nodes. It is easiest to
have a node group of the compute nodes for each service node. For example, if
all the nodes in node group compute1 are serviced by sn1 and all the nodes in
node group compute2 are serviced by sn2: ::

    chdef -t group compute1 servicenode=sn1 xcatmaster=sn1-c
    chdef -t group compute2 servicenode=sn2 xcatmaster=sn2-c

Note: in this example, sn1 and sn2 are the node names of the service nodes
(and therefore the hostnames associated with the NICs that the MN talks to).
The hostnames sn1-c and sn2-c are associated with the SN NICs that communicate
with their compute nodes.

Note: if not set, the tftpserver attribute defaults to the value of
xcatmaster, but in some releases of xCAT it has not defaulted correctly, so
it is safer to set the tftpserver to the value of xcatmaster.

These attributes will allow you to specify which service node should run the
conserver (console) and monserver (monitoring) daemon for the nodes in the
group specified in the command. In this example, we are having each node's
primary SN also act as its conserver and monserver (the most typical
setup). ::

    chdef -t group compute1 conserver=sn1 monserver=sn1,sn1-c
    chdef -t group compute2 conserver=sn2 monserver=sn2,sn2-c

Service Node Pools
^^^^^^^^^^^^^^^^^^

Service Node Pools are multiple service nodes that service the same set of
compute nodes. Having multiple service nodes allows backup service node(s) for
a compute node when the primary service node is unavailable, or can be used
for work-load balancing on the service nodes. But note that the selection of
which SN will service which compute node is made at compute node boot time.
After that, the selection of the SN for this compute node is fixed until the
compute node is rebooted or the compute node is explicitly moved to another SN
using the `snmove <http://localhost/fake_todo>`_ command.

To use Service Node pools, you need to architect your network such that all of
the compute nodes and service nodes in a particular pool are on the same flat
network. If you don't want the management node to respond to manage some of
the compute nodes, it shouldn't be on that same flat network. The site table
dhcpinterfaces attribute should be set such that the SNs' DHCP daemon
only listens on the NIC that faces the compute nodes, not the NIC that faces
the MN. This avoids some timing issues when the SNs are being deployed (so
that they don't respond to each other before they are completely ready). You
also need to make sure the `networks <http://localhost/fake_todo>`_ table
accurately reflects the physical network structure.
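
For example, a sketch of the dhcpinterfaces setting ("eth1" stands for
whatever NIC on the SNs faces the compute nodes): ::

    chdef -t site clustersite dhcpinterfaces='service|eth1'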

To define a list of service nodes that support a set of compute nodes, set the
servicenode attribute to a comma-delimited list of the service nodes. When
running an xCAT command like xdsh or updatenode for compute nodes, the list
will be processed left to right, picking the first service node on the list to
run the command. If that service node is not available, then the next service
node on the list will be chosen, until the command is successful. Errors will
be logged. If no service node on the list can process the command, then the
error will be returned. You can provide some load-balancing by assigning your
service nodes as we do below.

When using service node pools, the intent is to have the service node that
responds first to the compute node's DHCP request during boot also be the
xcatmaster, the tftpserver, and the NFS/http server for that node. Therefore,
the xcatmaster and nfsserver attributes for nodes should not be set. When
nodeset is run for the compute nodes, the service node interface on the
network to the compute nodes should be defined and active, so that nodeset
will default those attribute values to the "node ip facing" interface on that
service node.

For example: ::

    chdef -t node compute1 servicenode=sn1,sn2 xcatmaster="" nfsserver=""
    chdef -t node compute2 servicenode=sn2,sn1 xcatmaster="" nfsserver=""

You need to set the sharedtftp site attribute to 0 so that the SNs will not
automatically mount the ``/tftpboot`` directory from the management node: ::

    chdef -t site clustersite sharedtftp=0

For stateful (full-disk) node installs, you will need to use a local
``/install`` directory on each service node. The ``/install/autoinst/node``
files generated by nodeset will contain values specific to that service node
for correctly installing the nodes. ::

    chdef -t site clustersite installloc=""

With this setting, you will need to remember to rsync your ``/install``
directory from the xCAT management node to the service nodes anytime you
change your ``/install/postscripts``, custom osimage files, os repositories,
or other directories. It is best to exclude the ``/install/autoinst``
directory from this rsync. ::

    rsync -auv --exclude 'autoinst' /install sn1:/

Note: If your service nodes are stateless and site.sharedtftp=0, any data
written to the local ``/tftpboot`` directory of an SN is lost when that SN
reboots. When using servicenode pools, you will need to run nodeset again for
all of the compute nodes serviced by that SN.

For additional information about service node pool related settings in the
networks table, see :ref:`setup_networks_table_label`.

Conserver and Monserver and Pools
"""""""""""""""""""""""""""""""""

Conserver and monserver are not yet supported with Service Node Pools. You
must explicitly assign these functions to a service node using the
nodehm.conserver and noderes.monserver attributes as above.

Setup Site Table
----------------

If you are not using the NFS-based statelite method of booting your compute
nodes, set the installloc attribute to ``/install``. This instructs the
service node to mount ``/install`` from the management node. (If you don't do
this, you have to manually sync ``/install`` between the management node and
the service nodes.) ::

    chdef -t site clustersite installloc="/install"

For IPMI controlled nodes, if you want the out-of-band IPMI operations to be
done directly from the management node (instead of being sent to the
appropriate service node), set site.ipmidispatch=n.

If you want to throttle the rate at which nodes are booted up, you can set the
following site attributes:

* syspowerinterval
* syspowermaxnodes
* powerinterval (system p only)

See the `site table man page <http://localhost/fake_todo>`_ for details.
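
For example, a sketch of these settings (the values are illustrative only): ::

    chdef -t site clustersite ipmidispatch=n
    chdef -t site clustersite syspowerinterval=2 syspowermaxnodes=100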

.. _setup_networks_table_label:

Setup networks Table
--------------------

All networks in the cluster must be defined in the networks table. When xCAT
is installed, it runs makenetworks, which creates an entry in the networks
table for each of the networks the management node is on. You need to add
entries for each network the service nodes use to communicate to the compute
nodes.

For example: ::

    mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway=10.5.1.1

If you want to set the nodes' xcatmaster as the default gateway for the nodes,
the gateway attribute can be set to the keyword "<xcatmaster>". In this case,
xCAT code will automatically substitute the IP address of the node's
xcatmaster for the keyword. Here is an example: ::

    mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway=<xcatmaster>

The ipforward attribute should be enabled on all the xcatmaster nodes that
will be acting as default gateways. You can set ipforward to 1 in the
servicenode table, or add the line "net.ipv4.ip_forward = 1" to the file
``/etc/sysctl.conf`` and then run "sysctl -p /etc/sysctl.conf" manually to
enable the ip forwarding.

Note: If using service node pools, the networks table dhcpserver attribute can
be set to any single service node in your pool. The networks tftpserver and
nameserver attributes should be left blank.

Verify the Tables
-----------------

To verify that the tables are set correctly, run lsdef on the service nodes,
compute1, compute2: ::

    lsdef service,compute1,compute2

Add additional adapters configuration script (optional)
-------------------------------------------------------

It is possible to have additional adapter interfaces automatically configured
when the nodes are booted. xCAT provides sample configuration scripts for
ethernet, IB, and HFI adapters. These scripts can be used as-is or they can be
modified to suit your particular environment. The ethernet sample is
``/install/postscripts/configeth``. When you have the configuration script
that you want, you can add it to the "postscripts" attribute as mentioned
above. Make sure your script is in the ``/install/postscripts`` directory and
that it is executable.

Note: For system p servers, if you plan to have your service node perform the
hardware control functions for its compute nodes, it is necessary that the SN
ethernet network adapters connected to the HW service VLAN be configured.

Configuring Secondary Adapters
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To configure secondary adapters, see `Configuring_Secondary_Adapters
<http://localhost/fake_todo>`_

@ -1,2 +0,0 @@
Setting Up a Linux Hierarchical Cluster
=======================================

docs/source/advanced/hierarchy/index.rst
@ -5,3 +5,14 @@ Hierarchical Clusters
   :maxdepth: 2

   introduction.rst
   setup_mn_hierachical_database.rst
   define_service_node.rst
   configure_dhcp.rst
   setup_service_node.rst
   service_node_for_diskfull.rst
   service_node_for_diskless.rst
   test_service_node_installation.rst
   appendix_a_setup_backup_service_nodes.rst
   appendix_b_diagnostics.rst
   appendix_c_migrating_mn_to_sn.rst
   appendix_d_set_up_hierarchical_conserver.rst

docs/source/advanced/hierarchy/introduction.rst
@ -1,10 +1,50 @@

Introduction
============

In large clusters, it is desirable to have more than one node (the Management
Node - MN) handle the installation and management of the compute nodes. We
call these additional nodes **service nodes (SN)**. The management node can
delegate all management operations needed by a compute node to the SN that is
managing that compute node. You can have one or more service nodes set up
to install and manage groups of compute nodes.

Service Nodes
-------------

With xCAT, you have the choice of either having each service node
install/manage a specific set of compute nodes, or having a pool of service
nodes, any of which can respond to an installation request from a compute
node. (Service node pools must be aligned with the network broadcast domains,
because the way a compute node chooses its SN for that boot is by whichever
SN responds to its DHCP broadcast first.) You can also have a hybrid of the
two approaches, in which for each specific set of compute nodes you have two
or more SNs in a pool.

Each SN runs an instance of xcatd, just like the MN does. The xcatd daemons
communicate with each other using the same XML/SSL protocol that the xCAT
client uses to communicate with xcatd on the MN.

Daemon-based Databases
----------------------

The service nodes need to communicate with the xCAT database on the Management
Node. They do this by using the remote client capability of the database (i.e.
they don't go through xcatd for that). Therefore the Management Node must be
running one of the daemon-based databases supported by xCAT (PostgreSQL,
MySQL).

The default SQLite database does not support remote clients and cannot be used
in hierarchical clusters. This document includes instructions for migrating
your cluster from SQLite to one of the other databases. Since the initial
install of xCAT will always set up SQLite, you must migrate to a database that
supports remote clients before installing your service nodes.
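
To check which database your management node is currently using, a sketch
(lsxcatd queries the running xcatd): ::

    lsxcatd -d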

Setup
-----

xCAT will help you install your service nodes, including the xCAT software
and other required rpms such as perl, the database client, and other
prerequisites. Service nodes require all the same software as the MN (because
they can perform all of the same functions), except that there is a special
top-level xCAT rpm for SNs called xCATsn, instead of the xCAT rpm that is on
the Management Node. The xCATsn rpm tells the SN that the xcatd on it should
behave as an SN, not the MN.

docs/source/advanced/hierarchy/service_node_for_diskfull.rst
@ -0,0 +1,156 @@

.. _setup_service_node_stateful_label:

Set Up the Service Nodes for Stateful (Diskful) Installation (optional)
=======================================================================

Any cluster using statelite compute nodes must use stateful (diskful) service
nodes.

Note: If you are using diskless service nodes, go to
:ref:`setup_service_node_stateless_label`.

First, go to the `Download_xCAT <http://localhost/fake_todo>`_ site and
download the level of the xCAT tarball you desire. Then go to
http://localhost/fake_todo and get the latest xCAT dependency tarball.
**Note: All xCAT service nodes must be at the exact same xCAT version as the
xCAT Management Node**. Copy the files to the Management Node (MN) and untar
them in the appropriate sub-directory of ``/install/post/otherpkgs``.

**Note: for the appropriate directory below, check the
``otherpkgdir=/install/post/otherpkgs/rhels6.4/ppc64`` attribute of the
osimage defined for the servicenode.**

For example, for ubuntu14.04.1-ppc64el-install-service: ::

    mkdir -p /install/post/otherpkgs/ubuntu14.04.1/ppc64el/
    cd /install/post/otherpkgs/ubuntu14.04.1/ppc64el/
    tar jxvf core-rpms-snap.tar.bz2
    tar jxvf xcat-dep-ubuntu*.tar.bz2

Next, add rpm names to your own version of the
service.<osver>.<arch>.otherpkgs.pkglist file. In most cases, you can find an
initial copy of this file under ``/opt/xcat/share/xcat/install/<platform>``.
If not, copy one from a similar platform. ::

    mkdir -p /install/custom/install/ubuntu/
    cp /opt/xcat/share/xcat/install/ubuntu/service.ubuntu.otherpkgs.pkglist \
        /install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist
    vi /install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist

Make sure the following entries are included in
``/install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist``: ::

    mariadb-client
    mariadb-common
    xcatsn
    conserver-xcat

The "pkgdir" should include the online/local ubuntu official mirror, which
can be set with the following command: ::

    chdef -t osimage -o ubuntu14.04.1-ppc64el-install-service \
        -p pkgdir="http://ports.ubuntu.com/ubuntu-ports trusty main, \
        http://ports.ubuntu.com/ubuntu-ports trusty-updates main, \
        http://ports.ubuntu.com/ubuntu-ports trusty universe, \
        http://ports.ubuntu.com/ubuntu-ports trusty-updates universe"

In addition, the "otherpkgdir" should include the mirror under otherpkgdir on
the MN; this can be done with: ::

    chdef -t osimage -o ubuntu14.04.1-ppc64el-install-service -p \
        otherpkgdir="http://<Name or IP of Management Node>\
        /install/post/otherpkgs/ubuntu14.04.1/ppc64el/xcat-core/ \
        trusty main, http://<Name or IP of Management Node>\
        /install/post/otherpkgs/ubuntu14.04.1/ppc64el/xcat-dep/ trusty main"

**Note: you will be installing the xCAT Service Node rpm xCATsn meta-package
on the Service Node, not the xCAT Management Node meta-package. Do not install
both.**

Update the rhels6 RPM repository (rhels6 only)
----------------------------------------------

* This section can be removed after the powerpc-utils-1.2.2-18.el6.ppc64.rpm
  is built into the base rhels6 ISO.
* The direct rpm download link is:
  ftp://linuxpatch.ncsa.uiuc.edu/PERCS/powerpc-utils-1.2.2-18.el6.ppc64.rpm
* The update steps are as follows: ::

    # put the new rpm in the base OS packages
    cd /install/rhels6/ppc64/Server/Packages
    mv powerpc-utils-1.2.2-17.el6.ppc64.rpm /tmp
    cp /tmp/powerpc-utils-1.2.2-18.el6.ppc64.rpm .
    # make sure that the rpm is readable by other users
    chmod +r powerpc-utils-1.2.2-18.el6.ppc64.rpm

* Create the repodata: ::

    cd /install/rhels6/ppc64/Server
    ls -al repodata/
    total 14316
    dr-xr-xr-x 2 root root    4096 Jul 20 09:34 .
    dr-xr-xr-x 3 root root    4096 Jul 20 09:34 ..
    -r--r--r-- 1 root root 1305862 Sep 22  2010 20dfb74c144014854d3b16313907ebcf30c9ef63346d632369a19a4add8388e7-other.sqlite.bz2
    -r--r--r-- 1 root root 1521372 Sep 22  2010 57b3c81512224bbb5cebbfcb6c7fd1f7eb99cca746c6c6a76fb64c64f47de102-primary.xml.gz
    -r--r--r-- 1 root root 2823613 Sep 22  2010 5f664ea798d1714d67f66910a6c92777ecbbe0bf3068d3026e6e90cc646153e4-primary.sqlite.bz2
    -r--r--r-- 1 root root 1418180 Sep 22  2010 7cec82d8ed95b8b60b3e1254f14ee8e0a479df002f98bb557c6ccad5724ae2c8-other.xml.gz
    -r--r--r-- 1 root root  194113 Sep 22  2010 90cbb67096e81821a2150d2b0a4f3776ab1a0161b54072a0bd33d5cadd1c234a-comps-rhel6-Server.xml.gz
    -r--r--r-- 1 root root 1054944 Sep 22  2010 98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml
    -r--r--r-- 1 root root 3341671 Sep 22  2010 bb3456b3482596ec3aa34d517affc42543e2db3f4f2856c0827d88477073aa45-filelists.sqlite.bz2
    -r--r--r-- 1 root root 2965960 Sep 22  2010 eb991fd2bb9af16a24a066d840ce76365d396b364d3cdc81577e4cf6e03a15ae-filelists.xml.gz
    -r--r--r-- 1 root root    3829 Sep 22  2010 repomd.xml
    -r--r--r-- 1 root root    2581 Sep 22  2010 TRANS.TBL
    # run createrepo in the current directory, using the comps file as the group file
    createrepo -g repodata/98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml .

Note: you should use comps-rhel6-Server.xml with its key as the group file.

Set the node status to ready for installation
---------------------------------------------

Run nodeset with the osimage name defined in the provmethod attribute on your
service node: ::

    nodeset service osimage="<osimagename>"

For example: ::

    nodeset service osimage="ubuntu14.04.1-ppc64el-install-service"

Initialize network boot to install Service Nodes
------------------------------------------------

::

    rnetboot service

Monitor the Installation
------------------------

Watch the installation progress using either wcons or rcons: ::

    wcons service      # make sure DISPLAY is set to your X server/VNC, or
    rcons <one-node-at-a-time>
    tail -f /var/log/messages

Note: We have experienced a problem installing a RHEL6 diskful service node
with SAS disks: the service node cannot reboot from the SAS disk after the
RHEL6 operating system has been installed. Until a build with fixes from the
RHEL6 team is available, if you hit this problem you need to manually select
the SAS disk to be the first boot device and boot from the SAS disk.

Update Service Node Diskfull Image
----------------------------------

If you need to update the service nodes later on with a new version of xCAT
and its dependencies, obtain the new xCAT and xCAT dependencies rpms.
(Follow the same steps that were followed in
:ref:`setup_service_node_stateful_label`.)

docs/source/advanced/hierarchy/service_node_for_diskless.rst
@ -0,0 +1,221 @@

.. _setup_service_node_stateless_label:

Setup the Service Node for Stateless Deployment (optional)
==========================================================

**Note: Stateless service nodes are not supported in an ubuntu hierarchical
cluster. For ubuntu, please skip this section.**

If you want, your service nodes can be stateless (diskless). The service node
must contain not only the OS, but also the xCAT software and its dependencies.
In addition, a number of files are added to the service node to support the
PostgreSQL or MySQL database access from the service node to the Management
node, and ssh access to the nodes that the service node services.
The following sections explain how to accomplish this.

Build the Service Node Diskless Image
-------------------------------------

This section assumes you can build the stateless image on the management node
because the service nodes are the same OS and architecture as the management
node. If this is not the case, you need to build the image on a machine that
matches the service node's OS and architecture.

* Create an osimage definition. When you run copycds, xCAT will create
  service node osimage definitions for that distribution. For a stateless
  service node, use the ``*-netboot-service`` definition. ::

    lsdef -t osimage | grep -i service
        rhels6.4-ppc64-install-service            (osimage)
        rhels6.4-ppc64-netboot-service            (osimage)
        rhels6.4-ppc64-statelite-service          (osimage)

    lsdef -t osimage -l rhels6.3-ppc64-netboot-service
    Object name: rhels6.3-ppc64-netboot-service
        exlist=/opt/xcat/share/xcat/netboot/rh/service.exlist
        imagetype=linux
        osarch=ppc64
        osdistroname=rhels6.3-ppc64
        osname=Linux
        osvers=rhels6.3
        otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64
        otherpkglist=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.otherpkgs.pkglist
        pkgdir=/install/rhels6.3/ppc64
        pkglist=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.pkglist
        postinstall=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.postinstall
        profile=service
        provmethod=netboot
        rootimgdir=/install/netboot/rhels6.3/ppc64/service

* You can check the service node packaging to see if it has all the rpms you
  require. We ship a basic requirements list that will create a fully
  functional service node. However, you may want to customize your service
  node by adding additional operating system packages or modifying the files
  excluded by the exclude list. View the files referenced by the osimage
  pkglist, otherpkglist and exlist attributes: ::

    cd /opt/xcat/share/xcat/netboot/rh/
    view service.rhels6.ppc64.pkglist
    view service.rhels6.ppc64.otherpkgs.pkglist
    view service.exlist

  If you would like to change any of these files, copy them to a custom
  directory. This can be any directory you choose, but we recommend that you
  keep it under /install somewhere. A good location is something like
  ``/install/custom/netboot/<os>/service``. Make sure that your
  ``otherpkgs.pkglist`` file has an entry for ::

    xcat/xcat-core/xCATsn

  This is required to install the xCAT service node function into your image.

  You may also choose to create an appropriate /etc/fstab file in your
  service node image. Copy the script referenced by the postinstall
  attribute to your directory and modify it as you would like: ::

    cp /opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.postinstall \
        /install/custom/netboot/rh
    vi /install/custom/netboot/rh/service.rhels6.ppc64.postinstall
    # uncomment the sample fstab lines and change as needed:
    proc            /proc     proc    rw                  0 0
    sysfs           /sys      sysfs   rw                  0 0
    devpts          /dev/pts  devpts  rw,gid=5,mode=620   0 0
    service_x86_64  /         tmpfs   rw                  0 1
    none            /tmp      tmpfs   defaults,size=10m   0 2
    none            /var/tmp  tmpfs   defaults,size=10m   0 2

  After modifying the files, you will need to update the osimage definition to
  reference these files. We recommend creating a new osimage definition for
  your custom image: ::

    lsdef -t osimage -l rhels6.3-ppc64-netboot-service -z > /tmp/myservice.def
    vi /tmp/myservice.def
    # change the name of the osimage definition
    # change any attributes that now need to reference your custom files
    # change the rootimgdir attribute, replacing 'service' with a name
    #   to match your new osimage definition
    cat /tmp/myservice.def | mkdef -z

  While you are here, if you'd like, you can do the same for your compute node
  images, creating custom files and new custom osimage definitions as you need
  to.

  For more information on the use and syntax of otherpkgs and pkglist files,
  see `Update Service Node Stateless Image <http://localhost/fake_todo>`_.

* Make your xCAT software available for otherpkgs processing.

  * If you downloaded xCAT to your management node for installation, place a
    copy of your xcat-core and xcat-dep in your otherpkgdir directory: ::

      lsdef -t osimage -o rhels6.3-ppc64-netboot-service -i otherpkgdir
      Object name: rhels6.3-ppc64-netboot-service
          otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64
      cd /install/post/otherpkgs/rhels6.3/ppc64
      mkdir xcat
      cd xcat
      cp -Rp <current location of xcat-core>/xcat-core .
      cp -Rp <current location of xcat-dep>/xcat-dep .

  * If you installed your management node directly from the Linux online
    repository, you will need to download the xcat-core and xcat-dep tarballs:

    - Go to the `Download xCAT page <http://localhost/fake_todo>`_ and download
      the level of xCAT tarball you desire.
    - Go to the `Download xCAT Dependencies <http://localhost/fake_todo>`_ page
      and download the latest xCAT dependency tarball. Place these into your
      otherpkgdir directory: ::

        lsdef -t osimage -o rhels6.3-ppc64-netboot-service -i otherpkgdir
        Object name: rhels6.3-ppc64-netboot-service
            otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64
        cd /install/post/otherpkgs/rhels6.3/ppc64
        mkdir xcat
        cd xcat
        mv <xcat-core tarball> .
        tar -jxvf <xcat-core tarball>
        mv <xcat-dep tarball> .
        tar -jxvf <xcat-dep tarball>

* Run image generation for your osimage definition: ::

    genimage rhels6.3-ppc64-netboot-service

* Prevent DHCP from starting up until xcatd has had a chance to configure
  it: ::

    chroot /install/netboot/rhels6.3/ppc64/service/rootimg chkconfig dhcpd off
    chroot /install/netboot/rhels6.3/ppc64/service/rootimg chkconfig dhcrelay off

* If using NFS hybrid mode, export /install read-only in the service node
  image: ::

    cd /install/netboot/rhels6.3/ppc64/service/rootimg/etc
    echo '/install *(ro,no_root_squash,sync,fsid=13)' >exports

* Pack the image for your osimage definition: ::

    packimage rhels6.3-ppc64-netboot-service

* Set the node status to ready for netboot using your osimage definition and
  your 'service' nodegroup: ::

    nodeset service osimage=rhels6.3-ppc64-netboot-service

* To diskless boot the service nodes: ::

    rnetboot service

Update Service Node Stateless Image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To update the xCAT software in the image at a later time:

* Download the updated xcat-core and xcat-dep tarballs and place them in
  your osimage's otherpkgdir xcat directory as you did above.
* Regenerate and repack the image, then reboot your service nodes: ::

    genimage rhels6.3-ppc64-netboot-service
    packimage rhels6.3-ppc64-netboot-service
    nodeset service osimage=rhels6.3-ppc64-netboot-service
    rnetboot service

Note: The service nodes are set up as NFS-root servers for the compute nodes.
Any time changes are made to any compute image on the mgmt node it will be
necessary to sync all changes to all service nodes. In our case the
``/install`` directory is mounted on the service nodes, so the update to the
compute node image is automatically available.

Monitor install and boot
------------------------

::

    wcons service      # make sure DISPLAY is set to your X server/VNC, or
    rcons <one-node-at-a-time>   # do rcons for each node
    tail -f /var/log/messages

docs/source/advanced/hierarchy/setup_mn_hierachical_database.rst
@ -0,0 +1,27 @@

Setup the MN Hierarchical Database
==================================

Before setting up service nodes, you need to set up either MySQL or
PostgreSQL as the xCAT database on the Management Node. The database client
on the Service Nodes will be set up later when the SNs are installed. MySQL
and PostgreSQL are available with the Linux OS.

Follow the instructions in one of these documents for setting up the
Management Node to use the selected database:

MySQL or MariaDB
----------------

* Follow this documentation and be sure to use the xCAT provided mysqlsetup
  command to set up the database for xCAT:

  .. TODO http link

  - `Setting_Up_MySQL_as_the_xCAT_DB <http://localhost/fake_todo>`_

PostgreSQL
----------

* Follow this documentation and be sure to use the xCAT provided pgsqlsetup
  command to set up the database for xCAT:

  .. TODO http link

  - `Setting_Up_PostgreSQL_as_the_xCAT_DB <http://localhost/fake_todo>`_
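
As a sketch, each setup command has a basic "install and migrate" mode (see
the linked documents before running them): ::

    mysqlsetup -i     # MySQL/MariaDB: initialize and migrate the xCAT data
    pgsqlsetup -i     # PostgreSQL: initialize and migrate the xCAT data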
|
6
docs/source/advanced/hierarchy/setup_service_node.rst
Normal file
6
docs/source/advanced/hierarchy/setup_service_node.rst
Normal file
@ -0,0 +1,6 @@
|
||||
Setup Service Node
|
||||
==================
|
||||
|
||||
* Follow this documentation to :ref:`setup_service_node_stateful_label`.
|
||||
|
||||
* Follow this documentation to :ref:`setup_service_node_stateless_label`.

docs/source/advanced/hierarchy/test_service_node_installation.rst
@ -0,0 +1,11 @@

Test Service Node installation
==============================

* ssh to the service nodes. You should not be prompted for a password.
* Check to see that the xcat daemon xcatd is running.
* Run some database command on the service node, e.g. tabdump site or nodels,
  and see that the database can be accessed from the service node.
* Check that ``/install`` and ``/tftpboot`` are mounted on the service node
  from the Management Node, if appropriate.
* Make sure that the Service Node has name resolution for all nodes it will
  service.
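
For example, a quick one-liner from the MN (a sketch; sn1 stands for one of
your service nodes): ::

    xdsh sn1 'service xcatd status; tabdump site | head -2; df | egrep "install|tftpboot"'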