Mirror of https://github.com/xcat2/xcat-core.git, synced 2025-10-26 17:05:33 +00:00
Add hierarchy doc

Add hierarchy documents including introduction, database management, defining node, service node setup, test service node and appendix. TODO: resolve the fake_todo link reference in the following patch
@@ -0,0 +1,167 @@
Appendix A: Setup backup Service Nodes
======================================

For reliability, availability, and serviceability purposes you may wish to
designate backup service nodes in your hierarchical cluster. The backup
service node will be another active service node that is set up to easily
take over from the original service node if a problem occurs. This is not an
automatic failover feature: you must initiate the switch from the primary
service node to the backup manually. xCAT will handle most of the setup and
the transfer of the nodes to the new service node. This procedure can also be
used simply to switch some compute nodes to a new service node, for example,
for planned maintenance.

Abbreviations used below:

* MN - management node.
* SN - service node.
* CN - compute node.

Initial deployment
------------------

Integrate the following steps into the hierarchical deployment process
described above.

#. Make sure both the primary and backup service nodes are installed,
   configured, and can access the MN database.
#. When defining the CNs, add the necessary service node values to the
   "servicenode" and "xcatmaster" attributes of the `node definitions
   <http://localhost/fake_todo>`_.
#. (Optional) Create an xCAT group for the nodes that are assigned to each SN.
   This will be useful when setting node attributes as well as providing an
   easy way to switch a set of nodes back to their original server.

To specify a backup service node, set the servicenode value of the compute
node to a comma-separated list of two **service nodes**. The first one is the
primary and the second is the backup (or new SN) for that node. Use the
hostnames of the SNs as known by the MN.

For the **xcatmaster** value, include only the primary SN, as known by the
compute node.

In most hierarchical clusters, the MN and the CNs reach the SN over different
networks, so the name of the SN as known by the MN is different from its name
as known by the CN.

The following example assumes the SN interface to the MN is on the "a"
network and the interface to the CN is on the "b" network. To set the
attributes you would run a command similar to the following. ::

  chdef <noderange> servicenode="xcatsn1a,xcatsn2a" xcatmaster="xcatsn1b"

The process can be simplified by creating xCAT node groups to use as the
<noderange> in the `chdef <http://localhost/fake_todo>`_ command. For example,
to create an xCAT node group named "sn1group" containing the compute nodes
node01 through node20, which use the first SN as their primary SN, you could
run a command similar to the following. ::

  mkdef -t group sn1group members=node[01-20]

**Note: Normally backup service nodes are the primary SNs for other compute
nodes. So, for example, if you have 2 SNs, configure half of the CNs to use
the first SN as their primary SN, and the other half of the CNs to use the
second SN as their primary SN. Then each SN is configured to be the backup SN
for the other half of the CNs.**

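The cross-assignment described in the note above can be scripted. The following is a minimal sketch and is not part of xCAT. The helper names ``run`` and ``assign_cross_backup``, the ``DRY_RUN`` switch, and the group names are hypothetical. The "a"/"b" naming follows the earlier example in this section. When ``DRY_RUN`` is set, the ``chdef`` commands are only printed.

```shell
#!/bin/sh
# run: print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# assign_cross_backup GROUP1 GROUP2 SN1_MN SN2_MN SN1_CN SN2_CN
#   GROUP1 gets SN1 as primary and SN2 as backup; GROUP2 gets the reverse.
#   SN*_MN: SN names as known by the MN (servicenode attribute).
#   SN*_CN: SN names as known by the CNs (xcatmaster, primary SN only).
assign_cross_backup() {
    if [ $# -ne 6 ]; then
        echo "usage: assign_cross_backup G1 G2 SN1_MN SN2_MN SN1_CN SN2_CN" >&2
        return 2
    fi
    run chdef "$1" servicenode="$3,$4" xcatmaster="$5" &&
        run chdef "$2" servicenode="$4,$3" xcatmaster="$6"
}
```

For example, ``DRY_RUN=1 assign_cross_backup sn1group sn2group xcatsn1a xcatsn2a xcatsn1b xcatsn2b`` prints the two ``chdef`` commands. Leave ``DRY_RUN`` unset to apply them.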
When you run `makedhcp <http://localhost/fake_todo>`_, it will configure DHCP
and TFTP on both the primary and backup SNs, assuming they both have network
access to the CNs. This makes a quick SN takeover possible without having to
wait for replication when you need to switch.

xdcp Behaviour with backup servicenodes
---------------------------------------

In a hierarchical environment, the xdcp command must first copy (scp) the
files to the service nodes, so that the service node that is the node's master
can then copy (scp) them to the node. The files are placed in the
``/var/xcat/syncfiles`` directory by default, or in the directory set by the
site table's SNsyncfiledir attribute. If the node has multiple service nodes
assigned, xdcp will copy the file to each of them. For example, if
``lsdef cn4 | grep servicenode`` shows the following, the files will be
copied (scp) to both service1 and rhsn. ::

  servicenode=service1,rhsn

If a service node is offline (e.g. service1), you will see errors from your
xdcp command, yet if rhsn is online the xdcp will actually work. This may be
a little confusing. For example, here service1 is offline, but xdcp is able to
use rhsn to complete the copy. ::

  xdcp cn4 /tmp/lissa/file1 /tmp/file1

  service1: Permission denied (publickey,password,keyboard-interactive).
  service1: Permission denied (publickey,password,keyboard-interactive).
  service1: lost connection
  The following servicenodes: service1, have errors and cannot be updated
  Until the error is fixed, xdcp will not work to nodes serviced by these service nodes.

  xdsh cn4 ls /tmp/file1
  cn4: /tmp/file1

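The first entry of servicenode is the primary SN and the second is the backup, so you can read the pair back from a node definition. The sketch below is minimal and makes one assumption: ``lsdef`` prints each attribute as an indented ``attr=value`` line, like the ``servicenode=service1,rhsn`` value shown in this section. The function name ``sn_pair`` is hypothetical.

```shell
#!/bin/sh
# sn_pair: read lsdef-style output on stdin and print the primary (first)
# and backup (second) service node from the servicenode attribute.
sn_pair() {
    # Keep the value of the first "servicenode=" line, ignoring indentation.
    value=$(sed -n 's/^[[:space:]]*servicenode=//p' | head -n 1)
    if [ -z "$value" ]; then
        echo "no servicenode attribute found" >&2
        return 1
    fi
    primary=${value%%,*}
    case $value in
        *,*)
            backup=${value#*,}      # drop the primary entry
            backup=${backup%%,*}    # keep only the next entry
            ;;
        *)
            backup=""               # single SN: no backup configured
            ;;
    esac
    echo "primary=$primary backup=${backup:-none}"
}
```

On the MN you might run ``lsdef cn4 | sn_pair``. For the definition above, this would print ``primary=service1 backup=rhsn``.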
Switch to the backup SN
-----------------------

When an SN fails, or you want to bring it down for maintenance, use this
procedure to move its CNs over to the backup SN.

Move the nodes to the new service nodes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use the xCAT `snmove <http://localhost/fake_todo>`_ command to make the
database updates necessary to move a set of nodes from one service node to
another, and to make configuration modifications to the nodes.

For example, if you want to switch all the compute nodes that use service
node "sn1" to the backup SN (sn2), run: ::

  snmove -s sn1

Modified database attributes
""""""""""""""""""""""""""""

The **snmove** command will check and set several node attribute values.

* **servicenode**: This will be set to either the second server name in the
  servicenode attribute list or the value provided on the command line.
* **xcatmaster**: Set with either the value provided on the command line or it
  will be automatically determined from the servicenode attribute.
* **nfsserver**: If the value is set with the source service node then it will
  be set to the destination service node.
* **tftpserver**: If the value is set with the source service node then it will
  be reset to the destination service node.
* **monserver**: If set to the source service node then reset it to the
  destination servicenode and xcatmaster values.
* **conserver**: If set to the source service node then reset it to the
  destination servicenode and run **makeconservercf**.

Run postscripts on the nodes
""""""""""""""""""""""""""""

If the CNs are up at the time the snmove command is run, then snmove will run
postscripts on the CNs to reconfigure them for the new SN. The "syslog"
postscript is always run. The "mkresolvconf" and "setupntp" scripts will be
run if they were included in the nodes' postscript list.

You can also specify an additional list of postscripts to run.

Modify system configuration on the nodes
""""""""""""""""""""""""""""""""""""""""

If the CNs are up, the snmove command will also perform some configuration on
the nodes, such as setting the default gateway and modifying some
configuration files used by xCAT.

Switching back
--------------

The process for switching nodes back will depend on what must be done to
recover the original service node. If the SN needed to be reinstalled, you
need to set it up as an SN again and make sure the CN images are replicated
to it. Once you've done this, or if the SN's configuration was not lost,
follow these steps to move the CNs back to their original SN:

* Use snmove: ::

    snmove sn1group -d sn1

docs/source/advanced/hierarchy/appendix_b_diagnostics.rst
@@ -0,0 +1,66 @@
Appendix B: Diagnostics
=======================

* **root ssh keys not set up** -- If you are prompted for a password when you
  ssh to the service node, check whether /root/.ssh has authorized_keys. If
  the directory does not exist or has no keys, run ``xdsh service -K`` on the
  MN to exchange the ssh keys for root. You will be prompted for the root
  password, which should be the password you set for key=system in the
  passwd table.
* **xCAT rpms not on SN** -- On the SN, run ``rpm -qa | grep xCAT`` and make
  sure the appropriate xCAT rpms are installed on the service node. See the
  list of xCAT rpms in :ref:`setup_service_node_stateful_label`. If rpms are
  missing, check your install setup as outlined in Build the Service Node
  Stateless Image for diskless installs, or in
  :ref:`setup_service_node_stateful_label` for diskful installs.
* **otherpkgs (including xCAT rpms) installation failed on the SN** -- The OS
  repository is not created on the SN, so when the "yum" command resolves
  dependencies, the rpm packages required by xCATsn (including expect, nmap,
  and httpd) can't be found. In this case, check whether the
  ``/install/postscripts/repos/<osver>/<arch>/`` directory exists on the MN.
  If it does not, re-run the "copycds" command on the MN, which creates files
  under the ``/install/postscripts/repos/<osver>/<arch>`` directory. Then
  re-install the SN, and this issue should be resolved.
* **Error finding the database/starting xcatd** -- If running tabdump site on
  the service node returns "Connection failure: IO::Socket::SSL: connect:
  Connection refused at ``/opt/xcat/lib/perl/xCAT/Client.pm``", restart the
  xcatd daemon with ``service xcatd restart`` and see if the command passes.
  If it fails with the same error, check whether the ``/etc/xcat/cfgloc``
  file exists. It should exist and be the same as ``/etc/xcat/cfgloc`` on the
  MN. If it is not there, copy it from the MN to the SN, then run
  ``service xcatd restart``. A missing cfgloc file indicates that the
  servicenode postscripts did not complete successfully. Check that your
  postscripts table was set up as described in
  :ref:`add_service_node_postscripts_label`.
* **Error accessing database/starting xcatd credential failure** -- If you run
  tabdump site on the service node and you get "Connection failure:
  IO::Socket::SSL: SSL connect attempt failed because of handshake
  problemserror:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
  at ``/opt/xcat/lib/perl/xCAT/Client.pm``", check ``/etc/xcat/cert``. The
  directory should contain the files ca.pem and server-cred.pem. These are
  supposed to be transferred from the MN ``/etc/xcat/cert`` directory during
  the install. Also check the ``/etc/xcat/ca`` directory. This directory
  should contain most files from the ``/etc/xcat/ca`` directory on the MN. You
  can manually copy them recursively from the MN to the SN. Missing files
  indicate that the servicenode postscripts did not complete successfully.
  Check that your postscripts table was set up as described in
  :ref:`add_service_node_postscripts_label`. Then run ``service xcatd restart``
  again and retry tabdump site.
* **Missing ssh hostkeys** -- Check whether ``/etc/xcat/hostkeys`` on the SN
  has the same files as ``/etc/xcat/hostkeys`` on the MN. These are the ssh
  keys that will be installed on the compute nodes, so root can ssh between
  compute nodes without password prompting. If they are not there, copy them
  from the MN to the SN. Again, these should have been set up by the
  servicenode postscripts.

* **Errors running hierarchical commands such as xdsh** -- xCAT has a number of
  commands that run hierarchically. That is, the commands are sent from xcatd
  on the management node to the correct service node xcatd, which in turn
  processes the command and sends the results back to xcatd on the management
  node. If a hierarchical command such as xdsh fails with something like
  "Error: Permission denied for request", check ``/var/log/messages`` on the
  management node for errors. One error might be "Request matched no policy
  rule". This may mean you need to add policy table entries for your xCAT
  management node and service node.

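Several of the checks in this list amount to looking for files on the SN. The sketch below is minimal and uses hypothetical function names. It checks for the files named in the list: ``/root/.ssh/authorized_keys``, ``/etc/xcat/cfgloc``, and ``ca.pem`` and ``server-cred.pem`` in ``/etc/xcat/cert``. It also lists files that exist in an MN directory but are absent from the SN copy, which you might do for ``/etc/xcat/hostkeys`` or ``/etc/xcat/ca``. The optional root argument is an assumption added so the checks can run against a copy of the SN file system. The sketch only reports problems; copying files from the MN is left to you.

```shell
#!/bin/sh
# check_sn_files [ROOT]
#   Report files from the diagnostics list that are missing or empty.
#   ROOT defaults to / and may point at a copy of the SN file system.
check_sn_files() {
    root=${1:-/}
    missing=0
    for f in root/.ssh/authorized_keys \
             etc/xcat/cfgloc \
             etc/xcat/cert/ca.pem \
             etc/xcat/cert/server-cred.pem; do
        if [ ! -s "${root%/}/$f" ]; then
            echo "missing or empty: /$f"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}

# missing_from_sn MN_DIR SN_DIR
#   Print the names of top-level files in MN_DIR that do not exist in
#   SN_DIR, e.g. for copies of /etc/xcat/hostkeys taken from each host.
missing_from_sn() {
    for path in "$1"/*; do
        [ -e "$path" ] || continue    # glob did not match: MN_DIR is empty
        name=$(basename "$path")
        [ -e "$2/$name" ] || echo "$name"
    done
}
```

Run ``check_sn_files`` as root on the SN. The directory comparison needs both copies visible from one place. For example, you could first copy the SN's ``/etc/xcat/hostkeys`` to a temporary directory on the MN.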
@@ -0,0 +1,14 @@
Appendix C: Migrating a Management Node to a Service Node
=========================================================

If you want to convert an existing Management Node to a Service Node, you
need to work with the xCAT team. For now, it is recommended that you back up
your database, set up your new Management Node, and restore your database
into it. Then remove xCAT, all xCAT directories, and your database from the
old Management Node (see `Uninstalling_xCAT
<http://localhost/fake_todo>`_), and follow the process for setting up an SN
as if it were a new node.

@@ -0,0 +1,13 @@
Appendix D: Set up Hierarchical Conserver
=========================================

To open remote consoles (rcons) from the Management Node through the
conserver daemon on the Service Nodes, do the following:

* Set nodehm.conserver to the service node (using the name or IP address
  that faces the management node), regenerate the conserver configuration,
  and restart conserver: ::

    chdef <noderange> conserver=<servicenodeasknownbytheMN>
    makeconservercf
    service conserver stop
    service conserver start
docs/source/advanced/hierarchy/configure_dhcp.rst
@@ -0,0 +1,11 @@
Configure DHCP
==============

Add the relevant networks to the DHCP configuration; refer to:
`XCAT_pLinux_Clusters/#setup-dhcp <http://localhost/fake_todo>`_

Add the defined nodes to the DHCP configuration; refer to:
`XCAT_pLinux_Clusters/#configure-dhcp <http://localhost/fake_todo>`_
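Until the referenced pages are linked, the following is a sketch of how both steps are commonly done with xCAT's ``makedhcp`` command. It assumes ``makedhcp -n`` regenerates the DHCP configuration with the networks from the networks table, and that ``makedhcp <noderange>`` adds host entries for those nodes. The ``configure_dhcp`` and ``run`` helpers and the ``DRY_RUN`` switch are hypothetical.

```shell
#!/bin/sh
# run: print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# configure_dhcp NODERANGE
configure_dhcp() {
    if [ $# -ne 1 ]; then
        echo "usage: configure_dhcp NODERANGE" >&2
        return 2
    fi
    # Rebuild the DHCP configuration with the networks from the networks table.
    run makedhcp -n || return 1
    # Add host entries for the defined nodes.
    run makedhcp "$1"
}
```

For example, ``DRY_RUN=1 configure_dhcp compute`` prints the two commands without changing the DHCP configuration.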
@@ -0,0 +1,39 @@
Define and install your Compute Nodes
=====================================

Make /install available on the Service Nodes
--------------------------------------------

Note that all of the files and directories pointed to by your osimages should
be placed under the directory referred to in site.installdir (usually
/install), so they will be available to the service nodes. The installdir
directory is mounted or copied to the service nodes during the hierarchical
installation.

If you are not using the NFS-based statelite method of booting your compute
nodes and you are not using service node pools, set the installloc attribute
to "/install". This instructs the service node to mount /install from the
management node. (If you don't do this, you have to manually sync /install
between the management node and the service nodes.)

::

  chdef -t site clustersite installloc="/install"

Make compute node syncfiles available on the servicenodes
---------------------------------------------------------

If you are not using the NFS-based statelite method of booting your compute
nodes, and you plan to use the syncfiles postscript to update files on the
nodes during install, you must ensure that those files are synced to the
service nodes before the compute nodes are installed. To do this after your
nodes are defined, run the following command whenever the files in your
synclist change on the Management Node:

::

  updatenode <computenoderange> -f

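If you want to avoid re-running ``updatenode -f`` when nothing has changed, you could track a checksum of the synclist and of the files it names. The wrapper below is hypothetical and is not an xCAT feature. The function ``sync_if_changed``, the stamp file, and ``DRY_RUN`` are all assumptions. You must pass the synclist and its source files yourself.

```shell
#!/bin/sh
# run: print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# sync_if_changed NODERANGE STAMPFILE FILE...
#   Run "updatenode NODERANGE -f" only if the combined checksum of FILE...
#   (the synclist and the files it names) differs from the one saved in
#   STAMPFILE by the last successful run.
sync_if_changed() {
    if [ $# -lt 3 ]; then
        echo "usage: sync_if_changed NODERANGE STAMPFILE FILE..." >&2
        return 2
    fi
    noderange=$1
    stamp=$2
    shift 2
    new=$(cat "$@" | cksum)
    old=$(cat "$stamp" 2>/dev/null || true)
    if [ "$new" = "$old" ]; then
        echo "synclist files unchanged; nothing to do"
        return 0
    fi
    # Record the new checksum only if the sync succeeded.
    run updatenode "$noderange" -f && printf '%s\n' "$new" > "$stamp"
}
```

For example: ``sync_if_changed compute /var/tmp/synclist.stamp /install/custom/synclist /etc/hosts``. All paths here are only examples.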
At this point you can return to the documentation for your cluster environment
to define and deploy your compute nodes.

docs/source/advanced/hierarchy/define_service_node.rst
@@ -0,0 +1,345 @@
Define the service nodes in the database
========================================

This document assumes that you have previously **defined** your compute nodes
in the database. It is also possible at this point that you have generic
entries in your db for the nodes you will use as service nodes as a result of
the node discovery process. We are now going to show you how to add all the
relevant database data for the service nodes (SN) such that the SN can be
installed and managed from the Management Node (MN). In addition, you will
be adding the information to the database that will tell xCAT which service
nodes (SN) will service which compute nodes (CN).

For this example, we have two service nodes: **sn1** and **sn2**. We will call
our Management Node **mn1**. Note: service nodes are, by convention, in a
group called **service**. Some of the commands in this document will use the
group **service** to update all service nodes.

Note: a Service Node's service node is the Management Node, so a service node
must have a direct connection to the management node. The compute nodes do not
have to be directly attached to the Management Node, only to their service
node. This will all have to be defined in your networks table.

Add Service Nodes to the nodelist Table
---------------------------------------

Define your service nodes (if not defined already), and by convention put
them in a **service** group. We usually have a group compute for our compute
nodes, to distinguish between the two types of nodes. (If you want to use your
own group name for service nodes, rather than service, you need to change some
defaults in the xCAT db that use the group name service. For example, in the
postscripts table there is by default a group entry for service, with the
appropriate postscripts to run when installing a service node. Also, the
default ``kickstart/autoyast`` template, pkglist, etc. that will be used have
file names based on the profile name service.) ::

  mkdef sn1,sn2 groups=service,ipmi,all

Add OS and Hardware Attributes to Service Nodes
-----------------------------------------------

When you ran copycds, it created several osimage definitions, including some
appropriate for SNs. Display the list of osimages and choose one with
"service" in the name: ::

   lsdef -t osimage

For this example, let's assume you chose the stateful osimage definition for
rhels 6.3: rhels6.3-x86_64-install-service. If you want to modify any of the
osimage attributes (e.g. ``kickstart/autoyast`` template, pkglist, etc.),
make a copy of the osimage definition and also copy to ``/install/custom``
any files it points to that you are modifying.

Now set some of the common attributes for the SNs at the group level: ::

  chdef -t group service arch=x86_64 \
                         os=rhels6.3 \
                         nodetype=osi \
                         profile=service \
                         netboot=xnba installnic=mac \
                         primarynic=mac \
                         provmethod=rhels6.3-x86_64-install-service

Add Service Nodes to the servicenode Table
------------------------------------------

An entry must be created in the servicenode table for each service node or for
the service group. This table describes all the services you would like xCAT
to set up on the service nodes. Even if you don't want xCAT to set up any
services (unlikely), you must define the service nodes in the servicenode
table with at least one attribute set, even if it is set to 0; otherwise they
will not be recognized as service nodes.

When the xcatd daemon is started or restarted on the service node, it will
make sure all of the requested services are configured and started. (To
temporarily avoid this when restarting xcatd, use "service xcatd reload"
instead.)

To set up the minimum recommended services on the service nodes: ::

  chdef -t group -o service setupnfs=1 \
                            setupdhcp=1 setuptftp=1 \
                            setupnameserver=1 \
                            setupconserver=1

.. TODO

See the setup* attributes in the `node object definition man page
<http://localhost/fake_todo>`_ for the services available. (The HTTP server
is also started when setupnfs is set.)

If you are using the setupntp postscript on the compute nodes, you should also
set setupntp=1. For clusters with subnetted management networks (i.e. the
network between the SN and its compute nodes is separate from the network
between the MN and the SNs) you might want to also set setupipforward=1.

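A quick way to confirm that the recommended services are set is to scan a service node's definition, which also shows the attributes it inherits from the service group. This is a minimal sketch with a hypothetical function name. It assumes ``lsdef`` prints ``attr=value`` lines, and it treats any value other than 1 as unset.

```shell
#!/bin/sh
# missing_services: read lsdef-style "attr=value" lines on stdin and print
# each recommended setup* attribute whose value is not exactly 1.
missing_services() {
    attrs=$(cat)
    for a in setupnfs setupdhcp setuptftp setupnameserver setupconserver; do
        if ! printf '%s\n' "$attrs" | grep -Eq "^[[:space:]]*$a=1\$"; then
            echo "$a"
        fi
    done
}
```

For example, ``lsdef sn1 | missing_services`` prints nothing when all five attributes are set to 1.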
.. _add_service_node_postscripts_label:

Add Service Node Postscripts
----------------------------

By default, xCAT defines the service node group to have the "servicenode"
postscript run when the SNs are installed or diskless booted. This
postscript sets up the xcatd credentials and installs the xCAT software on
the service nodes. If you have your own postscript that you want run on the
SN during deployment of the SN, put it in ``/install/postscripts`` on the MN
and add it to the service node postscripts or postbootscripts. For example: ::

  chdef -t group -p service postscripts=<mypostscript>

Notes:

  * For Red Hat type distros, the postscripts will be run before the reboot
    of a kickstart install, and the postbootscripts will be run after the
    reboot.
  * Make sure that the servicenode postscript is set to run before the
    otherpkgs postscript or you will see errors during the service node
    deployment.
  * The -p flag automatically adds the specified postscript at the end of the
    comma-separated list of postscripts (or postbootscripts).

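You can check the ordering requirement in the notes above before deploying. This is a minimal sketch with a hypothetical function name. It assumes the value is a plain comma-separated list without spaces, and it only checks the order within that one list.

```shell
#!/bin/sh
# postscript_order_ok LIST
#   LIST is one comma-separated postscripts (or postbootscripts) value.
#   Succeeds if "servicenode" is present and, when "otherpkgs" is also in
#   the same list, servicenode comes first.
postscript_order_ok() {
    pos=0
    sn_pos=0
    op_pos=0
    old_ifs=$IFS
    IFS=,
    for p in $1; do
        pos=$((pos + 1))
        if [ "$p" = servicenode ] && [ "$sn_pos" -eq 0 ]; then
            sn_pos=$pos
        fi
        if [ "$p" = otherpkgs ] && [ "$op_pos" -eq 0 ]; then
            op_pos=$pos
        fi
    done
    IFS=$old_ifs
    [ "$sn_pos" -gt 0 ] || return 1
    [ "$op_pos" -eq 0 ] || [ "$sn_pos" -lt "$op_pos" ]
}
```

For example, assuming ``lsdef``'s ``-i`` option to select attributes: ``postscript_order_ok "$(lsdef sn1 -i postscripts | sed -n 's/^ *postscripts=//p')" || echo "check postscript order"``.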
If you are running additional software on the service nodes that needs
**ODBC** to access the database (e.g. LoadLeveler or TEAL), use this command
to add the xCAT supplied postbootscript called "odbcsetup". ::

  chdef -t group -p service postbootscripts=odbcsetup

Assigning Nodes to their Service Nodes
--------------------------------------

The node attributes **servicenode** and **xcatmaster** define which SN
services this particular node. The servicenode attribute for a compute node
defines which SN the MN should send a command to (e.g. xdsh), and should be
set to the hostname or IP address of the service node that the management
node contacts it by. The xcatmaster attribute of the compute node defines
which SN the compute node should boot from, and should be set to the
hostname or IP address of the service node that the compute node contacts it
by. Unless you are using service node pools, you must set the xcatmaster
attribute for a node when using service nodes, even if it contains the same
value as the node's servicenode attribute.

Host name resolution must have been set up in advance, with ``/etc/hosts``,
DNS or DHCP, to ensure that the names put in these attributes can be resolved
on the Management Node, the service nodes, and the compute nodes. It is
easiest to have a node group of the compute nodes for each service node. For
example, if all the nodes in node group compute1 are serviced by sn1 and all
the nodes in node group compute2 are serviced by sn2:

::

  chdef -t group compute1 servicenode=sn1 xcatmaster=sn1-c
  chdef -t group compute2 servicenode=sn2 xcatmaster=sn2-c

Note: in this example, sn1 and sn2 are the node names of the service nodes
(and therefore the hostnames associated with the NICs that the MN talks to).
The hostnames sn1-c and sn2-c are associated with the SN NICs that communicate
with their compute nodes.

Note: if not set, the tftpserver attribute defaults to the value of
xcatmaster, but in some releases of xCAT it has not defaulted correctly, so it
is safer to set tftpserver explicitly to the value of xcatmaster.

The conserver and monserver attributes specify which service node should run
the conserver (console) and monserver (monitoring) daemons for the nodes in
the group specified in the command. In this example, each node's primary SN
also acts as its conserver and monserver (the most typical setup).
::

  chdef -t group compute1 conserver=sn1 monserver=sn1,sn1-c
  chdef -t group compute2 conserver=sn2 monserver=sn2,sn2-c

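You can combine the per-group assignments in this section into one ``chdef`` call per group. This includes the explicit tftpserver setting recommended in the note above. The sketch below is minimal and assumes the typical layout shown, in which the primary SN also runs conserver and monserver. ``assign_sn`` and ``DRY_RUN`` are hypothetical.

```shell
#!/bin/sh
# run: print the command instead of executing it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# assign_sn GROUP SN_MN SN_CN
#   SN_MN: the SN as known by the MN (servicenode, conserver).
#   SN_CN: the SN as known by the compute nodes (xcatmaster, tftpserver).
#   monserver gets both names, as in the example above.
assign_sn() {
    if [ $# -ne 3 ]; then
        echo "usage: assign_sn GROUP SN_AS_KNOWN_BY_MN SN_AS_KNOWN_BY_CN" >&2
        return 2
    fi
    run chdef -t group "$1" servicenode="$2" xcatmaster="$3" \
        tftpserver="$3" conserver="$2" monserver="$2,$3"
}
```

For the example groups, you would run ``assign_sn compute1 sn1 sn1-c`` and ``assign_sn compute2 sn2 sn2-c``.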
Service Node Pools
^^^^^^^^^^^^^^^^^^

Service Node Pools are multiple service nodes that service the same set of
compute nodes. Having multiple service nodes provides backup service node(s)
for a compute node when the primary service node is unavailable, and can also
be used for workload balancing on the service nodes. Note, however, that the
selection of which SN will service which compute node is made at compute node
boot time. After that, the selection of the SN for this compute node is fixed
until the compute node is rebooted or the compute node is explicitly moved to
another SN using the `snmove <http://localhost/fake_todo>`_ command.

To use Service Node pools, you need to architect your network such that all of
the compute nodes and service nodes in a particular pool are on the same flat
network. If you don't want the management node to respond to some of the
compute nodes, it shouldn't be on that same flat network. The site table's
dhcpinterfaces attribute should be set such that the SNs' DHCP daemon only
listens on the NIC that faces the compute nodes, not the NIC that faces the
MN. This avoids some timing issues when the SNs are being deployed (so that
they don't respond to each other before they are completely ready). You also
need to make sure the `networks <http://localhost/fake_todo>`_ table
accurately reflects the physical network structure.

To define a list of service nodes that support a set of compute nodes, set the
servicenode attribute to a comma-delimited list of the service nodes. When
running an xCAT command like xdsh or updatenode for compute nodes, the list
will be processed left to right, picking the first service node on the list to
run the command. If that service node is not available, then the next service
node on the list will be chosen until the command is successful. Errors will
be logged. If no service node on the list can process the command, then the
error will be returned. You can provide some load-balancing by assigning your
service nodes as we do below.

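The left-to-right selection described above can be illustrated with a small function. It only models the order in which the listed service nodes are tried; it is not xCAT's implementation. ``pick_sn`` and ``reachable`` are hypothetical, and ``reachable`` assumes a Linux (iputils) ``ping``.

```shell
#!/bin/sh
# reachable HOST: one ping with a 2 second timeout (Linux iputils ping).
reachable() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# pick_sn LIST CHECK
#   Walk the comma-separated LIST left to right and print the first SN for
#   which "CHECK <sn>" succeeds; report skipped SNs on stderr.
pick_sn() {
    list=$1
    check=$2
    old_ifs=$IFS
    IFS=,
    set -- $list          # split the list on commas into $1, $2, ...
    IFS=$old_ifs
    for sn in "$@"; do
        if "$check" "$sn"; then
            echo "$sn"
            return 0
        fi
        echo "$sn is not available, trying the next one" >&2
    done
    echo "no service node in '$list' is available" >&2
    return 1
}
```

For example, ``pick_sn sn1,sn2 reachable`` prints ``sn1`` if it answers and otherwise falls back to ``sn2``.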
When using service node pools, the intent is to have the service node that
responds first to the compute node's DHCP request during boot also be the
xcatmaster, the tftpserver, and the NFS/http server for that node. Therefore,
the xcatmaster and nfsserver attributes for nodes should not be set. When
nodeset is run for the compute nodes, the service node interface on the
network to the compute nodes should be defined and active, so that nodeset
will default those attribute values to the "node ip facing" interface on that
service node.

For example: ::

  chdef -t node compute1 servicenode=sn1,sn2 xcatmaster="" nfsserver=""
  chdef -t node compute2 servicenode=sn2,sn1 xcatmaster="" nfsserver=""

You need to set the sharedtftp site attribute to 0 so that the SNs will not
automatically mount the ``/tftpboot`` directory from the management node:
::

  chdef -t site clustersite sharedtftp=0

| For stateful (full-disk) node installs, you will need to use a local | ||||
| ``/install`` directory on each service node. The ``/install/autoinst/node`` | ||||
| files generated by nodeset will contain values specific to that service node | ||||
| for correctly installing the nodes. | ||||
| :: | ||||
|  | ||||
|   chdef -t site clustersite installloc="" | ||||
|  | ||||
| With this setting, you will need to remember to rsync your ``/install`` | ||||
| directory from the xCAT management node to the service nodes any time you | ||||
| change your ``/install/postscripts``, custom osimage files, OS repositories, | ||||
| or other directories. It is best to exclude the ``/install/autoinst`` directory | ||||
| from this rsync. | ||||
|  | ||||
| :: | ||||
|  | ||||
|   rsync -auv --exclude 'autoinst' /install sn1:/ | ||||
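With several service nodes, the rsync above has to be repeated for each one. The following is a minimal sketch of a loop that does this. The function name sync_install and the SN names are hypothetical. By default (DRY_RUN unset or 1) it only prints the rsync commands. Set DRY_RUN=0 to actually run them.

```shell
#!/bin/bash
# Sketch: copy /install from the MN to each service node, excluding
# autoinst so the SN-specific files that nodeset generates there are not
# overwritten. Prints the commands unless DRY_RUN=0.
sync_install() {
    local sn
    for sn in "$@"; do
        if [ "${DRY_RUN:-1}" = 0 ]; then
            rsync -auv --exclude 'autoinst' /install "$sn:/"
        else
            echo "rsync -auv --exclude 'autoinst' /install $sn:/"
        fi
    done
}

# Hypothetical service node names; review the output before using DRY_RUN=0.
sync_install sn1 sn2
```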
|  | ||||
| Note: If your service nodes are stateless, site.sharedtftp=0, and you are | ||||
| using service node pools, any data written to the local ``/tftpboot`` | ||||
| directory of a service node is lost when that SN is rebooted. You will then | ||||
| need to run nodeset again for all of the compute nodes serviced by that SN. | ||||
|  | ||||
| For additional information about service node pool related settings in the | ||||
| networks table, see :ref:`setup_networks_table_label`. | ||||
|  | ||||
| Conserver and Monserver and Pools | ||||
| """"""""""""""""""""""""""""""""" | ||||
|  | ||||
| Conserver and monserver are not yet supported with service node pools. You | ||||
| must explicitly assign these functions to a service node using the | ||||
| nodehm.conserver and noderes.monserver attributes as above. | ||||
|  | ||||
| Setup Site Table | ||||
| ---------------- | ||||
|  | ||||
| If you are not using the NFS-based statelite method of booting your compute  | ||||
| nodes, set the installloc attribute to ``/install``. This instructs the | ||||
| service node to mount ``/install`` from the management node. (If you don't do | ||||
| this, you have to manually sync ``/install`` between the management node and | ||||
| the service nodes.) :: | ||||
|  | ||||
|   chdef -t site clustersite installloc="/install" | ||||
|  | ||||
| For IPMI controlled nodes, if you want the out-of-band IPMI operations to be  | ||||
| done directly from the management node (instead of being sent to the  | ||||
| appropriate service node), set site.ipmidispatch=n. | ||||
|  | ||||
| If you want to throttle the rate at which nodes are booted up, you can set the  | ||||
| following site attributes: | ||||
|  | ||||
|  | ||||
| * syspowerinterval | ||||
| * syspowermaxnodes | ||||
| * powerinterval (system p only) | ||||
|  | ||||
| See the `site table man page <http://localhost/fake_todo>`_ for details. | ||||
|  | ||||
| .. _setup_networks_table_label: | ||||
|  | ||||
| Setup networks Table | ||||
| -------------------- | ||||
|  | ||||
| All networks in the cluster must be defined in the networks table. When xCAT  | ||||
| is installed, it runs makenetworks, which creates an entry in the networks | ||||
| table for each of the networks the management node is on. You need to add | ||||
| entries for each network the service nodes use to communicate with the compute | ||||
| nodes. | ||||
|  | ||||
| For example: :: | ||||
|  | ||||
|   mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway=10.5.1.1 | ||||
|  | ||||
| If you want to set the nodes' xcatmaster as the default gateway for the nodes,  | ||||
| the gateway attribute can be set to keyword "<xcatmaster>". In this case, xCAT  | ||||
| code will automatically substitute the IP address of the node's xcatmaster for  | ||||
| the keyword. Here is an example: | ||||
| :: | ||||
|  | ||||
|   mkdef -t network net1 net=10.5.1.0 mask=255.255.255.224 gateway=<xcatmaster> | ||||
|  | ||||
| The ipforward attribute should be enabled on all the xcatmaster nodes that | ||||
| will be acting as default gateways. You can set ipforward to 1 in the | ||||
| servicenode table, or add the line "net.ipv4.ip_forward = 1" to the file | ||||
| ``/etc/sysctl.conf`` and then run "sysctl -p /etc/sysctl.conf" manually to | ||||
| enable IP forwarding. | ||||
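If you edit ``/etc/sysctl.conf`` by hand on several nodes, it helps to make the change idempotent, so that rerunning it does not add duplicate lines. The following is a minimal sketch that assumes GNU sed (for ``sed -i``). The function name enable_ipforward is hypothetical. The file path is a parameter, so you can try the function on a scratch file first.

```shell
#!/bin/bash
# Sketch: ensure "net.ipv4.ip_forward = 1" appears exactly once as an
# active setting in a sysctl configuration file. Requires GNU sed (-i).
enable_ipforward() {
    local conf=${1:-/etc/sysctl.conf}
    touch "$conf"
    # Delete any active ip_forward line, whatever its current value ...
    sed -i '/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=/d' "$conf"
    # ... then append the enabled setting.
    echo "net.ipv4.ip_forward = 1" >> "$conf"
}

# Try it on a scratch copy first.
tmp=$(mktemp)
printf 'net.ipv4.ip_forward = 0\nkernel.sysrq = 0\n' > "$tmp"
enable_ipforward "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On the real node, run ``enable_ipforward /etc/sysctl.conf`` as root, followed by ``sysctl -p /etc/sysctl.conf``.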
|  | ||||
| Note: If using service node pools, the networks table dhcpserver attribute can | ||||
| be set to any single service node in your pool. The networks table tftpserver | ||||
| and nameserver attributes should be left blank. | ||||
|  | ||||
| Verify the Tables | ||||
| -------------------- | ||||
|  | ||||
| To verify that the tables are set correctly, run lsdef on the service node | ||||
| group (service) and on compute1 and compute2: :: | ||||
|  | ||||
|   lsdef service,compute1,compute2 | ||||
|  | ||||
| Add additional adapters configuration script (optional) | ||||
| ------------------------------------------------------------ | ||||
|  | ||||
| It is possible to have additional adapter interfaces automatically configured | ||||
| when the nodes are booted. xCAT provides sample configuration scripts for | ||||
| ethernet, IB, and HFI adapters. These scripts can be used as-is or they can be | ||||
| modified to suit your particular environment. The ethernet sample is | ||||
| ``/install/postscripts/configeth``. When you have the configuration script that | ||||
| you want, you can add it to the "postscripts" attribute as mentioned above. Make | ||||
| sure your script is in the ``/install/postscripts`` directory and that it is | ||||
| executable. | ||||
|  | ||||
| Note: For system p servers, if you plan to have your service node perform the  | ||||
| hardware control functions for its compute nodes, it is necessary that the SN  | ||||
| ethernet network adapters connected to the HW service VLAN be configured. | ||||
|  | ||||
| Configuring Secondary Adapters | ||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||
|  | ||||
| To configure secondary adapters, see `Configuring_Secondary_Adapters | ||||
| <http://localhost/fake_todo>`_ | ||||
|  | ||||
|  | ||||
| @@ -1,2 +0,0 @@ | ||||
| Setting Up a Linux Hierarchical Cluster | ||||
| ======================================= | ||||
| @@ -5,3 +5,14 @@ Hierarchical Clusters | ||||
|    :maxdepth: 2 | ||||
|  | ||||
|    introduction.rst | ||||
|    setup_mn_hierachical_database.rst | ||||
|    define_service_node.rst | ||||
|    configure_dhcp.rst | ||||
|    setup_service_node.rst | ||||
|    service_node_for_diskfull.rst | ||||
|    service_node_for_diskless.rst | ||||
|    test_service_node_installation.rst | ||||
|    appendix_a_setup_backup_service_nodes.rst | ||||
|    appendix_b_diagnostics.rst | ||||
|    appendix_c_migrating_mn_to_sn.rst | ||||
|    appendix_d_set_up_hierarchical_conserver.rst | ||||
|   | ||||
| @@ -1,10 +1,50 @@ | ||||
| Introduction | ||||
| ============ | ||||
|  | ||||
| In large clusters, it is desirable to have more than one node, not just the | ||||
| Management Node (MN), handle the installation and management of the compute | ||||
| nodes. We call these additional nodes **service nodes (SN)**. The management | ||||
| node can delegate all management operations needed by a compute node to the | ||||
| SN that is managing that compute node. You can have one or more service nodes | ||||
| set up to install and manage groups of compute nodes. | ||||
|  | ||||
| Service Nodes | ||||
| ------------- | ||||
|  | ||||
| With xCAT, you have the choice of either having each service node  | ||||
| install/manage a specific set of compute nodes, or having a pool of service  | ||||
| nodes, any of which can respond to an installation request from a compute  | ||||
| node. (Service node pools must be aligned with the network broadcast domains,  | ||||
| because a compute node chooses its SN for a given boot by taking whichever SN | ||||
| responds first to its DHCP request broadcast.) You can also have a hybrid of | ||||
| the two approaches, in which for each specific set of compute nodes you have | ||||
| two or more SNs in a pool. | ||||
|  | ||||
| Each SN runs an instance of xcatd, just like the MN does. The xcatd daemons | ||||
| communicate with each other using the same XML/SSL protocol that the xCAT  | ||||
| client uses to communicate with xcatd on the MN. | ||||
|  | ||||
| Daemon-based Databases | ||||
| ---------------------- | ||||
|  | ||||
| The service nodes need to communicate with the xCAT database on the Management  | ||||
| Node. They do this by using the remote client capability of the database (i.e.  | ||||
| they don't go through xcatd for that). Therefore the Management Node must be  | ||||
| running one of the daemon-based databases supported by xCAT (PostgreSQL,  | ||||
| MySQL). | ||||
|  | ||||
| The default SQLite database does not support remote clients and cannot be used  | ||||
| in hierarchical clusters. This document includes instructions for migrating  | ||||
| your cluster from SQLite to one of the other databases. Since the initial  | ||||
| install of xCAT will always set up SQLite, you must migrate to a database that  | ||||
| supports remote clients before installing your service nodes. | ||||
|  | ||||
| Setup | ||||
| ----- | ||||
| xCAT will help you install your service nodes, and will also install the xCAT | ||||
| software and other required rpms (such as perl, the database client, and | ||||
| other prerequisites) on them. Service nodes require all the same software as | ||||
| the MN (because they can perform all of the same functions), except that there | ||||
| is a special top-level xCAT rpm for SNs, called xCATsn, instead of the xCAT | ||||
| rpm that is on the Management Node. The xCATsn rpm tells the xcatd on the SN | ||||
| to behave as an SN, not as the MN. | ||||
|   | ||||
							
								
								
									
docs/source/advanced/hierarchy/service_node_for_diskfull.rst
| @@ -0,0 +1,156 @@ | ||||
| .. _setup_service_node_stateful_label: | ||||
|  | ||||
| Set Up the Service Nodes for Stateful (Diskful) Installation (optional) | ||||
| ======================================================================= | ||||
|  | ||||
| Any cluster using statelite compute nodes must use stateful (diskful) service | ||||
| nodes. | ||||
|  | ||||
| Note: If you are using diskless service nodes, go to | ||||
| :ref:`setup_service_node_stateless_label`. | ||||
|  | ||||
| First, go to the `Download_xCAT <http://localhost/fake_todo>`_ site and | ||||
| download the level of the xCAT tarball you desire. Then go to | ||||
| http://localhost/fake_todo and get the latest xCAT dependency tarball. | ||||
| **Note: All xCAT service nodes must be at the exact same xCAT version as the | ||||
| xCAT Management Node**. Copy the files to the Management Node (MN) and untar | ||||
| them in the appropriate sub-directory of ``/install/post/otherpkgs``. | ||||
|  | ||||
| **Note:** for the appropriate directory below, check the ``otherpkgdir`` | ||||
| attribute (for example, ``otherpkgdir=/install/post/otherpkgs/rhels6.4/ppc64``) | ||||
| of the osimage defined for the service node. | ||||
|  | ||||
| For example, for the ubuntu14.04.1-ppc64el-install-service osimage: :: | ||||
|  | ||||
|   mkdir -p /install/post/otherpkgs/ubuntu14.04.1/ppc64el/ | ||||
|   cd /install/post/otherpkgs/ubuntu14.04.1/ppc64el/ | ||||
|   tar jxvf core-rpms-snap.tar.bz2 | ||||
|   tar jxvf xcat-dep-ubuntu*.tar.bz2 | ||||
|  | ||||
| Next, add rpm names to your own version of the | ||||
| ``service.<osver>.<arch>.otherpkgs.pkglist`` file. In most cases, you can find an | ||||
| initial copy of this file under ``/opt/xcat/share/xcat/install/<platform>``. If | ||||
| not, copy one from a similar platform. | ||||
| :: | ||||
|  | ||||
|   mkdir -p /install/custom/install/ubuntu/ | ||||
|   cp /opt/xcat/share/xcat/install/ubuntu/service.ubuntu.otherpkgs.pkglist \ | ||||
|      /install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist | ||||
|   vi /install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist | ||||
|  | ||||
| Make sure the following entries are included in the | ||||
| ``/install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist``: | ||||
| :: | ||||
|  | ||||
|   mariadb-client | ||||
|   mariadb-common | ||||
|   xcatsn | ||||
|   conserver-xcat | ||||
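You can also append these entries only when they are missing, instead of editing the pkglist by hand. This makes the step safe to repeat. The following is a minimal sketch. The function name ensure_pkgs is hypothetical, and the function assumes one package name per line, as in the list above.

```shell
#!/bin/bash
# Sketch: append each package name to a pkglist file unless an identical
# line is already present. Existing lines are left untouched.
ensure_pkgs() {
    local list=$1 pkg
    shift
    touch "$list"
    for pkg in "$@"; do
        # -x: match whole lines only, -F: treat the name as a fixed string
        grep -qxF "$pkg" "$list" || echo "$pkg" >> "$list"
    done
}
```

For example: ``ensure_pkgs /install/custom/install/ubuntu/service.ubuntu.otherpkgs.pkglist mariadb-client mariadb-common xcatsn conserver-xcat``.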
|  | ||||
| Add the official Ubuntu mirror (online or local) to the osimage "pkgdir" | ||||
| attribute with the following command: | ||||
| :: | ||||
|  | ||||
|   chdef -t osimage -o ubuntu14.04.1-ppc64el-install-service \ | ||||
|        -p pkgdir="http://ports.ubuntu.com/ubuntu-ports trusty main, \ | ||||
|                   http://ports.ubuntu.com/ubuntu-ports trusty-updates main, \ | ||||
|                   http://ports.ubuntu.com/ubuntu-ports trusty universe, \ | ||||
|                   http://ports.ubuntu.com/ubuntu-ports trusty-updates universe" | ||||
|  | ||||
| In addition, add the xcat-core and xcat-dep repositories under the otherpkgdir | ||||
| on the MN to the osimage "otherpkgdir" attribute: :: | ||||
|  | ||||
|   chdef -t osimage -o ubuntu14.04.1-ppc64el-install-service \ | ||||
|        -p otherpkgdir="http://<MN name or IP>/install/post/otherpkgs/ubuntu14.04.1/ppc64el/xcat-core/ trusty main, \ | ||||
|                        http://<MN name or IP>/install/post/otherpkgs/ubuntu14.04.1/ppc64el/xcat-dep/ trusty main" | ||||
|  | ||||
| **Note: you will be installing the xCAT Service Node rpm xCATsn meta-package | ||||
| on the Service Node, not the xCAT Management Node meta-package. Do not install | ||||
| both.** | ||||
|  | ||||
| Update the rhels6 RPM repository (rhels6 only) | ||||
| ---------------------------------------------- | ||||
|  | ||||
| * This section can be removed once powerpc-utils-1.2.2-18.el6.ppc64.rpm | ||||
|   is included in the base rhels6 ISO. | ||||
| * The direct rpm download link is: | ||||
|   ftp://linuxpatch.ncsa.uiuc.edu/PERCS/powerpc-utils-1.2.2-18.el6.ppc64.rpm | ||||
| * The update steps are as follows: :: | ||||
|  | ||||
|     # put the new rpm in the base OS packages | ||||
|     cd /install/rhels6/ppc64/Server/Packages | ||||
|     mv powerpc-utils-1.2.2-17.el6.ppc64.rpm /tmp | ||||
|     cp /tmp/powerpc-utils-1.2.2-18.el6.ppc64.rpm . | ||||
|     # make sure that the rpm is readable by other users | ||||
|     chmod +r powerpc-utils-1.2.2-18.el6.ppc64.rpm | ||||
|  | ||||
|  | ||||
|  | ||||
| * Create the repodata: | ||||
|  | ||||
| :: | ||||
|  | ||||
|   cd /install/rhels6/ppc64/Server | ||||
|   ls -al repodata/ | ||||
|      total 14316 | ||||
|      dr-xr-xr-x 2 root root    4096 Jul 20 09:34 . | ||||
|      dr-xr-xr-x 3 root root    4096 Jul 20 09:34 .. | ||||
|      -r--r--r-- 1 root root 1305862 Sep 22  2010 20dfb74c144014854d3b16313907ebcf30c9ef63346d632369a19a4add8388e7-other.sqlite.bz2 | ||||
|      -r--r--r-- 1 root root 1521372 Sep 22  2010 57b3c81512224bbb5cebbfcb6c7fd1f7eb99cca746c6c6a76fb64c64f47de102-primary.xml.gz | ||||
|      -r--r--r-- 1 root root 2823613 Sep 22  2010 5f664ea798d1714d67f66910a6c92777ecbbe0bf3068d3026e6e90cc646153e4-primary.sqlite.bz2 | ||||
|      -r--r--r-- 1 root root 1418180 Sep 22  2010 7cec82d8ed95b8b60b3e1254f14ee8e0a479df002f98bb557c6ccad5724ae2c8-other.xml.gz | ||||
|      -r--r--r-- 1 root root  194113 Sep 22  2010 90cbb67096e81821a2150d2b0a4f3776ab1a0161b54072a0bd33d5cadd1c234a-comps-rhel6-Server.xml.gz | ||||
|      -r--r--r-- 1 root root 1054944 Sep 22  2010 98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml | ||||
|      -r--r--r-- 1 root root 3341671 Sep 22  2010 bb3456b3482596ec3aa34d517affc42543e2db3f4f2856c0827d88477073aa45-filelists.sqlite.bz2 | ||||
|      -r--r--r-- 1 root root 2965960 Sep 22  2010 eb991fd2bb9af16a24a066d840ce76365d396b364d3cdc81577e4cf6e03a15ae-filelists.xml.gz | ||||
|      -r--r--r-- 1 root root    3829 Sep 22  2010 repomd.xml | ||||
|      -r--r--r-- 1 root root    2581 Sep 22  2010 TRANS.TBL | ||||
|   createrepo -g \ | ||||
|     repodata/98462d05248098ef1724eddb2c0a127954aade64d4bb7d4e693cff32ab1e463c-comps-rhel6-Server.xml . | ||||
|  | ||||
| Note: use the comps-rhel6-Server.xml file, including its hash prefix, as the group file. | ||||
|  | ||||
| Set the node status to ready for installation | ||||
| --------------------------------------------- | ||||
|  | ||||
| Run nodeset with the osimage name defined in the provmethod attribute of your | ||||
| service node definitions. :: | ||||
|  | ||||
|   nodeset service osimage="<osimagename>" | ||||
|  | ||||
| For example :: | ||||
|  | ||||
|   nodeset service osimage="ubuntu14.04.1-ppc64el-install-service" | ||||
|  | ||||
| Initialize network boot to install Service Nodes | ||||
| ------------------------------------------------ | ||||
|  | ||||
| :: | ||||
|  | ||||
|   rnetboot service | ||||
|  | ||||
| Monitor the Installation | ||||
| ------------------------ | ||||
|  | ||||
| Watch the installation progress using either wcons or rcons: :: | ||||
|  | ||||
|   wcons service     # make sure DISPLAY is set to your X server/VNC or | ||||
|   rcons <one-node-at-a-time> | ||||
|   tail -f /var/log/messages | ||||
|  | ||||
| Note: We have seen a problem when installing RHEL6 diskful service nodes with | ||||
| SAS disks: after the RHEL6 operating system has been installed, the service | ||||
| node cannot reboot from the SAS disk. Until a build with fixes from the RHEL6 | ||||
| team is available, if you hit this problem, manually select the SAS disk as | ||||
| the first boot device and boot from it. | ||||
|  | ||||
| Update Service Node Diskfull Image | ||||
| ---------------------------------- | ||||
|  | ||||
| If you need to update the service nodes later on with a new version of xCAT | ||||
| and its dependencies, obtain the new xCAT and xCAT dependencies rpms, and | ||||
| follow the same steps that were followed in | ||||
| :ref:`setup_service_node_stateful_label`. | ||||
							
								
								
									
docs/source/advanced/hierarchy/service_node_for_diskless.rst
| @@ -0,0 +1,221 @@ | ||||
| .. _setup_service_node_stateless_label: | ||||
|  | ||||
| Setup the Service Node for Stateless Deployment (optional) | ||||
| ========================================================== | ||||
|  | ||||
| **Note: Stateless service nodes are not supported in Ubuntu hierarchical | ||||
| clusters. For Ubuntu, please skip this section.** | ||||
|  | ||||
| If you want, your service nodes can be stateless (diskless). The service node | ||||
| must contain not only the OS, but also the xCAT software and its dependencies. | ||||
| In addition, a number of files are added to the service node to support | ||||
| PostgreSQL or MySQL database access from the service node to the Management | ||||
| node, and ssh access to the nodes that the service node services. | ||||
| The following sections explain how to accomplish this. | ||||
|  | ||||
|  | ||||
| Build the Service Node Diskless Image | ||||
| ------------------------------------- | ||||
|  | ||||
| This section assumes you can build the stateless image on the management node | ||||
| because the service nodes are the same OS and architecture as the management | ||||
| node. If this is not the case, you need to build the image on a machine that | ||||
| matches the service node's OS and architecture. | ||||
|  | ||||
| * Create an osimage definition. When you run copycds, xCAT creates | ||||
|   service node osimage definitions for that distribution. For a stateless | ||||
|   service node, use the ``*-netboot-service`` definition. | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     lsdef -t osimage | grep -i service | ||||
|     rhels6.4-ppc64-install-service  (osimage) | ||||
|     rhels6.4-ppc64-netboot-service  (osimage) | ||||
|     rhels6.4-ppc64-statelite-service  (osimage) | ||||
|  | ||||
|     lsdef -t osimage -l rhels6.3-ppc64-netboot-service | ||||
|     Object name: rhels6.3-ppc64-netboot-service | ||||
|         exlist=/opt/xcat/share/xcat/netboot/rh/service.exlist | ||||
|         imagetype=linux | ||||
|         osarch=ppc64 | ||||
|         osdistroname=rhels6.3-ppc64 | ||||
|         osname=Linux | ||||
|         osvers=rhels6.3 | ||||
|         otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64 | ||||
|         otherpkglist=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.otherpkgs.pkglist | ||||
|         pkgdir=/install/rhels6.3/ppc64 | ||||
|         pkglist=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.pkglist | ||||
|         postinstall=/opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.postinstall | ||||
|         profile=service | ||||
|         provmethod=netboot | ||||
|         rootimgdir=/install/netboot/rhels6.3/ppc64/service | ||||
|  | ||||
| * You can check the service node packaging to see if it has all the rpms you | ||||
|   require. We ship basic requirements lists that will create a fully | ||||
|   functional service node. However, you may want to customize your service | ||||
|   node by adding additional operating system packages or modifying the files | ||||
|   excluded by the exclude list. View the files referenced by the osimage | ||||
|   pkglist, otherpkglist and exlist attributes: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     cd /opt/xcat/share/xcat/netboot/rh/ | ||||
|     view service.rhels6.ppc64.pkglist | ||||
|     view service.rhels6.ppc64.otherpkgs.pkglist | ||||
|     view service.exlist | ||||
|  | ||||
|   If you would like to change any of these files, copy them to a custom | ||||
|   directory. This can be any directory you choose, but we recommend that you | ||||
|   keep it somewhere under ``/install``. A good location is something like | ||||
|   ``/install/custom/netboot/<os>/service``. Make sure that your | ||||
|   ``otherpkgs.pkglist`` file has an entry for | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     xcat/xcat-core/xCATsn | ||||
|  | ||||
|   This is required to install the xCAT service node function into your image. | ||||
|  | ||||
|   You may also choose to create an appropriate ``/etc/fstab`` file in your | ||||
|   service node image. Copy the script referenced by the postinstall | ||||
|   attribute to your directory and modify it as you would like: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     cp /opt/xcat/share/xcat/netboot/rh/service.rhels6.ppc64.postinstall \ | ||||
|        /install/custom/netboot/rh | ||||
|     vi /install/custom/netboot/rh/service.rhels6.ppc64.postinstall | ||||
|     # uncomment the sample fstab lines and change as needed: | ||||
|     proc /proc proc rw 0 0 | ||||
|     sysfs /sys sysfs rw 0 0 | ||||
|     devpts /dev/pts devpts rw,gid=5,mode=620 0 0 | ||||
|     service_x86_64 / tmpfs rw 0 1 | ||||
|     none /tmp tmpfs defaults,size=10m 0 2 | ||||
|     none /var/tmp tmpfs defaults,size=10m 0 2 | ||||
|  | ||||
|   After modifying the files, you will need to update the osimage definition to | ||||
|   reference these files. We recommend creating a new osimage definition for | ||||
|   your custom image: :: | ||||
|  | ||||
|     lsdef -t osimage -l rhels6.3-ppc64-netboot-service -z > /tmp/myservice.def | ||||
|     vi /tmp/myservice.def | ||||
|     # change the name of the osimage definition | ||||
|     # change any attributes that now need to reference your custom files | ||||
|     # change the rootimgdir attribute, replacing 'service' | ||||
|     # with a name to match your new osimage definition | ||||
|     cat /tmp/myservice.def | mkdef -z | ||||
|  | ||||
|   While you are here, if you'd like, you can do the same for your compute node | ||||
|   images, creating custom files and new custom osimage definitions as you need | ||||
|   to. | ||||
|  | ||||
|   For more information on the use and syntax of otherpkgs and pkglist files, | ||||
|   see `Update Service Node Stateless Image <http://localhost/fake_todo>`_ | ||||
|  | ||||
| * Make your xCAT software available for otherpkgs processing | ||||
|  | ||||
| * If you downloaded xCAT to your management node for installation, place a | ||||
|   copy of your xcat-core and xcat-dep in your otherpkgdir directory :: | ||||
|  | ||||
|     lsdef -t osimage -o rhels6.3-ppc64-netboot-service -i otherpkgdir | ||||
|     Object name: rhels6.3-ppc64-netboot-service | ||||
|     otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64 | ||||
|     cd /install/post/otherpkgs/rhels6.3/ppc64 | ||||
|     mkdir xcat | ||||
|     cd xcat | ||||
|     cp -Rp <current location of xcat-core>/xcat-core . | ||||
|     cp -Rp <current location of xcat-dep>/xcat-dep . | ||||
|  | ||||
| * If you installed your management node directly from the Linux online | ||||
|   repository, you will need to download the xcat-core and xcat-dep tarballs | ||||
|  | ||||
|   - Go to the `Download xCAT page  <http://localhost/fake_todo>`_ and download | ||||
|     the level of xCAT tarball you desire. | ||||
|   - Go to the `Download xCAT Dependencies  <http://localhost/fake_todo>`_ page | ||||
|     and download the latest xCAT dependency tarball. Place these into your | ||||
|     otherpkgdir directory: | ||||
|  | ||||
|     :: | ||||
|  | ||||
|       lsdef -t osimage -o rhels6.3-ppc64-netboot-service -i otherpkgdir | ||||
|       Object name: rhels6.3-ppc64-netboot-service | ||||
|           otherpkgdir=/install/post/otherpkgs/rhels6.3/ppc64 | ||||
|       cd /install/post/otherpkgs/rhels6.3/ppc64 | ||||
|       mkdir xcat | ||||
|       cd xcat | ||||
|       mv <xcat-core tarball>  . | ||||
|       tar -jxvf <xcat-core tarball> | ||||
|       mv <xcat-dep tarball>   . | ||||
|       tar -jxvf <xcat-dep tarball> | ||||
|  | ||||
| * Run image generation for your osimage definition: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|       genimage rhels6.3-ppc64-netboot-service | ||||
|  | ||||
| * Prevent DHCP from starting up until xcatd has had a chance to configure it: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     chroot /install/netboot/rhels6.3/ppc64/service/rootimg chkconfig dhcpd off | ||||
|     chroot /install/netboot/rhels6.3/ppc64/service/rootimg chkconfig dhcrelay off | ||||
|  | ||||
| * If using NFS hybrid mode, export /install read-only in the service node image: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     cd /install/netboot/rhels6.3/ppc64/service/rootimg/etc | ||||
|     echo '/install *(ro,no_root_squash,sync,fsid=13)' >exports | ||||
|  | ||||
| * Pack the image for your osimage definition: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     packimage rhels6.3-ppc64-netboot-service | ||||
|  | ||||
| * Set the node status to ready for netboot using your osimage definition and | ||||
|   your 'service' nodegroup: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     nodeset service osimage=rhels6.3-ppc64-netboot-service | ||||
|  | ||||
| * To boot the service nodes diskless: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     rnetboot service | ||||
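You can also script the rename step from the osimage copy procedure above, where you change the osimage name and the ``rootimgdir`` in ``/tmp/myservice.def``. The following is a minimal sketch. It assumes that the stanza file written by ``lsdef ... -z`` starts each object with a ``<name>:`` line, and that ``rootimgdir`` ends in ``/service``, as in the lsdef output above. The function name rename_osimage_stanza and the new name myservice are hypothetical. The function only handles the rename. You still have to edit any other attributes that should point at your custom files.

```shell
#!/bin/bash
# Sketch: read an osimage stanza on stdin and write it to stdout with the
# object renamed and the trailing "/service" of rootimgdir replaced by the
# new name. Dots in the old name act as regex wildcards; this is harmless
# here because the header match is anchored at both ends.
rename_osimage_stanza() {
    local old=$1 new=$2
    sed -e "s|^${old}:\$|${new}:|" \
        -e "s|^\([[:space:]]*rootimgdir=.*\)/service\$|\1/${new}|"
}
```

For example: ``lsdef -t osimage -l rhels6.3-ppc64-netboot-service -z | rename_osimage_stanza rhels6.3-ppc64-netboot-service myservice > /tmp/myservice.def``.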
|  | ||||
| Update Service Node Stateless Image | ||||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||||
|  | ||||
| To update the xCAT software in the image at a later time: | ||||
|  | ||||
|   * Download the updated xcat-core and xcat-dep tarballs and place them in | ||||
|     your osimage's otherpkgdir xcat directory as you did above. | ||||
|   * Regenerate and repack the image, then reboot your service nodes: | ||||
|  | ||||
|   :: | ||||
|  | ||||
|     genimage rhels6.3-ppc64-netboot-service | ||||
|     packimage rhels6.3-ppc64-netboot-service | ||||
|     nodeset service osimage=rhels6.3-ppc64-netboot-service | ||||
|     rnetboot service | ||||
|  | ||||
| Note: The service nodes are set up as NFS-root servers for the compute nodes. | ||||
| Any time changes are made to any compute image on the management node, those | ||||
| changes must be synced to all service nodes. In our case the ``/install`` | ||||
| directory is mounted on the service nodes, so the update to the compute node | ||||
| image is automatically available. | ||||
|  | ||||
| Monitor install and boot | ||||
| ------------------------ | ||||
|  | ||||
| :: | ||||
|  | ||||
|     wcons service # make sure DISPLAY is set to your X server/VNC or | ||||
|     rcons <one-node-at-a-time> # or do rcons for each node | ||||
|     tail -f /var/log/messages | ||||
|  | ||||
| @@ -0,0 +1,27 @@ | ||||
| Setup the MN Hierarchical Database | ||||
| ================================== | ||||
|  | ||||
| Before setting up service nodes, you need to set up either MySQL (or | ||||
| MariaDB) or PostgreSQL as the xCAT database on the Management Node. The | ||||
| database client on the Service Nodes will be set up later, when the SNs are | ||||
| installed. MySQL and PostgreSQL are available with the Linux OS. | ||||
|  | ||||
| Follow the instructions in one of these documents for setting up the | ||||
| Management node to use the selected database: | ||||
|  | ||||
| MySQL or MariaDB | ||||
| ---------------- | ||||
|  | ||||
| * Follow this documentation and be sure to use the xCAT-provided mysqlsetup | ||||
|   command to set up the database for xCAT: | ||||
|  | ||||
|   .. TODO http link | ||||
|  | ||||
|   - `Setting_Up_MySQL_as_the_xCAT_DB <http://localhost/fake_todo>`_ | ||||
|  | ||||
| PostgreSQL | ||||
| ---------- | ||||
|  | ||||
| * Follow this documentation and be sure to use the xCAT-provided pgsqlsetup | ||||
|   command to set up the database for xCAT: | ||||
|  | ||||
|   .. TODO http link | ||||
|  | ||||
|   - `Setting_Up_PostgreSQL_as_the_xCAT_DB <http://localhost/fake_todo>`_ | ||||
							
								
								
									
docs/source/advanced/hierarchy/setup_service_node.rst
| @@ -0,0 +1,6 @@ | ||||
| Setup Service Node | ||||
| ================== | ||||
|  | ||||
| * Follow this documentation to :ref:`setup_service_node_stateful_label`. | ||||
|  | ||||
| * Follow this documentation to :ref:`setup_service_node_stateless_label`. | ||||
| @@ -0,0 +1,11 @@ | ||||
| Test Service Node installation | ||||
| ============================== | ||||
|  | ||||
| * ssh to the service nodes. You should not be prompted for a password. | ||||
| * Check to see that the xcat daemon xcatd is running. | ||||
| * Run some database commands on the service node, e.g. ``tabdump site`` or | ||||
|   ``nodels``, and check that the database can be accessed from the service node. | ||||
| * Check that ``/install`` and ``/tftpboot`` are mounted on the service node | ||||
|   from the Management Node, if appropriate. | ||||
| * Make sure that the service node has name resolution for all the nodes it will | ||||
|   service. | ||||
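Some of these checks can be scripted. The following is a minimal sketch of the mount check, to be run on the service node. The function name check_sn_mounts is hypothetical. The function reads a file in ``/proc/mounts`` format, where the mount point is the second field. Whether both directories should be mounted depends on your site.sharedtftp and site.installloc settings.

```shell
#!/bin/bash
# Sketch: report whether /install and /tftpboot are mounted, using a file
# in /proc/mounts format (second field = mount point). Returns non-zero if
# either one is missing.
check_sn_mounts() {
    local mounts=${1:-/proc/mounts} dir rc=0
    for dir in /install /tftpboot; do
        if awk -v d="$dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
            echo "OK: $dir is mounted"
        else
            echo "MISSING: $dir is not mounted"
            rc=1
        fi
    done
    return $rc
}
```

For the password-less ssh check, run ``ssh -o BatchMode=yes <service node> true`` from the management node. With BatchMode enabled, it fails instead of prompting when key-based login is not set up.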