diff --git a/docs/source/advanced/hamn/high_available_management_node.rst b/docs/source/advanced/hamn/high_available_management_node.rst index eb1299a6f..7da41db12 100644 --- a/docs/source/advanced/hamn/high_available_management_node.rst +++ b/docs/source/advanced/hamn/high_available_management_node.rst @@ -1,10 +1,3 @@ -Overview -======== - -The xCAT management node plays an important role in the cluster, if the management node is down for whatever reason, the administrators will lose the management capability for the whole cluster, until the management node is back up and running. In some configuration, like the Linux nfs-based statelite in a non-hierarchy cluster, the compute nodes may not be able to run at all without the management node. So, it is important to consider the high availability for the management node. - -The goal of the HAMN(High Availability Management Node) configuration is, when the primary xCAT management node fails, the standby management node can take over the role of the management node, either through automatic failover or through manual procedure performed by the administrator, and thus avoid long periods of time during which your cluster does not have active cluster management function available. - Configuration considerations ============================ @@ -16,9 +9,9 @@ Data synchronization mechanism The data synchronization is important for any high availability configuration. When the xCAT management node failover occurs, the xCAT data needs to be exactly the same before failover, and some of the operating system configuration should also be synchronized between the two management nodes. To be specific, the following data should be synchronized between the two management nodes to make the xCAT HAMN work: * xCAT database -* xCAT configuration files, like /etc/xcat, ~/.xcat, /opt/xcat +* xCAT configuration files, like ``/etc/xcat``, ``~/.xcat``, ``/opt/xcat`` * The configuration files for the services that are required by xCAT, like named, DHCP, apache, nfs, ssh, etc. -* The operating systems images repository and users customization data repository, the /install directory contains these repositories in most cases. +* The operating systems images repository and users customization data repository, the ``/install`` directory contains these repositories in most cases. There are a lot of ways for data syncronization, but considering the specific xCAT HAMN requirements, only several of the data syncronziation options are practical for xCAT HAMN. @@ -47,7 +40,7 @@ The configuration for the high availability applications is usually complex, it **3\. Maintenance effort** -The automatic failover brings in several high availability applications, after the initial configuration is done, additional maintenance effort will be needed. For example, taking care of the high availability applications during cluster update, the updates for the high availability applications themselves, trouble shooting any problems with the high availability applications. A simple question may be able to help you to decide: could you get technical support if some of the high availability applications run into problems? All software has bugs ... +The automatic failover brings in several high availability applications, after the initial configuration is done, additional maintenance effort will be needed. 
For example: taking care of the high availability applications during cluster updates, applying updates to the high availability applications themselves, and troubleshooting any problems with the high availability applications. A simple question may help you decide: could you get technical support if one of the high availability applications runs into problems? All software has bugs.
 
 Configuration Options
 =====================
 
 The combinations of data synchronization mechanism and manual/automatic failover
 +-------------------+-------------------------+-----------------+--------------+
 |Manual Failover    |         **1**           |     **2**       |      3       |
 +-------------------+-------------------------+-----------------+--------------+
-|Automatic Failover |          4              |      5          |    **6**     |
+|Automatic Failover |          4              |     **5**       |    **6**     |
 +-------------------+-------------------------+-----------------+--------------+
 
+Option 1, :ref:`setup_ha_mgmt_node_with_raid1_and_disks_move`
+Option 2, :ref:`setup_ha_mgmt_node_with_shared_data`
+
+Option 3 is doable but not currently supported.
+
+Option 4 is not practical.
+
+Option 5, :ref:`setup_xcat_high_available_management_node_with_nfs`
+
+Option 6, :ref:`setup_ha_mgmt_node_with_drbd_pacemaker_corosync`
diff --git a/docs/source/advanced/hamn/index.rst b/docs/source/advanced/hamn/index.rst
index 3bcf0a6b3..649ac74e0 100644
--- a/docs/source/advanced/hamn/index.rst
+++ b/docs/source/advanced/hamn/index.rst
@@ -1,6 +1,11 @@
 High Avaiability
 ================
 
+The xCAT management node plays an important role in the cluster: if the management node is down for whatever reason, the administrators lose the ability to manage the whole cluster until the management node is back up and running. In some configurations, such as a Linux NFS-based statelite setup in a non-hierarchical cluster, the compute nodes may not be able to run at all without the management node. It is therefore important to consider high availability for the management node.
+
+The goal of the HAMN (High Availability Management Node) configuration is that, when the primary xCAT management node fails, the standby management node can take over the role of the management node, either through automatic failover or through a manual procedure performed by the administrator, and thus avoid long periods of time during which your cluster has no active cluster management function available.
+
+
 The following pages describes ways to configure the xCAT Management Node for High Availbility.
 
 .. toctree::
diff --git a/docs/source/advanced/hamn/setup_ha_mgmt_node_with_drbd_pacemaker_corosync.rst b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_drbd_pacemaker_corosync.rst
new file mode 100644
index 000000000..8b794e5d0
--- /dev/null
+++ b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_drbd_pacemaker_corosync.rst
@@ -0,0 +1,1533 @@
+.. _setup_ha_mgmt_node_with_drbd_pacemaker_corosync:
+
+Setup HA Mgmt Node With DRBD Pacemaker Corosync
+================================================
+
+This documentation illustrates how to set up a second (standby) management node in your cluster to provide high availability management capability, using several high availability products (note that **Pacemaker** and **Corosync** only support ``x86_64`` systems):
+
+* **DRBD** http://www.drbd.org/ for data replication between the two management nodes.
+
+* **drbdlinks** http://www.tummy.com/Community/software/drbdlinks/ to manage symbolic links from the configuration directories to the DRBD storage device.
+
+* **Pacemaker** http://www.clusterlabs.org/ to manage the cluster resources. Note: in RHEL 6.4 and above, the Pacemaker command line tool ``crm`` is replaced with ``pcs``, so the Pacemaker configuration on RHEL 6.4 is a little bit different. Sample Pacemaker configurations using the new ``pcs`` are listed in Appendix A and Appendix B: Appendix A shows the configuration with the same corosync and the new ``pcs``, and Appendix B shows the configuration with ``cman`` and ``pcs``. ``cman`` and ``ccs`` are the preferred and supported HA tools from RHEL 6.5 and above, and corosync is likely to be phased out. This configuration was contributed by a community user and has not been formally tested by the xCAT team, so use it at your own risk.
+* **Corosync** http://www.corosync.org for messaging layer communication between the two management nodes.
+
+When the primary xCAT management node fails, the standby management node can take over the role of the management node automatically, and thus avoid periods of time during which your cluster does not have active cluster management function available.
+
+The nfs service on the primary management node, or the primary management node itself, will be shut down during the failover process, so any NFS mounts or other network connections from the compute nodes to the management node will be temporarily disconnected during the failover process. If network connectivity is required for compute node run-time operations, you should consider some other way to provide high availability for the network services, unless the compute nodes can also be taken down during the failover process. This also implies:
+
+#. This HAMN approach is primarily intended for clusters in which the management node manages diskful nodes or Linux stateless nodes. This also includes hierarchical clusters in which the management node only directly manages the diskful or Linux stateless service nodes, and the compute nodes managed by the service nodes can be of any type.
+
+#. If the nodes use only read-only NFS mounts from the management node, then you can use this doc as long as you recognize that your nodes will go down while you are failing over to the standby management node.
+
+Setting up HAMN can be done at any time during the life cycle of the cluster. In this documentation we assume the HAMN setup is done at the very beginning of the xCAT cluster setup; there will be some minor differences if the HAMN setup is done in the middle of the xCAT cluster setup.
+
+Configuration Requirements
+==========================
+
+#. xCAT HAMN requires that the operating system version, xCAT version and database version all be identical on the two management nodes.
+
+#. The hardware type/model is not required to be identical on the two management nodes, but it is recommended to have similar hardware capability on the two management nodes, to support the same operating system and provide similar management capability.
+
+#. Since the management node needs to provide IP services through broadcast such as DHCP to the compute nodes, the primary management node and standby management node should be in the same subnet to ensure the network services will work correctly after failover.
+
+#. Network connections between the two management nodes: there are several networks defined in the general cluster configuration structure, such as the cluster network, management network and service network; the two management nodes should be in all of these networks (if they exist at all). 
Besides that, it is recommended, though not strictly required, to use a direct, back-to-back, Gigabit Ethernet or higher bandwidth connection for the DRBD, Pacemaker and Corosync communication between the two management nodes. If the connection is run over switches, use of redundant components and the bonding driver (in active-backup mode) is recommended.
+
+``Note``: A crossover Ethernet cable is traditionally required to set up the direct, back-to-back Ethernet connection between the two management nodes, but with most current hardware a normal Ethernet cable also works, because the Ethernet adapters handle the crossover internally.
+
+Hard disk for DRBD: the DRBD device can be set up on a partition of the disk that the operating system runs on, but it is recommended to use a separate standalone disk or a RAID/multipath disk for the DRBD configuration.
+
+Examples in this doc
+====================
+
+The examples in this documentation are based on the following cluster environment: ::
+
+    Virtual login ip address: 9.114.34.4
+    Virtual cluster IP alias address: 10.1.0.1
+    Primary Management Node: x3550m4n01(10.1.0.221), netmask is 255.255.255.0. Running x86_64 RHEL 6.3 and MySQL 5.1.61.
+    Standby Management Node: x3550m4n02(10.1.0.222), netmask is 255.255.255.0. Running x86_64 RHEL 6.3 and MySQL 5.1.61.
+
+The dedicated direct, back-to-back Gigabit Ethernet connection between the two management nodes for ``DRBD``, ``Pacemaker`` and ``Corosync``: ::
+
+    On Primary Management Node: 10.12.0.221
+    On Standby Management Node: 10.12.0.222
+
+You need to substitute the hostnames and IP addresses with your own values when setting up your HAMN environment.
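+
+For reference, the ``/etc/hosts`` entries implied by the example addresses above might look like the following sketch; the host names used for the virtual IP and for the back-to-back interfaces are only illustrative, so substitute your own values: ::
+
+    10.1.0.1      xcatmn                # virtual cluster IP alias
+    10.1.0.221    x3550m4n01
+    10.1.0.222    x3550m4n02
+    10.12.0.221   x3550m4n01-hb         # back-to-back link for DRBD/Pacemaker/Corosync
+    10.12.0.222   x3550m4n02-hb
+    9.114.34.4    xcatmn-login          # virtual login IP address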
+
+Get the RPM packages
+====================
+
+You have several options to get the RPM packages for ``DRBD``, ``drbdlinks``, ``pacemaker`` and ``corosync``:
+
+#. Operating system repository: some of these packages are shipped with the operating system itself; in this case, you can simply install the packages from the operating system repository. For example, RHEL 6.3 ships ``pacemaker`` and ``corosync``.
+
+#. Application website: the application website usually provides download links for precompiled RPM packages for some operating systems, so you can also download the RPM packages from the application website.
+
+#. Compile from source code: if none of the options above work for some specific application, you will have to compile the RPMs from source code. You can compile these packages on one of the management nodes or on a separate build machine with the same architecture and operating system as the management nodes. Here are the instructions for compiling the RPM packages from source code:
+
+Before compiling the RPM packages, you need to install some compiling tools like gcc, make, glibc and rpm-build. ::
+
+    yum groupinstall "Development tools"
+    yum install libxslt libxslt-devel
+
+DRBD
+----
+
+DRBD binary RPMs heavily depend on the kernel version running on the machine, so it is very likely that you need to compile DRBD on your own. An exception is SLES 11 SPx, where DRBD is shipped with the High Availability Extension; you can download the pre-compiled RPMs from the SuSE website, see https://www.suse.com/products/highavailability/ for more details.
+
+#. Download the latest drbd source code tar ball: ::
+
+      wget http://oss.linbit.com/drbd/8.4/drbd-8.4.2.tar.gz
+
+#. Uncompress the source code tar ball: ::
+
+      tar zxvf drbd-8.4.2.tar.gz
+
+#. Make the RPM packages: ::
+
+      cd drbd-8.4.2
+      mkdir -p /root/rpmbuild/SOURCES/
+      mkdir -p /root/rpmbuild/RPMS/
+      mkdir -p /root/rpmbuild/SPECS/
+      ./configure
+      make rpm
+      make km-rpm
+
+   After the procedure above is finished successfully, all the ``DRBD`` packages are under directory ``/root/rpmbuild/RPMS/x86_64/``: ::
+
+      [root@x3550m4n01 ~]# ls /root/rpmbuild/RPMS/x86_64/drbd*
+      /root/rpmbuild/RPMS/x86_64/drbd-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-bash-completion-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-debuginfo-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-heartbeat-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-km-2.6.32_279.el6.x86_64-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-km-debuginfo-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-pacemaker-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-udev-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-utils-8.4.2-2.el6.x86_64.rpm
+      /root/rpmbuild/RPMS/x86_64/drbd-xen-8.4.2-2.el6.x86_64.rpm
+
+drbdlinks
+---------
+
+``drbdlinks`` provides an RPM that can be installed on most hardware platforms and operating systems; it can be downloaded from ``ftp://ftp.tummy.com/pub/tummy/drbdlinks/``, so there is no need to compile ``drbdlinks`` in most cases.
+
+Pacemaker
+---------
+
+Pacemaker ships as part of all recent Fedora, openSUSE, and SLES (in the High Availability Extension) releases, and the project also makes the latest binaries available for Fedora, openSUSE, and EPEL-compatible distributions (RHEL, CentOS, Scientific Linux, etc.), so there is no need to compile Pacemaker in most cases.
+
+``Note``: if you choose to use heartbeat instead of corosync in your configuration for whatever reason, you will need to compile Pacemaker from source code; the version shipped with the operating system might not provide everything you need.
+
+Corosync
+--------
+
+Corosync is shipped with all recent Fedora, openSUSE, and SLES (in the High Availability Extension) releases, so there is no need to compile Corosync in most cases.
+
+Setup xCAT on the Primary Management Node
+=========================================
+
+Most of the xCAT data will eventually be put on the shared DRBD storage, but you might want to keep a copy of the xCAT data on the local disks of the two management nodes. With this local copy, you still have at least one usable management node even if severe problems occur with the HA configuration and the DRBD data is not available any more; although this is unlikely to happen, it does not hurt anything to keep this local copy.
+
+So, in this documentation, we will set up xCAT on both management nodes before we set up DRBD, just to keep local copies of the xCAT data. If you do NOT want to keep these local copies, do the "Configure DRBD" section before this section; then you will have all the xCAT data on the shared DRBD storage.
+
+#. Set up the ``Virtual IP address``. The xcatd daemon should be addressable with the same ``Virtual IP address``, regardless of which management node it runs on. The same ``Virtual IP address`` will be configured as an alias IP address on the management node (primary and standby) that the xcatd runs on. The Virtual IP address can be any unused IP address that all the compute nodes and service nodes can reach. 
Here is an example on how to configure the Virtual IP on Linux: ::
+
+      ifconfig eth2:0 10.1.0.1 netmask 255.255.255.0
+
+   Since ifconfig does not make the IP address configuration persistent across reboots, the Virtual IP address needs to be re-configured right after the management node is rebooted. This non-persistent Virtual IP address is intentional: it avoids an IP address conflict when the crashed previous primary management node is recovered with the Virtual IP address still configured.
+
+#. Add the alias IP address into ``/etc/resolv.conf`` as the nameserver. Change the hostname resolution order to use ``/etc/hosts`` before the name server, i.e. change to "hosts: files dns" in ``/etc/nsswitch.conf``.
+
+#. Install xCAT. The procedure described in :doc:`xCAT Install Guide <../../guides/install-guides/index>` should be used for the xCAT setup on the primary management node.
+
+#. Change the site table ``master`` and ``nameservers`` attributes and the network ``tftpserver`` attribute to the Virtual IP: ::
+
+      tabdump site
+
+   If not correct: ::
+
+      chdef -t site master=10.1.0.1
+      chdef -t site nameservers=10.1.0.1
+      chdef -t network 10_1_0_0-255_255_255_0 tftpserver=10.1.0.1
+
+#. Install and configure MySQL. MySQL will be used as the xCAT database system; please refer to the doc [ **todo** Setting_Up_MySQL_as_the_xCAT_DB].
+
+   Verify xCAT is running on MySQL by running: ::
+
+      lsxcatd -a
+
+#. Add the virtual cluster IP into the MySQL access list: ::
+
+      [root@x3550m4n01 var]$mysql -u root -p
+      Enter password:
+      Welcome to the MySQL monitor. Commands end with ; or \g.
+      Your MySQL connection id is 11
+      Server version: 5.1.61 Source distribution
+
+      Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.
+
+      Oracle is a registered trademark of Oracle Corporation and/or its
+      affiliates. Other names may be trademarks of their respective
+      owners.
+
+      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
+
+      mysql>
+      mysql> GRANT ALL on xcatdb.* TO xcatadmin@10.1.0.1 IDENTIFIED BY 'cluster';
+      Query OK, 0 rows affected (0.00 sec)
+
+      mysql> SELECT host, user FROM mysql.user;
+      +------------+-----------+
+      | host       | user      |
+      +------------+-----------+
+      | %          | xcatadmin |
+      | 10.1.0.1   | xcatadmin |
+      | 10.1.0.221 | xcatadmin |
+      | 127.0.0.1  | root      |
+      | localhost  |           |
+      | localhost  | root      |
+      | x3550m4n01 |           |
+      | x3550m4n01 | root      |
+      +------------+-----------+
+      8 rows in set (0.00 sec)
+
+      mysql> quit
+      Bye
+      [root@x3550m4n01 var]$
+
+#. Make sure the ``/etc/xcat/cfgloc`` points to the virtual IP address.
+
+   The ``/etc/xcat/cfgloc`` should point to the virtual IP address; here is an example: ::
+
+      mysql:dbname=xcatdb;host=10.1.0.1|xcatadmin|cluster
+
+#. Continue with the nodes provisioning configuration using the primary management node.
+
+   Follow the corresponding xCAT docs to continue with the nodes provisioning configuration using the primary management node, including hardware discovery, configuring hardware control, configuring DNS, configuring DHCP, configuring conserver, creating the os image and running nodeset. It is recommended not to start the real OS provisioning process until the standby management node setup and HA configuration are done. 
+ + ``Note``: If there are service nodes configured in the cluster, when running makeconservercf to configure conserver, both the virtual ip address and physical ip addresses configured on both management nodes need to be added to the trusted hosts list in conserver, use the command like this: :: + + makeconservercf -t ,, + +Setup xCAT on the Standby Management Node +========================================= + +#. Copy the following files from primary management node: :: + + /etc/resolv.conf + /etc/hosts + /etc/nsswitch.conf. + +#. Install xCAT. The procedure described in :doc:`xCAT Install Guide <../../guides/install-guides/index>` should be used for the xCAT setup on the standby management node. + +#. Install and configure MySQL. MySQL will be used as the xCAT database system, please refer to the doc [Setting_Up_MySQL_as_the_xCAT_DB]. + + Verify xcat is running on MySQL by running: :: + + lsxcatd -a + +#. Copy the xCAT database from primary management node + + On primary management node: :: + + dumpxCATdb -p /tmp/xcatdb + scp -r /tmp/xcatdb x3550m4n02:/tmp/ + + On the standby management node: :: + + restorexCATdb -p /tmp/xcatdb + +#. Setup hostname resolution between the primary management node and standby management node. Make sure the primary management node can resolve the hostname of the standby management node, and vice versa. + +#. Setup ssh authentication between the primary management node and standby management node. It should be setup as "passwordless ssh authentication" and it should work in both directions. The summary of this procedure is: + + a. cat keys from ``/.ssh/id_rsa.pub`` on the primary management node and add them to ``/.ssh/authorized_keys`` on the standby management node. Remove the standby management node entry from ``/.ssh/known_hosts`` on the primary management node prior to issuing ssh to the standby management node. + + b. cat keys from ``/.ssh/id_rsa.pub`` on the standby management node and add them to ``/.ssh/authorized_keys`` on the primary management node. Remove the primary management node entry from ``/.ssh/known_hosts`` on the standby management node prior to issuing ssh to the primary management node. + +#. Make sure the time on the primary management node and standby management node is synchronized. + + Now, do a test reboot on each server, one at a time. This is a sanity check, so that if you have an issue later, you know that it was working before you started. Do NOT skip this step. + +Install DRBD, drbdlinks, Pacemaker and Corosync on both management nodes +======================================================================== + +To avoid RPM dependency issues, it is recommended to use ``yum/zypper`` install the RPMs of DRBD, drbdlinks, Pacemaker and Corosync, here is an example: + +#. Put all of these RPM packages into a directory, for example ``/root/hamn/packages`` + +#. Add a new repository: + + * **[RedHat]**: :: + + [hamn-packages] + name=HAMN Packages + baseurl=file:///root/hamn/packages + enabled=1 + gpgcheck=0 + + * **[SLES]**: :: + + zypper ar file:///root/hamn/packages + +#. 
Install the packages: + + * **[RedHat]**: :: + + yum install drbd drbd-bash-completion drbd-debuginfo drbd-km drbd-km-debuginfo drbd-pacemaker drbd-utils drbd-xen drbd-heartbeat + yum install drbdlinks + yum install pacemaker pacemaker-cli pacemaker-cluster-libs pacemaker-libs pacemaker-libs-devel + yum install corosync corosynclib corosynclib-devel + + * **[SLES]**: :: + + zypper install drbd drbd-bash-completion drbd-debuginfo drbd-km drbd-km-debuginfo drbd-pacemaker drbd-utils drbd-xen drbd-heartbeat + zypper install drbdlinks + zypper install pacemaker pacemaker-cli pacemaker-cluster-libs pacemaker-libs pacemaker-libs-devel + zypper install corosync corosynclib corosynclib-devel + +Turn off init scripts for HA managed services +============================================= + +All the HA managed services, including drbd, nfs, nfslock, dhcpd, xcatd, httpd, mysqld, conserver will be controlled by ``pacemaker``. These services should not start on boot. Need to turn off the init scripts for these services on both management nodes. Here is an example: :: + + chkconfig drbd off + chkconfig nfs off + chkconfig nfslock off + chkconfig dhcpd off + chkconfig xcatd off + chkconfig httpd off + chkconfig mysqld off + chkconfig conserver off + +``Note``: The conserver package is optional for xCAT to work, if the conserver is not used in your xCAT cluster, then it is not needed to manage conserver service using ``pacemaker``. + +Configure DRBD +============== + +``Note``: ``DRBD`` (by convention) uses TCP ports from 7788 upwards, with every resource listening on a separate port. DRBD uses two TCP connections for every resource configured. For proper DRBD functionality, it is required that these connections are allowed by your firewall configuration. + +#. Create disk partition for DRBD device + + In this example, we use a separate disk ``/dev/sdb`` for DRBD device, before using ``/dev/sdb`` as the DRBD device, we need to create a partition using either fdisk or parted. The partition size can be determined by the cluster configuration, generally speaking, 100GB should be enough for most cases. The partition size should be the same on the two management nodes, the partition device name need not have the same name on the two management nodes, but it is recommended to have the same partition device name on the two management nodes. After the partition is created, do not create file system on it. Here is an example: :: + + [root@x3550m4n01 ~]# fdisk -l /dev/sdb + + Disk /dev/sdb: 299.0 GB, 298999349248 bytes + 255 heads, 63 sectors/track, 36351 cylinders + Units = cylinders of 16065 * 512 = 8225280 bytes + Sector size (logical/physical): 512 bytes / 512 bytes + I/O size (minimum/optimal): 512 bytes / 512 bytes + Disk identifier: 0x00000000 + + Device Boot Start End Blocks Id System + /dev/sdb1 1 13055 104864256 5 Extended + /dev/sdb5 1 13055 104864224+ 83 Linux + +#. Create ``DRBD`` resource configuration file + + All the ``DRBD`` resource configuration files are under ``/etc/drbd.d/``, we need to create a ``DRBD`` resource configuration file for the xCAT HA MN. 
Here is an example: :: + + [root@x3550m4n01 ~]# cat /etc/drbd.d/xcat.res + resource xCAT { + net { + verify-alg sha1; + after-sb-0pri discard-least-changes; + after-sb-1pri consensus; + after-sb-2pri call-pri-lost-after-sb; + } + on x3550m4n01 { + device /dev/drbd1; + disk /dev/sdb5; + address 10.12.0.221:7789; + meta-disk internal; + } + on x3550m4n02 { + device /dev/drbd1; + disk /dev/sdb5; + address 10.12.0.222:7789; + meta-disk internal; + } + } + + substitute the hostname, device, disk partition and ip address with your own values. + +#. Create device metadata + + This step must be completed only on initial device creation. It initializes DRBD.s metadata, it should be run on both management nodes. :: + + [root@x3550m4n01 drbd.d]# drbdadm create-md xCAT + Writing meta data... + initializing activity log + NOT initializing bitmap + New drbd meta data block successfully created. + success + [root@x3550m4n01 drbd.d]# + + [root@x3550m4n02 ~]# drbdadm create-md xCAT + Writing meta data... + initializing activity log + NOT initializing bitmap + New drbd meta data block successfully created. + success + +#. Enable the resource + + This step associates the resource with its backing device (or devices, in case of a multi-volume resource), sets replication parameters, and connects the resource to its peer. This step should be done on both management nodes. :: + + [root@x3550m4n01 ~]# drbdadm up xCAT + [root@x3550m4n02 ~]# drbdadm up xCAT + + Observe /proc/drbd. DRBD.s virtual status file in the /proc filesystem, /proc/drbd, should now contain information similar to the following: :: + + [root@x3550m4n01 ~]# cat /proc/drbd + version: 8.4.2 (api:1/proto:86-101) + GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@x3550m4n01, 2012-09-14 10:08:13 + + 1: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----- + ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:104860984 + [root@x3550m4n01 ~]# + +#. Start the initial full synchronization + + This step must be performed on only one node, only on initial resource configuration, and only on the node you selected as the synchronization source. To perform this step, issue this command: :: + + [root@x3550m4n01 ~]# drbdadm primary --force xCAT + + Based on the DRBD device size and the network bandwidth, the initial full synchronization might take a while to finish, in this configuration, a 100GB DRBD device through 1Gb networks takes about 30 minutes. The ``/proc/drbd`` or ``service drbd status`` shows the progress of the initial full synchronization. :: + + version: 8.4.2 (api:1/proto:86-101) GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@x3550m4n01, 2012-09-14 10:08:13 + + 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----- + ns:481152 nr:0 dw:0 dr:481816 al:0 bm:29 lo:0 pe:4 ua:0 ap:0 ep:1 wo:f oos:104380216 + [>....................] sync'ed: 0.5% (101932/102400)M + finish: 2:29:06 speed: 11,644 (11,444) K/sec + + If a direct, back-to-back Gigabyte Ethernet connection is setup between the two management nodes and you are unhappy with the syncronization speed, it is possible to speed up the initial synchronization through some tunable parameters in DRBD. This setting is not permanent, and will not be retained after boot. For details, see `http://www.drbd.org/users-guide-emb/s-configure-sync-rate.html`_:: + + drbdadm disk-options --resync-rate=110M xCAT + +#. 
Create file system on DRBD device and mount the file system + + Even while the DRBD sync is taking place, you can go ahead and create a filesystem on the DRBD device, but it is recommended to wait for the inital full synchronization is finished before creating the file system. + + After the initial full synchronization is finished, you can take the DRBD device as a normal disk partition to create file system and mount it to some directory. The DRDB device name is set in the ``/etc/drbd.d/xcat.res`` created in the previous step. In this doc, the DRBD device name is ``/dev/drbd1``. :: + + [root@x3550m4n01]# mkfs -t ext4 /dev/drbd1 + ... ... + [root@x3550m4n01]# mkdir /xCATdrbd + [root@x3550m4n01]# mount /dev/drbd1 /xCATdrbd + + To test the file system is working correctly, create a test file: :: + + [root@x3550m4n01]# echo "this is a test file" > /xCATdrbd/testfile + [root@x3550m4n01]# cat /xCATdrbd/testfile + this is a test file + [root@x3550m4n01]# + + ``Note``: make sure the DRBD initial full synchronization is finished before taking any subsequent step. + +#. Test the ``DRBD`` failover + + To test the ``DRBD`` failover, you need to change the primary/secondary role on the two management nodes. + + On the ``DRDB`` primary server(x3550m4n01): :: + + [root@x3550m4n01 ~]# umount /xCATdrbd + [root@x3550m4n01 ~]# drbdadm secondary xCAT + + Then the ``service drbd status`` shows both management nodes are now "Secondary" servers.:: + + [root@x3550m4n01 ~]# service drbd status + drbd driver loaded OK; device status: + version: 8.4.2 (api:1/proto:86-101) + GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@x3550m4n01, 2012-09-14 10:36:39 + m:res cs ro ds p mounted fstype + 1:xCAT Connected Secondary/Secondary UpToDate/UpToDate C + [root@x3550m4n01 ~]# + + On the ``DRBD`` secondary server(x3550m4n02): :: + + [root@x3550m4n02 ~]# drbdadm primary xCAT + + Then the ``service drbd status`` shows the new primary DRBD server is x3550m4n02: :: + + [root@x3550m4n02 ~]# service drbd status + drbd driver loaded OK; device status: + version: 8.4.2 (api:1/proto:86-101) + GIT-hash: 7ad5f850d711223713d6dcadc3dd48860321070c build by root@x3550m4n01, 2012-09-14 10:36:39 + m:res cs ro ds p mounted fstype + 1:xCAT Connected Primary/Secondary UpToDate/UpToDate C + + Mount the ``DRBD`` device to the directory ``/xCATdrbd`` on the new DRBD primary server, verify the file system is synchronized: :: + + [root@x3550m4n02 ~]# mount /dev/drbd1 /xCATdrbd + [root@x3550m4n02]# cat /xCATdrbd/testfile + this is a test file + [root@x3550m4n02]# + + Before proceed with the following steps, you need to failover the DRBD primary server back to x3550m4n01, using the same procedure mentioned above. + +Configure drbdlinks +=================== + +The drbdlinks configuration is quite easy, only needs to create a configuration file, say ``/xCATdrbd/etc/drbdlinks.xCAT.conf``, and then run ``drbdlinks`` command to manage the symbolic links. + +Note: There are three relative symbolic links in the web server (apache/httpd) files, needs to change them to be absolute links . or the web server won't start. Run the following commands on both management nodes: :: + + rm /etc/httpd/logs ; ln -s /var/log/httpd /etc/httpd/logs + rm /etc/httpd/modules ; ln -s /usr/lib64/httpd/modules /etc/httpd/modules + rm /etc/httpd/run ; ln -s /var/run/httpd /etc/httpd/run + +Here is an example of the ``/xCATdrbd/etc/drbdlinks.xCAT.conf`` content, you might need to edit ``/xCATdrbd/etc/drbdlinks.xCAT.conf`` to reflect your needs. 
For example, if you are managing DNS outside of xCAT, you will not need to manage the DNS service via drbdlinks or pacemaker.:: + + [root@x3550m4n01 ~]# cat /xCATdrbd/etc/drbdlinks.xCAT.conf + # + # Sample configuration file for drbdlinks + # If passed an option of 1, SELinux features will be used. If 0, they + # will not. The default is to auto-detect if SELinux is enabled. If + # enabled, created links will be added to the SELinux context using + # chcon -h -u -r -t , where the values plugged + # in this command are pulled from the original file. + #selinux(1) + + # One mountpoint must be listed. This is the location where the DRBD + # drive is mounted. + #mountpoint('/shared') + + # Multiple "link" lines may be listed, one for each link that needs to be + # set up into the above shared mountpoint. If "link()" is passed one + # argument, it is assumed that it is linked into that name under the + # mountpoint above. Otherwise, you can specify a second argument which is + # the location of the file on the shared partition. + # + # For example, if mountpoint is "/shared" and you call "link('/etc/httpd')", + # it is equivalent to calling "link('/etc/httpd', '/shared/etc/httpd')". + #link('/etc/httpd') + #link('/var/lib/pgsql/') + # + # + # services mounted under /xCATdrbd + # + restartSyslog(1) + cleanthisconfig(1) + mountpoint('/xCATdrbd') + # ==== xCAT ==== + link('/install') + link('/etc/xcat') + link('/opt/xcat') + link('/root/.xcat') + # Hosts is a bit odd - may just want to rsync out... + link('/etc/hosts') + # ==== Conserver ==== + link('/etc/conserver.cf') + # ==== DNS ==== + #link('/etc/named') + #link('/etc/named.conf') + #link('/etc/named.iscdlv.key') + #link('/etc/named.rfc1912.zones') + #link('/etc/named.root.key') + #link('/etc/rndc.conf') + #link('/etc/rndc.key') + #link('/etc/sysconfig/named') + #link('/var/named') + # ==== YUM ==== + link('/etc/yum.repos.d') + # ==== DHCP ==== + link('/etc/dhcp') + link('/var/lib/dhcpd') + link('/etc/sysconfig/dhcpd') + link('/etc/sysconfig/dhcpd6') + # ==== Apache ==== + link('/etc/httpd') + link('/var/www') + # + # ==== MySQL ==== + link('/etc/my.cnf') + link('/var/lib/mysql') + # + # ==== tftp ==== + link('/tftpboot') + # + # ==== NFS ==== + link('/etc/exports') + link('/var/lib/nfs') + link('/etc/sysconfig/nfs') + # + # ==== SSH ==== + link('/etc/ssh') + link('/root/.ssh') + +``Note``: Make sure that none of the directories we have specified in the ``drbdlinks`` config are not mount points. If any of them are, we should a new mount point for them and edit ``/etc/fstab`` to use the new mount point. + +Then run the following commands to create the symbolic links: :: + + [root@x3550m4n01]# drbdlinks -c /xCATdrbd/etc/drbdlinks.xCAT.conf initialize_shared_storage + [root@x3550m4n01]# drbdlinks -c /xCATdrbd/etc/drbdlinks.xCAT.conf start + +Configure Corosync +================== + +#. 
Create ``/etc/corosync/corosync.conf`` + + The ``/etc/corosync/corosync.conf`` is the configuration file for Corosync, you need to modify the ``/etc/corosync/corosync.conf`` according to the cluster configuration.:: + + [root@x3550m4n01]#cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf + + Modify the ``/etc/corosync/corosync.conf``, the default configuration of Corosync uses multicast to discover the cluster members in the subnet, since this cluster only has two members and no new members will join the cluster, so we can hard code the members for this cluster.:: + + [root@x3550m4n01 ~]# cat /etc/corosync/corosync.conf + # Please read the corosync.conf.5 manual page + compatibility: whitetank + + totem { + version: 2 + secauth: off + interface { + member { + memberaddr: 10.12.0.221 + } + member { + memberaddr: 10.12.0.222 + } + ringnumber: 0 + bindnetaddr: 10.12.0.0 + mcastport: 5405 + ttl: 1 + } + transport: udpu + } + + logging { + fileline: off + to_logfile: yes + to_syslog: yes + logfile: /var/log/cluster/corosync.log + debug: off + timestamp: on + logger_subsys { + subsys: AMF + debug: off + } + } + +#. Create the service file for Pacemaker: + + To have Corosync call Pacemaker, a configuration file needs to be created under the directory ``/etc/corosync/service.d/``. Here is an example: :: + + [root@x3550m4n01 ~]# cat /etc/corosync/service.d/pcmk + service { + # Load the Pacemaker Cluster Resource Manager + name: pacemaker + ver: 0 + } + +#. Copy the Corosync configuration files to standby management node + + The Corosync configuration files are needed on both the primary and standby management node, copy these configuration files to the standby management node. :: + + [root@x3550m4n01 ~]# scp /etc/corosync/corosync.conf x3550m4n02:/etc/corosync/corosync.conf + + [root@x3550m4n01 ~]# scp /etc/corosync/service.d/pcmk x3550m4n02:/etc/corosync/service.d/pcmk + +#. Star Corosync + + Start Corosync on both management nodes by running: :: + + service corosync start + +#. Verify the cluster status + + If the setup is correct, the cluster should now be up and running, the Pacemaker command crm_mon could show the cluster status.:: + + crm_mon + + ============ + Last updated: Thu Sep 20 12:23:37 2012 + Last change: Thu Sep 20 12:23:23 2012 via cibadmin on x3550m4n01 + Stack: openais + Current DC: x3550m4n01 - partition with quorum + Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14 + 2 Nodes configured, 2 expected votes + 0 Resources configured. + ============ + + Online: [ x3550m4n01 x3550m4n02 ] + + The cluster initialization procedure might take a short while, you can also monitor the crososync log file ``/var/log/cluster/corosync.log`` for the cluster initialization progress, after the cluster initialization process is finished, there will be some message like "Completed service synchronization, ready to provide service." in the corosync log file. + +Configure Pacemaker +=================== + +``Note``: a temporary workaround: the ``/etc/rc.d/init.d/conserver`` shipped with conserver-xcat is not lsb compliant, will cause ``pacemaker`` problems, we need to modify the ``/etc/rc.d/init.d/conserver`` to be lsb compliant before we create ``pacemaker`` resources for conserver. 
xCAT will be fixing this problem in the future, but for now, we have to use this temporary workaround: :: + + diff -ruN conserver conserver.xcat + --- conserver 2012-03-20 00:56:46.000000000 +0800 + +++ conserver.xcat 2012-09-25 17:03:57.703159703 +0800 + @@ -84,9 +84,9 @@ + stop) + $STATUS conserver >& /dev/null + if [ "$?" != "0" ]; then + - echo -n "conserver not running, already stopped. " + + echo -n "conserver not running, not stopping " + $PASSED + - exit 0 + + exit 1 + fi + echo -n "Shutting down conserver: " + killproc conserver + @@ -100,7 +100,6 @@ + ;; + status) + $STATUS conserver + - exit $? + ;; + restart) + $STATUS conserver >& /dev/null + +All the cluster resources are managed by Pacemaker, here is an example ``pacemaker`` configuration that has been used by different HA MN customers. You might need to do some minor modifications based on your cluster configuration. + +Please be aware that you need to apply ALL the configuration at once. You cannot pick and choose which pieces to put in, and you cannot put some in now, and some later. Don't execute individual commands, but use crm configure edit instead. :: + + node x3550m4n01 + node x3550m4n02 + # + # NFS server - monitored by 'status' operation + # + primitive NFS_xCAT lsb:nfs \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="41s" + # + # NFS Lock Daemon - monitored by 'status' operation + # + primitive NFSlock_xCAT lsb:nfslock \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="43s" + # + # Apache web server - we monitor it by doing wgets on the 'statusurl' and looking for 'testregex' + # + primitive apache_xCAT ocf:heartbeat:apache \ + op start interval="0" timeout="600s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="57s" timeout="120s" \ + params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://localhost:80/icons/README.html" testregex="" \ + meta target-role="Started" + # + # MySQL for xCAT database. We monitor it by doing a trivial query that will always succeed. + # + primitive db_xCAT ocf:heartbeat:mysql \ + params config="/xCATdrbd/etc/my.cnf" test_user="mysql" binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \ + op start interval="0" timeout="600" \ + op stop interval="0" timeout="600" \ + op monitor interval="57" timeout="120" + # + # DHCP daemon - monitored by 'status' operation + # + primitive dhcpd lsb:dhcpd \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="37s" + # + # DRBD filesystem replication (single instance) + # DRBD is a master/slave resource + # + primitive drbd_xCAT ocf:linbit:drbd \ + params drbd_resource="xCAT" \ + op start interval="0" timeout="240" \ + op stop interval="0" timeout="120s" \ + op monitor interval="17s" role="Master" timeout="120s" \ + op monitor interval="16s" role="Slave" timeout="119s" + # + # Dummy resource that starts after all other + # resources have started + # + primitive dummy ocf:heartbeat:Dummy \ + op start interval="0" timeout="600s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="57s" timeout="120s" \ + meta target-role="Started" + # + # Filesystem resource - mounts /xCATdrbd - monitored by checking to see if it + # is still mounted. Other options are available, but not currently used. 
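+    # /dev/drbd/by-res/xCAT used below is a udev-provided symlink (created by the
+    # drbd-udev package rules) that points at the /dev/drbd1 device defined in
+    # /etc/drbd.d/xcat.res.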
+ # + primitive fs_xCAT ocf:heartbeat:Filesystem \ + op start interval="0" timeout="600s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="57s" timeout="120s" \ + params device="/dev/drbd/by-res/xCAT" directory="/xCATdrbd" fstype="ext4" + #TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO + # + # Extra external IP bound to the active xCAT instance - monitored by ping + # + primitive ip_IBM ocf:heartbeat:IPaddr2 \ + params ip="9.114.34.4" iflabel="blue" nic="eth3" cidr_netmask="24" \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="37s" timeout="120s" + # + # Unneeded IP address - monitored by ping + # + #primitive ip_dhcp1 ocf:heartbeat:IPaddr2 \ + # params ip="10.5.0.1" iflabel="dh" nic="bond-mlan.30" cidr_netmask="16" \ + # op start interval="0" timeout="120s" \ + # op stop interval="0" timeout="120s" \ + # op monitor interval="37s" timeout="120s" + # + # Another unneeded IP address - monitored by ping + # + #primitive ip_dhcp2 ocf:heartbeat:IPaddr2 \ + # params ip="10.6.0.1" iflabel="dhcp" nic="eth2.30" cidr_netmask="16" \ + # op start interval="0" timeout="120s" \ + # op stop interval="0" timeout="120s" \ + # op monitor interval="39s" timeout="120s" + # + # IP address for SNMP traps - monitored by ping + # + #primitive ip_snmp ocf:heartbeat:IPaddr2 + # params ip="10.1.0.1" iflabel="snmp" nic="eth2" cidr_netmask="16" + # op start interval="0" timeout="120s" + # op stop interval="0" timeout="120s" + # op monitor interval="37s" timeout="120s" + # + # END TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO TODO + # Main xCAT IP address - monitored by ping + # + primitive ip_xCAT ocf:heartbeat:IPaddr2 \ + params ip="10.1.0.1" iflabel="xCAT" nic="eth2" cidr_netmask="24" \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="37s" timeout="120s" + # + # + # BIND DNS daemon (named) - monitored by 'status' operation + # + primitive named lsb:named \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="37s" + # + # DRBDlinks resource to manage symbolic links - monitored by checking symlinks + # + primitive symlinks_xCAT ocf:tummy:drbdlinks \ + params configfile="/xCATdrbd/etc/drbdlinks.xCAT.conf" \ + op start interval="0" timeout="600s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="31s" timeout="120s" + # + # Custom xCAT Trivial File Transfer Protocol daemon for + # booting diskless machines - monitored by 'status' operation + # + #primitive tftpd lsb:tftpd \ + # op start interval="0" timeout="120s" \ + # op stop interval="0" timeout="120s" \ + # op monitor interval="41s" + # + # Main xCAT daemon + # xCAT is best understood and modelled as a master/slave type + # resource - but we don't do that yet. If it were master/slave + # we could easily take the service nodes into account. + # We just model it as an LSB init script resource :-(. 
+ # Monitored by 'status' operation + # + primitive xCAT lsb:xcatd \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="42s" \ + meta target-role="Started" + # + # xCAT console server - monitored by 'status' operation + # + primitive xCAT_conserver lsb:conserver \ + op start interval="0" timeout="120s" \ + op stop interval="0" timeout="120s" \ + op monitor interval="53" + # + # Group consisting only of filesystem and its symlink setup + # + group grp_xCAT fs_xCAT symlinks_xCAT + # + # Typical Master/Slave DRBD resource - mounted as /xCATdrbd elsewhere + # We configured it as a single master resource - with only the master side being capable of + # being written (i.e., mounted) + # + ms ms_drbd_xCAT drbd_xCAT \ + meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" + # + # We model 'named' as a clone resource and set up /etc/resolv.conf as follows: + # virtual IP + # permanent IP of one machine + # permanent IP of the other machine + # + # This helps cut us a little slack in DNS resolution during failovers. We made it a + # clone resource rather than just a regular resource because named binds to all existing addresses + # when it starts and (a) never notices any added after it starts and (b) shuts down if any of the + # IPs it bound to go away after it starts up. So we need to coordinate it with bringing up and + # down our IP addresses. + # + clone clone_named named \ + meta clone-max="2" clone-node-max="1" notify="false" + # + # NFS needs to be on same machine as its filesystem + colocation colo1 inf: NFS_xCAT grp_xCAT + # TODO + colocation colo10 inf: ip_dhcp2 ms_drbd_xCAT:Master + #colocation colo11 inf: ip_IBM ms_drbd_xCAT:Master + # END TODO + # NFS lock daemon needs to be on same machine as its filesystem + colocation colo2 inf: NFSlock_xCAT grp_xCAT + # TODO + # SNMP IP needs to be on same machine as xCAT + #colocation colo3 inf: ip_snmp grp_xCAT + # END TODO + # Apache needs to be on same machine as xCAT + colocation colo4 inf: apache_xCAT grp_xCAT + # DHCP needs to be on same machine as xCAT + colocation colo5 inf: dhcpd grp_xCAT + # tftpd needs to be on same machine as xCAT + #colocation colo6 inf: tftpd grp_xCAT + # Console Server needs to be on same machine as xCAT + colocation colo7 inf: xCAT_conserver grp_xCAT + # MySQL needs to be on same machine as xCAT + colocation colo8 inf: db_xCAT grp_xCAT + # TODO + #colocation colo9 inf: ip_dhcp1 ms_drbd_xCAT:Master + # END TODO + # Dummy resource needs to be on same machine as xCAT (not really necessary) + colocation dummy_colocation inf: dummy xCAT + # xCAT group (filesystem and symlinks) needs to be on same machine as DRBD master + colocation grp_xCAT_on_drbd inf: grp_xCAT ms_drbd_xCAT:Master + # xCAT IP address needs to be on same machine as DRBD master + colocation ip_xCAT_on_drbd inf: ip_xCAT ms_drbd_xCAT:Master + # xCAT itself needs to be on same machine as xCAT filesystem + colocation xCAT_colocation inf: xCAT grp_xCAT + # Lots of things need to start after the filesystem is mounted + order Most_aftergrp inf: grp_xCAT ( NFS_xCAT NFSlock_xCAT apache_xCAT db_xCAT xCAT_conserver dhcpd ) + # Some things will bind to the IP and therefore need to start after the IP + # Note that some of these also have to start after the filesystem is mounted + order Most_afterip inf: ip_xCAT ( apache_xCAT db_xCAT xCAT_conserver ) + # TODO + #order after_dhcp1 inf: ip_dhcp1 dhcpd + #order after_dhcp2 inf: ip_dhcp2 dhcpd + # END TODO + # We start named after we 
start the xCAT IP + # Note that both sides are restarted every time the IP moves. + # This prevents the problems with named not liking IP addresses coming and going. + order clone_named_after_ip_xCAT inf: ip_xCAT clone_named + order dummy_order0 inf: NFS_xCAT dummy + # + # We make the dummy resource start after basically all other resources + # + order dummy_order1 inf: xCAT dummy + order dummy_order2 inf: NFSlock_xCAT dummy + order dummy_order3 inf: clone_named dummy + order dummy_order4 inf: apache_xCAT dummy + order dummy_order5 inf: dhcpd dummy + #order dummy_order6 inf: tftpd dummy + order dummy_order7 inf: xCAT_conserver dummy + # TODO + #order dummy_order8 inf: ip_dhcp1 dummy + #order dummy_order9 inf: ip_dhcp2 dummy + # END TODO + # We mount the filesystem and set up the symlinks afer DRBD is promoted to master + order grp_xCAT_after_drbd_xCAT inf: ms_drbd_xCAT:promote grp_xCAT:start + # xCAT has to start after its database (mySQL) + order xCAT_dborder inf: db_xCAT xCAT + property $id="cib-bootstrap-options" \ + dc-version="1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14" \ + cluster-infrastructure="openais" \ + expected-quorum-votes="2" \ + stonith-enabled="false" \ + no-quorum-policy="ignore" \ + last-lrm-refresh="1348180592" + +Cluster Maintenance Considerations +================================== + +The standby management node should be taken into account when doing any maintenance work in the xCAT cluster with HAMN setup. + +#. Software Maintenance - Any software updates on the primary management node should also be done on the standby management node. + +#. File Synchronization - Although we have setup crontab to synchronize the related files between the primary management node and standby management node, the crontab entries are only run in specific time slots. The synchronization delay may cause potential problems with HAMN, so it is recommended to manually synchronize the files mentioned in the section above whenever the files are modified. + +#. Reboot management nodes - In the primary management node needs to be rebooted, since the daemons are set to not auto start at boot time, and the shared disks file systems will not be mounted automatically, you should mount the shared disks and start the daemons manually. + +#. Update xCAT - We should avoid failover during the xCAT upgrade, the failover will cause drbd mount changes, since the xCAT upgrade procedure needs to restart xcatd for one or more times, it will likely trigger failover. So it will be safer if we put the backup xCAT MN in inactive state while updating the xCAT MN, through either stopping corosync+pacemaker on the back xCAT MN or shutdown the backup xCAT MN. After the primary MN is upgraded, make the backup MN be active, failover to the backup MN, put the primary MN be inactive, and then update the backup xCAT MN. + +``Note``: after software upgrade, some services that were set to not autostart on boot might be started by the software upgrade process, or even set to autostart on boot, the admin should check the services on both primary and standby EMS, if any of the services are set to autostart on boot, turn it off; if any of the services are started on the backup EMS, stop the service. + +At this point, the HA MN Setup is complete, and customer workloads and system administration can continue on the primary management node until a failure occurs. The xcatdb and files on the standby management node will continue to be synchronized until such a failure occurs. 
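+
+For example, one way to keep the backup management node inactive while updating xCAT on the primary management node is to stop the cluster stack on the backup node (x3550m4n02 in the examples used in this document) and start it again after the update is finished; this is just a sketch of the sequence described above: ::
+
+    # on the backup management node, before updating xCAT on the primary MN
+    service corosync stop
+
+    # on the backup management node, after the xCAT update on the primary MN is done
+    service corosync start
+
+Because Pacemaker is started as a Corosync plugin in this configuration (``ver: 0`` in ``/etc/corosync/service.d/pcmk``), stopping ``corosync`` also stops Pacemaker and all the resources it manages on that node.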
+ +Failover +======== + +There are two kinds of failover, planned failover and unplanned failover. The planned failover can be useful for updating the management nodes or any scheduled maintainance activities; the unplanned failover covers the unexpected hardware or software failures. + +In a planned failover, you can do necessary cleanup work on the previous primary management node before failover to the previous standby management node. In a unplanned failover, the previous management node probably is not functioning at all, you can simply shutdown the system. + +But, both the planned failover and unplanned failover are fully automatic, the administrator does not need to do anything else. + +On the current primary management node, if the current primary management node is still available to run commands, run the following command to cleanup things: :: + + service corosync stop + +You can run ``crm resource list`` to see which node is the current primary management node: :: + + [root@x3550m4n01 html]# crm resource list + NFS_xCAT (lsb:nfs) Started + NFSlock_xCAT (lsb:nfslock) Started + apache_xCAT (ocf::heartbeat:apache) Started + db_xCAT (ocf::heartbeat:mysql) Started + dhcpd (lsb:dhcpd) Started + dummy (ocf::heartbeat:Dummy) Started + ip_xCAT (ocf::heartbeat:IPaddr2) Started + xCAT (lsb:xcatd) Started + xCAT_conserver (lsb:conserver) Started + Resource Group: grp_xCAT + fs_xCAT (ocf::heartbeat:Filesystem) Started + symlinks_xCAT (ocf::tummy:drbdlinks) Started + Master/Slave Set: ms_drbd_xCAT [drbd_xCAT] + Masters: [ x3550m4n01 ] + Slaves: [ x3550m4n02 ] + Clone Set: clone_named [named] + Started: [ x3550m4n02 x3550m4n01 ] + ip_IBM (ocf::heartbeat:IPaddr2) Started + +The "Masters" of ms_drbd_xCAT should be the current primary management node. + +If any of the management node is rebooted for whatever reason while the HA MN configuration is up and running, you might need to start the corosync service manually. :: + + service corosync start + +To avoid this, run the following command to set the autostart for the corosync service on both management nodes: :: + + chkconfig corosync on + +Backup working Pacemaker configuration (Optional) +================================================= + +It is a good practice to backup the working ``pacemaker`` configuration, the backup could be in both plain text format or XML format, the plain text is more easily editable and can be modified and used chunk by chunk, the xml can be used to do a full replacement restore. It will be very useful to make such a backup everytime before you make a change. + +To backup in the plain text format, run the following command: :: + + crm configure save /path/to/backup/textfile + +To backup in the xml format, run the following command: :: + + crm configure save xml /path/to/backup/xmlfile + +If necessary, the backup procedure can be done periodically through crontab or at, here is an sample script that will backup the ``pacemaker`` configuration automatically: :: + + TXT_CONFIG=/xCATdrbd/pacemakerconfigbackup/pacemaker.conf.txt-$(hostname -s).$(date +"%Y.%m.%d.%H.%M.%S") + XML_CONFIG=/xCATdrbd/pacemakerconfigbackup/pacemaker.conf.xml-$(hostname -s).$(date +"%Y.%m.%d.%H.%M.%S") + test -e $TXT_CONFIG && /bin/cp -f $TXT_CONFIG $TXT_CONFIG.bak + test -e $XML_CONFIG && /bin/cp -f $XML_CONFIG $XML_CONFIG.bak + crm configure save $TXT_CONFIG + crm configure save xml $XML_CONFIG + +To restore the ``pacemaker`` configuration from the backup xml file. 
:: + + crm configure load replace /path/to/backup/xmlfile + +Correcting DRBD Differences (Optional) +====================================== + +It is possible that the data between the two sides of the DRBD mirror could be different in a few chunks of data, although these differences might be harmless, but it will be good if we could discover and fix these differences in time. + +Add a crontab entry to check the differences +-------------------------------------------- + :: + + 0 6 * * * /sbin/drbdadm verify all + +Please note that this process will take a few hours. You could schedule it at a time when it can be expected to run when things are relatively idle. You might choose to only run it once a week, but nightly seems to be a nice choice as well. You should only put this cron job on one side or the other of the DRBD mirror . not both. + +Correcting the differences automatically +---------------------------------------- + +The crontab entry mentioned above will discover differences between the two sides, but will not correct any it might find. This section describes a method for automatically correcting those differences. + +There are basically three reasons why this might happen: + +1. A series of well-known Linux kernel bugs that have only been recently fixed and do not yet appear in any version of RHEL. All of them are known to be harmless. + +2. Hardware failure - one side stored the data on disk incorrectly , + +3. Other Bugs. I don't know of any - but all software has bugs. + +We do see occasional 4K chunks of data differing between the two sides of the mirror. As long as there are only a handful of them, it is almost certainly due to the harmless bugs mentioned above. + +There is also a script, say drbdforceresync, which has been written to force correction of the two sides. It should be run on both sides an hour or so after the verify process kicked off after the cron job has completed. The script written for this purpose is shown below: :: + + #version: 8.4.0 (api:1/proto:86-100) + #GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by root@wbsm15-mgmt01, 2011-11-17 18:14:37 + # + # 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----- + # ns:0 nr:301816 dw:14440824 dr:629126328 al:0 bm:206 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 + # + # Force a DRBD resync + # + force_resync() { + echo "Disconnecting and reconnecting DRBD resource $1." + drbdadm disconnect $1 + drbdadm connect $1 + } + # + # Convert a DRBD resource name to a device number + # + resource2devno() { + dev=$(readlink /dev/drbd/by-res/$1) + echo $dev | sed 's%.*/drbd%%' + } + + # + # Force a DRBD resync if we are in the secondary role + # + check_resync() { + # We should only do the force_resync if we are the secondary + resource=$1 + whichdev=$(resource2devno $resource) + DRBD=$(cat /proc/drbd | grep "^ *${whichdev}: *cs:") + # It would be nice if to know for sure that the ds: for the secondary + # role would be when it has known issues... + # Then we could do this only when strictly necessary + case "$DRBD" in + *${whichdev}:*'cs:Connected'*'ro:Secondary/Primary'*/UpToDate*) + force_resync $resource;; + esac + } + +``Note``: this script has been tested in some HAMN clusters, and uses the DRBD-recommended method of forcing a resync (a disconnect/reconnect). If there are no differences, this script causes near-zero DRBD activity. It is only when there are differences that the disconnect/reconnect sequence does anything. 
So, it is recommended to add this script into crontab also, like: :: + + 0 6 * * 6 /sbin/drbdforceresync + +Setup the Cluster +================= + +At this point you have setup your Primary and Standby management node for HA. You can now continue to setup your cluster. Return to using the Primary management node. Now setup your Hierarchical cluster using the following documentation, depending on your Hardware,OS and type of install you want to do on the Nodes :doc:`Admin Guide <../../guides/admin-guides/index>`. + +For all the xCAT docs: http://xcat-docs.readthedocs.org. + +Trouble shooting and debug tips +=============================== + +#. ``Pacemaker`` resources could not start + + In case some of the ``pacemaker`` resources could not start, it mainly because the corresponding service(like xcatd) has some problem and could not be started, after the problem is fixed, the ``pacemaker`` resource status will be updated soon, or you can run the following command to refresh the status immediately. :: + + crm resource cleanup + +#. Add new ``Pacemaker`` resources into configuration file + + If you want to add your own ``Pacemaker`` resources into the configuration file, you might need to lookup the table on which resources are available in ``Pacemaker``, use the following commands: :: + + [root@x3550m4n01 ~]#crm ra + crm(live)ra# classes + heartbeat + lsb + ocf / heartbeat linbit pacemaker redhat tummy + stonith + crm(live)ra# list ocf + ASEHAagent.sh AoEtarget AudibleAlarm CTDB ClusterMon + Delay Dummy EvmsSCC Evmsd Filesystem + HealthCPU HealthSMART ICP IPaddr IPaddr2 + IPsrcaddr IPv6addr LVM LinuxSCSI MailTo + ManageRAID ManageVE Pure-FTPd Raid1 Route + SAPDatabase SAPInstance SendArp ServeRAID SphinxSearchDaemon + Squid Stateful SysInfo SystemHealth VIPArip + VirtualDomain WAS WAS6 WinPopup Xen + Xinetd anything apache apache.sh clusterfs.sh + conntrackd controld db2 drbd drbdlinks + eDir88 ethmonitor exportfs fio fs.sh + iSCSILogicalUnit iSCSITarget ids ip.sh iscsi + jboss lvm.sh lvm_by_lv.sh lvm_by_vg.sh lxc + mysql mysql-proxy mysql.sh named.sh netfs.sh + nfsclient.sh nfsexport.sh nfsserver nfsserver.sh nginx + o2cb ocf-shellfuncs openldap.sh oracle oracledb.sh + orainstance.sh oralistener.sh oralsnr pgsql ping + pingd portblock postfix postgres-8.sh proftpd + rsyncd samba.sh script.sh scsi2reservation service.sh + sfex svclib_nfslock symlink syslog-ng tomcat + tomcat-6.sh vm.sh vmware + crm(live)ra# meta IPaddr2 + ... + + Operations' defaults (advisory minimum): + + start timeout=20s + stop timeout=20s + status interval=10s timeout=20s + monitor interval=10s timeout=20s + crm(live)ra# providers IPaddr2 + heartbeat + crm(live)ra# + +#. 
Fixing drbd split brain
+
+   One machine has taken over as the primary, let's say it is x3550m4n01, and x3550m4n02 has been left stranded; we then need to run the following commands to fix the problem:
+
+   * **x3550m4n02** ::
+
+      drbdadm disconnect xCAT
+      drbdadm secondary xCAT
+      drbdadm connect --discard-my-data xCAT
+
+   * **x3550m4n01** ::
+
+      drbdadm connect xCAT
+
+Disable HA MN
+=============
+
+For whatever reason, the user might want to disable HA MN; here is the procedure for disabling HA MN:
+
+* Shut down the standby management node
+
+If the HA MN configuration is still functioning, fail over to the management node that you would like to use as the management node after HA MN is disabled; if the HA MN configuration is not functioning correctly, select the management node that you would like to use as the management node after HA MN is disabled.
+
+* Stop the HA MN services
+
+  chkconfig off: ::
+
+    pacemaker corosync drbd drbdlinksclean
+
+* Start the xCAT services
+
+  chkconfig on: ::
+
+    nfs nfslock dhcpd postgresql httpd (apache) named conserver xcatd
+
+* Reconfigure the xCAT interface
+
+Run ifconfig to see the current xCAT interface before shutting down the HA services, go to ``/etc/sysconfig/network-scripts`` and create the new interface configuration file, then stop the HA services: ::
+
+   /etc/init.d/pacemaker stop
+   /etc/init.d/corosync stop
+   /etc/init.d/drbdlinksclean stop
+
+With drbd on and the filesystem mounted, look at each link in ``/etc/drbdlinks.xCAT.conf``; for each link, remove the link if it is still linked, then copy the drbd file or directory to the filesystem. For example, first make sure that the files/directories are no longer linked: ::
+
+   [root@ms1 etc]# ls -al
+   drwxr-xr-x 5 root root 4096 Sep 19 05:09 xcat
+   [root@ms1 etc]# cp -rp /drbd/etc/xcat /etc/
+
+In our case, we handled the /install directory like this: ::
+
+   rsync -av /drbd/install/ /oldinstall/
+   rsync -av /drbd/install/ /oldinstall/ --delete
+   umount /oldinstall
+   # change fstab to mount /install, then:
+   mount /install
+
+Start the following services by hand (or reboot): nfs nfslock dhcpd postgresql httpd (apache) named conserver xcatd
+
+Appendix A
+==========
+
+A sample Pacemaker configuration through pcs on RHEL 6.4. These are commands that need to be run on the MN:
+
+Create a file to queue up the changes; this saves the current configuration into a file named xcat_cfg: ::
+
+   pcs cluster cib xcat_cfg
+
+We use the ``pcs -f`` option to make changes in the file, so this is not changing the live configuration: ::
+
+    pcs -f xcat_cfg property set stonith-enabled=false
+    pcs -f xcat_cfg property set no-quorum-policy=ignore
+    pcs -f xcat_cfg resource op defaults timeout="120s"
+
+    pcs -f xcat_cfg resource create ip_xCAT ocf:heartbeat:IPaddr2 ip="10.1.0.1" \
+      iflabel="xCAT" cidr_netmask="24" nic="eth2"\
+      op monitor interval="37s"
+    pcs -f xcat_cfg resource create NFS_xCAT lsb:nfs \
+      op monitor interval="41s"
+    pcs -f xcat_cfg resource create NFSlock_xCAT lsb:nfslock \
+      op monitor interval="43s"
+    pcs -f xcat_cfg resource create apache_xCAT ocf:heartbeat:apache configfile="/etc/httpd/conf/httpd.conf" \
+      statusurl="http://localhost:80/icons/README.html" testregex="" \
+      op monitor interval="57s"
+    pcs -f xcat_cfg resource create db_xCAT ocf:heartbeat:mysql config="/xCATdrbd/etc/my.cnf" test_user="mysql" \
+      binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \
+      op monitor interval="57s"
+    pcs -f xcat_cfg resource create dhcpd lsb:dhcpd \
+      op monitor interval="37s"
+    pcs -f xcat_cfg resource create 
drbd_xCAT ocf:linbit:drbd drbd_resource=xCAT + pcs -f xcat_cfg resource master ms_drbd_xCAT drbd_xCAT master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" + pcs -f xcat_cfg resource create dummy ocf:heartbeat:Dummy + pcs -f xcat_cfg resource create fs_xCAT ocf:heartbeat:Filesystem device="/dev/drbd/by-res/xCAT" directory="/xCATdrbd" fstype="ext4" \ + op monitor interval="57s" + pcs -f xcat_cfg resource create named lsb:named \ + op monitor interval="37s" + pcs -f xcat_cfg resource create symlinks_xCAT ocf:tummy:drbdlinks configfile="/xCATdrbd/etc/drbdlinks.xCAT.conf" \ + op monitor interval="31s" + pcs -f xcat_cfg resource create xCAT lsb:xcatd \ + op monitor interval="42s" + pcs -f xcat-cfg resource create xCAT_conserver lsb:conserver \ + op monitor interval="53" + pcs -f xcat_cfg resource clone clone_named named clone-max=2 clone-node-max=1 notify=false + pcs -f xcat_cfg resource group add grp_xCAT fs_xCAT symlinks_xCAT + pcs -f xcat_cfg constraint colocation add NFS_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add NFSlock_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add apache_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add dhcpd grp_xCAT + pcs -f xcat_cfg constraint colocation add db_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add dummy grp_xCAT + pcs -f xcat_cfg constraint colocation add xCAT grp_xCAT + pcs -f xcat-cfg constraint colocation add xCAT_conserver grp_xCAT + pcs -f xcat_cfg constraint colocation add grp_xCAT ms_drbd_xCAT INFINITY with-rsc-role=Master + pcs -f xcat_cfg constraint colocation add ip_xCAT ms_drbd_xCAT INFINITY with-rsc-role=Master + pcs -f xcat_cfg constraint order list xCAT dummy + pcs -f xcat_cfg constraint order list NFSlock_xCAT dummy + pcs -f xcat_cfg constraint order list apache_xCAT dummy + pcs -f xcat_cfg constraint order list dhcpd dummy + pcs -f xcat_cfg constraint order list db_xCAT dummy + pcs -f xcat_cfg constraint order list NFS_xCAT dummy + pcs -f xcat-cfg constraint order list xCAT_conserver dummy + + pcs -f xcat_cfg constraint order list fs_xCAT symlinks_xCAT + + pcs -f xcat_cfg constraint order list ip_xCAT db_xCAT + pcs -f xcat_cfg constraint order list ip_xCAT apache_xCAT + pcs -f xcat_cfg constraint order list ip_xCAT dhcpd + pcs -f xcat-cfg constraint order list ip_xCAT xCAT_conserver + + pcs -f xcat_cfg constraint order list grp_xCAT NFS_xCAT + pcs -f xcat_cfg constraint order list grp_xCAT NFSlock_xCAT + pcs -f xcat_cfg constraint order list grp_xCAT apache_xCAT + pcs -f xcat_cfg constraint order list grp_xCAT db_xCAT + pcs -f xcat_cfg constraint order list grp_xCAT dhcpd + pcs -f xcat-cfg constraint order list grp_xCAT xCAT_conserver + pcs -f xcat_cfg constraint order list db_xCAT xCAT + + pcs -f xcat_cfg constraint order promote ms_drbd_xCAT then start grp_xCAT + +Finally we commit the changes that are in xcat_cfg into the live system: :: + + pcs cluster push cib xcat_cfg + +Appendix B +========== + +from RHEL 6.5, corosync is being outdated, and will be replaced by ``cman`` and ``ccs``; so as part of the installation, instead of installing corosync we need to install ``pcs`` and ``ccs``, as shown below: :: + + yum -y install cman ccs pcs + +In order to do similar configs to corosync, that we need to apply to cman, is shown below. 
:: + + ccs -f /etc/cluster/cluster.conf --createcluster xcat-cluster + ccs -f /etc/cluster/cluster.conf --addnode x3550m4n01 + ccs -f /etc/cluster/cluster.conf --addnode x3550m4n02 + ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk + ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect x3550m4n01 + ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect x3550m4n02 + ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk x3550m4n01 pcmk-redirect port=x3550m4n01 + ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk x3550m4n02 pcmk-redirect port=x3550m4n02 + ccs -f /etc/cluster/cluster.conf --setcman two_node=1 expected_votes=1 + + echo "CMAN_QUORUM_TIMEOUT=0" >> /etc/sysconfig/cman + +As per Appendix A, a sample Pacemaker configuration through pcs on RHEL 6.5 is shown below; but there are some slight changes compared to RHEL 6.4 (So we need to keep these in mind). The commands below need to be run on the MN: + +Create a file to queue up the changes, this creates a file with the current configuration into a file xcat_cfg: :: + + pcs cluster cib xcat_cfg + +We use the pcs -f option to make changes in the file, so this is not changing it live: :: + + pcs -f xcat_cfg property set stonith-enabled=false + pcs -f xcat_cfg property set no-quorum-policy=ignore + pcs -f xcat_cfg resource op defaults timeout="120s" + pcs -f xcat_cfg resource create ip_xCAT ocf:heartbeat:IPaddr2 ip="10.1.0.1" \ + iflabel="xCAT" cidr_netmask="24" nic="eth2"\ + op monitor interval="37s" + pcs -f xcat_cfg resource create NFS_xCAT lsb:nfs \ + op monitor interval="41s" + pcs -f xcat_cfg resource create NFSlock_xCAT lsb:nfslock \ + op monitor interval="43s" + pcs -f xcat_cfg resource create apache_xCAT ocf:heartbeat:apache configfile="/etc/httpd/conf/httpd.conf" \ + statusurl="http://localhost:80/icons/README.html" testregex="" \ + op monitor interval="57s" + pcs -f xcat_cfg resource create db_xCAT ocf:heartbeat:mysql config="/xCATdrbd/etc/my.cnf" test_user="mysql" \ + binary="/usr/bin/mysqld_safe" pid="/var/run/mysqld/mysqld.pid" socket="/var/lib/mysql/mysql.sock" \ + op monitor interval="57s" + pcs -f xcat_cfg resource create dhcpd lsb:dhcpd \ + op monitor interval="37s" + pcs -f xcat_cfg resource create drbd_xCAT ocf:linbit:drbd drbd_resource=xCAT + pcs -f xcat_cfg resource master ms_drbd_xCAT drbd_xCAT master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" + pcs -f xcat_cfg resource create dummy ocf:heartbeat:Dummy + pcs -f xcat_cfg resource create fs_xCAT ocf:heartbeat:Filesystem device="/dev/drbd/by-res/xCAT" directory="/xCATdrbd" fstype="ext4" \ + op monitor interval="57s" + pcs -f xcat_cfg resource create named lsb:named \ + op monitor interval="37s" + pcs -f xcat_cfg resource create symlinks_xCAT ocf:tummy:drbdlinks configfile="/xCATdrbd/etc/drbdlinks.xCAT.conf" \ + op monitor interval="31s" + pcs -f xcat_cfg resource create xCAT lsb:xcatd \ + op monitor interval="42s" + pcs -f xcat-cfg resource create xCAT_conserver lsb:conserver \ + op monitor interval="53" + pcs -f xcat_cfg resource clone named clone-max=2 clone-node-max=1 notify=false + pcs -f xcat_cfg resource group add grp_xCAT fs_xCAT symlinks_xCAT + pcs -f xcat_cfg constraint colocation add NFS_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add NFSlock_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add apache_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add dhcpd grp_xCAT + pcs -f xcat_cfg constraint colocation add db_xCAT grp_xCAT + pcs -f xcat_cfg constraint colocation add dummy 
grp_xCAT + pcs -f xcat_cfg constraint colocation add xCAT grp_xCAT + pcs -f xcat-cfg constraint colocation add xCAT_conserver grp_xCAT + pcs -f xcat_cfg constraint colocation add grp_xCAT ms_drbd_xCAT INFINITY with-rsc-role=Master + pcs -f xcat_cfg constraint colocation add ip_xCAT ms_drbd_xCAT INFINITY with-rsc-role=Master + pcs -f xcat_cfg constraint order xCAT then dummy + pcs -f xcat_cfg constraint order NFSlock_xCAT then dummy + pcs -f xcat_cfg constraint order apache_xCAT then dummy + pcs -f xcat_cfg constraint order dhcpd then dummy + pcs -f xcat_cfg constraint order db_xCAT then dummy + pcs -f xcat_cfg constraint order NFS_xCAT then dummy + pcs -f xcat-cfg constraint order xCAT_conserver then dummy + pcs -f xcat_cfg constraint order fs_xCAT then symlinks_xCAT + pcs -f xcat_cfg constraint order ip_xCAT then db_xCAT + pcs -f xcat_cfg constraint order ip_xCAT then apache_xCAT + pcs -f xcat_cfg constraint order ip_xCAT then dhcpd + pcs -f xcat-cfg constraint order ip_xCAT then xCAT_conserver + pcs -f xcat_cfg constraint order grp_xCAT then NFS_xCAT + pcs -f xcat_cfg constraint order grp_xCAT then NFSlock_xCAT + pcs -f xcat_cfg constraint order grp_xCAT then apache_xCAT + pcs -f xcat_cfg constraint order grp_xCAT then db_xCAT + pcs -f xcat_cfg constraint order grp_xCAT then dhcpd + pcs -f xcat-cfg constraint order grp_xCAT then xCAT_conserver + pcs -f xcat_cfg constraint order db_xCAT then xCAT + pcs -f xcat_cfg constraint order promote ms_drbd_xCAT then start grp_xCAT + +Finally we commit the changes that are in xcat_cfg into the live system: :: + + pcs cluster cib-push xcat_cfg + +Once the changes have been commited, we can view the config, by running the command below: :: + + pcs config + +which should result in the following output: :: + + Cluster Name: xcat-cluster + Corosync Nodes: + Pacemaker Nodes: + x3550m4n01 x3550m4n02 + Resources: + Resource: ip_xCAT (class=ocf provider=heartbeat type=IPaddr2) + Attributes: ip=10.1.0.1 iflabel=xCAT cidr_netmask=24 nic=eth2 + Operations: monitor interval=37s (ip_xCAT-monitor-interval-37s) + Resource: NFS_xCAT (class=lsb type=nfs) + Operations: monitor interval=41s (NFS_xCAT-monitor-interval-41s) + Resource: NFSlock_xCAT (class=lsb type=nfslock) + Operations: monitor interval=43s (NFSlock_xCAT-monitor-interval-43s) + Resource: apache_xCAT (class=ocf provider=heartbeat type=apache) + Attributes: configfile=/etc/httpd/conf/httpd.conf statusurl=http://localhost:80/icons/README.html testregex= + Operations: monitor interval=57s (apache_xCAT-monitor-interval-57s) + Resource: db_xCAT (class=ocf provider=heartbeat type=mysql) + Attributes: config=/xCATdrbd/etc/my.cnf test_user=mysql binary=/usr/bin/mysqld_safe pid=/var/run/mysqld/mysqld.pid socket=/var/lib/mysql/mysql.sock + Operations: monitor interval=57s (db_xCAT-monitor-interval-57s) + Resource: dhcpd (class=lsb type=dhcpd) + Operations: monitor interval=37s (dhcpd-monitor-interval-37s) + Master: ms_drbd_xCAT + Meta Attrs: master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true + Resource: drbd_xCAT (class=ocf provider=linbit type=drbd) + Attributes: drbd_resource=xCAT + Operations: monitor interval=60s (drbd_xCAT-monitor-interval-60s) + Resource: dummy (class=ocf provider=heartbeat type=Dummy) + Operations: monitor interval=60s (dummy-monitor-interval-60s) + Resource: xCAT (class=lsb type=xcatd) + Operations: monitor interval=42s (xCAT-monitor-interval-42s) + Resource: xCAT_conserver (class=lsb type=conserver) + Operations: monitor interval=53 (xCAT_conserver-monitor-interval-53) 
+ Clone: named-clone + Meta Attrs: clone-max=2 clone-node-max=1 notify=false + Resource: named (class=lsb type=named) + Operations: monitor interval=37s (named-monitor-interval-37s) + Group: grp_xCAT + Resource: fs_xCAT (class=ocf provider=heartbeat type=Filesystem) + Attributes: device=/dev/drbd/by-res/xCAT directory=/xCATdrbd fstype=ext4 + Operations: monitor interval=57s (fs_xCAT-monitor-interval-57s) + Resource: symlinks_xCAT (class=ocf provider=tummy type=drbdlinks) + Attributes: configfile=/xCATdrbd/etc/drbdlinks.xCAT.conf + Operations: monitor interval=31s (symlinks_xCAT-monitor-interval-31s) + + Stonith Devices: + Fencing Levels: + + Location Constraints: + Ordering Constraints: + start xCAT then start dummy (Mandatory) (id:order-xCAT-dummy-mandatory) + start NFSlock_xCAT then start dummy (Mandatory) (id:order-NFSlock_xCAT-dummy-mandatory) + start apache_xCAT then start dummy (Mandatory) (id:order-apache_xCAT-dummy-mandatory) + start dhcpd then start dummy (Mandatory) (id:order-dhcpd-dummy-mandatory) + start db_xCAT then start dummy (Mandatory) (id:order-db_xCAT-dummy-mandatory) + start NFS_xCAT then start dummy (Mandatory) (id:order-NFS_xCAT-dummy-mandatory) + start xCAT_conserver then start dummy (Mandatory) (id:order-xCAT_conserver-dummy-mandatory) + start fs_xCAT then start symlinks_xCAT (Mandatory) (id:order-fs_xCAT-symlinks_xCAT-mandatory) + start ip_xCAT then start db_xCAT (Mandatory) (id:order-ip_xCAT-db_xCAT-mandatory) + start ip_xCAT then start apache_xCAT (Mandatory) (id:order-ip_xCAT-apache_xCAT-mandatory) + start ip_xCAT then start dhcpd (Mandatory) (id:order-ip_xCAT-dhcpd-mandatory) + start ip_xCAT then start xCAT_conserver (Mandatory) (id:order-ip_xCAT-xCAT_conserver-mandatory) + start grp_xCAT then start NFS_xCAT (Mandatory) (id:order-grp_xCAT-NFS_xCAT-mandatory) + start grp_xCAT then start NFSlock_xCAT (Mandatory) (id:order-grp_xCAT-NFSlock_xCAT-mandatory) + start grp_xCAT then start apache_xCAT (Mandatory) (id:order-grp_xCAT-apache_xCAT-mandatory) + start grp_xCAT then start db_xCAT (Mandatory) (id:order-grp_xCAT-db_xCAT-mandatory) + start grp_xCAT then start dhcpd (Mandatory) (id:order-grp_xCAT-dhcpd-mandatory) + start grp_xCAT then start xCAT_conserver (Mandatory) (id:order-grp_xCAT-xCAT_conserver-mandatory) + start db_xCAT then start xCAT (Mandatory) (id:order-db_xCAT-xCAT-mandatory) + promote ms_drbd_xCAT then start grp_xCAT (Mandatory) (id:order-ms_drbd_xCAT-grp_xCAT-mandatory) + Colocation Constraints: + NFS_xCAT with grp_xCAT (INFINITY) (id:colocation-NFS_xCAT-grp_xCAT-INFINITY) + NFSlock_xCAT with grp_xCAT (INFINITY) (id:colocation-NFSlock_xCAT-grp_xCAT-INFINITY) + apache_xCAT with grp_xCAT (INFINITY) (id:colocation-apache_xCAT-grp_xCAT-INFINITY) + dhcpd with grp_xCAT (INFINITY) (id:colocation-dhcpd-grp_xCAT-INFINITY) + db_xCAT with grp_xCAT (INFINITY) (id:colocation-db_xCAT-grp_xCAT-INFINITY) + dummy with grp_xCAT (INFINITY) (id:colocation-dummy-grp_xCAT-INFINITY) + xCAT with grp_xCAT (INFINITY) (id:colocation-xCAT-grp_xCAT-INFINITY) + xCAT_conserver with grp_xCAT (INFINITY) (id:colocation-xCAT_conserver-grp_xCAT-INFINITY) + grp_xCAT with ms_drbd_xCAT (INFINITY) (with-rsc-role:Master) (id:colocation-grp_xCAT-ms_drbd_xCAT-INFINITY) + ip_xCAT with ms_drbd_xCAT (INFINITY) (with-rsc-role:Master) (id:colocation-ip_xCAT-ms_drbd_xCAT-INFINITY) + + Cluster Properties: + cluster-infrastructure: cman + dc-version: 1.1.10-14.el6-368c726 + no-quorum-policy: ignore + stonith-enabled: false + +Then we can check the status of the cluster by running the following 
command: :: + + pcs status + +And the resulting output should be the following: :: + + Cluster name: xcat-cluster + Last updated: Wed Feb 5 14:23:08 2014 + Last change: Wed Feb 5 14:23:06 2014 via crm_attribute on x3550m4n01 + Stack: cman + Current DC: x3550m4n01 - partition with quorum + Version: 1.1.10-14.el6-368c726 + 2 Nodes configured + 14 Resources configured + + Online: [ x3550m4n01 x3550m4n02 ] + + Full list of resources: + + ip_xCAT (ocf::heartbeat:IPaddr2): Started x3550m4n01 + NFS_xCAT (lsb:nfs): Started x3550m4n01 + NFSlock_xCAT (lsb:nfslock): Started x3550m4n01 + apache_xCAT (ocf::heartbeat:apache): Started x3550m4n01 + db_xCAT (ocf::heartbeat:mysql): Started x3550m4n01 + dhcpd (lsb:dhcpd): Started x3550m4n01 + Master/Slave Set: ms_drbd_xCAT [drbd_xCAT] + Masters: [ x3550m4n01 ] + Slaves: [ x3550m4n02 ] + dummy (ocf::heartbeat:Dummy): Started x3550m4n01 + xCAT (lsb:xcatd): Started x3550m4n01 + xCAT_conserver (lsb:conserver): Started x3550m4n01 + Clone Set: named-clone [named] + Started: [ x3550m4n01 x3550m4n02 ] + Resource Group: grp_xCAT + fs_xCAT (ocf::heartbeat:Filesystem): Started x3550m4n01 + symlinks_xCAT (ocf::tummy:drbdlinks): Started x3550m4n01 + + diff --git a/docs/source/advanced/hamn/setup_ha_mgmt_node_with_raid1_and_disks_move.rst b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_raid1_and_disks_move.rst new file mode 100644 index 000000000..806712b30 --- /dev/null +++ b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_raid1_and_disks_move.rst @@ -0,0 +1,84 @@ +.. _setup_ha_mgmt_node_with_raid1_and disks_move: + +Setup HA Mgmt Node With RAID1 and disks move +============================================ + +This documentation illustrates how to setup a second management node, or standby management node, in your cluster to provide high availability management capability, using RAID1 configuration inside the management node and physically moving the disks between the two management nodes. + +When one disk fails on the primary xCAT management node, replace the failed disk and use the RAID1 functionality to reconstruct the RAID1. + +When the primary xCAT management node fails, the administrator can shutdown the failed primary management node, unplug the disks from the primary management node and insert the disks into the standby management node, power on the standby management node and then the standby management immediately takes over the cluster management role. + +This HAMN approach is primarily intended for clusters in which the management node manages diskful nodes or linux stateless nodes. This also includes hierarchical clusters in which the management node only directly manages the diskful or linux stateless service nodes, and the compute nodes managed by the service nodes can be of any type. + +If the compute nodes use only readonly nfs mounts from the MN management node, you can use this doc as long as you recognize that your nodes will go down while you are failing over to the standby management node. If the compute nodes depend on the management node being up to run its operating system over NFS, this doc is not suitable. + +Configuration requirements +========================== + +#. The hardware type/model are not required to be identical on the two management nodes, but it is recommended to use similar hardware configuration on the two management nodes, at least have similar hardware capability on the two management nodes to support the same operating system and have similar management capability. + +#. 
Hardware RAID: Most of the IBM servers provide hardware RAID option, it is assumed that the hardware RAID configuration will be used in this HAMN configuration, if hardware RAID is not available on your servers, the software RAID MIGHT also work, but use it at your own risk. + +#. The network connections on the two management nodes must be the same, the ethx on the standby management node must be connected to same network with the ethx on the primary management node. + +#. Use router/switch for routing: if the nodes in the cluster need to connect to the external network through gateway, the gateway should be on the router/switch instead of the management node, the router/switch have their own redundancy. + +Configuration procedure +======================= + +Configure hardware RAID on the two management nodes +----------------------------------------------------- + +Follow the server documentation to setup the hardware RAID1 on the standby management node first, and then move the disks to the primary management node, setup hardware RAID1 on the primary management node. + +Install OS on the primary management node +------------------------------------------------ + +Install operating system on the primary management node using whatever method and configure the network interfaces. + +Make sure the attribute **HWADDR** is not specified in the network interface configuration file, like ifcfg-eth0. + +Initial failover test +---------------------- + +This is a sanity check, need to make sure the disks work on the two management nodes, just in case the disks do not work on the standby management node, we do not need to redo too much. **DO NOT** skip this step. + +Power off the primary management node, unplug the disks from the primary management node and insert them into the standby management node, boot up the standby management node and make sure the operating system is working correctly, and the network interfaces could connect to the network. + +If there are more than one network interfaces managed by the same network driver, like ``e1000``, the network interfaces sequence might be different on the two management nodes even if the hardware configuration is identical on the two management nodes, you need to test the network connections during initial configuration to make sure it works. + +It is unlikely to happen, but just in case the ip addresses on the management node are assigned by DHCP, make sure the DHCP server is configured to assign the same ip address to the network interfaces on the two management nodes. + +After this, fail back to the primary management node, using the same procedure mentioned above. + +Setup xCAT on the Primary Management Node +------------------------------------------- + +Follow the doc :doc:`xCAT Install Guide <../../guides/install-guides/index>` to setup xCAT on the primary management node + +Continue setting up the cluster +-------------------------------- + +You can now continue to setup your cluster. Return to using the primary management node. Now setup your cluster using the following documentation, depending on your Hardware,OS and type of install you want to do on the Nodes :doc:`Admin Guide <../../guides/admin-guides/index>`. 
+ +For all the xCAT docs: http://xcat-docs.readthedocs.org + +During the cluster setup, there is one important thing to consider: + +**Network services on management node** + +Avoid using the management node to provide network services that are needed to be run continuously, like DHCP, named, ntp, put these network services on the service nodes if possible, multiple service nodes can provide network services redundancy, for example, use more than one service nodes as the name servers, DHCP servers and ntp servers for each compute node; if there is no service node configured in the cluster at all, static configuration on the compute nodes, like static ip address and /etc/hosts name resolution, can be used to eliminate the dependency with the management node. + +Failover +======== + +The failover procedure is simple and straightforward: + +#. Shutdown the primary management node + +#. Unplug the disks from the primary management node, insert these disks into the standby management node + +#. Boot up the standby management node + +#. Verify the standby management node could now perform all the cluster management operations. diff --git a/docs/source/advanced/hamn/setup_ha_mgmt_node_with_shared_data.rst b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_shared_data.rst new file mode 100644 index 000000000..7e4a20530 --- /dev/null +++ b/docs/source/advanced/hamn/setup_ha_mgmt_node_with_shared_data.rst @@ -0,0 +1,495 @@ +.. _setup_ha_mgmt_node_with_shared_data: + +Setup HA Mgmt Node With Shared Data +=================================== + +This documentation illustrates how to setup a second management node, or standby management node, in your cluster to provide high availability management capability, using shared data between the two management nodes. + +When the primary xCAT management node fails, the administrator can easily have the standby management node take over role of the management node, and thus avoid long periods of time during which your cluster does not have active cluster management function available. + +The xCAT high availability management node(``HAMN``) through shared data is not designed for automatic setup or automatic failover, this documentation describes how to use shared data between the primary management node and standby management node, and describes how to perform some manual steps to have the standby management node takeover the management node role when the primary management node fails. However, high availability applications such as ``IBM Tivoli System Automation(TSA)`` and Linux ``Pacemaker`` could be used to achieve automatic failover, how to configure the high availability applications is beyond the scope of this documentation, you could refer to the applications documentation for instructions. + +The nfs service on the primary management node or the primary management node itself will be shutdown during the failover process, so any NFS mount or other network connections from the compute nodes to the management node should be temporarily disconnected during the failover process. If the network connectivity is required for compute node run-time operations, you should consider some other way to provide high availability for the network services unless the compute nodes can also be taken down during the failover process. This also implies: + +#. This HAMN approach is primarily intended for clusters in which the management node manages linux diskful nodes or stateless nodes. 
This also includes hierarchical clusters in which the management node only directly manages the linux diskful or linux stateless service nodes, and the compute nodes managed by the service nodes can be of any type.
+
+#. If the nodes use only readonly nfs mounts from the management node, then you can use this doc as long as you recognize that your nodes will go down while you are failing over to the standby management node.
+
+What is Shared Data
+====================
+
+The term ``Shared Data`` means that the two management nodes use a single copy of the xCAT data; no matter which management node is the primary MN, the cluster management capability runs on top of the single data copy. The data can be accessed in various ways, like shared storage, NAS, NFS, Samba, etc. Based on the protocol being used, the data might be accessible on only one management node at a time, or be accessible on both management nodes in parallel. If the data can only be accessed from one management node, the failover process needs to take care of the data access transition; if the data can be accessed on both management nodes, the failover does not need to consider the data access transition, which usually means the failover process can be faster.
+
+``Warning``: Running a database over a network file system has a lot of potential problems and is not practical; however, most database systems provide a replication feature that can be used to synchronize the database between the two management nodes.
+
+Configuration Requirements
+==========================
+
+#. xCAT HAMN requires that the operating system version, xCAT version and database version be identical on the two management nodes.
+
+#. The hardware type/model is not required to be the same on the two management nodes, but it is recommended to have similar hardware capability on the two management nodes, to support the same operating system and have similar management capability.
+
+#. Since the management node needs to provide IP services through broadcast, such as DHCP, to the compute nodes, the primary management node and standby management node should be in the same subnet to ensure the network services will work correctly after failover.
+
+#. Setting up HAMN can be done at any time during the life of the cluster. In this documentation we assume the HAMN setup is done from the very beginning of the xCAT cluster setup; there will be some minor differences if the HAMN setup is done in the middle of the xCAT cluster setup.
+
+The example given in this document is for RHEL 6. The same approach can be applied to SLES, but the specific commands might be slightly different. The examples in this documentation are based on the following cluster environment:
+
+Virtual IP Alias Address: 9.114.47.97
+
+Primary Management Node: rhmn1(9.114.47.103), netmask is 255.255.255.192, hostname is rhmn1, running RHEL 6.
+
+Standby Management Node: rhmn2(9.114.47.104), netmask is 255.255.255.192, hostname is rhmn2, running RHEL 6.
+
+You need to substitute the hostnames and IP addresses with your own values when setting up your HAMN environment.
+
+Configuring Shared Data
+=======================
+
+``Note``: Shared data itself needs high availability as well; the shared data should not become a single point of failure.
+
+The configuration procedure will be quite different based on the shared data mechanism that will be used. Configuring these shared data mechanisms is beyond the scope of this documentation.
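+
+As a quick sanity check before continuing, and assuming NFS is used as the shared data mechanism, you can verify that both management nodes see the same exports (a sketch; ``<nfs_server>`` is a placeholder for your NFS server hostname): ::
+
+   # run on both rhmn1 and rhmn2; the export list should be identical
+   showmount -e <nfs_server>
+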
After the shared data mechanism is configured, the following xCAT directory structure should be on the shared data, if this is done before xCAT is installed, you need to create the directories manually; if this is done after xCAT is installed, the directories need to be copied to the shared data. :: + + /etc/xcat + /install + ~/.xcat + / + + +``Note``:For mysql, the database directory is ``/var/lib/mysql``; for postgresql, the database directory is ``/var/lib/pgsql``; for DB2, the database directory is specified with the site attribute databaseloc; for sqlite, the database directory is /etc/xcat, already listed above. + +Here is an example of how to make directories be shared data through NFS: :: + + mount -o rw :/dir1 /etc/xcat + mount -o rw :/dir2 /install + mount -o rw :/dir3 ~/.xcat + mount -o rw :/dir4 / + +``Note``: if you need to setup high availability for some other applications, like the HPC software stack, between the two xCAT management nodes, the applications data should be on the shared data. + +Setup xCAT on the Primary Management Node +========================================= + +#. Make the shared data be available on the primary management node. + +#. Set up a ``Virtual IP address``. The xcatd daemon should be addressable with the same ``Virtual IP address``, regardless of which management node it runs on. The same ``Virtual IP address`` will be configured as an alias IP address on the management node (primary and standby) that the xcatd runs on. The Virtual IP address can be any unused ip address that all the compute nodes and service nodes could reach. Here is an example on how to configure Virtual IP address: :: + + ifconfig eth0:0 9.114.47.97 netmask 255.255.255.192 + + The option ``firstalias`` will configure the Virtual IP ahead of the interface ip address, since ifconfig will not make the ip address configuration be persistent through reboots, so the Virtual IP address needs to be re-configured right after the management node is rebooted. This non-persistent Virtual IP address is designed to avoid ip address conflict when the crashed previous primary management is recovered with the Virtual IP address configured. + +#. Add the alias ip address into the ``/etc/resolv.conf`` as the nameserver. Change the hostname resolution order to be using ``/etc/hosts`` before using name server, change to "hosts: files dns" in ``/etc/nsswitch.conf``. + +#. Change hostname to the hostname that resolves to the Virtual IP address. This is required for xCAT and database to be setup properly. + +#. Install xCAT. The procedure described in :doc:`xCAT Install Guide <../../guides/install-guides/index>` could be used for the xCAT setup on the primary management node. + +#. Check the site table master and nameservers and network tftpserver attribute is the Virtual ip: :: + + lsdef -t site + + If not correct: :: + + chdef -t site master=9.114.47.97 + chdef -t site nameservers=9.114.47.97 + chdef -t network tftpserver=9.114.47.97 + + Add the two management nodes into policy table: :: + + tabdump policy + "1.2","rhmn1",,,,,,"trusted",, + "1.3","rhmn2",,,,,,"trusted",, + +#. (Optional) DB2 only, change the databaseloc in site table: :: + + chdef -t site databaseloc=/dbdirectory + +#. Install and configure database. Refer to the doc [**doto:** choosing_the_Database] to configure the database on the xCAT management node. + + Verify xcat is running on correct database by running: :: + + lsxcatd -a + +#. 
Backup the xCAT database tables for the current configuration on standby management node, using command : :: + + dumpxCATdb -p . + +#. Setup a crontab to backup the database each night by running ``dumpxCATdb`` and storing the backup to some filesystem not on the shared data. + +#. Stop the xcatd daemon and some related network services from starting on reboot: :: + + service xcatd stop + chkconfig --level 345 xcatd off + service conserver off + chkconfig --level 2345 conserver off + service dhcpd stop + chkconfig --level 2345 dhcpd off + +#. Stop Database and prevent the database from auto starting at boot time, use mysql as an example: :: + + service mysqld stop + chkconfig mysqld off + +#. (Optional) If DFM is being used for hardware control capabilities, install DFM package, setup xCAT to communicate directly to the System P server's service processor.:: + + xCAT-dfm RPM + ISNM-hdwr_svr RPM + +#. If there is any node that is already managed by the Management Node,change the noderes table tftpserver & xcatmaster & nfsserver attributes to the Virtual ip + +#. Set the hostname back to original non-alias hostname. + +#. After installing xCAT and database, you could setup service node or compute node. + +Setup xCAT on the Standby Management Node +========================================= + +#. Make sure the standby management node is NOT using the shared data. + +#. Add the alias ip address into the ``/etc/resolv.conf`` as the nameserver. Change the hostname resolution order to be using ``/etc/hosts`` before using name server. Change "hosts: files dns" in /etc/nsswitch.conf. + +#. Temporarily change the hostname to the hostname that resolves to the Virtual IP address. This is required for xCAT and database to be setup properly. This only needs to be done one time. + + Also configure the Virtual IP address during this setup. :: + + ifconfig eth0:0 9.114.47.97 netmask 255.255.255.192 + +#. Install xCAT. The procedure described in :doc:`xCAT Install Guide <../../guides/install-guides/index>` can be used for the xCAT setup on the standby management node. The database system on the standby management node must be the same as the one running on the primary management node. + +#. (Optional) DFM only, Install DFM package: :: + + xCAT-dfm RPM + ISNM-hdwr_svr RPM + +#. Setup hostname resolution between the primary management node and standby management node. Make sure the primary management node can resolve the hostname of the standby management node, and vice versa. + +#. Setup ssh authentication between the primary management node and standby management node. It should be setup as "passwordless ssh authentication" and it should work in both directions. The summary of this procedure is: + + a. cat keys from ``/.ssh/id_rsa.pub`` on the primary management node and add them to ``/.ssh/authorized_keys`` on the standby management node. Remove the standby management node entry from ``/.ssh/known_hosts`` on the primary management node prior to issuing ssh to the standby management node. + + b. cat keys from ``/.ssh/id_rsa.pub`` on the standby management node and add them to ``/.ssh/authorized_keys`` on the primary management node. Remove the primary management node entry from ``/.ssh/known_hosts`` on the standby management node prior to issuing ssh to the primary management node. + +#. Make sure the time on the primary management node and standby management node is synchronized. + +#. 
Stop the xcatd daemon and related network services, and prevent them from starting on reboot: ::
+
+    service xcatd stop
+    chkconfig --level 345 xcatd off
+    service conserver stop
+    chkconfig --level 2345 conserver off
+    service dhcpd stop
+    chkconfig --level 2345 dhcpd off
+
+#. Stop the database and prevent it from auto starting at boot time. Using mysql as an example: ::
+
+    service mysqld stop
+    chkconfig mysqld off
+
+#. Back up the xCAT database tables for the current configuration on the standby management node, using the command: ::
+
+    dumpxCATdb -p .
+
+#. Change the hostname back to the original hostname.
+
+#. Remove the Virtual Alias IP. ::
+
+    ifconfig eth0:0 0.0.0.0 0.0.0.0
+
+File Synchronization
+====================
+
+For the files that change constantly, such as the xCAT database and ``/etc/xcat/*``, we have to put the files on the shared data; but for the files that are not changed frequently, or are unlikely to be changed at all, we can simply copy the files from the primary management node to the standby management node, or use crontab and rsync to keep the files synchronized between the primary management node and standby management node. Here are some files we recommend keeping synchronized between the primary management node and standby management node:
+
+SSL Credentials and SSH Keys
+--------------------------------
+
+To enable both the primary and the standby management nodes to ssh to the service nodes and compute nodes, the ssh keys should be kept synchronized between the primary management node and standby management node. To allow xcatd on both the primary and the standby management nodes to communicate with xcatd on the service nodes, the xCAT SSL credentials should be kept synchronized between the primary management node and standby management node.
+
+The xCAT SSL credentials reside in the directories ``/etc/xcat/ca``, ``/etc/xcat/cert`` and ``$HOME/.xcat/``. The ssh host keys that xCAT generates to be placed on the compute nodes are in the directory ``/etc/xcat/hostkeys``. These directories are on the shared data.
+
+In addition, the ssh root keys in the management node's root home directory (in ``~/.ssh``) must be kept in sync between the primary management node and standby management node. Only sync the key files, not the authorized_keys file. These keys will seldom change, so you can just do it manually when they do, or set up a cron entry like this sample: ::
+
+   0 1 * * * /usr/bin/rsync -Lprgotz $HOME/.ssh/id* rhmn2:$HOME/.ssh/
+
+Now go to the Standby node and add the Primary's id_rsa.pub to the Standby's authorized_keys file.
+
+Network Services Configuration Files
+-------------------------------------
+
+A lot of network services are configured on the management node, such as DNS, DHCP and HTTP. The network services are mainly controlled by configuration files. However, some of the network services configuration files contain information related to the local hostname/IP addresses, so simply copying these network services configuration files to the standby management node may not work. Generating these network services configuration files is quick and easy by running xCAT commands such as ``makedhcp``, ``makedns`` or ``nimnodeset``, as long as the xCAT database contains the correct information.
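+
+For example, after a failover the network services configuration files can be regenerated from the xCAT database; a minimal sketch (the exact commands and options depend on which services your cluster actually uses) could look like: ::
+
+   makedns -n           # regenerate the DNS configuration and zone files
+   makedhcp -n          # regenerate the dhcpd configuration
+   makedhcp -a          # add the node definitions into the DHCP configuration
+   makeconservercf      # recreate /etc/conserver.cf for all the nodes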
+
+While it is easier to configure the network services on the standby management node by running xCAT commands when failing over to the standby management node, one exception is ``/etc/hosts``: it may be modified on your primary management node as ongoing cluster maintenance occurs. Since ``/etc/hosts`` is very important for xCAT commands, it should be synchronized between the primary management node and standby management node. Here is an example of the crontab entry for synchronizing ``/etc/hosts``: ::
+
+   0 2 * * * /usr/bin/rsync -Lprogtz /etc/hosts rhmn2:/etc/
+
+Additional Customization Files and Production files
+----------------------------------------------------
+
+Besides the files mentioned above, there may be some additional customization files and production files that need to be copied over to the standby management node, depending on your local unique requirements. You should always try to keep the standby management node as an identical clone of the primary management node. Here are some example files that can be considered: ::
+
+   /.profile
+   /.rhosts
+   /etc/auto_master
+   /etc/auto/maps/auto.u
+   /etc/motd
+   /etc/security/limits
+   /etc/netsvc.conf
+   /etc/ntp.conf
+   /etc/inetd.conf
+   /etc/passwd
+   /etc/security/passwd
+   /etc/group
+   /etc/security/group
+   /etc/exports
+   /etc/dhcpsd.cnf
+   /etc/services
+   /etc/inittab
+   (and more)
+
+``Note``:
+If the IBM HPC software stack is configured in your environment, execute the additional steps required to copy additional data or configuration files for the HAMN setup.
+The dhcpsd.cnf should be synchronized between the primary management node and standby management node only when the DHCP configuration on the two management nodes is exactly the same.
+
+Cluster Maintenance Considerations
+==================================
+
+The standby management node should be taken into account when doing any maintenance work in an xCAT cluster with an HAMN setup.
+
+#. Software Maintenance - Any software updates on the primary management node should also be done on the standby management node.
+
+#. File Synchronization - Although we have set up crontab entries to synchronize the related files between the primary management node and standby management node, the crontab entries only run in specific time slots. The synchronization delay may cause potential problems with HAMN, so it is recommended to manually synchronize the files mentioned in the section above whenever they are modified.
+
+#. Reboot management nodes - If the primary management node needs to be rebooted, since the daemons are set to not auto start at boot time and the shared data will not be mounted automatically, you should mount the shared data and start the daemons manually.
+
+``Note``: after a software upgrade, some services that were set to not autostart on boot might be started by the software upgrade process, or even set to autostart on boot. The admin should check the services on both the primary and standby management node: if any of the services are set to autostart on boot, turn that off; if any of the services are started on the backup management node, stop the service.
+
+At this point, the HA MN Setup is complete, and customer workloads and system administration can continue on the primary management node until a failure occurs. The xcatdb and files on the standby management node will continue to be synchronized until such a failure occurs.
+
+Failover
+========
+
+There are two kinds of failover, planned failover and unplanned failover.
The planned failover can be useful for updating the management nodes or any scheduled maintainance activities; the unplanned failover covers the unexpected hardware or software failures. + +In a planned failover, you can do necessary cleanup work on the previous primary management node before failover to the previous standby management node. In a unplanned failover, the previous management node probably is not functioning at all, you can simply shutdown the system. + +Take down the Current Primary Management Node +--------------------------------------------- + +xCAT ships a sample script ``/opt/xcat/share/xcat/hamn/deactivate-mn`` to make the machine be a standby management node. Before using this script, you need to review the script carefully and make updates accordingly, here is an example of how to use this script: :: + + /opt/xcat/share/xcat/hamn/deactivate-mn -i eth1:2 -v 9.114.47.97 + +On the current primary management node: + +If the management node is still available and running the cluster, perform the following steps to shutdown. + +#. (DFM only) Remove connections from CEC and Frame. :: + + rmhwconn cec,frame + rmhwconn cec,frame -T fnm + +#. Stop the xCAT daemon. + + ``Note``: xCAT must be stopped on all Service Nodes also, and LL if using the database. :: + + service xcatd stop + service dhcpd stop + +#. unexport the xCAT NFS directories + + The exported xCAT NFS directories will prevent the shared data partitions from being unmounted, so the exported xCAT NFS directories should be unmounted before failover: :: + + exportfs -ua + +#. Stop database + + Use mysql as an example: :: + + service mysqld stop + +#. Unmount shared data + + All the file systems on the shared data need to be unmounted to make the previous standby management be able to mount the file systems on the shared data. Here is an example: :: + + umount /etc/xcat + umount /install + umount ~/.xcat + umount /db2database + + When trying to umount the file systems, if there are some processes that are accessing the files and directories on the file systems, you will get "Device busy" error. Then stop or kill all the processes that are accessing the shared data file systems and retry the unmount. + +#. Unconfigure Virtual IP: :: + + ifconfig eth0:0 0.0.0.0 0.0.0.0 + + If the ifconfig command has been added to rc.local, remove it from rc.local. + +Bring up the New Primary Management Node +---------------------------------------- + +Execute script ``/opt/xcat/share/xcat/hamn/activate-mn`` to make the machine be a primary management node: :: + + /opt/xcat/share/xcat/hamn/activate-mn -i eth1:2 -v 9.114.47.97 -m 255.255.255.0 + +On the new primary management node: + +#. Configure Virtual IP: :: + + ifconfig eth0:0 9.114.47.97 netmask 255.255.255.192 + + You can put the ifconfig command into rc.local to make the Virtual IP be persistent after reboot. + +#. Mount shared data: :: + + mount /etc/xcat + mount /install + mount /.xcat + mount /db2database + +#. Start database, use mysql as an example: :: + + service mysql start + +#. Start the daemons: :: + + service dhcpd start + service xcatd start + service hdwr_svr start + service conserver start + +#. (DFM only) Setup connection for CEC and Frame: :: + + mkhwconn cec,frame -t + mkhwconn cec,frame -t -T fnm + chnwm -a + +#. Setup network services and conserver + + **DNS**: run ``makedns``. Verify dns services working for node resolution. Make sure the line ``nameserver=`` is in ``/etc/resolv.conf``. 
+ + **DHCP**: if the dhcpd.leases is not syncronized between the primary management node and standby management node, run ``makedhcp -a`` to setup the DHCP leases. Verify dhcp is operational. + + **conserver**: run makeconservercf. This will recreate the ``/etc/conserver.cf`` config files for all the nodes. + +#. (Optional)Setup os deployment environment + + This step is required only when you want to use this new primary management node to perform os deployment tasks. + + The operating system images definitions are already in the xCAT database, and the operating system image files are already in ``/install`` directory. + + Run the following command to list all the operating system images. :: + + lsdef -t osimage -l + + If you are seeing ssh problems when trying to ssh the compute nodes or any other nodes, the hostname in ssh keys under directory $HOME/.ssh needs to be updated. + +#. Restart NFS service and re-export the NFS exports + + Because of the Virtual ip configuration and the other network configuration changes on the new primary management node, the NFS service needs to be restarted and the NFS exports need to be re-exported. :: + + exportfs -ua + service nfs stop + service nfs start + exportfs -a + +Setup the Cluster +----------------- + +At this point you have setup your Primary and Standby management node for HA. You can now continue to setup your cluster. Return to using the Primary management node attached to the shared data. Now setup your Hierarchical cluster using the following documentation, depending on your Hardware,OS and type of install you want to do on the Nodes. Other docs are available for full disk installs :doc:`Admin Guide <../../guides/admin-guides/index>`. + +For all the xCAT docs: http://xcat-docs.readthedocs.org + +Appendix A Configure Shared Disks +================================= + +The following two sections describe how to configure shared disks on Linux. And the steps do not apply to all shared disks configuration scenarios, you may need to use some slightly different steps according to your shared disks configuration. + +The operating system is installed on the internal disks. + +#. Connect the shared disk to both management nodes + + To verify the shared disks are connected correctly, run the sginfo command on both management nodes and look for the same serial number in the output. Please be aware that the sginfo command may not be installed by default on Linux, the sginfo command is shipped with package sg3_utils, you can manually install the package sg3_utils on both management nodes. + + Once the sginfo command is installed, run sginfo -l command on both management nodes to list all the known SCSI disks, for example, enter: :: + + sginfo -l + + Output will be similar to: :: + + /dev/sdd /dev/sdc /dev/sdb /dev/sda + /dev/sg0 [=/dev/sda scsi0 ch=0 id=1 lun=0] + /dev/sg1 [=/dev/sdb scsi0 ch=0 id=2 lun=0] + /dev/sg2 [=/dev/sdc scsi0 ch=0 id=3 lun=0] + /dev/sg3 [=/dev/sdd scsi0 ch=0 id=4 lun=0] + + Use the ``sginfo -s `` to identify disks with the same serial number on both management nodes, for example: + + On the primary management node: :: + + [root@rhmn1 ~]# sginfo -s /dev/sdb + Serial Number '1T23043224 ' + + [root@rhmn1 ~]# + + On the standby management node: :: + + [root@rhmn2~]# sginfo -s /dev/sdb + Serial Number '1T23043224 ' + + We can see that the ``/dev/sdb`` is a shared disk on both management nodes. 
In some cases, as with mirrored disks and when there is no matching of serial numbers between the two management nodes, multiple disks on a single server can have the same serial number, In these cases, format the disks, mount them on both management nodes, and then touch files on the disks to determine if they are shared between the management nodes. + +#. Create partitions on shared disks + + After the shared disks are identified, create the partitions on the shared disks using fdisk command on the primary management node. Here is an example: :: + + fdisk /dev/sdc + + Verify the partitions are created by running ``fdisk -l``. + +#. Create file systems on shared disks + + Run the ``mkfs.ext3`` command on the primary management node to create file systems on the shared disk that will contain the xCAT data. For example: :: + + mkfs.ext3 -v /dev/sdc1 + mkfs.ext3 -v /dev/sdc2 + mkfs.ext3 -v /dev/sdc3 + mkfs.ext3 -v /dev/sdc4 + + If you place entries for the disk in ``/etc/fstab``, which is not required, ensure that the entries do not have the system automatically mount the disk. + + ``Note``: Since the file systems will not be mounted automatically during system reboot, it implies that you need to manually mount the file systems after the primary management node reboot. Before mounting the file systems, stop xcat daemon first; after the file systems are mounted, start xcat daemon. + +#. Verify the file systems on the primary management node. + + Verify the file systems could be mounted and written on the primary management node, here is an example: :: + + mount /dev/sdc1 /etc/xcat + mount /dev/sdc2 /install + mount /dev/sdc3 ~/.xcat + mount /dev/sdc4 /db2database + + After that, umount the file system on the primary management node: :: + + umount /etc/xcat + umount /install + umount ~/.xcat + umount /db2database + +#. Verify the file systems on the standby management node. + + On the standby management node, verify the file systems could be mounted and written. :: + + mount /dev/sdc1 /etc/xcat + mount /dev/sdc2 /install + mount /dev/sdc3 ~/.xcat + mount /dev/sdc4 /db2database + + You may get errors "mount: you must specify the filesystem type" or "mount: special device /dev/sdb1 does not exist" when trying to mount the file systems on the standby management node, this is caused by the missing devices files on the standby management node, run ``fidsk /dev/sdx`` and simply select "w write table to disk and exit" in the fdisk menu, then retry the mount. + + After that, umount the file system on the standby management node: :: + + umount /etc/xcat + umount /install + umount ~/.xcat + umount /db2database + diff --git a/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst b/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst new file mode 100644 index 000000000..995cbcf43 --- /dev/null +++ b/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst @@ -0,0 +1,511 @@ +.. _setup_xcat_high_available_management_node_with_nfs: + +Setup xCAT HA Mgmt with NFS pacemaker and corosync +==================================================================================== + +In this doc, we will configure a xCAT HA cluster using ``pacemaker`` and ``corosync`` based on NFS server. ``pacemaker`` and ``corosync`` only support ``x86_64`` systems, more information about ``pacemaker`` and ``corosync`` refer to doc :ref:`setup_ha_mgmt_node_with_drbd_pacemaker_corosync`. 
diff --git a/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst b/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst
new file mode 100644
index 000000000..995cbcf43
--- /dev/null
+++ b/docs/source/advanced/hamn/setup_xcat_high_available_management_node_in_softlayer.rst
@@ -0,0 +1,511 @@
+.. _setup_xcat_high_available_management_node_with_nfs:
+
+Setup xCAT HA Mgmt with NFS pacemaker and corosync
+==================================================
+
+In this doc, we will configure an xCAT HA cluster using ``pacemaker`` and ``corosync`` based on an NFS server. ``pacemaker`` and ``corosync`` only support ``x86_64`` systems; for more information about ``pacemaker`` and ``corosync``, refer to :ref:`setup_ha_mgmt_node_with_drbd_pacemaker_corosync`.
+
+Prepare environments
+--------------------
+
+The NFS server is: c902f02x44 (10.2.2.44)
+
+The NFS shares are ``/disk1/install``, ``/etc/xcat``, ``/root/.xcat``, ``/root/.ssh/``, ``/disk1/hpcpeadmin``
+
+The first xCAT management node is: rhmn1 (10.2.2.235)
+
+The second xCAT management node is: rhmn2 (10.2.2.233)
+
+The virtual IP is: 10.2.2.250
+
+This example uses static IP addresses to provision nodes, so we do not use the DHCP service. If you want to use the DHCP service, you should consider saving the DHCP related configuration files on the NFS server.
+The DB is SQLite. There is no service node in this example.
+
+Prepare NFS server
+--------------------
+
+On the NFS server 10.2.2.44, execute the following commands to export the file systems. If you want to use a non-root user to manage xCAT, such as hpcpeadmin, you should also create a directory for ``/home/hpcpeadmin``. Execute these commands on the NFS server c902f02x44. ::
+
+    # service nfs start
+    # mkdir ~/.xcat
+    # mkdir -p /etc/xcat
+    # mkdir -p /disk1/install/
+    # mkdir -p /disk1/hpcpeadmin
+    # mkdir -p /disk1/install/xcat
+
+    # vi /etc/exports
+    /disk1/install *(rw,no_root_squash,sync,no_subtree_check)
+    /etc/xcat *(rw,no_root_squash,sync,no_subtree_check)
+    /root/.xcat *(rw,no_root_squash,sync,no_subtree_check)
+    /root/.ssh *(rw,no_root_squash,sync,no_subtree_check)
+    /disk1/hpcpeadmin *(rw,no_root_squash,sync,no_subtree_check)
+    # exportfs -a
+
+Install First xCAT MN rhmn1
+------------------------------
+
+Execute the following steps on the xCAT MN rhmn1.
+
+#. Configure an IP alias on rhmn1: ::
+
+    ifconfig eth0:0 10.2.2.250 netmask 255.0.0.0
+
+#. Add the alias IP into ``/etc/resolv.conf``: ::
+
+    #vi /etc/resolv.conf
+    search pok.stglabs.ibm.com
+    nameserver 10.2.2.250
+
+   ``rsync`` ``/etc/resolv.conf`` to ``c902f02x44:/disk1/install/xcat/``: ::
+
+    rsync /etc/resolv.conf c902f02x44:/disk1/install/xcat/
+
+   Add the alias IP, rhmn2, and rhmn1 into ``/etc/hosts``: ::
+
+    #vi /etc/hosts
+    10.2.2.233 rhmn2 rhmn2.pok.stglabs.ibm.com
+    10.2.2.235 rhmn1 rhmn1.pok.stglabs.ibm.com
+
+   ``rsync`` ``/etc/hosts`` to ``c902f02x44:/disk1/install/xcat/``: ::
+
+    rsync /etc/hosts c902f02x44:/disk1/install/xcat/
+
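+   Before mounting the NFS shares in the next step, you can optionally confirm that the NFS server is exporting them as expected. This is only a quick check; it assumes the ``showmount`` command from the ``nfs-utils`` package is available on rhmn1. ::
+
+    # list the directories exported by c902f02x44
+    showmount -e 10.2.2.44
+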
+#. Install the first xCAT MN rhmn1
+
+   Mount the shared NFS directories from 10.2.2.44: ::
+
+    # mkdir -p /install
+    # mkdir -p /etc/xcat
+    # mkdir -p /home/hpcpeadmin
+    # mount 10.2.2.44:/disk1/install /install
+    # mount 10.2.2.44:/etc/xcat /etc/xcat
+    # mkdir -p /root/.xcat
+    # mount 10.2.2.44:/root/.xcat /root/.xcat
+    # mount 10.2.2.44:/root/.ssh /root/.ssh
+    # mount 10.2.2.44:/disk1/hpcpeadmin /home/hpcpeadmin
+
+   Create the new user hpcpeadmin and set its password to hpcpeadminpw: ::
+
+    # USER="hpcpeadmin"
+    # GROUP="hpcpeadmin"
+    # /usr/sbin/groupadd -f ${GROUP}
+    # /usr/sbin/useradd ${USER} -d /home/${USER} -s /bin/bash
+    # /usr/sbin/usermod -a -G ${GROUP} ${USER}
+    # passwd ${USER}
+
+   Add the new user hpcpeadmin to the sudoers: ::
+
+    # USERNAME="hpcpeadmin"
+    # SUDOERS_FILE="/etc/sudoers"
+    # sed s'/Defaults requiretty/#Defaults requiretty'/g ${SUDOERS_FILE} > /tmp/sudoers
+    # echo "$USERNAME ALL=(ALL) NOPASSWD:ALL" >> /tmp/sudoers
+    # cp -f /tmp/sudoers ${SUDOERS_FILE}
+    # chown hpcpeadmin:hpcpeadmin /home/hpcpeadmin
+    # rm -rf /tmp/sudoers
+
+   Check the result: ::
+
+    #su - hpcpeadmin
+    $ sudo cat /etc/sudoers|grep hpcpeadmin
+    hpcpeadmin ALL=(ALL) NOPASSWD:ALL
+    $exit
+
+   Download the xcat-core tarball and the xcat-dep tarball from GitHub, and untar them: ::
+
+    # mkdir /install/xcat
+    # mv xcat-core-2.8.4.tar.bz2 /install/xcat/
+    # mv xcat-dep-201404250449.tar.bz2 /install/xcat/
+    # cd /install/xcat
+    # tar -jxvf xcat-core-2.8.4.tar.bz2
+    # tar -jxvf xcat-dep-201404250449.tar.bz2
+    # cd xcat-core
+    # ./mklocalrepo.sh
+    # cd ../xcat-dep/rh6/x86_64/
+    # ./mklocalrepo.sh
+    # yum clean metadata
+    # yum install xCAT
+    # source /etc/profile.d/xcat.sh
+
+#. Use the virtual IP in the site table and the networks table: ::
+
+    # chdef -t site master=10.2.2.250 nameservers=10.2.2.250
+    # chdef -t network 10_0_0_0-255_0_0_0 tftpserver=10.2.2.250
+    # tabdump networks
+    ~]#netname,net,mask,mgtifname,gateway,dhcpserver,tftpserver,nameservers,ntpservers,logservers,dynamicrange,staticrange,staticrangeincrement,nodehostname,ddnsdomain,vlanid,domain,comments,disable
+    "10_0_0_0-255_0_0_0","10.0.0.0","255.0.0.0","eth0","10.2.0.221",,"10.2.2.250",,,,,,,,,,,,
+
+#. Add the two management nodes into the policy table: ::
+
+    #tabedit policy
+    "1.2","rhmn1",,,,,,"trusted",,
+    "1.3","rhmn2",,,,,,"trusted",,
+
+#. Back up the xCAT database (optional): ::
+
+    dumpxCATdb -p .
+
+#. Check and update the policy table to allow the user to run commands: ::
+
+    # chtab policy.priority=6 policy.name=hpcpeadmin policy.rule=allow
+    # tabdump policy
+    /#priority,name,host,commands,noderange,parameters,time,rule,comments,disable
+    "1","root",,,,,,"allow",,
+    "1.2","rhmn1",,,,,,"trusted",,
+    "1.3","rhmn2",,,,,,"trusted",,
+    "2",,,"getbmcconfig",,,,"allow",,
+    "2.1",,,"remoteimmsetup",,,,"allow",,
+    "2.3",,,"lsxcatd",,,,"allow",,
+    "3",,,"nextdestiny",,,,"allow",,
+    "4",,,"getdestiny",,,,"allow",,
+    "4.4",,,"getpostscript",,,,"allow",,
+    "4.5",,,"getcredentials",,,,"allow",,
+    "4.6",,,"syncfiles",,,,"allow",,
+    "4.7",,,"litefile",,,,"allow",,
+    "4.8",,,"litetree",,,,"allow",,
+    "6","hpcpeadmin",,,,,,"allow",,
+
+#. Make sure the xCAT commands are in the user's path. ::
+
+    # su - hpcpeadmin
+    $ echo $PATH | grep xcat
+    /opt/xcat/bin:/opt/xcat/sbin:/opt/xcat/share/xcat/tools:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/hpcpeadmin/bin
+    $lsdef -t site -l
+
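+   At this point you can also confirm that the xCAT data directories really live on the NFS server (a simple sanity check; the mount points follow this example setup): ::
+
+    # each of these mount points should show 10.2.2.44 as the filesystem source
+    df -h /install /etc/xcat /root/.xcat /root/.ssh /home/hpcpeadmin
+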
+#. Stop the xcatd daemon and the related network services, and prevent them from starting on reboot: ::
+
+    # service xcatd stop
+    Stopping xCATd [ OK ]
+    # chkconfig --level 345 xcatd off
+    # service conserver stop
+    conserver not running, not stopping [PASSED]
+    # chkconfig --level 2345 conserver off
+    # service dhcpd stop
+    # chkconfig --level 2345 dhcpd off
+
+   Remove the virtual alias IP: ::
+
+    # ifconfig eth0:0 0.0.0.0 0.0.0.0
+
+Install second xCAT MN node rhmn2
+-------------------------------------
+
+The installation steps are exactly the same as in the section ``Install First xCAT MN rhmn1`` above, using the same virtual IP as rhmn1.
+
+SSH Setup Across nodes rhmn1 and rhmn2
+---------------------------------------------
+
+Set up ssh across the nodes rhmn1 and rhmn2, and make sure rhmn1 can ssh to rhmn2 without a password: ::
+
+    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
+    rsync -ave ssh /etc/ssh/ rhmn2:/etc/ssh/
+    rsync -ave ssh /root/.ssh/ rhmn2:/root/.ssh/
+
+``Note``: if the two nodes can ssh to each other using a password, that is enough to run the commands above.
+
+Install corosync and pacemaker on both rhmn2 and rhmn1
+-------------------------------------------------------------
+
+#. Download and install crmsh, pssh and python-pssh: ::
+
+    wget download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/x86_64/crmsh-2.1-1.1.x86_64.rpm
+    wget download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/x86_64/pssh-2.3.1-4.2.x86_64.rpm
+    wget download.opensuse.org/repositories/network:/ha-clustering:/Stable/RedHat_RHEL-6/x86_64/python-pssh-2.3.1-4.2.x86_64.rpm
+    rpm -ivh python-pssh-2.3.1-4.2.x86_64.rpm
+    rpm -ivh pssh-2.3.1-4.2.x86_64.rpm
+    yum install redhat-rpm-config
+    rpm -ivh crmsh-2.1-1.1.x86_64.rpm
+
+#. Set up the OS repositories that provide ``corosync`` and ``pacemaker``: ::
+
+    #cd /etc/yum.repos.d
+    #cat rhel-local.repo
+    [rhel-local]
+    name=HPCCloud configured local yum repository for rhels6.5/x86_64
+    baseurl=http://10.2.0.221/install/rhels6.5/x86_64
+    enabled=1
+    gpgcheck=0
+
+    [rhel-local1]
+    name=HPCCloud1 configured local yum repository for rhels6.5/x86_64
+    baseurl=http://10.2.0.221/install/rhels6.5/x86_64/HighAvailability
+    enabled=1
+    gpgcheck=0
+
+#. Install ``corosync`` and ``pacemaker``, then generate a security key:
+
+   Install ``corosync`` and ``pacemaker``: ::
+
+    yum install -y corosync pacemaker
+
+   Generate a security key for authentication across all nodes in the cluster. On one of the systems in the corosync cluster, enter: ::
+
+    corosync-keygen
+
+   It will look like the command is not doing anything. It is waiting for entropy data
+   to be written to ``/dev/random`` until it gets 1024 bits. You can speed that process
+   up by going to another console for the system and entering: ::
+
+    cd /tmp
+    wget http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.32.8.tar.bz2
+    tar xvfj linux-2.6.32.8.tar.bz2
+    find .
+
+   This should create enough I/O for the needed entropy.
+   Then you need to copy the resulting key file to all of your nodes and put it in ``/etc/corosync/``
+   with ``user=root``, ``group=root`` and mode 0400: ::
+
+    chmod 400 /etc/corosync/authkey
+    scp /etc/corosync/authkey rhmn2:/etc/corosync/
+
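+   You can verify that the key is in place on both nodes before continuing (a quick check; rhmn2 is the standby management node in this example): ::
+
+    ls -l /etc/corosync/authkey
+    ssh rhmn2 ls -l /etc/corosync/authkey
+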
+#. Edit ``corosync.conf``: ::
+
+    #cat /etc/corosync/corosync.conf
+    #Please read the corosync.conf.5 manual page
+    compatibility: whitetank
+    totem {
+            version: 2
+            secauth: off
+            threads: 0
+            interface {
+                    member {
+                            memberaddr: 10.2.2.233
+                    }
+                    member {
+                            memberaddr: 10.2.2.235
+                    }
+                    ringnumber: 0
+                    bindnetaddr: 10.2.2.0
+                    mcastport: 5405
+            }
+            transport: udpu
+    }
+    logging {
+            fileline: off
+            to_stderr: no
+            to_logfile: yes
+            to_syslog: yes
+            logfile: /var/log/cluster/corosync.log
+            debug: off
+            timestamp: on
+            logger_subsys {
+                    subsys: AMF
+                    debug: off
+            }
+    }
+    amf {
+            mode: disabled
+    }
+
+#. Configure ``pacemaker``: ::
+
+    #vi /etc/corosync/service.d/pcmk
+    service {
+            name: pacemaker
+            ver: 1
+    }
+
+#. Synchronize the configuration to rhmn2: ::
+
+    for f in /etc/corosync/corosync.conf /etc/corosync/service.d/pcmk; do scp $f rhmn2:$f; done
+
+#. Start ``corosync`` and ``pacemaker`` on both rhmn1 and rhmn2: ::
+
+    # /etc/init.d/corosync start
+    Starting Corosync Cluster Engine (corosync): [ OK ]
+    # /etc/init.d/pacemaker start
+    Starting Pacemaker Cluster Manager[ OK ]
+
+#. Verify the configuration and disable STONITH: ::
+
+    # crm_verify -L -V
+    error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
+    error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
+    error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
+    Errors found during check: config not valid
+    # crm configure property stonith-enabled=false
+
+Customize corosync/pacemaker configuration for xCAT
+------------------------------------------------------
+
+Please be aware that you need to apply ALL the configuration at once. You cannot pick and choose which pieces to put in, and you cannot put some in now and some in later. Do not execute individual commands; use ``crm configure edit`` instead.
+
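+Both management nodes should be in standby state while the configuration is loaded, so that no resources start while you edit. If they are still online, you can put them in standby first (the same commands are used again in the failover test below): ::
+
+    crm node standby rhmn1
+    crm node standby rhmn2
+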
+Check that both rhmn1 and rhmn2 are in standby state now: ::
+
+    rhmn1 ~]# crm status
+    Last updated: Wed Aug 13 22:57:58 2014
+    Last change: Wed Aug 13 22:40:31 2014 via cibadmin on rhmn1
+    Stack: classic openais (with plugin)
+    Current DC: rhmn2 - partition with quorum
+    Version: 1.1.8-7.el6-394e906
+    2 Nodes configured, 2 expected votes
+    14 Resources configured.
+    Node rhmn1: standby
+    Node rhmn2: standby
+
+Execute ``crm configure edit`` to add all of the configuration at once: ::
+
+    rhmn1 ~]# crm configure edit
+    node rhmn1
+    node rhmn2 \
+            attributes standby=on
+    primitive ETCXCATFS Filesystem \
+            params device="10.2.2.44:/etc/xcat" fstype=nfs options=v3 directory="/etc/xcat" \
+            op monitor interval=20 timeout=40
+    primitive HPCADMIN Filesystem \
+            params device="10.2.2.44:/disk1/hpcpeadmin" fstype=nfs options=v3 directory="/home/hpcpeadmin" \
+            op monitor interval=20 timeout=40
+    primitive ROOTSSHFS Filesystem \
+            params device="10.2.2.44:/root/.ssh" fstype=nfs options=v3 directory="/root/.ssh" \
+            op monitor interval=20 timeout=40
+    primitive INSTALLFS Filesystem \
+            params device="10.2.2.44:/disk1/install" fstype=nfs options=v3 directory="/install" \
+            op monitor interval=20 timeout=40
+    primitive NFS_xCAT lsb:nfs \
+            op start interval=0 timeout=120s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=41s
+    primitive NFSlock_xCAT lsb:nfslock \
+            op start interval=0 timeout=120s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=43s
+    primitive ROOTXCATFS Filesystem \
+            params device="10.2.2.44:/root/.xcat" fstype=nfs options=v3 directory="/root/.xcat" \
+            op monitor interval=20 timeout=40
+    primitive apache_xCAT apache \
+            op start interval=0 timeout=600s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=57s timeout=120s \
+            params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://localhost:80/icons/README.html" testregex="" \
+            meta target-role=Started
+    primitive dummy Dummy \
+            op start interval=0 timeout=600s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=57s timeout=120s \
+            meta target-role=Started
+    primitive named lsb:named \
+            op start interval=0 timeout=120s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=37s
+    primitive dhcpd lsb:dhcpd \
+            op start interval="0" timeout="120s" \
+            op stop interval="0" timeout="120s" \
+            op monitor interval="37s"
+    primitive xCAT lsb:xcatd \
+            op start interval=0 timeout=120s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=42s \
+            meta target-role=Started
+    primitive xCAT_conserver lsb:conserver \
+            op start interval=0 timeout=120s \
+            op stop interval=0 timeout=120s \
+            op monitor interval=53
+    primitive xCATmnVIP IPaddr2 \
+            params ip=10.2.2.250 cidr_netmask=8 \
+            op monitor interval=30s
+    group XCAT_GROUP INSTALLFS ETCXCATFS ROOTXCATFS HPCADMIN ROOTSSHFS \
+            meta resource-stickiness=100 failure-timeout=60 migration-threshold=3 target-role=Started
+    clone clone_named named \
+            meta clone-max=2 clone-node-max=1 notify=false
+    colocation colo1 inf: NFS_xCAT XCAT_GROUP
+    colocation colo2 inf: NFSlock_xCAT XCAT_GROUP
+    colocation colo4 inf: apache_xCAT XCAT_GROUP
+    colocation colo7 inf: xCAT_conserver XCAT_GROUP
+    colocation dummy_colocation inf: dummy xCAT
+    colocation xCAT_colocation inf: xCAT XCAT_GROUP
+    colocation xCAT_makedns_colocation inf: xCAT xCAT_makedns
+    order Most_aftergrp inf: XCAT_GROUP ( NFS_xCAT NFSlock_xCAT apache_xCAT xCAT_conserver )
+    order Most_afterip inf: xCATmnVIP ( apache_xCAT xCAT_conserver )
+    order clone_named_after_ip_xCAT inf: xCATmnVIP clone_named
+    order dummy_order0 inf: NFS_xCAT dummy
+    order dummy_order1 inf: xCAT dummy
+    order dummy_order2 inf: NFSlock_xCAT dummy
+    order dummy_order3 inf: clone_named dummy
+    order dummy_order4 inf: apache_xCAT dummy
+    order dummy_order7 inf: xCAT_conserver dummy
+    order dummy_order8 inf: xCAT_makedns dummy
+    order xcat_makedns inf: xCAT xCAT_makedns
+    order dummy_order5 inf: dhcpd dummy
+    property cib-bootstrap-options: \
+            dc-version=1.1.8-7.el6-394e906 \
+            cluster-infrastructure="classic openais (with plugin)" \
+            expected-quorum-votes=2 \
+            stonith-enabled=false \
+            last-lrm-refresh=1406859140
+    \#vim:set syntax=pcmk
+
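+After saving the configuration and leaving the editor, it is worth re-checking it before moving on. ``crm_verify`` was already used above; ``crm configure show`` simply prints the configuration that is now loaded: ::
+
+    crm_verify -L -V
+    crm configure show
+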
+Verify auto failover
+-------------------------
+
+#. Bring rhmn1 online
+
+   Currently, the status of both rhmn2 and rhmn1 is standby; bring rhmn1 online: ::
+
+    rhmn2 ~]# crm node online rhmn1
+    rhmn2 /]# crm status
+    Last updated: Mon Aug 4 23:16:44 2014
+    Last change: Mon Aug 4 23:13:09 2014 via crmd on rhmn2
+    Stack: classic openais (with plugin)
+    Current DC: rhmn1 - partition with quorum
+    Version: 1.1.8-7.el6-394e906
+    2 Nodes configured, 2 expected votes
+    12 Resources configured.
+    Node rhmn2: standby
+    Online: [ rhmn1 ]
+    Resource Group: XCAT_GROUP
+        xCATmnVIP (ocf::heartbeat:IPaddr2): Started rhmn1
+        INSTALLFS (ocf::heartbeat:Filesystem): Started rhmn1
+        ETCXCATFS (ocf::heartbeat:Filesystem): Started rhmn1
+        ROOTXCATFS (ocf::heartbeat:Filesystem): Started rhmn1
+        NFS_xCAT (lsb:nfs): Started rhmn1
+        NFSlock_xCAT (lsb:nfslock): Started rhmn1
+        apache_xCAT (ocf::heartbeat:apache): Started rhmn1
+        xCAT (lsb:xcatd): Started rhmn1
+        xCAT_conserver (lsb:conserver): Started rhmn1
+    dummy (ocf::heartbeat:Dummy): Started rhmn1
+    Clone Set: clone_named [named]
+        Started: [ rhmn1 ]
+        Stopped: [ named:1 ]
+
+#. Verify that xCAT on rhmn2 is not working while it is running on rhmn1: ::
+
+    rhmn2 /]# lsdef -t site -l
+    Unable to open socket connection to xcatd daemon on localhost:3001.
+    Verify that the xcatd daemon is running and that your SSL setup is correct.
+    Connection failure: IO::Socket::INET: connect: Connection refused at /opt/xcat/lib/perl/xCAT/Client.pm line 217.
+
+    rhmn2 /]# ssh rhmn1 "lsxcatd -v"
+    Version 2.8.4 (git commit 7306ca8abf1c6d8c68d3fc3addc901c1bcb6b7b3, built Mon Apr 21 20:48:59 EDT 2014)
+
+#. Put rhmn1 in standby and bring rhmn2 online; xCAT will then run on rhmn2: ::
+
+    rhmn2 /]# crm node online rhmn2
+    rhmn2 /]# crm node standby rhmn1
+    rhmn2 /]# crm status
+    Last updated: Mon Aug 4 23:19:33 2014
+    Last change: Mon Aug 4 23:19:40 2014 via crm_attribute on rhmn2
+    Stack: classic openais (with plugin)
+    Current DC: rhmn1 - partition with quorum
+    Version: 1.1.8-7.el6-394e906
+    2 Nodes configured, 2 expected votes
+    12 Resources configured.
+
+    Node rhmn1: standby
+    Online: [ rhmn2 ]
+
+    Resource Group: XCAT_GROUP
+        xCATmnVIP (ocf::heartbeat:IPaddr2): Started rhmn2
+        INSTALLFS (ocf::heartbeat:Filesystem): Started rhmn2
+        ETCXCATFS (ocf::heartbeat:Filesystem): Started rhmn2
+        ROOTXCATFS (ocf::heartbeat:Filesystem): Started rhmn2
+        NFSlock_xCAT (lsb:nfslock): Started rhmn2
+        xCAT (lsb:xcatd): Started rhmn2
+    Clone Set: clone_named [named]
+        Started: [ rhmn2 ]
+        Stopped: [ named:1 ]
+
+    rhmn2 /]# lsxcatd -v
+    Version 2.8.4 (git commit 7306ca8abf1c6d8c68d3fc3addc901c1bcb6b7b3, built Mon Apr 21 20:48:59 EDT 2014)
+
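+As a final check, you can confirm that the virtual IP follows the resource group, so that nodes keep reaching the management node at the same address after a failover. This is only a quick sanity test; it assumes the virtual IP 10.2.2.250 is reachable from where you run it: ::
+
+    ping -c 2 10.2.2.250
+    ssh 10.2.2.250 "lsxcatd -v"
+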