diff --git a/README.rst b/README.rst index a8377c82e..31b6c2f5c 100644 --- a/README.rst +++ b/README.rst @@ -6,9 +6,20 @@ xCAT is a toolkit for the deployment and administration of clusters. Documentation ------------- -xCAT documentation is available at: http://xcat-docs.readthedocs.io/en/latest/ +Latest xCAT documentation is available at: http://xcat-docs.readthedocs.io/en/latest/ + +`document for xCAT 2.13.11 `_ + +`document for xCAT 2.13.10 `_ + +`document for xCAT 2.13.9 `_ + +`document for xCAT 2.13 `_ + +`document for xCAT 2.12 `_ + +`document for xCAT 2.11 `_ -|docs_latest| |docs_2137| |docs_2136| |docs_2135| |docs_2134| |docs_2133| |docs_2132| |docs_2131| |docs_2130| |docs_212| Open Source License ------------------- @@ -21,58 +32,3 @@ Developers Developers and prospective contributors are encouraged to read the `Developers Guide `_ In particular the `GitHub `_ related subsection. - -.. |docs_2137| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.7 - :alt: 2.13.7 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.7/ - -.. |docs_2136| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.6 - :alt: 2.13.6 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.6/ - -.. |docs_2135| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.5 - :alt: 2.13.5 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.5/ - -.. |docs_2134| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.4 - :alt: 2.13.4 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.4/ - -.. |docs_2133| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.3 - :alt: 2.13.3 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.3/ - -.. |docs_2132| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.2 - :alt: 2.13.2 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.2/ - -.. |docs_2131| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.1 - :alt: 2.13.1 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.1/ - -.. |docs_2130| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.13.0 - :alt: 2.13.0 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.13.0/ - -.. |docs_212| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.12 - :alt: 2.12 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.12/ - -.. |docs_211| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=2.11 - :alt: 2.11 documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/2.11/ - -.. |docs_latest| image:: https://readthedocs.org/projects/xcat-docs/badge/?version=latest - :alt: Latest documentation status - :scale: 100% - :target: http://xcat-docs.readthedocs.io/en/latest/ diff --git a/Version b/Version index c38dd67f3..3c5ed7828 100644 --- a/Version +++ b/Version @@ -1 +1 @@ -2.13.8 +2.13.11 diff --git a/build-python-deb b/build-python-deb new file mode 100755 index 000000000..50148e301 --- /dev/null +++ b/build-python-deb @@ -0,0 +1,91 @@ +#!/bin/bash +# +# Author: Yuan Bai (bybai@cn.ibm.com) +# +# +printusage() +{ + echo "Usage : build-python-deb xcat-openbmc-py" +} +# For the purpose of getting the distribution name +if [[ ! 
-f /etc/lsb-release ]]; then + echo "ERROR: Could not find /etc/lsb-release, is this script executed on a Ubuntu machine?" + exit 1 +fi +. /etc/lsb-release +# Check the necessary commands before starting the build +for cmd in dch dpkg-buildpackage +do + if ! type "$cmd" >/dev/null 2>&1 + then + echo "ERROR: Required command, $package, not found." >&2 + exit 1 + fi +done + +# Supported distributions +pkg_name=$1 + +if [ "$pkg_name" != "xcat-openbmc-py" ]; then + printusage + exit 1 +fi + +# Find where this script is located to set some build variables +old_pwd=`pwd` +cd `dirname $0` +curdir=`pwd` + +if [ -z "$REL" ]; then + t=${curdir%/src/xcat-core} + REL=`basename $t` +fi + +if [ "$PROMOTE" != 1 ]; then + ver=`cat Version` + + echo "###############################" + echo "# Building xcat-openbmc-py package #" + echo "###############################" + + #the package type: local | snap | alpha + #the build introduce string + build_string="Snap_Build" + xcat_release="snap$(date '+%Y%m%d%H%M')" + pkg_version="${ver}-${xcat_release}" + packages="xCAT-openbmc-py" + for file in $packages + do + file_low="${file,,}" + target_archs="all" + for target_arch in $target_archs + do + cd $file + CURDIR=$(pwd) + dch -v $pkg_version -b -c debian/changelog $build_string + if [ "$target_arch" = "all" ]; then + CURDIR=$(pwd) + cp ${CURDIR}/debian/control ${CURDIR}/debian/control.save.998 + sed -i -e "s#>= 2.13-snap000000000000#= ${pkg_version}#g" ${CURDIR}/debian/control + dpkg-buildpackage -rfakeroot -uc -us + mv ${CURDIR}/debian/control.save.998 ${CURDIR}/debian/control + dh_testdir + dh_testroot + dh_clean -d + fi + rc=$? + if [ $rc -gt 0 ]; then + echo "Error: $file build package failed exit code $rc" + exit $rc + fi + rm -f debian/files + rm -f debian/xcat-openbmc-py.debhelper.log + rm -f debian/xcat-openbmc-py.substvars + sed -i -e "s/* Snap_Build//g" debian/changelog + cd - + rm -f ${file_low}_*.tar.gz + rm -f ${file_low}_*.changes + rm -f ${file_low}_*.dsc + done + done +fi diff --git a/build-ubunturepo b/build-ubunturepo index 729067680..97ce0a1c1 100755 --- a/build-ubunturepo +++ b/build-ubunturepo @@ -183,8 +183,8 @@ then short_short_ver=`cat Version|cut -d. 
-f 1` build_time=`date` build_machine=`hostname` - commit_id=`git rev-parse --short HEAD` commit_id_long=`git rev-parse HEAD` + commit_id="${commit_id_long:0:7}" package_dir_name=debs$REL #TODO: define the core path and tarball name @@ -213,7 +213,7 @@ then for file in $packages do file_low=`echo $file | tr '[A-Z]' '[a-z]'` - if [ "$file" = "xCAT" -o "$file" = "xCAT-genesis-scripts" ]; then + if [ "$file" = "xCAT" -o "$file" = "xCAT-genesis-scripts" -o "$file" = "xCATsn" ]; then target_archs="amd64 ppc64el" else target_archs="all" @@ -250,7 +250,7 @@ then # shipping bmcsetup and getipmi scripts as part of postscripts files=("bmcsetup" "getipmi") for f in "${files[@]}"; do - cp ${CURDIR}/../xCAT-genesis-scripts/bin/$f ${CURDIR}/postscripts/$f + cp ${CURDIR}/../xCAT-genesis-scripts/usr/bin/$f ${CURDIR}/postscripts/$f sed -i "s/xcat.genesis.$f/$f/g" ${CURDIR}/postscripts/$f done fi diff --git a/buildcore.sh b/buildcore.sh index 5d0bdf593..a7ed2f396 100755 --- a/buildcore.sh +++ b/buildcore.sh @@ -71,6 +71,7 @@ fi # These are the rpms that should be built for each kind of xcat build ALLBUILD="perl-xCAT xCAT-client xCAT-server xCAT-test xCAT-buildkit xCAT xCATsn xCAT-genesis-scripts xCAT-SoftLayer xCAT-vlan xCAT-confluent xCAT-probe xCAT-csm" +ALLBUILD="perl-xCAT xCAT-client xCAT-server xCAT-test xCAT-buildkit xCAT xCATsn xCAT-genesis-scripts xCAT-SoftLayer xCAT-vlan xCAT-confluent xCAT-probe xCAT-csm xCAT-openbmc-py" LENOVOBUILD="perl-xCAT xCAT-client xCAT-server xCAT xCATsn xCAT-genesis-scripts xCAT-vlan xCAT-probe" ZVMBUILD="perl-xCAT xCAT-server xCAT-UI" ZVMLINK="xCAT-client xCAT xCATsn" @@ -192,8 +193,8 @@ function setversionvars { SHORTSHORTVER=`echo $VER|cut -d. -f 1` BUILD_TIME=`date` BUILD_MACHINE=`hostname` - COMMIT_ID=`git rev-parse --short HEAD` COMMIT_ID_LONG=`git rev-parse HEAD` + COMMIT_ID="${COMMIT_ID_LONG:0:7}" XCAT_RELEASE="snap$(date '+%Y%m%d%H%M')" echo "$XCAT_RELEASE" >Release } @@ -327,7 +328,7 @@ if [ "$OSNAME" = "AIX" ]; then fi # Build the rest of the noarch rpms -for rpmname in xCAT-client xCAT-server xCAT-IBMhpc xCAT-rmc xCAT-UI xCAT-test xCAT-buildkit xCAT-SoftLayer xCAT-vlan xCAT-confluent xCAT-probe xCAT-csm; do +for rpmname in xCAT-client xCAT-server xCAT-IBMhpc xCAT-rmc xCAT-UI xCAT-test xCAT-buildkit xCAT-SoftLayer xCAT-vlan xCAT-confluent xCAT-probe xCAT-csm xCAT-openbmc-py; do if [[ " $EMBEDBUILD " != *\ $rpmname\ * ]]; then continue; fi if [ "$OSNAME" = "AIX" -a "$rpmname" = "xCAT-buildkit" ]; then continue; fi # do not build xCAT-buildkit on aix if [ "$OSNAME" = "AIX" -a "$rpmname" = "xCAT-SoftLayer" ]; then continue; fi # do not build xCAT-softlayer on aix diff --git a/docs/source/QA/index.rst b/docs/source/QA/index.rst new file mode 100644 index 000000000..2c8c72c0c --- /dev/null +++ b/docs/source/QA/index.rst @@ -0,0 +1,8 @@ +Questions & Answers +=================== + +.. toctree:: + :maxdepth: 2 + + + makehosts.rst \ No newline at end of file diff --git a/docs/source/QA/makehosts.rst b/docs/source/QA/makehosts.rst new file mode 100644 index 000000000..ba9755033 --- /dev/null +++ b/docs/source/QA/makehosts.rst @@ -0,0 +1,133 @@ +DNS, Hostname, Alias +==================== + +Q: When there are multiple NICs, how to generate ``/etc/hosts`` records? +------------------------------------------------------------------------ + +When there are multiple NICs, and you want to use ``confignetwork`` to configure these NICs, suggest to use ``hosts`` table to configure the installation NIC (``installnic``) and to use ``nics`` table to configure secondary NICs. 
Refer to the following example to generate ``/etc/hosts`` records. + +**Best practice example**: + + * There are 2 networks in different domains: ``mgtnetwork`` and ``pubnetwork`` + * ``mgtnetwork`` is xCAT management network + * There are 2 adapters in system node1: ``eth0`` and ``eth1`` + * Add installnic ``eth0`` ``10.5.106.101`` record in ``/etc/hosts``, its alias is ``mgtnic`` + * hostnames ``node1-pub`` and ``node1.public.com`` are for nic ``eth1``, IP is ``192.168.30.101`` + +**Steps**: + + #. Add networks entry in ``networks`` table: :: + + chdef -t network mgtnetwork net=10.0.0.0 mask=255.0.0.0 domain=cluster.com + chdef -t network pubnetwork net=192.168.30.0 mask=255.255.255.0 domain=public.com + + #. Create ``node1`` with installnic IP ``10.5.106.101``, its alias is ``mgtnic``: :: + + chdef node1 ip=10.5.106.101 hostnames=mgtnic groups=all + + #. Configure ``eth1`` in ``nics`` table: :: + + chdef node1 nicips.eth1=192.168.30.101 nichostnamesuffixes.eth1=-pub nicaliases.eth1=node1.public.com nictypes.eth1=Ethernet nicnetworks.eth1=pubnetwork + + #. Check ``node1`` definition: :: + + lsdef node1 + Object name: node1 + groups=all + ip=10.5.106.101 + hostnames=mgtnic + nicaliases.eth1=node1.public.com + nichostnamesuffixes.eth1=-pub + nicips.eth1=192.168.30.101 + nicnetworks.eth1=pubnetwork + nictypes.eth1=Ethernet + postbootscripts=otherpkgs + postscripts=syslog,remoteshell,syncfiles + + #. Execute ``makehosts -n`` to generate ``/etc/hosts`` records: :: + + makehosts -n + + #. Check results in ``/etc/hosts``: :: + + 10.5.106.101 node1 node1.cluster.com mgtnic + 192.168.30.101 node1-pub node1.public.com + + #. Edit ``/etc/resolv.conf``, xCAT management node IP like ``10.5.106.2`` is nameserver: :: + + search cluster.com public.com + nameserver 10.5.106.2 + + #. Execute ``makedns -n`` to configure DNS + + +Q: How to configure aliases? +---------------------------- + +There are 3 methods to configure aliases: + +#. Use ``hostnames`` in ``hosts`` table to configure aliases for the installnic. +#. If you want to use script ``confignetwork`` to configure secondary NICs, suggest to use ``aliases`` in ``nics`` table to configure aliases. Refer to :doc:`Configure Aliases <../guides/admin-guides/manage_clusters/common/deployment/network/cfg_network_aliases>` +#. If you want to generate aliases records in ``/etc/hosts`` for secondary NICs and you don't want to use the script ``confignetwork`` to configure these NICs, suggest to use ``otherinterfaces`` in ``hosts`` table to configure aliases. Refer to following example: + + * If you want to add ``node1-hd`` ``20.1.1.1`` in ``hosts`` table, and don't use ``confignetwork`` to configure it, you can add ``otherinterfaces`` like this: :: + + chdef node1 otherinterfaces="node1-hd:20.1.1.1" + + * After executing ``makehosts -n``, you can get records in ``/etc/hosts`` like following: :: + + 20.1.1.1 node1-hd + +**Note**: If suffixes or aliases for the same IP are configured in both ``hosts`` table and ``nics`` table, will cause conflicts. ``makehosts`` will use values from ``nics`` table. The values from ``nics`` table will over-write that from ``hosts`` table to create ``/etc/hosts`` records. + +Q: How to handle the same short hostname in different domains? +-------------------------------------------------------------- + +You can follow the best practice example. 
+ +**Best practice example**: + + * There are 2 networks in different domains: ``mgtnetwork`` and ``pubnetwork`` + * ``mgtnetwork`` is xCAT management network + * Generate 2 records with the same hostname in ``/etc/hosts``, like: :: + + 10.5.106.101 node1.cluster.com + 192.168.20.101 node1.public.com + + * Nameserver is xCAT management node IP + +**Steps**: + + #. Add networks entry in ``networks`` table: :: + + chdef -t network mgtnetwork net=10.0.0.0 mask=255.0.0.0 domain=cluster.com + chdef -t network pubnetwork net=192.168.30.0 mask=255.255.255.0 domain=public.com + + #. Create ``node1`` with ``ip=10.5.106.101``, xCAT can manage and install this node: :: + + chdef node1 ip=10.5.106.101 groups=all + + #. Create ``node1-pub`` with ``ip=192.168.30.101``, this node is only used to generate ``/etc/hosts`` records for public network, can use ``_unmanaged`` group name to label it: :: + + chdef node1-pub ip=192.168.30.101 hostnames=node1.public.com groups=_unmanaged + + #. Execute ``makehosts -n`` to generate ``/etc/hosts`` records: :: + + makehosts -n + + #. Check results in ``/etc/hosts``: :: + + 10.5.106.101 node1 node1.cluster.com + 192.168.30.101 node1-pub node1.public.com + + #. Edit ``/etc/resolv.conf``, for example, xCAT management node IP is 10.5.106.2 : :: + + search cluster.com public.com + nameserver 10.5.106.2 + + #. Execute ``makedns -n`` to configure DNS + +Q: When to use ``hosts`` table and ``nics`` table? +-------------------------------------------------- + +``hosts`` table is used to store IP addresses and hostnames of nodes. ``makehosts`` use these data to create ``/etc/hosts`` records. ``nics`` table is used to stores secondary NICs details. Some scripts like ``confignetwork`` use data from ``nics`` table to configure secondary NICs. ``makehosts`` also use these data to create ``/etc/hosts`` records for each NIC. diff --git a/docs/source/QA/makehosts_qa.rst b/docs/source/QA/makehosts_qa.rst new file mode 100644 index 000000000..f0c425b87 --- /dev/null +++ b/docs/source/QA/makehosts_qa.rst @@ -0,0 +1,132 @@ +DNS,hostname and alias Q/A list +------------------------------- + +Q: When there are multiple NICs, how to generate ``/etc/hosts`` records? +```````````````````````````````````````````````````````````````````````` + +When there are multiple NICs, and you want to use ``confignetwork`` to configure these NICs, suggest to use ``hosts`` table to configure installnic and use ``nics`` table to configure secondary NICs. You can refer to the following best practice example to generate ``/etc/hosts`` records. + +**Best practice example**: + + * There are 2 networks in different domains: ``mgtnetwork`` and ``pubnetwork`` + * ``mgtnetwork`` is xCAT management network + * There are 2 adapters in system node1: ``eth0`` and ``eth1`` + * Add installnic ``eth0`` ``10.5.106.101`` record in ``/etc/hosts``, its alias is ``mgtnic`` + * hostnames ``node1-pub`` and ``node1.public.com`` are for nic ``eth1``, ip is ``192.168.30.101`` + +**Steps**: + + #. Add networks entry in ``networks`` table: :: + + chdef -t network mgtnetwork net=10.0.0.0 mask=255.0.0.0 domain=cluster.com + chdef -t network pubnetwork net=192.168.30.0 mask=255.255.255.0 domain=public.com + + #. Create ``node1`` with installnic ip ``10.5.106.101``, its alias is ``mgtnic``: :: + + chdef node1 ip=10.5.106.101 hostnames=mgtnic groups=all + + #. 
Configure ``eth1`` in ``nics`` table: :: + + chdef node1 nicips.eth1=192.168.30.101 nichostnamesuffixes.eth1=-pub nicaliases.eth1=node1.public.com nictypes.eth1=Ethernet nicnetworks.eth1=pubnetwork + + #. Check ``node1`` definition: :: + + lsdef node1 + Object name: node1 + groups=all + ip=10.5.106.101 + hostnames=mgtnic + nicaliases.eth1=node1.public.com + nichostnamesuffixes.eth1=-pub + nicips.eth1=192.168.30.101 + nicnetworks.eth1=pubnetwork + nictypes.eth1=Ethernet + postbootscripts=otherpkgs + postscripts=syslog,remoteshell,syncfiles + + #. Execute ``makehosts -n`` to generate ``/etc/hosts`` records: :: + + makehosts -n + + #. Check results in ``/etc/hosts``: :: + + 10.5.106.101 node1 node1.cluster.com mgtnic + 192.168.30.101 node1-pub node1.public.com + + #. Edit ``/etc/resolv.conf``, xCAT management node ip like ``10.5.106.2`` is nameserver: :: + + search cluster.com public.com + nameserver 10.5.106.2 + + #. Execute ``makedns -n`` to configure DNS + +Q: How to configure aliases? +```````````````````````````` + +There are 3 methods to configure aliases: + +#. Use ``hostnames`` in ``hosts`` table to configure aliases for the installnic. +#. If you want to use script ``confignetwork`` to configure secondary NICs, suggest to use ``aliases`` in ``nics`` table to configure aliases, you can refer to :doc:`Configure Aliases <../guides/admin-guides/manage_clusters/common/deployment/network/cfg_network_aliases>` +#. If you want to generate aliases records in ``/etc/hosts`` for secondary NICs, and don't want to use script ``confignetwork`` to configure these NICs, suggest to use ``otherinterfaces`` in ``hosts`` table to configure aliases. You can refer to following example: + + * If you want to add ``node1-hd`` ``20.1.1.1`` in ``hosts`` table, and don't use ``confignetwork`` to configure it, you can add ``otherinterfaces`` like this: :: + + chdef node1 otherinterfaces="node1-hd:20.1.1.1" + + * After executing ``makehosts -n``, you can get records in ``/etc/hosts`` like following: :: + + 20.1.1.1 node1-hd + +**Note**: If suffixes or aliases for the same IP are configured in both ``hosts`` table and ``nics`` table, will cause conflicts. ``makehosts`` will use values from ``nics`` table. The values from ``nics`` table will over-write that from ``hosts`` table to create ``/etc/hosts`` records. + +Q: How to handle the same short hostname in different domains? +`````````````````````````````````````````````````````````````` + +You can follow the best practice example. + +**Best practice example**: + + * There are 2 networks in different domains: ``mgtnetwork`` and ``pubnetwork`` + * ``mgtnetwork`` is xCAT management network + * Generate 2 records with the same hostname in ``/etc/hosts``, like: :: + + 10.5.106.101 node1.cluster.com + 192.168.20.101 node1.public.com + + * Nameserver is xCAT management node IP + +**Steps**: + + #. Add networks entry in ``networks`` table: :: + + chdef -t network mgtnetwork net=10.0.0.0 mask=255.0.0.0 domain=cluster.com + chdef -t network pubnetwork net=192.168.30.0 mask=255.255.255.0 domain=public.com + + #. Create ``node1`` with ``ip=10.5.106.101``, xCAT can manage and install this node: :: + + chdef node1 ip=10.5.106.101 groups=all + + #. Create ``node1-pub`` with ``ip=192.168.30.101``, this node is only used to generate ``/etc/hosts`` records for public network, can use ``_unmanaged`` group name to label it: :: + + chdef node1-pub ip=192.168.30.101 hostnames=node1.public.com groups=_unmanaged + + #. 
Execute ``makehosts -n`` to generate ``/etc/hosts`` records: :: + + makehosts -n + + #. Check results in ``/etc/hosts``: :: + + 10.5.106.101 node1 node1.cluster.com + 192.168.30.101 node1-pub node1.public.com + + #. Edit ``/etc/resolv.conf``, for example, xCAT management node IP is 10.5.106.2 : :: + + search cluster.com public.com + nameserver 10.5.106.2 + + #. Execute ``makedns -n`` to configure DNS + +Q: When to use ``hosts`` table and ``nics`` table? +`````````````````````````````````````````````````` + +``hosts`` table is used to store IP addresses and hostnames of nodes. ``makehosts`` use these data to create ``/etc/hosts`` records. ``nics`` table is used to stores secondary NICs details. Some scripts like ``confignetwork`` use data from ``nics`` table to configure secondary NICs. ``makehosts`` also use these data to create ``/etc/hosts`` records for each NIC. diff --git a/docs/source/advanced/cluster_maintenance/compute_node/changing_hostname_ip.rst b/docs/source/advanced/cluster_maintenance/compute_node/changing_hostname_ip.rst index 10b7a69c1..7dac634fd 100644 --- a/docs/source/advanced/cluster_maintenance/compute_node/changing_hostname_ip.rst +++ b/docs/source/advanced/cluster_maintenance/compute_node/changing_hostname_ip.rst @@ -18,9 +18,9 @@ Remove Old Provision Environment makedhcp -d -#. Remove the nodes from the conserver configuration :: +#. Remove the nodes from the goconserver configuration :: - makeconservercf -d + makegocons -d Change Definition ----------------- @@ -76,6 +76,6 @@ Update The Provision Environment makedhcp -a -#. Configure the new names in conserver :: +#. Configure the new names in goconserver :: - makeconservercf + makegocons diff --git a/docs/source/advanced/cluster_maintenance/mgmt_node/changing_hostname_ip.rst b/docs/source/advanced/cluster_maintenance/mgmt_node/changing_hostname_ip.rst index 72282f3fb..5dbccfa64 100644 --- a/docs/source/advanced/cluster_maintenance/mgmt_node/changing_hostname_ip.rst +++ b/docs/source/advanced/cluster_maintenance/mgmt_node/changing_hostname_ip.rst @@ -190,9 +190,9 @@ Then update the following in xCAT: "1.4","new_MN_name",,,,,,"trusted",,`` -* Setup up conserver with new credentials :: +* Setup up goconserver with new credentials :: - makeconservercf + makegocons External DNS Server Changed --------------------------- @@ -262,9 +262,9 @@ If it exists, then use the return name and do the following: makedhcp -a - - Add the MN to conserver :: + - Add the MN to goconserver :: - makeconservercf + makegocons Update the genesis packages --------------------------- diff --git a/docs/source/advanced/goconserver/configuration.rst b/docs/source/advanced/goconserver/configuration.rst new file mode 100644 index 000000000..5f6a50d61 --- /dev/null +++ b/docs/source/advanced/goconserver/configuration.rst @@ -0,0 +1,119 @@ +Configuration +============= + +Location +-------- + +The configuration file for ``goconserver`` is located at ``/etc/goconserver/server.conf``. +When the configuration is changed, reload using: ``systemctl restart goconserver.service``. +An example for the configuration could be found from +`Example Conf `_. + +Tag For xCAT +------------ + +xCAT generates a configuration file that includes a identifier on the first +line. For example: :: + + #generated by xcat Version 2.13.10 (git commit 7fcd37ffb7cec37c021ab47d4baec151af547ac0, built Thu Jan 25 07:15:36 EST 2018) + +``makegocons`` checks for this token and will not make changes to the +configuration file if it exists. 
This gives the user the ability to customize +the configuration based on their specific site configuration. + + +Multiple Output Plugins +----------------------- + +``goconserver`` support console redirection to multiple targets with ``file``, +``tcp`` and ``udp`` logger plugins. The entry could be found like below: :: + + console: + # the console session port for client(congo) to connect. + port: 12430 + + logger: + # for file logger + file: + # multiple file loggers could be specified + # valid fields: name, logdir + - name: default + logdir: /var/log/goconserver/nodes/ + - name: xCAT + logdir: /var/log/consoles + + tcp: + - name: logstash + host: briggs01 + port: 9653 + ssl_key_file: /etc/xcat/cert/server-cred.pem + ssl_cert_file: /etc/xcat/cert/server-cred.pem + ssl_ca_cert_file: /etc/xcat/cert/ca.pem + + - name: rsyslog + host: sn02 + port: 9653 + + udp: + - name: filebeat + host: 192.168.1.5 + port: 512 + +With the configuration above, the console log files for each node would be written in +both ``/var/log/goconserver/nodes/.log`` and ``/var/log/consoles/.log``. +In addition, console log content will be redirected into remote services +specified in the tcp and udp sections. + +Verification +------------ + +To check if ``goconserver`` works correctly, see the log file ``/var/log/goconserver/server.log``. + + #. Check if TCP logger has been activated. + + When starting ``goconserver``, if the log message is like below, it + means the TCP configuration has been activated. :: + + {"file":"github.com/xcat2/goconserver/console/logger/tcp.go (122)","level":"info","msg":"Starting TCP publisher: logstash","time":"2018-03-02T21:15:35-05:00"} + {"file":"github.com/xcat2/goconserver/console/logger/tcp.go (122)","level":"info","msg":"Starting TCP publisher: sn02","time":"2018-03-02T21:15:35-05:00"} + + #. Debug when encounter error about TCP logger + + If the remote service is not started or the network is unreachable, the + log message would be like below. :: + + {"file":"github.com/xcat2/goconserver/console/logger/tcp.go (127)","level":"error","msg":"TCP publisher logstash: dial tcp 10.6.27.1:9653: getsockopt: connection refused","time":"2018-03-07T21:12:58-05:00"} + + Check the service status and the network configuration including the + ``selinux`` and ``iptable rules``. When the remote service works + correctly, TCP or UDP logger of ``goconserver`` would recover automatically. + +Reconnect Interval +------------------ + +If console node is defined with ``ondemand=false``, when the console connection +could not be established, ``goconserver`` would reconnect automatically. The +interval time could be specified at :: + + console: + # retry interval in second if console could not be connected. + reconnect_interval: 10 + +Performance Tuning +------------------ + +Adjust the worker numbers to leverage multi-core processor performance based on +the site configuration. :: + + global: + # the max cpu cores for workload + worker: 4 + +Debug +----- + +The log level for ``goconserver`` is defined in ``/etc/goconserver/server.conf`` :: + + global: + # debug, info, warn, error, fatal, panic + log_level: info diff --git a/docs/source/advanced/goconserver/index.rst b/docs/source/advanced/goconserver/index.rst new file mode 100644 index 000000000..39f3ff2cc --- /dev/null +++ b/docs/source/advanced/goconserver/index.rst @@ -0,0 +1,14 @@ +Go Conserver +============ + +``goconserver`` is a conserver replacement written in `Go `_ +programming language. 
For more information, see https://github.com/xcat2/goconserver/ + +.. toctree:: + + :maxdepth: 2 + + quickstart.rst + configuration.rst + rest.rst + diff --git a/docs/source/advanced/goconserver/quickstart.rst b/docs/source/advanced/goconserver/quickstart.rst new file mode 100644 index 000000000..96559668b --- /dev/null +++ b/docs/source/advanced/goconserver/quickstart.rst @@ -0,0 +1,20 @@ +Quickstart +========== + +To enable ``goconserver``, execute the following steps: + +#. Install the ``goconserver`` RPM: :: + + yum install goconserver + + +#. If upgrading xCAT running ``conserver``, stop it first: :: + + systemctl stop conserver.service + + +#. Start ``goconserver`` and create the console configuration files with a single command :: + + makegocons + + The new console logs will start logging to ``/var/log/consoles/.log`` \ No newline at end of file diff --git a/docs/source/advanced/goconserver/rest.rst b/docs/source/advanced/goconserver/rest.rst new file mode 100644 index 000000000..81c112786 --- /dev/null +++ b/docs/source/advanced/goconserver/rest.rst @@ -0,0 +1,5 @@ +REST API +======== + +``goconserver`` provides REST API interface to manage the node sessions. For +detail, see `REST `_. diff --git a/docs/source/advanced/index.rst b/docs/source/advanced/index.rst index 8c84eb6ac..c98d96d8c 100755 --- a/docs/source/advanced/index.rst +++ b/docs/source/advanced/index.rst @@ -8,6 +8,7 @@ Advanced Topics cluster_maintenance/index.rst migration/index.rst confluent/index.rst + goconserver/index.rst docker/index.rst domain_name_resolution/index.rst gpu/index.rst @@ -23,6 +24,6 @@ Advanced Topics restapi/index.rst security/index.rst softlayer/index.rst - switches/index.rst sysclone/index.rst zones/index.rst + xcat-inventory/index.rst diff --git a/docs/source/advanced/networks/index.rst b/docs/source/advanced/networks/index.rst index cd29cd51f..83fe4c232 100644 --- a/docs/source/advanced/networks/index.rst +++ b/docs/source/advanced/networks/index.rst @@ -1,5 +1,5 @@ -Networks -======== +Networking +========== .. toctree:: :maxdepth: 2 diff --git a/docs/source/advanced/networks/onie_switches/os_cumulus/install.rst b/docs/source/advanced/networks/onie_switches/os_cumulus/install.rst index 8915d02af..b10dd9a79 100644 --- a/docs/source/advanced/networks/onie_switches/os_cumulus/install.rst +++ b/docs/source/advanced/networks/onie_switches/os_cumulus/install.rst @@ -12,12 +12,16 @@ xCAT provides support for detecting and installing the Cumulus Linux OS into ONI The mac address of the switch management port is required for xCAT to configure the DHCP information and send over the OS to install on the switch. - **[small clusters]** If you know the mac address of the management port on the switch, create the pre-defined switch definition providing the mac address. :: + **Small Clusters** + + If you know the mac address of the management port on the switch, create the pre-defined switch definition providing the mac address. :: mkdef frame01sw1 --template onieswitch arch=armv71 \ ip=192.168.1.1 mac="aa:bb:cc:dd:ee:ff" - **[large clusters]** xCAT's :doc:`switchdiscover ` command can be used to discover the mac address and fill in the predefined switch definitions based on the switch/switchport mapping. + **Large Clusters** + + xCAT's :doc:`switchdiscover ` command can be used to discover the mac address and fill in the predefined switch definitions based on the switch/switchport mapping. #. 
Define all the switch objects providing the switch/switchport mapping: :: diff --git a/docs/source/advanced/networks/onie_switches/os_cumulus/manage.rst b/docs/source/advanced/networks/onie_switches/os_cumulus/manage.rst index 512c4b1f4..b2976f56a 100644 --- a/docs/source/advanced/networks/onie_switches/os_cumulus/manage.rst +++ b/docs/source/advanced/networks/onie_switches/os_cumulus/manage.rst @@ -4,7 +4,19 @@ Switch Management Switch Port and VLAN Configuration ---------------------------------- -xCAT expects the configuration for the front-panel ports to be located at ``/etc/network/interfaces.d/xCAT.intf`` on the switch. The ``configinterface`` postscript can download an interface configuration file from the management node. Place the configuration file in the directory ``/install/custom/sw_os/cumulus/interface/`` on the management node. It will first look for a file named the same as the switch's hostname, followed by the name of each group, followed by the word 'default'. If the postscript cannot find a configuration file on the management node, it will set all the ports on the switch to be part of VLAN 1. See the Cumulus Networks documentation for more information regarding advanced networking configuration. :: +xCAT places the front-panel port configuration in ``/etc/network/interfaces.d/xCAT.intf``. + +The ``configinterface`` postscript can be used to pull switch interface configuration from the xCAT Management Node (MN) to the switch. Place the switch specific confguration files in the following directory on the MN: ``/install/custom/sw_os/cumulus/interface/``. + +xCAT will look for files in the above directory in the following order: + + 1. file name that matches the switch hostname + 2. file name that matches the switch group name + 3. file name that has the word 'default' + + Note: If the postscript cannot find a configuration file on the MN, it will set all ports on the switch to be part of VLAN 1. + +Execute the script using the following command: :: updatenode -P configinterface @@ -12,9 +24,11 @@ xCAT expects the configuration for the front-panel ports to be located at ``/etc Re-install OS ------------- -There may be occasions where a re-install of the OS is required. Assuming the files are available on the xCAT management node, the following commands will invoke the install process: +There may be occasions where a re-install of the Cumulus Linux OS is required. The following commands can be used to invoke the install: -* **[use xCAT]** ``xdsh`` can be used to invoke the reinstall of the OS: :: +**Note:** Assumption that the Cumulus Linux files are on the xCAT MN in the correct place. + +* **Using xCAT**, ``xdsh`` can invoke the reinstall of the OS: :: # to clear out all the previous configuration, use the -k option (optional) xdsh "/usr/cumulus/bin/onie-select -k @@ -22,7 +36,7 @@ There may be occasions where a re-install of the OS is required. 
Assuming the # to invoke the reinstall of the OS xdsh "/usr/cumulus/bin/onie-select -i -f;reboot" -* **[manually]** Log into the Cumulus OS switch and run the following commands: :: +* **Manually**, log into the switch and run the following commands: :: sudo onie-select -i sudo reboot diff --git a/docs/source/advanced/networks/onie_switches/os_cumulus/upgrade.rst b/docs/source/advanced/networks/onie_switches/os_cumulus/upgrade.rst index da079129c..ff125998e 100644 --- a/docs/source/advanced/networks/onie_switches/os_cumulus/upgrade.rst +++ b/docs/source/advanced/networks/onie_switches/os_cumulus/upgrade.rst @@ -1,62 +1,79 @@ -Cumulus OS upgrade +Cumulus OS Upgrade ================== -The Cumulus OS on the ONIE switches can be upgraded in 2 ways: +The Cumulus OS on the ONIE switches can be upgraded using one of the following methods: -* Upgrade only the changed packages, using ``apt-get update`` and ``apt-get upgrade``. If the ONIE switches has internet access, this is the preferred method, otherwise, you need to build up a local cumulus mirror in the cluster. +Full Install +------------ - Since in a typical cluster setup, the switches usually do not have internet access, you can create a local mirror on the server which has internet access and can be reached from the switches, the steps are :: +Perform a full install from the ``.bin`` file of the new Cumulus Linux OS version, using ONIE. + +**Note:** Make sure you back up all your data and configuration files as the binary install will erase all previous configuration. - mkdir -p /install/mirror/cumulus - cd /install/mirror/cumulus - #the wget might take a long time, it will be better if you can set up - #a cron job to sync the local mirror with upstream - wget -m --no-parent http://repo3.cumulusnetworks.com/repo/ - - then compose a ``sources.list`` file on MN like this(take 172.21.253.37 as ip address of the local mirror server) :: +#. Place the binary image under ``/install`` on the xCAT MN node. - #cat /tmp/sources.list - deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3 cumulus upstream - deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3 cumulus upstream - - deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-security-updates cumulus upstream - deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-security-updates cumulus upstream - - deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-updates cumulus upstream - deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-updates cumulus upstream - - distribute the ``sources.list`` file to the switches to upgrade with ``xdcp``, take "switch1" as an example here :: - - xdcp switch1 /tmp/sources.list /etc/apt/sources.list - - then invoke ``apt-get update`` and ``apt-get install`` on the switches to start package upgrade, a reboot might be needed after upgrading :: - - xdsh switch1 'apt-get update && apt-get upgrade && reboot' - - check the `/etc/os-release` file to make sure the Cumulus OS has been upgraded :: - - cat /etc/os-release - - - -* Performe a binary (full image) install of the new version, using ONIE. If you expect to upgrade between major versions or if you have the binary image to upgrade to, this way is the recommended one. 
Make sure to backup your data and configuration files because binary install will erase all the configuration and data on the switch. - - The steps to perform a binary (full image) install of the new version are: - - 1) place the binary image "cumulus-linux-3.4.1.bin" under ``/install`` directory on MN("172.21.253.37") :: + In this example, IP=172.21.253.37 is the IP on the Management Node. :: mkdir -p /install/onie/ cp cumulus-linux-3.4.1.bin /install/onie/ - 2) invoke the upgrade on switches with ``xdsh`` :: +#. Invoke the upgrade on the switches using :doc:`xdsh `: :: - xdsh switch1 "/usr/cumulus/bin/onie-install -a -f -i http://172.21.253.37/install/onie/cumulus-linux-3.4.1.bin && reboot" + xdsh switch1 "/usr/cumulus/bin/onie-install -a -f -i \ + http://172.21.253.37/install/onie/cumulus-linux-3.4.1.bin && reboot" - The full upgrade process might cost 30 min, you can ping the switch with ``ping switch1`` to check whether it finishes upgrade. - - 3) After upgrading, the license should be installed, see :ref:`Activate the License ` for detailed steps. + **Note:** The full upgrade process may run 30 minutes or longer. + +#. After upgrading, the license should be installed, see :ref:`Activate the License ` for details. + +#. Restore your data and configuration files on the switch. + + + +Update Changed Packages +----------------------- + +This is the preferred method for upgrading the switch OS for incremental OS updates. + +Create Local Mirror +``````````````````` + +If the switches do not have access to the public Internet, you can create a local mirror of the Cumulus Linux repo. + +#. Create a local mirror on the Management Node: :: - 4) Restore your data and configuration files on the switch. + mkdir -p /install/mirror/cumulus + cd /install/mirror/cumulus + wget -m --no-parent http://repo3.cumulusnetworks.com/repo/ + +#. Create a ``sources.list`` file to point to the local repo on the Management node. In this example, IP=172.21.253.37 is the IP on the Management Node. :: + + # cat /tmp/sources.list + deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3 cumulus upstream + deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3 cumulus upstream + + deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-security-updates cumulus upstream + deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-security-updates cumulus upstream + + deb http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-updates cumulus upstream + deb-src http://172.21.253.37/install/mirror/cumulus/repo3.cumulusnetworks.com/repo CumulusLinux-3-updates cumulus upstream + + +#. Distribute the ``sources.list`` file to your switches using :doc:`xdcp `. :: + + xdcp switch1 /tmp/sources.list /etc/apt/sources.list + +Invoke the Update +````````````````` + +#. Use xCAT :doc:`xdsh ` to invoke the update: :: + + # + # A reboot may be needed after the upgrade + # + xdsh switch1 'apt-get update && apt-get upgrade && reboot' + +#. Check in ``/etc/os-release`` file to verify that the OS has been upgraded. 
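As a minimal verification sketch (assuming the switch is reachable over ``xdsh`` and that ``switch1`` is the same node name used in the examples above), the release string can be read remotely once the switch has finished rebooting: ::

    # Wait for the switch to come back, then print the Cumulus Linux release string
    ping -c 3 switch1
    xdsh switch1 "grep -E 'VERSION_ID|PRETTY_NAME' /etc/os-release"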
diff --git a/docs/source/advanced/networks/switchdiscover/switch_based_switch_discovery.rst b/docs/source/advanced/networks/switchdiscover/switch_based_switch_discovery.rst index 356132d7f..2f207d0ff 100644 --- a/docs/source/advanced/networks/switchdiscover/switch_based_switch_discovery.rst +++ b/docs/source/advanced/networks/switchdiscover/switch_based_switch_discovery.rst @@ -105,7 +105,7 @@ if **--setup** flag is specified, the command will perform following steps: snmppassword=xcatadminpassw0rd@snmp snmpusername=xcatadmin snmpversion=3 - status=hostname_configed + status=hostname_configured statustime=08-31-2016 15:35:49 supportedarchs=ppc64 switch=switch-10-5-23-1 @@ -155,13 +155,13 @@ These two config files are located in the **/opt/xcat/share/xcat/scripts** direc Switch Status ~~~~~~~~~~~~~ -During the switch-based switch discovery process, there are four states displayed. User may only see **switch_configed** status on node definition if discovery process successfully finished. +During the switch-based switch discovery process, there are four states displayed. User may only see **switch_configured** status on node definition if discovery process successfully finished. **Matched** --- Discovered switch is matched to predefine switch, **otherinterfaces** attribute is updated to dhcp IP address, and mac address, **switch type** and **usercomment** also updated with vendor information for the predefined switch. -**ip_configed** --- switch is set up to static IP address based on predefine switch IP address. If failure to set up IP address, the status will stay as **Matched**. +**ip_configured** --- switch is set up to static IP address based on predefine switch IP address. If failure to set up IP address, the status will stay as **Matched**. -**hostname_configed** -- switch host name is changed based on predefine switch hostname. If failure to change hostname on the switch, the status will stay as **ip_configed**. +**hostname_configured** -- switch host name is changed based on predefine switch hostname. If failure to change hostname on the switch, the status will stay as **ip_configured**. -**switch_configed** -- snmpv3 is setup for the switches. This should be finial status after running ``switchdiscover --setup`` command. If failure to setup snmpv3, the status will stay as **hostname_configed**. +**switch_configured** -- snmpv3 is setup for the switches. This should be finial status after running ``switchdiscover --setup`` command. If failure to setup snmpv3, the status will stay as **hostname_configured**. diff --git a/docs/source/advanced/pdu/crpdu.rst b/docs/source/advanced/pdu/crpdu.rst new file mode 100644 index 000000000..9933a80bc --- /dev/null +++ b/docs/source/advanced/pdu/crpdu.rst @@ -0,0 +1,109 @@ +Collaborative PDU +================= + +Collaborative PDU is also referred as Coral PDU, it controls power for compute Rack. User can access PDU via SSH and can use the **PduManager** command to configure and manage the PDU product. + + +Pre-Defined PDU Objects +----------------------- + +A pre-defined PDU node object is required before running pdudiscover command. :: + + mkdef coralpdu groups=pdu mgt=pdu nodetype=pdu (required) + +all other attributes can be set by chdef command or pdudisocover command. :: + + --switch required for pdudiscover command to do mapping + --switchport required for pdudiscover command to do mapping + --ip ip address of the pdu. 
+    --mac            can be filled in by pdudiscover command
+    --pdutype        crpdu (for coral pdu) or irpdu (for infrastructure PDUs)
+
+The following attributes need to be set in order to configure snmp with non-default values. ::
+
+    --community      community string for coral pdu
+    --snmpversion    snmp version number, required if configuring snmpv3 for coral pdu
+    --snmpuser       snmpv3 user name, required if configuring snmpv3 for coral pdu
+    --authkey        auth passphrase for snmpv3 configuration
+    --authtype       auth protocol (MD5|SHA) for snmpv3 configuration
+    --privkey        priv passphrase for snmpv3 configuration
+    --privtype       priv protocol (AES|DES) for snmpv3 configuration
+    --seclevel       security level (noAuthNoPriv|authNoPriv|authPriv) for snmpv3 configuration
+
+Make sure to run makehosts after pre-defining the PDU. ::
+
+    makehosts coralpdu
+
+
+Configure PDUs
+--------------
+
+After pre-defining PDUs, the user can run **pdudiscover --range ip_range --setup** to configure the PDUs, or the following commands can be used:
+
+ * To configure passwordless SSH access to the Coral PDU: ::
+
+     # rspconfig coralpdu sshcfg
+
+ * To change the hostname of the Coral PDU: ::
+
+     # rspconfig coralpdu hostname=f5pdu3
+
+ * To change the ip address of the PDU: ::
+
+     # rspconfig coralpdu ip=x.x.x.x netmask=255.x.x.x
+
+ * To configure the SNMP community string or snmpv3 of the PDU (the attributes need to be pre-defined): ::
+
+     # rspconfig coralpdu snmpcfg
+
+
+Remote Power Control of PDU
+---------------------------
+
+Use the rpower command to remotely power the PDU on and off.
+
+ * To check the power state of the PDU: ::
+
+     # rpower coralpdu stat
+
+ * To power off the PDU: ::
+
+     # rpower coralpdu off
+
+ * To power on the PDU: ::
+
+     # rpower coralpdu on
+
+Coral PDUs have three relays; the following commands provide individual relay support:
+
+ * To check the power state of a relay: ::
+
+     # rpower coralpdu relay=1 stat
+
+ * To power off a relay: ::
+
+     # rpower coralpdu relay=2 off
+
+ * To power on a relay: ::
+
+     # rpower coralpdu relay=3 on
+
+
+Show Monitor Data
+-----------------
+
+Use the rvitals command to show realtime monitor data (input voltage, current, power) of the PDU. ::
+
+     # rvitals coralpdu
+
+
+Show Manufacturer Information
+-----------------------------
+
+Use the rinv command to show MFR information of the PDU ::
+
+     # rinv coralpdu
+
+
diff --git a/docs/source/advanced/pdu/index.rst b/docs/source/advanced/pdu/index.rst
index 5141a09f5..625bfb269 100644
--- a/docs/source/advanced/pdu/index.rst
+++ b/docs/source/advanced/pdu/index.rst
@@ -1,10 +1,14 @@
 PDUs
 ====
 
-Power Distribution Units (PDUs) are devices that distribute power to servers in a frame. Intelligent PDUs have the capability of monitoring the amount of power that is being used by devices plugged into it.
+Power Distribution Units (PDUs) are devices that distribute power to servers in a frame. They have the capability of monitoring the amount of power that is being used by devices plugged into them and can cycle power to individual receptacles. xCAT supports two kinds of PDUs, the infrastructure PDU (irpdu) and the collaborative PDU (crpdu).
+
+The Infrastructure rack PDUs are switched and monitored 1U PDU products which can connect up to nine C19 devices or up to 12 C13 devices and an additional three C13 peripheral devices to a single dedicated power source. The Collaborative PDU is on the compute rack and has the 6x IEC 320-C13 receptacles that feed the rack switches. These two types of PDU have different designs and implementations, and xCAT uses different code paths to handle PDU commands based on **pdutype**.
 
 ..
toctree:: :maxdepth: 2 pdu.rst + irpdu.rst + crpdu.rst diff --git a/docs/source/advanced/pdu/irpdu.rst b/docs/source/advanced/pdu/irpdu.rst new file mode 100644 index 000000000..dccaf049a --- /dev/null +++ b/docs/source/advanced/pdu/irpdu.rst @@ -0,0 +1,161 @@ +Infrastructure PDU +================== + +Users can access Infrastructure PDU via telnet and use the **IBM PDU Configuration Utility** to set up and configure the PDU. xCAT supports PDU commands for power management and monitoring through SNMP. + + +PDU Commands +------------ + +Administrators will need to know the exact mapping of the outlets to each server in the frame. xCAT cannot validate the physical cable is connected to the correct server. + +Add a ``pdu`` attribute to the compute node definition in the form "PDU_Name:outlet": :: + + # + # Compute server cn01 has two power supplies + # connected to outlet 6 and 7 on pdu=f5pdu3 + # + chdef cn01 pdu=f5pdu3:6,f5pdu3:7 + + +The following commands are supported against a compute node: + + * Check the pdu status for a compute node: :: + + # rpower cn01 pdustat + cn01: f5pdu3 outlet 6 is on + cn01: f5pdu3 outlet 7 is on + + + * Power off the PDU outlets for a compute node: :: + + # rpower cn01 pduoff + cn01: f5pdu3 outlet 6 is off + cn01: f5pdu3 outlet 7 is off + + * Power on the PDU outlets for a compute node: :: + + # rpower cn01 pduon + cn01: f5pdu3 outlet 6 is on + cn01: f5pdu3 outlet 7 is on + + * Power cycling the PDU outlets for a compute node: :: + + # rpower cn01 pdureset + cn01: f5pdu3 outlet 6 is reset + cn01: f5pdu3 outlet 7 is reset + +The following commands are supported against a PDU: + + * To change hostname of IR PDU: :: + + # rspconfig f5pdu3 hosname=f5pdu3 + + * To change ip address of IR PDU: :: + + # rsconfig f5pdu3 ip=x.x.x.x netmaks=255.x.x.x + + * Check the status of the full PDU: :: + + # rpower f5pdu3 stat + f5pdu3: outlet 1 is on + f5pdu3: outlet 2 is on + f5pdu3: outlet 3 is on + f5pdu3: outlet 4 is on + f5pdu3: outlet 5 is on + f5pdu3: outlet 6 is off + f5pdu3: outlet 7 is off + f5pdu3: outlet 8 is on + f5pdu3: outlet 9 is on + f5pdu3: outlet 10 is on + f5pdu3: outlet 11 is on + f5pdu3: outlet 12 is on + + * Power off the full PDU: :: + + # rpower f5pdu3 off + f5pdu3: outlet 1 is off + f5pdu3: outlet 2 is off + f5pdu3: outlet 3 is off + f5pdu3: outlet 4 is off + f5pdu3: outlet 5 is off + f5pdu3: outlet 6 is off + f5pdu3: outlet 7 is off + f5pdu3: outlet 8 is off + f5pdu3: outlet 9 is off + f5pdu3: outlet 10 is off + f5pdu3: outlet 11 is off + f5pdu3: outlet 12 is off + + * Power on the full PDU: :: + + # rpower f5pdu3 on + f5pdu3: outlet 1 is on + f5pdu3: outlet 2 is on + f5pdu3: outlet 3 is on + f5pdu3: outlet 4 is on + f5pdu3: outlet 5 is on + f5pdu3: outlet 6 is on + f5pdu3: outlet 7 is on + f5pdu3: outlet 8 is on + f5pdu3: outlet 9 is on + f5pdu3: outlet 10 is on + f5pdu3: outlet 11 is on + f5pdu3: outlet 12 is on + + * Power reset the full PDU: :: + + # rpower f5pdu3 reset + f5pdu3: outlet 1 is reset + f5pdu3: outlet 2 is reset + f5pdu3: outlet 3 is reset + f5pdu3: outlet 4 is reset + f5pdu3: outlet 5 is reset + f5pdu3: outlet 6 is reset + f5pdu3: outlet 7 is reset + f5pdu3: outlet 8 is reset + f5pdu3: outlet 9 is reset + f5pdu3: outlet 10 is reset + f5pdu3: outlet 11 is reset + f5pdu3: outlet 12 is reset + + * PDU inventory information: :: + + # rinv f6pdu16 + f6pdu16: PDU Software Version: "OPDP_sIBM_v01.3_2" + f6pdu16: PDU Machine Type: "1U" + f6pdu16: PDU Model Number: "dPDU4230" + f6pdu16: PDU Part Number: "46W1608" + f6pdu16: PDU Name: "IBM 
PDU" + f6pdu16: PDU Serial Number: "4571S9" + f6pdu16: PDU Description: "description" + + * PDU and outlet power information: :: + + # rvitals f6pdu15 + f6pdu15: Voltage Warning: 0 + f6pdu15: outlet 1 Current: 0 mA + f6pdu15: outlet 1 Max Capacity of the current: 16000 mA + f6pdu15: outlet 1 Current Threshold Warning: 9600 mA + f6pdu15: outlet 1 Current Threshold Critical: 12800 mA + f6pdu15: outlet 1 Last Power Reading: 0 Watts + f6pdu15: outlet 2 Current: 0 mA + f6pdu15: outlet 2 Max Capacity of the current: 16000 mA + f6pdu15: outlet 2 Current Threshold Warning: 9600 mA + f6pdu15: outlet 2 Current Threshold Critical: 12800 mA + f6pdu15: outlet 2 Last Power Reading: 0 Watts + f6pdu15: outlet 3 Current: 1130 mA + f6pdu15: outlet 3 Max Capacity of the current: 16000 mA + f6pdu15: outlet 3 Current Threshold Warning: 9600 mA + f6pdu15: outlet 3 Current Threshold Critical: 12800 mA + f6pdu15: outlet 3 Last Power Reading: 217 Wattsv + +**Note:** For BMC based compute nodes, turning the PDU outlet power on does not automatically power on the compute side. Users will need to issue ``rpower on`` to power on the compute side after the BMC boots. + + + + + + + + diff --git a/docs/source/advanced/pdu/pdu.rst b/docs/source/advanced/pdu/pdu.rst index a8566b5f8..6e0ca6862 100644 --- a/docs/source/advanced/pdu/pdu.rst +++ b/docs/source/advanced/pdu/pdu.rst @@ -1,17 +1,39 @@ -PDU -=== +Discovering PDUs +================ -xCAT provides basic remote management for each power outlet plugged into the PDUs using SNMP communication. This documentation will focus on configuration of the PDU and Node objects to allow xCAT to control power at the PDU outlet level. +xCAT provides `pdudiscover` command to discover the PDUs that are attached to the neighboring subnets on xCAT management node. :: + + pdudiscover [|--range ipranges] [-r|-x|-z] [-w] [-V|--verbose] [--setup] + +xCAT uses snmp scan method to discover PDU. Make sure net-snmp-utils package is installed on xCAT MN in order to use snmpwalk command. :: + + Options: + --range Specify one or more IP ranges. Each can be an ip address (10.1.2.3) or an ip range + (10.1.2.0/24). If the range is huge, for example, 192.168.1.1/8, the pdu + discover may take a very long time to scan. So the range should be exactly + specified. It accepts multiple formats. For example: + 192.168.1.1/24, 40-41.1-2.3-4.1-100. + + If the range is not specified, the command scans all the subnets that the active + network interfaces (eth0, eth1) are on where this command is issued. + -r Display Raw responses. + -x XML formatted output. + -z Stanza formatted output. + -w Writes output to xCAT database. + --setup Process switch-based pdu discovery and configure the PDUs. For crpdu, --setup options will configure passwordless , change ip address from dhcp to static, hostname changes and snmp v3 configuration. For irpdu, it will configure ip address and hostname. It required predefined PDU node definition with switch name and switch port attributes for mapping. Define PDU Objects ------------------ - #. Define pdu object :: - mkdef f5pdu3 groups=pdu ip=50.0.0.8 mgt=pdu nodetype=pdu + mkdef f5pdu3 groups=pdu ip=50.0.0.8 mgt=pdu nodetype=pdu pdutype=irpdu + +#. Define switch attribute for pdu object which will be used for pdudiscover **--setup** options. :: + + chdef f5pdu3 switch=mid08 switchport=3 #. Add hostname to /etc/hosts:: @@ -19,129 +41,6 @@ Define PDU Objects #. 
Verify the SNMP command responds against the PDU: :: - snmpwalk -v1 -cpublic -mALL f5pdu3 .1.3.6.1.2.1.1 - - -Define PDU Attribute --------------------- - -Administrators will need to know the exact mapping of the outlets to each server in the frame. xCAT cannot validate the physical cable is connected to the correct server. - -Add a ``pdu`` attribute to the compute node definition in the form "PDU_Name:outlet": :: - - # - # Compute server cn01 has two power supplies - # connected to outlet 6 and 7 on pdu=f5pdu3 - # - chdef cn01 pdu=f5pdu3:6,f5pdu3:7 - - -Verify the setting: ``lsdef cn01 -i pdu`` - - -PDU Commands ------------- - -The following commands are supported against a compute node: - - * Check the pdu status for a compute node: :: - - # rpower cn01 pdustat - cn01: f5pdu3 outlet 6 is on - cn01: f5pdu3 outlet 7 is on - - - * Power off the PDU outlets on a compute node: :: - - # rpower cn01 pduoff - cn01: f5pdu3 outlet 6 is off - cn01: f5pdu3 outlet 7 is off - - * Power on the PDU outlets on a compute node: :: - - # rpower cn01 pduon - cn01: f5pdu3 outlet 6 is on - cn01: f5pdu3 outlet 7 is on - - * Power cycling the PDU outlets on a compute node: :: - - # rpower cn01 pdureset - cn01: f5pdu3 outlet 6 is reset - cn01: f5pdu3 outlet 7 is reset - -The following commands are supported against a PDU: - - * Check the status of the full PDU: :: - - # rinv f5pdu3 - f5pdu3: outlet 1 is on - f5pdu3: outlet 2 is on - f5pdu3: outlet 3 is on - f5pdu3: outlet 4 is on - f5pdu3: outlet 5 is on - f5pdu3: outlet 6 is off - f5pdu3: outlet 7 is off - f5pdu3: outlet 8 is on - f5pdu3: outlet 9 is on - f5pdu3: outlet 10 is on - f5pdu3: outlet 11 is on - f5pdu3: outlet 12 is on - - * Power off the full PDU: :: - - # rpower f5pdu3 off - f5pdu3: outlet 1 is off - f5pdu3: outlet 2 is off - f5pdu3: outlet 3 is off - f5pdu3: outlet 4 is off - f5pdu3: outlet 5 is off - f5pdu3: outlet 6 is off - f5pdu3: outlet 7 is off - f5pdu3: outlet 8 is off - f5pdu3: outlet 9 is off - f5pdu3: outlet 10 is off - f5pdu3: outlet 11 is off - f5pdu3: outlet 12 is off - - * Power on the full PDU: :: - - # rpower f5pdu3 on - f5pdu3: outlet 1 is on - f5pdu3: outlet 2 is on - f5pdu3: outlet 3 is on - f5pdu3: outlet 4 is on - f5pdu3: outlet 5 is on - f5pdu3: outlet 6 is on - f5pdu3: outlet 7 is on - f5pdu3: outlet 8 is on - f5pdu3: outlet 9 is on - f5pdu3: outlet 10 is on - f5pdu3: outlet 11 is on - f5pdu3: outlet 12 is on - - * Power reset the full PDU: :: - - # rpower f5pdu3 reset - f5pdu3: outlet 1 is reset - f5pdu3: outlet 2 is reset - f5pdu3: outlet 3 is reset - f5pdu3: outlet 4 is reset - f5pdu3: outlet 5 is reset - f5pdu3: outlet 6 is reset - f5pdu3: outlet 7 is reset - f5pdu3: outlet 8 is reset - f5pdu3: outlet 9 is reset - f5pdu3: outlet 10 is reset - f5pdu3: outlet 11 is reset - f5pdu3: outlet 12 is reset - - -**Note:** For BMC based compute nodes, turning the PDU outlet power on does not automatically power on the compute side. Users will need to issue ``rpower on`` to power on the compute node after the BMC boots. - - - - - - + snmpwalk -v1 -cpublic -mALL f5pdu3 system diff --git a/docs/source/advanced/security/index.rst b/docs/source/advanced/security/index.rst index 5fdc56f08..790979759 100644 --- a/docs/source/advanced/security/index.rst +++ b/docs/source/advanced/security/index.rst @@ -6,4 +6,5 @@ The security of a system covers a wide range of elements, from the security of s .. 
toctree::
    :maxdepth: 2
 
-   security
+   ssl_config.rst
+   security.rst
diff --git a/docs/source/advanced/security/ssl_config.rst b/docs/source/advanced/security/ssl_config.rst
new file mode 100644
index 000000000..3ea6f360b
--- /dev/null
+++ b/docs/source/advanced/security/ssl_config.rst
@@ -0,0 +1,56 @@
+OpenSSL Configuration
+=====================
+
+xCAT does not ship OpenSSL RPMs, nor does it statically link to any OpenSSL libraries. Communication between the xCAT client and daemon utilizes OpenSSL, and the administrator can configure the SSL_version and SSL_cipher values that should be used by the xCAT daemons.
+
+The configuration is stored in the xCAT site table using the ``site.xcatsslversion`` and ``site.xcatsslciphers`` variables.
+
+Configuration
+-------------
+
+By default, xCAT ships with ``TLSv1`` configured. The current highest SSL version that can be supported is ``TLSv1.2``.
+
+* For rhels7.x and sles12.x and higher: ::
+
+    chtab key=xcatsslversion site.value=TLSv12
+
+* For ubuntu 14.x and higher: ::
+
+    chtab key=xcatsslversion site.value=TLSv1_2
+
+* For AIX 7.1.3.x: ::
+
+    chtab key=xcatsslversion site.value=TLSv1_2
+
+
+If running > ``TLSv1``, it is possible to disable insecure ciphers. Here's an example of one possible configuration: ::
+
+    "xcatsslciphers","kDH:kEDH:kRSA:!SSLv3:!SSLv2:!aNULL:!eNULL:!MEDIUM:!LOW:!MD5:!EXPORT:!CAMELLIA:!ECDH",,
+
+After making any changes to these configuration values, ``xcatd`` must be restarted: ::
+
+    service xcatd restart
+
+
+If any mistakes have been made and communication with xCAT is lost, use ``XCATBYPASS`` to fix or remove the bad configuration: ::
+
+    XCATBYPASS=1 tabedit site
+
+
+Validation
+----------
+
+Use the ``openssl`` command to validate that the SSL configuration is valid and behaves as expected.
+
+* To check whether TLSv1 is supported by xcatd: ::
+
+    openssl s_client -connect 127.0.0.1:3001 -tls1
+
+* To check if SSLv3 is disabled on ``xcatd``: ::
+
+    openssl s_client -connect localhost:3001 -ssl3
+
+  You should get a response similar to: ::
+
+    70367087597568:error:14094410:SSL routines:SSL3_READ_BYTES:sslv3 alert handshake failure:s3_pkt.c:1259:SSL alert number 40
+    70367087597568:error:1409E0E5:SSL routines:SSL3_WRITE_BYTES:ssl handshake failure:s3_pkt.c:598:
diff --git a/docs/source/advanced/switches/ethernet_switches.rst b/docs/source/advanced/switches/ethernet_switches.rst
deleted file mode 100644
index 1c9a8d361..000000000
--- a/docs/source/advanced/switches/ethernet_switches.rst
+++ /dev/null
@@ -1,2 +0,0 @@
-Ethernet Switches
-=================
diff --git a/docs/source/advanced/switches/index.rst b/docs/source/advanced/switches/index.rst
deleted file mode 100644
index 74d3b9cf9..000000000
--- a/docs/source/advanced/switches/index.rst
+++ /dev/null
@@ -1,7 +0,0 @@
-Switch Management
-=================
-
-.. toctree::
-   :maxdepth: 2
-
-   ethernet_switches.rst
diff --git a/docs/source/advanced/xcat-inventory/define_create_cluster.rst b/docs/source/advanced/xcat-inventory/define_create_cluster.rst
new file mode 100644
index 000000000..9a2dce023
--- /dev/null
+++ b/docs/source/advanced/xcat-inventory/define_create_cluster.rst
@@ -0,0 +1,28 @@
+Define and create your first xCAT cluster easily
+================================================
+
+Inventory templates for 2 kinds of typical xCAT clusters are shipped with xCAT. You can create your first xCAT cluster easily by making several modifications to a template. The templates can be found under ``/opt/xcat/share/xcat/inventory_templates`` on a management node with ``xcat-inventory`` installed.
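As a quick check that the tool and the shipped templates are in place (a sketch only; the paths are the ones given in the paragraph above), you can list the template directory on the management node: ::

    # Confirm the xcat-inventory command is installed
    which xcat-inventory
    # List the shipped inventory templates
    ls /opt/xcat/share/xcat/inventory_templates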
+
+Currently, the inventory templates include:
+
+1. flat_cluster_template.yaml:
+
+   a flat bare-metal cluster, including **openbmc controlled PowerLE servers**, **IPMI controlled Power servers (commented out)**, **X86_64 servers (commented out)**
+
+2. flat_kvm_cluster_template.yaml: a flat KVM based Virtual Machine cluster, including **PowerKVM based VM nodes**, **KVM based X86_64 VM nodes (commented out)**
+
+The steps to create your first xCAT cluster are:
+
+1. create a customized cluster inventory file "mycluster.yaml" based on ``flat_cluster_template.yaml`` ::
+
+    cp /opt/xcat/share/xcat/inventory_templates/flat_cluster_template.yaml /git/cluster/mycluster.yaml
+
+2. customize the cluster inventory file "mycluster.yaml" by modifying the attributes on the lines marked with the ``#CHANGEME`` token according to the setup of your physical cluster. You can create a new node definition by duplicating and modifying an existing node definition in the template.
+
+3. import the cluster inventory file ::
+
+    xcat-inventory import -f /git/cluster/mycluster.yaml
+
+Now that you have your first xCAT cluster defined, you can bring it up by provisioning the nodes.
+
+
diff --git a/docs/source/advanced/xcat-inventory/index.rst b/docs/source/advanced/xcat-inventory/index.rst
new file mode 100644
index 000000000..5c7c4829b
--- /dev/null
+++ b/docs/source/advanced/xcat-inventory/index.rst
@@ -0,0 +1,21 @@
+xcat-inventory
+==============
+
+`xcat-inventory `_ is an inventory tool for clusters managed by xCAT. Its features include:
+
+* an object based view of the cluster inventory, which is flexible, extensible and well formatted
+
+* interfaces to export/import the cluster inventory data in YAML/JSON format, which can then be managed under source control
+
+* inventory templates for typical clusters, which help users define a cluster easily
+
+* the ability to integrate with Ansible (coming soon)
+
+
+This section presents two typical use cases of ``xcat-inventory``
+
+.. toctree::
+   :maxdepth: 2
+
+   version_control_inventory.rst
+   define_create_cluster.rst
diff --git a/docs/source/advanced/xcat-inventory/version_control_inventory.rst b/docs/source/advanced/xcat-inventory/version_control_inventory.rst
new file mode 100644
index 000000000..adc311e01
--- /dev/null
+++ b/docs/source/advanced/xcat-inventory/version_control_inventory.rst
@@ -0,0 +1,50 @@
+Manage the xCAT Cluster Definition under Source Control
+=======================================================
+
+The xCAT cluster inventory data, including the global configuration, the object definitions (node/osimage/passwd/policy/network/router) and the relationships between the objects, can be exported from the xCAT database to a YAML/JSON file (**inventory file**), or imported to the xCAT database from the inventory file.
+
+By managing the inventory file under a source control system, you can manage the xCAT cluster definition under source control. This section presents a typical step-by-step scenario on how to manage cluster inventory data under ``git``.
+
+
+1. create a directory ``/git/cluster`` to hold the cluster inventory and initialize it as a git repository ::
+
+    mkdir -p /git/cluster
+    cd /git/cluster
+    git init
+
+2. export the current cluster configuration to an inventory file "mycluster.yaml" under the git directory created above ::
+
+    xcat-inventory export --format=yaml >/git/cluster/mycluster.yaml
+
+3. check the diff and commit the cluster inventory file (commit no: c95673) ::
+
+    cd /git/cluster
+    git diff
+    git add /git/cluster/mycluster.yaml
+    git commit /git/cluster/mycluster.yaml -m "$(date "+%Y_%m_%d_%H_%M_%S"): initial cluster inventory data; blah-blah"
+
+4. ordinary cluster maintenance and operation: replace bad nodes, turn on xcatdebugmode...
+
+5. once the cluster setup is stable, export and commit the cluster configuration (commit no: c95673) ::
+
+    xcat-inventory export --format=yaml >/git/cluster/mycluster.yaml
+    cd /git/cluster
+    git diff
+    git add /git/cluster/mycluster.yaml
+    git commit /git/cluster/mycluster.yaml -m "$(date "+%Y_%m_%d_%H_%M_%S"): replaced bad nodes; turn on xcatdebugmode; blah-blah"
+
+6. during ordinary cluster maintenance and operation, if issues are found in the current cluster, restore the cluster configuration to commit no c95673 [1]_ ::
+
+    cd /git/cluster
+    git checkout c95673
+    xcat-inventory import -f /git/cluster/mycluster.yaml
+
+*Notice:*
+
+1. The cluster inventory data exported by ``xcat-inventory`` does not include intermediate, transient or historical data in the xCAT database, such as node status or the auditlog table
+
+2. We suggest you back up your xCAT database with ``dumpxCATdb`` before trying this feature, although it has been tested extensively
+
+.. [1] When you import the inventory data to the xCAT database in step 6, there are 2 modes: ``clean mode`` and ``update mode``. If you choose ``clean mode`` with ``xcat-inventory import -c|--clean``, all the object definitions that are not included in the inventory file will be removed; otherwise, only the objects included in the inventory file will be updated or inserted. Please choose the proper mode according to your needs
+
+
diff --git a/docs/source/conf.py b/docs/source/conf.py
index f62bace59..1a45ed914 100644
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -59,7 +59,7 @@ author = u'IBM Corporation'
 # The short X.Y version.
 version = '2'
 # The full version, including alpha/beta/rc tags.
-release = '2.13.8'
+release = '2.13.11'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
diff --git a/docs/source/guides/admin-guides/manage_clusters/common/deployment/enable_kdump.rst b/docs/source/guides/admin-guides/manage_clusters/common/deployment/enable_kdump.rst
index a6c0765d2..10f5ec4eb 100644
--- a/docs/source/guides/admin-guides/manage_clusters/common/deployment/enable_kdump.rst
+++ b/docs/source/guides/admin-guides/manage_clusters/common/deployment/enable_kdump.rst
@@ -1,87 +1,64 @@
-Enable Kdump Over Ethernet
+Enable kdump Over Ethernet
 ==========================
 
 Overview
 --------
 
-kdump is an advanced crash dumping mechanism. When enabled, the system is booted from the context of another kernel. This second kernel reserves a small amount of memory, and its only purpose is to capture the core dump image in case the system crashes. Being able to analyze the core dump helps significantly to determine the exact cause of the system failure.
+kdump is a feature of the Linux kernel that allows the system to be booted from the context of another kernel. This second kernel reserves a small amount of memory, and its only purpose is to capture the core dump in the event of a kernel crash. The ability to analyze the core dump helps to determine the causes of system failures.
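+
+On a node where kdump is already configured, a quick way to confirm that memory has actually been reserved for this capture kernel is shown below (an illustrative check; the exact values and output format vary by distribution and kernel version): ::
+
+    # the kernel command line should contain a crashkernel= parameter
+    grep -o 'crashkernel=[^ ]*' /proc/cmdline
+
+    # memory (in bytes) currently reserved for the capture kernel
+    cat /sys/kernel/kexec_crash_size
+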
xCAT Interface
--------------
 
-The pkglist, exclude and postinstall files location and name can be obtained by running the following command: ::
+The following attributes of an osimage should be modified to enable ``kdump``:
 
-  lsdef -t osimage
+* pkglist
+* exlist
+* postinstall
+* dump
+* crashkernelsize
+* postscripts
 
-Here is an example: ::
+Configure the ``pkglist`` file
+------------------------------
 
-  lsdef -t osimage rhels7.1-ppc64le-netboot-compute
-    Object name: rhels7.1-ppc64le-netboot-compute
-    exlist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.exlist
-    imagetype=linux
-    osarch=ppc64le
-    osdistroname=rhels7.1-ppc64le
-    osname=Linux
-    osvers=rhels7.1
-    otherpkgdir=/install/post/otherpkgs/rhels7.1/ppc64le
-    permission=755
-    pkgdir=/install/rhels7.1/ppc64le
-    pkglist=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.pkglist
-    postinstall=/opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall
-    profile=compute
-    provmethod=netboot
-    rootimgdir=/install/netboot/rhels7.1/ppc64le/compute
+The ``pkglist`` for the osimage needs to include the appropriate RPMs. The following list of RPMs is provided as a sample; always refer to the Operating System specific documentation to ensure the required packages are included for ``kdump`` support.
 
-In above example, pkglist file is /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.pkglist, exclude files is in /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.exlist, and postinstall file is /opt/xcat/share/xcat/netboot/rh/compute.rhels7.ppc64le.postinstall.
-
-Setup pkglist
--------------
-
-Before setting up kdump, the appropriate rpms should be added to the pkglist file.Here is the rpm packages list which needs to be added to pkglist file for kdump for different OS.
-
-* **[RHEL]** ::
+* **[RHELS]** ::
 
    kexec-tools
   crash
 
-* **[SLES11]** ::
+* **[SLES]** ::
 
   kdump
   kexec-tools
   makedumpfile
 
-* **[SLES10]** ::
-
-  kernel-kdump
-  kexec-tools
-  kdump
-  makedumpfile
-
 * **[Ubuntu]** ::
 
-The exclude file
-----------------
+Modify the ``exlist`` file
+--------------------------
 
-The base diskless image excludes the /boot directory, but it is required for kdump. Update the exlist file and remove the entry for /boot. Then run the packimage or liteimg command to update your image with the changes.
+The default diskless image created by ``copycds`` excludes the ``/boot`` directory in the exclude list file, but this directory is required for ``kdump``.
 
-
+Update the ``exlist`` for the target osimage and remove the line containing ``./boot*``: ::
 
-The postinstall file
---------------------
+    ./boot* # <-- remove this line
 
-The kdump will create a new initrd which used in the dumping stage. The /tmp or /var/tmp directory will be used as the temporary directory. These 2 directory only are allocated 10M space by default. You need to enlarge it to 200M. Modify the postinstall file to increase /tmp space.
+Run ``packimage`` to update the diskless image with the changes.
+
+The ``postinstall`` file
+------------------------
+
+kdump creates a new initrd that is used in the dumping stage. The ``/tmp`` or ``/var/tmp`` directory will be used as the temporary directory, but these two directories are only allocated 10M of space by default. You need to enlarge this to 200M. Modify the ``postinstall`` file to increase the ``/tmp`` space.
 
* **[RHELS]** ::
 
    tmpfs /var/tmp tmpfs defaults,size=200m 0 2
 
-* **[SLES10]** ::
-
-    tmpfs /var/tmp tmpfs defaults,size=200m 0 2
-
 * **[SLES11]** ::
 
    tmpfs /tmp tmpfs defaults,size=200m 0 2
 
@@ -90,105 +67,107 @@ The kdump will create a new initrd which used in the dumping stage. The /tmp or
 
-The dump attribute
-------------------
+The ``dump`` attribute
+----------------------
 
-In order to support kdump, the dump attribute was added into linuximage table, which is used to define the remote path where the crash information should be dumped to. Use the chdef command to change the image's dump attribute using the URI format. ::
+To support kernel dumps, the ``dump`` attribute **must** be set on the osimage definition. If it is not set, the kdump service will not be enabled. The ``dump`` attribute defines the NFS remote path where the crash information is to be stored.
+
+Use the ``chdef`` command to set a value for the ``dump`` attribute: ::
 
    chdef -t osimage dump=nfs:///
 
-The can be excluded if the destination NFS server is the service or management node. ::
+If the NFS server is the Service Node or Management Node, the server can be left out: ::
 
    chdef -t osimage dump=nfs:///
 
-The crashkernelsize attribute
------------------------------
+**Note:** Only NFS is currently supported as a storage location. Make sure the NFS remote path (``nfs:///``) is exported and is read-writable by the node where the kdump service is enabled.
 
-For system x machine, on sles10 set the crashkernelsize attribute like this: ::
-
-  chdef -t osimage crashkernelsize=M@16M
+The ``crashkernelsize`` attribute
+---------------------------------
 
-On sles11 and rhels6 set the crashkernelsize attribute like this: ::
+To allow the Operating System to automatically reserve the appropriate amount of memory for the ``kdump`` kernel, set ``crashkernelsize=auto``.
+
+For setting specific sizes, use the following examples:
+
+* For System X machines, set the ``crashkernelsize`` using this format: ::
 
    chdef -t osimage crashkernelsize=M
 
-Where recommended value is 256. For more information about the size can refer to the following information:
-  ``_.
-
-  ``_.
-
-  ``_.
-
-For system p machine, set the crashkernelsize attribute to this: ::
+* For Power System AC922, set the ``crashkernelsize`` using this format: ::
+
+   chdef -t osimage crashkernelsize=M
+
+* For System P machines, set the ``crashkernelsize`` using this format: ::
 
    chdef -t osimage crashkernelsize=@32M
 
-Where recommended value is 256, more information can refer the kdump document for the system x.
+**Note:** the value of ``crashkernelsize`` depends on the total physical memory size of the machine. For more about sizing, refer to `Appedix`_
 
-When your node starts, and you get a kdump start error like this: ::
+If the kdump service fails to start with an error like this: ::
 
    Your running kernel is using more than 70% of the amount of space you reserved for kdump, you should consider increasing your crashkernel
 
-You should modify this attribute using this chdef command: ::
+then ``crashkernelsize`` is not large enough; increase ``crashkernelsize`` until the error message disappears.
 
-  chdef -t osimage crashkernelsize=512M@32M
+The ``enablekdump`` postscript
+------------------------------
 
-If 512M@32M is not large enough, you should change the crashkernelsize larger like 1024M until the error message disappear.
-
-The enablekdump postscript
---------------------------
-
-This postscript enablekdump is used to start the kdump service when the node is booting up. Add it to your nodes list of postscripts by running this command: ::
+xCAT provides a postscript, ``enablekdump``, that can be added to the nodes to automatically start the ``kdump`` service when the node boots. Add it to the nodes using the following command: ::
 
    chdef -t node -p postscripts=enablekdump
 
-Notes
------
+Manually trigger a kernel panic on Linux
+----------------------------------------
 
-Currently, only NFS is supported for the setup of kdump.
+Normally, a kernel ``panic()`` will trigger booting into the capture kernel. Once the kernel panic is triggered, the node will reboot into the capture kernel, and a kernel dump (vmcore) will be automatically saved to the directory on the specified NFS server (````).
 
-If the dump attribute is not set, the kdump service will not be enabled.
+Check your Operating System specific documentation for the path where the kernel dump is saved. For example:
 
-Make sure the NFS remote path(nfs:///) is exported and it is read-writeable to the node where kdump service is enabled.
+* **[RHELS6]** ::
 
-How to trigger kernel panic on Linux
-------------------------------------
 
-Normally, kernel panic() will trigger booting into capture kernel. Once the kernel panic is triggered, the node will reboot into the capture kernel, and a kernel dump (vmcore) will be automatically saved to the directory on the specified NFS server ().
 
-#. For RHESL6 the directory is /var/crash/-