
doc change: refine basic concept

This commit is contained in:
wangxiaopeng 2015-09-24 08:55:46 -04:00
parent 0e42c947be
commit 6676306e02
14 changed files with 350 additions and 186 deletions

View File

@ -1,9 +1,25 @@
Basic Concepts
==============
Most xCAT data, including global configuration and cluster information, is stored in xCAT tables in the database. xCAT abstracts several object types from the cluster information to perform the cluster management work.
xCAT is not hard to use, but you still need to learn some basic concepts of xCAT before starting to manage a real cluster.
This section presents some basic xCAT knowledge, including the xCAT global configuration, the description of xCAT object types and database tables, some commands and techniques for working with xCAT objects and database tables, the network services used by xCAT, and the typical network planning for an xCAT managed cluster.
* **xCAT Objects**
The unit which can be managed in xCAT is defined as an object. xCAT abstracts several types of objects from the cluster information to represent the physical or logical entities in the cluster. Each xCAT object has a set of attributes; each attribute is mapped from a specific field of an xCAT database table. xCAT users can get cluster information and perform cluster management work through operations against the objects.
* **xCAT Database**
All the data for the xCAT objects (node, group, network, osimage, policy ... and global configuration) are stored in the xCAT database. Dozens of tables serve as the back-end of the xCAT objects. Generally the data in the database is accessed through the **xCAT Objects**, but xCAT also offers a set of commands to handle the database directly.
* **Global Configuration**
xCAT has a set of **Global Configuration** items that control the behavior of xCAT. Some of the configuration items are mandatory for an xCAT cluster and must be set correctly before starting to use xCAT (a small example follows this list).
* **xCAT Network**
xCAT's goal is to manage and configure a significant number of servers remotely and automatically through a central management server. All the hardware discovery/management, OS deployment/configuration and application install/configuration are performed over the network. You need a good understanding of how xCAT will use the network before setting up a cluster.
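As a small illustration of the **Global Configuration** item above (a minimal sketch; the attribute values shown are examples only, while the command pattern itself is standard xCAT usage), the global settings live in the **site** table and can be listed or changed like any other object: ::

$ lsdef -t site clustersite -i domain,master
$ chdef -t site clustersite domain=cluster.example.com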
**Get Into the Detail of the Concepts:**
.. toctree::
:maxdepth: 2
@ -12,3 +28,4 @@ This section presents some basic xCAT knowledge, including xCAT global configura
xcat_db/index.rst
global_cfg/index.rst
network_planning/index.rst
node_type.rst

View File

@ -1,12 +1,57 @@
Network Planning
================
This section introduces xCAT node types and the network services used in xCAT, describes the network structure, and presents some considerations for planning the xCAT network.
For a cluster, several networks are necessary to enable cluster management and production.
* **Management network**
This network is used by the management node to install and manage the OS of the nodes. The management node (MN) and the in-band NIC of the nodes are connected to this network. If you have a large cluster with service nodes, sometimes this network is segregated into separate VLANs for each service node.
The following network services need to be set up on this network to support OS deployment and application install/configuration:
* DNS (Domain Name Service)
The DNS server, usually the management node or a service node, provides the domain name service for the entire cluster.
* HTTP (HyperText Transfer Protocol)
The HTTP server, usually the management node or a service node, acts as the download server for the initrd and kernel, the configuration file for the installer, and the repository for online installation.
* DHCP (Dynamic Host Configuration Protocol)
The DHCP server, usually the management node or a service node, provides the DHCP service for the entire cluster.
* TFTP (Trivial File Transfer Protocol)
The TFTP server, usually the management node or a service node, acts as the download server for bootloader binaries, bootloader configuration files, initrd and kernel.
* NFS (Network File System)
The NFS server, usually the management node or a service node, provides file system sharing between the management node and the service nodes, or persistent file system support for stateless nodes.
* NTP (Network Time Protocol)
The NTP server, usually the management node or a service node, provides the network time service for the entire cluster.
* **Service network**
This network is used by the management node to control the nodes out of band via the service processor (SP), such as the BMC or FSP. If the BMCs are configured in shared mode [1]_, then this network can be combined with the management network.
* **Application network**
This network is used by the applications on the compute nodes. It is usually an InfiniBand (IB) network for an HPC cluster.
* **Site (Public) network**
This network is used to access the management node and sometimes for the compute nodes to provide services to the site.
From the system management perspective, the **Management network** and the **Service network** are necessary to perform hardware control and OS deployment.
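For example, a **Management network** is normally described to xCAT as a **network object**. A minimal sketch might look like the following, where the subnet, interface name and server addresses are illustrative only: ::

$ mkdef -t network net_mgmt net=10.0.0.0 mask=255.255.255.0 mgtifname=eth1 gateway=10.0.0.1 dhcpserver=10.0.0.1 tftpserver=10.0.0.1 nameservers=10.0.0.1 dynamicrange=10.0.0.100-10.0.0.200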
**xCAT Network Planning for a New Cluster:**
.. toctree::
:maxdepth: 2
node_type.rst
network_service/index.rst
network_cfg/index.rst
xcat_net_planning.rst
.. [1] shared mode: In "Shared" mode, the BMC network interface and the in-band network interface will share the same network port.

View File

@ -1,12 +0,0 @@
Network Configuration
=====================
This section introduces the network structure in xCAT and presents some consideration on planning the xCAT network.
.. toctree::
:maxdepth: 2
xcat_network.rst
xcat_net_planning.rst

View File

@ -1,23 +0,0 @@
Networks in an xCAT Cluster
===========================
The networks that are typically used in a cluster are:
Management network
------------------
used by the management node to install and manage the OS of the nodes. The MN and in-band NIC of the nodes are connected to this network. If you have a large cluster with service nodes, sometimes this network is segregated into separate VLANs for each service node.
Service network
---------------
used by the management node to control the nodes out of band via the BMC. If the BMCs are configured in shared mode [1]_, then this network can be combined with the management network.
Application network
-------------------
used by the HPC applications on the compute nodes. Usually an IB network.
Site (Public) network
---------------------
used to access the management node and sometimes for the compute nodes to provide services to the site.
.. [1] shared mode: In "Shared" mode, the BMC network interface and the in-band network interface will share the same network port.

View File

@ -1,28 +0,0 @@
Network Services
================
The following network services are used by xCAT:
* DNS (Domain Name Service)
The DNS server, usually the management node or service node, provides the domain name service for the entire cluster.
* HTTP (HyperText Transfer Protocol)
The HTTP server, usually the management node or service node, acts as the download server for the initrd and kernel, the configuration file for the installer and the repository for online installation.
* DHCP (Dynamic Host Configuration Protocol)
The DHCP server, usually the management node or service node, provides the DHCP service for the entire cluster.
* TFTP (Trivial File Transfer Protocol)
The TFTP server, usually the management node or service node, acts as the download server for bootloader binaries, bootloader configuration files, initrd and kernel.
* NFS (Network File System)
The NFS server, usually the management node or service node, provides the file system sharing between the management node and service node, or persistent file system support for the stateless node.
* NTP (Network Time Protocol)
The NTP server, usually the management node or service node, provides the network time service for the entire cluster.
* SYSLOG
Usually, xCAT uses rsyslog as the syslog service for the cluster, all the log messages of the nodes in the cluster are forwarded to the management node.

View File

@ -1,30 +0,0 @@
xCAT Cluster Node Types
=======================
This section describes 2 standard node types xCAT supports, gives the pros and cons of each, and describes the cluster characteristics that will result from each.
Stateful (diskfull)
-------------------
traditional cluster with OS on each node's local disk.
* Main advantage
this approach is familiar to most admins, and they typically have many years of experience with it
* Main disadvantage
you have to manage all of the individual OS copies
Stateless(diskless)
-------------------
nodes boot from a RAMdisk OS image downloaded from the xCAT mgmt node or service node at boot time. (This option is not available on AIX).
* Main advantage
central management of OS image, but nodes are not tethered to the mgmt node or service node it booted from
* Main disadvantage
you can't use a large image with many different applications all in the image for varied users, because it uses too much of the node's memory to store the ramdisk. (To mitigate this disadvantage, you can put your large application binaries and libraries in gpfs to reduce the ramdisk size. This requires some manual configuration of the image).
Each node can also have a local "scratch" disk for ``swap``, ``/tmp``, ``/var``, ``log`` files, dumps, etc. The purpose of the scratch disk is to provide a location for files that are written to by the node that can become quite large or for files that you don't want to have disappear when the node reboots. There should be nothing put on the scratch disk that represents the node's "state", so that if the disk fails you can simply replace it and reboot the node. A scratch disk would typically be used for situations like: job scheduling preemption is required (which needs a lot of swap space), the applications write large temp files, or you want to keep gpfs log or trace files persistently. (As a partial alternative to using the scratch disk, customers can choose to put ``/tmp`` ``/var/tmp``, and log files (except GPFS logs files) in GPFS, but must be willing to accept the dependency on GPFS). This can be done by enabling the 'localdisk' support. For the details, please refer to the section [TODO Enabling the localdisk Option].

View File

@ -0,0 +1,52 @@
xCAT Cluster OS Running Type
============================
Whether it is a physical server or a virtual machine, a compute node needs to run an Operating System to support user applications. Generally the OS is installed on the hard disk of the compute node, but xCAT also supports running the OS in RAM.
This section gives the pros and cons of each OS running type, and describes the cluster characteristics that result from each.
Stateful (diskful)
------------------
Traditional cluster with OS on each node's local disk.
* Main advantage
This approach is familiar to most admins, and they typically have many years of experience with it.
* Main disadvantage
The admin has to manage all of the individual OS copies and deal with hard disk failures. For certain applications which require all the compute nodes to have exactly the same state, this approach is also challenging for the admin.
Stateless (diskless)
--------------------
Nodes boot from a RAMdisk OS image downloaded from the xCAT mgmt node or service node at boot time.
* Main advantage
Central management of the OS image, but nodes are not tethered to the mgmt node or service node they booted from. Whenever you need a new OS on a node, just reboot the node.
* Main disadvantage
You can't use a large image with many different applications all in the image for varied users, because it uses too much of the node's memory to store the ramdisk. (To mitigate this disadvantage, you can put your large application binaries and libraries in GPFS to reduce the ramdisk size. This requires some manual configuration of the image).
Each node can also have a local "scratch" disk for ``swap``, ``/tmp``, ``/var``, ``log`` files, dumps, etc. The purpose of the scratch disk is to provide a location for files that are written to by the node that can become quite large or for files that you don't want to have disappear when the node reboots. There should be nothing put on the scratch disk that represents the node's "state", so that if the disk fails you can simply replace it and reboot the node. A scratch disk would typically be used for situations like: job scheduling preemption is required (which needs a lot of swap space), the applications write large temp files, or you want to keep GPFS log or trace files persistently. (As a partial alternative to using the scratch disk, customers can choose to put ``/tmp``, ``/var/tmp``, and log files (except GPFS log files) in GPFS, but must be willing to accept the dependency on GPFS). This can be done by enabling the 'localdisk' support. For the details, please refer to the section [TODO Enabling the localdisk Option].
OSimage Definition
------------------
The attribute **provmethod** is used to identify whether the osimage is diskful or diskless: ::
$ lsdef -t osimage rhels7.1-x86_64-install-compute -i provmethod
Object name: rhels7.1-x86_64-install-compute
provmethod=install
* ``install`` : Diskful
* ``netboot`` : Diskless
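For example, to list only the diskless osimage definitions, the ``-w`` attribute filter of ``lsdef`` can be used (a minimal sketch; the osimage names returned depend on your own environment): ::

$ lsdef -t osimage -w provmethod==netboot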

View File

@ -1,17 +0,0 @@
Database Commands
=================
There are 5 database related commands in xCAT:
* ``tabdump`` : Displays the header and all the rows of the specified table in CSV (comma separated values) format.
* ``tabedit`` : Opens the specified table in the user's editor, allows them to edit any text, and then writes changes back to the database table. The table is flattened into a CSV (comma separated values) format file before giving it to the editor. After the editor is exited, the CSV file will be translated back into the database format.
* ``tabgrep`` : List table names in which an entry for the given node appears.
* ``dumpxCATdb`` : Dumps all the xCAT db tables to CSV files under the specified directory, often used to backup the xCAT database in xCAT reinstallation or management node migration.
* ``restorexCATdb`` : Restore the xCAT db tables with the CSV files under the specified directory.
For the complete reference on all the xCAT database related commands, please refer to the xCAT manpage with ``man <command>``

View File

@ -1,28 +0,0 @@
Key xCAT Tables
===============
There are many tables in the xCAT database to store various categories of information. This section only introduces several key xCAT tables which need to be initialized or viewed explicitly. For the complete reference on xCAT tables, please refer to the page <todo> or run ``tabdump -d <table name>``.
site
----
Global settings for the whole cluster. This table is different from the other tables in that each attribute is just named in the key column, rather than having a separate column for each attribute. Refer to the :doc:`Global Configuration </guides/admin-guides/basic_concepts/global_cfg/index>` page for the global attributes.
policy
------
Controls who has authority to run specific xCAT operations. It is basically the Access Control List (ACL) for xCAT. It is sorted on the priority field before evaluating. Please run ``tabdump -d policy`` for details.
passwd
------
Contains default userids and passwords for xCAT to access cluster components. In most cases, xCAT will also actually set the userid/password in the relevant component when it is being configured or installed. Userids/passwords for specific cluster components can be overridden in other tables, e.g. ``mpa`` , ``ipmi`` , ``ppchcp`` , etc.
networks
--------
Describes the networks in the cluster and info necessary to set up nodes on that network.
auditlog
--------
Contains the audit log data.
eventlog
--------
Stores the events occurred.

View File

@ -1,16 +1,70 @@
Database
========
xCAT Database
=============
xCAT stores all the persistent data including global configuration, user settings and cluster information in a database.
All of the xCAT objects and configuration data are stored in the xCAT database. By default, xCAT uses **SQLite** - a simple, self-contained database engine. More powerful open source database engines like MySQL, MariaDB and PostgreSQL are also supported for large clusters.
This section introduces some database related xCAT commands, some key xCAT database tables, and the usage of regular expressions in the xCAT database.
xCAT defines about 70 tables to store different data. You can get the xCAT database definition from the file ``/opt/xcat/lib/perl/xCAT/Schema.pm``.
For a complete reference, see the manpage for xcatdb: ``man xcatdb``
You can run the ``tabdump`` command to list all the xCAT database tables, or run ``tabdump -d <tablename>`` or ``man <tablename>`` to get the detailed column definitions of a table. ::
$ tabdump
$ tabdump site
$ tabdump -d site
$ man site
For a complete reference, see the man page for xcatdb: ``man xcatdb``.
**The tables in xCAT:**
* **site table**
Global settings for the whole cluster. This table is different from the other tables: each entry in the **site** table is a key=>value pair (see the ``tabdump`` sketch after this list). Refer to the :doc:`Global Configuration </guides/admin-guides/basic_concepts/global_cfg/index>` page for the major global attributes or run ``man site`` to get all global attributes.
* **policy table**
Controls who has authority to run specific xCAT operations. It is the Access Control List (ACL) in xCAT.
* **passwd table**
Contains default userids and passwords for xCAT to access cluster components. In most cases, xCAT will also actually set the userid/password in the relevant component (generally for SPs like the BMC and FSP) when it is being configured or installed. The default userids/passwords in the passwd table for specific cluster components can be overridden by the columns in other tables, e.g. ``mpa`` , ``ipmi`` , ``ppchcp`` , etc.
* **networks table**
Contains the network definitions in the cluster.
You can manipulate the networks through the ``*def`` commands against the **network objects**. ::
$ lsdef -t network
* **...**
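As mentioned for the **site table** above, each entry is a key=>value pair, which is easy to see in a raw table dump. The rows below are only illustrative examples: ::

$ tabdump site
#key,value,comments,disable
"domain","cluster.example.com",,
"master","10.0.0.1",,
"timezone","America/New_York",,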
**Manipulate xCAT Database Tables**
xCAT offers 5 commands to manipulate the database tables:
* ``tabdump``
Displays the header and all the rows of the specified table in CSV (comma separated values) format.
* ``tabedit``
Opens the specified table in the user's editor, allows them to edit any text, and then writes changes back to the database table. The table is flattened into a CSV (comma separated values) format file before giving it to the editor. After the editor is exited, the CSV file will be translated back into the database format.
* ``tabgrep``
Lists the table names in which an entry for the given node appears.
* ``dumpxCATdb``
Dumps all the xCAT db tables to CSV files under the specified directory, often used to backup the xCAT database in xCAT reinstallation or management node migration.
* ``restorexCATdb``
Restores the xCAT db tables from the CSV files under the specified directory (see the backup/restore example below).
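For example, a typical backup and restore cycle with the last two commands might look like this (the backup directory is just an example path): ::

$ dumpxCATdb -p /install/dbbackup
$ restorexCATdb -p /install/dbbackup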
**Advanced Topic: How to Use Regular Expressions in xCAT Tables:**
.. toctree::
:maxdepth: 2
:maxdepth: 2
dbcmd.rst
dbtables.rst
regexp_db.rst

View File

@ -1,13 +0,0 @@
Basic xCAT object types
=======================
This section introduces the description and key attributes of several basic xCAT object types. For the complete description of all the xCAT object types, please refer to the [TODO:guides/admin-guides/references/index.html#xcat-man-pages] or run command ``man xcatdb``, for details of a specified xCAT object type, please run command ``man <object type>``.
.. toctree::
:maxdepth: 2
node.rst
group.rst
osimage.rst

View File

@ -1,14 +1,176 @@
xCAT Objects Types
==================
xCAT Objects
============
xCAT abstracts several types of objects from the cluster information in the xCAT database to represent the physical or logical entities in the cluster. Each xCAT object has a set of attributes; each attribute is mapped from a specific field of an xCAT table. xCAT users can get cluster information and perform cluster management work through operations against the objects.
Basically, xCAT has 20 types of objects. They are: ::
This section presents a brief introduction to xCAT objects, including some basic object types and the xCAT commands on objects.
auditlog boottarget eventlog firmware group
kit kitcomponent kitrepo monitoring network
node notification osdistro osdistroupdate osimage
policy rack route site zone
This section will introduce several important types of objects to give you an overview of what the objects look like and how to manipulate them.
You can get the detailed description of each object type with ``man <object type>``, e.g. ``man node``.
* **node Object**
The **node** is the most important object in xCAT. Any physical server, virtual machine or SP (Service Processor for Hardware Control) can be defined as a node object.
For example, I have a physical server which has the following attributes: ::
groups: all,x86_64
The groups that this node belongs to.
arch: x86_64
The architecture of the server is x86_64.
bmc: 10.4.14.254
The IP of BMC which will be used for hardware control.
bmcusername: ADMIN
The username of bmc.
bmcpassword: admin
The password of bmc.
mac: 6C:AE:8B:1B:E8:52
The mac address of the ethernet adapter that will be used to
deploy OS for the node.
mgt: ipmi
The management method which will be used to manage the node.
This node will use ipmi protocol.
netboot: xnba
The network bootloader that will be used to deploy OS for the node.
provmethod: rhels7.1-x86_64-install-compute
The osimage that will be deployed to the node.
I want to name the node **cn1** (Compute Node #1) in xCAT, so I define this node in xCAT with the following command: ::
$ mkdef -t node cn1 groups=all,x86_64 arch=x86_64 bmc=10.4.14.254
bmcusername=ADMIN bmcpassword=admin mac=6C:AE:8B:1B:E8:52
mgt=ipmi netboot=xnba provmethod=rhels7.1-x86_64-install-compute
After the node is defined, I can use the ``lsdef`` command to display it: ::
$ lsdef cn1
Object name: cn1
arch=x86_64
bmc=10.4.14.254
bmcpassword=admin
bmcusername=ADMIN
groups=all,x86_64
mac=6C:AE:8B:1B:E8:52
mgt=ipmi
netboot=xnba
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
provmethod=rhels7.1-x86_64-install-compute
Then I can try to remotely **power on** the node **cn1**: ::
$ rpower cn1 on
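To check the result, I can also query the power state remotely (the output line shown is illustrative): ::

$ rpower cn1 stat
cn1: on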
* **group Object**
A **group** is an object which includes multiple **node objects**. When you set the **groups** attribute of a **node object** to a group name like **x86_64**, the group **x86_64** is automatically generated and the node is assigned to that group.
The benefits of using **group objects**:
* **Handle multiple nodes through group**
I defined another server **cn2** which is similar to **cn1**; now my group **x86_64** has two nodes: **cn1** and **cn2**. ::
$ lsdef -t group x86_64
Object name: x86_64
members=cn1,cn2
Then I can power on all the nodes in the group **x86_64**. ::
$ rpower x86_64 on
* **Inherit attributes from group**
If the **group object** of a **node object** has a certain attribute that the **node object** does not have, the node will inherit that attribute from its **group**.
I set the **cons** attribute for the **group object x86_64**. ::
$ chdef -t group x86_64 cons=ipmi
1 object definitions have been created or modified.
$ lsdef -t group x86_64
Object name: x86_64
cons=ipmi
members=cn1,cn2
Then I can see that **cn1** inherits the attribute **cons** from the group **x86_64**: ::
$ lsdef cn1
Object name: cn1
arch=x86_64
bmc=10.4.14.254
bmcpassword=admin
bmcusername=ADMIN
cons=ipmi
groups=all,x86_64
mac=6C:AE:8B:1B:E8:52
mgt=ipmi
netboot=xnba
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
provmethod=rhels7.1-x86_64-install-compute
It is useful to define common attributes in the **group object** so that newly added nodes will inherit them automatically. Since the attributes are defined in the **group object**, changing an attribute is easier: you don't need to touch the individual nodes.
* **Use Regular Expression to generate value for node attributes**
This is a powerful feature of xCAT: you can generate individual attribute values from the node name instead of assigning them one by one. Refer to :doc:`Use Regular Expression in xCAT Database Table <../xcat_db/regexp_db>`.
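As a minimal sketch of the idea (the attribute and the pattern here are only an illustration; see the linked section for the exact syntax), a regular expression set on the **group object** can derive each node's ``ip`` attribute from its node name: ::

$ chdef -t group x86_64 ip='|cn(\d+)|10.0.0.($1+0)|'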
* **osimage Object**
An **osimage** object represents an Operating System image which can be deployed by xCAT. xCAT automatically generates several default **osimage** objects for an Operating System when the ``copycds`` command is executed to create the package repository for that OS.
You can display all the defined **osimage** objects: ::
$ lsdef -t osimage
Display the detailed attributes of one **osimage** named **rhels7.1-x86_64-install-compute**: ::
$ lsdef -t osimage rhels7.1-x86_64-install-compute
Object name: rhels7.1-x86_64-install-compute
imagetype=linux
osarch=x86_64
osdistroname=rhels7.1-x86_64
osname=Linux
osvers=rhels7.1
otherpkgdir=/install/post/otherpkgs/rhels7.1/x86_64
pkgdir=/install/rhels7.1/x86_64
pkglist=/opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist
profile=compute
provmethod=install
synclists=/root/syncfiles.list
template=/opt/xcat/share/xcat/install/rh/compute.rhels7.tmpl
This **osimage** represents a **Linux** **rhels7.1** Operating System. The package repository is in **/install/rhels7.1/x86_64** and the packages which will be installed are listed in the file **/opt/xcat/share/xcat/install/rh/compute.rhels7.pkglist** ...
I can bind the **osimage** to the **node** when I want to deploy the **osimage rhels7.1-x86_64-install-compute** on my **node cn1**: ::
$ nodeset cn1 osimage=rhels7.1-x86_64-install-compute
Then on the next network boot, the node **cn1** will start to deploy **rhels7.1**.
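To actually trigger that network boot remotely, something like the following could be used (a minimal sketch; it assumes the BMC of **cn1** is reachable as configured above): ::

$ rsetboot cn1 net
$ rpower cn1 boot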
* **Manipulate Object**
You have already seen that I used the commands ``mkdef``, ``lsdef`` and ``chdef`` to manipulate the objects. xCAT has 4 object management commands to manage all the xCAT objects.
* ``mkdef`` : create object definitions
* ``chdef`` : modify object definitions
* ``lsdef`` : list object definitions
* ``rmdef`` : remove object definitions
To get the detailed usage of these commands, refer to their man pages, e.g. ``man mkdef``.
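For example, ``lsdef`` can display only the attributes you are interested in, and ``rmdef`` removes a definition entirely (shown here against a hypothetical scratch node named **testnode**): ::

$ lsdef cn1 -i mgt,provmethod
$ rmdef -t node testnode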
**Get Into the Detail of the xCAT Objects:**
.. toctree::
:maxdepth: 2
object_command.rst
basic_object.rst
node.rst
group.rst
osimage.rst

View File

@ -1,15 +0,0 @@
Object Commands
===============
There are 4 commands in xCAT to manage the objects:
* ``mkdef`` : create object definitions
* ``chdef`` : modify object definitions
* ``lsdef`` : list object definitions
* ``rmdef`` : remove object definitions
For details on these commands, please refer to [TODO the manpage] or run ``man <command>``.