Introduction
A new console server facility, goconserver, has been introduced. It is developed by CHENG Long. goconserver is intended to replace the existing console server, conserver, which has been used as a part of xCAT for a decade.
The main purpose of this replacement is to overcome the functional problems and performance issues of the existing conserver.
Existing Problems of conserver
- It costs around 4 MiB of memory for each compute node. Thus, roughly 8 GiB of memory is consumed for 2,000 compute nodes.
- When a configuration change has to be made, the daemon needs to be restarted. This interrupts the console connections of all compute nodes. See issue #4043.
- When a compute node is disconnected, conserver keeps retrying every 3 seconds and generates a great deal of logs.
  - Around 150 bytes/node/3 sec, which roughly equals 4 MiB/node/day. With many disconnected nodes, this fills up the /var file system very quickly.
- When the ssh authentication key is not configured on the OpenBMC side, there is no way to pass ssh authentication with a password. Thus, in this situation, the console server fails to work at all. See issue #4124.
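The memory and log-volume figures in the list above can be re-derived with a few lines of arithmetic. The following is a minimal sketch; the function names are illustrative, and the inputs are simply the numbers quoted above (4 MiB per node, 150 bytes per retry, one retry every 3 seconds).

```python
# Back-of-the-envelope estimates for the conserver figures quoted above.
MIB = 1024 ** 2
GIB = 1024 ** 3


def memory_total_gib(nodes, mib_per_node=4):
    """Total daemon memory in GiB, assuming a fixed per-node cost."""
    return nodes * mib_per_node * MIB / GIB


def retry_log_mib_per_day(bytes_per_retry=150, retry_interval_sec=3):
    """Log volume in MiB per day for one disconnected node."""
    retries_per_day = 24 * 60 * 60 / retry_interval_sec
    return bytes_per_retry * retries_per_day / MIB


if __name__ == "__main__":
    print(f"memory for 2,000 nodes: {memory_total_gib(2000):.2f} GiB")
    print(f"log per disconnected node: {retry_log_mib_per_day():.2f} MiB/day")
```

With these inputs, 2,000 nodes come to about 7.8 GiB, and one disconnected node produces a little over 4 MiB of log per day.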
Test Strategy
Function Verification Test
Scenario 1 - Normal console functionality
Test whether the console works normally under the following conditions:
- Console against OpenPOWER machine via OpenBMC
- Console against OpenPOWER machine via IPMI
- Console against KVM guest via ssh to KVM host
- Console against x86-64 machine via IPMI
- Console against IBM PowerVM LPAR via HMC
Scenario 2 - Recovery
- Restart a compute node
- Restart the OpenBMC on an OpenPOWER machine with OpenBMC
- Restart the BMC on an OpenPOWER machine with IPMI
- Disconnect the network between the console server and the OpenBMC/BMC
Scenario 3 - Multiplex
- Multiple users connect to the console of the same compute node at the same time
Scenario 4 - Stability
- Leave a compute node with no console output for a fairly long period of time, say 10 days
Performance Test
Scenario 1 - Memory Cost
- Measure the memory consumption of goconserver for
  - 1 compute node,
  - 2 compute nodes,
  - 5 compute nodes,
  - 10 compute nodes,
  - 100 compute nodes.
Scenario 2 - Number of Open File Handles
- Measure the number of open file handles goconserver uses for
  - 1 compute node,
  - 2 compute nodes,
  - 5 compute nodes,
  - 10 compute nodes,
  - 100 compute nodes.
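On Linux, one possible way to take the two measurements above is to read the /proc file system for the daemon's process ID. The sketch below is only an illustration, not part of the test plan itself: the helper names are hypothetical, it assumes a Linux /proc layout, and counting another user's file descriptors requires sufficient privileges.

```python
import os
import re


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value in KiB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vm_rss_kib(pid):
    """Read the resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kib(status_file.read())


def count_open_fds(pid):
    """Count the open file descriptors of a running process (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))
```

Calling read_vm_rss_kib(pid) and count_open_fds(pid) with the daemon's PID after configuring 1, 2, 5, 10, and 100 nodes gives one data point per step, from which the per-node cost can be estimated.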
Stress and Volume Test
Scenario 1 - Throughput and IOPS of hard disk drive may be a bottleneck
For a regular 115,200 baud serial console port with the common settings of 1 start bit, 8 data bits, no parity, and 1 stop bit, each byte takes 10 bits on the wire, so the port may generate up to 11,520 bytes per second. For 2,000 compute nodes, this adds up to around
11,520 bytes/sec x 2,000 = 23,040,000 bytes/sec = 21.97 MiB/s
For a typical enterprise-level mechanical hard disk drive, the 4K random write speed is around 20 MiB/s, so the throughput may not be enough.
The IOPS of a typical enterprise-level mechanical hard disk drive is around 175, so writing 2,000 console log files may also be a challenge.
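The throughput estimate above follows directly from the serial framing. As a small sketch (function and parameter names are illustrative), the same calculation can be parameterized to try other baud rates or node counts:

```python
MIB = 1024 ** 2


def serial_bytes_per_sec(baud=115200, start_bits=1, data_bits=8,
                         parity_bits=0, stop_bits=1):
    """Maximum payload bytes per second for an asynchronous serial line."""
    bits_per_frame = start_bits + data_bits + parity_bits + stop_bits
    return baud / bits_per_frame


def aggregate_mib_per_sec(nodes, bytes_per_sec_per_node):
    """Combined console output rate of all nodes, in MiB/s."""
    return nodes * bytes_per_sec_per_node / MIB


if __name__ == "__main__":
    per_node = serial_bytes_per_sec()
    total = aggregate_mib_per_sec(2000, per_node)
    print(f"{per_node:.0f} bytes/sec per node, {total:.2f} MiB/s in total")
```

This is a worst-case figure: it assumes every console is transmitting at full line rate at the same time.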
- Test against 2,000 compute nodes, each of them generating 11,520 bytes of console output per second.
- Measure the memory consumption
- Measure the CPU usage
- Count the number of child processes the daemon spawns
- See if the daemon runs stably
- See if the I/O throughput is good enough
See what happens if the I/O throughput exceeds the capability of the hard disk drive.
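To drive this scenario, each simulated node needs a source that emits output at a steady 11,520 bytes per second. The following is a minimal sketch of such a paced writer, under the assumption that its output stream is somehow attached to the node's console; how that attachment is done is outside this sketch, and the function names are hypothetical.

```python
import time


def chunk_sizes(bytes_per_sec, ticks_per_sec):
    """Split one second's worth of bytes into per-tick sizes that sum exactly."""
    base, extra = divmod(bytes_per_sec, ticks_per_sec)
    return [base + 1 if i < extra else base for i in range(ticks_per_sec)]


def emit(stream, bytes_per_sec=11520, ticks_per_sec=10, seconds=1):
    """Write filler lines to stream at a steady rate for the given duration."""
    sizes = chunk_sizes(bytes_per_sec, ticks_per_sec)
    tick = 1.0 / ticks_per_sec
    next_deadline = time.monotonic()
    for _ in range(seconds):
        for size in sizes:
            # Each chunk is a run of 'x' ending in a newline, size bytes long.
            payload = ("x" * (size - 1) + "\n") if size > 0 else ""
            stream.write(payload)
            stream.flush()
            # Sleep until the next tick, compensating for time spent writing.
            next_deadline += tick
            delay = next_deadline - time.monotonic()
            if delay > 0:
                time.sleep(delay)
```

For example, emit(sys.stdout, seconds=60) writes one minute of output at the full line rate. Pacing against a monotonic deadline rather than sleeping a fixed interval keeps the average rate from drifting when individual writes are slow.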
Scenario 2 - Limit of Open Files
The default ulimit of open files is 1024. For 2,000 nodes, the console server daemon may need more than 1024 file handles.
- Test against 2,000 compute nodes and see what happens if the number of files opened by the daemon reaches the upper limit. Does the error handling work well?
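Before running this scenario, it can help to check how the current limit compares with the expected demand. The sketch below uses Python's standard resource module to read the soft limit of the calling process. The per-node and base handle counts are assumptions made up for illustration, not measured goconserver values, and should be replaced with the results of the open file handle measurements.

```python
import resource


def estimated_handles(nodes, handles_per_node=2, base_handles=16):
    """Rough demand estimate; the default counts are assumed, not measured."""
    return nodes * handles_per_node + base_handles


def nofile_limit_is_sufficient(nodes, soft_limit=None, **estimate_args):
    """Compare the estimated demand with the soft RLIMIT_NOFILE value."""
    if soft_limit is None:
        soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return estimated_handles(nodes, **estimate_args) <= soft_limit
```

Note that the limit reported here is that of the process running the check. The daemon's own limit depends on how it is started.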
Scenario 3 - Out of Memory
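See what happens if malloc() fails.
- Set a lower max memory size with ulimit -m. Many systems do not honor this resident set size limit, in which case limiting the virtual memory size with ulimit -v is an alternative.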
Environment Requirements