{{:Design Warning}}
Internal Code Changes of xCAT
The following sections are for the internal code changes.
Schema.pm
Add the dump attribute to the linuximage schema. Users can then set or change the dump attribute for an image with the chdef command.
"genimage" command
Disable the kdump service by default.
chroot $rootimg_dir chkconfig kdump off
If "fsck.nfs" does not exist in the root image, create a fake fsck.nfs command that always returns true.
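The stub creation can be sketched as follows. This is a minimal illustration, not the genimage code itself: the function name and the sbin/ location inside the root image are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: create a no-op fsck.nfs inside a root image.
# The sbin/ location is an assumption; genimage may place it elsewhere.
create_fake_fsck_nfs() {
    rootimg_dir=$1
    stub="$rootimg_dir/sbin/fsck.nfs"
    # Leave an existing fsck.nfs untouched.
    if [ -e "$stub" ]; then
        return 0
    fi
    mkdir -p "$rootimg_dir/sbin" || return 1
    # The stub only has to exit 0 so that callers see success.
    printf '#!/bin/sh\nexit 0\n' > "$stub" || return 1
    chmod 755 "$stub"
}
```

It would be called with the image root, for example create_fake_fsck_nfs "$rootimg_dir".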
anaconda.pm / sles.pm
Update the code that handles
nodeset <noderange> osimage=<osimagename>
If the dump attribute is set for the corresponding image, add the kernel parameter
crashkernel=128M@32M
to the boot config file (for platforms using "yaboot", the config file is /tftpboot/etc/<nodename>), and then append a second kernel parameter:
dump=<dump value>
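The effect on the kernel command line can be illustrated with a short shell sketch. The real logic lives in the Perl plugins anaconda.pm and sles.pm; the function name and the sample dump value below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append the kdump parameters to an existing
# kernel command line when the image's dump attribute is set.
append_kdump_args() {
    kcmdline=$1
    dump_value=$2
    # No dump attribute: leave the command line unchanged.
    if [ -z "$dump_value" ]; then
        printf '%s\n' "$kcmdline"
        return 0
    fi
    printf '%s crashkernel=128M@32M dump=%s\n' "$kcmdline" "$dump_value"
}

# Example with an invented dump value:
append_kdump_args "console=tty0" "nfs://10.1.0.1/install/kdump"
# prints: console=tty0 crashkernel=128M@32M dump=nfs://10.1.0.1/install/kdump
```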
Postscript enablekdump
When the node boots, the enablekdump postscript starts the kdump service. For RHEL6, it also applies a workaround to generate the initial ramdisk for kdump. The postscript parses /proc/cmdline; if dump= is found, its value is parsed and used to update the "/etc/kdump.conf" file. After /etc/kdump.conf is updated, the kdump service is started with the command:
/etc/init.d/kdump start
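The parsing step can be sketched as below, assuming the kernel command line is read from /proc/cmdline (the standard Linux location). The function name is hypothetical, and the optional file argument exists only so the sketch can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of dump= from a kernel command line.
# Reads /proc/cmdline by default; another file can be passed for testing.
get_dump_value() {
    cmdline_file=${1:-/proc/cmdline}
    # Put each parameter on its own line, then keep the first dump= value.
    value=$(tr -s ' ' '\n' < "$cmdline_file" | sed -n 's/^dump=//p' | head -n 1)
    # A missing dump= parameter is reported through the exit status.
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}
```

A postscript could then guard the rest of its work with a check such as: if dump=$(get_dump_value); then ... fi.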
For SLES11, a workaround is also needed to generate the initial ramdisk for kdump. The postscript parses /proc/cmdline; if dump= is found, its value is parsed and used to update the "/etc/sysconfig/kdump" file. After "/etc/sysconfig/kdump" is updated, the kdump service is started with the command:
/etc/init.d/boot.kdump start
Workaround for RHEL6
Before the kdump service is started, an NFS directory is mounted on /var/tmp, which the mkdumprd command uses as a temporary directory when it generates the initial ramdisk for kdump. The NFS directory is readable and writable. The $xcatmaster:/install/kdump/tmp directory is created when the xCAT package is installed; because the /install directory is exported by default, $xcatmaster:/install/kdump/tmp is readable and writable too. After the kdump service has started successfully, the NFS directory is unmounted from /var/tmp, so the workaround does not affect the running node.
For rhels6.1, the kdump service needs /tmp instead of /var/tmp for this workaround.
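The mount, start, and unmount sequence might look like the sketch below. Both function names are hypothetical, the osver strings are assumed to follow xCAT naming (rhels6, rhels6.1), and $xcatmaster is assumed to be set in the postscript environment. The second function needs root on the node.

```shell
#!/bin/sh
# Hypothetical helper: choose the scratch directory for the workaround.
# Only rhels6.1 is mapped to /tmp, as stated in the text above.
kdump_tmp_dir() {
    case $1 in
        rhels6.1) echo /tmp ;;
        *)        echo /var/tmp ;;
    esac
}

# Sketch of the sequence: mount the NFS directory, start kdump, unmount.
# Assumes $xcatmaster is set and the caller is root.
start_kdump_with_nfs_tmp() {
    tmpdir=$(kdump_tmp_dir "$1")
    mount -t nfs "$xcatmaster:/install/kdump/tmp" "$tmpdir" || return 1
    /etc/init.d/kdump start
    rc=$?
    # Unmount regardless of the result so the node is left unchanged.
    umount "$tmpdir"
    return $rc
}
```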
The enablekdump postscript adds link_delay = 180 to /etc/kdump.conf. Some network cards take a long time to initialize, and some spanning-tree-enabled networks do not transmit user traffic for long periods after a link state change. This optional parameter defines how long the initramfs waits after a link is activated before it attempts to transmit user data.
Workaround for SLES11
On SLES, the boot.kdump service is configured through the /etc/sysconfig/kdump file. The boot.kdump script under /etc/init.d calls mkdumprd -K "$kdump_kernel" -I "$kdump_initrd" -q to create the initrd used by kdump (called kdumpinit here). mkdumprd in turn calls /sbin/mkinitrd to create kdumpinit. mkinitrd only works for diskful installs; it does not take the diskless install scenario into account. /sbin/mkinitrd runs all of the shell scripts under /lib/mkinitrd/setup to generate kdumpinit, and packs all of the scripts under /lib/mkinitrd/boot into it. To simulate a crash, run:
echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger
The kdumpinit generated by /sbin/mkinitrd contains all of the shell scripts under /lib/mkinitrd/boot, and all of these scripts end up in its init. Two of them are special: 83-mount.sh mounts and checks the root device, and 84-remount.sh mounts the root file system and runs the init from the root file system instead of the normal init binary. These two scripts are the cause of the problem. For a diskless install, the root file system is tmpfs and has no corresponding device, so the boot hangs with an error when 83-mount.sh runs. When dumping to a remote server, the root file system is not needed; the initrd alone is enough, so there is no need to pack these two scripts into the initrd. The workaround is to rename the two scripts so that they are not packed into the initrd, and to change the names back once the initrd has been created. With no root device discovery or checking step, the 91-kdump.sh script runs correctly and the dump succeeds.
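The rename-and-restore step can be sketched as a small wrapper. The function name and the .xcat-skip suffix are hypothetical; whatever suffix is used must not match the pattern mkinitrd uses to collect scripts.

```shell
#!/bin/sh
# Hypothetical helper: run a command while 83-mount.sh and 84-remount.sh
# are renamed out of the way, then restore the original names.
# $1 is the script directory; the remaining arguments are the command.
run_without_mount_scripts() {
    boot_dir=$1
    shift
    for s in 83-mount.sh 84-remount.sh; do
        # -L also catches symlinks, in case the entries are links.
        if [ -e "$boot_dir/$s" ] || [ -L "$boot_dir/$s" ]; then
            mv "$boot_dir/$s" "$boot_dir/$s.xcat-skip"
        fi
    done
    "$@"
    rc=$?
    for s in 83-mount.sh 84-remount.sh; do
        if [ -e "$boot_dir/$s.xcat-skip" ] || [ -L "$boot_dir/$s.xcat-skip" ]; then
            mv "$boot_dir/$s.xcat-skip" "$boot_dir/$s"
        fi
    done
    # Report the wrapped command's exit status to the caller.
    return $rc
}
```

On a node this could wrap the initrd generation, for example run_without_mount_scripts /lib/mkinitrd/boot mkdumprd -K "$kdump_kernel" -I "$kdump_initrd" -q.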
Questions
In a hierarchical diskless environment, the /install directory of the Service Node is mounted from the Management Node. When the node starts up, the $xcatmaster:/install/kdump/tmp directory cannot be mounted, because NFS refuses to re-export a directory that is itself an NFS mount. How should this scenario be handled?
Source Files involved
xCAT/xCAT.spec
perl-xCAT/xCAT/Schema.pm
xCAT-server/share/xcat/netboot/rh/genimage
xCAT-server/share/xcat/netboot/add-on/statelite/rc.statelite
xCAT-server/lib/xcat/plugins/anaconda.pm
xCAT-server/lib/xcat/plugins/sles.pm
xCAT/postscripts/enablekdump
Other Design Considerations
- Required reviewers: Bruce Potter
- Required approvers: Bruce Potter
- Database schema changes: N/A
- Effect on other components: N/A
- External interface changes, documentation, and usability issues: N/A
- Packaging, installation, dependencies: N/A
- Portability and platforms (HW/SW) supported: N/A
- Performance and scaling considerations: N/A
- Migration and coexistence: N/A
- Serviceability: N/A
- Security: N/A
- NLS and accessibility: N/A
- Invention protection: N/A