
Merge pull request #5038 from robin2008/perf-tuning-doc

Add performance tuning best practices in RTD
This commit is contained in:
Yuan Bai
2018-04-02 17:23:05 +08:00
committed by GitHub
5 changed files with 98 additions and 0 deletions

View File

@@ -23,6 +23,7 @@ Advanced Topics
raid/index.rst
restapi/index.rst
security/index.rst
performance_tuning/index.rst
softlayer/index.rst
sysclone/index.rst
zones/index.rst

View File

@@ -0,0 +1,13 @@
Tuning the Database Server
==========================
#. MariaDB database

   MariaDB: `Tuning Server Parameters <https://mariadb.com/kb/en/library/optimization-and-tuning>`_

   According to this documentation, the two most important variables to configure are ``key_buffer_size`` and ``table_open_cache`` (see the example snippet after this list).

#. PostgreSQL database

   PostgreSQL: `Server Configuration <http://www.postgresql.org/docs/9.3/interactive/runtime-config.html>`_
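
The snippet below is only a sketch of where such MariaDB settings might live on the database server; the drop-in file name and the values are illustrative assumptions and should be sized for your own memory and workload.

::

    # hypothetical /etc/my.cnf.d/xcat-tuning.cnf
    [mysqld]
    key_buffer_size  = 256M      # index buffer; size to available memory
    table_open_cache = 1024      # number of table handles kept open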

View File

@@ -0,0 +1,13 @@
Performance Tuning
==================
xCAT supports clusters of all sizes. This document is a collection of hints, tips, and special considerations for working with large clusters, especially when a single server (management node or service node) manages more than 128 nodes.

The information in this document should be viewed as example data only. Many of the suggestions are based on anecdotal experiences and may not apply to your particular environment. Suggestions in different sections of this document may recommend different or conflicting settings, since they may have been provided by different people for different cluster environments. Often there is a significant amount of flexibility in most of these settings -- you will need to resolve these differences in a way that works best for your cluster.
.. toctree::
   :maxdepth: 2

   linux_os_tuning.rst
   xcatd_tuning.rst
   database_tuning.rst

View File

@@ -0,0 +1,46 @@
System Tuning Settings for Linux
==================================
Adjusting Operating System tunables can improve large-scale cluster performance, avoid bottlenecks, and prevent failures. The following sections are a collection of suggestions that have been gathered from various large-scale HPC clusters. You should investigate and evaluate the validity of each suggestion before applying it to your cluster.
#. Tuning Linux ulimits:

   The open file limit is important for highly concurrent network services such as ``xcatd``. For a large cluster, it is necessary to raise the open file limit to avoid **Too many open files** errors. The default value is *1024* in most OS distributions; add the configuration below to ``/etc/security/limits.conf`` to increase it to *14096* (a quick way to verify the new limit is shown at the end of this section).

   ::

      * soft nofile 14096
      * hard nofile 14096
#. Tuning Network kernel parameters:

   A large cluster may have hundreds of hosts on the network; tuning the network kernel parameters for optimum throughput and latency can improve the performance of distributed applications. For example, add the configuration below to ``/etc/sysctl.conf`` to increase the network buffers (a command to apply the changes without a reboot is shown at the end of this section).

   ::

      net.core.rmem_max = 33554432
      net.core.wmem_max = 33554432
      net.core.rmem_default = 65536
      net.core.wmem_default = 65536
      net.ipv4.tcp_rmem = 4096 33554432 33554432
      net.ipv4.tcp_wmem = 4096 33554432 33554432
      net.ipv4.tcp_mem = 33554432 33554432 33554432
      net.ipv4.route.flush = 1
      net.core.netdev_max_backlog = 1500
   If you encounter **Neighbour table overflow** errors, it means there are too many ARP requests and the server cannot reply. Tune the ARP cache with the parameters below.

   ::

      net.ipv4.conf.all.arp_filter = 1
      net.ipv4.conf.all.rp_filter = 1
      net.ipv4.neigh.default.gc_thresh1 = 30000
      net.ipv4.neigh.default.gc_thresh2 = 32000
      net.ipv4.neigh.default.gc_thresh3 = 32768
      net.ipv4.neigh.ib0.gc_stale_time = 2000000
For more tunable parameters, you can refer to `Linux System Tuning Recommendations <https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20(HPC)%20Central/page/Linux%20System%20Tuning%20Recommendations>`_.
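
The commands below are a minimal sketch of how to activate and check the changes above; they assume the files edited earlier and a standard Linux environment, and the exact output will vary by distribution.

::

    # apply the /etc/sysctl.conf changes without a reboot
    sysctl -p

    # verify the raised open file limit from a fresh login shell
    ulimit -n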

View File

@@ -0,0 +1,25 @@
Tuning xCAT Daemon Attributes
==================================
For large clusters, consider changing the default settings in the ``site`` table to improve performance, or if you are experiencing timeouts or failures in these areas:
**consoleondemand** : When set to ``yes``, conserver connects and creates the console output for a node only when the user explicitly opens the console using ``rcons`` or ``wcons``. Default is ``no`` on Linux, ``yes`` on AIX. Setting this to ``yes`` can reduce the load conserver places on your xCAT management node. If you need this set to ``no``, you may then need to consider setting up multiple servers to run the conserver daemon, and specify the correct server on a per-node basis by setting each node's ``conserver`` attribute.
**nodestatus** : If set to ``n``, the ``nodelist.status`` column will not be updated during the node deployment, node discovery and power operations. Default is ``y``, always update ``nodelist.status``. Setting this to ``n`` for large clusters can eliminate one node-to-server contact and one xCAT database write operation for each node during node deployment, but you will then need to determine deployment status through some other means.
**precreatemypostscripts** : (``yes/1`` or ``no/0``, only for Linux). Default is ``no``. If ``yes``, it will instruct xcat at ``nodeset`` and ``updatenode`` time to query the database once for all of the nodes passed into the command and create the ``mypostscript`` file for each node, and put them in a directory in ``site.tftpdir`` (such as: ``/tftpboot``). This prevents ``xcatd`` from having to create the ``mypostscript`` files one at a time when each deploying node contacts it, so it will speed up the deployment process. (But it also means that if you change database values for these nodes, you must rerun ``nodeset``.) If **precreatemypostscripts** is set to ``no``, the ``mypostscript`` files will not be generated ahead of time. Instead they will be generated when each node is deployed.
**svloglocal** : If set to ``1``, syslog on the service node will not be forwarded to the management node. The default is to forward all syslog messages. The tradeoff on setting this attribute is reducing network traffic and log size versus having local management node access to all system messages from across the cluster.
**skiptables** : a comma separated list of tables to be skipped by ``dumpxCATdb``. A recommended setting is ``auditlog,eventlog`` because these tables can grow very large. Default is to skip no tables.
**dhcplease** : The lease time for the DHCP client. The default value is *43200*.
**xcatmaxconnections** : Number of concurrent xCAT protocol requests before requests begin queueing. This applies to both client command requests and node requests, e.g. to get postscripts. Default is ``64``.
**xcatmaxbatchconnections** : Number of concurrent xCAT connections allowed from the nodes. Number must be less than **xcatmaxconnections**.
**useflowcontrol** : If ``yes``, the postscript processing on each node contacts ``xcatd`` on the MN/SN using a lightweight UDP packet to wait until ``xcatd`` is ready to handle the requests associated with postscripts. This prevents deploying nodes from flooding ``xcatd`` and locking out admin interactive use. This value works with the **xcatmaxconnections** and **xcatmaxbatchconnections** attributes. If the value is ``no``, nodes sleep for a random time before contacting ``xcatd``, and retry. The default is ``no``. Not supported on AIX.
These attributes may be changed based on the size of your cluster. For a large cluster, it is better to enable **useflowcontrol** and set ``xcatmaxconnections = 128`` and ``xcatmaxbatchconnections = 100``. The daemon will then only allow 100 concurrent connections from the nodes, leaving 28 connections available on the management node for xCAT commands (e.g. ``nodels``). A sketch of applying these settings is shown below.
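
As an illustration only, these values could be set with the ``chdef`` command against the site object (conventionally named ``clustersite``); verify the object name and attribute names against your xCAT release before applying.

::

    chdef -t site clustersite useflowcontrol=yes \
          xcatmaxconnections=128 xcatmaxbatchconnections=100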