Running xhpl on Raspberry Pi 4

Setup

  • OS

    Raspberry Pi OS (32-bit)
    
  • Kernel

    Linux pi04.arif.local 5.4.44-v7l+ #1320 SMP Wed Jun 3 16:13:10 BST 2020 armv7l GNU/Linux
    
  • Bootloader

    root@pi04:~# vcgencmd bootloader_version 
    May 27 2020 18:47:29
    version d648db3968cd31d4948341e09cb8a925c49d2ea1 (release)
    timestamp 1590601649
    

Everything else is stock.

  • Download the latest MPICH (3.3.2 here), then compile and install it:

    tar xfz mpich-3.3.2.tar.gz
    cd mpich-3.3.2
    ./configure --prefix=/opt/mpich/3.3.2
    make -j 3
    sudo make install
    
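  • Download the latest OpenBLAS (develop branch), then build and install it. The default install prefix is /opt/OpenBLAS, which the makefile below expects: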

    unzip OpenBLAS.zip
    cd OpenBLAS-develop
    make -j 3
    sudo make install
    
  • Download the latest HPL (2.3) and extract it under ~/hpl, since the makefile below sets TOPdir = $(HOME)/hpl/hpl-2.3:

    tar xfz hpl-2.3.tar.gz
    cd hpl-2.3
    

HPL.dat

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any) 
6            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
19008         Ns
1            # of NBs
192           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB
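A common way to choose N (a rule of thumb, not something stated for this run) is to let the N × N double-precision matrix fill a chosen fraction of memory and then round N down to a multiple of NB. The sketch below assumes 8 bytes per matrix element and uses a hypothetical `mem_fraction` knob that HPL itself does not have. For the N used here, the matrix alone takes 19008² × 8 = 2,890,432,512 bytes (about 2.69 GiB), which is the bulk of the memory the run used.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL solves in double precision


def matrix_bytes(n: int) -> int:
    """Memory for the N x N matrix A alone (ignores HPL's smaller work arrays)."""
    return n * n * BYTES_PER_DOUBLE


def suggest_n(mem_bytes: int, nb: int, mem_fraction: float = 0.8) -> int:
    """Largest multiple of nb whose matrix fits in mem_fraction of mem_bytes.

    mem_fraction is a hypothetical tuning knob, not an HPL.dat parameter.
    """
    n = math.isqrt(int(mem_fraction * mem_bytes) // BYTES_PER_DOUBLE)
    return (n // nb) * nb


if __name__ == "__main__":
    print(matrix_bytes(19008))             # 2890432512 bytes for this run's N
    print(suggest_n(8 * 1024**3, nb=192))  # 29184 for an 8 GiB board at 80%
```

On a 32-bit (armv7l) system like this one, a single process generally cannot address much more than about 3 GiB, so the 8 GiB suggestion (N = 29184, a matrix of roughly 6.8 GB) would most likely not fit in one xhpl process with P = Q = 1. Treat that limit as an assumption to verify on your own setup.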

Make.rpi4-mpich

#
#  -- High Performance Computing Linpack Benchmark (HPL)
#     HPL - 2.3 - December 2, 2018
#     Antoine P. Petitet
#     University of Tennessee, Knoxville
#     Innovative Computing Laboratory
#     (C) Copyright 2000-2008 All Rights Reserved
#
#  -- Copyright notice and Licensing terms:
#
#  Redistribution  and  use in  source and binary forms, with or without
#  modification, are  permitted provided  that the following  conditions
#  are met:
#
#  1. Redistributions  of  source  code  must retain the above copyright
#  notice, this list of conditions and the following disclaimer.
#
#  2. Redistributions in binary form must reproduce  the above copyright
#  notice, this list of conditions,  and the following disclaimer in the
#  documentation and/or other materials provided with the distribution.
#
#  3. All  advertising  materials  mentioning  features  or  use of this
#  software must display the following acknowledgement:
#  This  product  includes  software  developed  at  the  University  of
#  Tennessee, Knoxville, Innovative Computing Laboratory.
#
#  4. The name of the  University,  the name of the  Laboratory,  or the
#  names  of  its  contributors  may  not  be used to endorse or promote
#  products  derived   from   this  software  without  specific  written
#  permission.
#
#  -- Disclaimer:
#
#  THIS  SOFTWARE  IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES,  INCLUDING,  BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY
#  OR  CONTRIBUTORS  BE  LIABLE FOR ANY  DIRECT,  INDIRECT,  INCIDENTAL,
#  SPECIAL,  EXEMPLARY,  OR  CONSEQUENTIAL DAMAGES  (INCLUDING,  BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA OR PROFITS; OR BUSINESS INTERRUPTION)  HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT,  STRICT LIABILITY,  OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# ######################################################################
#
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL        = /bin/sh
#
CD           = cd
CP           = cp
LN_S         = ln -fs
MKDIR        = mkdir -p
RM           = /bin/rm -f
TOUCH        = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH         = rpi4-mpich
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir       = $(HOME)/hpl/hpl-2.3
INCdir       = $(TOPdir)/include
BINdir       = $(TOPdir)/bin/$(ARCH)
LIBdir       = $(TOPdir)/lib/$(ARCH)
#
HPLlib       = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - Message Passing library (MPI) --------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the  C  compiler where to find the Message Passing library
# header files,  MPlib  is defined  to be the name of  the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib. 
#
MPdir        = /opt/mpich/3.3.2
MPinc        = -I$(MPdir)/include
MPlib        = $(MPdir)/lib/libmpi.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the  C  compiler where to find the Linear Algebra  library
# header files,  LAlib  is defined  to be the name of  the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir        = /opt/OpenBLAS
LAinc        = $(LAdir)/include
LAlib        = $(LAdir)/lib/libopenblas.a -lpthread
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section  if and only if  you are not planning to use
# a  BLAS  library featuring a Fortran 77 interface.  Otherwise,  it  is
# necessary  to  fill out the  F2CDEFS  variable  with  the  appropriate
# options.  **One and only one**  option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_              : all lower case and a suffixed underscore  (Suns,
#                       Intel, ...),                           [default]
# -DNoChange          : all lower case (IBM RS6000),
# -DUpCase            : all upper case (Cray),
# -DAdd__             : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int   : Fortran 77 INTEGER is a C int,         [default]
# -DF77_INTEGER=long  : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle    : The string address is passed at the string loca-
#                       tion on the stack, and the string length is then
#                       passed as  an  F77_INTEGER  after  all  explicit
#                       stack arguments,                       [default]
# -DStringStructPtr   : The address  of  a  structure  is  passed  by  a
#                       Fortran 77  string,  and the structure is of the
#                       form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal   : A structure is passed by value for each  Fortran
#                       77 string,  and  the  structure is  of the form:  
#                       struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle   : Special option for  Cray  machines,  which  uses
#                       Cray  fcd  (fortran  character  descriptor)  for
#                       interoperation.
#
F2CDEFS      = 
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) -I$(LAinc) $(MPinc)
HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib) -lrt -lbacktrace
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L           force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS       call the cblas interface;
# -DHPL_CALL_VSIPL       call the vsip  library;
# -DHPL_DETAILED_TIMING  enable detailed timers;
#
# By default HPL will:
#    *) not copy L before broadcast,
#    *) call the BLAS Fortran 77 interface,
#    *) not display detailed timing information.
#
HPL_OPTS     = -DHPL_DETAILED_TIMING -DHPL_PROGRESS_REPORT -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC       = gcc
CCNOOPT  = $(HPL_DEFS)
CCFLAGS  = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
# On some platforms,  it is necessary  to use the Fortran linker to find
# the Fortran internals used in the BLAS library.
#
LINKER       = $(CC)
LINKFLAGS    = $(CCFLAGS)
#
ARCHIVER     = ar
ARFLAGS      = r
RANLIB       = echo
#
# ----------------------------------------------------------------------

Compiling and running

Copy Make.rpi4-mpich into ~/hpl/hpl-2.3 (the top-level HPL directory) and compile from there:

make arch=rpi4-mpich

xhpl reads HPL.dat from the current working directory. My HPL.dat was in ~/hpl, so I ran the benchmark from ~/hpl as follows:

OMP_NUM_THREADS=4 ./hpl-2.3/bin/rpi4-mpich/xhpl

Finally, here is the result for the 8GB board in the environment described above; the run used about 2.82GB of memory.

================================================================================
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   19008
NB     :     192
PMAP   : Row-major process mapping
P      :       1
Q      :       1
PFACT  :   Right
NBMIN  :       4
NDIV   :       2
RFACT  :   Crout
BCAST  :  1ringM
DEPTH  :       1
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

Column=000000192 Fraction= 1.0% Gflops=1.352e+01
Column=000000384 Fraction= 2.0% Gflops=1.353e+01
Column=000000576 Fraction= 3.0% Gflops=1.355e+01
Column=000000768 Fraction= 4.0% Gflops=1.358e+01
Column=000000960 Fraction= 5.1% Gflops=1.360e+01
Column=000001152 Fraction= 6.1% Gflops=1.359e+01
Column=000001344 Fraction= 7.1% Gflops=1.359e+01
Column=000001536 Fraction= 8.1% Gflops=1.357e+01
Column=000001728 Fraction= 9.1% Gflops=1.357e+01
Column=000001920 Fraction=10.1% Gflops=1.356e+01
Column=000002112 Fraction=11.1% Gflops=1.356e+01
Column=000002304 Fraction=12.1% Gflops=1.356e+01
Column=000002496 Fraction=13.1% Gflops=1.355e+01
Column=000002688 Fraction=14.1% Gflops=1.355e+01
Column=000002880 Fraction=15.2% Gflops=1.353e+01
Column=000003072 Fraction=16.2% Gflops=1.353e+01
Column=000003264 Fraction=17.2% Gflops=1.353e+01
Column=000003456 Fraction=18.2% Gflops=1.352e+01
Column=000003648 Fraction=19.2% Gflops=1.352e+01
Column=000003840 Fraction=20.2% Gflops=1.352e+01
Column=000004032 Fraction=21.2% Gflops=1.351e+01
Column=000004224 Fraction=22.2% Gflops=1.351e+01
Column=000004416 Fraction=23.2% Gflops=1.351e+01
Column=000004608 Fraction=24.2% Gflops=1.350e+01
Column=000004800 Fraction=25.3% Gflops=1.350e+01
Column=000004992 Fraction=26.3% Gflops=1.349e+01
Column=000005184 Fraction=27.3% Gflops=1.349e+01
Column=000005376 Fraction=28.3% Gflops=1.349e+01
Column=000005568 Fraction=29.3% Gflops=1.348e+01
Column=000005760 Fraction=30.3% Gflops=1.348e+01
Column=000005952 Fraction=31.3% Gflops=1.347e+01
Column=000006144 Fraction=32.3% Gflops=1.346e+01
Column=000006336 Fraction=33.3% Gflops=1.346e+01
Column=000006528 Fraction=34.3% Gflops=1.345e+01
Column=000006720 Fraction=35.4% Gflops=1.345e+01
Column=000006912 Fraction=36.4% Gflops=1.344e+01
Column=000007104 Fraction=37.4% Gflops=1.344e+01
Column=000007296 Fraction=38.4% Gflops=1.343e+01
Column=000007488 Fraction=39.4% Gflops=1.343e+01
Column=000007680 Fraction=40.4% Gflops=1.343e+01
Column=000007872 Fraction=41.4% Gflops=1.343e+01
Column=000008064 Fraction=42.4% Gflops=1.342e+01
Column=000008256 Fraction=43.4% Gflops=1.342e+01
Column=000008448 Fraction=44.4% Gflops=1.341e+01
Column=000008640 Fraction=45.5% Gflops=1.341e+01
Column=000008832 Fraction=46.5% Gflops=1.341e+01
Column=000009024 Fraction=47.5% Gflops=1.340e+01
Column=000009216 Fraction=48.5% Gflops=1.340e+01
Column=000009408 Fraction=49.5% Gflops=1.340e+01
Column=000009600 Fraction=50.5% Gflops=1.339e+01
Column=000009792 Fraction=51.5% Gflops=1.339e+01
Column=000009984 Fraction=52.5% Gflops=1.339e+01
Column=000010176 Fraction=53.5% Gflops=1.339e+01
Column=000010368 Fraction=54.5% Gflops=1.338e+01
Column=000010560 Fraction=55.6% Gflops=1.338e+01
Column=000010752 Fraction=56.6% Gflops=1.338e+01
Column=000010944 Fraction=57.6% Gflops=1.337e+01
Column=000011136 Fraction=58.6% Gflops=1.337e+01
Column=000011328 Fraction=59.6% Gflops=1.337e+01
Column=000011520 Fraction=60.6% Gflops=1.336e+01
Column=000011712 Fraction=61.6% Gflops=1.336e+01
Column=000011904 Fraction=62.6% Gflops=1.336e+01
Column=000012096 Fraction=63.6% Gflops=1.335e+01
Column=000012288 Fraction=64.6% Gflops=1.335e+01
Column=000012480 Fraction=65.7% Gflops=1.335e+01
Column=000012672 Fraction=66.7% Gflops=1.334e+01
Column=000012864 Fraction=67.7% Gflops=1.334e+01
Column=000013056 Fraction=68.7% Gflops=1.334e+01
Column=000013248 Fraction=69.7% Gflops=1.333e+01
Column=000013440 Fraction=70.7% Gflops=1.333e+01
Column=000013632 Fraction=71.7% Gflops=1.333e+01
Column=000013824 Fraction=72.7% Gflops=1.333e+01
Column=000014016 Fraction=73.7% Gflops=1.332e+01
Column=000014208 Fraction=74.7% Gflops=1.332e+01
Column=000014400 Fraction=75.8% Gflops=1.332e+01  
Column=000014592 Fraction=76.8% Gflops=1.331e+01
Column=000014784 Fraction=77.8% Gflops=1.331e+01
Column=000014976 Fraction=78.8% Gflops=1.331e+01
Column=000015168 Fraction=79.8% Gflops=1.331e+01
Column=000015360 Fraction=80.8% Gflops=1.330e+01
Column=000015552 Fraction=81.8% Gflops=1.330e+01
Column=000015744 Fraction=82.8% Gflops=1.330e+01
Column=000015936 Fraction=83.8% Gflops=1.330e+01
Column=000016128 Fraction=84.8% Gflops=1.330e+01
Column=000016320 Fraction=85.9% Gflops=1.330e+01
Column=000016512 Fraction=86.9% Gflops=1.329e+01
Column=000016704 Fraction=87.9% Gflops=1.329e+01
Column=000016896 Fraction=88.9% Gflops=1.329e+01
Column=000017088 Fraction=89.9% Gflops=1.329e+01
Column=000017280 Fraction=90.9% Gflops=1.329e+01
Column=000017472 Fraction=91.9% Gflops=1.329e+01
Column=000017664 Fraction=92.9% Gflops=1.329e+01
Column=000017856 Fraction=93.9% Gflops=1.328e+01
Column=000018048 Fraction=94.9% Gflops=1.328e+01
Column=000018240 Fraction=96.0% Gflops=1.328e+01
Column=000018432 Fraction=97.0% Gflops=1.328e+01
Column=000018624 Fraction=98.0% Gflops=1.328e+01
Column=000018816 Fraction=99.0% Gflops=1.328e+01
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR11C2R4       19008   192     1     1             345.24             1.3263e+01
HPL_pdgesv() start time Fri Jun  5 16:39:38 2020

HPL_pdgesv() end time   Fri Jun  5 16:45:23 2020

--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV--VVV-
Max aggregated wall time rfact . . . :               9.47
+ Max aggregated wall time pfact . . :               3.14
+ Max aggregated wall time mxswp . . :               0.64
Max aggregated wall time update  . . :             335.26
+ Max aggregated wall time laswp . . :              10.44
Max aggregated wall time up tr sv  . :               0.49
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   9.59421981e-04 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================
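The T/V column in the output above encodes the algorithmic variant. Reading WR11C2R4 against the parameter list printed by xhpl suggests the layout: timer (W = wall time), PMAP, DEPTH, BCAST digit, RFACT, NDIV, PFACT, NBMIN. The decoder below is a sketch based on that reading; it assumes every field is a single character (true for this run) and reuses the BCAST abbreviations from the HPL.dat comments.

```python
from typing import Dict, Union

# Labels taken from the HPL.dat comments and the parameter list printed by xhpl
ORDER = {"R": "Row-major", "C": "Column-major"}
FACT = {"L": "Left", "C": "Crout", "R": "Right"}
BCAST = {"0": "1rg", "1": "1rM", "2": "2rg", "3": "2rM", "4": "Lng", "5": "LnM"}


def decode_tv(tv: str) -> Dict[str, Union[str, int]]:
    """Split an 8-character T/V code such as 'WR11C2R4' into its fields.

    Assumed layout: timer, PMAP, DEPTH, BCAST, RFACT, NDIV, PFACT, NBMIN,
    each a single character.
    """
    if len(tv) != 8:
        raise ValueError(f"expected 8 characters, got {tv!r}")
    timer, order, depth, bcast, rfact, ndiv, pfact, nbmin = tv
    return {
        "timer": "wall" if timer == "W" else timer,  # W = wall time
        "pmap": ORDER[order],
        "depth": int(depth),
        "bcast": BCAST[bcast],
        "rfact": FACT[rfact],
        "ndiv": int(ndiv),
        "pfact": FACT[pfact],
        "nbmin": int(nbmin),
    }


if __name__ == "__main__":
    print(decode_tv("WR11C2R4"))
```

For this run the decoded fields agree with the printed parameters: row-major mapping, depth 1, 1ringM broadcast, Crout RFACT, NDIV 2, Right PFACT and NBMIN 4.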

The result above shows that the board achieved 13.26 Gflops, and the scaled residual check passed.
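HPL derives the Gflops figure from N and the wall time, using an operation count of 2/3·N³ + 3/2·N². The sketch below reproduces the reported rate from N = 19008 and Time = 345.24 s, and also computes the share of wall time spent in the update phase, taken from the detailed timing lines.

```python
def hpl_gflops(n: int, seconds: float) -> float:
    """Rate as HPL computes it: (2/3 N^3 + 3/2 N^2) flops over the wall time."""
    flops = (2.0 / 3.0) * n**3 + 1.5 * n**2
    return flops / seconds / 1.0e9


if __name__ == "__main__":
    n, wall = 19008, 345.24              # N and Time from the result line
    print(f"{hpl_gflops(n, wall):.4e}")  # 1.3263e+01, matching the reported Gflops
    update = 335.26                      # 'Max aggregated wall time update'
    print(f"update share of wall time: {update / wall:.1%}")  # 97.1%
```

Because the N³ term dominates, the time spent in the update (the matrix-multiply-heavy part handled by OpenBLAS) largely determines the final rate.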

Note: the system was not tuned. In particular, none of the following was done:

  • Reducing the services running on the Pi
  • Changing the CPU governor
  • Overclocking the CPU
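Since the CPU governor and clock were left at their defaults, it can help to record them alongside a result. A minimal sketch, assuming the standard Linux cpufreq sysfs layout (scaling_governor, and scaling_cur_freq in kHz, under /sys/devices/system/cpu/cpuN/cpufreq/); the function name is illustrative, and the root directory is a parameter so the code can be pointed elsewhere.

```python
from pathlib import Path
from typing import Dict, Optional


def read_cpufreq(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> Dict[str, Optional[str]]:
    """Return the governor and current frequency (kHz) for one CPU, or None if unavailable."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info: Dict[str, Optional[str]] = {}
    for name in ("scaling_governor", "scaling_cur_freq"):
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            # File missing (e.g. no cpufreq driver) or not readable
            info[name] = None
    return info


if __name__ == "__main__":
    for cpu in range(4):  # the Raspberry Pi 4 has four cores
        print(cpu, read_cpufreq(cpu))
```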