Quantum ESPRESSO* for Intel® Xeon Phi™ Coprocessor


Purpose

This code recipe describes how to get, build, and use the Quantum ESPRESSO* code for the Intel® Xeon Phi™ coprocessor using the Intel® Math Kernel Library with Automatic Offload.

Introduction¹

Quantum ESPRESSO is an integrated suite of open source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.

The Quantum ESPRESSO distribution consists of an “historical” core set of components, and a set of plug-ins that perform more advanced tasks, plus a number of third-party packages designed to be interoperable with the core components. To see what Quantum ESPRESSO can do, visit http://www.quantum-espresso.org/project/what-can-qe-do/.

The Quantum ESPRESSO code is maintained by developers around the world, coordinated through the Quantum ESPRESSO Foundation. The code is available as an open source distribution at http://www.quantum-espresso.org/download/.

Code Support for Intel® Xeon Phi™ coprocessor

Running Quantum ESPRESSO code version 5.0.2+ on the Intel® Xeon Phi™ coprocessor requires minimal modifications in the Quantum ESPRESSO build process and no Quantum ESPRESSO source modifications. Simply download the original Quantum ESPRESSO code from http://www.quantum-espresso.org/download/.

Intel Xeon Phi coprocessor support is provided through the Automatic Offload feature of the Intel® Math Kernel Library (Intel® MKL). Be sure you are using Intel MKL version 11.1.3 or later; the Code Access section below describes how to obtain it.
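The minimum-version requirement above can be checked mechanically. The following is a minimal sketch of a dotted-version comparison in POSIX sh; the function name version_ge and the variable MY_MKL_VERSION are hypothetical, and the recipe does not say how to query the installed Intel MKL version, so the version string is assumed to be supplied by you.

```shell
#!/bin/sh
# version_ge A B: succeed (exit 0) when dotted version A >= B.
# Numeric fields are compared left to right; missing fields count as 0.
# Hypothetical helper, not part of Intel MKL or Quantum ESPRESSO.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        [ -n "$fa" ] || fa=0
        [ -n "$fb" ] || fb=0
        if [ "$fa" -gt "$fb" ]; then return 0; fi
        if [ "$fa" -lt "$fb" ]; then return 1; fi
        # Drop the field that was just compared.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

# Example use (MY_MKL_VERSION is an assumption; set it yourself):
#   version_ge "$MY_MKL_VERSION" 11.1.3 || echo "Intel MKL is too old for this recipe" >&2
```

The comparison is purely numeric, so a version string with letters or build suffixes would need to be trimmed before it is passed in.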

Code Access

To get access to the Quantum ESPRESSO code:

1. Download the original Quantum ESPRESSO code at http://www.quantum-espresso.org/download/.
2. Obtain the latest version of Intel MKL or Intel® Composer XE (which includes the Intel® C/C++ Compiler and Intel MKL) from https://registrationcenter.intel.com/regcenter/register.aspx, or register at https://software.intel.com/en-us/ to get a free 30-day evaluation copy.

Build Directions

1. Unpack the Quantum ESPRESSO source code:

tar -xzvf espresso-5.0.2.tar.gz

2. Source the Intel Compiler and MPI* software (adapt the version numbers to the version you have)

source /opt/intel/composer_xe_2013.2.146/bin/compilervars.sh intel64
source /opt/intel/impi/4.1.1/bin64/mpivars.sh

3. Create an initial configuration with defined Intel compilers:

export FC=mpiifort
export F90=$FC
export F77=$FC
export MPIF90=$FC
export FCFLAGS="-O3 -xAVX -fno-alias -ansi-alias -g -mkl -I$MKLROOT/include/fftw"
export FFLAGS=$FCFLAGS
export CC=mpiicc
export CPP="icc -E"
export CFLAGS=$FCFLAGS
export AR=xiar
export BLAS_LIBS=""
export LAPACK_LIBS="-lmkl_blacs_intelmpi_lp64"
export FFT_LIBS="-L$MKLROOT/intel64"
./configure --enable-openmp --enable-parallel

4. Edit the make.sys file in the top-level package directory and modify the DFLAGS variable to enable hybrid MPI+OpenMP mode, FFTW3, and ScaLAPACK. Set BLAS_LIBS and LAPACK_LIBS as shown:

DFLAGS = -D__INTEL -D__FFTW3 -D__MPI -D__PARA -D__SCALAPACK -D__OPENMP
BLAS_LIBS = -lmkl_intel_thread -lmkl_core
LAPACK_LIBS = -lmkl_blacs_intelmpi_lp64

5. Build with the threaded Intel MKL, which is required to enable Intel MKL Automatic Offload.

make pw
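Two small checks around the build can catch a make.sys edit that did not take effect, or a binary that was not linked against the threaded Intel MKL. The sketch below uses hypothetical helper names (has_dflags, links_threaded_mkl). It assumes DFLAGS sits on a single line in make.sys and that Intel MKL was linked dynamically, so that libmkl_intel_thread shows up in ldd output.

```shell
#!/bin/sh
# has_dflags FILE MACRO...: succeed if every -DMACRO appears on the
# first "DFLAGS =" line of FILE (for example make.sys).
has_dflags() {
    file=$1
    shift
    line=$(grep -E '^[[:space:]]*DFLAGS[[:space:]]*=' "$file" | head -n 1)
    [ -n "$line" ] || return 1
    for macro in "$@"; do
        case " $line " in
            *" -D$macro "*) ;;
            *) echo "missing -D$macro in $file" >&2; return 1 ;;
        esac
    done
    return 0
}

# links_threaded_mkl FILE: succeed if saved ldd output in FILE mentions
# the threaded Intel MKL layer (libmkl_intel_thread).
links_threaded_mkl() {
    grep -q 'libmkl_intel_thread' "$1"
}

# Example use from the top-level package directory:
#   has_dflags make.sys __FFTW3 __MPI __SCALAPACK __OPENMP || exit 1
#   make pw
#   ldd PW/src/pw.x > pw.ldd.out && links_threaded_mkl pw.ldd.out
```

If the binary was linked statically, the ldd check says nothing either way, and the link line in the build log is the place to look instead.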

Running Workloads on the Intel Xeon Phi coprocessor

1. Set up the environment:

source /opt/intel/composer_xe_2013.2.146/bin/compilervars.sh intel64
source /opt/intel/impi/4.1.1/bin64/mpivars.sh
export MKLROOT=
export LD_LIBRARY_PATH=$MKLROOT/lib/intel64:$LD_LIBRARY_PATH

2. Use these additional settings to define the Intel MKL Automatic Offload thresholds:

export MKL_MIC_THRESHOLDS_ZGEMM=500,500,500

3. Set OMP_NUM_THREADS explicitly, and set PPN (the number of MPI processes per node); the ao_pin.sh pinning script reads both.

4. Enable the Intel MKL Automatic Offload feature with the following (also see ao_pin.sh example below):

MKL_MIC_ENABLE=1

5. Load the MPSS environment with the following shell script:

. /opt/intel/mpss/2.1.6720-16/etc/mpss_vars.sh
export MIC_SYSROOT=/opt/intel/mpss/2.1.6720-16

6. Start the binary with “mpirun” through the thread pinning proxy:

mpirun <mpi_options> ./ao_pin.sh ${path_to_pw}/pw.x <pw_options>

7. The pinning proxy requires correct thread placement on the Intel Xeon Phi coprocessor side. See the run.sh script below for an example.
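As an illustration of how PPN and OMP_NUM_THREADS relate, the sketch below splits a node's host cores evenly among MPI ranks and applies the coprocessor-side formula from ao_pin.sh (MIC_CORE * 8 / PPN). The function names are hypothetical. The figures of 24 host cores and 6 ranks per node come from the test platform and the "MPI6, OMP4" configurations listed later in this recipe.

```shell
#!/bin/sh
# host_threads_per_rank CORES PPN: print the OpenMP thread count per MPI
# rank when CORES host cores are shared evenly by PPN ranks on one node.
host_threads_per_rank() {
    cores=$1
    ppn=$2
    if [ "$ppn" -le 0 ] || [ $((cores % ppn)) -ne 0 ]; then
        echo "PPN=$ppn does not divide $cores cores evenly" >&2
        return 1
    fi
    echo $((cores / ppn))
}

# mic_threads_per_rank MIC_CORE PPN: coprocessor threads per rank,
# using the same expression as ao_pin.sh.
mic_threads_per_rank() {
    echo $(( $1 * 8 / $2 ))
}

host_threads_per_rank 24 6   # prints 4
mic_threads_per_rank 60 6    # prints 80
```

With these values, one would export PPN=6 and OMP_NUM_THREADS=4 before starting mpirun.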

Shell Script Examples

ao_pin.sh

#!/bin/sh
export MKL_MIC_ENABLE=1
export MKL_DYNAMIC=false
export MKL_MIC_DISABLE_HOST_FALLBACK=1

export MIC_LD_LIBRARY_PATH=$MKLROOT/lib/mic:$MIC_LD_LIBRARY_PATH

# Number of host cores
export HOST_CORE=$(( $PPN * $OMP_NUM_THREADS ))

# MPI process per node
MPI_PER_NODE=$PPN

# Number of MIC cores (minus 1)
MIC_CORE=60

export MKL_MIC_MAX_MEMORY=1G
export MIC_USE_2MB_BUFFERS=64K

MYNODE=$((PMI_RANK / MPI_PER_NODE))
MYRANK=$((PMI_RANK % MPI_PER_NODE))

HALFCORE=$((HOST_CORE / 2))

#export OMP_NUM_THREADS=$((HOST_CORE / MPI_PER_NODE))
export MIC_OMP_NUM_THREADS=$((MIC_CORE * 8 / MPI_PER_NODE))

# 2x oversubscribing on MIC
#export MIC_OMP_NUM_THREADS=$((MIC_CORE * 16 / MPI_PER_NODE))

HOST_FROM=$((MYRANK * OMP_NUM_THREADS))
HOST_TO=$(((MYRANK + 1) * OMP_NUM_THREADS - 1))

MHOST_FROM=$((MYRANK * MIC_OMP_NUM_THREADS + 1))
MHOST_TO=$(((MYRANK + 1) * MIC_OMP_NUM_THREADS))

if [ "$HOST_FROM" -lt "$HALFCORE" ]; then
export OFFLOAD_DEVICES=0
else
export OFFLOAD_DEVICES=1
MHOST_FROM=$((MHOST_FROM - MIC_CORE * 4))
MHOST_TO=$((MHOST_TO - MIC_CORE * 4))
fi

export KMP_AFFINITY=explicit,granularity=fine,proclist=[${HOST_FROM}-${HOST_TO}:1]
export MIC_KMP_AFFINITY=explicit,granularity=fine,proclist=[${MHOST_FROM}-${MHOST_TO}:1]

echo "[${PMI_RANK}] ${OFFLOAD_DEVICES}: ${OMP_NUM_THREADS} => ${KMP_AFFINITY}, ${MIC_OMP_NUM_THREADS} => ${MIC_KMP_AFFINITY}"

numactl --cpunodebind=${OFFLOAD_DEVICES} "$@"
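To see what ao_pin.sh computes, the sketch below repeats its arithmetic in a function and prints the resulting plan for PPN=6 and OMP_NUM_THREADS=4. The function name pin_plan is hypothetical, and the sketch only prints values; it does not set any affinity.

```shell
#!/bin/sh
# pin_plan RANK PPN OMP_THREADS: print "<device> <host CPUs> <MIC CPUs>"
# using the same arithmetic as ao_pin.sh, with MIC_CORE fixed at 60.
pin_plan() {
    rank=$1
    ppn=$2
    omp=$3
    mic_core=60
    host_core=$((ppn * omp))
    halfcore=$((host_core / 2))
    mic_threads=$((mic_core * 8 / ppn))
    myrank=$((rank % ppn))
    host_from=$((myrank * omp))
    host_to=$(( (myrank + 1) * omp - 1 ))
    mic_from=$((myrank * mic_threads + 1))
    mic_to=$(( (myrank + 1) * mic_threads ))
    if [ "$host_from" -lt "$halfcore" ]; then
        device=0
    else
        # Second half of the host cores: use the second coprocessor and
        # shift the MIC range back into 1..(4 * MIC_CORE).
        device=1
        mic_from=$((mic_from - mic_core * 4))
        mic_to=$((mic_to - mic_core * 4))
    fi
    echo "$device ${host_from}-${host_to} ${mic_from}-${mic_to}"
}

for r in 0 1 2 3 4 5; do
    echo "rank $r: $(pin_plan "$r" 6 4)"
done
# rank 0: 0 0-3 1-80
# rank 1: 0 4-7 81-160
# rank 2: 0 8-11 161-240
# rank 3: 1 12-15 1-80
# rank 4: 1 16-19 81-160
# rank 5: 1 20-23 161-240
```

In this configuration, three ranks share each coprocessor and each rank receives 80 coprocessor threads, which covers logical CPUs 1 to 240 on each card.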

run.sh

#!/bin/sh
export QE_ROOT=$PWD

cd $QE_ROOT
export WID=AUSURF112
export WDIR=$QE_ROOT/../workloads/$WID
export CORES=$(( $NODES * $PPN * $OMPN ))
export RUNDIR=./rundir/${WID}_${CORES}c_${NODES}x${PPN}x${OMPN}_o_$LSB_JOBID
export BIN=$QE_ROOT/PW/src/pw.x

mkdir -p $RUNDIR
cd $RUNDIR
cp $WDIR/* .
echo "Stay in $PWD"

export OMP_NUM_THREADS=$OMPN
export MKL_NUM_THREADS=$OMPN

export I_MPI_FALLBACK_DEVICE=disable
export I_MPI_FABRICS=shm:dapl
export I_MPI_PIN=disable
export I_MPI_DEBUG=5

export MKL_MIC_ZGEMM_AA_M_MIN=500
export MKL_MIC_ZGEMM_AA_N_MIN=500
export MKL_MIC_ZGEMM_AA_K_MIN=500
export MKL_MIC_THRESHOLDS_ZGEMM=500,500,500


ldd $BIN > pw.ldd.out
mpirun -perhost $PPN -np $CPUS  $LS_SUBCWD/ao_pin.sh $BIN -in ./ausurf.in  2>&1 |tee ./run_${WID}_${CPUS}_${PPN}x${OMPN}omp.mklao.log
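run.sh uses several variables that it never defines (NODES, PPN, OMPN, CPUS, LSB_JOBID, LS_SUBCWD), so they appear to be expected from the job environment. A guard such as the following sketch could be placed near the top of the script to fail early when one is missing; the function name require_vars is hypothetical.

```shell
#!/bin/sh
# require_vars NAME...: succeed only if every named shell variable is
# set and non-empty; otherwise list the missing names on stderr.
require_vars() {
    missing=
    for name in "$@"; do
        # Indirect lookup of the variable called $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || missing="$missing $name"
    done
    if [ -n "$missing" ]; then
        echo "run.sh: unset variables:$missing" >&2
        return 1
    fi
}

# Example use at the top of run.sh:
#   require_vars NODES PPN OMPN CPUS LSB_JOBID LS_SUBCWD || exit 1
```

Failing before mkdir and mpirun avoids creating a run directory with a half-formed name when a variable is empty.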

Performance Testing²,³

Quantum ESPRESSO is cluster-enabled, including when Intel Xeon Phi coprocessor support is used. The following chart shows performance gains of up to 2.59x over the CPU-only baseline as the number of nodes increases from 1 to 4, with two processors and two coprocessors per node.

MPI + OpenMP distributions

Configurations without coprocessor:

1 node: MPI6, OMP4

4 nodes: MPI6, OMP4

Configurations with coprocessor:

1 node: MPI6, OMP4

4 nodes: MPI6, OMP4


Your results will vary depending on your workload. AUSURF112 spends a fair amount of time in initialization; larger workloads that spend more time in Intel MKL sections with Automatic Offload may see even better results.

Testing Platform Configurations⁴

The following hardware and software were used for the above recipe and performance testing.

Server Configuration:

2-socket/24 cores:
Processor: Intel® Xeon® processor E5-2697 V2 @ 2.70GHz (12 cores) with Intel® Hyper-Threading Technology⁵
Network: InfiniBand* Architecture Fourteen Data Rate (FDR)
Operating System: Red Hat Enterprise Linux* 2.6.32-358.el6.x86_64 #1 SMP Tue Jan 29 11:47:41 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Memory: 64GB
Coprocessor: 1X Intel Xeon Phi coprocessor 7120P: 61 cores @ 1.238 GHz, 4-way Intel Hyper-Threading Technology, Memory: 15872 MB
Intel® Many-core Platform Software Stack Version 2.1.6720-15
Intel® C++ Compiler Version 14.0
Intel® MPI Library Version 4.1
Intel® Math Kernel Library special engineering build

Quantum ESPRESSO

Linux-x86_64-icc
CPU (MPI + OMP) – 4T or 8T – best chosen
Many Integrated Core – NUM Threads – 120T or 240T – best chosen

DISCLAIMERS:

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

¹ From the Quantum ESPRESSO website: www.quantum-espresso.org

² Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

³ Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.

Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

⁴ For more information go to http://www.intel.com/performance

⁵ Available on select Intel® processors. Requires an Intel® HT Technology-enabled system. Consult your PC manufacturer. Performance will vary depending on the specific hardware and software used. For more information including details on which processors support HT Technology, visit http://www.intel.com/info/hyperthreading.

Intel, the Intel logo, Xeon and Xeon Phi are trademarks of Intel Corporation in the US and/or other countries.

*Other names and brands may be claimed as the property of others.
