How to configure OpenMP in LIGGGHTS for parallel computing

Submitted by yyz987 on Sun, 06/14/2015 - 09:14

The reference material I could find for OpenMP configuration says:
1. go to the liggghts/src directory
2. run the command make yes-user-omp
The problem I have is how to actually run the examples with OpenMP, i.e. what command I should enter in the terminal (a sketch of my understanding of the steps is below).
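
For clarity, a minimal sketch of what I think those configuration steps look like in the terminal (the source path is only an example from my machine, and the final machine makefile is just a guess):

cd ~/LIGGGHTS-PUBLIC/src
# enable the OpenMP package that ships with the LAMMPS part of the code
make yes-USER-OMP
# then rebuild with whichever machine makefile you normally use, e.g.
make fedora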


richti83 | Mon, 06/15/2015 - 09:10

As stated here: http://www.cfdem.com/comment/15078#comment-15078 the granular styles are still not OpenMP-enabled. The OMP package in LIGGGHTS comes from LAMMPS, and the only useful fix I can see is nve/sphere/omp, which is useless without OpenMP pair and wall styles. Not sure about neighbour lists.
You need to use the MPI parallelization by using a makefile that uses mpicxx (e.g. make fedora, make ubuntu or make ubuntuVTK) and start LIGGGHTS with mpirun -np XX lmp_fedora -in SCRIPT.in
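
To make that concrete, a minimal sketch (the source path and process count are only examples, and SCRIPT.in is a placeholder for your own input script):

# build LIGGGHTS with an MPI-enabled machine makefile (uses mpicxx)
cd ~/LIGGGHTS-PUBLIC/src
make fedora                # or: make ubuntu / make ubuntuVTK
# run an input script on 8 MPI processes
mpirun -np 8 ./lmp_fedora -in SCRIPT.in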

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®

yyz987 | Mon, 06/15/2015 - 16:54

Thanks for your answer. I am a new LIGGGHTS learner and don't have any experience with LIGGGHTS yet, but I have seen some papers about efforts to implement an MPI/OpenMP hybrid parallelization of LIGGGHTS; the authors have already done this in their paper. If you are interested, see "Hybrid parallelization of the LIGGGHTS open-source DEM code" for details. I just don't know how they configured LIGGGHTS to implement the MPI/OpenMP hybrid parallelization.


richti83 | Mon, 06/15/2015 - 20:47

Hi yyz,

Yes, I know the paper "Towards Parallelization of LIGGGHTS Granular Force Kernels with OpenMP" by R. Berger, A. Kohlmeyer and C. Kloss at DEM6 Chicago, 2014.
But they used a special, very small code base called miniMD(-granular), which is a stripped-down version of the original LAMMPS code, and added some very basic granular functionality with OpenMP support there. As you can see in the answer of RBERGER I linked above, this code is still not released, nor has any line of code gone from miniMD-granular into LIGGGHTS-PUBLIC. So there is nothing you can enable, because it simply isn't there yet.
Keep in mind that a lot of recoding is necessary to parallelize LIGGGHTS this way (trust me, I tried twice, with GPU (OpenACC) and with OpenMP, but with no useful result). LIGGGHTS has more than 260,000 LOC, and you would need to insert the OpenMP compiler directives in front of every loop and do all the memory mapping. So if you know 10 high-performance-computing OpenMP specialists, ask them to do this work for us (for free); I have been waiting for a solution for 2 years...

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®

yyz987 | Sat, 06/20/2015 - 11:38

Hi richti83,
Thanks a lot for the detailed explanation; I learned a lot from your reply. Mr. Berger also replied and says he will make the OpenMP hybrid code public in the near future.
Best wishes.


rberger | Tue, 06/16/2015 - 21:16

Hi yyz987,
First of all, sorry for the confusion. I'm the author of those papers and of the OpenMP implementation. Unfortunately the OpenMP hybrid code hasn't been made public yet. If it were up to me, I would push the code out right away, but there are some politics going on in the background about who can see which version of the code. Long story short, we're (weeks?) away from launching a LIGGGHTS repository targeted at developers, which will also contain a branch with a version of the hybrid MPI/OpenMP parallelization and many other improvements (e.g. faster insertion, better MPI scalability). For more details, feel free to contact me via email: richard.berger [at] jku.at
Cheers,
Richard

yyz987 | Sat, 06/20/2015 - 11:24

Hi Richard,
Thanks a lot for answering my question. It had bothered me for a long time before you replied. I am looking forward to the publication of the OpenMP hybrid code.
Best wishes


rberger | Sat, 07/11/2015 - 13:20

Hi everyone,

I just released the OpenMP code as a public fork of LIGGGHTS-PUBLIC. Unfortunately DCS decided against the LIGGGHTS-DEV repo at the last minute, so I'm releasing the OpenMP code like this instead. Note that this version of LIGGGHTS is not approved or endorsed by DCS Computing GmbH, the producer of the LIGGGHTS® and CFDEM®coupling software and owner of the LIGGGHTS® and CFDEM® trademarks, so don't expect any support from them on this.

https://github.com/ParticulateFlow/LIGGGHTS-PUBLIC/tree/openmp_parallelization

If you have questions or find bugs in this OpenMP version feel free to contact me via mail (richard.berger [at] jku.at).
I hope many of you find this useful. It's one small step towards a better HPC version for the masses.

Best wishes
Richard


richti83 | Tue, 07/14/2015 - 12:44

Hi Richard, many thanks for releasing this code base.
I spent this morning installing the USER-OMP and USER-ZOLTAN packages. Here is a short record, for other users who want to try it out, of what I did to install LIGGGHTS-PUBLIC with the openmp_parallelization branch:

git clone -b openmp_parallelization https://github.com/ParticulateFlow/LIGGGHTS-PUBLIC.git LIGGGHTS-OMP
#
cd LIGGGHTS-OMP
#
wget http://www.cs.sandia.gov/~kddevin/Zoltan_Distributions/zoltan_distrib_v3.82.tar.gz
#
tar -xf zoltan_distrib_v3.82.tar.gz
#
cd Zoltan_v3.82/
#----in Zoltan dir----
mkdir build
cd build
../configure --with-cxxflags=-fPIC --with-gnumake --with-ccflags=-fPIC --with-cflags=-fPIC
#
make everything
sudo make install
sudo ldconfig
cp src/include/Zoltan_config.h ../src/include/
#-------------------
cd ../../src
#-------- in LIGGGHTS-OMP/src/ dir -----------
make yes-USER-OMP
make yes-USER-ZOLTAN
mkdir build
cd build
ccmake ../
#set CMAKE_BUILD_TYPE to Release
#enable USE_OPENMP
#[c] configure -> cmake should find libzoltan.a in /usr/local/lib, otherwise check the Zoltan installation!
#[c] configure again
#[g] generate (ccmake will close)
make
#-----------------------------------------
#
#
##########################
#make it known systemwide#
##########################
cd /usr/bin/
sudo ln -s $HOME/LIGGGHTS/LIGGGHTS-OMP/src/build/liggghts lmp3omp
#
########################
# test example: #
########################
#
cd ~/LIGGGHTS/LIGGGHTS-OMP/examples/LIGGGHTS/OMP/rotaryKiln/openmp
lmp3omp -in in.rotaryKiln_init -var NTHREADS 16
lmp3omp -in in.rotaryKiln_run -var NTHREADS 16
#
#My Timings (at 2x8C E5-2687W 0 @ 3.10GHz, no Hyperthreading)
#OpenMP, no MPI:
#lmp3omp -in in.rotaryKiln_run -var NTHREADS 16
#Loop time of 717.956 on 16 procs (1 MPI x 16 OpenMP) for 400000 steps with 20000 atoms
#only MPI:
#mpirun -np 16 lmp3 -in in.rotaryKiln_run -var XPROCS 1 -var YPROCS 4 -var ZPROCS 4 (note: I used standard liggghts for this test)
#Loop time of 859.261 on 16 procs for 400000 steps with 20000 atoms
#MPI & OpenMP
#did not work
#
#
#that's all

The liggghts executable does not work with mpirun. Any hints on what could cause this problem, or is hybrid parallelization not "meant to be"? I can see that N processes are spawned, but I don't get any version information or error message at the command prompt.

Thanks again,
Christian.

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®


richti83 | Tue, 07/14/2015 - 15:26


"the liggghts executable does not work with mpirun"

This problem turned out to be related to my OpenMPI 1.7.3 installation from the Fedora package manager. With a hand-built OpenMPI 1.6.3 from OF/ThirdParty it works fine. But, as expected, on a single node with two 8-core processors one process with 16 threads is faster than N processes with 16/N threads each. The next step is to test on our cluster, but I need to wait until some more important jobs are finished ...
Best,
Christian.

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®


rberger | Tue, 07/14/2015 - 16:39

You're welcome, I'm happy to finally let other people test drive it. I've been using OpenMPI 1.8 and MVAPICH2 as MPI implementations. For hybrid usage, here's a tip: if there is more than one socket on your machine, use MPI between the sockets, e.g. 2 sockets with 8-core CPUs -> 2x MPI + 8x OpenMP. That way you'll minimize costly communication between sockets. Choose your MPI domain cuts wisely, since they're not load-balanced in this code base. For this to work you have to ensure your MPI implementation binds the MPI processes to the right cores; thread binding can further improve performance.
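
As a sketch of such a hybrid launch for the 2x8-core case, using OpenMPI-style binding flags and the rotaryKiln run from earlier in this thread (the exact flag names depend on your MPI implementation, and how the example script consumes NTHREADS is only inferred from the run lines above, so adapt as needed):

# 2 MPI processes, one pinned to each socket, 8 OpenMP threads per process
export OMP_NUM_THREADS=8
mpirun -np 2 --map-by socket --bind-to socket -x OMP_NUM_THREADS \
    lmp3omp -in in.rotaryKiln_run -var NTHREADS 8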

The reason for this is simple: number of memory controllers = number of allocated memory regions = number of MPI processes. Although OpenMP threads work independently, the global atom data will be allocated "near" the core which runs the MPI process. An MPI process spawns, let's say, 8 threads. At some point these OpenMP threads, even though they work independently, will saturate that one memory channel leading to global memory. The only way to increase the usage of memory channels right now is to increase the number of MPI processes. I've got some ideas on how to avoid this, but it's another big task which will involve rather large code changes. For now the quick fix is to use some MPI processes to increase memory bandwidth utilization.
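
To see how many sockets and NUMA nodes (and hence separate memory domains) a node actually has before choosing the MPI/OpenMP split, the standard Linux tools are enough; a quick check, as a sketch:

# count sockets and NUMA nodes
lscpu | grep -Ei 'socket|numa'
# show which cores and how much memory belong to each NUMA node
numactl --hardware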
Cheers,
Richard
PS: I think we should close this thread, we're still in the CFDEMcoupling sub-forum ;-)


j-kerbl | Mon, 06/15/2015 - 10:22

Hi yyz987,

please don't post the same question in every forum.

Thank you

yyz987 | Mon, 06/15/2015 - 16:43

I apologize for what I did. I am a new LIGGGHTS learner and didn't know this rule of the forum; I just didn't know which forum could answer my question.