How to configure OpenMP in LIGGGHTS for parallel computing

Submitted by yyz987 on Sun, 06/14/2015 - 09:14

The reference material I could find for OpenMP configuration says:
1. go to the liggghts/src directory
2. run the command make yes-user-omp
The problem I have is how to actually run the examples with OpenMP, i.e. what command I should enter in the terminal (a sketch of my understanding of the steps is below).
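
For clarity, a minimal sketch of what I think those configuration steps look like in the terminal (the source path is only an example from my machine, and the final machine makefile is just a guess):

cd ~/LIGGGHTS-PUBLIC/src
# enable the OpenMP package that ships with the LAMMPS part of the code
make yes-USER-OMP
# then rebuild with whichever machine makefile you normally use, e.g.
make fedora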


richti83 | Mon, 06/15/2015 - 09:10

As stated here: http://www.cfdem.com/comment/15078#comment-15078 the granular styles are still not OpenMP-enabled. The OMP package in LIGGGHTS comes from LAMMPS, and the only useful fix I can see is nve/sphere/omp, which is useless without OpenMP pair and wall styles. Not sure about neighbour lists.
You need to use the MPI parallelization by using a makefile that uses mpicxx (e.g. make fedora, make ubuntu or make ubuntuVTK) and start LIGGGHTS with mpirun -np XX lmp_fedora -in SCRIPT.in
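
To make that concrete, a minimal sketch (the source path and process count are only examples, and SCRIPT.in is a placeholder for your own input script):

# build LIGGGHTS with an MPI-enabled machine makefile (uses mpicxx)
cd ~/LIGGGHTS-PUBLIC/src
make fedora                # or: make ubuntu / make ubuntuVTK
# run an input script on 8 MPI processes
mpirun -np 8 ./lmp_fedora -in SCRIPT.in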

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®

yyz987 | Mon, 06/15/2015 - 16:54

Thanks for your answer. I am a new LIGGGHTS learner and don't have any experience with LIGGGHTS yet, but I have seen some papers about efforts to implement an MPI/OpenMP hybrid parallelization of LIGGGHTS; the authors have already done this in their paper. If you are interested, see "Hybrid parallelization of the LIGGGHTS open-source DEM code" for details. I just don't know how they configured LIGGGHTS to implement the MPI/OpenMP hybrid parallelization.


richti83 | Mon, 06/15/2015 - 20:47

Hi yyz,

Yes, I know the paper "Towards Parallelization of LIGGGHTS Granular Force Kernels with OpenMP" by R. Berger, A. Kohlmeyer and C. Kloss at DEM6 Chicago, 2014.
But they used a special, very small code base called miniMD(-granular), which is a stripped-down version of the original LAMMPS code, and added some very basic granular functionality with OpenMP support there. As you can see in the answer of RBERGER I linked above, this code is still not released, nor has any line of code gone from miniMD-granular into LIGGGHTS-PUBLIC. So there is nothing you can enable, because it simply isn't there yet.
Keep in mind that a lot of recoding is necessary to parallelize LIGGGHTS this way (trust me, I tried twice, with GPU (OpenACC) and with OpenMP, but with no useful result). LIGGGHTS has more than 260,000 LOC, and you would need to insert the OpenMP compiler directives in front of every loop and do all the memory mapping. So if you know 10 high-performance-computing OpenMP specialists, ask them to do this work for us (for free); I have been waiting for a solution for 2 years...

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®

yyz987 | Sat, 06/20/2015 - 11:38

Hi richti83,
Thanks a lot for the detailed explanation; I learned a lot from your reply. Mr. Berger also replied and says he will make the OpenMP hybrid code public in the near future.
Best wishes.


rberger | Tue, 06/16/2015 - 21:16

Hi yyz987,
First of all, sorry for the confusion. I'm the author of those papers and of the OpenMP implementation. Unfortunately the OpenMP hybrid code hasn't been made public yet. If it were up to me, I would push the code out right away, but there are some politics going on in the background about who can see which version of the code. Long story short, we're (weeks?) away from launching a LIGGGHTS repository targeted at developers, which will also contain a branch with a version of the hybrid MPI/OpenMP parallelization and many other improvements (e.g. faster insertion, better MPI scalability). For more details, feel free to contact me via email: richard.berger [at] jku.at
Cheers,
Richard

yyz987 | Sat, 06/20/2015 - 11:24

Hi Richard,
Thanks a lot for answering my question. It had bothered me for a long time before you replied. I am looking forward to the publication of the OpenMP hybrid code.
Best wishes


rberger | Sat, 07/11/2015 - 13:20

Hi everyone,

I just released the OpenMP code as a public fork of LIGGGHTS-PUBLIC. Unfortunately DCS decided against the LIGGGHTS-DEV repo at the last minute, so I'm releasing the OpenMP code like this instead. Note that this version of LIGGGHTS is not approved or endorsed by DCS Computing GmbH, the producer of the LIGGGHTS® and CFDEM®coupling software and owner of the LIGGGHTS® and CFDEM® trademarks, so don't expect any support from them on this.

https://github.com/ParticulateFlow/LIGGGHTS-PUBLIC/tree/openmp_parallelization

If you have questions or find bugs in this OpenMP version feel free to contact me via mail (richard.berger [at] jku.at).
I hope many of you find this useful. It's one small step towards a better HPC version for the masses.

Best wishes
Richard


richti83 | Tue, 07/14/2015 - 12:44

Hi Richard, many thanks for releasing this code base.
I spent this morning installing the USER-OMP and USER-ZOLTAN packages. Here is a short record, for other users who want to try it out, of what I did to install LIGGGHTS-PUBLIC with the openmp_parallelization branch:

git clone -b openmp_parallelization https://github.com/ParticulateFlow/LIGGGHTS-PUBLIC.git LIGGGHTS-OMP
#
cd LIGGGHTS-OMP
#
wget http://www.cs.sandia.gov/~kddevin/Zoltan_Distributions/zoltan_distrib_v3.82.tar.gz
#
tar -xf zoltan_distrib_v3.82.tar.gz
#
cd Zoltan_v3.82/
#----in Zoltan dir----
mkdir build
cd build
../configure --with-cxxflags=-fPIC --with-gnumake --with-ccflags=-fPIC --with-cflags=-fPIC
#
make everything
sudo make install
sudo ldconfig
cp src/include/Zoltan_config.h ../src/include/
#-------------------
cd ../../src
#-------- in LIGGGHTS-OMP/src/ dir -----------
make yes-USER-OMP
make yes-USER-ZOLTAN
mkdir build
cd build
ccmake ../
#set CMAKE_BUILD_TYPE to Release
#enable USE_OPENMP
#[c] configure -> cmake should find libzoltan.a in /usr/local/lib, otherwise check the Zoltan installation!
#[c] configure again
#[g] generate (ccmake will close)
make
#-----------------------------------------
#
#
##########################
#make it known systemwide#
##########################
cd /usr/bin/
sudo ln -s $HOME/LIGGGHTS/LIGGGHTS-OMP/src/build/liggghts lmp3omp
#
########################
# test example: #
########################
#
cd ~/LIGGGHTS/LIGGGHTS-OMP/examples/LIGGGHTS/OMP/rotaryKiln/openmp
lmp3omp -in in.rotaryKiln_init -var NTHREADS 16
lmp3omp -in in.rotaryKiln_run -var NTHREADS 16
#
#My Timings (at 2x8C E5-2687W 0 @ 3.10GHz, no Hyperthreading)
#OpenMP, no MPI:
#lmp3omp -in in.rotaryKiln_run -var NTHREADS 16
#Loop time of 717.956 on 16 procs (1 MPI x 16 OpenMP) for 400000 steps with 20000 atoms
#only MPI:
#mpirun -np 16 lmp3 -in in.rotaryKiln_run -var XPROCS 1 -var YPROCS 4 -var ZPROCS 4 (note: I used standard liggghts for this test)
#Loop time of 859.261 on 16 procs for 400000 steps with 20000 atoms
#MPI & OpenMP
#did not work
#
#
#that's all

The liggghts executable does not work with mpirun. Any hints on what could cause this problem, or is hybrid parallelization not "meant to be"? I can see that N processes are spawned, but I don't get any version information or error message at the command prompt.

Thanks again,
Christian.

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®


richti83 | Tue, 07/14/2015 - 15:26


"the liggghts executable does not work with mpirun"

This problem turned out to be related to my OpenMPI 1.7.3 installation from the Fedora package manager. With a hand-built OpenMPI 1.6.3 from OF/ThirdParty it works fine. But, as expected, on a single node with two 8-core processors one process with 16 threads is faster than N processes with 16/N threads each. The next step is to test on our cluster, but I need to wait until some more important jobs are finished ...
Best,
Christian.

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®


rberger | Tue, 07/14/2015 - 16:39

You're welcome, I'm happy to finally let other people test drive it. I've been using OpenMPI 1.8 and MVAPICH2 as MPI implementations. For hybrid usage, here's a tip: if there is more than one socket on your machine, use MPI between the sockets, e.g. 2 sockets with 8-core CPUs -> 2x MPI + 8x OpenMP. That way you'll minimize costly communication between sockets. Choose your MPI domain cuts wisely, since they're not load-balanced in this code base. For this to work you have to ensure your MPI implementation binds the MPI processes to the right cores; thread binding can further improve performance.
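
As a sketch of such a hybrid launch for the 2x8-core case, using OpenMPI-style binding flags and the rotaryKiln run from earlier in this thread (the exact flag names depend on your MPI implementation, and how the example script consumes NTHREADS is only inferred from the run lines above, so adapt as needed):

# 2 MPI processes, one pinned to each socket, 8 OpenMP threads per process
export OMP_NUM_THREADS=8
mpirun -np 2 --map-by socket --bind-to socket -x OMP_NUM_THREADS \
    lmp3omp -in in.rotaryKiln_run -var NTHREADS 8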

The reason for this is simple: number of memory controllers = number of allocated memory regions = number of MPI processes. Although OpenMP threads work independently, the global atom data will be allocated "near" the core which runs the MPI process. An MPI process spawns, let's say, 8 threads. At some point these OpenMP threads, even though they work independently, will saturate that one memory channel leading to global memory. The only way to increase the usage of memory channels right now is to increase the number of MPI processes. I've got some ideas on how to avoid this, but it's another big task which will involve rather large code changes. For now the quick fix is to use some MPI processes to increase memory bandwidth utilization.
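
To see how many sockets and NUMA nodes (and hence separate memory domains) a node actually has before choosing the MPI/OpenMP split, the standard Linux tools are enough; a quick check, as a sketch:

# count sockets and NUMA nodes
lscpu | grep -Ei 'socket|numa'
# show which cores and how much memory belong to each NUMA node
numactl --hardware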
Cheers,
Richard
PS: I think we should close this thread, we're still in the CFDEMcoupling sub-forum ;-)


j-kerbl | Mon, 06/15/2015 - 10:22

Hi yyz987,

please don't post the same question in every forum.

Thank you

yyz987 | Mon, 06/15/2015 - 16:43

I apologize for what I did. I am a new LIGGGHTS learner and didn't know this rule of the forum; I just didn't know which forum could answer my question.