Parallel computation time (mpirun)

Submitted by ZYan on Sun, 02/16/2014 - 20:14

Dear colleagues,

Recently I have been repeating the same simulation with different numbers of cores (1-128) on our HPC.

I find two things that are strange to me:

1. When the number of cores changes, the number of particles created by the input script changes as well. The difference is not big (a few to ten particles).

Is this a bug in LIGGGHTS, or have I set wrong parameters?

2. The computation time does not decrease as the number of cores increases. The computation time is more or less the same even when I use 128 cores. However, with another piece of software, OpenFOAM, more cores make simulations run faster, as expected.

Is this because LIGGGHTS is not correctly configured, or something else?

Thank you for your expertise,

Regards,

Zilin

willchennewcastle's picture

willchennewcastle | Mon, 02/17/2014 - 05:30

Hi Zilin,

What command do you enter in the terminal to run the simulation?

mpirun -np 128 liggghts < in.script??

maybe try /usr/bin/mpirun.mpich2 -np 128 liggghts < in.script

greetings,

wei

ZYan | Mon, 02/17/2014 - 11:36

Hi Wei,
Thank you for the reply. Yes, I use the command: mpirun -np 128 liggghts < in.script
I cannot find /usr/bin/mpirun.mpich2.

But I made some tests using the packing example (tutorials), and there, using more cores does speed up the computation!

Is something wrong with my script?

#Here is the script I used for my simulation:
atom_style granular
atom_modify map array
boundary f f f
newton off
communicate single vel yes
units si
region reg block 0. 0.3 0. 0.5 0. 0.3 units box
create_box 1 reg
neighbor 0.002 bin
neigh_modify delay 0

#Material properties required for new pair styles

fix m1 all property/global youngsModulus peratomtype 5.e6
fix m2 all property/global poissonsRatio peratomtype 0.45
fix m3 all property/global coefficientRestitution peratomtypepair 1 0.3
fix m4 all property/global coefficientFriction peratomtypepair 1 0.1
fix m5 all property/global coefficientRollingFriction peratomtypepair 1 0.05
fix m6 all property/global k_finnie peratomtypepair 1 1.0

#New pair style
# pair_style gran/hertz/history #Hertzian without cohesion
pair_style gran/hertz/history rolling_friction cdt #rolling_friction
pair_coeff * *

timestep 0.00001

fix gravi all gravity 9.81 vector 0.0 -1.0 0.0

# import triangular mesh (the sand container)
fix cad1 all mesh/surface file box.stl type 1 scale 0.001
fix cad2 all mesh/surface file lid.stl type 1 scale 0.001
fix cad3 all mesh/surface file plane.stl type 1 scale 0.001

#use the imported mesh as granular wall
fix granwalls all wall/gran/hertz/history mesh n_meshes 3 meshes cad1 cad2 cad3 rolling_friction cdt
#fix walls all gran/hertz/history rolling_friction cdt

#distributions for insertion
fix pts1 all particletemplate/sphere 1 atom_type 1 density constant 2500 radius constant 0.001
fix pdd1 all particledistribution/discrete 2000. 1 pts1 1.0

#region and insertion
region istbox block 0.11 0.145 0.10 0.28 0.11 0.145 units box
group nve_group region istbox

#particle insertion
fix ins nve_group insert/pack seed 10000 distributiontemplate pdd1 &
maxattempt 200 insert_every once overlapcheck yes all_in yes vel constant 0. 0. 0. &
region istbox volumefraction_region 0.6

#apply nve integration to all particles that are inserted as single particles
fix integr nve_group nve/sphere

#output settings, include total thermal energy
compute 1 all erotate/sphere
thermo_style custom step atoms ke c_1 vol
thermo 1000
thermo_modify lost ignore norm no
compute_modify thermo_temp dynamic yes

#insert the first particles so that dump is not empty
run 1
dump dmp all custom 1000 post/dump*.sandpile id type type x y z vx vy vz fx fy fz radius

#insert particles
run 25000 upto
unfix ins

rberger's picture

rberger | Mon, 02/17/2014 - 10:41

1. It's not a bug; it's a small price we pay for faster parallelization. Insertion is slightly different when using MPI decomposition. It's stable and reproducible for a fixed number of processors, but not if you change the number of processors.

2. Automatic MPI decomposition works, but it is not optimal in many cases. A bad decomposition causes load imbalance, which degrades performance. So you usually want to control manually how LIGGGHTS decomposes your domain using the "processors X Y Z" command. See http://lammps.sandia.gov/doc/processors.html for more information.
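For illustration, a manual grid for 128 cores might look like the fragment below. The 4x8x4 split is only an assumption for a case where most particles spread along y; the right split for your geometry has to be found by profiling.

```
# Request a manual 4x8x4 MPI grid (4*8*4 = 128 processes).
# Place this near the top of the input script, before create_box.
# ASSUMED split for illustration only -- tune it to where your
# particles actually are so each subdomain carries similar load.
processors 4 8 4
```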

Cheers,
Richard

ZYan | Mon, 02/17/2014 - 11:42

Hi Richard,

Thanks for the explanation concerning the changing insertion numbers. I will play with "processors X Y Z" and give feedback soon. Thanks again,
Cheers,

Zilin

ZYan | Mon, 02/17/2014 - 14:44

Hi Richard,
Thank you for your expertise. Choosing a proper grid for the processors makes the parallel run much faster!

Cheers,

zilin

PaulWinkler's picture

PaulWinkler | Mon, 02/24/2014 - 11:11

Hi,

can you please check whether you get grid effects when using fix insert with MPI? Since the simulation geometry is partitioned, wall effects and distortions in the particle distribution appear at the borders of each subdomain. You can check this easily: look for gaps at the domain borders when viewing your inserted particles along the x, y, or z axis. This is just one of the reasons I am not using MPI at the moment.
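The visual check described above can also be done numerically. Below is a minimal sketch (not LIGGGHTS functionality; assumes you have already parsed particle coordinates out of a dump file) that bins positions along one axis and reports empty slabs inside the packed region. If those empty bins line up with processor boundaries, you are likely seeing the insertion artifact.

```python
def find_gaps(coords, nbins=50):
    """Bin particle coordinates along one axis and return the indices
    of empty bins that lie strictly between occupied bins (i.e. gaps
    inside the packing, not the free space outside it)."""
    lo, hi = min(coords), max(coords)
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for c in coords:
        # clamp the topmost coordinate into the last bin
        i = min(int((c - lo) / width), nbins - 1)
        counts[i] += 1
    occupied = [i for i, n in enumerate(counts) if n > 0]
    first, last = occupied[0], occupied[-1]
    return [i for i in range(first, last + 1) if counts[i] == 0]
```

A non-empty result for an axis that should be uniformly packed is the warning sign; comparing the gap positions against your "processors X Y Z" split tells you whether they sit on subdomain borders.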

Regards.
Paul