Parallel processing in the "LIGGGHTS-PUBLIC"

Dear friends,

I have installed "LIGGGHTS-PUBLIC" on my desktop computer and ran my input script in LIGGGHTS without any errors using this command: liggghts < in.liggghts_1

But when I tried to run it in parallel (mpirun -np 4 liggghts < in.liggghts_1), it did not work.

I received the following message:

mpirun was unable to launch the specified application as it could not find an executable:
Executable: liggghts
Node: ubuntu
while attempting to start process rank 0.
--------------------------------------------------------------------------
4 total processes failed to start

Does anyone have any idea about this error?
Thanks in advance,

Regards,
Ebrahim

Forums:

LIGGGHTS® - User Forum

Philippe | Tue, 07/17/2012 - 08:20

Dear Ebrahim,

it seems like there is a problem with your alias/symlink to the liggghts executable. Try checking your $PATH settings and your .bashrc.

best
Philippe
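
One way to follow this advice is to check whether the name resolves to an executable file on $PATH at all. The sketch below assumes bash or a POSIX sh. `is_on_path` is a helper name made up for this example; it is not part of LIGGGHTS or Open MPI. It deliberately ignores aliases and shell functions and looks only at executable files in the $PATH directories, since mpirun cannot see an alias defined in .bashrc.

```shell
# is_on_path NAME: succeeds only if NAME is an executable file in a $PATH directory.
# Aliases and shell functions are deliberately ignored.
is_on_path() {
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        if [ -f "$dir/$1" ] && [ -x "$dir/$1" ]; then
            IFS=$old_ifs
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

if is_on_path liggghts; then
    echo "liggghts is an executable on PATH"
else
    echo "liggghts is not on PATH (it may only exist as an alias in .bashrc)"
fi
```

If the second message appears while `liggghts < in.liggghts_1` still works in your terminal, the name is most likely only an alias.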

e.derakhshani | Wed, 07/18/2012 - 15:27

Thanks Philippe

Dear Philippe,

Thanks for your response.
I solved it based on your recommendation.

Thanks,
Ebrahim

sharonyue | Sat, 06/20/2015 - 14:04

hello,

How did you fix this?
This is my .bashrc:

alias liggghts="$HOME/LIGGGHTS-3-beta-PUBLIC-master/src/lmp_fedora"

It does not work in parallel, but I can run it in serial.
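
The alias works in serial because the interactive shell expands it before the command runs. In `mpirun -np 4 liggghts`, the word `liggghts` is only an argument to mpirun, so no alias expansion takes place and the name has to be found on $PATH. A minimal sketch of that difference follows. It uses a stand-in script instead of the real binary and `env` instead of mpirun, so it runs without MPI; `fake_liggghts` and `liggghts_demo` are made-up names for this illustration.

```shell
# Create a stand-in "executable" in a temporary directory.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "fake LIGGGHTS started"\n' > "$demo_dir/fake_liggghts"
chmod +x "$demo_dir/fake_liggghts"

# An alias lives only in the shell that defined it. A launcher such as env
# (or mpirun) receives the bare word and searches $PATH for it.
alias liggghts_demo="$demo_dir/fake_liggghts"
if env liggghts_demo >/dev/null 2>&1; then
    alias_result="found"
else
    alias_result="not found"
fi

# A symlink in a directory that is on $PATH is visible to every launcher.
ln -s "$demo_dir/fake_liggghts" "$demo_dir/liggghts_demo"
path_result=$(PATH="$demo_dir:$PATH" env liggghts_demo)

echo "via alias: $alias_result"
echo "via PATH:  $path_result"
rm -rf "$demo_dir"
```

The first line should report "not found" and the second should show the stand-in's message. For the real binary, the equivalent would be a symlink named liggghts, pointing at your lmp_* executable, placed in a directory that is already on your $PATH.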

citizen erased | Sun, 04/10/2016 - 01:08

way to make it work

I had exactly the same problem, but I managed to make it work by using the following command (replace USER and FILE with your own values):
mpirun -np 4 /home/USER/LIGGGHTS-PUBLIC/src/lmp_fedora -in in.FILE
instead of:
mpirun -np 4 liggghts -in in.FILE
even though I have the same line in my .bashrc.

I hope this will work for you.

Regards,

Abdellah

mschramm | Mon, 04/11/2016 - 16:21

alternative to above

I also ran into this problem. A way to get around this is to export the variable instead of making it an alias. So in my bashrc I have the following.
liggghts_mpi=~/path/to/liggghts/lmp_fedora
export liggghts_mpi

This allows me to use the following command to run a simulation:
mpirun -n 8 $liggghts_mpi

govind | Thu, 07/20/2017 - 09:12

PARALLEL RUN

I do not find lmp_fedora in the src folder.

Govind

alice | Thu, 07/20/2017 - 13:42

Hi Govind,
the exact name of the LIGGGHTS executable depends on the name of the makefile used. Which command did you use for compiling LIGGGHTS? (If you used "make auto", look for lmp_auto, and so on.)
Cheers,
Alice
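
To see which executables a build actually produced, you can list the files named lmp_* in the src folder. This is a small sketch, assuming the source tree is at $HOME/LIGGGHTS-PUBLIC/src as in the earlier posts; adjust the path to your installation. `list_lmp_binaries` is a helper name invented for this example.

```shell
# list_lmp_binaries DIR: print the names of executable files called lmp_* in DIR.
list_lmp_binaries() {
    for f in "$1"/lmp_*; do
        if [ -f "$f" ] && [ -x "$f" ]; then
            basename "$f"
        fi
    done
}

# Assumed location of the source tree; prints nothing if no lmp_* binary exists there.
list_lmp_binaries "$HOME/LIGGGHTS-PUBLIC/src"
```

Whatever name is printed (lmp_auto, lmp_fedora, ...) is the one to pass to mpirun.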

govind | Thu, 07/20/2017 - 15:32

Thank you Alice,

Now it's fine. I used make fedora. But I have a doubt: when installing LIGGGHTS I didn't use this, and serial cases ran without any errors.

I am facing another issue: it runs in parallel but gives an error for the dump command: "Invalid dump command". I am using this:

dump dmp all custom/vtk 10000 post/dump*.vtk id type type x y z ix iy iz vx vy vz fx fy fz omegax omegay omegaz radius

It did not cause a problem in the serial run.

Govind

aaigner | Thu, 07/20/2017 - 16:03

No changes?

Did you run the exact same case in serial and parallel, but only in parallel LIGGGHTS fails with the error message?

govind | Thu, 07/20/2017 - 16:27

NO CHANGES

Thank you ,

Yes, it's the exact same case for both. It only fails in parallel.

Govind

aaigner | Sat, 07/22/2017 - 21:40

well...

Hello!
Now it's getting difficult.
If your executable worked in serial, there is no reason why it should not recognize the dump command in parallel.

Can you post your input script and the two commands (serial, parallel) you use to run your script? Maybe you can open a new thread for your problem.

Regards
Andreas

govind | Fri, 07/21/2017 - 09:08

PARALLEL_TIME_IS_MORE

Hi,

I am running in parallel, but it takes more time than the serial run, although the memory per processor decreases.

For 4 processors I used the command:

processors 1 1 4

Am I doing this the right way?

Govind

medvedeg | Fri, 07/21/2017 - 10:20

Dear govind,

The "processors" command divides the simulation domain (the region used in "create_box") into subdomains, one per CPU core. The particles in each subdomain are processed by one CPU core. At every time step the cores communicate with each other, because particles can travel from one subdomain to another. With the processors command you define how many subdomains the simulation domain is divided into in each direction. In your case, the domain is divided into 4 parts in the z-direction. It is possible that some subdomains contain fewer particles than others, which would mean that the division "1 1 4" is not optimal. The optimal parameters can be different for each simulation. Try other combinations such as "2 2 1", "1 4 1", "2 1 2", etc. Also check the neighbor bin size; it should be 0.5-1 times the minimum particle radius. The smaller the bin size, the more memory it will use.

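To make the effect of a processor grid concrete, the sketch below computes the edge lengths of one subdomain for a given "Px Py Pz". It assumes the box from the Couette input script posted further down in this thread (-0.078..0.078 in x and y, -0.1524..0.306 in z) and a regular grid with equal-sized subdomains. `subdomain_size` is a helper name made up for this example.

```shell
# subdomain_size PX PY PZ: edge lengths of one subdomain, assuming a regular
# grid over the box -0.078..0.078, -0.078..0.078, -0.1524..0.306 (in m).
subdomain_size() {
    awk -v px="$1" -v py="$2" -v pz="$3" 'BEGIN {
        lx = 0.078 - (-0.078)
        ly = 0.078 - (-0.078)
        lz = 0.306 - (-0.1524)
        printf "%d %d %d -> %.4f x %.4f x %.4f m\n", px, py, pz, lx/px, ly/py, lz/pz
    }'
}

subdomain_size 1 1 4   # 1 1 4 -> 0.1560 x 0.1560 x 0.1146 m
subdomain_size 2 2 1   # 2 2 1 -> 0.0780 x 0.0780 x 0.4584 m
```

The extents alone say nothing about how many particles sit in each subdomain; that depends on where the material actually is during the run.
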
govind | Fri, 07/21/2017 - 11:59

SAME

Thank you ,

I made the changes as per your instructions, but I do not see any significant speedup in the simulation. I updated the bin size and tried all combinations. Even the particle insertion stage is taking a long time. I am solving the Couette cylinder problem from the LIGGGHTS tutorials.

Govind

aaigner | Sat, 07/22/2017 - 21:47

Min. number of particles

Hello!
By the way, there is a lower bound below which further parallelization is not useful, because the communication overhead uses up the additional computational power. As a rule of thumb, fewer than 10000-20000 particles per processor is inefficient.

Regards
Andreas
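
As a quick check against this rule of thumb, the average number of particles per processor can be computed directly. The sketch below uses the figures quoted later in this thread (450000 particles on 4 processors); `particles_per_proc` and `max_procs` are helper names invented for this example, and the result is only an average that ignores uneven particle distribution.

```shell
# particles_per_proc NPARTICLES NPROCS: average particles per MPI process.
particles_per_proc() {
    echo $(( $1 / $2 ))
}

# max_procs NPARTICLES MIN_PER_PROC: largest process count that still keeps
# at least MIN_PER_PROC particles per process on average.
max_procs() {
    echo $(( $1 / $2 ))
}

particles_per_proc 450000 4    # 112500
max_procs 450000 20000         # 22
max_procs 450000 10000         # 45
```

By this estimate, 4 processors with 450000 particles is well above the 10000-20000 threshold, so too few particles per processor would not explain a slowdown in that case.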

govind | Sun, 07/23/2017 - 06:48

THANKS Andreas

Here is the input script :

### Couette cylinder simulation

### This simulation first inserts a set of particles into the Couette cylinder and allows them to settle. Then the interior cylinder is set to rotate at
### a constant rate and the bottom of the hopper is opened to allow material to flow in the vertical direction. A periodic boundary is used in the
### z-direction to reinsert the flow of material and maintain a constant mass of material in the apparatus.

### Initialization

# Preliminaries
units si
atom_style sphere
boundary f f p
newton off
communicate single vel yes
processors 2 2 1

# Declare domain
region reg block -0.078 0.078 -0.078 0.078 -0.1524 0.306 units box
create_box 2 reg

#Neighbor listing
neighbor 0.0006 bin
neigh_modify delay 0

### Setup

# Material properties and interactions
fix m1 all property/global youngsModulus peratomtype 2.5e7 2.5e7
fix m2 all property/global poissonsRatio peratomtype 0.25 0.25
fix m3 all property/global coefficientRestitution peratomtypepair 2 0.5 0.5 0.5 0.5
fix m4 all property/global coefficientFriction peratomtypepair 2 0.5 0.5 0.5 0.5
fix m5 all property/global coefficientRollingFriction peratomtypepair 2 0.1 0.1 0.1 0.1

# Particle insertion

fix pts1 all particletemplate/sphere 15485863 atom_type 1 density constant 1000 radius constant 0.00125
fix pts2 all particletemplate/sphere 49979687 atom_type 2 density constant 1000 radius constant 0.00100
fix pdd all particledistribution/discrete 67867967 2 pts1 0.5 pts2 0.5
fix ins_mesh all mesh/surface file mesh/factory.stl type 1 scale 0.001
fix ins all insert/stream seed 32452867 distributiontemplate pdd nparticles 450000 particlerate 900000 overlapcheck yes vel constant 0. 0. -3.0 &
insertion_face ins_mesh extrude_length 0.02

# Import mesh from cad
fix cad1 all mesh/surface file mesh/outer_cylinder.stl type 1 scale 0.001 curvature 1e-5
fix cad2 all mesh/surface file mesh/inner_cylinder.stl type 1 scale 0.001 curvature 1e-5
fix cad3 all mesh/surface file mesh/funnel.stl type 1 scale 0.001 curvature 1e-5
fix cad4 all mesh/surface file mesh/plate.stl type 1 scale 0.001 curvature 1e-5

# Use the imported mesh as granular wall
fix geometry all wall/gran model hertz tangential history rolling_friction cdt mesh n_meshes 4 meshes cad1 cad2 cad3 cad4

# Define the physics
pair_style gran model hertz tangential history rolling_friction cdt
pair_coeff * *

### Detailed settings

# Integrator
fix integrate all nve/sphere

# Gravity
fix grav all gravity 9.81 vector 0.0 0.0 -1.0

# Timestep
timestep 0.00000625

# Thermodynamic output settings
thermo_style custom step atoms ke cpu
thermo 8000
thermo_modify lost ignore norm no

# Check timestep
fix timecheck all check/timestep/gran 1 0.01 0.01
run 1
unfix timecheck

# Dump output

dump dmp all custom 16000 post/dump*.CouetteCylinder id type type x y z ix iy iz vx vy vz fx fy fz omegax omegay omegaz radius
dump dumpstl all mesh/stl 16000 post/dump*.stl
dump dmp2 all custom 16000 dump.txt type x y z radius

### Execution and further settings

# Run 1.0 sec to insert and settle particles
run 160000 upto

# Remove the stopper and start the rotation
unfix geometry
fix geometry all wall/gran model hertz tangential history rolling_friction cdt mesh n_meshes 3 meshes cad1 cad2 cad3
fix movecad all move/mesh mesh cad2 rotate origin 0. 0. 0. axis 0. 0. 1. period 1.

# Run 30 sec
run 4800000

I am using 4 processors with 450000 particles. The particle insertion process is also slower than in serial mode. I tried the processors command with all possible combinations, such as 2 2 1, 1 2 2, 1 4 1, 1 1 4 and 4 1 1, and I changed the bin size to 0.5-1.0 times the minimum particle radius.

Govind

Fabeeha | Fri, 08/10/2018 - 22:34

Any improvement in your processing performance?

Hi Govind,
I'm also struggling with this same problem and just found your thread in the forum. Kindly update us on your progress in this regard. It would be a great favor to me.

ahad61 | Tue, 08/01/2017 - 14:33

combinations for 12 processors

Hi,
Would you please tell me the different combinations for 12 processors?

Thank you
Ahad
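
The three numbers given to the processors command must multiply to the number of MPI processes, so the valid combinations for 12 are all ordered triples whose product is 12. A small sketch that lists them, assuming bash or a POSIX sh (`processor_grids` is a helper name made up for this example):

```shell
# processor_grids N: print every ordered triple "Px Py Pz" with Px*Py*Pz = N.
processor_grids() {
    n=$1
    px=1
    while [ "$px" -le "$n" ]; do
        if [ $(( n % px )) -eq 0 ]; then
            rest=$(( n / px ))
            py=1
            while [ "$py" -le "$rest" ]; do
                if [ $(( rest % py )) -eq 0 ]; then
                    echo "$px $py $(( rest / py ))"
                fi
                py=$(( py + 1 ))
            done
        fi
        px=$(( px + 1 ))
    done
}

processor_grids 12
```

For 12 this gives 18 triples, from "1 1 12" to "12 1 1", including ones like "2 3 2" and "3 2 2". Which of them is fastest depends on how the particles are distributed in your domain.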
