Any suggested optimizations to speed up the simulation?

Submitted by Matteo on Fri, 05/06/2016 - 16:33

Hello,

I know I recently opened a topic, but I think this question deserves a separate discussion.

I am trying to run simulations of fine powder, with particle sizes described by a Gaussian distribution with a mean value of 50 microns.

Currently I have a domain with about 400000 particles, and I am running the simulation on a cluster using 256 CPUs.
Fortunately, for cases with fewer particles I noticed that the running time scaled well with the number of CPUs, so I am happy to use 256 CPUs for this more complicated case. I may soon run the same simulation with different numbers of CPUs and plot computation time versus CPUs to see the trend.

Anyway, for this simulation (400000 particles), even with 256 CPUs the computation is very slow.

I was wondering if I should change the value set for the "neighbor" command. Currently it is set to 0.001, as suggested when using SI units.
But shouldn't this value also be related to the size of the actual particles? A skin of 1 mm for particles well below 1 mm in size seems a bit too much.
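
For reference, this is roughly how the line looks in my input script (the 0.001 skin is the value in question):

neighbor    0.001 bin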

Below is the first part of the log file; perhaps you will notice something that can be improved.

Created orthogonal box = (-0.0255 -0.0105 -0.0055) to (0.0255 0.0105 0.007)
16 by 4 by 4 MPI processor grid

Reading STL file 'meshes/Box.stl'

Reading STL file 'meshes/Recoater.stl'

Reading STL file 'meshes/Building_Piston.stl'

Reading STL file 'meshes/Powder_Piston.stl'

Reading STL file 'meshes/Inlet.stl'
Fix particledistribution/discrete (id pdd1): distribution based on mass%:
pts1: d=1.900000e-04 (max. bounding sphere) mass%=50.000000%
pts2: d=2.000000e-04 (max. bounding sphere) mass%=50.000000%
Fix particledistribution/discrete (id pdd1): distribution based on number%:
pts1: d=1.900000e-04 (max. bounding sphere) number%=55.308642%
pts2: d=2.000000e-04 (max. bounding sphere) number%=44.691358%
0 atoms in group nve_group
Setting up run ...
Import and parallelization of mesh cv containing 38 triangle(s) successful
Import and parallelization of mesh rc containing 2 triangle(s) successful
Import and parallelization of mesh bx containing 2 triangle(s) successful
Import and parallelization of mesh ax containing 2 triangle(s) successful
Import and parallelization of mesh inface containing 2 triangle(s) successful
INFO: Particle insertion ins: 423952591.109035 particles every 520000 steps - particle rate 8152934444.404511 (mass rate 50.000000)
391340 particles (mass 0.002400) within 0 steps
Memory usage per processor = 13.7887 Mbytes
Step Atoms KinEng rke ts[1] ts[2] Volume
0 0 -0 0 0 0 1.33875e-05
INFO: Particle insertion ins: inserted 159166 particle templates (mass 0.001087) at step 1
- a total of 159166 particle templates (mass 0.001087) inserted so far.
WARNING: Particle insertion: Less insertions than requested (../fix_insert.cpp:734)
1 159166 5.4325788e-06 0 0 0 1.33875e-05
Loop time of 1.03687 on 256 procs for 1 steps with 159166 atoms

Pair time (%) = 0.00814638 (0.78567)
Neigh time (%) = 0.0258669 (2.49471)
Comm time (%) = 0.00268626 (0.259074)
Outpt time (%) = 0.571987 (55.1648)
Modfy time (%) = 0.428079 (41.2857)
Other time (%) = 0.000104312 (0.0100603)
Fix m1 property/global time (%) = 2.72878e-07 (2.63174e-05)
Fix m2 property/global time (%) = 1.18278e-07 (1.14072e-05)
Fix m3 property/global time (%) = 6.23986e-08 (6.01798e-06)
Fix m4 property/global time (%) = 9.22009e-08 (8.89224e-06)
Fix m5 property/global time (%) = 9.03383e-08 (8.71259e-06)
Fix gravi gravity time (%) = 6.09457e-06 (0.000587786)
Fix cv mesh/surface time (%) = 1.27284e-05 (0.00122758)
Fix rc mesh/surface time (%) = 6.23707e-06 (0.000601528)
Fix bx mesh/surface time (%) = 6.08619e-06 (0.000586977)
Fix ax mesh/surface time (%) = 5.87758e-06 (0.000566858)
Fix inface mesh/surface time (%) = 1.01142e-06 (9.75451e-05)
Fix wall wall/gran time (%) = 7.05896e-05 (0.00680795)
Fix wall_neighlist_cv neighlist/mesh time (%) = 5.4799e-06 (0.000528504)
Fix n_neighs_mesh_cv property/atom time (%) = 3.77186e-07 (3.63773e-05)
Fix tracker_cv contacthistory/mesh time (%) = 9.30484e-06 (0.000897397)
Fix wall_neighlist_rc neighlist/mesh time (%) = 1.11386e-06 (0.000107425)
Fix n_neighs_mesh_rc property/atom time (%) = 1.47149e-07 (1.41916e-05)
Fix tracker_rc contacthistory/mesh time (%) = 8.32975e-06 (0.000803355)
Fix wall_neighlist_bx neighlist/mesh time (%) = 1.03004e-06 (9.93415e-05)
Fix n_neighs_mesh_bx property/atom time (%) = 2.2538e-07 (2.17366e-05)
Fix tracker_bx contacthistory/mesh time (%) = 8.21799e-06 (0.000792577)
Fix wall_neighlist_ax neighlist/mesh time (%) = 1.04494e-06 (0.000100779)
Fix n_neighs_mesh_ax property/atom time (%) = 1.74157e-07 (1.67964e-05)
Fix tracker_ax contacthistory/mesh time (%) = 8.2748e-06 (0.000798056)
Fix pts1 particletemplate/sphere time (%) = 1.04308e-07 (1.00599e-05)
Fix pts2 particletemplate/sphere time (%) = 7.63685e-08 (7.36529e-06)
Fix pdd1 particledistribution/discrete time (%) = 1.59256e-07 (1.53593e-05)
Fix ins insert/stream time (%) = 0.427909 (41.2693)
Fix release_fix_insert_stream property/atom time (%) = 3.20375e-07 (3.08983e-05)
Fix integr nve/sphere time (%) = 1.39708e-05 (0.0013474)
Fix ts check/timestep/gran time (%) = 7.45058e-08 (7.18564e-06)
Fix contacthistory contacthistory time (%) = 2.15136e-06 (0.000207485)

Nlocal: 621.742 ave 7820 max 0 min
Histogram: 216 8 8 8 0 0 8 0 0 8
Nghost: 2032.86 ave 20102 max 0 min
Histogram: 192 12 14 10 12 6 4 2 0 4
Neighs: 290006 ave 4.37085e+06 max 0 min
Histogram: 220 12 7 1 0 8 0 0 2 6

Total # of neighbors = 74241520
Ave neighs/atom = 466.441
Neighbor list builds = 1
Dangerous builds = 0
Setting up run ...
Memory usage per processor = 14.3997 Mbytes

Daniel Queteschiner | Thu, 05/12/2016 - 17:45

400000 particles on 256 cores gives about 1500 particles per core if they are evenly distributed. Typically (from my experience, and depending on the exact architecture, CPU, etc.), you want to aim for about 10000 particles per core; otherwise the communication overhead you produce will reduce the efficiency of the simulation.
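
To put numbers on it:

400000 particles / 256 cores ≈ 1560 particles per core
400000 particles / 10000 particles per core ≈ 40 cores

So something around 40 cores would be a better match for 400000 particles than 256.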

The number you specify with the neighbor command is the skin distance and determines the range used to search for potential collision partners (the neighbor list build). The larger this range, the more particles need to be checked for collisions. The skin distance of course depends heavily on the size of the particles and their expected velocity, and it is crucial for the speed of your simulation. (You can see in the log that the number of neighbors is pretty high.)
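
As a rough illustration (the exact value depends on your time step and velocities, so treat the number purely as a placeholder), a skin on the order of the particle diameter instead of 1 mm would look like this:

neighbor        0.0002 bin    # skin comparable to the ~0.2 mm particle diameter
neigh_modify    delay 0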

Particle insertion: you need to change it!
Here is why: the log file says that LIGGGHTS tries to insert 391340 particles but can actually insert only 159166. That is very bad. First, you probably waste some time trying to insert particles into a volume that has no more space left for new particles. Second, once LIGGGHTS cannot insert a particle (it starts with the largest ones), it simply stops the insertion at that point (even though smaller particles of your distribution might still fit), and you end up with a totally different size distribution.
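
I don't know your exact insertion setup, but the fix insert/stream definition is where to change things. Schematically (the ids ins, pdd1 and inface are taken from your log; every numeric value below is only a placeholder), the idea is to lower the mass rate, insert smaller batches more often, and/or enlarge the extrusion length so that each batch actually fits into the insertion volume:

fix ins all insert/stream seed 123457 distributiontemplate pdd1 &
    massrate 25. insert_every 260000 overlapcheck yes &
    vel constant 0. 0. -0.1 &
    insertion_face inface extrude_length 0.01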

Matteo | Mon, 05/16/2016 - 10:49

Dear Daniel,

Thanks very much for your comments.

I understand your point regarding the number of CPUs, and that was my concern as well. Following your suggestion, I tried running a simulation with 35000 particles again, using different numbers of CPUs, starting with 16 and going up to 256. Just by manually refreshing the log file I can really see the difference in speed: with 16 CPUs the simulation is much slower.
Since I am sure you are more experienced than me, I suspect that something is wrong in my simulation. Could it maybe be related to the number of neighbors? I reduced the skin size to 0.0002.
My particle distribution is described by a Gaussian function with a mean value of 65 microns, and the maximum radius is below 100 microns.

What neighbor count would you suggest I try to stay within in my simulations?
With the new settings I get the following:
Neighs: 798.227 ave 4718 max 0 min
I think this is a much more reasonable number than before.
I understand that the skin size should also be set in relation to the speed of the particles. In my case I would say they do not move quickly, as the speed should be around 10-15 cm/s.
Is there any particular procedure you would suggest for setting up the skin size properly?
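
To check my own reasoning (assuming, just for this estimate, a time step of about 1e-6 s and a neighbor list rebuild every ~100 steps, which are only guesses for my setup):

displacement between rebuilds ≈ v * N_rebuild * dt ≈ 0.15 m/s * 100 * 1e-6 s ≈ 1.5e-5 m = 15 microns

so a skin of a few tens of microns should already be on the safe side. Does that sound reasonable?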

I understand the particle insertion problem and I will keep it in mind.

Thanks!
Matteo

Matteo | Fri, 06/10/2016 - 12:56

Hello,

Just a quick follow-up on the computational time.
I decided to run the model with different numbers of CPUs to check whether I was wrong or not.

This is what I got for a simulation with 37700 particles (not many):
32 CPUs -> 46331 seconds
64 CPUs -> 25139 seconds
128 CPUs -> 17194 seconds
256 CPUs -> 19000 seconds

I hope this helps. Obviously, with more particles I would expect better scalability at higher CPU counts (>64).

Matteo

sbateman | Tue, 06/14/2016 - 01:31

The log files for those simulations should have timing information ("Pair time (%)", "Neigh time (%)", "Comm time (%)", etc) and neighbor information ("Nlocal:", "Nghost:", "Neighs:", and "Ave neighs/atom:").

If you're running a granular model, you probably shouldn't have more than 6 or so "Ave neighs/atom". Of course, that depends on your particle size distribution, how tight your packing is, and the coordination number. Having too many non-interacting neighbors in the neighbor list will definitely slow down your simulation and kill the scalability because of communication overhead.

Other than that, it is a balancing act to find the optimal number of particles per processor and neighbor skin distance. A smaller skin distance will decrease the "Pair" and "Comm" times but may increase the "Neigh" time. More CPUs will decrease the "Pair" time but may increase the "Comm" time. There are also some options to the "processors" command (e.g., "grid").
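
For example, you can force a particular decomposition instead of the automatic one (the numbers below are just an illustration; the grid keyword offers further mapping options, see the processors doc page of your LIGGGHTS version):

processors 16 4 4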

You'll have to do some testing to figure out what works best for your system. Also, take a look at the "spcpu" thermo_style keyword as a rough estimate of simulation speed.
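
Something along these lines (assuming a thermo_style custom line similar to the one behind your log output) would add it to the thermo output:

thermo_style custom step atoms ke vol cpu spcpu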

Matteo | Wed, 06/15/2016 - 16:21

Thanks for the suggestion sbateman.

I currently have an average of 2000 neighs using 128 CPUs (17000 with 16 CPUs). I think I will try to reduce the skin size a bit more and hopefully see some improvement in the computational time.

Best regards
Matteo

JoG | Wed, 01/11/2017 - 15:38

Hi Matteo,

I am also wondering whether my simulation could run faster. In my simulation, most of the run time is "Other time", so I want to analyze my run a little further. Can you tell me which command prints the detailed runtime information for each fix? I couldn't find anything in the thermo_style argument list...

This would really help.

Thanks,
Johannes