Optimizing parallelization to avoid high memory consumption and eventaully killing of simulation

Submitted by atul2018 on Wed, 07/27/2022 - 10:40

Dear CFDEM Community

I am using Unresolved CFDEM to simulate particle infiltration in bed. I am facing problem now to simulate the case for larger duration due to high memory requirements as number of particles increase as simulation progresses and eventually gets killed even on large Linux-clusters. I know that Unresolved CFDEM has limit in terms of number of particles (oder of ~10^7) but my simulation gets killed despite total number of particles are less than the limit specified above.

I am running the case on 8 nodes each containing 28 number of cores,so in total 224 number of processors. In the CFD side, i am using scotch method of decomposition, which automatically divides the CFD domain optimally. In the DEM side, I tried with different arrangements of processors in x, y and z direction (28 4 2, 28 2 4, 14 8 2, etc.) but no success in optimizing the parallalization. I wanted to know if there is a way to optimize my paralization both in CFD and DEM side, so the killing of job due to high memory requirement can be avoided. more specifically, i have some questions/thoughts as follows:
1. Is there any way to tell LIGGGHTS (DEM side) that do the decomposition as done in OpenFOAM (CFD side) so that both CFD and DEM domain calculations are done on same node and may be this will reduce communication load among processors?
2. I saw the processor command (https://www.cfdem.com/media/DEM/docu/processors.html) and found an option to divide the domain with two levels by specifying number of cores used and decomposition in each core. However, I am not sure if it can solve the problem. Any comments on that and may be some example how to use this function efficiently.
3. I also saw that instead of specifying the decomposition in x, y and z direction, I can use '*' and LIGGGHTS will devide the domain with some optimization algorithm (minimizing the surface-to-volume ratio) but I am not sure if the surface-to-volume optimization can solve my problem as I think the memory problem is more related to to too many particles (this too many calculations) on some of my processors. It would be great if LIGGGHTS could devide the domain by optimizing number of particles. Is comment on this?

Looking forward to hear it from your experiences and let me know if any other point that I missed to point out.

Best Regards
Atul

atul2018 | Sat, 10/08/2022 - 16:36

I have further reduced the domain size and now simulation is not crashing but it is super slow and with the speed It will take months to perform even even few seconds of simulations.

Let me tell you what exactly I do (maybe you can find some trick to speed up my simulation's speed): I am investigating sand infiltration (0.1-1mm;type2 ) into immobile gravel bed (1mm-10mm;type1) under the influence of lowing water. As my bed remains immobile I group and freeze the gravel bed particles (type1), so that I can avoid unnecessary calculations updating the position of gravel particles. As simulation progresses, sand particles get infiltrated into immobile gravel bed, so the sand particles (0.1-1mm;type2) are the only moving particles that interact among themselves and also with the gravel bed particles.

here is other infos that might help experts to help me out:
-I am using the lower value of Young's modulus (=5e6) which is quite low for sediments (I guess). I have tried to reduce the DEM time steps that satisfy the Rayleigh and Hertz time criteria (~10%) but I could only increase the DEM time step to 0.0000005 sec (5*0.0000001 sec), which is still very small. I am worried if I should even use a smaller value of Young's modulus in order to increase DEM time steps. What value of Young's modulus do you suggest?
-The Cundall number is much greater than 1e5. I am running a CFDEM simulation with a coupling interval of 100. I tested the cundell number at several time steps and it's in the range of 1e8. So parallelization and accordingly speed up should be possible.
- The size ratio of 100 seems to be the biggest problem. i am using skin=0.0005 (bin) and delay=0 as neighbor and neigh_modify settings respectively. The skin size is equal to ~average particles (small sand particles ranging from 1 mm-0.1mm). Following these settings, It seems like the neighbor list is only prepared at the beginning of CFDEM simulation and never builts again (the neightime=0 for almost all the next iterations) even using delay=0. This seems to be the biggest problem as particles migrate and change position over the time, the neighbor list should be built more often (because delay=0 and skin=0.0005). Can you please say what could be the reason/solution for it?
-I also observe that my pair time=1-2% and comm time=5-10%, so the communication among the processors and contact detection is not taking much time, my modify time and other time is taking 80-90% of total loop time. I listed the detailed output of time consumed by different commands (first few iterations shown for example). And it seems STL file loading and wall interaction is taking most of the time. Do you have any clue how I can reduce this extra time to speed up my simulations?
-I have also read about the fix neigh_modify --, where one can choose to exclude some type/group of particles by not including them into the neighbor list. As mentioned, gravel bed particles (1mm-10mm;type1) remain freezed during CFDEM simulation and maybe I can exclude interaction (neighbor list preparation) among gravel particles itself (i.e. type1*type1) but I am worried if this exclusion would also turn off the interaction among sand (type2) and gravel (type1) particles and cause unphysical particle behaviour, which i don't want, as sand particle should be trapped in the void spaces in between gravel particles. What do you think if this could help me somehow or any other options in fix neigh_mofdify-- that might be useful for me?

I am stuck here for months and have to finish my PhD in a limited time. I would be grateful if someone can help me out to speed up my simulations.

here is my in.liggghts_run script for reference:

echo both
log ../DEM/log.liggghts
thermo_log ../DEM/post/thermo.txt

atom_style granular
atom_modify map array
communicate single vel yes

boundary p f f
newton off

units si
processors * * *

# read the restart file
read_restart ../DEM/post/restart/liggghts.restart

neighbor 0.0005 bin
neigh_modify delay 0

# Material properties required for granular pair styles

fix m1 all property/global youngsModulus peratomtype 5.e6 5.e6
fix m2 all property/global poissonsRatio peratomtype 0.45 0.45
fix m3 all property/global coefficientRestitution peratomtypepair 2 0.3 0.3 0.3 0.3
fix m4 all property/global coefficientFriction peratomtypepair 2 0.5 0.5 0.5 0

# pair style
pair_style gran model hertz tangential history # Hertzian without cohesion
pair_coeff * *

# timestep, gravity
timestep 0.0000001
fix ts_check all check/timestep/gran 1000 0.1 0.1
fix gravi all gravity 9.81 vector 0.0 -1.0 0.0

# Defining the walls that can be in contact with the particles; those come from the STL folder
fix inlet all mesh/surface/surface file ../DEM/STL/inlet.stl type 1 curvature_tolerant yes
fix outlet all mesh/surface/surface file ../DEM/STL/outlet.stl type 1 curvature_tolerant yes
fix top all mesh/surface/surface file ../DEM/STL/top.stl type 1 curvature_tolerant yes
fix channelBottom_1 all mesh/surface/surface file ../DEM/STL/channelBottom_1.stl type 1 curvature_tolerant yes
fix channelBottom_2 all mesh/surface/surface file ../DEM/STL/channelBottom_2.stl type 1 curvature_tolerant yes
fix bedBottom all mesh/surface/surface file ../DEM/STL/bedBottom.stl type 1 curvature_tolerant yes
fix front all mesh/surface/surface file ../DEM/STL/front.stl type 1 curvature_tolerant yes
fix back all mesh/surface/surface file ../DEM/STL/back.stl type 1 curvature_tolerant yes
fix insFace all mesh/surface/surface file ../DEM/STL/insertion_face.stl type 1 curvature_tolerant yes

# The wall contacts are only allowed at the box walls
fix wall all wall/gran model hertz tangential history mesh n_meshes 6 meshes top channelBottom_1 channelBottom_2 bedBottom front back

#creating group for bed paricle (type 1)
group bed type 1

# particle distributions and insertion, IFS1 (bridging case)
fix r_pts1 all particletemplate/sphere 100631591 atom_type 2 density constant 2650 radius constant 0.00078
fix r_pts2 all particletemplate/sphere 100631807 atom_type 2 density constant 2650 radius constant 0.00049
fix r_pts3 all particletemplate/sphere 100661111 atom_type 2 density constant 2650 radius constant 0.00042
fix r_pts4 all particletemplate/sphere 100661399 atom_type 2 density constant 2650 radius constant 0.00036
fix r_pts5 all particletemplate/sphere 100793137 atom_type 2 density constant 2650 radius constant 0.00030
fix r_pts6 all particletemplate/sphere 100794431 atom_type 2 density constant 2650 radius constant 0.00025
fix r_pts7 all particletemplate/sphere 100796713 atom_type 2 density constant 2650 radius constant 0.00021
fix r_pts8 all particletemplate/sphere 100798699 atom_type 2 density constant 2650 radius constant 0.00018
fix r_pts9 all particletemplate/sphere 100799183 atom_type 2 density constant 2650 radius constant 0.00015
fix r_pts10 all particletemplate/sphere 100802419 atom_type 2 density constant 2650 radius constant 0.00011
fix r_pts11 all particletemplate/sphere 100805377 atom_type 2 density constant 2650 radius constant 0.00006 volume_limit 1e-14
fix r_pts12 all particletemplate/sphere 100822807 atom_type 2 density constant 2650 radius constant 0.00004 volume_limit 1e-14
fix pdd2 all particledistribution/discrete 100831279 12 r_pts1 8.8 r_pts2 4.6 r_pts3 4.7 r_pts4 8.2 r_pts5 9.2 r_pts6 11.9 r_pts7 16 r_pts8 15 r_pts9 15.2 r_pts10 5 r_pts11 1.2 r_pts12 0.2

#particle insertion (particles at the rate of 0.01 kg/s for 0.5 sec=0.005 kg particle should be inserted)-for 0.075 m wide flume
fix r_ins all insert/stream seed 101235583 distributiontemplate pdd2 mass 0.005 massrate 0.01 vel constant 0.675 0.0 0.0 overlapcheck yes insertion_face insFace extrude_length 0.15

# cfd coupling
fix cfd all couple/cfd couple_every 100 mpi
fix cfd2 all couple/cfd/force/implicit
#fix cfd2 all couple/cfd/force/accumulator RongDrag 10 1.5e-3
#fix cfd2 all couple/cfd/force/implicit/accumulated #CrankNicolson 0.5

#freezing the bed particle and forcing velocity of bed particle to be zero
velocity bed set 0.0 0.0 0.0 units box
fix 2 bed freeze

# apply nve integration to all particles that are inserted as single particles
fix integr all nve/sphere

# center of mass
compute centerOfMass all com

# compute explicit dragforce
compute explDrag all reduce update_on_run_end yes sum f_dragforce[1] f_dragforce[2] f_dragforce[3]

# sum of explicit and implicit drag force given from CFD to DEM
variable totalDragX equal f_cfd2[1]
variable totalDragY equal f_cfd2[2]
variable totalDragZ equal f_cfd2[3]

# explicit drag force given from CFD to DEM
variable explicitDragX equal c_explDrag[1]
variable explicitDragY equal c_explDrag[2]
variable explicitDragZ equal c_explDrag[3]

variable time equal step*dt
fix extra all print 10 "${time} ${explicitDragX} ${explicitDragY} ${explicitDragZ} ${totalDragX} ${totalDragY} ${totalDragZ}" file ../DEM/post/forces.txt title "# time expDrag(X Y Z) totDrag(X Y Z)"

# screen output
compute rke all erotate/sphere
thermo_style custom step atoms ke c_rke vol c_centerOfMass[3] c_explDrag[1] c_explDrag[2] c_explDrag[3] f_cfd2[1] f_cfd2[2] f_cfd2[3]
thermo 10
thermo_modify lost ignore norm no
compute_modify thermo_temp dynamic yes
modify_timing on

#dump dmp all custom 5000 ../DEM/post/dump*.liggghts_run id type x y z vx vy vz fx fy fz f_dragforce[1] f_dragforce[2] f_dragforce[3] radius

dump dmp all custom/vtk 1000000 ../DEM/post/dump_liggghts_run.*.vtk id type x y z vx vy vz fx fy fz f_dragforce[1] f_dragforce[2] f_dragforce[3] radius

reset_timestep 0
run 0

#writing restart file using restart command at every CFD writing interval (0.1sec) ie every 1000000 DEM timesteps
restart 1000000 ../DEM/post/restart/restartCFDEM.restart

Best Regards
Atul Jaiswal

mschramm | Fri, 10/21/2022 - 16:04

Hello,
Coarse graining may help you reduce the number of particles in your system so please look into that.

The other thing is to look into LIGGGHTS-PFM.
This version of liggghts is closer related to LAMMPS then the current version of LIGGGHTS-PUBLIC and includes the fix_balance command.
This fix uses the zoltan library to dynamically load-balance the simulation (but it is harder to compile and lacks documentation...).

atul2018 | Mon, 11/21/2022 - 17:18

Dear mschramm

Thank you for you valuable suggestions. As far as I know and understand, coarsegraining wont be suitable for my case as it is related to balancing the load and polydispersity of the system. But you second suggestion sounds promising. Following that, with the help of support from LRZ (supercomputing center in Germany), I successfully compiled the the whole package (OpenFOAM-5.x, CFDEM-coupling and LIGGGHTS) with PFM in linux cluster. To compile the package, we needed to do some changes suggested during the compilation. Everything seem to be okay till now.

first I tried to see if it works or not with pure DEM case (but with coupled way of doing it with cfdemLiggghts ). It ran successfully without any error. I tested this on chute_wear example. However when I try to run CFDEM simulation (with cfdemSolverPiso) on my own case, it starts successfully, does the decomposition but reports error at first time stating unknown contact model. I wonder why pure DEM case doesnt report any error but CFDEM simulation does. The same case (my own case) run well with package compiled with LIGGGHTS-PUBLIC without any error. More specifically the error says:

unknown contact model (pair_gran_proxy.cpp:115)

Once again: i wanted to try LIGGGHTS-PFM to do the load babancing, as LIGGGHTS-PUBLIC doesnt have this option. Do you see anything that might lead to such error of not detecting contach model?

Best Regards
Atul Jaiswal