CFDEM: Floating point exception when Starting up LIGGGHTS after a lot of successful coupling time steps

Submitted by Min Zhang on Thu, 06/14/2018 - 00:30

Hello All,

I am simulating a particle-fluid flow case using cfdemPisoSolver.

My physical problem is like this.

The geometry is a cylinder (D1=6" and L=30cm) and there is a hole (D2=3/8") on the side of the cylinder.

The boundary conditions are:
Inlet: fixed fluid velocity (871 cm/s) pointing into the cylinder; particles are injected at the inlet with the same velocity as the fluid;
Outlet: fixed pressure (P = 0);
Hole: fixed fluid velocity (44628 cm/s) pointing out of the cylinder, so particles and fluid leave the cylinder through the hole.

Basic parameters are as follows:
d_particle=600microns, CFD time step = 1e-7s, DEM time step = 1e-8s, couplingInterval = 100, komegaSST turbulence model.

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
After many successful time steps, the run fails with the following output.

DICPCG: Solving for p, Initial residual = 5.95516e-07, Final residual = 5.95516e-07, No Iterations 0
time step continuity errors : sum local = 8.44545e-07, global = 8.90076e-14, cumulative = -0.000706481
smoothSolver: Solving for omega, Initial residual = 0.000649045, Final residual = 8.78572e-06, No Iterations 1
DILUPBiCG: Solving for k, Initial residual = 0.000102867, Final residual = 4.15073e-08, No Iterations 1
ExecutionTime = 1772.16 s ClockTime = 1779 s

Time = 0.001435

Courant Number mean: 0.000134113 max: 0.0358228

Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
srun: error: nid00009: tasks 1,13: Floating point exception
srun: Terminating job step 1625520.0
slurmstepd: error: *** STEP 1625520.0 ON nid00009 CANCELLED AT 2018-06-13T15:32:21 ***
srun: error: nid00010: tasks 25,37: Floating point exception
srun: error: nid00010: tasks 24,26-36,38-47: Terminated
srun: error: nid00009: tasks 0,2-12,14-23: Terminated
srun: Force Terminated job step 1625520.0
TACC: MPI job exited with code: 136

TACC: Shutdown complete. Exiting.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I tried modifying the initial values of the turbulence parameters k and omega, but the error message is almost the same and the run still fails at Time = 0.001435.

I would very much appreciate your help!

Best regards,
Min

Attachment: failedcase_step1000.jpeg (457.43 KB)

paul | Thu, 06/14/2018 - 11:20

This is LIGGGHTS crashing, not the CFD side. When the CFD side messes up, the iteration number and/or Courant number usually skyrockets.

This cannot be answered without some error output that your job scheduler suppressed. Please try to get a nice backtrace so we can find out what went wrong.

Min Zhang | Fri, 06/15/2018 - 17:55

Thank you so much, Paul! Here is more information.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Some basic information of this confusing case:
2 nodes 48 processors
deltaT CFD = 1e-7s; deltaT DEM = 1e-8s.
couplingInterval 100;
turbulenceModelType "RASProperties";
Software versions: OpenFOAM-2.4.x, LIGGGHTS 3.6.0, CFDEMcoupling-PUBLIC-2.4.x.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The first run failed (mpi_job.o1623993). The terminal output at the failure was:
ExecutionTime = 1909.13 s ClockTime = 1914 s
Time = 0.001435
Courant Number mean: 0.000134498 max: 0.0358266
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
srun: error: nid00132: tasks 25,37: Floating point exception
srun: Terminating job step 1623993.0
srun: error: nid00127: tasks 1,13: Floating point exception
slurmstepd: error: *** STEP 1623993.0 ON nid00127 CANCELLED AT 2018-06-12T12:36:21 ***
srun: error: nid00127: tasks 0,2-12,14-23: Terminated
srun: error: nid00132: tasks 24,26-36,38-47: Terminated
srun: Force Terminated job step 1623993.0
TACC: MPI job exited with code: 136
TACC: Shutdown complete. Exiting.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Then I modified the initial values of "k" and "omega" for the komegaSST turbulence model.
It failed again (mpi_job.o1624601):
ExecutionTime = 1867.53 s ClockTime = 1872 s
Time = 0.001435
Courant Number mean: 0.0001341 max: 0.0358275
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
srun: error: nid01219: tasks 1,13: Floating point exception
srun: Terminating job step 1624601.0
slurmstepd: error: *** STEP 1624601.0 ON nid01219 CANCELLED AT 2018-06-12T17:39:15 ***
srun: error: nid01220: tasks 25,37: Floating point exception
srun: error: nid01219: tasks 0,2-12,14-23: Terminated
srun: error: nid01220: tasks 24,26-36,38-47: Terminated
srun: Force Terminated job step 1624601.0
TACC: MPI job exited with code: 136
TACC: Shutdown complete. Exiting.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Next, I modified the DEM geometry, but it failed again:
ExecutionTime = 1772.16 s ClockTime = 1779 s
Time = 0.001435
Courant Number mean: 0.000134113 max: 0.0358228
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
srun: error: nid00009: tasks 1,13: Floating point exception
srun: Terminating job step 1625520.0
slurmstepd: error: *** STEP 1625520.0 ON nid00009 CANCELLED AT 2018-06-13T15:32:21 ***
srun: error: nid00010: tasks 25,37: Floating point exception
srun: error: nid00010: tasks 24,26-36,38-47: Terminated
srun: error: nid00009: tasks 0,2-12,14-23: Terminated
srun: Force Terminated job step 1625520.0
TACC: MPI job exited with code: 136
TACC: Shutdown complete. Exiting.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Therefore, no matter what changes I made, the failure time is the same:
Time = 0.001435
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
There is one critical setting in the DEM input file that I think may be the cause of this problem.

The following is the terminal output of the initial LIGGGHTS run, with my own comments:

variable insertstep equal round(5/${vfluid}/${simstep}/4) # insert particle every # step, use one fourth of the critical step for more frequent insertion, 5 = (6-1) is the insert zone distance in flow direction, to be changed accordingly
# Comment from Min: Here, "5" and "4" are values I took from another example's input file, but I don't know why they were chosen.
# Comment from Min: insertstep = 5/(871.636)/(1e-8)/4 = 143408
# Comment from Min: Therefore, insert time step (insert deltaT) = 143408 * 1e-8 = 1.43408e-3s = 0.00143408, which is close to the above failure time.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This is also the terminal output for the initial liggghts run
fix ins nve_group insert/pack seed 617323 distributiontemplate pdd1 maxattempt 50000 all_in yes vel constant 0 0 871.636 insert_every 143408 overlapcheck yes volumefraction_region 0.043 region insert ntry_mc 100000
INFO: Particle insertion ins: inserted 170564 particle templates (mass 51.119385) at step 1
- a total of 170564 particle templates (mass 51.119385) inserted so far.
WARNING: Particle insertion: Less insertions than requested (../fix_insert.cpp:734)
1 170564 19418956 0.0010889 0 0 6986.028
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Terminal output before ERROR (which is the last successful LIGGGHTS run):
ExecutionTime = 1771.01 s ClockTime = 1777 s
Time = 0.001434
Courant Number mean: 0.000134107 max: 0.0358229
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
run 100
Setting up run ...
Memory usage per processor = 34.8744 Mbytes
Step Atoms KinEng RotE ts[1] ts[2] Volume
143301 170564 21859715 149.18419 0.0010050491 0.00047614742 6986.028
CFD Coupling established at step 143400
143401 170564 21861222 149.39135 0.0010050491 0.00047614742 6986.028
Loop time of 0.307553 on 48 procs for 100 steps with 170564 atoms
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
The last successful run ended at step 143401, and the next (second) particle insertion is due at step 143408, which falls inside the following 'run 100'.

Therefore, the failure is most likely triggered by the second particle insertion.
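
The step arithmetic behind this suspicion can be checked with a short sketch. It is only an illustration: the helper name and the assumption that insertions happen at first_step + k * insert_every are mine, and LIGGGHTS may count from step 0 or step 1, so both are tried.

```python
def insertion_steps_in_window(first_step, insert_every, start, n_steps):
    """Return the insertion steps that fall in the window (start, start + n_steps].

    Assumption (not taken from the LIGGGHTS source): insertions are scheduled
    at first_step + k * insert_every for k = 0, 1, 2, ...
    """
    end = start + n_steps
    hits = []
    # Jump close to the window instead of iterating from the very first step.
    k = max(0, (start - first_step) // insert_every)
    step = first_step + k * insert_every
    while step <= end:
        if step > start:
            hits.append(step)
        step += insert_every
    return hits


INSERT_EVERY = 143408  # from the "fix ins ... insert_every 143408" line in the log

# Last successful 'run 100' started at step 143301, the failing one at 143401.
last_ok = insertion_steps_in_window(0, INSERT_EVERY, 143301, 100)
failing_from_0 = insertion_steps_in_window(0, INSERT_EVERY, 143401, 100)
failing_from_1 = insertion_steps_in_window(1, INSERT_EVERY, 143401, 100)
print(last_ok, failing_from_0, failing_from_1)
```

Under either counting convention, the last successful run contains no insertion step and the failing run contains exactly one, which is consistent with the crash always appearing at Time = 0.001435.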

So right now I think the problem is related to the particle insertion, but I don't know why it fails.

The attached picture is the paraview visualization in step = 1000.

For the only (initial) particle insertion at step 1, I set the insertion region to z = 1-6.
However, the attached figure shows that the inserted particles are not distributed uniformly over the insertion region. In the middle of the region there are no particles at all. Why?

From the above warning message:
WARNING: Particle insertion: Less insertions than requested (../fix_insert.cpp:734)
1 170564 19418956 0.0010889 0 0 6986.028

I searched online about this warning message.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This warning message: WARNING: Particle insertion: Less insertions than requested (../fix_insert.cpp:646) tells you that LIGGGHTS is not inserting the expected number of particles. This can happen if you are trying to generate a huge number of particles in a small region: LIGGGHTS tries to create the particles but does not find enough physical space in that region. By default LIGGGHTS searches for possible overlaps and makes a number of attempts to generate each particle. If that is not possible, then the particle is not generated (have a look in fix_insert_stream).
My suggestion is to either increase the insertion region size or decrease the mass flow rate. You could also increase the initial velocity with which the particles are created.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
However, in my case the insertion region seems large enough: there are no particles at all in its middle part.
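
A rough estimate of how many particles the volume fraction asks for can put the warning in perspective. This is a sketch under several assumptions that are not confirmed anywhere in this thread: monodisperse 600 micron spheres (the template pdd1 is not shown), an insertion region equal to the full 6-inch cylinder cross-section (radius about 7.62 cm) between z = 1 and z = 6, and no shrinkage of the region from all_in yes. The function name is hypothetical.

```python
import math


def requested_particles(radius_cm, length_cm, volume_fraction, d_particle_cm):
    """Estimate the particle count needed to reach a solid volume fraction.

    Assumes monodisperse spheres filling a plain cylinder segment; this is an
    illustration, not how LIGGGHTS computes its own target.
    """
    region_volume = math.pi * radius_cm**2 * length_cm   # cm^3
    particle_volume = math.pi / 6.0 * d_particle_cm**3   # cm^3
    return volume_fraction * region_volume / particle_volume


requested = requested_particles(
    radius_cm=7.62,        # half of D1 = 6 inch
    length_cm=5.0,         # insertion region z = 1 to 6
    volume_fraction=0.043, # volumefraction_region from the fix command
    d_particle_cm=0.06,    # 600 microns
)
inserted = 170564          # from the LIGGGHTS log
fill_ratio = inserted / requested
print(round(requested), fill_ratio)
```

With these assumptions, the inserted count comes out at roughly half of the estimated target, which would fit both the "Less insertions than requested" warning and the empty zone visible in the picture. It does not explain why the insertion stopped short.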

Min Zhang | Sat, 06/16/2018 - 05:16

I talked with my friend and now I know why the insertstep is calculated like this.
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
variable insertstep equal round(5/${vfluid}/${simstep}/4)

Here, 5 is the length of the insertion region (z = 1-6 cm), so 5/${vfluid} is a time and 5/${vfluid}/${simstep} is the number of DEM steps a particle starting at z = 1 needs to leave the insertion region at z = 6. The factor of 4 makes the insertion more frequent, so that the particle supply stays continuous even if particles are drawn out of the insertion region faster than ${vfluid}.
It does not have to be 4; a smaller number also works, as long as the particle concentration in the insertion region matches the target concentration.
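
The same arithmetic as a small sketch, using the values from the log (vfluid = 871.636 cm/s, simstep = 1e-8 s). The function and argument names are mine and only mirror the LIGGGHTS variable expression.

```python
def insert_step(region_length_cm, v_fluid_cm_s, dem_dt_s, oversample=4):
    """DEM steps between insertions: transit time / DEM time step / oversample."""
    transit_time_s = region_length_cm / v_fluid_cm_s  # time to cross the region
    return round(transit_time_s / dem_dt_s / oversample)


DEM_DT = 1e-8  # s, DEM time step (simstep)

steps = insert_step(region_length_cm=5.0, v_fluid_cm_s=871.636, dem_dt_s=DEM_DT)
interval_s = steps * DEM_DT  # physical time between two insertions
print(steps, interval_s)
```

This reproduces the insert_every value that LIGGGHTS echoes in the fix command, and the corresponding physical interval of about 1.434e-3 s.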

Min Zhang | Sat, 06/16/2018 - 05:43

region simwell cylinder z 0 0 7.61 0 30 units box
region simbox block -7.62 7.62 -7.62 7.62 1 6 units box
region insert intersect 2 simwell simbox

fix ins nve_group insert/pack seed ${seed4} distributiontemplate pdd1 maxattempt 50000 all_in yes vel constant 0 0 ${vfluid}\
insert_every ${insertstep} overlapcheck yes volumefraction_region ${conc} region insert ntry_mc 100000

paul | Sat, 06/16/2018 - 16:18

On second look: why do you use cgs units for velocities? Do you also use cgs units in OpenFOAM? Are you consistent with viscosities etc.?
Also, the following things caught my attention:
- The coupling is very loose; I'd suggest using a couplingInterval of 10 (= once per CFD time step).
- I am not confident that LIGGGHTS supports complicated regions for fix insert/pack. Why not just use a short region cylinder sharing the radius and base coordinate of the simulation domain?
- Does the case work without turbulence modelling?
- Since you don't get debug output, it could be that the RNG failed. Can you choose another seed and see whether it works?
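
To make the couplingInterval remark concrete, here is a minimal sketch of the time-step bookkeeping, assuming couplingInterval is counted in DEM steps (which is what the "= once per CFD time step" remark implies). The helper name is hypothetical.

```python
def coupling_summary(cfd_dt, dem_dt, coupling_interval):
    """Return (DEM steps per CFD step, CFD steps between two couplings)."""
    dem_steps_per_cfd_step = round(cfd_dt / dem_dt)
    cfd_steps_per_coupling = coupling_interval / dem_steps_per_cfd_step
    return dem_steps_per_cfd_step, cfd_steps_per_coupling


# Settings from this thread: CFD dt = 1e-7 s, DEM dt = 1e-8 s.
current = coupling_summary(1e-7, 1e-8, coupling_interval=100)
suggested = coupling_summary(1e-7, 1e-8, coupling_interval=10)
print(current, suggested)
```

With couplingInterval 100 the two codes exchange data only every 10 CFD time steps; with 10 they exchange data every CFD time step.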

Min Zhang | Mon, 06/18/2018 - 05:48

Thanks, Paul!

I think it is convenient to use cgs units in our case. Yes, all of the parameters are in cgs units.

- Ok, I will do that.

- Ok, let me try a simpler cylinder region. So you mean the insertion region could share the "same" radius with the simulation domain, yes? I thought that the insertion region should have a slightly smaller radius than the simulation domain since the particle has a diameter. In addition, I am wondering whether the DEM simulation domain could share the "same" radius with the CFD geometry?

- I am not sure; I haven't tried that yet.

- Ok, I will try.

Thanks again for your reply!

Best regards,
Min

paul | Mon, 06/18/2018 - 09:14

As long as you choose all_in yes, the same radius is fine. As per the documentation:
"The all_in flag determines if the particle is completely contained in the insertion region (all_in = yes) or only the particle center (all_in = no)."
I'd make both simulation domains identical.