[solved] Floating point exception error

Submitted by jtvanlew on Fri, 02/13/2015 - 05:51

Hello developers,

I'm running into this error periodically in my coupled simulations. It happens when I have many small particles in my system (curiously, just a few small particles isn't an issue). I have about 5000 particles of radius Rp and about another 5000 of radius 0.35*Rp. The CFD runs first and it's fine. Then the LIGGGHTS half of the coupling starts up, and it crashes at the moment of first coupling. I just finished a similar simulation with roughly 6000 large particles and 2000 small particles, and there was no issue whatsoever.

Here's the error from my log:

Courant Number mean: 1.50067e-05 max: 1.52413e-05
- evolve()
Starting up LIGGGHTS
Executing command: 'run 1000 '
run 1000
Setting up run ...
Memory usage per processor = 8.97187 Mbytes
Step Atoms KinEng Volume heattran
2799990 9911 4.4545406e-07 5.625e-06 7.0935493
CFD Coupling established at step 2800000
2800000 9911 0.20950397 5.625e-06 7.0935781
[2] #0 Foam::error::printStack(Foam::Ostream&) at ??:?
[2] #1 Foam::sigFpe::sigHandler(int) at ??:?
[2] #2 in "/lib/x86_64-linux-gnu/libc.so.6"
[2] #3 void LAMMPS_NS::FixHeatGranCond::post_force_eval<1, 0>(int, int) at ??:?
[2] #4 LAMMPS_NS::FixHeatGranCond::post_force(int) at ??:?
[2] #5 LAMMPS_NS::Modify::post_force(int) at ??:?
[2] #6 LAMMPS_NS::Verlet::run(int) at ??:?
[2] #7 LAMMPS_NS::Run::command(int, char**) at ??:?
[2] #8 void LAMMPS_NS::Input::command_creator(LAMMPS_NS::LAMMPS*, int, char**) at ??:?
[2] #9 LAMMPS_NS::Input::execute_command() at ??:?
[2] #10 LAMMPS_NS::Input::one(char const*) at ??:?
[2] #11 Foam::twoWayMPI::couple() const at ??:?
[2] #12 Foam::cfdemCloud::evolve(Foam::GeometricField<double, Foam::fvPatchField, Foam::volMesh>&, Foam::GeometricField<Foam::Vector<double>, Foam::fvPatchField, Foam::volMesh>&, Foam::GeometricField<Foam::Vector<double>, Foam::fvPatchField, Foam::volMesh>&) at ??:?
[2] #13
[2] at ??:?
[2] #14 __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[2] #15
[2] at ??:?
[jon-OEM:05297] *** Process received signal ***
[jon-OEM:05297] Signal: Floating point exception (8)
[jon-OEM:05297] Signal code: (-6)
[jon-OEM:05297] Failing at address: 0x3e8000014b1
[jon-OEM:05297] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7fb253979d40]
[jon-OEM:05297] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x39) [0x7fb253979cc9]
[jon-OEM:05297] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x36d40) [0x7fb253979d40]
[jon-OEM:05297] [ 3] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS15FixHeatGranCond15post_force_evalILi1ELi0EEEvii+0x596) [0x7fb25553f236]
[jon-OEM:05297] [ 4] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS15FixHeatGranCond10post_forceEi+0xac) [0x7fb25553e32c]
[jon-OEM:05297] [ 5] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS6Modify10post_forceEi+0x4b) [0x7fb2557ab29b]
[jon-OEM:05297] [ 6] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS6Verlet3runEi+0x35f) [0x7fb2558e72ef]
[jon-OEM:05297] [ 7] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS3Run7commandEiPPc+0x7fc) [0x7fb25589936c]
[jon-OEM:05297] [ 8] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x28) [0x7fb2557702e8]
[jon-OEM:05297] [ 9] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS5Input15execute_commandEv+0x880) [0x7fb25576e010]
[jon-OEM:05297] [10] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN9LAMMPS_NS5Input3oneEPKc+0x87) [0x7fb25576ed27]
[jon-OEM:05297] [11] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZNK4Foam9twoWayMPI6coupleEv+0x7b7) [0x7fb25538fde7]
[jon-OEM:05297] [12] /home/jon/OpenFOAM/jon-2.3.1/platforms/linux64GccDPOpt/lib/liblagrangianCFDEM-PUBLIC-2.3.1.so(_ZN4Foam10cfdemCloud6evolveERNS_14GeometricFieldIdNS_12fvPatchFieldENS_7volMeshEEERNS1_INS_6VectorIdEES2_S3_EES9_+0x7a) [0x7fb2552d348a]
[jon-OEM:05297] [13] cfdemSolverPisoScalar() [0x41e688]
[jon-OEM:05297] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fb253964ec5]
[jon-OEM:05297] [15] cfdemSolverPisoScalar() [0x421632]
[jon-OEM:05297] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 5297 on node jon-OEM exited on signal 8 (Floating point exception).
--------------------------------------------------------------------------

Is this something a layman such as myself may be able to remedy?

Thanks for any help.

Jon

jtvanlew | Tue, 02/17/2015 - 20:39

OK, so the problem had nothing to do with the size or number of small particles. After loading my restart file, there was a stability issue that was sending one of my particles out of the system. As soon as it left the integration region, its temperature became NaN. That NaN temperature caused errors when data was exchanged between CFD and DEM, hence the floating point exception. I reduced the timestep and it's running fine now.
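As a rough illustration of the failure mode described above, the Python sketch below flags particles that have either left an assumed axis-aligned simulation box or carry a non-finite temperature, and shows how a single NaN temperature poisons a pairwise heat-flux term. The function names `find_bad_particles` and `conduction_flux`, the box bounds, and the simple flux q = k_eff * A * (T_j - T_i) are hypothetical stand-ins, not the actual LIGGGHTS `FixHeatGranCond` implementation; the sketch assumes positions and temperatures have already been exported (for example from a dump or restart file).

```python
import math


def conduction_flux(t_i, t_j, contact_area, k_eff=1.0):
    """Hypothetical pairwise conduction term q = k_eff * A * (T_j - T_i).

    Illustrative only; this is NOT the formula used by LIGGGHTS.
    If either temperature is NaN, the result is NaN as well.
    """
    return k_eff * contact_area * (t_j - t_i)


def find_bad_particles(positions, temperatures, box_lo, box_hi):
    """Return indices of particles outside the box or with a non-finite temperature.

    positions    : sequence of (x, y, z) tuples
    temperatures : sequence of floats, same length as positions
    box_lo/hi    : assumed axis-aligned domain bounds, e.g. (0, 0, 0) and (1, 1, 1)
    """
    bad = []
    for idx, (pos, temp) in enumerate(zip(positions, temperatures)):
        # A NaN coordinate also fails the range test, so it is flagged too.
        outside = any(not (lo <= x <= hi) for x, lo, hi in zip(pos, box_lo, box_hi))
        if outside or not math.isfinite(temp):
            bad.append(idx)
    return bad


if __name__ == "__main__":
    positions = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (5.0, 0.1, 0.1)]
    temperatures = [300.0, 310.0, float("nan")]  # particle 2 escaped and went NaN
    print(find_bad_particles(positions, temperatures, (0, 0, 0), (1, 1, 1)))
    print(conduction_flux(310.0, temperatures[2], 1e-6))
```

Plain Python arithmetic on NaN silently returns NaN instead of crashing. The coupled solver above instead aborted with signal 8, and the `Foam::sigFpe::sigHandler` frame in its stack trace shows that OpenFOAM's floating point exception handler caught the invalid operation inside `FixHeatGranCond::post_force_eval`.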

jon

Min Zhang | Wed, 06/13/2018 - 23:27

Hello Jon,

I am getting this floating point exception error and would appreciate your suggestions.

Here is the terminal output:

DILUPBiCG: Solving for k, Initial residual = 0.000102867, Final residual = 4.15073e-08, No Iterations 1
ExecutionTime = 1772.16 s ClockTime = 1779 s
Time = 0.001435
Courant Number mean: 0.000134113 max: 0.0358228
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
srun: error: nid00009: tasks 1,13: Floating point exception
srun: Terminating job step 1625520.0
slurmstepd: error: *** STEP 1625520.0 ON nid00009 CANCELLED AT 2018-06-13T15:32:21 ***
srun: error: nid00010: tasks 25,37: Floating point exception
srun: error: nid00010: tasks 24,26-36,38-47: Terminated
srun: error: nid00009: tasks 0,2-12,14-23: Terminated
srun: Force Terminated job step 1625520.0
TACC: MPI job exited with code: 136
TACC: Shutdown complete. Exiting.

How did you find the cause of the error in your case?
What do you mean by "there was a stability issue that was sending one of my particles out of the system. As soon as it left the integration region, its temperature became a 'nan'"?