LIGGGHTS stalls without error message

Submitted by JoG on Sat, 01/27/2018 - 13:14

Hello everybody,

I am seeing strange behavior: my simulation suddenly stalls and does not proceed, and there is no error message. For one case, this always happens around the same timestep. When I slightly modify the case, for example by moving the mesh a little or setting the particle heat sources slightly differently, the case either finishes or stalls at a different timestep. The problem is that I don't know how to tackle this.

My current guess is that it has something to do with the mesh, since I use the "move/mesh ... viblin" command, and this feature has already caused a bug before, which should have been fixed in release 3.3.1:
https://www.cfdem.com/forums/liggghtsr-331-released-14012016-happy-new-y... ("Raphael Schubert (IWM Fraunhofer) found a glitch in the fix move/mesh/linear/variable which could cause the code to stall at the start in some cases")
However, in my case the code does not stall at the start, but only after some time.

Can somebody give me a hint? Without an error message, I don't know where to start.

richti83 | Mon, 01/29/2018 - 08:05

Does your host run out of memory? Check whether the memory consumption of the LIGGGHTS instance is growing. I have seen crashes when there was no RAM and no swap left.

I'm not an associate of DCS GmbH and not a core developer of LIGGGHTS®

JoG | Tue, 01/30/2018 - 11:09

Hi,

Thank you for the quick response. I also thought of a memory issue, but when I check with the "top" command, each of my 20 LIGGGHTS processes consumes only about 0.1% of the 64 GB of available RAM.

I use a restart script to start my simulation. When I simply rerun the simulation, it stops at the same timestep. When I modify the heat source that I load from an atomfile, the simulation also stops at the same timestep. I also removed the meshes completely; it still stops, but at a different timestep. It is really strange. I will now try to narrow it down and provide a minimal example of a failing simulation.
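A single `top` reading is only a snapshot, while the suggestion above was to check whether memory use grows over time. The following is a minimal Python sketch of such a check, assuming a Linux host where each process reports its resident memory in the `VmRSS` line of `/proc/<pid>/status`. The function names, the sampling interval, and the 1024 kB growth threshold are illustrative choices, not part of LIGGGHTS.

```python
import time


def parse_vmrss_kb(status_text):
    """Return the resident set size in kB from the text of /proc/<pid>/status (Linux)."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    123456 kB".
            return int(line.split()[1])
    raise ValueError("no VmRSS line found")


def sample_rss_kb(pid, n_samples=10, interval_s=60.0):
    """Read the RSS of process `pid` n_samples times, interval_s seconds apart."""
    samples = []
    for i in range(n_samples):
        with open(f"/proc/{pid}/status") as f:
            samples.append(parse_vmrss_kb(f.read()))
        if i < n_samples - 1:
            time.sleep(interval_s)
    return samples


def is_growing(samples, min_growth_kb=1024):
    """Crude heuristic: True if the last sample exceeds the first by at least min_growth_kb."""
    return samples[-1] - samples[0] >= min_growth_kb
```

For example, `is_growing(sample_rss_kb(12345))` would check one process, where 12345 is a placeholder for a LIGGGHTS PID as listed by `top`. Comparing only the first and last samples deliberately ignores short-term fluctuations; a steady climb over the run is what points to a leak.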

JoG | Thu, 02/01/2018 - 15:46

Thank you for offering help, Arno. I just found the mistake: it was in my own part of the code, in my heat transfer model. I narrowed the problem down by applying fewer and fewer models. In the end, I found it by running the case in serial (on one processor only). For some reason, in that case the code threw the error message I had implemented, whereas in parallel it just stalled.
With the help of that error message, I found the mistake in my source code.
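Why the same bug would print an error in serial but hang silently in parallel is not explained in the thread, so the following is only an assumption. In LAMMPS-derived codes such as LIGGGHTS, `error->all()` is meant to be called by every process (it synchronizes them before reporting), while `error->one()` is meant for an error detected on a single process. If an all-process error routine is reached on only some processes, those processes wait in a collective call that the others never join, and the job hangs before anything is printed. The toy Python model below mimics this, with threads as hypothetical "ranks" and `threading.Barrier` standing in for an MPI collective; the barrier timeout exists only so the toy can report the stall instead of hanging forever.

```python
import threading


def simulate(n_ranks, bad_rank, n_steps=5, fail_step=2, timeout=0.5):
    """Return (error_reported, stalled_ranks) for a toy parallel run.

    Hypothetical model: each 'rank' is a thread, and threading.Barrier
    stands in for an MPI collective such as MPI_Barrier.
    """
    step_sync = threading.Barrier(n_ranks, timeout=timeout)   # per-step collective
    error_sync = threading.Barrier(n_ranks, timeout=timeout)  # collective inside the error routine
    lock = threading.Lock()
    result = {"error_reported": False, "stalled": []}

    def rank(r):
        try:
            for step in range(n_steps):
                if r == bad_rank and step == fail_step:
                    # Error routine that assumes ALL ranks call it:
                    # it synchronizes first, then reports and stops.
                    error_sync.wait()
                    with lock:
                        result["error_reported"] = True
                    return
                step_sync.wait()  # healthy ranks wait here for everyone else
        except threading.BrokenBarrierError:
            # The timeout is how this toy detects the stall; a real MPI job would just hang.
            with lock:
                result["stalled"].append(r)

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(n_ranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["error_reported"], sorted(result["stalled"])
```

In serial, `simulate(1, 0)` reports the error, because a one-party barrier never blocks. With four ranks and a failing rank 1, `simulate(4, 1)` reports nothing and every rank ends up waiting. If this mechanism was indeed the cause, using the single-process routine (`error->one()`) on code paths that only some processes reach would avoid the hang.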

So, for anybody having this problem: first try running the code in serial and see whether the problem persists. Thanks to all of you for the help!