"Popping" behavior (Parallelization Bug?)

Submitted by sgeerster on Wed, 04/13/2016 - 23:13

Greetings, Everyone!

I think there's still a parallelization bug in LIGGGHTS-PUBLIC. The posts at http://www.cfdem.com/forums/bug-mesh-granular-interaction and http://www.cfdem.com/forums/popping-behaviour suggest the issue has been fixed. I'm having problems that seem very similar to those posts, yet I have the latest and greatest version of LIGGGHTS-PUBLIC.

I'm creating a polydisperse cloud of spherical particles in LIGGGHTS and letting them fall into a virtual Petri dish... Imagine pouring sand into a bottle cap until the bottle cap overflows, leaving a conical pile of sand in the cap.

I've been running the same input file, which creates and runs this sand/cap scenario, through LIGGGHTS several times, and I've noticed differences in my results depending on the number of cores I allocate to the simulation. I've run it with 8, 12, 16, 20, 24, 28, and 32 cores. The results vary from job to job, where I expected them to be nearly the same. Two of the seven jobs (8 cores and 28 cores) show signs of instability while the other five do not.

By "instability," I mean one or two particles out of the 3,000 settled/settling in the Petri dish seem to randomly jump or "pop". Among the five stable jobs (no "popping"), no two appear to be exactly alike... they're very similar, but the particles are not in the exact same positions at the same time in the simulations. For example, Particle A at time t=100s in one simulation might be in a slightly different position from Particle A at time t=100s in another simulation, with all parameters and initial particle positions identical, except that the former might use 12 cores while the latter uses 24.
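One way to make "popping" measurable rather than visual is to compare particle positions between two consecutive output frames and flag any particle whose mean speed over that interval is implausibly high. The following is only a minimal sketch: the function name `flag_pops`, the dictionary layout, and the speed limit are assumptions for illustration, not part of LIGGGHTS or of the attached files.

```python
import math


def flag_pops(prev_frame, next_frame, dt, speed_limit):
    """Return sorted IDs of particles whose mean speed between two frames
    exceeds speed_limit.

    prev_frame, next_frame: dict mapping particle ID -> (x, y, z)
    dt: time elapsed between the two frames (same unit system as positions)
    speed_limit: hypothetical threshold chosen by the user
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    popped = []
    for pid, p0 in prev_frame.items():
        p1 = next_frame.get(pid)
        if p1 is None:
            continue  # particle missing from the later frame, skip it
        speed = math.dist(p0, p1) / dt
        if speed > speed_limit:
            popped.append(pid)
    return sorted(popped)


# Made-up positions: particle 3 jumps, particles 1 and 2 barely move.
frame_a = {1: (0.0, 0.0, 0.0), 2: (0.01, 0.0, 0.0), 3: (0.0, 0.02, 0.0)}
frame_b = {1: (0.0, 0.0, -0.0001), 2: (0.01, 0.0, 0.0), 3: (0.05, 0.06, 0.03)}
print(flag_pops(frame_a, frame_b, dt=0.001, speed_limit=1.0))
```

The same comparison, keyed by particle ID, could also be applied to the same timestep from two runs on different core counts to quantify how far the "stable" runs drift apart.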

I did check the timestep to make sure I was well below the critical timestep for my particle sizes and material stiffness.

I've run the same input script on two different cluster systems with different numbers of cores allocated for the simulation.

I'm attaching a zip with the input scripts, log info, and some VTK files I obtained for an unstable case.

Any help and/or suggestions would be greatly appreciated.

Thanks,
Steve

Attachment: Runs on 8 cores in about 1 minute (Binary Data, 173.95 KB)

ckloss | Mon, 04/25/2016 - 21:16

Hi Steve,
thanks for your post!
The bugs at the links you are referencing are fixed - no doubt about that! A parallelization bug like the one you describe is of course possible, yet very unlikely. Sorry - developers' time is scarce! Nevertheless I'll be happy to take a look if you can break the case down to a smaller setup which reproduces the error on < 8 cores in < 1 min run-time!

Best wishes!
Christoph

sgeerster | Thu, 05/05/2016 - 21:42

Hi Christoph,

Thanks for the reply. I attached files for a simulation that runs on 8 cores in about a minute. I didn't include the output dump and VTK files, because the attachment would be well over the 2 MB limit, but I can provide them if you'd like to see what I'm seeing. About halfway through the simulation, I see a few particles in/around the dish that suddenly "pop" or zip off in upward, outward directions.

What do you think?

Thanks,
Steve

Termo | Fri, 05/20/2016 - 13:20

I see the same behaviour running a PPA install of LIGGGHTS (3.3.1) on Ubuntu 16.04, with 4 nodes. My simulation is a kind of conveyor simulation with very small polymer particles (60 μm), so I'm running with "units micro" and a timestep well below 20% of the Rayleigh time.
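For readers who want to repeat this timestep check, here is a minimal sketch of the commonly used Rayleigh time estimate, T_R = π R sqrt(ρ/G) / (0.1631 ν + 0.8766), with shear modulus G = E / (2(1 + ν)). The function names and the material values in the usage lines are placeholders chosen for illustration, not values from either simulation in this thread; only the 20% fraction comes from the post above. All inputs must be in one consistent unit system.

```python
import math


def rayleigh_time(radius, density, youngs_modulus, poisson_ratio):
    """Rayleigh time for a sphere, using the usual approximation
    T_R = pi * R * sqrt(rho / G) / (0.1631 * nu + 0.8766)."""
    shear_modulus = youngs_modulus / (2.0 * (1.0 + poisson_ratio))
    return (math.pi * radius * math.sqrt(density / shear_modulus)
            / (0.1631 * poisson_ratio + 0.8766))


def timestep_limit(radius, density, youngs_modulus, poisson_ratio, fraction=0.2):
    """Largest timestep that stays within the given fraction of T_R.
    For a polydisperse system, pass the smallest particle radius."""
    return fraction * rayleigh_time(radius, density, youngs_modulus, poisson_ratio)


# Placeholder values in SI units (not taken from the thread).
t_r = rayleigh_time(radius=1e-3, density=2500.0,
                    youngs_modulus=5e6, poisson_ratio=0.3)
print(f"Rayleigh time: {t_r:.3e} s")
print(f"20% limit:     {timestep_limit(1e-3, 2500.0, 5e6, 0.3):.3e} s")
```

Because T_R scales linearly with radius, the smallest particle in the size distribution sets the limit.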

I'll see if it also happens when running the same simulation without mpirun...

/Rasmus

Termo | Fri, 05/20/2016 - 17:21

Just compiled latest from git (3.4.1) and so far no issue with that :)

The only issue right now with that binary is that make does not complain about anything with my libvtk6.2 setup, but the compiled binary still does not work with dump custom/vtk ?!?

Any ideas/pointers on how to compile correctly against libvtk6.2-dev?

Part of my /MAKE/Makefile.ubuntuVTK62
VTK_INC = -I/usr/include/vtk-6.2
VTK_PATH =
VTK_LIB = -lvtkCommonCore-6.2 -lvtkIOCore-6.2 -lvtkIOXML-6.2 -lvtkIOLegacy-6.2 -lvtkCommonDataModel-6.2

Termo | Mon, 05/23/2016 - 10:58

Using cmake, all libs are found and used for compilation:

cmake src
make

sgeerster | Mon, 05/23/2016 - 17:59

My problem has NOT been solved. The files requested of me were uploaded weeks ago.

Termo | Tue, 05/24/2016 - 10:17

In your uploaded log it says:

LIGGGHTS (Version LIGGGHTS-PUBLIC 3.3.1, compiled 2016-03-15-14:22:10 by geer, git commit 99507f5217ad4fd4be5107d6262bd7274c9a5c03 based on LAMMPS 23 Nov 2013)
#c8soft razor

I also saw the issue in version 3.3.1, but have not been able to reproduce it in the latest version (3.4.1). So maybe you can check whether you can reproduce it in the latest version as well?
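Since the version is printed in the log header, it can be checked automatically before comparing runs. This is a small sketch that assumes the header has the same shape as the line quoted above; the function name `liggghts_version` and the regular expression are illustrative, not a LIGGGHTS utility.

```python
import re

# Matches headers shaped like the one quoted from the uploaded log.
_VERSION_RE = re.compile(r"LIGGGHTS \(Version LIGGGHTS-PUBLIC (\d+(?:\.\d+)*)")


def liggghts_version(log_text):
    """Return the version as a tuple of ints, e.g. (3, 3, 1),
    or None if no matching header is found."""
    match = _VERSION_RE.search(log_text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


header = ("LIGGGHTS (Version LIGGGHTS-PUBLIC 3.3.1, compiled 2016-03-15-14:22:10 "
          "by geer, git commit 99507f5217ad4fd4be5107d6262bd7274c9a5c03 "
          "based on LAMMPS 23 Nov 2013)")
version = liggghts_version(header)
if version is not None and version < (3, 4, 1):
    print("Log was produced by a version older than 3.4.1:", version)
```

Tuples compare element by element, so (3, 3, 1) sorts before (3, 4, 1) without any string handling.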

regards

sgeerster | Sun, 05/29/2016 - 17:30

That did it! I downloaded and compiled 3.4.1. I completed side-by-side runs with the old 3.3.1 executable and the new 3.4.1 executable on identical input files that were previously shown to be very "jumpy," i.e. much more active than the smaller example I posted on this site. The 3.4.1 version was completely calm... not a single pop or jump where there were dozens before.

Thanks for suggesting this, Termo!!! I have spent an embarrassing amount of time over the past few weeks trying to solve this issue. Reading the version history portion of this website, I didn't think 3.4.1 was really any different than 3.3.1, in terms of what I am using LIGGGHTS for.

Anyways, Thanks!!!!


ckloss | Tue, 06/14/2016 - 23:05

Hi,

might have been that one (just a guess):

3.4.0: Fixed a rare particle-triangle contact detection issue. Thanks to Christian Richter (OVGU Magdeburg) for sending over a test case.
(http://www.cfdem.com/node/42)

Best wishes
Christoph