Trouble with Multiprocessors and insert/pack

Submitted by cTop89 on Fri, 10/25/2013 - 00:21

We're getting inconsistent results running a large fix insert/pack in a very tall cylinder on 32 processors. We have it set up to insert 700,000 particles, but the number we actually get when we run varies widely. On the first run, it generated ~200,000 particles and on a second run it generated ~400,000. (We ran those first two with LIGGGHTS 2.3.6, then we updated to 2.3.8 and ran one more that acted the same and generated ~350,000 particles.)

It seems like some processors aren't inserting their share of the particles, as the results show large sections of the cylinder unfilled. We get the following warning message: "WARNING: Particle insertion: Less insertions than requested (fix_insert.cpp:650)" (it was fix_insert.cpp:648 in 2.3.8)

I took a look at the source code in that spot, but it seems you're just tallying the actual count there to see if there was a problem or not. I couldn't figure out much for myself beyond that.

Here are my STL geometry files and the LIGGGHTS input file.

Attachment: InsertPackMultiprocessorIssue.tar_.gz (27.42 KB)
Attachment: image.png (389.22 KB)

cstoltz | Sat, 10/26/2013 - 14:28

I tried running your case out of curiosity. While I didn't see the run-to-run variability you mention when the number of processors is fixed, I can offer a couple of thoughts on the packing. I was able to get more particles to pack initially when I forced the processor breakdown to something like 2x2x4 instead of 1x1x16. When inserting particles, LIGGGHTS effectively breaks up the domain along processor boundaries and ensures that an inserted particle lands entirely within the bounds of a given processor's subdomain. As a result, you get some distinct separations in the initial packing that can be observed when viewed from +/- x, +/- y, or +/- z.
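(For reference, this is a minimal sketch of how I forced that layout; the 2x2x4 values match what I described above and would need adjusting to your own case. The processors command is inherited from LAMMPS and must appear before the simulation box is created:)

```
# force a 2x2x4 processor grid instead of letting the default
# heuristic pick a tall 1x1x16 decomposition for this geometry
processors 2 2 4
```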

Beyond this, I'd recommend either making smaller particles and growing them if you want to use insert_pack, or use a different insertion method (rate/region or stream) to pour the particles in until reaching the desired number.


cTop89 | Mon, 10/28/2013 - 21:58

I've seen the behavior you're mentioning with other insertion methods, where you get a 'void' of particles in a very narrow strip along the plane where the domain splits between processors, but that's not what I get in this situation. What I'm seeing is that some processors insert no particles at all. Judging by the empty regions, the processor breakdown looks like something close to 2x1x8.

I attached a screenshot to the original post.

"Research is what I'm doing when I don't know what I'm doing."
Wernher Von Braun

cstoltz | Tue, 10/29/2013 - 12:56

Ah, okay, I can reproduce that problem on my machine as well. Interesting - I've never seen it occur before. There appears to be a bug related to the way many of your numbers are entered. I don't see anything explicitly wrong with your input, but when I cleaned up the input deck to change values such as 20000000E-008 to 0.2, it ran properly and inserted all 700,000 desired particles.


cTop89 | Tue, 10/29/2013 - 14:33

Interesting. Any idea why that would cause the issue? We've been using values like that for a while and haven't had issues until now.

In the past we've had lots of issues with international users' culture settings, which use a "," instead of a "." as the decimal separator and cause all sorts of fun/strange problems. That's why we switched to the type of input you saw in the file.

If you know specifically what causes the issue, we could work around it more easily on our side in the input script, or maybe even look into the LIGGGHTS source code ourselves. If you have any other suggestions, I'd be happy to hear them.

Thanks for your help!

"Research is what I'm doing when I don't know what I'm doing."
Wernher Von Braun

cstoltz | Tue, 10/29/2013 - 15:33

No clue. Maybe some sort of float/double mismatch? I'm not terribly good at mucking around in the source code. The best I can suggest for debugging is to change commands one at a time and try to isolate where the problem occurs. I'd start with the insertion commands and work outward from there.



ckloss | Thu, 10/31/2013 - 12:31


thanks to Chris for tracking this down!

>> but when I cleaned up the input deck to change values such as 20000000E-008 to 0.2
It seems 20000000 is too large for the atof() routine used to read the input.