Significantly different forces when running on different computers

Submitted by pfalkingham on Tue, 08/03/2021 - 10:05

(Apologies, I originally posted this in the User forum.)

When I run the same simulation on two computers, I get very different force behaviour on a moving mesh, even though the LIGGGHTS version and input files are identical.

I'm using this input data: https://1drv.ms/u/s!Am5GkbZS_98Xto1HN3LSdxzKI-x0fw?e=RBzhV9

It's a cylindrical indenter moving slowly downward into a box of particles. It is displacement-controlled: it moves down some distance, pauses, then moves down again. It looks like this: https://1drv.ms/v/s!Am5GkbZS_98Xto1G158QEzxDMVd_iw?e=48ATUW

I'm interested in the forces acting on the indenter. I expect the forces to be higher when the indenter stops deeper.

HOWEVER, on one computer (personal desktop, 8-core i7, 16 GB RAM), I get this force trace (vertical force acting on the indenter):

https://pfalkingham.files.wordpress.com/2021/06/computer-1.jpg

The vertical force is the same whether the indenter pauses shallow or deep.

But on a second computer (workstation, two 12-core Xeon processors, 64 GB RAM), I get this:

https://pfalkingham.files.wordpress.com/2021/06/computer-2.jpg

Here the force is significantly higher when the indenter pauses deeper.

What the heck is going on?

Both computers use OpenMPI 4.0.3 and run LIGGGHTS 3.8.0, both compiled within the last few months. I'm not sure how to go about debugging this and figuring out a) which computer is giving me false results and b) why. I expect the force trace from the workstation to be correct (pauses at different depths result in different forces), but then why would my personal computer be so different?

Daniel Queteschiner | Tue, 08/03/2021 - 11:40

I see the processor command is commented in your scripts. Did you use the same number of processors and partition layout on both computers?
Both simulations are jumping between extreme values, so small differences in the spatial particle configuration (that could stem from usage of different number of processors) may lead to quite different final values ...

(On an unrelated note, the triangulation of your cylinder mesh is far from ideal, especially the top/bottom triangles all connecting to a central vertex ...)

pfalkingham | Thu, 08/19/2021 - 17:26

Firstly, thanks so much: the number of processors, and how the domain is divided among them, directly affects the force on the cylinder. I also re-meshed the cylinder (dividing it into ~1000 even triangles), and that made the results a little more consistent. Forces vary wildly depending on how many processors are used and in what configuration, and the force results are completely wrong if I comment out the processors command and let LIGGGHTS divide the domain automatically.

However, this raises a scary point: if varying the number of processors changes the reported forces on the mesh, what's the solution for getting a 'true', or at least consistent, result? I tried specifying the number of processors in each dimension for varying processor counts, and the final force kept changing quite noticeably. Is there something I should be doing to make sure results are consistent? Are, e.g., odd numbers of processors a bad idea even if specified? (For example, "processors 1 3 1", which divides the domain along the y-axis into 3 for 3 processors, gives a higher output force (~68) than "processors 2 2 1" on 4 processors (~58).)
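Because the force "jumps between extreme values", a single final value (like ~68 vs ~58) may not be enough to tell whether two processor layouts really disagree. One way to check is to average the force over each pause window and compare the difference against the fluctuation. The sketch below assumes the force trace has already been exported as lists of times and vertical forces; the function names and the "k times the standard deviation" threshold are hypothetical choices for illustration, not part of LIGGGHTS.

```python
import statistics


def pause_force_stats(times, forces, t_start, t_end):
    """Mean and sample standard deviation of the force samples with t_start <= t <= t_end.

    `times` and `forces` are assumed to be parallel sequences taken from an
    exported force trace (hypothetical format).
    """
    window = [f for t, f in zip(times, forces) if t_start <= t <= t_end]
    if len(window) < 2:
        raise ValueError("need at least two samples in the pause window")
    return statistics.mean(window), statistics.stdev(window)


def differ_beyond_noise(stats_a, stats_b, k=2.0):
    """True if two (mean, std) pairs differ by more than k times the larger std.

    The choice of k is an arbitrary, hypothetical threshold.
    """
    (mean_a, sd_a), (mean_b, sd_b) = stats_a, stats_b
    return abs(mean_a - mean_b) > k * max(sd_a, sd_b)


if __name__ == "__main__":
    # Made-up sample values, only to show the call pattern.
    times = [0, 1, 2, 3, 4]
    forces = [10, 60, 56, 62, 58]
    print(pause_force_stats(times, forces, 1, 4))
```

If the mean forces for "processors 1 3 1" and "processors 2 2 1" still differ by much more than their spread during the same pause, the discrepancy is not just noise in the final sample.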

pfalkingham | Mon, 08/09/2021 - 11:23

Thanks. I was letting it auto-divide the domain, because each computer has differing numbers of cores, but that's a great place to start debugging. I'll remesh the cylinder too, just in case that's a factor.

However, I will say that this has been giving me trouble on more than these two machines. Running on an 80-core cluster (again, dividing the domain automatically), I get the same output as on the personal computer, i.e. the force at rest is the same regardless of depth, which is not what I expect.