MPI issue using "atom_style hybrid" and "fix adapt" commands

Submitted by thboivin on Thu, 11/24/2022 - 17:00

Hello everyone,

I am new to this wonderful forum, which has helped me many times, and I really thank all the people who take the time to answer ^^
This time I am asking for help with a tricky issue.

I currently use LIGGGHTS to simulate the breathing behaviour of a bed of particles with changing diameters. Consequently, I use the method proposed in the packing example of Tutorials_public, which is based on the fix adapt command (https://www.cfdem.com/media/DEM/docu/fix_adapt.html).
So far, I had met no particular issue using this command.

However, I recently tried to extend my simulation using atom_style hybrid, and I ran into the following error message:

[aar093:20997] *** An error occurred in MPI_Wait
[aar093:20997] *** reported by process [908394497,4594212051357794305]
[aar093:20997] *** on communicator MPI_COMM_WORLD
[aar093:20997] *** MPI_ERR_TRUNCATE: message truncated
[aar093:20997] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[aar093:20997] *** and potentially your MPI job)

This error is quite common on forums, but I have not managed to relate the proposed solutions to my particular case.
Nevertheless, I have reduced the problem to the smallest case I could find. In my setup, the issue only appears when:
→ fix adapt is used in an atom_style hybrid simulation;
→ the simulation is run on multiple processors, with the threshold depending on the number of particles (for example, 2 processors work for a small number of particles, but the error appears when the number increases).

It seems the issue comes from a "partial" incompatibility of the fix adapt command with atom_style hybrid when running on multiple processors: beyond a certain number of particles and/or processors, the MPI run crashes.
On my side, I use LIGGGHTS-PUBLIC 3.8.0 with OpenMPI 2.0.1 picked up through $PATH. An important constraint is that I do not have administrator rights on my Unix environment.

Up to now, I have tried several things:
→ Checking whether the problem came from the associated compute property/atom command;
→ Using the LIGGGHTS_Flexible_Fibers version (https://github.com/schrummy14/LIGGGHTS_Flexible_Fibers);
→ Using other fix commands that read/change per-atom data (fix ave/atom and fix addforce) in a hybrid atom style simulation;
→ Changing the diameter of only one atom through the fix adapt command;
→ Using a more recent version of OpenMPI (version 3.1.6).

I have attached a small test case, in case anyone wants to check whether it produces the same error on their side.

So if anyone has an idea about this strange behaviour, it would help a lot; otherwise I will have to run all these simulations on a single processor.

Thank you very much in advance!
Best regards,
Theo

Daniel Queteschiner | Fri, 11/25/2022 - 12:20

The issue is that the size of the communication buffer doesn't get defined correctly in that case.
In more detail, the creation of an atom style follows these steps (sketched in code after the list):
1. create a new atom style (hybrid, granular, etc.) -> AtomVec::AtomVec()
2. parse the settings of the atom style -> AtomVec::settings(...)
3. do some initialization of the atom style -> AtomVec::init()
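Schematically, the sequence looks like this (call sites simplified; this is only an illustration, not verbatim LIGGGHTS code):

AtomVec *avec = new AtomVecHybrid(lmp);   // 1. construction of the atom style
avec->settings(narg, arg);                // 2. parse the style arguments
                                          //    -> AtomVecHybrid::settings() sums up the
                                          //       sub-styles' communication buffer sizes here
// ... later, when the run is set up ...
avec->init();                             // 3. per-run initialization
                                          //    -> AtomVecSphere::init() may enlarge its own
                                          //       size_forward here, after the sum was taken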

When combining fix adapt and atom style granular, this triggers a change in buffer size in AtomVecSphere::init() (additional communication of type, radius, mass and density).
However, when using atom style hybrid (+granular), the overall buffer size is defined in AtomVecHybrid::settings(...), and the later change of the granular sub-style's buffer size in AtomVecSphere::init() causes a mismatch, as it is not recognized by the AtomVecHybrid class.

A quick fix could be to add the following line to the end of the AtomVecHybrid::init() method:
if (atom->radvary_flag == 1) size_forward += 4;
Note that I have not tested this change thoroughly.
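In context, the patched method would then read roughly as follows (a sketch only; I am assuming the stock body does nothing more than delegate to the sub-styles, so please check it against your source tree):

void AtomVecHybrid::init()
{
  AtomVec::init();
  for (int k = 0; k < nstyles; k++) styles[k]->init();

  // quick fix: if some fix made the radii time-varying, account for the extra
  // forward communication (type, radius, mass, density) that
  // AtomVecSphere::init() adds for the granular sub-style
  if (atom->radvary_flag == 1) size_forward += 4;
}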

LAMMPS seems to solve this issue by adding a setting to the granular/sphere atom style (i.e. adjusting the buffer size already in AtomVecSphere::settings(...)) such that it can be properly recognized by the AtomVecHybrid class.
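The idea there is roughly the following (a simplified illustration, not the actual upstream code; in LAMMPS the sphere style accepts an optional 0/1 argument, e.g. "atom_style sphere 1", to declare time-varying radii):

void AtomVecSphere::settings(int narg, char **arg)
{
  // declare up-front that radii may vary, so the wider buffer is already
  // known when AtomVecHybrid::settings() sums the sub-style buffer sizes
  radvary = 0;
  if (narg == 1) radvary = atoi(arg[0]);
  size_forward = radvary ? 7 : 3;   // x only, or x + type, radius, mass, density
}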

thboivin | Wed, 11/30/2022 - 13:21

Thank you very much for your clear and fast answer, Daniel !

I understand the issue better now, and it gave me the opportunity to learn more about the LIGGGHTS code structure.
First of all, your solution works perfectly. Indeed, the AtomVecSphere::init() method raises the size_forward parameter from 3 to 7 if particle diameters are time-varying due to some fix, whereas the AtomVecHybrid::init() method does not, so your one-line fix corrects the buffer size whenever the radvary flag is set to 1.
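For anyone finding this thread later, the relevant part of AtomVecSphere::init() looks roughly like this (my simplified reading of the 3.8.0 source, not a verbatim copy):

void AtomVecSphere::init()
{
  AtomVec::init();
  size_forward = 3;                 // by default only x is forward-communicated
  if (atom->radvary_flag == 1)
    size_forward = 7;               // x plus type, radius, mass and density
}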

Thank you once again for your help and all the best to you !
Theo