Hello everyone,
I am new on this wonderful forum that helped me many times, and I really thank all the people that take the time to answer ^^
This time, I ask for help concerning a tricky issue.
I currently use LIGGGHTS to simulate the breathing behaviour of a bed of particles with changing diameters. To do so, I use the method from the packing example in Tutorials_public, which is based on the fix adapt command (https://www.cfdem.com/media/DEM/docu/fix_adapt.html).
So far, I met no particular issue using this command.
However, I recently tried to extend my simulation using atom_style hybrid, and I got the following error message:
[aar093:20997] *** An error occurred in MPI_Wait
[aar093:20997] *** reported by process [908394497,4594212051357794305]
[aar093:20997] *** on communicator MPI_COMM_WORLD
[aar093:20997] *** MPI_ERR_TRUNCATE: message truncated
[aar093:20997] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[aar093:20997] *** and potentially your MPI job)
This error is quite common on forums, but I did not manage to link the solutions to my particular case.
Nevertheless, I reduced the problem to the smallest case I could find. In my case, the issue only appears when:
→ fix adapt is used in an atom_style hybrid simulation;
→ the simulation is run on multiple processors, depending on the number of particles (for example, 2 processors work for a low number of particles, but the error can appear with a higher number).
The issue seems to come from a "partial" incompatibility between the fix adapt command and atom_style hybrid in parallel runs: beyond a certain number of particles and/or processors, the MPI job aborts.
On my side, I use LIGGGHTS-PUBLIC 3.8.0 with openmpi 2.0.1 defined through the $PATH. An important constraint to notice is that I do not have the administrator rights on my Unix environment.
Up to now, I tried several things :
→ Checking if the problem came from the associated compute property/atom command;
→ Using the LIGGGHTS_Flexible_Fibers version (https://github.com/schrummy14/LIGGGHTS_Flexible_Fibers);
→ Using other fix commands that read/change atom behaviour (fix ave/atom and fix addforce) in a hybrid atom style simulation;
→ Changing the diameter of only one atom through the fix adapt command;
→ Using a more recent version of openmpi (version 3.1.6).
I have attached a small test case, in case anyone wants to check whether it produces the same error on their side.
If anyone has an idea about this strange behaviour, it would help me a lot to avoid running all these simulations on a single processor.
Thank you very much in advance !
Best regards,
Theo
Attachment | Size |
---|---|
Simple test of "fix adapt" command used with "atom_style hybrid" | 2.5 KB |
Daniel Queteschiner | Fri, 11/25/2022 - 12:20
Mismatch in comm buffer size
The issue is that the size of the communication buffer doesn't get defined correctly in that case.
In more detail, the creation of an atom style follows these steps:
1. create a new atom style (hybrid, granular etc.) -> AtomVec::AtomVec()
2. parse the settings of the atom style -> AtomVec::settings(...)
3. do some initialization of the atom style -> AtomVec::init()
When combining fix adapt and atom style granular, this triggers a change in buffer size in AtomVecSphere::init() (additional communication of type, radius, mass and density).
However, when using atom style hybrid (+granular), the overall buffer size is defined in AtomVecHybrid::settings(...), and the later change of the granular sub-style buffer size in AtomVecSphere::init() causes a mismatch, as it is not recognized by the AtomVecHybrid class.
A quick fix could be to add the following line to the end of the AtomVecHybrid::init() method:
if (atom->radvary_flag == 1) size_forward += 4;
Note that I have not tested this change thoroughly.
LAMMPS seems to solve this issue by adding a setting to the granular/sphere atom style (i.e. adjusting the buffer size already in AtomVecSphere::settings(...)) such that it can be properly recognized by the AtomVecHybrid class.
thboivin | Wed, 11/30/2022 - 13:21
Quick and effective... Clean solution !
Thank you very much for your clear and fast answer, Daniel !
I better understand the issue, and it gave me the opportunity to learn more about the LIGGGHTS code structure.
Above all, your solution works perfectly. Indeed, the AtomVecSphere::init() method changes the size_forward parameter from 3 to 7 if particle diameters are time-varying due to some fix, and this is not the case in the AtomVecHybrid::init() method. So your solution is perfect to correct the buffer size when the radvary parameter is set to 1.
Thank you once again for your help and all the best to you!
Theo