LIGGGHTS parallel computing

Submitted by MartinD on Thu, 03/13/2014 - 12:58

Hi everybody,

I am trying to run a LIGGGHTS calculation in parallel, but it does not seem to work with "mpirun -np 8 liggghts in.script".
In "in.script" I placed the command "processors 2 4 1" to divide the domain into subdomains.
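
For reference, in LAMMPS-style input scripts (which LIGGGHTS builds on) the three numbers given to the processors command are expected to multiply to the number of MPI ranks passed to mpirun -np; here 2 * 4 * 1 = 8, so the two agree. A minimal Python sketch of that check follows. The helper name grid_matches_ranks is hypothetical, and wildcard entries such as * are deliberately not handled.

```python
from math import prod


def grid_matches_ranks(processors_line: str, n_ranks: int) -> bool:
    """Return True if the Px*Py*Pz grid of a 'processors' line equals n_ranks.

    Hypothetical helper for illustration only: it accepts plain integers
    and does not handle wildcard entries such as '*'.
    """
    tokens = processors_line.split()
    if len(tokens) < 4 or tokens[0] != "processors":
        raise ValueError(f"not a processors command: {processors_line!r}")
    px, py, pz = (int(t) for t in tokens[1:4])
    return prod((px, py, pz)) == n_ranks


if __name__ == "__main__":
    # Values from the post: 'processors 2 4 1' together with 'mpirun -np 8'.
    print(grid_matches_ranks("processors 2 4 1", 8))
```

If this check fails, the mismatch would normally be reported by the code itself at startup, so it would not explain a crash inside MPI initialization like the one below.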

The error I got is:

A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: Dorn
Framework: crs
Component: none
--------------------------------------------------------------------------
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: Dorn
Framework: crs
Component: none
--------------------------------------------------------------------------
[Dorn:10712] *** Process received signal ***
[Dorn:10712] Signal: Segmentation fault (11)
[Dorn:10712] Signal code: Address not mapped (1)
[Dorn:10712] Failing at address: 0x28
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: Dorn
Framework: crs
Component: none
--------------------------------------------------------------------------
[Dorn:10713] *** Process received signal ***
[Dorn:10713] Signal: Segmentation fault (11)
[Dorn:10713] Signal code: Address not mapped (1)
[Dorn:10713] Failing at address: 0x28
[Dorn:10715] *** Process received signal ***
[Dorn:10715] Signal: Segmentation fault (11)
[Dorn:10715] Signal code: Address not mapped (1)
[Dorn:10715] Failing at address: 0x28
[Dorn:10713] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fac987f9cb0]
[Dorn:10713] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x7fac91657518]
[Dorn:10713] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x7fac9166990e]
[Dorn:10713] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x7fac916480ee]
[Dorn:10713] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x7fac91647a59]
[Dorn:10713] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x7fac94d65a0d]
[Dorn:10713] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x7fac98f4f2e1]
[Dorn:10713] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fac98f703fb]
[Dorn:10715] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fb6f1bedcb0]
[Dorn:10715] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x7fb6eaa4b518]
[Dorn:10715] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x7fb6eaa5d90e]
[Dorn:10715] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x7fb6eaa3c0ee]
[Dorn:10715] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x7fb6eaa3ba59]
[Dorn:10713] [ 8] liggghts(main+0x1d) [0x48cefd]
[Dorn:10715] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x7fb6ee159a0d]
[Dorn:10715] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x7fb6f23432e1]
[Dorn:10715] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fb6f23643fb]
[Dorn:10715] [ 8] liggghts(main+0x1d) [0x48cefd]
[Dorn:10715] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fb6f183f76d]
[Dorn:10715] [10] liggghts() [0x48e271]
[Dorn:10715] *** End of error message ***
[Dorn:10713] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fac9844b76d]
[Dorn:10713] [10] liggghts() [0x48e271]
[Dorn:10713] *** End of error message ***
[Dorn:10712] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fdab6a3bcb0]
[Dorn:10712] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x7fdaaf899518]
[Dorn:10712] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x7fdaaf8ab90e]
[Dorn:10712] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x7fdaaf88a0ee]
[Dorn:10712] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x7fdaaf889a59]
[Dorn:10712] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x7fdab2fa7a0d]
[Dorn:10712] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x7fdab71912e1]
[Dorn:10712] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fdab71b23fb]
[Dorn:10712] [ 8] liggghts(main+0x1d) [0x48cefd]
[Dorn:10712] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fdab668d76d]
[Dorn:10712] [10] liggghts() [0x48e271]
[Dorn:10712] *** End of error message ***
[Dorn:10714] *** Process received signal ***
[Dorn:10714] Signal: Segmentation fault (11)
[Dorn:10714] Signal code: Address not mapped (1)
[Dorn:10714] Failing at address: 0x28
[Dorn:10714] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fdd0d52acb0]
[Dorn:10714] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x7fdd06388518]
[Dorn:10714] [ 2] /usr/lib/libopen-pal.so.0(opal_crs_base_select+0x7e) [0x7fdd0639a90e]
[Dorn:10714] [ 3] /usr/lib/libopen-pal.so.0(opal_cr_init+0x31e) [0x7fdd063790ee]
[Dorn:10714] [ 4] /usr/lib/libopen-pal.so.0(opal_init+0x159) [0x7fdd06378a59]
[Dorn:10714] [ 5] /usr/lib/libopen-rte.so.0(orte_init+0x4d) [0x7fdd09a96a0d]
[Dorn:10714] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x7fdd0dc802e1]
[Dorn:10714] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fdd0dca13fb]
[Dorn:10714] [ 8] liggghts(main+0x1d) [0x48cefd]
[Dorn:10714] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fdd0d17c76d]
[Dorn:10714] [10] liggghts() [0x48e271]
[Dorn:10714] *** End of error message ***
--------------------------------------------------------------------------
A requested component was not found, or was unable to be opened. This
means that this component is either not installed or is unable to be
used on your system (e.g., sometimes this means that shared libraries
that the component requires are unable to be found/loaded). Note that
Open MPI stopped checking at the first component that it did not find.

Host: Dorn
Framework: crs
Component: none
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 3 with PID 10715 on node Dorn exited on signal 11 (Segmentation fault).

Does anyone know what is wrong here?

Thanks a lot for your help!

Best,
Martin

ckloss | Fri, 04/04/2014 - 10:06

Hi Martin,

This does not seem to have anything to do with LIGGGHTS itself; it looks like a configuration problem on your system.

Christoph

enrique | Thu, 01/19/2017 - 16:24

I'd really appreciate it if someone could post the solution to this problem.
I used the command "mpirun -np 4 liggghts in.conveyor" on the conveyor example that comes with the installation, and it works
properly, but when I try it with an in.test of my own, that error appears.
So far I've only been able to run it in serial.

I'd appreciate anyone's help.

aaigner | Fri, 01/20/2017 - 10:07

Hello Enrique,

I hope there is either an input redirect (<) or the -in option in your liggghts execution call. It should look like mpirun -np 4 liggghts -in in.conveyor
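
A minimal sketch of assembling that call from Python, assuming mpirun and liggghts are both on the PATH. The helper name build_liggghts_command is hypothetical. Passing the script with -in avoids relying on standard input being forwarded to the MPI ranks.

```python
import shlex
from typing import List


def build_liggghts_command(n_ranks: int, input_script: str,
                           executable: str = "liggghts") -> List[str]:
    """Build an argument list of the form: mpirun -np N liggghts -in SCRIPT.

    Hypothetical helper; the executable name is an assumption and may
    differ on a given installation.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    return ["mpirun", "-np", str(n_ranks), executable, "-in", input_script]


if __name__ == "__main__":
    cmd = build_liggghts_command(4, "in.conveyor")
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(cmd))
    # To actually launch it (requires mpirun and liggghts to be installed):
    # import subprocess
    # subprocess.run(cmd, check=True)
```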

Besides that, it is quite hard to help you without further information. But it is clear that a simulation should never fail with a segmentation fault. If possible, open a new thread under bug reports and share your input script in.test and, more importantly, the output of your simulation (the exact error message).

In case the error message also starts with something like
[Dorn:10715] Signal: Segmentation fault (11)
[Dorn:10715] Signal code: Address not mapped (1)
[Dorn:10715] Failing at address: 0x28
[Dorn:10713] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7fac987f9cb0]
[Dorn:10713] [ 1] /usr/lib/libopen-pal.so.0(mca_base_select+0x108) [0x7fac91657518]
...

then the problem is most probably not related to LIGGGHTS but to your system.
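
One way to make that distinction mechanical is to look at where the backtrace passes through. In the trace from the first post, main calls MPI_Init, which descends into Open MPI's own libraries (libopen-rte, libopen-pal) before the segmentation fault, so no simulation code has run yet. A rough Python sketch, assuming the Open MPI backtrace format shown in this thread; the function name and the classification rule are illustrative assumptions, not an official diagnostic.

```python
import re

# Matches backtrace frame lines such as:
# [Dorn:10713] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fac98f703fb]
# and captures the "library(symbol+offset)" part.
FRAME_RE = re.compile(r"^\[[^\]]+\]\s+\[\s*\d+\]\s+(\S+)")


def crashed_inside_mpi_init(log_text: str) -> bool:
    """Return True if any backtrace frame in log_text is inside MPI_Init."""
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and "(MPI_Init+" in match.group(1):
            return True
    return False


if __name__ == "__main__":
    sample = (
        "[Dorn:10713] [ 6] /usr/lib/libmpi.so.0(+0x362e1) [0x7fac98f4f2e1]\n"
        "[Dorn:10713] [ 7] /usr/lib/libmpi.so.0(MPI_Init+0x16b) [0x7fac98f703fb]\n"
        "[Dorn:10713] [ 8] liggghts(main+0x1d) [0x48cefd]\n"
    )
    print(crashed_inside_mpi_init(sample))
```

A trace that never passes through MPI_Init, for example one that ends in liggghts frames after the processor grid has been printed, points in a different direction and is worth a separate bug report.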

Kind regards
Andreas

enrique | Mon, 01/23/2017 - 18:31

Hello,
Thank you for answering.
The error starts just like that. So far I have only managed to run it in serial.
The output begins like this:
Created orthogonal box = (-2.1 -2.1 -0.1) to (2.1 2.1 3.1)
2 by 2 by 1 MPI processor grid

Reading STL file 'meshes/insertion_face.stl'

Reading STL file 'meshes/cilinthick.stl'
[investigacionlnx:05889] *** Process received signal ***
[investigacionlnx:05889] Signal: Segmentation fault (11)
[investigacionlnx:05889] Signal code: Address not mapped (1)
[investigacionlnx:05889] Failing at address: 0x46
[investigacionlnx:05889] [ 0] [0xb7731bd0]
[investigacionlnx:05889] [ 1] /lib/i386-linux-gnu/libc.so.6(fclose+0x17) [0xb71e38f7]
[investigacionlnx:05889] [ 2] liggghts() [0x8079467]
[investigacionlnx:05889] [ 3] liggghts() [0x807a635]
[investigacionlnx:05889] [ 4] liggghts() [0x8cb414e]
[investigacionlnx:05889] [ 5] liggghts() [0x8cb5c18]
[investigacionlnx:05889] [ 6] liggghts() [0x8cc6afb]
[investigacionlnx:05889] [ 7] liggghts() [0x8c6d2f6]
[investigacionlnx:05889] [ 8] liggghts() [0x8c67982]
[investigacionlnx:05889] [ 9] liggghts() [0x80aa63a]
[investigacionlnx:05889] [10] liggghts() [0x80aad24]
[investigacionlnx:05889] [11] liggghts() [0x80539e9]
[investigacionlnx:05889] [12] /lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xde) [0xb719b7ae]
[investigacionlnx:05889] [13] liggghts() [0x8054769]
[investigacionlnx:05889] *** End of error message ***
[investigacionlnx:05888] *** Process received signal ***
[investigacionlnx:05890] *** Process received signal ***
[investigacionlnx:05888] Signal: Segmentation fault (11)
[investigacionlnx:05888] Signal code: Address not mapped (1)
[investigacionlnx:05888] Failing at address: 0x46
[investigacionlnx:05890] Signal: Segmentation fault (11)
[investigacionlnx:05890] Signal code: Address not mapped (1)
[investigacionlnx:05890] Failing at address: 0x46
[investigacionlnx:05888] [ 0] [0xb7731bd0]
[investigacionlnx:05888] [ 1] /lib/i386-linux-gnu/libc.so.6(fclose+0x17) [0xb71e38f7]

Do you have any suggestions for solving this?

P.S. I've read in the LIGGGHTS tutorial that when an STL file imported from CAD is complex, parallelization can't be done properly.

Enrique
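
The log above differs from the one in the first post: MPI starts normally (the 2 by 2 by 1 processor grid is reported), and the backtrace ends in fclose called from liggghts right after "Reading STL file 'meshes/cilinthick.stl'". For a bug report it helps to state which mesh was being read at the time of the crash, and to rule out that the file is unreadable from the directory where mpirun is started. Since the serial run works, an unreadable file is probably not the cause; it is only a cheap check, not a diagnosis. A Python sketch with hypothetical function names:

```python
import os
import re
from typing import Optional

# Matches log lines such as: Reading STL file 'meshes/cilinthick.stl'
STL_RE = re.compile(r"Reading STL file '([^']+)'")


def last_stl_in_log(log_text: str) -> Optional[str]:
    """Return the path of the last STL file mentioned in the log, if any."""
    paths = STL_RE.findall(log_text)
    return paths[-1] if paths else None


def stl_is_readable(path: str, run_dir: str = ".") -> bool:
    """Check that the STL path exists and is readable relative to run_dir."""
    full_path = os.path.join(run_dir, path)
    return os.path.isfile(full_path) and os.access(full_path, os.R_OK)


if __name__ == "__main__":
    log = (
        "Reading STL file 'meshes/insertion_face.stl'\n"
        "\n"
        "Reading STL file 'meshes/cilinthick.stl'\n"
        "[investigacionlnx:05889] *** Process received signal ***\n"
    )
    stl = last_stl_in_log(log)
    print(stl)
    if stl is not None:
        print(stl_is_readable(stl))
```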

cs222 | Wed, 04/19/2017 - 15:49

I guess the meshing of your geometry may be an issue. Have you tried increasing the mesh size? This may be due to a large number of skewed elements.

Chai