Segmentation fault LIGGGHTS-PUBLIC 3.8.0.

Submitted by Phil93 on Mon, 04/11/2022 - 09:52

Hello, I regularly get a "Segmentation fault" in my simulations. When I start the simulations several times, some runs finish without problems, but most of them abort with a "Segmentation fault".
I run on a cluster that uses the PBS batch system. The nodes have Xeon E5-2630 v3 CPUs, and I use 16 cores. The LIGGGHTS version is LIGGGHTS-PUBLIC 3.8.0.
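Because the crash is intermittent, it can help to measure how often it happens before and after changing something. Below is a minimal sketch, assuming you re-run the same job command several times from a script. The command itself (for example an mpirun line) and the helper names `classify` and `tally_runs` are hypothetical and do not come from this post. A process killed directly by SIGSEGV has return code -11 in Python's `subprocess`, while mpirun normally exits with a positive code and prints "exited on signal 11 (Segmentation fault)" as in the log below. For that reason the sketch checks both.

```python
"""Re-run a command several times and tally how often it ends in a segfault.

Sketch only: the command you pass in (e.g. an mpirun line) is an assumption,
not something taken from the original post.
"""
import signal
import subprocess
import sys
from collections import Counter


def classify(returncode: int, output: str) -> str:
    """Map one run's exit status and combined output to a coarse category."""
    # A child killed directly by SIGSEGV reports returncode -11 (POSIX).
    if returncode == -signal.SIGSEGV:
        return "segfault"
    # mpirun usually exits with a positive code and prints the signal instead.
    if "Segmentation fault" in output:
        return "segfault"
    return "ok" if returncode == 0 else "other-error"


def tally_runs(cmd: list, repeats: int) -> Counter:
    """Run `cmd` `repeats` times and count the outcome categories."""
    counts: Counter = Counter()
    for _ in range(repeats):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        counts[classify(proc.returncode, proc.stdout + proc.stderr)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: python tally.py mpirun -np 16 lmp_auto -in in.script
    if len(sys.argv) < 2:
        sys.exit("usage: tally.py COMMAND [ARGS...]")
    print(tally_runs(sys.argv[1:], repeats=5))
```

Matching on the printed text is a heuristic. If your MPI launcher words the message differently, adjust the string in `classify`.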

The error looks like this:
[lena-n023:29539] *** Process received signal ***
[lena-n023:29539] Signal: Segmentation fault (11)
[lena-n023:29539] Signal code: Invalid permissions (2)
[lena-n023:29539] Failing at address: 0x2b20eaf99d58
[lena-n023:29539] [ 0] /lib64/libc.so.6(+0x36280)[0x2b20ea4f9280]
[lena-n023:29539] [ 1] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/openmpi/mca_btl_vader.so(+0x4788)[0x2b20ebf86788]
[lena-n023:29539] [ 2] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libopen-pal.so.40(opal_progress+0x2c)[0x2b20eaec6e6c]
[lena-n023:29539] [ 3] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_request_default_wait+0x35)[0x2b20ea0f0b95]
[lena-n023:29539] [ 4] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xa3)[0x2b20ea143073]
[lena-n023:29539] [ 5] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_coll_base_allreduce_intra_recursivedoubling+0x2d9)[0x2b20ea143469]
[lena-n023:29539] [ 6] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(PMPI_Allreduce+0x14f)[0x2b20ea10267f]
[lena-n023:29539] [ 7] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x90f0e1]
[lena-n023:29539] [ 8] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x90f4c1]
[lena-n023:29539] [ 9] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x95e5e0]
[lena-n023:29539] [10] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x970f1a]
[lena-n023:29539] [11] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x9c6e4b]
[lena-n023:29539] [12] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x9f5ac3]
[lena-n023:29539] [13] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x88fd00]
[lena-n023:29539] [14] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x924f82]
[lena-n023:29539] [15] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x922eca]
[lena-n023:29539] [16] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x923a07]
[lena-n023:29539] [17] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x92465f]
[lena-n023:29539] [18] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x922f28]
[lena-n023:29539] [19] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x92371b]
[lena-n023:29539] [20] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e77b]
[lena-n023:29539] [21] /lib64/libc.so.6(__libc_start_main+0xf5)[0x2b20ea4e53d5]
[lena-n023:29539] [22] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x40ea63]
[lena-n023:29539] *** End of error message ***
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--exited on signal 11 (Segmentation fault).
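In the trace above, frames [1] to [6] lie inside Open MPI, in MPI_Allreduce and its shared-memory transport `mca_btl_vader.so`. Frames [7] to [20] are raw addresses in `lmp_auto` without function names. Below is a minimal sketch for splitting such lines into frame number, module, symbol and address, and for building `addr2line` commands for the unnamed `lmp_auto` frames. It assumes the line format shown in this post, which other Open MPI versions may not follow. `addr2line` can only resolve the addresses usefully if `lmp_auto` was built with debug information (`-g`), which is also an assumption. The helper names are hypothetical.

```python
"""Split Open MPI backtrace lines into (frame, module, symbol, address).

Sketch only: the regular expression is written against the line format in
this post and may not match every Open MPI version.
"""
import re
from typing import NamedTuple, Optional

FRAME_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+\[\s*(?P<idx>\d+)\]\s+"
    r"(?P<module>[^(\[]+)(?:\((?P<symbol>[^)]*)\))?\[(?P<addr>0x[0-9a-fA-F]+)\]"
)


class Frame(NamedTuple):
    idx: int
    module: str
    symbol: Optional[str]  # e.g. "PMPI_Allreduce+0x14f", or None if absent
    addr: str


def parse_frame(line: str) -> Optional[Frame]:
    """Return the parsed frame, or None for lines that are not frames."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return Frame(int(m["idx"]), m["module"], m["symbol"], m["addr"])


def addr2line_commands(lines: list, binary_suffix: str = "lmp_auto") -> list:
    """Build addr2line commands for unnamed frames inside the given binary."""
    cmds = []
    for line in lines:
        f = parse_frame(line)
        if f and f.module.endswith(binary_suffix) and not f.symbol:
            # -e: executable, -f: print function names, -C: demangle C++
            cmds.append(f"addr2line -f -C -e {f.module} {f.addr}")
    return cmds
```

The commands are only printed, not executed. Run them on the cluster where the same `lmp_auto` binary lives, because the addresses are specific to that build.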

The full log file is attached, and an example simulation can be downloaded from https://drive.google.com/drive/folders/1tjD3yMZDL0n55-OeUL9F99SmRRG3ObYn...

Attachment: log.txt (1.26 MB)

jumlouh | Thu, 02/15/2024 - 03:42

[lena-n023:29539] [ 1] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/openmpi/mca_btl_vader.so(+0x4788)[0x2b20ebf86788]
[lena-n023:29539] [ 2] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libopen-pal.so.40(opal_progress+0x2c)[0x2b20eaec6e6c]
[lena-n023:29539] [ 3] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_request_default_wait+0x35)[0x2b20ea0f0b95]
[lena-n023:29539] [ 4] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xa3)[0x2b20ea143073]
[lena-n023:29539] [ 5] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(ompi_coll_base_allreduce_intra_recursivedoubling+0x2d9)[0x2b20ea143469]
[lena-n023:29539] [ 6] /sw-eb/apps/software/haswell/Compiler/GCC/7.3.0-2.30/OpenMPI/3.1.1/lib/libmpi.so.40(PMPI_Allreduce+0x14f)[0x2b20ea10267f]
[lena-n023:29539] [ 7] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x90f0e1]
[lena-n023:29539] [ 8] /home/-/sw/haswell/liggghts/3.5.1/build/LIGGGHTS-PUBLIC/src/lmp_auto[0x90f4c1]