segmentation fault

Submitted by tjleps on Thu, 02/20/2020 - 23:10

We seem to have developed a segmentation fault on our CentOS 8 system. Everything was working fine, then all of a sudden our scripts started segfaulting. I've tried downloading a fresh copy of LIGGGHTS and recompiling with "make auto", and I've reinstalled openmpi and openmpi-devel, but we're still getting the same faults. I can't even run the examples. Has anyone had a similar problem?

Output from running the cohesion example:

Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7fec02ff6768)
==== backtrace ====
0 /lib64/libucs.so.0(+0x18bb0) [0x7fec02989bb0]
1 /lib64/libucs.so.0(+0x18d8a) [0x7fec02989d8a]
2 /lib64/libuct.so.0(+0x1655b) [0x7fec082e655b]
3 /lib64/ld-linux-x86-64.so.2(+0xfd2a) [0x7fec1c9c2d2a]
4 /lib64/ld-linux-x86-64.so.2(+0xfe2a) [0x7fec1c9c2e2a]
5 /lib64/ld-linux-x86-64.so.2(+0x13e3f) [0x7fec1c9c6e3f]
6 /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7fec16e9fff7]
7 /lib64/ld-linux-x86-64.so.2(+0x136ae) [0x7fec1c9c66ae]
8 /lib64/libdl.so.2(+0x11ba) [0x7fec168ac1ba]
9 /lib64/libc.so.6(_dl_catch_exception+0x77) [0x7fec16e9fff7]
10 /lib64/libc.so.6(_dl_catch_error+0x33) [0x7fec16ea0093]
11 /lib64/libdl.so.2(+0x1939) [0x7fec168ac939]
12 /lib64/libdl.so.2(dlopen+0x4a) [0x7fec168ac25a]
13 /usr/lib64/openmpi/lib/libopen-pal.so.40(+0x6df05) [0x7fec14f25f05]
14 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_repository_open+0x206) [0x7fec14f03b16]
15 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_component_find+0x35a) [0x7fec14f02a5a]
16 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_components_register+0x2e) [0x7fec14f0e3ce]
17 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_register+0x252) [0x7fec14f0e8b2]
18 /usr/lib64/openmpi/lib/libopen-pal.so.40(mca_base_framework_open+0x15) [0x7fec14f0e915]
19 /usr/lib64/openmpi/lib/libmpi.so.40(ompi_mpi_init+0x674) [0x7fec17935494]
20 /usr/lib64/openmpi/lib/libmpi.so.40(MPI_Init+0x72) [0x7fec179656b2]
21 lmp_auto() [0x41ef1d]
22 /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fec16d8a873]
23 lmp_auto() [0x42034e]
===================
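
Read from the top, the trace shows the crash inside libucs.so.0 and libuct.so.0 (both UCX libraries), reached through dlopen from Open MPI's component loader during MPI_Init, so it happens before any LIGGGHTS code runs. As a rough sketch for condensing a trace like this into its chain of shared libraries, assuming the frames were saved to a hypothetical file backtrace.txt (summarize_backtrace is an illustrative name, not an existing tool):

```shell
# Summarize a saved backtrace as the ordered chain of shared objects.
# backtrace.txt and summarize_backtrace are hypothetical names for illustration.
summarize_backtrace() {
    # Keep frame lines of the form "N /path/to/lib.so(...)", take the path field,
    # drop the "(symbol+offset)" suffix and the directory, then collapse
    # consecutive repeats so each library appears once per run of frames.
    grep -E '^[0-9]+ /' "$1" \
        | awk '{print $2}' \
        | sed -E 's/\(.*//; s#.*/##' \
        | uniq
}

# Usage (assumes the file exists):
# summarize_backtrace backtrace.txt
```

Frames that belong to the executable itself (the lmp_auto lines) carry no library path and are skipped.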

mschramm | Fri, 02/21/2020 - 04:30

Hello,
I have had some problems with LIGGGHTS and other programs ever since upgrading my RHEL 7 machine to RHEL 8 (I still don't have a fully functional ParaView...).
I did not have good luck with openmpi-devel. Under the eye of our HPC engineer techs, we started to install everything using spack (github.com/spack/spack).
You can install everything simply using:
spack spec liggghts ^mesa~llvm ^python@3.6.9
spack install liggghts
Replace 3.6.9 with your version of Python.
This will automatically download everything needed to get LIGGGHTS to run. The only thing you HAVE to watch out for is Mesa using Clang to compile itself.

If you are going to be actively developing LIGGGHTS, you can install openmpi, boost, and vtk from spack and load them when you compile LIGGGHTS:
spack load vtk
spack load openmpi
spack load boost
make auto
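
Before running make auto, it may be worth confirming that the MPI tools on PATH are the ones spack just loaded rather than the system copies. A minimal sketch, assuming the build picks up the mpicxx wrapper from PATH (check_on_path is a hypothetical helper, not part of spack or LIGGGHTS):

```shell
# Report where each named tool resolves on PATH; return non-zero if any is missing.
# check_on_path is a hypothetical helper name used only for illustration.
check_on_path() {
    status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            echo "$tool -> $path"
        else
            echo "$tool not found on PATH" >&2
            status=1
        fi
    done
    return $status
}

# After the spack load commands and before make auto:
# check_on_path mpicxx mpirun
```

If the reported paths still point under /usr/lib64/openmpi, the system Open MPI is the one being used.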

Sorry if this wasn't what you were after.

tjleps | Sat, 02/22/2020 - 03:35

I think the problem started after I last ran "yum update" a few weeks ago. Thinking that I, or one of the other users, might have broken something on the machine, I did a clean install tonight. Unfortunately, I'm still getting the same seg fault.

I'm a little hesitant to install new flavors of package managers, since I will (hopefully) be graduating soon and I don't want to steepen the learning curve for the next administrator of our lab's workstation. On the other hand, it does seem to be a fairly well-supported system for HPC applications, so maybe it's not so bad. I've just had bad luck with third-party package managers polluting /usr in unpredictable ways and turning the system into a rat's nest.

tjleps | Sat, 02/22/2020 - 04:02

LAMMPS is broken on the latest CentOS 8 as well. This is looking quite bad.