Segmentation fault

Submitted by subhodhk on Thu, 06/12/2014 - 00:41

Hi

I get a segmentation fault when I try to run a LIGGGHTS script on more than one core. This happens only when the total number of timesteps is around 100,000 or more, and I think it is also linked to the interval at which dump files are written. Here are the results when I run this script on 3 cores for different total timesteps and dump intervals:
1. Total timesteps = 100,000 & dump every 100 timesteps --> segmentation fault; last dump file written at timestep 64,000
2. Total timesteps = 100,000 & dump every 1,000 timesteps --> segmentation fault; last dump file written at timestep 64,000
3. Total timesteps = 100,000 & dump every 10,000 timesteps --> runs to completion without error!
4. Total timesteps = 1,000,000 & dump every 10,000 timesteps --> segmentation fault; last dump file written at timestep 290,000

I can't figure out what is causing this error or why the third run completed without any errors. Any ideas on how I can tackle this problem?
I have attached my input file, and the complete error message is below:

[LIGGGHTS:01032] *** Process received signal ***
[LIGGGHTS:01032] Signal: Segmentation fault (11)
[LIGGGHTS:01032] Signal code: Address not mapped (1)
[LIGGGHTS:01032] Failing at address: 0x61a91
[LIGGGHTS:01032] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xfbb0) [0x7f287b285bb0]
[LIGGGHTS:01032] [ 1] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS13FixInsertPack9is_nearbyEi+0x2a) [0x7015aa]
[LIGGGHTS:01032] [ 2] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS9FixInsert11count_nnearEv+0x178) [0x6fc098]
[LIGGGHTS:01032] [ 3] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS9FixInsert10load_xnearEi+0x1c) [0x6fc66c]
[LIGGGHTS:01032] [ 4] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS9FixInsert12pre_exchangeEv+0x4c0) [0x6ff1a0]
[LIGGGHTS:01032] [ 5] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS6Modify12pre_exchangeEv+0x4f) [0xa15fdf]
[LIGGGHTS:01032] [ 6] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS6Verlet3runEi+0x547) [0xb49c37]
[LIGGGHTS:01032] [ 7] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS3Run7commandEiPPc+0x7e6) [0xafe306]
[LIGGGHTS:01032] [ 8] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS5Input15command_creatorINS_3RunEEEvPNS_6LAMMPSEiPPc+0x26) [0x9db916]
[LIGGGHTS:01032] [ 9] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS5Input15execute_commandEv+0x7c7) [0x9d9677]
[LIGGGHTS:01032] [10] /home/subhodh/Documents/liggghts/src/lmp_fedora(_ZN9LAMMPS_NS5Input4fileEv+0x506) [0x9da186]
[LIGGGHTS:01032] [11] /home/subhodh/Documents/liggghts/src/lmp_fedora(main+0x46) [0x582786]
[LIGGGHTS:01032] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f287aecfde5]
[LIGGGHTS:01032] [13] /home/subhodh/Documents/liggghts/src/lmp_fedora() [0x583bbf]
[LIGGGHTS:01032] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1032 on node LIGGGHTS exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
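In case it helps the diagnosis: the mangled symbols in the trace can be demangled with `c++filt` (part of binutils), and the top frames point into the particle-insertion code:

```shell
# Demangle the top two frames of the backtrace (symbol names copied from the log above)
echo '_ZN9LAMMPS_NS13FixInsertPack9is_nearbyEi' | c++filt
# -> LAMMPS_NS::FixInsertPack::is_nearby(int)
echo '_ZN9LAMMPS_NS9FixInsert11count_nnearEv' | c++filt
# -> LAMMPS_NS::FixInsert::count_nnear()
```

So the crash happens inside `FixInsertPack::is_nearby`, i.e. while a `fix insert/pack` is checking for nearby particles during insertion.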

Thank you.

Attachment: LIGGGHTS input file (2.65 KB)

msandli | Mon, 06/16/2014 - 18:22

I'm having a (perhaps) similar issue. I'm running 3.0.2 on a cygwin64 installation with mpich2. mpirun -np 1 lmp_fedora < *** will run to completion, but using -np > 1 leads to segmentation faults. The more processors I use, the faster the fault occurs and the fewer timesteps my simulation runs.

I'm aware that running this on a Windows machine might have some limitations, and since I can run serial (or -np 1) just fine it's not the end of the world, but parallel would be nice for those larger simulations.

subhodhk | Sat, 06/21/2014 - 23:22

It is true that the fault occurs faster if you increase the number of processors. I run LIGGGHTS on Ubuntu, so I don't think this error is due to the operating system. But the LIGGGHTS version I am using is 3.0.1; I will install 3.0.2 and see if that fixes the problem.

My simulation takes 12 days on a single processor, so I do need it to run in parallel. I've not been able to find a solution as of yet, so please let me know if you do.

ckloss | Thu, 06/26/2014 - 12:14

Hi,

I moved the discussion to the bug reports section and put it on my list of things to look at for the next release (not scheduled yet).

Christoph

ckloss | Tue, 08/26/2014 - 10:31

Hi subhodhk ,

your bug report is incomplete - you did not attach the full case, so I cannot run/reproduce it.

Christoph

RobertG | Wed, 11/19/2014 - 17:47

Hello Christoph,
I have the same problem, now with LIGGGHTS 3.0.3.
When you run the Continuous Blending Mixer tutorial with two different particle types, you should get the same errors.
There is a dependency on the number of processors and a dependency on the material and interaction properties: the lower the values are, the faster you will get the error. LIGGGHTS 2.x shows an error message (for what is the same underlying error) if you set the values too low, but 3.0.x does not...
Tomorrow I will prepare the file for you and test it.
If there is a solution to the problem, please let me know.

Cu
RobertG

ckloss | Fri, 12/12/2014 - 15:58

Hi RobertG,

please try with the latest version 3.0.6
If the problem persists, I'll have a look!

Best wishes
Christoph

ckloss | Sat, 01/10/2015 - 13:21

Hi Robert,

can you retry with 3.0.7 and post the case that causes the problem with the insertion?

Christoph

joshiga | Tue, 11/24/2015 - 14:17

I got the same error,

Signal: Segmentation fault (11)
Signal code: Address not mapped (1)

with a mixer with two types of particles.
Can somebody confirm whether it is fixed in the latest version?
Also, is there any workaround if I want to use LIGGGHTS 3.0.2?

ckloss | Thu, 12/10/2015 - 21:56

Hi guys,

the full case set-up has never been posted here, so you'll have to try it out on your own in the latest version!

Good luck and best wishes
Christoph

rodolfo | Thu, 04/19/2018 - 15:10

Hi guys,
I am using the 3.8.0 version and I have two cases to evaluate.
They are pretty similar, differing only in the particle type (sphere/multisphere).
When I run the sphere case everything goes smoothly, but in the multisphere one I get the segmentation fault.
Please, can anyone help me with that?

arnom | Tue, 05/15/2018 - 11:51

Hi rodolfo, can you send us the input scripts of the case that segfaults? Please make the case as small as possible so that the error occurs within a few seconds of execution.
Kind regards,
Arno

DCS team member & LIGGGHTS(R) core developer

rodolfo | Mon, 05/21/2018 - 19:16

Thanks in advance for your attention. I figured out the problem: it was an issue in the geometry mesh.
After I used gmsh to rebuild the mesh, the script ran smoothly. But I still do not know why the sphere case could run with the older mesh while the multisphere one could not.

Kind regards,
Rodolfo

blee039 | Wed, 12/12/2018 - 05:05

Hi guys,
Before compiling CFDEM I could run LIGGGHTS without errors. But after coupling with CFDEM, only OpenFOAM works; when I run pure LIGGGHTS sims the segmentation fault error pops up immediately. The same error shows up for -np 1 as well.
I'm using Ubuntu 16.04 with LIGGGHTS 3.8.0 and followed the latest coupling installation instructions.

blee@blee-VirtualBox:~/Documents/Simulations/shortnozzle gravity z$ cfdemLiggghtsPar in.simd 4
[blee-VirtualBox:19249] *** Process received signal ***
[blee-VirtualBox:19249] Signal: Segmentation fault (11)
[blee-VirtualBox:19249] Signal code: Address not mapped (1)
[blee-VirtualBox:19249] Failing at address: 0x28
[blee-VirtualBox:19249] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f3c9f7f7390]
[blee-VirtualBox:19249] [ 1] /usr/lib/libmpi.so.12(mca_allocator_component_lookup+0x31)[0x7f3c9e2cc601]
[blee-VirtualBox:19249] [ 2] /usr/local/lib/openmpi/mca_mpool_hugepage.so(mca_mpool_hugepage_module_init+0xb1)[0x7f3c9815f9b1]
[blee-VirtualBox:19249] [ 3] /usr/local/lib/openmpi/mca_mpool_hugepage.so(+0x23d2)[0x7f3c981603d2]
[blee-VirtualBox:19249] [ 4] /usr/local/lib/libopen-pal.so.20(mca_base_framework_components_open+0xc3)[0x7f3c9db04253]
[blee-VirtualBox:19249] [ 5] /usr/local/lib/libopen-pal.so.20(+0xb5972)[0x7f3c9db6b972]
[blee-VirtualBox:19249] [ 6] /usr/local/lib/libopen-pal.so.20(mca_base_framework_open+0x85)[0x7f3c9db0d4d5]
[blee-VirtualBox:19249] [ 7] /usr/local/lib/libmpi.so.20(ompi_mpi_init+0x403)[0x7f3ca02e6d83]
[blee-VirtualBox:19249] [ 8] /usr/local/lib/libmpi.so.20(MPI_Init+0x63)[0x7f3ca0306873]
[blee-VirtualBox:19249] [ 9] /home/blee/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e69e]
[blee-VirtualBox:19249] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f3c9f43c830]
[blee-VirtualBox:19249] [11] /home/blee/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e939]
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node blee-VirtualBox exited on signal 11 (Segmentation fault).
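One detail stands out in this trace (an observation, not a confirmed diagnosis): the frames are resolved from two different MPI installations, /usr/lib/libmpi.so.12 and /usr/local/lib/libmpi.so.20, and the crash is inside MPI_Init itself. Mixing libraries from two MPI installs at runtime is a classic cause of exactly this kind of failure. A quick check could be:

```shell
# Check which MPI libraries the LIGGGHTS binary resolves at runtime;
# entries from both /usr/lib and /usr/local/lib would confirm a mix-up.
# (binary path taken from the trace above)
ldd ~/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto | grep -i mpi

# Verify that the mpirun on the PATH belongs to the same installation
which mpirun
mpirun --version
```

If the versions don't match, rebuilding LIGGGHTS against the MPI that `mpirun` comes from (or adjusting PATH/LD_LIBRARY_PATH to a single install) would be the first thing to try.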

abhishek9191 | Mon, 10/21/2019 - 11:15

Hi All,
I installed CFDEM on my department server. It shows a segmentation fault (message below) when I run the ErgunTestMPI tutorial.
I also think my compilation was not fully successful, as the following logs are present in the directory $CFDEM_SRC_DIR/lagrangian/cfdemParticle/etc/log:

dummy
log_compileASPHERElib
log_compileCFDEMcoupling_cfdemParticle
log_compileCFDEMcoupling_cfdemPostproc
log_compileCFDEMcoupling_cfdemSolverIB
log_compileCFDEMcoupling_cfdemSolverPiso
log_compileCFDEMcoupling_cfdemSolverPisoScalar
log_compileCFDEMcoupling_cfdemSolverPisoSTM
log_compileCFDEMcoupling_fvOptionsCFDEM
log_compileCFDEMcoupling_scalarTransportModelsCFDEM
log_compileDIPOLElib
log_compileLIGGGHTS
log_compilePASCALlib
log_compilePOEMSlib
log_compile_results_sol_success
log_compile_results_src_fail
log_compile_results_src_success
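The presence of log_compile_results_src_fail above suggests at least one source component did not compile. Before chasing the segfault itself, it may be worth inspecting those logs; a quick way to surface the failures (a sketch, assuming the standard CFDEM log directory listed above):

```shell
# See which components are recorded as failed
cat "$CFDEM_SRC_DIR/lagrangian/cfdemParticle/etc/log/log_compile_results_src_fail"

# List (case-insensitively) every compile log that mentions an error
grep -ril "error" "$CFDEM_SRC_DIR/lagrangian/cfdemParticle/etc/log/"
```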

ERROR MESSAGE on running ErgunTestMPI tutorial:
mesh was built before - using old mesh
starting DEM run in parallel...

// run_liggghts_init_DEM //

/home/aa1/abhishek/tutorial/ErgunTestMPI/DEM

[chem3:81385] *** Process received signal ***
[chem3:81385] Signal: Segmentation fault (11)
[chem3:81385] Signal code: Address not mapped (1)
[chem3:81385] Failing at address: 0x10
[chem3:81385] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f411e0f4390]
[chem3:81385] [ 1] /usr/lib/libmpi.so.12(mca_btl_base_select+0x60)[0x7f411cbca850]
[chem3:81385] [ 2] /usr/local/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7f4110180422]
[chem3:81385] [ 3] /usr/local/lib/libmpi.so.40(mca_bml_base_init+0x8c)[0x7f411ec3a88c]
[chem3:81385] [ 4] /usr/local/lib/libmpi.so.40(ompi_mpi_init+0x637)[0x7f411ebeece7]
[chem3:81385] [ 5] /usr/local/lib/libmpi.so.40(MPI_Init+0x6e)[0x7f411ec1d23e]
[chem3:81385] [ 6] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e57e]
[chem3:81385] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f411dd39830]
[chem3:81385] [ 8] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e939]
[chem3:81385] *** End of error message ***
[chem3:81383] *** Process received signal ***
[chem3:81383] Signal: Segmentation fault (11)
[chem3:81383] Signal code: Address not mapped (1)
[chem3:81383] Failing at address: 0x10
[chem3:81383] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f1c5805b390]
[chem3:81383] [ 1] /usr/lib/libmpi.so.12(mca_btl_base_select+0x60)[0x7f1c56b31850]
[chem3:81383] [ 2] /usr/local/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7f1c4a1b7422]
[chem3:81383] [ 3] /usr/local/lib/libmpi.so.40(mca_bml_base_init+0x8c)[0x7f1c58ba188c]
[chem3:81383] [ 4] /usr/local/lib/libmpi.so.40(ompi_mpi_init+0x637)[0x7f1c58b55ce7]
[chem3:81383] [ 5] /usr/local/lib/libmpi.so.40(MPI_Init+0x6e)[0x7f1c58b8423e]
[chem3:81383] [ 6] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e57e]
[chem3:81383] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f1c57ca0830]
[chem3:81383] [ 8] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e939]
[chem3:81383] *** End of error message ***
[chem3:81382] *** Process received signal ***
[chem3:81382] Signal: Segmentation fault (11)
[chem3:81382] Signal code: Address not mapped (1)
[chem3:81382] Failing at address: 0x10
[chem3:81384] *** Process received signal ***
[chem3:81384] Signal: Segmentation fault (11)
[chem3:81384] Signal code: Address not mapped (1)
[chem3:81384] Failing at address: 0x10
[chem3:81382] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fd1e662a390]
[chem3:81382] [ 1] [chem3:81384] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f820b915390]
[chem3:81384] [ 1] /usr/lib/libmpi.so.12(mca_btl_base_select+0x60)[0x7fd1e5100850]
[chem3:81382] [ 2] /usr/local/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7fd1d879b422]
[chem3:81382] [ 3] /usr/lib/libmpi.so.12(mca_btl_base_select+0x60)[0x7f820a3eb850]
[chem3:81384] [ 2] /usr/local/lib/openmpi/mca_bml_r2.so(mca_bml_r2_component_init+0x12)[0x7f81fd9b7422]
[chem3:81384] [ 3] /usr/local/lib/libmpi.so.40(mca_bml_base_init+0x8c)[0x7fd1e717088c]
[chem3:81382] [ 4] /usr/local/lib/libmpi.so.40(mca_bml_base_init+0x8c)[0x7f820c45b88c]
[chem3:81384] [ 4] /usr/local/lib/libmpi.so.40(ompi_mpi_init+0x637)[0x7fd1e7124ce7]
[chem3:81382] [ 5] /usr/local/lib/libmpi.so.40(ompi_mpi_init+0x637)[0x7f820c40fce7]
[chem3:81384] [ 5] /usr/local/lib/libmpi.so.40(MPI_Init+0x6e)[0x7fd1e715323e]
[chem3:81382] [ 6] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e57e]
[chem3:81382] [ 7] /usr/local/lib/libmpi.so.40(MPI_Init+0x6e)[0x7f820c43e23e]
[chem3:81384] [ 6] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e57e]
[chem3:81384] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fd1e626f830]
[chem3:81382] [ 8] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e939]
[chem3:81382] *** End of error message ***
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f820b55a830]
[chem3:81384] [ 8] /home/aa1/LIGGGHTS/LIGGGHTS-PUBLIC/src/lmp_auto[0x40e939]
[chem3:81384] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 0 on node chem3 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I am using OpenFOAM 5.x and LIGGGHTS 3.8.0. Please help.

Thanks & Regards,
Abhishek