Fatal IO Error in cfdemSolverIB tutorial

Submitted by Churchyard on Tue, 01/12/2021 - 11:24

Dear all,

I have compiled the latest versions of LIGGGHTS 3.8.1 and CFDEMcoupling 3.8.0 together with OpenFOAM 5.x on our cluster as well as on one of our local workstations. The programs were built and run with the following versions of g++, gcc and MPI:

- g++ (GCC) 8.4.0
- gcc (GCC) 8.4.0
- mpirun: (Open MPI) 2.1.6

The compilation completed without any errors. However, when I run the Immersed Boundary tutorial "TwoSpheresGlowinskiMPI" on several nodes (each node has 20 processors in our case), I get the error attached below (file: logfile_tutorial_case.txt). While splitting the domain into varying (Px Py Pz) combinations and using different numbers of nodes (2-5), I observed that whether the error occurs depends on the selected number of nodes and/or the processor combination, and that it mainly occurs during the mesh refinement step in OpenFOAM. For instance, with some (Px Py Pz) combinations the tutorial case runs to the end time without any issues, while with others it fails at the refinement step after a couple of time steps; the number of successfully calculated steps varies with the (Px Py Pz) combination used for the domain decomposition.
I have also tested a different IB case where I encountered a similar problem (log attached as well; file: logfile_second_case.txt). I have not been able to figure out the reason for this problem so far.
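For clarity, the (Px Py Pz) splits refer to the decomposition of the CFD domain in decomposeParDict. A minimal sketch of the kind of dictionary I vary between runs is given below, assuming the simple decomposition method; the (5 2 2) split is just one example combination for 20 processors, not the exact values from every run:

FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  20;

method              simple;

simpleCoeffs
{
    n               (5 2 2);    // (Px Py Pz) split varied between runs
    delta           0.001;
}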

The cfdemSolverPiso tutorial "ErgunTestMPI" runs without any issues, so my initial guess is that the dynamicMesh functionality is causing problems when the run spans multiple nodes.
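To illustrate what I mean by the refinement step: the IB case uses a dynamic mesh, and I assume a dynamicRefineFvMesh configured through constant/dynamicMeshDict roughly like the sketch below. The keywords are standard OpenFOAM 5.x; the field name and threshold values here are placeholders, not the exact tutorial settings:

dynamicFvMesh   dynamicRefineFvMesh;

dynamicRefineFvMeshCoeffs
{
    refineInterval      1;              // re-evaluate refinement every time step
    field               interface;      // placeholder: field that drives refinement
    lowerRefineLevel    0.001;
    upperRefineLevel    0.999;
    unrefineLevel       10;
    nBufferLayers       1;
    maxRefinement       2;
    maxCells            500000;
    correctFluxes       ((phi none));
    dumpLevel           true;
}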

Has someone encountered a similar problem? Any advice or hints would be highly appreciated.

Thank you in advance!

Best regards,
Churchyard

Attachments:
- logfile_tutorial_case.txt (56.85 KB)
- logfile_second_case.txt (908 bytes)

mostanad | Thu, 01/12/2023 - 02:44

Hi Everyone,
Can someone from the CFDEM developers/group answer this question? I have faced the same issue when running cfdemSolverIB with more than 256 CPUs!

gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

g++ --version
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

mpirun --version
HYDRA build details:
Version: 3.4.3
Release Date: Thu Dec 16 11:20:57 CST 2021
CC: GCC

Please help me. I am in an urgent situation.

Regards

mostanad | Thu, 01/12/2023 - 09:27

The error appears only when going beyond 256 CPUs, i.e. beyond 2 nodes with 128 CPUs per node. If I increase the number of nodes to 3 (384 CPUs), the following error appears right after the cyclicAMI addressing output:

AMI: Creating addressing and weights between 9514 source faces and 10120 target faces

[256]
[256]
[256] --> FOAM FATAL IO ERROR:
[256] error in IOstream "IOstream" for operation operator>>(Istream&, List&) : reading first token
[256]
[256] file: IOstream at line 0.
[256]
[256] From function void Foam::IOstream::fatalCheck(const char*) const
[256] in file db/IOstreams/IOstreams/IOstream.C at line 109.
[256]
FOAM parallel run exiting
[256]
MPICH ERROR [Rank 256] [job id 793324.3] [Wed Jan 11 18:12:49 2023] [nid001226] - Abort(1) (rank 256 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 256
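For completeness, this is roughly how the case is decomposed and launched on our cluster; the scheduler wrapper and paths are omitted, and 384 corresponds to the 3-node run above:

decomposePar
mpirun -np 384 cfdemSolverIB -parallel > log.cfdemSolverIB 2>&1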