Failed to allocate memory

Submitted by Min Zhang on Thu, 06/21/2018 - 06:05

Hello All,

This is Min.

I am running two cases and the ONLY difference is they use different particle insertion approaches.

The simulation settings including both geometry and simulation parameters are almost the same as the one in this post:
https://www.cfdem.com/forums/cfdem-floating-point-exception-when-startin...

Again, the only difference between my two cases is the particle insertion.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
For Case 1: The error message is:
ExecutionTime = 106141 s ClockTime = 106375 s
Time = 0.016851
Courant Number mean: 0.000148775 max: 0.0298628
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
run 100
Setting up run ...
Memory usage per processor = 51.7584 Mbytes
Step Atoms KinEng RotE ts[1] ts[2] Volume
1685001 1823708 3.1467598e+08 382210.01 0.0010050491 0.00093678174 7114.8
insertion: proc 0 at 0 %
insertion: proc 0 at 10 %
insertion: proc 0 at 20 %
insertion: proc 0 at 30 %
insertion: proc 0 at 40 %
insertion: proc 0 at 50 %
insertion: proc 0 at 60 %
insertion: proc 0 at 70 %
insertion: proc 0 at 80 %
insertion: proc 0 at 90 %
insertion: proc 0 at 100 %
INFO: Particle insertion ins: inserted 38883 particle templates (mass 11.653544) at step 1685045
- a total of 1952650 particle templates (mass 585.224707) inserted so far.
CFD Coupling established at step 1685100
1685101 1862584 3.1939677e+08 386886.67 0.0010050491 0.00093678174 7114.8
Loop time of 4.83152 on 48 procs for 100 steps with 1862584 atoms
Pair time (%) = 1.58029 (32.7078)
Neigh time (%) = 0.047384 (0.980727)
Comm time (%) = 0.11721 (2.42594)
Outpt time (%) = 0.0292991 (0.606417)
Other time (%) = 3.05734 (63.2791)
Nlocal: 38803.8 ave 69635 max 0 min
Histogram: 16 0 4 0 0 0 1 3 2 22
Nghost: 6911.1 ave 12638 max 0 min
Histogram: 14 0 2 4 1 3 0 0 6 18
Neighs: 202009 ave 380933 max 0 min
Histogram: 16 4 0 0 0 0 3 1 9 15
Total # of neighbors = 9696442
Ave neighs/atom = 5.20591
Neighbor list builds = 1
Dangerous builds = 0
LIGGGHTS finished
ERROR on proc 9: Failed to reallocate 453014800 bytes for array CfdDatacouplingMPI:data (../memory.cpp:102)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
For Case 2: The error message is:
ExecutionTime = 120162 s ClockTime = 120432 s
Time = 0.031
Courant Number mean: 0.000143968 max: 0.031864
Coupling...
Starting up LIGGGHTS
Executing command: 'run 100 '
ERROR on proc 45: Failed to allocate 17634800 bytes for array write_restart:buf (../memory.cpp:77)
Rank 45 [Sat Jun 16 08:47:05 2018] [c0-1c2s6n0] application called MPI_Abort(comm=0x84000002, 1) - process 45
aborting job:
application called MPI_Abort(comm=0x84000002, 1) - process 45
srun: error: nid00920: task 45: Exited with exit code 255
srun: Terminating job step 1628457.0
slurmstepd: error: *** STEP 1628457.0 ON nid00919 CANCELLED AT 2018-06-16T08:47:06 ***
srun: error: nid00920: tasks 24-39,41-43,46-47: Terminated
srun: error: nid00919: tasks 3-4,8-12,14-15,17-20,22-23: Terminated
srun: error: nid00920: tasks 40,44: Terminated
srun: error: nid00919: tasks 0-2,5-7,13,16,21: Terminated
srun: Force Terminated job step 1628457.0
TACC: MPI job exited with code: 255
TACC: Shutdown complete. Exiting.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I am wondering how to fix this problem.

Thanks and best regards,
Min

paul | Thu, 06/21/2018 - 15:32

Straight from LIGGGHTS memory.h:
Your LAMMPS simulation has run out of memory. You need to run a
smaller simulation or on more processors.

That is not surprising, considering you have >1.8 M particles.

How many processors do you use on how many nodes with how much RAM?

Min Zhang | Thu, 06/21/2018 - 16:28

Hello Paul,

Thanks again!

I am running on 48 processors on 2 nodes. Each node has two Intel E5-2690 v3 12-core (Haswell) processors and 64 GB of DDR4 memory.

paul | Sun, 06/24/2018 - 08:14

This is a CFDEMcoupling case, and the coupling module does not scale well: all procs get to know everything about every particle (MPI_Allreduce is used for comm).

So you already have a huge simulation (1.8 M particles) and are trying to add a lot more. srealloc from memory.cpp tries to re-allocate an array for communication (a whopping 453014800 bytes = 453 MB!), which fails because there are not 24 (one for each proc) contiguous memory regions of this size available for allocation.

It could work on compute nodes with more memory, although with very bad performance.
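
As a rough back-of-the-envelope check of those numbers, here is a minimal sketch. It assumes 24 ranks per node (48 procs on 2 nodes, as stated above) and that every rank on a node requests one buffer of the size shown in the error message at the same time; the function name is made up for illustration and is not part of LIGGGHTS or CFDEMcoupling.

```python
# Figures are taken from the thread. The assumption that every rank on a
# node requests one such buffer simultaneously is illustrative only.
BUFFER_BYTES = 453_014_800      # size in the "Failed to reallocate" message
RANKS_PER_NODE = 24             # 48 MPI ranks spread over 2 nodes
NODE_RAM_BYTES = 64 * 1024**3   # 64 GB per node, interpreted here as GiB


def coupling_buffer_demand(buffer_bytes: int, ranks: int) -> int:
    """Total bytes requested on one node if each rank allocates one buffer."""
    return buffer_bytes * ranks


demand = coupling_buffer_demand(BUFFER_BYTES, RANKS_PER_NODE)
fraction = demand / NODE_RAM_BYTES

print(f"per rank: {BUFFER_BYTES / 1e6:.0f} MB")
print(f"per node: {demand / 1e9:.1f} GB ({fraction:.0%} of node RAM)")
```

This single coupling buffer comes on top of everything LIGGGHTS and the CFD solver already hold on each rank, so the figure is a lower bound on demand rather than a full memory budget.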

I suggest running a smaller simulation. 1.8 M particles is quite a lot and usually not needed to reproduce the physics of a situation - coarse graining exists for a reason.

paul | Sun, 06/24/2018 - 17:52

Here's a patch:
diff --git a/src/input.cpp b/src/input.cpp
index e0496f4..d665752 100644
--- a/src/input.cpp
+++ b/src/input.cpp
@@ -583,6 +583,7 @@ int Input::execute_command()
   else if (!strcmp(command,"bond_style")) bond_style();
   else if (!strcmp(command,"boundary")) boundary();
   else if (!strcmp(command,"box")) box();
+  else if (!strcmp(command,"coarsegraining")) coarsegraining();
   else if (!strcmp(command,"communicate")) communicate();
   else if (!strcmp(command,"compute")) compute();
   else if (!strcmp(command,"compute_modify")) compute_modify();
@@ -1270,6 +1271,17 @@ void Input::box()
 }

 /* ---------------------------------------------------------------------- */
+
+void Input::coarsegraining()
+{
+  if (narg < 1) error->all(FLERR, "Illegal coarsegraining command");
+  for(int i = 0; i < narg; i++)
+    force->setCG(atof(arg[i]));
+  force->reportCG();
+}
+
+/* ---------------------------------------------------------------------- */
+
 void Input::communicate()
 {
   comm->set(narg,arg);
diff --git a/src/input.h b/src/input.h
index 08d5e29..a9e58db 100644
--- a/src/input.h
+++ b/src/input.h
@@ -132,6 +132,7 @@ class Input : protected Pointers {
   void bond_style();
   void boundary();
   void box();
+  void coarsegraining();
   void communicate();
   void compute();
   void compute_modify();

You can apply the patch by going into the $CFDEM_LIGGGHTS_SRC_DIR and doing
patch -p1 < path/to/liggghts-cg.patch

This patch teaches liggghts to read parameters of the type:
coarsegraining cg
where cg is the scaling factor. This command should be put very early in the run script. Read the works of Bierwisch to get the theoretical background. CFDEMcoupling automatically gets this info and scales drag appropriately.
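
To get a feel for what a given scaling factor buys, here is a minimal sketch. It assumes the usual coarse-graining picture in which each parcel has cg times the original particle diameter and the total solid volume is kept, so the number of tracked particles drops by a factor of cg cubed. The function name is hypothetical, and the particle count is the one from the Case 1 log above.

```python
def parcels_after_coarse_graining(n_particles: int, cg: float) -> int:
    """Estimate the parcel count when the diameter is scaled by cg.

    Assumes the total solid volume is preserved, so the count scales
    with 1 / cg**3. This is a geometric estimate only.
    """
    if cg < 1:
        raise ValueError("coarse-graining factor must be >= 1")
    return round(n_particles / cg**3)


N_PARTICLES = 1_862_584  # atom count reported in the Case 1 log

for cg in (1, 2, 3, 4):
    n = parcels_after_coarse_graining(N_PARTICLES, cg)
    print(f"cg = {cg}: about {n} parcels")
```

Even a modest factor shrinks the per-particle data that the coupling has to exchange considerably, which is the point of the suggestion above; whether a given factor still reproduces the physics has to be checked for the case at hand.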

Min Zhang | Thu, 06/28/2018 - 02:03

Hello Paul,

Thank you so much for your reply.

I am wondering whether I could ask more questions.

1. Has CG CFD-DEM been implemented as a default feature in the latest CFDEMcoupling version, or is it still under the hood?

2. The patch file name is liggghts-cg.patch. What I need to do is go into $CFDEM_LIGGGHTS_SRC_DIR, copy the patch file there, and then run patch -p1 < path/to/liggghts-cg.patch. After that, the coarsegraining cg command is ready to use in the LIGGGHTS input file. Is that right?

3. The software versions I am using are: OpenFOAM-2.4.x, LIGGGHTS 3.6.0, CFDEMcoupling-PUBLIC-2.4.x. I am wondering whether I need to upgrade to the latest version or newer versions.

Thanks and best regards,
Min

paul | Thu, 06/28/2018 - 10:21

1. It has been a "hidden" feature for a long time.

2. Yes, but if you copy it into the $CFDEM_LIGGGHTS_SRC_DIR, the command is patch -p1 < liggghts-cg.patch

3. If the patch applies cleanly, you should be fine.

Min Zhang | Thu, 06/28/2018 - 19:02

Hello Paul,

After I apply this patch, do I need to compile liggghts again?

Thanks and best regards,
Min