Paraview visualization for a HPC simulation of a modified Ergun test case

Dear All,

I am trying to run a modified Ergun test case, therefore a CFDEM fluidized bed simulation, on a HPC with 576 processors ("processor 8 8 9") and I do not know how to visualize the simulation data all together in Paraview. Indeed for each processor employed to run a simulation, a new "processor" folder is created in the directory /home/ErgunTest/CFD, which means 576 new "processor" folders, which have to be opened in Paraview one-by one, to visualize the particles and gas. As "modified" Ergun test case, I mean (i) more and smaller particles in a (ii) bigger cylinder (actually a geometry consisting of a cylinder on top of a truncated cone).

Now the question. How can I "put together" automatically all the "processor" folders (from "processor 0" to "processor 576" in /home/ErgunTest/CFD) in order to have just one folder to open in Paraview and see all my simulation data together ?

Hope it is clear. If useful for a reply, I would like to add that the 576 "processor" folders are created in /home/ErgunTest/CFD with the usual instructions used also for the Ergun test case, which runs with 4 processors ("processor 2 2 1"), i.e.

cd /data/ErgunTestMPI2d/CFD
blockMesh
surfaceFeatureExtract
decomposePar
mpirun -np 4 snappyHexMesh -overwrite -parallel
reconstructParMesh -constant -fullMatch
decomposePar
mpirun -np 4 renumberMesh -overwrite -parallel

In case of the default Ergun test case, only 4 "processor" folders are created (i.e. "processor 0", "processor 1", "processor 2" and "processor 3"), but in my case 576 "processor" folders are created in /home/ErgunTest/CFD.

Best, Limone

Forums:

CFDEM®coupling - User Forum

paul | Wed, 03/14/2018 - 12:57

Use reconstructPar or do

Use reconstructPar or do
touch case.foam
paraview case.foam
and use the Case Type Decomposed Case.

A side note: Which dataExchangeModel did you use? From my experience, twoWayMPI only scales efficiently to about 50 procs.

Greetings,
Paul

limone | Wed, 03/14/2018 - 14:31

Many thanks Paul,

I think I did not use/change the "dataExchangeModel"... Where can i find it ?

In addition, based on your experience, which is the best configuration to run a CFDEM simulation with 500 - 2000 cores ?

If I am not wrong, CFDEMcoupling is/should be massively parallelized, right ?

Best,
Limone

limone | Wed, 03/14/2018 - 14:51

Paul, I have just checked fom

Paul, I have just checked fom the terminal with the command grep -R "dataExchangeModel" ........

CFD/constant/couplingProperties:dataExchangeModel twoWayMPI;

Now I am a bit scared because I need urgently massive simulations...... So.......Does it mean that the simulations are/will not be scaled adequately ?? Which is the best configuration then ??

I cannot find guides about the best configurations for HPC simulations for CFDEM.........

Thanks,
Lemon

mbaldini | Thu, 03/15/2018 - 15:54

Hi Lemon, I was wondering the

Hi Lemon, I was wondering the same thing that Paul asked. On mi experience, I've got speed ups using up to 80 cores. The hpc system that I'm using 40 has cores per node, If I use more that two nodes (80 cores) the simulation become slower. If I'm not wrong because inter node communication costs. I would recommend you to perform a scaling test, and then choose a reasonable number of cores for your runs.

Cheers,
Mauro

limone | Thu, 03/15/2018 - 16:09

Hi Mauro,

Thanks for sharing...... The question is: Does the performance decrease with an increase of the nodes due to a "bad" communication among the nodes, or due to a poor parallelization of the CFDEM code ? Or due to both reasons ?

I am doing a scalability test with my HPC (in my HPC 1 node has 24 cores).....So far I got...

With 18 cores (processor 3 3 2) a timestep takes 109 real minutes
With 48 cores (processor 4 4 3) a timestep takes 44 real minutes
With 216 cores (processor 6 6 6) a timestep...... still running ! I will let you know ASAP

It would be interesting to know what the DCS guys think and suggest to get a good performance of CFDEMcoupling.

Cheers,
Lemon

paul | Thu, 03/15/2018 - 17:55

For finding the reason behind

For finding the reason behind the bad scaling, one has to look no further than:
https://github.com/CFDEMproject/LIGGGHTS-PUBLIC/blob/28301df8853491784b1...

Here we see the magic behind the scenes: MPI_Allreduce
A huge array containing particle data is 1. summed from all and subsequently 2. distributed to all.
This causes coupling to become the bottleneck. Every core knows everything about everyone and has to talk to everyone.

They have some better communication scheme called M2M which is part of premium:
http://lammps.sandia.gov/workshops/Aug13/Kloss/LAMMPS_presentation_Kloss...

I ran into the same problems as you and have just finished writing a scheme that scales much better. I'll talk to my boss tomorrow and ask whether I can publicize it.

Greetings,
Paul

limone | Fri, 03/16/2018 - 15:04

Hi Paul,

Any news about your comment "I'll talk to my boss tomorrow and ask whether I can publicize it" on the better scheme you wrote... ?
Anything sharable somehow ? :-)

Cheers,
Limone