Dear All,
I am trying to run a modified Ergun test case, i.e. a CFDEM fluidized bed simulation, on an HPC system with 576 processors ("processor 8 8 9"), and I do not know how to visualize all the simulation data together in ParaView. For each processor used in the run, a new "processor" folder is created in /home/ErgunTest/CFD — 576 folders in total — which would have to be opened in ParaView one by one to visualize the particles and the gas. By "modified" Ergun test case I mean (i) more and smaller particles in (ii) a bigger cylinder (actually a geometry consisting of a cylinder on top of a truncated cone).
Now the question: how can I automatically "put together" all the "processor" folders (from "processor0" to "processor575" in /home/ErgunTest/CFD) so that I have just one folder to open in ParaView and can see all my simulation data together?
Hope it is clear. In case it helps, the 576 "processor" folders are created in /home/ErgunTest/CFD with the same instructions used for the default Ergun test case, which runs with 4 processors ("processor 2 2 1"), i.e.
cd /data/ErgunTestMPI2d/CFD
blockMesh
surfaceFeatureExtract
decomposePar
mpirun -np 4 snappyHexMesh -overwrite -parallel
reconstructParMesh -constant -fullMatch
decomposePar
mpirun -np 4 renumberMesh -overwrite -parallel
In the default Ergun test case, only 4 "processor" folders are created (i.e. "processor0", "processor1", "processor2" and "processor3"), but in my case 576 "processor" folders are created in /home/ErgunTest/CFD.
Best, Limone
paul | Wed, 03/14/2018 - 12:57
Use reconstructPar or do
touch case.foam
paraview case.foam
and use the Case Type Decomposed Case.
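In case it helps, here is a sketch of both routes, assuming a standard OpenFOAM installation and that you run from the case directory:

```
cd /home/ErgunTest/CFD

# Route 1: merge the processor* directories back into the top-level
# case; -newTimes skips time steps that were already reconstructed.
reconstructPar -newTimes

# Route 2: skip reconstruction and let ParaView read the decomposed
# case directly (select Case Type: Decomposed Case in the reader).
touch case.foam
paraview case.foam
```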
A side note: Which dataExchangeModel did you use? From my experience, twoWayMPI only scales efficiently to about 50 procs.
Greetings,
Paul
limone | Wed, 03/14/2018 - 14:31
Many thanks Paul,
I don't think I used/changed the "dataExchangeModel"... Where can I find it?
In addition, based on your experience, what is the best configuration to run a CFDEM simulation with 500-2000 cores?
If I am not wrong, CFDEMcoupling is/should be massively parallelized, right?
Best,
Limone
limone | Wed, 03/14/2018 - 14:51
Paul, I have just checked from the terminal with the command grep -R "dataExchangeModel" and found:
CFD/constant/couplingProperties:dataExchangeModel twoWayMPI;
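For reference, the entry lives in CFD/constant/couplingProperties; a minimal fragment of what it typically looks like in the Ergun tutorial (the sub-dictionary contents shown here are an assumption — check your own file):

```
dataExchangeModel twoWayMPI;

twoWayMPIProps
{
    liggghtsPath "../DEM/in.liggghts_run";
}
```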
Now I am a bit scared, because I urgently need massive simulations... So, does this mean the simulations will not scale adequately? What is the best configuration then?
I cannot find any guides on the best configurations for CFDEM simulations on HPC...
Thanks,
Lemon
mbaldini | Thu, 03/15/2018 - 15:54
Hi Lemon, I was wondering the same thing that Paul asked. In my experience, I've got speed-ups using up to 80 cores. The HPC system that I'm using has 40 cores per node; if I use more than two nodes (80 cores), the simulation becomes slower — if I'm not wrong, because of inter-node communication costs. I would recommend you perform a scaling test, and then choose a reasonable number of cores for your runs.
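A minimal sketch of such a scaling test, assuming the case is driven by cfdemSolverPiso (substitute your own solver) and that you edit numberOfSubdomains in system/decomposeParDict before each run:

```
for n in 24 48 96 192; do
    # set numberOfSubdomains to $n in system/decomposeParDict first
    decomposePar -force
    { time mpirun -np $n cfdemSolverPiso -parallel; } 2> time_${n}.log
done
```

Comparing the wall times in time_*.log then shows where the speed-up flattens out.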
Cheers,
Mauro
limone | Thu, 03/15/2018 - 16:09
Hi Mauro,
Thanks for sharing... The question is: does the performance decrease with an increasing number of nodes because of "bad" communication among the nodes, because of poor parallelization of the CFDEM code, or both?
I am running a scalability test on my HPC (1 node has 24 cores)... So far I got:
With 18 cores (processor 3 3 2) a timestep takes 109 real minutes
With 48 cores (processor 4 4 3) a timestep takes 44 real minutes
With 216 cores (processor 6 6 6) a timestep...... still running ! I will let you know ASAP
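From the two finished runs above one can already estimate the parallel efficiency (measured speed-up divided by the ideal speed-up), e.g. with a quick awk one-liner:

```shell
# 18 cores: 109 min per timestep; 48 cores: 44 min per timestep
awk 'BEGIN {
  t1 = 109; p1 = 18;   # reference run
  t2 = 44;  p2 = 48;   # larger run
  speedup = t1 / t2;   # measured speed-up
  ideal   = p2 / p1;   # ideal (linear) speed-up
  printf "speedup %.2f (ideal %.2f), efficiency %.0f%%\n",
         speedup, ideal, 100 * speedup / ideal
}'
# prints: speedup 2.48 (ideal 2.67), efficiency 93%
```

So between 18 and 48 cores the scaling still looks quite good; the interesting question is what happens at 216.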
It would be interesting to know what the DCS guys think and suggest to get a good performance of CFDEMcoupling.
Cheers,
Lemon
paul | Thu, 03/15/2018 - 17:55
To find the reason behind the bad scaling, one has to look no further than:
https://github.com/CFDEMproject/LIGGGHTS-PUBLIC/blob/28301df8853491784b1...
Here we see the magic behind the scenes: MPI_Allreduce.
A huge array containing particle data is (1) summed across all ranks and then (2) distributed back to all of them.
This causes the coupling to become the bottleneck: every core knows everything about everyone and has to talk to everyone.
They have a better communication scheme called M2M, which is part of the premium version:
http://lammps.sandia.gov/workshops/Aug13/Kloss/LAMMPS_presentation_Kloss...
I ran into the same problems as you and have just finished writing a scheme that scales much better. I'll talk to my boss tomorrow and ask whether I can publicize it.
Greetings,
Paul
limone | Fri, 03/16/2018 - 15:04
Hi Paul,
Any news about your comment "I'll talk to my boss tomorrow and ask whether I can publicize it" on the better scheme you wrote... ?
Anything sharable somehow ? :-)
Cheers,
Limone
paul | Fri, 03/16/2018 - 19:02
Yeah, this might take a while — things are a lot harder if you do not own your work; there are many people who need to be convinced :/
What I can share with you right now is the CG patch for LIGGGHTS if you give me your mail address.
limone | Mon, 03/19/2018 - 10:21
Many thanks Paul!
I would be really grateful if you could share that with me.
My email is limonecfdem@gmail.com
In case of any problem I can give you another one.
Cheers,
Limone
limone | Thu, 03/15/2018 - 18:05
Yes Paul, if your work can be made public, it would be a great benefit for this whole community!
It is starting to get frustrating...
Many thanks,
Lemon
hoehnp | Mon, 03/23/2020 - 14:49
REMINDER
Hi Paul,
sorry for reviving this old topic, but I wonder whether you finally got the permission to distribute the code.
Many thanks,
Patrick