Best configuration for running CFDEM simulations on an HPC cluster (Ergun test case)

Submitted by limone on Wed, 03/14/2018 - 17:18

Dear All,

I am trying to run a modified Ergun test case, i.e. a CFDEM fluidized bed simulation, on an HPC cluster. By "modified" Ergun test case I mean (i) more and smaller particles in (ii) a bigger cylinder (actually a geometry consisting of a cylinder on top of a truncated cone).

Do you know what the best configuration/setup is to run such a CFDEM simulation efficiently on an HPC cluster with up to 4000 cores?
Any tricks? Any hints for good HPC scalability?

Any information would be really useful, since I cannot find any guide on HPC usage/configuration for CFDEM simulations.

Best,
Limone

paul | Thu, 03/15/2018 - 20:12

Do not run a simulation with 4k cores. Trust me, you don't want to; it creates nothing but pain:
- You have to handle terabytes of result data if you don't sample your target quantities online. If you really want to, consider using binary VTK dumps in LIGGGHTS and binary dumps in OpenFOAM (see the sketch after this list). Also, you have to be smart about when to write, because I/O may turn out to be the bottleneck if everything else works well.
- Use 10k-20k particles per core; this is the sweet spot. Throwing more cores at a problem is rarely economic or faster.
- You have to deal with queueing / do some social engineering to free the resources. If your sim fails, you have to fix it, submit it again and wait. Your submit-fail-fix-resubmit cycle may take a long time.
- If a single node fails, the whole simulation will fail.
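
As a rough sketch of what the binary output settings could look like (the dump name, fields and write interval below are placeholders; check the dump and dump_modify documentation of your LIGGGHTS version for the exact binary option of the VTK dump style):

    # LIGGGHTS input script: a filename ending in .bin makes the custom dump binary
    dump dmp all custom 5000 post/dump_*.liggghts.bin id type x y z vx vy vz radius
    # or the VTK dump style (how/whether it writes binary depends on your version)
    # dump dmp all custom/vtk 5000 post/particles_*.vtk id type x y z vx vy vz radius

    // OpenFOAM system/controlDict: binary fields, keep only the last few time dirs
    writeFormat      binary;
    writeCompression off;
    purgeWrite       10;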

Consider using coarse-graining; it is usually applicable to fluidized beds with little loss of accuracy. It is probably not available by default in the PUBLIC version, but enabling it is described somewhere in this forum and takes a handful of lines of code at most. Look at the work of Lu 2017:
https://pubs.acs.org/doi/abs/10.1021/acs.iecr.7b01862
for example.
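
For a rough idea of the gain: in the usual parcel approach, a coarse-graining ratio k means each parcel has k times the primary particle diameter and represents about k^3 primary particles, so the count the DEM solver has to handle drops by a factor of k^3 (8 for k = 2, 27 for k = 3). The interphase and contact terms then have to be scaled consistently with the parcel size.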

limone | Fri, 03/16/2018 - 11:08

Thanks a lot, Paul, for your valuable advice!

From your description it sounds like things get quite messy when you go up to 4k cores.
I will have a look at / consider binary dumps for both LIGGGHTS and OpenFOAM.

About the number of particles per processor, I was thinking of around 20k per processor.
At the moment I am trying with 1,000,000 particles, which means 1,000,000 / 20,000 = 50 processors.
But I would need to use 5-10 million particles....
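At that loading this would mean 5,000,000 / 20,000 = 250 up to 10,000,000 / 20,000 = 500 processors.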

I had a look at the Lu (2017) paper - I know a bit about coarse graining... probably, as you suggested, I can use one of the coarse-graining implementations from the forum. I will check later who wrote about it there... Thanks!

In general, many thanks for all this info, which is extremely useful!

Cheers,
Limo