p-costa / snac
A multi-block solver for massively parallel direct numerical simulations (DNS) of fluid flows
License: MIT License
Currently, the loops are not collapsed (which I believe should be fine for shared-memory runs), static scheduling is not explicitly imposed, and some loops in solver.f90 lack OpenMP directives (although the most demanding part, the iterative solvers in hypre, has an OpenMP implementation).
This has been fine for CPU-only runs using only MPI but, in light of future porting efforts, it would be good to make sure the OpenMP implementation performs well.
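As a sketch of the kind of change meant here (the loop bounds, array names, and stencil are illustrative, not taken from the code), a triple loop could be collapsed and given an explicit static schedule as follows:

```fortran
! hypothetical loop; lo(:), hi(:), dudt and lap_u are illustrative names
!$omp parallel do collapse(3) schedule(static) default(shared) private(i,j,k)
do k=lo(3),hi(3)
  do j=lo(2),hi(2)
    do i=lo(1),hi(1)
      dudt(i,j,k) = dudt(i,j,k) + visc*lap_u(i,j,k)
    end do
  end do
end do
!$omp end parallel do
```

Collapsing merges the three loops into one iteration space, which helps when the outer extent is small compared to the thread count; schedule(static) makes the work distribution explicit and reproducible.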
I would propose two improvements to the current visualization strategy, specifically in the generation of the .xmf file:
This can be quite important when implicit diffusion is used in conjunction with a non-uniform inflow (e.g. a Poiseuille or a square duct inflow).
Edit: the non-uniform B.C. has to be reflected in:
Hi,
Does SNaC support hanging grids at the interface between blocks, i.e., non-conforming grids?
Thanks
As already done in CaNS-GPU.
if(myid == 0) write(stderr,*) 'ERROR: implicit diffusion not yet supported with "_FFT_USE_SLICED_PENCILS".'
The first instance of this error message needs fixing.
The following too:
#elif _FFT_Z
dyf(lo(2)) == dzf(lo(3))
As of now, the ranks in the multi-block implementation are ordered by increasing block ID. For instance, for 4 blocks with a 2x2 MPI domain decomposition per block, we get:
===============
|10 11| |14 15|
|08 09| |12 13|
---------------
|02 03| |06 07|
|00 01| |04 05|
===============
It may be advantageous to have the option to re-order the ranks with increasing ijk indexes, as done by default in MPI_CART_CREATE:
===============
|12 13| |14 15|
|08 09| |10 11|
---------------
|04 05| |06 07|
|00 01| |02 03|
===============
This can be achieved by determining large = product(hi(:)) among all ranks, where hi(:) is the upper-bounds array, and ordering the tasks by the key arr(0:nrank-1) = lo(1) + large*lo(2) + large**2*lo(3).
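A minimal sketch of this re-ordering, assuming lo(:) and hi(:) are this rank's index bounds and myid/nrank are available (the reduction of large over all ranks and the use of MPI_COMM_SPLIT are my assumptions, not from the issue):

```fortran
! sketch: compute a global ordering key per rank and derive the new rank
! as the number of ranks with a smaller key (ties broken by old rank id)
integer :: ierr, newid
integer(8) :: key, large
integer(8), allocatable :: keys(:)
call MPI_ALLREDUCE(product(int(hi(:),8)),large,1,MPI_INTEGER8,MPI_MAX,MPI_COMM_WORLD,ierr)
key = lo(1) + large*lo(2) + large**2*lo(3)
allocate(keys(0:nrank-1))
call MPI_ALLGATHER(key,1,MPI_INTEGER8,keys,1,MPI_INTEGER8,MPI_COMM_WORLD,ierr)
newid = count(keys(:) < key) + count(keys(0:myid-1) == key)
! newid could then be used, e.g., to create a re-ordered communicator:
call MPI_COMM_SPLIT(MPI_COMM_WORLD,0,newid,comm_reordered,ierr)
```

Using an 8-byte integer for the key avoids overflow of large**2*lo(3) for big grids.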
It is much simpler to edit case files when the total number of grid points per block, n(:), is prescribed instead of the lo(:) and hi(:) extents: changes in the latter may need to be propagated to the extents of other blocks. Since we already have the geometry well defined, lo(:) and hi(:) carry redundant information and can actually be replaced by n(:).
For instance, the following part of a geo file:
1 33 1 ! lo(1:3)
32 64 64 ! hi(1:3)
0. .5 0. ! lmin(1:3)
.5 1. 1. ! lmax(1:3)
can be replaced by:
32 32 64 ! n(1:3)
0. .5 0. ! lmin(1:3)
.5 1. 1. ! lmax(1:3)
without loss of information. cc @f-aportela
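Internally, the extents could then be recovered from the prescribed sizes; a trivial sketch (here lo(:) is assumed to be determined beforehand from the block geometry/connectivity):

```fortran
! given the block's starting index lo(:) and the prescribed number of
! grid points n(:), the upper bound follows directly:
hi(:) = lo(:) + n(:) - 1
```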
More testing would be good, to facilitate future developments.
Particularly important is the correctness of the results for different domain decompositions, and the different combinations of directions of FFT synthesis, if FFTs are used.
Except for those with reductions (to be supported by the next standard anyway), most loops could be easily ported.
Hi Costa,
I am trying to use SNaC to simulate a high Re cylinder flow.
But when the dimensionless time reaches a certain value, a huge backflow appears at the outlet.
I notice that you have already developed a subroutine "outflow" in bound.f90.
Could you kindly provide some help on how to use it?
I read the paper "Bozonnet et al., JCP 2021". However, I am not sure how to use the subroutines "outflow" and "outflow_p"; in particular, how should the parameter "alpha" in "outflow_p" be set?
Thank you.
The reference given in the README has a broken DOI link.
If the number of computational divisions is larger than the number of grid points, the simulation must be killed and an error returned.
To do:
- utils/visualize_fields/
- INFO_VISU.md
In order to maintain a small number of checkpoint files, introduce a num_checkpoint_max parameter, which overwrites the saved checkpoints every n time steps. Say n=5; the saving will proceed in time as follows:
1, 2, 3, 4, 5; 6->1, 7->2, 8->3, 9->4, 10->5; with -> meaning that files are overwritten.
One can still use the current symbolic-link approach to have fld.bin pointing to the last saved files.
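The rotation above amounts to a simple modular index; a sketch (isave, islot and the file-name pattern are illustrative names, with num_checkpoint_max being the proposed parameter):

```fortran
! map the running save counter isave = 1,2,3,... onto a rotating slot
! 1..num_checkpoint_max, so that older checkpoints are overwritten
integer :: isave, islot, num_checkpoint_max
character(len=32) :: filename
num_checkpoint_max = 5
do isave = 1,10
  islot = mod(isave-1,num_checkpoint_max) + 1
  write(filename,'(A,I3.3,A)') 'fld_',islot,'.bin' ! slot cycles 1,2,3,4,5,1,2,...
end do
```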
Thanks @arashalizadbanaei for the discussion!
Once fpm reaches some maturity with regards to supporting MPI, consider using it as the main build system.
Hello @p-costa, just for your info. I tried to compile the master branch of SNaC and I couldn't, since MPI_LONG in initmpi.f90 was not recognized by my GNU compiler. I just replaced it with MPI_INTEGER and it works fine. Anyway, I thought it was worth mentioning.
@nazmas brought to my attention that the explicit part of the temporal integration of the diffusion term, when implicit diffusion is used, is inconsistent with what is reported in the CaNS-GPU manuscript.
Although it may still yield physically sound results, the velocity will not be second-order accurate in time.
This is a known issue of the MPI standard, which has been fixed in the recent 4.0 standard. When supported by the implementations, the send and recv count arrays should be of kind MPI_COUNT_KIND, and the corresponding send and recv displacements of kind MPI_ADDRESS_KIND.
Modify transposing subroutines using ALLTOALLW
to accommodate uneven data distribution.
Steps: lo_p(:), n_p(:), lo_s(:), n_s(:).

The following lines under initmpi.f90 may result in an integer overflow for very large systems, especially ntot_sum (the overflow has no major consequences for the calculation, since ntot_sum is just used to log information about the load distribution):
ntot = product(hi(:)-lo(:)+1)
call MPI_ALLREDUCE(ntot,ntot_min,1,MPI_INTEGER,MPI_MIN,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_max,1,MPI_INTEGER,MPI_MAX,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_sum,1,MPI_INTEGER,MPI_SUM,MPI_COMM_WORLD)
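A possible fix would be to accumulate only the global total in an 8-byte integer; a sketch, keeping the mpi_f08-style calls of the original snippet (the kind choice is my assumption):

```fortran
! keep per-rank counts in default integers, but reduce the global
! total into an 8-byte integer to avoid overflow for very large systems
integer :: ntot, ntot_min, ntot_max
integer(8) :: ntot_sum
ntot = product(hi(:)-lo(:)+1)
call MPI_ALLREDUCE(ntot,ntot_min,1,MPI_INTEGER,MPI_MIN,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_max,1,MPI_INTEGER,MPI_MAX,MPI_COMM_WORLD)
call MPI_ALLREDUCE(int(ntot,8),ntot_sum,1,MPI_INTEGER8,MPI_SUM,MPI_COMM_WORLD)
```

Note that the per-rank product itself could in principle overflow for huge local blocks, but the sum over all ranks is by far the most exposed quantity.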
Right now periodicity won't be detected for certain domains where it may be prescribed, such as the flow around a periodic array of obstacles. Periodicity will not be detected due to a lack of connectivities of inner boundaries.
This can be relaxed so that periodicity is detected not only for a cyclic succession of connectivity BCs, but also for cyclic BCs which may be interrupted by a 'hole'.
The approach used to solve N 2D systems of equations when FFTs are employed (modifying the diagonal using hypre's AddToBoxValues in a loop) does not seem to scale well with many cores.
It is probably better, instead, to initialize N 2D matrices at the beginning of the calculation.
Right now I/O is handled using pure MPI I/O onto raw binary files, which are visualized using XDMF metadata files. XDMF also supports HDF5 files, so the differences in visualization post-processing workflow would be minor.
For later reference: https://adios2.readthedocs.io
So that one does not have to recompile, one can read a file hypre.in with the tolerance and solver type.
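A minimal sketch of such a runtime input read via a Fortran namelist (the file name hypre.in is from the issue; the group and variable names here are hypothetical):

```fortran
! sketch: read solver parameters at runtime instead of compile time;
! 'solver_params', 'tol' and 'solver_type' are hypothetical names
integer :: iunit, ierr
real(8) :: tol = 1.d-8                    ! default tolerance
character(len=16) :: solver_type = 'pfmg' ! default solver
namelist /solver_params/ tol, solver_type
open(newunit=iunit,file='hypre.in',status='old',action='read',iostat=ierr)
if(ierr == 0) then
  read(iunit,nml=solver_params) ! keeps the defaults for any absent entry
  close(iunit)
end if
```

A namelist keeps the file human-editable and tolerant to missing entries, so defaults still apply when hypre.in is absent.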
The determination of neighboring tasks can be simplified by using the following functions I drafted:
function get_id(coords,dims,periods) result(id)
  use mpi, only: MPI_PROC_NULL
  implicit none
  integer :: id
  integer, intent(in), dimension(3) :: coords,dims
  logical, intent(in), dimension(3), optional :: periods
  integer, dimension(3) :: coords_aux,shift
  coords_aux(:) = coords(:)
  if(present(periods)) then
    shift(:) = 0
    where(periods(:))
      where(coords_aux(:)>dims(:)-1) shift(:) = (0        -coords_aux(:))/dims(:)
      where(coords_aux(:)<0        ) shift(:) = (dims(:)-1-coords_aux(:))/dims(:)
      coords_aux(:) = coords_aux(:) + shift(:)*dims(:)
    end where
  end if
  if(all(coords_aux(:)<=dims(:)-1).and.all(coords_aux(:)>=0)) then
    ! linear id consistent with get_coords below
    id = coords_aux(1)+coords_aux(2)*dims(1)+coords_aux(3)*dims(1)*dims(2)
  else
    id = MPI_PROC_NULL
  end if
end function get_id
function get_coords(id,dims) result(coords)
  implicit none
  integer, dimension(3) :: coords
  integer, intent(in) :: id
  integer, intent(in), dimension(3) :: dims
  coords(:) = [mod(id,dims(1)),mod(id/dims(1),dims(2)),mod(id/(dims(1)*dims(2)),dims(3))]
end function get_coords
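As a usage sketch, the six face neighbors of a task could then be found as follows (myid, dims and periods are assumed to be available):

```fortran
! face neighbors of this task along each direction;
! get_id returns MPI_PROC_NULL where no neighbor exists
integer :: dir, nb(0:1,3)
integer, dimension(3) :: coords, coords_nb
coords(:) = get_coords(myid,dims)
do dir = 1,3
  coords_nb(:) = coords(:); coords_nb(dir) = coords(dir)-1
  nb(0,dir) = get_id(coords_nb,dims,periods) ! lower neighbor
  coords_nb(:) = coords(:); coords_nb(dir) = coords(dir)+1
  nb(1,dir) = get_id(coords_nb,dims,periods) ! upper neighbor
end do
```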