p-costa / snac
A multi-block solver for massively parallel direct numerical simulations (DNS) of fluid flows
License: MIT License
Currently, the loops are not collapsed (which I believe should be fine for shared-memory runs), static scheduling is not explicitly imposed, and some loops in solver.f90 lack OpenMP directives (although the most demanding part, the iterative solvers in hypre, has an OpenMP implementation).
This has been fine for CPU-only runs using only MPI but, in light of future porting efforts, it would be good to make sure the OpenMP implementation performs well.
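As a sketch of the kind of change meant here (the loop bounds, array names, and stencil are illustrative, not taken from the code), a triple loop could be collapsed and given an explicit static schedule as follows:

```fortran
! hypothetical loop; lo(:), hi(:), dudt and lap_u are illustrative names
!$omp parallel do collapse(3) schedule(static) default(shared) private(i,j,k)
do k=lo(3),hi(3)
  do j=lo(2),hi(2)
    do i=lo(1),hi(1)
      dudt(i,j,k) = dudt(i,j,k) + visc*lap_u(i,j,k)
    end do
  end do
end do
!$omp end parallel do
```

Collapsing merges the three loops into one iteration space, which helps when the outer extent is small compared to the thread count; schedule(static) makes the work distribution explicit and reproducible.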
I would propose two improvements to the current visualization strategy, specifically in the generation of the .xmf file:
This can be quite important when implicit diffusion is used in conjunction with a non-uniform inflow (e.g. a Poiseuille or a square duct inflow).
Edit: the non-uniform B.C. has to be reflected in:
Hi,
Does SNaC support hanging grids at the interface between blocks, i.e., non-conforming grids?
Thanks
As already done in CaNS-GPU.
if(myid == 0) write(stderr,*) 'ERROR: implicit diffusion not yet supported with "_FFT_USE_SLICED_PENCILS".'
The first instance of this error message needs fixing.
The following too:
#elif _FFT_Z
dyf(lo(2)) == dzf(lo(3))
As of now, the ranks in the multi-block implementation are ordered by increasing block ID. For instance, for 4 blocks with a 2x2 MPI domain decomposition per block, we get:
===============
|10 11| |14 15|
|08 09| |12 13|
---------------
|02 03| |06 07|
|00 01| |04 05|
===============
It may be advantageous to have the option to re-order the ranks with increasing ijk indexes, as done by default in MPI_CART_CREATE:
===============
|12 13| |14 15|
|08 09| |10 11|
---------------
|04 05| |06 07|
|00 01| |02 03|
===============
This can be achieved by determining large = product(hi(:)) among all ranks, where hi(:) is the upper-bounds array, and ordering the tasks by the key arr(0:nrank-1) = lo(1) + large*lo(2) + large**2*lo(3).
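A minimal sketch of this re-ordering, assuming lo(:) and hi(:) are this rank's index bounds and myid/nrank are available (the reduction of large over all ranks and the use of MPI_COMM_SPLIT are my assumptions, not from the issue):

```fortran
! sketch: compute a global ordering key per rank and derive the new rank
! as the number of ranks with a smaller key (ties broken by old rank id)
integer :: ierr, newid
integer(8) :: key, large
integer(8), allocatable :: keys(:)
call MPI_ALLREDUCE(product(int(hi(:),8)),large,1,MPI_INTEGER8,MPI_MAX,MPI_COMM_WORLD,ierr)
key = lo(1) + large*lo(2) + large**2*lo(3)
allocate(keys(0:nrank-1))
call MPI_ALLGATHER(key,1,MPI_INTEGER8,keys,1,MPI_INTEGER8,MPI_COMM_WORLD,ierr)
newid = count(keys(:) < key) + count(keys(0:myid-1) == key)
! newid could then be used, e.g., to create a re-ordered communicator:
call MPI_COMM_SPLIT(MPI_COMM_WORLD,0,newid,comm_reordered,ierr)
```

Using an 8-byte integer for the key avoids overflow of large**2*lo(3) for big grids.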
It is much simpler to edit case files when the total number of grid points per block, n(:), is prescribed instead of the lo(:) and hi(:) extents: changes in the latter may need to be propagated to the extents of other blocks. Since we already have the geometry well defined, lo(:) and hi(:) carry redundant information and can actually be replaced by n(:).
For instance, the following part of a geo file:
1 33 1 ! lo(1:3)
32 64 64 ! hi(1:3)
0. .5 0. ! lmin(1:3)
.5 1. 1. ! lmax(1:3)
can be replaced by:
32 32 64 ! n(1:3)
0. .5 0. ! lmin(1:3)
.5 1. 1. ! lmax(1:3)
without loss of information. cc @f-aportela
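Internally, the extents could then be recovered from the prescribed sizes; a trivial sketch (here lo(:) is assumed to be determined beforehand from the block geometry/connectivity):

```fortran
! given the block's starting index lo(:) and the prescribed number of
! grid points n(:), the upper bound follows directly:
hi(:) = lo(:) + n(:) - 1
```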
More testing would be good, to facilitate future developments.
Particularly important is the correctness of the results for different domain decompositions, and the different combinations of directions of FFT synthesis, if FFTs are used.
Except for those with reductions (to be supported by the next standard anyway), most loops could be easily ported.
Hi Costa,
I am trying to use SNaC to simulate a high Re cylinder flow.
But when the dimensionless time reaches a certain value, a huge backflow appears at the outlet.
I notice that you have already developed a subroutine "outflow" in bound.f90.
Could you kindly provide some help on how to use it?
I read the paper "Bozonnet et al., JCP 2021". However, I am not sure how to use the subroutines "outflow" and "outflow_p"; in particular, how should the parameter "alpha" in "outflow_p" be set?
Thank you.
The reference given in the README has a broken DOI link.
If the number of computational divisions is larger than the number of grid points, the simulation must be killed and an error returned.
To do:
- utils/visualize_fields/
- INFO_VISU.md
In order to maintain a small number of checkpoint files, introduce a num_checkpoint_max parameter, which overwrites the saved checkpoints every n time steps. Say n=5; the saving will proceed in time as follows:
1, 2, 3, 4, 5; 6->1, 7->2, 8->3, 9->4, 10->5; with -> meaning that files are overwritten.
One can still use the current symbolic-link approach to have fld.bin pointing to the last saved files.
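The rotation above amounts to a simple modular index; a sketch (isave, islot and the file-name pattern are illustrative names, with num_checkpoint_max being the proposed parameter):

```fortran
! map the running save counter isave = 1,2,3,... onto a rotating slot
! 1..num_checkpoint_max, so that older checkpoints are overwritten
integer :: isave, islot, num_checkpoint_max
character(len=32) :: filename
num_checkpoint_max = 5
do isave = 1,10
  islot = mod(isave-1,num_checkpoint_max) + 1
  write(filename,'(A,I3.3,A)') 'fld_',islot,'.bin' ! slot cycles 1,2,3,4,5,1,2,...
end do
```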
Thanks @arashalizadbanaei for the discussion!
Once fpm reaches some maturity with regards to supporting MPI, consider using it as the main build system.
Hello @p-costa, just for your info. I tried to compile the master branch of SNaC and I couldn't, since MPI_LONG in initmpi.f90 was not recognized by my GNU compiler. I just replaced it with MPI_INTEGER and it works fine. Anyway, I thought it was worth mentioning.
@nazmas brought to my attention that the explicit part of the temporal integration of the diffusion term, when implicit diffusion is used, is inconsistent with what is reported in the CaNS-GPU manuscript.
Although it may still yield physically sound results, the velocity will not be second-order accurate in time.
This is a known issue of the MPI standard, which has been fixed in the recent 4.0 standard. When supported by the implementations, the send and recv count arrays should be of kind MPI_COUNT_KIND, and the corresponding send and recv displacements of kind MPI_ADDRESS_KIND.
Modify transposing subroutines using ALLTOALLW
to accommodate uneven data distribution.
Steps: lo_p(:), n_p(:), lo_s(:), n_s(:).

The following lines under initmpi.f90 may result in an integer overflow for very large systems, especially ntot_sum (the overflow has no major consequences for the calculation, since ntot_sum is just used to log information about the load distribution):
ntot = product(hi(:)-lo(:)+1)
call MPI_ALLREDUCE(ntot,ntot_min,1,MPI_INTEGER,MPI_MIN,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_max,1,MPI_INTEGER,MPI_MAX,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_sum,1,MPI_INTEGER,MPI_SUM,MPI_COMM_WORLD)
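A possible fix would be to accumulate only the global total in an 8-byte integer; a sketch, keeping the mpi_f08-style calls of the original snippet (the kind choice is my assumption):

```fortran
! keep per-rank counts in default integers, but reduce the global
! total into an 8-byte integer to avoid overflow for very large systems
integer :: ntot, ntot_min, ntot_max
integer(8) :: ntot_sum
ntot = product(hi(:)-lo(:)+1)
call MPI_ALLREDUCE(ntot,ntot_min,1,MPI_INTEGER,MPI_MIN,MPI_COMM_WORLD)
call MPI_ALLREDUCE(ntot,ntot_max,1,MPI_INTEGER,MPI_MAX,MPI_COMM_WORLD)
call MPI_ALLREDUCE(int(ntot,8),ntot_sum,1,MPI_INTEGER8,MPI_SUM,MPI_COMM_WORLD)
```

Note that the per-rank product itself could in principle overflow for huge local blocks, but the sum over all ranks is by far the most exposed quantity.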
Right now periodicity won't be detected for certain domains where it may be prescribed, such as the flow around a periodic array of obstacles. Periodicity will not be detected due to a lack of connectivities of inner boundaries.
This can be relaxed so that periodicity is detected not only for a cyclic succession of connectivity BCs, but also for cyclic BCs which may be interrupted by a 'hole'.
The approach used to solve N 2D systems of equations when FFTs are employed (modifying the diagonal using hypre's AddToBoxValues in a loop) does not seem to scale well with many cores.
It is probably better, instead, to initialize N 2D matrices at the beginning of the calculation.
Right now I/O is handled using pure MPI I/O onto raw binary files, which are visualized using XDMF metadata files. XDMF also supports HDF5 files, so the differences in visualization post-processing workflow would be minor.
For later reference: https://adios2.readthedocs.io
So that one does not have to recompile, one can read a file hypre.in with the tolerance and solver type.
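A minimal sketch of such a runtime input read via a Fortran namelist (the file name hypre.in is from the issue; the group and variable names here are hypothetical):

```fortran
! sketch: read solver parameters at runtime instead of compile time;
! 'solver_params', 'tol' and 'solver_type' are hypothetical names
integer :: iunit, ierr
real(8) :: tol = 1.d-8                    ! default tolerance
character(len=16) :: solver_type = 'pfmg' ! default solver
namelist /solver_params/ tol, solver_type
open(newunit=iunit,file='hypre.in',status='old',action='read',iostat=ierr)
if(ierr == 0) then
  read(iunit,nml=solver_params) ! keeps the defaults for any absent entry
  close(iunit)
end if
```

A namelist keeps the file human-editable and tolerant to missing entries, so defaults still apply when hypre.in is absent.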
The determination of neighboring tasks can be simplified by using the following functions I drafted:
function get_id(coords,dims,periods) result(id)
  use mpi, only: MPI_PROC_NULL
  implicit none
  integer :: id
  integer, intent(in), dimension(3) :: coords,dims
  logical, intent(in), dimension(3), optional :: periods
  integer, dimension(3) :: coords_aux,shift
  coords_aux(:) = coords(:)
  if(present(periods)) then
    shift(:) = 0
    where(periods(:))
      where(coords_aux(:)>dims(:)-1) shift(:) = (0        -coords_aux(:))/dims(:)
      where(coords_aux(:)<0        ) shift(:) = (dims(:)-1-coords_aux(:))/dims(:)
      coords_aux(:) = coords_aux(:) + shift(:)*dims(:)
    end where
  end if
  if(all(coords_aux(:)<=dims(:)-1).and.all(coords_aux(:)>=0)) then
    ! linear id consistent with get_coords below
    id = coords_aux(1)+coords_aux(2)*dims(1)+coords_aux(3)*dims(1)*dims(2)
  else
    id = MPI_PROC_NULL
  end if
end function get_id
function get_coords(id,dims) result(coords)
  implicit none
  integer, dimension(3) :: coords
  integer, intent(in) :: id
  integer, intent(in), dimension(3) :: dims
  coords(:) = [mod(id,dims(1)),mod(id/dims(1),dims(2)),mod(id/(dims(1)*dims(2)),dims(3))]
end function get_coords
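As a usage sketch, the six face neighbors of a task could then be found as follows (myid, dims and periods are assumed to be available):

```fortran
! face neighbors of this task along each direction;
! get_id returns MPI_PROC_NULL where no neighbor exists
integer :: dir, nb(0:1,3)
integer, dimension(3) :: coords, coords_nb
coords(:) = get_coords(myid,dims)
do dir = 1,3
  coords_nb(:) = coords(:); coords_nb(dir) = coords(dir)-1
  nb(0,dir) = get_id(coords_nb,dims,periods) ! lower neighbor
  coords_nb(:) = coords(:); coords_nb(dir) = coords(dir)+1
  nb(1,dir) = get_id(coords_nb,dims,periods) ! upper neighbor
end do
```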