Comments (5)
I'd suggest the following:
- work on the push scheme only
- define a separate communicator which communicates only the velocity field
- use that instead of the full communicator at the end of integrate_push() and check the improvement in performance and scalability.
- If the improvement gained from that is relevant, get_interpolated_density() needs to be fixed or removed (it relies on the now missing ghost communication of the full PDFs). The easiest fix is to trigger a full ghost communication of the PDFs when someone queries an interpolated density. It's slow, but I'm not aware of a use case for that getter anyway.
For a 48x48x48 LB grid, removing the pdfs and forces from the communicator improves runtime significantly:
- 25% faster on CPU with 8 cores
- 40% faster on GPU with 1 core or 8 cores
Similar observations hold with particle coupling enabled, using the defaults: a 30x30x30 box with 125 particles per core.
I introduced a std::bitset where individual bits are addressed using an enum value to keep track of which fields have outdated halos. Python setter functions automatically perform a full LB ghost communication after changing fields, so that getter functions always access correct data. In C++ unit tests, however, one has to manually call the LB ghost communication after any call to a setter function or to the integration sweep.
Assertions were introduced to make sure getter functions never access outdated halo cells if the consider_ghosts flag is set to true. While the full testsuite passes with these assertions, more testing is needed, in particular to figure out whether the LB observables need to manually call the LB ghost update before collecting data. I'll also run more thorough benchmarks.
Concerning observables:
- if they only use information from the lattice sites (total momentum, etc.), probably not, as each MPI rank can read from its local LB cells and no information from ghost layers is needed
- velocity profile at arbitrary positions: probably not; it should use the same interpolated velocity getter as the particle coupling, which relies on the already ghost-communicated, pre-calculated velocity field
- density and momentum density profiles at arbitrary positions: probably yes, because the density is calculated directly from the PDFs, which we no longer ghost-communicate.
Great that it worked out. This probably means we should put more effort into reducing the communication volume. The next steps would be:
In the m_pdf_streaming_communicator, replace the PackInfo which packs the entire field with a generated pack info which only packs the data actually used on the other side by the streaming kernel (lbmpy_walberla.packinfo.generate_lb_pack_info does this, I think).
Notes:
- according to the docstring, also passing the collision kernel to the function will add the other fields the collision kernel needs to the pack info. This is the last_applied_force in our case. It should then be possible to remove the separate PackInfo for the last_applied_force from the m_pdf_streaming_communicator as well.
- I think we need the streaming pattern "pull", in spite of this affecting integrate_push_scheme. Not sure.
- If using the generated pack info improves performance, the next step would be to have an AVX-vectorized version as well.
Here are the benchmark results:
We are now outperforming the ESPResSo 4.2 LB implementation up to and including 16 cores on the CPU. There is no AVX implementation of the PackInfo, since pystencils generates the exact same code with and without AVX. It's also unclear to me whether AVX would be beneficial here, since the data in the buffer doesn't have the same alignment as in the pdfs. Maybe for vector fields it would make sense, but that buffer is one order of magnitude smaller than the pdfs buffer, and we would need to increase the buffer size by 33% to get the proper alignment, not to mention the changes to the field itself.
The linked PR raised new questions about MPI buffer data alignment and about UBB interactions with the streaming step communication. I spent two days on these issues, but could only come up with temporary workarounds. It is unclear whether it's worth pursuing this investigation further, as we already agreed on switching to the pull scheme in the near future.