Comments (5)

RudolfWeeber commented on September 23, 2024

@jngrad

I'd suggest the following:

  • work on the push scheme only
  • define a separate communicator which communicates only the velocity field
  • use that instead of the full communicator at the end of integrate_push() and check the improvement in performance and scalability.
  • If the improvement gained from that is relevant, get_interpolated_density() needs to be fixed or removed (it relies on the now missing ghost communication of the full pdf field). The easiest fix is to trigger a full ghost communication of the pdfs when someone queries an interpolated density. It's slow, but I'm not aware of a use case for that getter anyway.
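
A minimal sketch of the proposed split, assuming a toy single-rank lattice: the class and method names (LatticeSketch, communicate_velocity_only, communicate_full) are hypothetical, not ESPResSo or waLBerla API, and each "communication" only increments a counter where a real halo exchange would happen. The point is that the full pdf ghost communication leaves the time step and is only paid when the density getter is called.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an LB lattice: it only tracks whether the pdf
// halo is valid and counts how many halo exchanges of each kind were done.
class LatticeSketch {
public:
  // Push-scheme step: collide + stream (omitted), then refresh only the
  // velocity-field halo, which is what the particle coupling needs.
  void integrate_push() {
    // ... collision and streaming kernels would run here ...
    pdf_halo_valid_ = false;     // pdf halo is no longer exchanged every step
    communicate_velocity_only(); // cheap: 3 values per ghost cell
  }

  // The density is computed from the pdfs, so refresh their halo lazily.
  double get_interpolated_density() {
    if (!pdf_halo_valid_) {
      communicate_full(); // slow, but only paid when someone asks
    }
    assert(pdf_halo_valid_);
    return 1.0; // placeholder for the actual interpolation
  }

  std::size_t velocity_exchanges() const { return velocity_exchanges_; }
  std::size_t full_exchanges() const { return full_exchanges_; }

private:
  void communicate_velocity_only() { ++velocity_exchanges_; }
  void communicate_full() {
    ++full_exchanges_;
    pdf_halo_valid_ = true;
  }

  bool pdf_halo_valid_ = true;
  std::size_t velocity_exchanges_ = 0;
  std::size_t full_exchanges_ = 0;
};
```

After 1000 push steps followed by two density queries, this sketch performs 1000 velocity-only exchanges and a single full exchange, because the second query finds the pdf halo already refreshed.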

from espresso.

jngrad commented on September 23, 2024

For a 48x48x48 LB grid, removing the pdfs and forces from the communicator improves runtime significantly:

  • 25% faster on CPU with 8 cores
  • 40% faster on GPU with 1 core or 8 cores

Similar observations with particle coupling enabled, using the defaults: a 30x30x30 box with 125 particles per core.
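
A back-of-the-envelope estimate of how much halo data the change removes. This is a sketch under stated assumptions, not read from the ESPResSo source: D3Q19 pdfs, 3-component force and velocity fields, all stored as doubles, and a one-cell ghost layer (faces, edges and corners) around a cubic block.

```cpp
#include <cassert>
#include <cstddef>

// Assumed field layout (hypothetical constants, not ESPResSo identifiers).
constexpr std::size_t kPdfComponents = 19;     // D3Q19 populations
constexpr std::size_t kForceComponents = 3;    // last applied force
constexpr std::size_t kVelocityComponents = 3; // velocity field

// Number of ghost cells around a cubic block with n cells per side.
std::size_t ghost_cells(std::size_t n) {
  assert(n > 0);
  const std::size_t outer = n + 2; // block plus one ghost layer per side
  return outer * outer * outer - n * n * n;
}

// Bytes per halo exchange if every ghost cell carries `components` doubles.
std::size_t halo_bytes(std::size_t n, std::size_t components) {
  return ghost_cells(n) * components * sizeof(double);
}
```

Assuming the 48x48x48 grid is split over 8 cores as 2x2x2 blocks of 24^3 cells, the full exchange sends 750400 bytes per rank and the velocity-only exchange 90048 bytes, roughly 8x less data. The measured 25-40% speedup is smaller than that ratio, presumably because the halo exchange is only one part of the time step.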

I introduced a std::bitset, with individual bits addressed by an enum value, to keep track of which fields have outdated halos. Python setter functions automatically perform a full LB ghost communication after changing fields, so that getter functions always access correct data. In C++ unit tests, however, one has to manually call the LB ghost communication after any call to a setter function or to the integration sweep.

Assertions were introduced to make sure getter functions never access outdated halo cells when the consider_ghosts flag is set to true. While the full testsuite passes with these assertions, more testing is needed, in particular to figure out whether the LB observables need to manually call the LB ghost update before collecting data. I'll also run more thorough benchmarks.
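
A sketch of the bookkeeping described above, assuming a hypothetical Field enum and LBHaloState class (the real names and the set of tracked fields in ESPResSo may differ). One bit per field marks its halo as outdated: the integration sweep and C++ setters set bits, a full ghost communication clears them, and the getter asserts on the bit whenever consider_ghosts is true.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical field identifiers; `count` gives the bitset size.
enum class Field : std::size_t { pdf = 0, force, velocity, count };

class LBHaloState {
public:
  // Integration sweep: all fields change, but only the cheap
  // velocity-field ghost communication runs afterwards.
  void integrate() {
    outdated_.set();
    outdated_.reset(bit(Field::velocity));
  }

  // C++-side setter: the pdf halo stays stale until ghost_communication()
  // is called (the Python setters would trigger it automatically).
  void set_population_at_node() { outdated_.set(bit(Field::pdf)); }

  // Full LB ghost communication: every halo is up to date again.
  void ghost_communication() { outdated_.reset(); }

  bool halo_is_fresh(Field f) const { return !outdated_.test(bit(f)); }

  // Getter guard: ghost cells may only be read if their halo is fresh.
  double get_density(bool consider_ghosts) const {
    assert(!consider_ghosts || halo_is_fresh(Field::pdf));
    return 1.0; // placeholder for the actual density computation
  }

private:
  static constexpr std::size_t bit(Field f) {
    return static_cast<std::size_t>(f);
  }
  std::bitset<static_cast<std::size_t>(Field::count)> outdated_;
};
```

In a unit test written against such a class, reading with consider_ghosts = true right after integrate() or a setter would trip the assertion in a debug build; calling ghost_communication() first avoids that.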

RudolfWeeber commented on September 23, 2024

Concerning observables:

  • if they only use information from the lattice sites (total momentum, etc.): probably not, as each MPI rank can read from its local LB cells and no information from the ghost layers is needed
  • velocity profile at arbitrary positions: probably not; it should use the same interpolated velocity getter as the particle coupling, which relies on the already ghost-communicated, pre-calculated velocity field
  • density and momentum density profiles at arbitrary positions: probably yes, because the density is calculated directly from the PDFs, which we no longer ghost-communicate.
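
The distinction can be illustrated with the interpolation stencil. This sketch assumes LB nodes sit at cell centres (node i at position i + 0.5 in units of the lattice constant) and linear interpolation along each axis; the function names are hypothetical. Near a block boundary the stencil reaches into the ghost layer, so a quantity derived from stale ghost pdfs could be wrong there, while a quantity read from the per-step communicated velocity field would not be.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// The two node indices used by linear interpolation along one axis,
// assuming node i sits at position i + 0.5.
std::array<int, 2> stencil_1d(double x) {
  const int lo = static_cast<int>(std::floor(x - 0.5));
  return {lo, lo + 1};
}

// True if trilinear interpolation at `pos` reads any ghost node of a local
// block with n cells per side (interior node indices are 0..n-1).
bool touches_ghost_layer(const std::array<double, 3>& pos, int n) {
  assert(n > 0);
  for (double x : pos) {
    assert(x >= 0.0 && x < n); // position must lie in the local domain
    for (int i : stencil_1d(x)) {
      if (i < 0 || i >= n) return true;
    }
  }
  return false;
}
```

In a 10^3 block, for example, a position in the middle of the block only reads interior nodes, while a position 0.2 lattice units from a face reads the ghost layer.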

RudolfWeeber commented on September 23, 2024

Great that it worked out. This probably means we should put more effort into reducing the communication volume. The next steps would be:

In the m_pdf_streaming_communicator, replace the PackInfo which packs the entire field with a generated pack info which only packs the data actually used on the other side by the streaming kernel (lbmpy_walberla.packinfo.generate_lb_pack_info does this, I think).

Notes:

  • according to the docstring, also passing the collision kernel to the function will add the other fields the collision kernel needs to the pack info. This is the last_applied_force in our case. It should then be possible to remove the separate PackInfo for the last_applied_force from the m_pdf_streaming_communicator as well.
  • I think we need the streaming pattern "pull", even though this affects integrate_push_scheme. Not sure.
  • If using the generated pack info improves performance, the next step would be to have an AVX-vectorized version as well.
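
The saving from a generated pack info can be estimated by counting, for each neighbour direction, the D3Q19 populations that actually cross into that neighbour during streaming: a population with velocity c is needed by the neighbour in direction d if c agrees with d on every axis where d is nonzero. This is a sketch of the idea with hypothetical helper names, not the code lbmpy generates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 3>;

// D3Q19 lattice velocities: rest, 6 face and 12 edge directions (no corners).
std::array<Vec, 19> d3q19_velocities() {
  std::array<Vec, 19> c{};
  std::size_t k = 0;
  c[k++] = {0, 0, 0};
  for (int x = -1; x <= 1; ++x)
    for (int y = -1; y <= 1; ++y)
      for (int z = -1; z <= 1; ++z) {
        const int nonzero = (x != 0) + (y != 0) + (z != 0);
        if (nonzero == 1 || nonzero == 2) c[k++] = {x, y, z};
      }
  assert(k == 19);
  return c;
}

// Populations that stream from a boundary cell into the neighbour block in
// direction d: c must match d on every axis where d is nonzero.
std::size_t populations_to_send(const Vec& d) {
  std::size_t count = 0;
  for (const Vec& c : d3q19_velocities()) {
    bool crosses = true;
    for (std::size_t i = 0; i < 3; ++i)
      if (d[i] != 0 && c[i] != d[i]) crosses = false;
    if (crosses) ++count;
  }
  return count;
}
```

Under this counting, a face neighbour needs 5 of the 19 populations, an edge neighbour 1 and a corner neighbour none, so for the face-dominated halo of a large block only about a quarter of the pdf data would have to be sent.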

jngrad commented on September 23, 2024

Here are the benchmark results:

[Figure: benchmarks]

We are now outperforming the ESPResSo 4.2 LB implementation up to and including 16 cores on the CPU. There is no AVX implementation of the PackInfo, since pystencils generates exactly the same code with and without AVX. It's also unclear to me whether AVX would be beneficial here, since the data in the buffer doesn't have the same alignment as in the pdfs. Maybe it would make sense for vector fields, but that buffer is one order of magnitude smaller than the pdfs buffer, and we would need to increase the buffer size by 33% to get the proper alignment, not to mention the changes to the field itself.
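
The 33% figure is consistent with padding each 3-component double vector to the 4 doubles of a 256-bit AVX register, i.e. 32 instead of 24 bytes per cell. A sketch of that arithmetic, with hypothetical struct and function names:

```cpp
#include <cassert>
#include <cstddef>

// A 3-component vector of doubles as packed in the MPI buffer.
struct Vec3 {
  double x, y, z;
};

// The same vector padded to one 256-bit AVX register (4 doubles), so that
// every vector starts on a 32-byte boundary.
struct alignas(32) Vec3Padded {
  double x, y, z, pad;
};

static_assert(sizeof(Vec3) == 3 * sizeof(double), "no padding expected");
static_assert(sizeof(Vec3Padded) == 4 * sizeof(double), "one AVX register");

std::size_t packed_bytes(std::size_t n_vectors) {
  return n_vectors * sizeof(Vec3);
}

std::size_t padded_bytes(std::size_t n_vectors) {
  const std::size_t bytes = n_vectors * sizeof(Vec3Padded);
  assert(bytes % 32 == 0); // buffer size is a whole number of AVX registers
  return bytes;
}
```

For 1000 vectors this gives 24000 versus 32000 bytes, i.e. one third more buffer memory and communication volume just for alignment.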

The linked PR raised new questions about MPI buffer data alignment and about UBB interactions with the streaming step communication. I spent two days on these issues, but could only come up with temporary workarounds. It is unclear whether it's worth pursuing this investigation further, as we already agreed on switching to the pull scheme in the near future.
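
For readers unfamiliar with the two streaming schemes: push and pull differ only in whether a cell writes its post-collision populations to its neighbours (push) or reads them from its neighbours (pull). A toy periodic D1Q3 sketch, not the lbmpy or waLBerla kernels, showing that both give the same streamed field:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// D1Q3: populations with velocities -1, 0, +1 on a periodic 1D lattice.
constexpr std::array<int, 3> kC = {-1, 0, 1};
using Pdfs = std::vector<std::array<double, 3>>;

// Periodic index wrap-around.
std::size_t wrap(long i, std::size_t n) {
  assert(n > 0);
  const long m = static_cast<long>(n);
  return static_cast<std::size_t>(((i % m) + m) % m);
}

// Push: each cell writes its populations into the destination cells.
Pdfs stream_push(const Pdfs& src) {
  const std::size_t n = src.size();
  Pdfs dst(n);
  for (std::size_t x = 0; x < n; ++x)
    for (std::size_t q = 0; q < 3; ++q)
      dst[wrap(static_cast<long>(x) + kC[q], n)][q] = src[x][q];
  return dst;
}

// Pull: each cell reads its populations from the source cells.
Pdfs stream_pull(const Pdfs& src) {
  const std::size_t n = src.size();
  Pdfs dst(n);
  for (std::size_t x = 0; x < n; ++x)
    for (std::size_t q = 0; q < 3; ++q)
      dst[x][q] = src[wrap(static_cast<long>(x) - kC[q], n)][q];
  return dst;
}
```

On a decomposed lattice the two schemes read from or write to the ghost layer at different points of the time step, which changes where the halo exchange has to sit relative to the streaming kernel.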
