mf_data_locality's Introduction

Data locality of conjugate gradient solvers with matrix-free finite element operators

This project provides several flavors of conjugate gradient solvers to efficiently implement the CEED benchmark problem BP4 (http://ceed.exascaleproject.org/bps) with the matrix-free evaluation routines provided by the deal.II finite element library (https://github.com/dealii/dealii).

The project inherits many files from the repository https://github.com/kronbichler/ceed_benchmarks_dealii but specializes in different conjugate gradient solvers, with or without preconditioners.

mf_data_locality's People

Contributors

kronbichler peterrum shkodm

Forkers

peterrum shkodm

mf_data_locality's Issues

Collection of TODOs

  • add GitHub action for checking formatting
  • enable Likwid
  • add ctest (manufactured solution) and integrate into a GitHub action
  • allow turning off the preconditioner (PreconditionIdentity)
  • add s-step CG
  • add pipelined CG

Strange performance of s-step method

While recording benchmarks for the s-step method, I observed some strange behavior: relatively low throughput for small to intermediate sizes. On the AMD Epyc system, I get up to 4.6 GDoFs/s when running the merged operations from cache, but only around 1.5 GDoFs/s for s-step with 4 steps (I have already multiplied the numbers by 4). The problematic part is seen, for example, at a size of around a million DoFs:

 5  7       1536      610203 0.0004661  1.309e+09  324 0.0001763 0.0001968  4.29e-05 4.829e-05 5.086e-06 5.058e-05 2.524e-06
 5  7       3072     1205523 0.0008053  1.497e+09  312 0.0003057 0.0003585 8.127e-05 7.769e-05 1.039e-05 8.959e-05 3.427e-06

I can't really explain the gap between the timed parts, i.e. the mat-vec (0.3585 ms) plus the other operations, which give a total of 0.62 ms/iteration, and the time for a CG iteration of 0.8 ms. @peterrum have you seen something like that before? As you can see from the iteration counts, I have already increased the maximum number of iterations to check whether some very expensive initialization is responsible. That did bring the timings down a bit: for 100 iterations I get 1.16 ms/it, so even more than the 0.8 ms and almost a factor of 2 loss in performance compared to the part that we have inside timers:

 5  7       3072     1205523  0.001158  1.041e+09  100 0.0003099 0.0003517 7.086e-05  7.33e-05 1.036e-05 8.898e-05 3.338e-06

But that does not explain it, so there seems to be something very weird going on.
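
To make the arithmetic behind these numbers easy to recheck, here is a small sketch that recomputes the throughput, the sum of the timed parts, and the untimed remainder for the two rows with 1205523 DoFs. The column interpretation is an assumption taken from the text above: the fifth column is the time per CG iteration, the seventh the iteration count, and the last six columns are the timers that add up to the quoted 0.62 ms (mat-vec first). The two-point fit at the end is purely illustrative and additionally assumes that the reported time per iteration is the total run time divided by the iteration count, which the issue does not state.

```python
# Values copied from the benchmark rows above (1205523 DoFs).
# Column meanings are assumptions, see the lead-in paragraph.
N_DOFS = 1205523

rows = {
    312: {"t_iter": 0.0008053,
          "timers": [0.0003585, 8.127e-05, 7.769e-05,
                     1.039e-05, 8.959e-05, 3.427e-06]},
    100: {"t_iter": 0.001158,
          "timers": [0.0003517, 7.086e-05, 7.33e-05,
                     1.036e-05, 8.898e-05, 3.338e-06]},
}


def summarize(t_iter, timers, n_dofs=N_DOFS):
    """Return throughput, timed sum, untimed remainder, and their ratio."""
    timed = sum(timers)
    return {
        "throughput": n_dofs / t_iter,   # DoFs/s
        "timed": timed,                  # seconds inside timers
        "untimed": t_iter - timed,       # seconds not covered by timers
        "ratio": t_iter / timed,         # iteration time / timed time
    }


def two_point_fit(n_a, t_a, n_b, t_b):
    """Fit total = overhead + n * per_iter through two runs.

    Assumes t_a and t_b are total run times divided by the iteration
    counts n_a and n_b (hypothetical, not confirmed by the source).
    """
    total_a, total_b = n_a * t_a, n_b * t_b
    per_iter = (total_a - total_b) / (n_a - n_b)
    overhead = total_a - n_a * per_iter
    return per_iter, overhead


summary = {n: summarize(r["t_iter"], r["timers"]) for n, r in rows.items()}
per_iter, overhead = two_point_fit(312, rows[312]["t_iter"],
                                   100, rows[100]["t_iter"])

for n, s in summary.items():
    print(f"{n} its: {s['throughput']:.3e} DoFs/s, "
          f"timed {s['timed'] * 1e3:.3f} ms, "
          f"untimed {s['untimed'] * 1e3:.3f} ms, ratio {s['ratio']:.2f}")
print(f"fit: {per_iter * 1e3:.3f} ms/it + {overhead * 1e3:.1f} ms fixed")
```

With the values above, the timers sum to about 0.62 ms for the 312-iteration run and about 0.60 ms for the 100-iteration run. The fit is only a two-point estimate under the stated assumption, so it can suggest where to look next but does not settle whether a one-time cost is involved.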
