While recording benchmarks for the s-step method, I noticed some strange behavior: relatively low throughput for small to intermediate sizes. On the AMD Epyc system, I get up to 4.6 GDoFs/s when running the merged operations from cache, but only around 1.5 GDoFs/s for s-step with 4 steps (I have already multiplied the numbers by 4). The problem is visible, e.g., for a size of around a million DoFs:
```
5 7 1536 610203 0.0004661 1.309e+09 324 0.0001763 0.0001968 4.29e-05 4.829e-05 5.086e-06 5.058e-05 2.524e-06
5 7 3072 1205523 0.0008053 1.497e+09 312 0.0003057 0.0003585 8.127e-05 7.769e-05 1.039e-05 8.959e-05 3.427e-06
```
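As an aside for reading these rows: the column assignment is my inference, not a header printed by the harness, but column 6 is exactly column 4 divided by column 5, which suggests number of DoFs, time per iteration, and throughput (column 7 then matches the iteration counts). A minimal sanity check:

```cpp
#include <cstdio>

int main()
{
  // Assumed column reading for the 1.2M-DoF row (my guess, not the
  // harness documentation): column 4 = DoFs, column 5 = time/iteration.
  const double n_dofs = 1205523;   // column 4
  const double t_it   = 0.0008053; // column 5 [s]

  // Should reproduce column 6 (1.497e+09 DoFs/s):
  std::printf("throughput = %.4g DoFs/s\n", n_dofs / t_it);
  return 0;
}
```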
I can't really explain the gap between the mat-vec (0.3585 ms) plus the other operations, which sum to about 0.62 ms/iteration, and the 0.8 ms it takes for a full CG iteration. @peterrum have you seen something like that before? As you can see from the iteration counts, I have already increased the maximum number of iterations to check whether some very expensive initialization is being amortized. Increasing the iteration count did bring the timings down a bit: with only 100 iterations I get 1.16 ms/it, so even more than the 0.8 ms and almost a factor of 2 loss in performance compared to the parts that we have inside timers:
```
5 7 3072 1205523 0.001158 1.041e+09 100 0.0003099 0.0003517 7.086e-05 7.33e-05 1.036e-05 8.898e-05 3.338e-06
```
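To make the gap explicit, here is a minimal sketch that sums the per-section timers of the 312-iteration row and subtracts them from the reported time per iteration. Which of the trailing columns are non-overlapping sections is my assumption, chosen so that the sum reproduces the 0.62 ms mentioned above:

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

int main()
{
  // Per-iteration section timers [s] of the 312-iteration row: mat-vec
  // plus the five remaining (assumed non-overlapping) sections.
  const std::vector<double> sections = {
    0.0003585, 8.127e-05, 7.769e-05, 1.039e-05, 8.959e-05, 3.427e-06};
  const double timed = std::accumulate(sections.begin(), sections.end(), 0.0);
  const double total = 0.0008053; // reported time per iteration [s]

  std::printf("timed sections: %.4g s/it, unaccounted: %.4g s/it\n",
              timed, total - timed);
  return 0;
}
```

This puts the unaccounted part at roughly 0.18 ms per iteration, i.e., almost a third of the timed work.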
But nothing there would explain the gap, so there seems to be something very weird going on.
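One check that might narrow this down: if the unaccounted time were a one-time setup cost, the two runs of the 1.2M-DoF case should lie on a line t_total(n) = t_setup + n * t_it. A minimal two-point fit, assuming the reported ms/it values are simply total run time divided by iteration count:

```cpp
#include <cstdio>

int main()
{
  // Two runs of the same 1.2M-DoF case (from the rows above).
  const double n1 = 100, per_it1 = 0.001158;  // 100-iteration run
  const double n2 = 312, per_it2 = 0.0008053; // 312-iteration run

  // Assumption: per-iteration numbers are total time / iteration count.
  const double t1 = n1 * per_it1, t2 = n2 * per_it2; // total times [s]

  // Linear model t_total(n) = t_setup + n * t_it, solved from two points.
  const double t_it    = (t2 - t1) / (n2 - n1); // marginal cost/iteration
  const double t_setup = t1 - n1 * t_it;        // fixed startup cost

  std::printf("t_it = %.4g s/it, t_setup = %.4g s\n", t_it, t_setup);
  return 0;
}
```

If the resulting t_it lands near the 0.62 ms timed sum, the gap would be a fixed startup cost being amortized; if it stays well above it, the overhead is genuinely per-iteration and hides outside the timed sections.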