

moritzaugustin commented on June 11, 2024

I think, @Kwartke, you had already implemented this, so it might be worth scanning the old branches.

from brian2cuda.

denisalevi commented on June 11, 2024

It turns out that using atomicAdd on shared memory instead of global memory in the thresholder kernel is slower (on a GTX Titan Black). A performance plot comparing the two implementations for N=10000 neurons, with every n-th neuron spiking in each time step, can be found here.
The time measurements were done using the nvprof command line profiler and are average values of 10 kernel calls.
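For readers who want to see the two variants side by side, here is a minimal CUDA sketch. It is not the code from the issue9_spikespace branch. It assumes a Brian2-style spikespace of N+1 ints (entries `[0, count)` hold spiking neuron indices, entry N holds the count) and a simple `v > v_th` threshold. The names `threshold_global`, `threshold_shared` and `run_threshold` are hypothetical.

```cuda
#include <algorithm>
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Assumed spikespace layout (hypothetical for this sketch): N+1 ints,
// entries [0, count) hold spiking neuron indices, entry N holds count.

// Variant 1: one global-memory atomicAdd per spiking neuron.
__global__ void threshold_global(const float* v, float v_th, int N, int* spikespace)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && v[i] > v_th) {
        int slot = atomicAdd(&spikespace[N], 1);  // reserve one output slot
        spikespace[slot] = i;
    }
}

// Variant 2: shared-memory atomics on a per-block counter, then one
// global atomicAdd per block to reserve a contiguous output range.
__global__ void threshold_shared(const float* v, float v_th, int N, int* spikespace)
{
    extern __shared__ int s_idx[];  // blockDim.x ints: spikes of this block
    __shared__ int s_count;
    __shared__ int s_offset;

    if (threadIdx.x == 0) s_count = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N && v[i] > v_th) {
        int local = atomicAdd(&s_count, 1);  // shared-memory atomic
        s_idx[local] = i;
    }
    __syncthreads();

    if (threadIdx.x == 0 && s_count > 0)
        s_offset = atomicAdd(&spikespace[N], s_count);  // one global atomic per block
    __syncthreads();

    // s_offset is only read when s_count > 0, i.e. after thread 0 set it.
    if ((int)threadIdx.x < s_count)
        spikespace[s_offset + threadIdx.x] = s_idx[threadIdx.x];
}

// Hypothetical host helper: runs one variant and returns the sorted spike indices.
std::vector<int> run_threshold(const std::vector<float>& v_host, float v_th, bool use_shared)
{
    int N = (int)v_host.size();
    float* v;
    int* ss;
    cudaMalloc(&v, N * sizeof(float));
    cudaMalloc(&ss, (N + 1) * sizeof(int));
    cudaMemcpy(v, v_host.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(ss, 0, (N + 1) * sizeof(int));  // spike count starts at 0

    int threads = 256, blocks = (N + threads - 1) / threads;
    if (use_shared)
        threshold_shared<<<blocks, threads, threads * sizeof(int)>>>(v, v_th, N, ss);
    else
        threshold_global<<<blocks, threads>>>(v, v_th, N, ss);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<int> h(N + 1);
    cudaMemcpy(h.data(), ss, (N + 1) * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(v);
    cudaFree(ss);

    // Slot order depends on atomic scheduling, so sort for a deterministic result.
    std::vector<int> spikes(h.begin(), h.begin() + h[N]);
    std::sort(spikes.begin(), spikes.end());
    return spikes;
}
```

The global variant issues one global atomic per spiking neuron. The shared variant issues one shared atomic per spiking neuron plus at most one global atomic per block. It only pays off if shared atomics are cheap, which, per the blog post quoted below, is not the case on Kepler.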

Timing only the atomicAdd instructions inside the kernel with clock(), for both implementations and for the case where all neurons spike, shows the same result:

  • using shared atomics takes ~35.5 us per kernel call
  • using global atomics takes ~9.6 us per kernel call

The code for the time measurements and how to reproduce them can be found in the dev/issues/issue9_spikespace folder (commit dee9bf7).
clock() reads a per-multiprocessor clock counter, so the difference between two readings in a thread is the number of cycles the device spends executing that thread, including waiting for and replaying conflicting atomics (as explained here).
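A sketch of this per-thread timing idea, not the actual measurement code from dev/issues/issue9_spikespace. It swaps in `clock64()`, the 64-bit variant of `clock()`, to avoid counter wraparound. The kernel and helper names are hypothetical. Because the counter is per multiprocessor, only differences taken within one thread are meaningful. Each atomic's result is stored before the second clock read so the interval covers the atomic returning, not only its issue. Compiler scheduling can still shift instructions, so treat the cycle counts as approximate. This reports mean cycles per atomic, not the per-kernel-call totals listed above.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Times one global-memory atomicAdd per thread, in SM clock cycles.
__global__ void time_global_atomic(int* counter, int* slots, long long* cycles, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    long long t0 = clock64();
    int slot = atomicAdd(counter, 1);
    slots[i] = slot;  // consume the result so the atomic must have returned
    long long t1 = clock64();
    cycles[i] = t1 - t0;
}

// Same measurement, but the atomic targets a per-block counter in shared memory.
__global__ void time_shared_atomic(int* slots, long long* cycles, int N)
{
    __shared__ int s_counter;
    if (threadIdx.x == 0) s_counter = 0;
    __syncthreads();  // every thread reaches this barrier (no early return above)

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        long long t0 = clock64();
        int slot = atomicAdd(&s_counter, 1);
        slots[i] = slot;
        long long t1 = clock64();
        cycles[i] = t1 - t0;
    }
}

// Hypothetical helper: mean cycles per atomic when all N threads "spike".
double mean_atomic_cycles(int N, bool use_shared)
{
    int *counter, *slots;
    long long* cycles;
    cudaMalloc(&counter, sizeof(int));
    cudaMalloc(&slots, N * sizeof(int));
    cudaMalloc(&cycles, N * sizeof(long long));
    cudaMemset(counter, 0, sizeof(int));

    int threads = 256, blocks = (N + threads - 1) / threads;
    if (use_shared)
        time_shared_atomic<<<blocks, threads>>>(slots, cycles, N);
    else
        time_global_atomic<<<blocks, threads>>>(counter, slots, cycles, N);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<long long> h(N);
    cudaMemcpy(h.data(), cycles, N * sizeof(long long), cudaMemcpyDeviceToHost);
    cudaFree(counter);
    cudaFree(slots);
    cudaFree(cycles);

    double sum = 0.0;
    for (long long c : h) sum += (double)c;
    return sum / N;
}
```

Comparing the two means on a Kepler GPU and on a Maxwell (or newer) GPU is one way to check the blog post's explanation on other hardware. The numbers depend on the GPU and on contention, so none are claimed here.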

This NVIDIA blog post gives a good explanation, comparing shared and global atomics for a similar use case:

Kepler emulates shared memory atomics in software [...].

However the Maxwell architecture features hardware support for shared memory atomics and we can clearly see that in all cases the shared atomics version performs best.

This explains our performance measurements, since the GTX Titan Black is a Kepler GPU. Moreover, even on Maxwell, the blog post above reports a performance gain from shared atomics of only a factor of 2 (independent of the number of conflicting atomics).

Therefore, we keep the implementation using global atomics. Optional shared atomics for Maxwell architectures could be considered at a later development stage.
The implementation using shared atomics can be found in the issue9_spikespace branch (eb215d4).

Closing the issue.

