Comments (3)

ZipCPU commented on June 19, 2024

Ahm, okay, wow, 207MHz?? That's AWESOME! I'm glad you've been able to run the FFT that fast.

There are two "official" answers to the issue of running the FFT faster.

  1. Keep the internal bitwidth from expanding by setting a maximum bitwidth at or below 18 bits (it might need to be 16 or 17 bits), so that only a single DSP is used per multiply.

  2. The two samples per clock option was specifically built for the purpose of being able to handle larger and larger FFTs but at lower clock rates.

Actually adjusting the logic so that the multiplies are pipelined one into the next has a couple of problems. I doubt any are unsolvable, but they'd need some work:

  1. Not all solutions would want to split the multiplies into two, so an option would need to be presented to decide when to do this and when not to.

  2. The code itself isn't necessarily that complicated, although my own example breaks a single multiply into four instead of two.

  3. There's also the pipeline hassle that would need to be dealt with. A multiply that takes more clocks needs to be properly scheduled so that the result is able to be matched up with the rest of the butterfly.
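For readers unfamiliar with the idea, here is a minimal behavioural sketch in Python (not RTL, and not the code from dblclockfft) of splitting one wide signed multiply into four narrower partial products. The function name `split_multiply` and the split point `k = 17` are assumptions chosen for illustration; a two-way split would divide only one of the operands.

```python
def split_multiply(a: int, b: int, k: int = 17) -> int:
    """Multiply two signed integers using four narrower partial products.

    Each operand is split as x = x_hi * 2**k + x_lo, where x_lo is the
    unsigned low k bits and x_hi is the signed remainder.
    k = 17 is an illustrative choice, not a value taken from the thread.
    """
    mask = (1 << k) - 1
    a_lo, a_hi = a & mask, a >> k   # >> is an arithmetic shift in Python
    b_lo, b_hi = b & mask, b >> k

    # Four partial products; in hardware each could map to its own multiplier.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi

    # Recombine with the appropriate weights.
    return (hh << (2 * k)) + ((lh + hl) << k) + ll


if __name__ == "__main__":
    a, b = 123456789, -987654321
    print(split_multiply(a, b) == a * b)
```

In hardware, the recombining adds are where extra pipeline registers would go, which is what leads to the scheduling concern in item 3.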

So, it's doable, but it would take some work to get it done right.

Now, can you tell me what happened without the output re-order, and why you think it's broken?

Dan

from dblclockfft.

gabriel-tenma-white commented on June 19, 2024

207MHz on a Kintex-7 isn't considered fast ;) I'm pretty sure you can get your design to meet timing at double that if you do a little bit of timing whack-a-mole ;) A 2-sample-per-cycle core would double the throughput but also increase resource usage; it would be much nicer to get that 2x Fmax boost too.

The code currently describes a single multiplier. It is Vivado that is splitting that into two multiplies. However, Vivado is saying that if you simply describe one more register on the output of the RTL multiply, it will insert the right pipeline registers at the right places. I would suggest adding a configurable option to enable an extra register. The way I'm planning to deal with this is to have several multiplier implementations that the user can choose from. Currently I have two implementations: a simple, naive, straightforward description (that infers well for multiplies below 25x18), and a hand-optimized one designed specifically for 25x35 complex multiplies using 8 DSP48E1s in Xilinx devices with minimal LUT usage.

Yes, the rest of the logic needs to be delayed to accommodate the additional multiplier delay, which is the part I'm unsure about. In my implementation I have a "phase" signal that goes through the core and is used to coordinate everything. When there are pipeline delays I simply subtract the delay from phase and pass the new phase signal downstream (with registers, of course). Since phase increments by one every clock cycle, I can do tricks like phase_out <= phase+1 when rising_edge(clk), which adds a register but keeps the phase value unchanged.
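The register trick can be checked with a small cycle-by-cycle model. This is a hedged Python sketch, not the VHDL from the actual core: it assumes phase is a free-running counter that wraps at `modulus`, and the names `simulate_phase`, `stages`, and `modulus` are made up for the example.

```python
def simulate_phase(cycles: int, stages: int, modulus: int = 16):
    """Model a chain of registers that each compute out <= in + 1.

    Returns a list of (source_phase, downstream_phase) pairs, one per clock
    cycle. downstream_phase is None until the register chain has filled.
    Assumes the source phase increments by one (mod modulus) every cycle.
    """
    regs = [None] * stages
    trace = []
    for t in range(cycles):
        src = t % modulus
        trace.append((src, regs[-1]))

        # Clock edge: every register captures its input plus one.
        prev = src
        new_regs = []
        for r in regs:
            new_regs.append(None if prev is None else (prev + 1) % modulus)
            prev = r
        regs = new_regs
    return trace


if __name__ == "__main__":
    for src, out in simulate_phase(cycles=8, stages=3):
        print(src, out)
```

Once the chain has filled, the downstream value equals the source phase on every cycle, because the one-cycle delay of each register is cancelled by its +1. A plain register without the increment would instead lag the source by one count per stage, which corresponds to the "subtract the delay" case.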

The code just has a compilation error when output reorder is disabled. The br_o_result signal is undeclared and nothing wires br_sample to the output.

ZipCPU avatar ZipCPU commented on June 19, 2024

I fixed the bug when no bit-reversal stage was present. Feel free to inspect the latest code.
