Comments (3)
Ahm, okay, wow, 207MHz?? That's AWESOME! I'm glad you've been able to run the FFT that fast.
There are two "official" answers to the issue of running the FFT faster:
- Keep the internal bit width from expanding by setting a maximum bit width of 18 bits or less (it might need to be 16 or 17 bits), so that only a single DSP is used per multiply.
- Use the two-samples-per-clock option, which was built specifically to handle larger and larger FFTs at lower clock rates.
Actually adjusting the logic so that the multiplies are pipelined one into the next has a couple of problems. I doubt any are unsolvable, but they'd need some work:
- Not all solutions would want to split the multiplies into two, so an option would be needed to decide when to do this and when not to.
- The code itself isn't necessarily that complicated, although my own example breaks a single multiply into four instead of two.
- There's also the pipeline hassle to deal with. A multiply that takes more clocks needs to be scheduled properly so that its result can be matched up with the rest of the butterfly.
So, it's doable, but it would take some work to get it done right.
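To illustrate the "one multiply into four" idea, here is a minimal behavioural sketch in Python rather than HDL. It is not the code from the repository: the function name `split_multiply` and the 17-bit split point `k` are assumptions chosen for illustration (17 unsigned bits fit a signed 18-bit multiplier port). Each operand is split into a signed high part and an unsigned low part, and the four partial products are shifted and summed.

```python
def split_multiply(a: int, b: int, k: int = 17) -> int:
    """Multiply two signed integers using four narrower partial products.

    Hypothetical model: k is the width of the unsigned low slice.
    """
    mask = (1 << k) - 1
    # Unsigned low k bits of each operand.
    a_lo, b_lo = a & mask, b & mask
    # Signed high parts (Python's >> is an arithmetic shift).
    a_hi, b_hi = a >> k, b >> k
    # Now a == a_hi * 2**k + a_lo and b == b_hi * 2**k + b_lo.
    p_hh = a_hi * b_hi
    p_hl = a_hi * b_lo
    p_lh = a_lo * b_hi
    p_ll = a_lo * b_lo
    # Recombine: each partial product is weighted by its slice position.
    return (p_hh << (2 * k)) + ((p_hl + p_lh) << k) + p_ll
```

In hardware each of the four products and the final additions could sit in its own pipeline stage, which is where the extra latency discussed above comes from.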
Now, can you tell me what happened without the output re-order, and why you think it's broken?
Dan
from dblclockfft.
207MHz on a Kintex-7 isn't considered fast ;) I'm pretty sure you can get your design to meet timing at double that with a little bit of timing whack-a-mole ;) A 2-sample-per-cycle core would double the throughput but also increase resource usage; it would be much nicer to get that 2x Fmax boost as well.
The code currently describes a single multiplier; it is Vivado that splits it into two multiplies. However, Vivado reports that if you simply describe one more register on the output of the RTL multiply, it will insert the pipeline registers in the right places. I would suggest adding a configurable option to enable an extra register. The way I'm planning to deal with this is to have several multiplier implementations that the user can choose from. Currently I have two: a simple, straightforward description (which infers well for multiplies below 25x18), and a hand-optimized one designed specifically for 25x35 complex multiplies using 8 DSP48E1s in Xilinx devices with minimal LUT usage.
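The effect of such an option on latency can be sketched with a small cycle-level model in Python. This is only an illustration, not code from either implementation: the class name `PipelinedMultiplier`, the `extra_output_register` flag, and the base latency of one clock are all assumptions.

```python
from collections import deque


class PipelinedMultiplier:
    """Cycle-level model of a multiplier with an optional extra output register."""

    def __init__(self, extra_output_register: bool = False):
        # Assumed base latency of one clock, plus one if the option is enabled.
        self.latency = 2 if extra_output_register else 1
        # One slot per register stage; None marks "no valid data yet".
        self.pipe = deque([None] * self.latency, maxlen=self.latency)

    def clock(self, a: int, b: int):
        """Advance one clock: return the oldest product, then accept a new one."""
        out = self.pipe[0]
        self.pipe.append(a * b)  # maxlen drops the oldest entry automatically
        return out


mult = PipelinedMultiplier(extra_output_register=True)
outputs = [mult.clock(i, 2) for i in range(5)]
print(outputs)  # [None, None, 0, 2, 4]
```

Whatever value `latency` takes, everything that must meet the product at the butterfly output has to be delayed by the same number of clocks.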
Yes, the rest of the logic needs to be delayed to accommodate the additional multiplier delay, which is the part I'm unsure about. In my implementation I have a "phase" signal that goes through the core and is used to coordinate everything. When there are pipeline delays, I would simply subtract the delay from phase and pass the new phase signal downstream (with registers, of course). Since phase is monotonic, I can do tricks like phase_out <= phase+1 when rising_edge(clk), which adds a register but keeps the phase value unchanged.
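The phase trick can be checked with a short Python model. It rests on an assumption not stated in the thread: that phase is a free-running counter that increments by one every clock and wraps at a period `n`. The names `registered_phase_trace` and `phase_after_delay` are hypothetical.

```python
def registered_phase_trace(cycles: int, n: int = 8):
    """Model phase_out <= phase + 1 on each rising clock edge.

    Returns (phase_in, phase_out) pairs as seen during each clock cycle.
    Assumes phase_in is a counter incrementing by one per clock, modulo n.
    """
    phase_out = None  # register contents before the first clock edge
    trace = []
    for t in range(cycles):
        phase_in = t % n
        trace.append((phase_in, phase_out))
        # Register update: takes effect in the next cycle.
        phase_out = (phase_in + 1) % n
    return trace


def phase_after_delay(phase: int, delay: int, n: int = 8) -> int:
    """Phase label of data leaving a stage with `delay` clocks of latency."""
    return (phase - delay) % n


trace = registered_phase_trace(10)
# After the first clock, the registered copy equals the live phase.
print(all(p_in == p_out for p_in, p_out in trace[1:]))  # True
```

Adding one before the register cancels the one-clock delay of the register itself, so the downstream copy stays equal to the live counter; `phase_after_delay` shows the separate adjustment for data that has spent `delay` clocks in a pipeline.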
The code just has a compilation error when output reorder is disabled: the br_o_result signal is undeclared, and nothing wires br_sample to the output.
I fixed the bug when no bit-reversal stage was present. Feel free to inspect the latest code.