zipcpu / dblclockfft

A configurable C++ generator of pipelined Verilog FFT cores

Languages: C++ 63.10%, Verilog 29.47%, Makefile 4.04%, C 2.71%, Shell 0.47%, MATLAB 0.20%

Topics: fft fpga verilog verilator verilog-components gplv3

dblclockfft's Introduction

A Generic Pipelined FFT Core Generator

This generic pipelined FFT project contains all of the software necessary to create the IP for an arbitrarily sized FFT. The FFT has been modified to operate in one of the following modes:

Two samples in per clock and, after some delay, two samples out per clock. This uses 6 multiplies per FFT stage in the butterflies. This was the purpose of the original dblclkfft. (Why double clock? I don't know. Double-sample FFT might've been a better name.)
One sample in per clock, with the i_ce line high for every incoming sample, up to one sample per clock. There are also options to run with at least one clock between samples, or even two (or more) clocks between samples. These modes use 3, 2, or 1 multiplies per FFT stage, respectively.
Eventually, I want to support a real FFT mode, which will accept real input samples and alternately produce real and imaginary output samples (or the converse for the inverse FFT).

The FFT generated by this project is very configurable. By adjusting a single command-line parameter, the FFT created will be either a forward FFT or an inverse FFT. The number of bits processed, kept, and maintained by this FFT is also configurable. Even the number of bits used for the twiddle factors, and whether or not to bit-reverse the outputs, are configurable parts of this FFT core.

These features make this open source pipelined FFT module quite different from, and unique among, the other open HDL cores you may find.

For those who wish to get started right away: download the package, change into the sw directory, and run make. There is no need to run a configure script; fftgen is completely portable C++. Once it is built, run fftgen without any arguments. This will cause fftgen to print a usage statement to the screen. Review the usage statement, and run fftgen a second time with the arguments you need.

Windows Users

My test platform is an Ubuntu system. I don't normally test the core generator on Windows platforms. Others have used it on Windows platforms quite successfully. There are two problems they have encountered when doing so:

I use a mkdir(dirname, chmod) function in the core generator to make a directory in which to place the generated core. Windows doesn't seem to support this function, but rather a similar mkdir(dirname) function. Switching between the two is simple enough.
The generator also uses the lstat(path, statbuf) function to make certain that it doesn't overwrite any pre-existing files having the same name as the directory it wishes to create. This function can be replaced with the constant 1 or boolean true in order to bypass the check and build the design on a Windows system.

There is a #define option set at the top of the generator that applies these changes for some versions of Microsoft Visual C++.
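The two workarounds described above can be folded into a small portability shim. The sketch below uses a hypothetical helper, ensure_output_dir, that is not taken from fftgen's source: on POSIX systems it uses lstat() and the two-argument mkdir(), and when _WIN32 is defined it falls back to stat() and the one-argument _mkdir(). Like the check described above, it refuses to proceed when a non-directory already has the requested name.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>
#ifdef _WIN32
#include <direct.h>  // _mkdir()
#endif

// Hypothetical helper, not fftgen's actual code: make sure `dir` exists as a
// directory. Returns false if a non-directory already uses that name, or if
// the directory cannot be created.
bool ensure_output_dir(const std::string &dir) {
    struct stat sb;
#ifdef _WIN32
    int rc = stat(dir.c_str(), &sb);   // Windows has no lstat()
#else
    int rc = lstat(dir.c_str(), &sb);  // does not follow symbolic links
#endif
    if (rc == 0) {                     // something with this name exists
        if ((sb.st_mode & S_IFMT) == S_IFDIR)
            return true;               // this sketch reuses an existing directory
        std::fprintf(stderr, "%s exists and is not a directory\n", dir.c_str());
        return false;
    }
#ifdef _WIN32
    return _mkdir(dir.c_str()) == 0;        // one-argument Windows form
#else
    return mkdir(dir.c_str(), 0755) == 0;   // POSIX form also takes a mode
#endif
}
```

Keeping the platform differences inside one helper means the rest of the generator never needs to know which mkdir() variant it is calling.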

Current State

This particular version of the FFT core now passes all my tests. It has yet to be verified in hardware.

The FFT test bench doesn't yet have a threshold that adjusts with the input parameters to determine success or failure.
I haven't started on the real-only version of this FFT.

While my previously stated goal was to continue working with this core until it had a real-FFT capability before releasing it back into the master branch, I'm so excited to have gotten it this far that I'm moving it from dev to master early. I'll come back for the real-only version later.

Common Confusions

This core does not contain any overflow protection. Should you send it data large enough to overflow its internal registers, you will likely get something unexpected in return.

This is perhaps for the best. While overflow detection is a possible addition, overflow protection will only minimize an already bad situation. It would be better to avoid that bad situation in the first place by providing the core with enough bits of precision to do its mathematics with.
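To choose enough bits up front, one can bound the output directly from the DFT definition: |X[k]| <= sum |x[n]| <= N * max|x[n]|. The sketch below turns that bound into a per-component bit width. worst_case_output_bits is a hypothetical helper, not part of fftgen, and the bound is pessimistic for typical signals: only a full-scale input perfectly aligned with one bin (for example, a constant input at the negative rail) reaches it.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: smallest two's-complement width B (per real or
// imaginary component) that can hold any output of an npoints-point DFT
// whose input components are in_bits wide. Uses the bound
// |X[k]| <= sum |x[n]| <= npoints * max|x[n]|.
int worst_case_output_bits(int in_bits, long npoints, bool real_input) {
    const double max_in = std::ldexp(1.0, in_bits - 1);   // 2^(in_bits-1)
    // A complex input sample can have magnitude up to sqrt(2) * max_in.
    const double max_mag = npoints * max_in * (real_input ? 1.0 : std::sqrt(2.0));
    int b = 1;
    // Stop once 2^(B-1) >= max_mag. Equality is possible only for a real
    // input, and then the extreme occurs only as the negative value
    // -2^(B-1), which still fits in B bits.
    while (std::ldexp(1.0, b - 1) < max_mag)
        ++b;
    return b;
}
```

For example, a 12-bit real input into a 1024-point transform can need up to 22 bits per component; carrying fewer bits than this relies on the signal never approaching that worst case.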

Commercial Applications

Should you find the LGPLv3 license insufficient for your needs, other licenses can be purchased from Gisselquist Technology, LLC.

Likewise, please contact us should you wish to fund the further development of this core.

dblclockfft's Issues

Timing optimization

I've been playing with synthesizing your FFT core for various FPGAs (I'm surveying all FPGA FFT implementations I can find). On Xilinx 7-series FPGAs I'm seeing some timing issues for the following parameters:

Device: XC7K160T
N: 1024, 24 bit input/output, 24 bit twiddle
1 sample per clock
Maximal hardware multiplier usage

The Fmax achieved was ~207 MHz, but this appears to be limited by a single timing bottleneck: insufficient pipelining in the multipliers. Two DSP48E1 slices are inferred per multiply (each can only perform an 18x25 multiply), and the cascade path between them is failing timing: https://imgur.com/a/GpcUdNE
DRC is showing:

DPOP 1 Warning DSP fft/stage_1024/HWBFLY.bfly/CKPCE_ONE.rp_one0 output fft/stage_1024/HWBFLY.bfly/CKPCE_ONE.rp_one0/P[47:0] is not pipelined (PREG=0). Pipelining the DSP48 output will improve performance and often saves power so it is suggested whenever possible to fully pipeline this function. If this DSP48 function was inferred, it is suggested to describe an additional register stage after this function. If the DSP48 was instantiated in the design, it is suggested to set the PREG attribute to 1.
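For context on why two DSP48E1 slices appear per multiply: a 24x24 signed product does not fit a single 25x18 signed multiplier, but it can be split into two partial products by cutting the 24-bit operand b into a low 17-bit unsigned part and a high 7-bit signed part. The bit-accurate C++ model below (split_mul_24x24 is a hypothetical name) shows only the arithmetic, not the exact netlist Vivado infers. The DSP48E1 cascade path offers a 17-bit shift intended for building wider multiplies this way, and that cascade is the path reported as failing timing.

```cpp
#include <cassert>
#include <cstdint>

// Bit-accurate model of one 24x24 signed multiply split across two
// 25x18-style signed multipliers. split_mul_24x24 is a hypothetical name.
int64_t split_mul_24x24(int32_t a, int32_t b) {
    // b = b_hi * 2^17 + b_lo, with 0 <= b_lo < 2^17.
    const int32_t b_lo = b & 0x1FFFF;  // low 17 bits, zero-extended: fits an 18-bit signed port
    const int32_t b_hi = b >> 17;      // remaining 7 bits; arithmetic shift (guaranteed since C++20)
    const int64_t p_lo = int64_t(a) * b_lo;  // first partial product
    const int64_t p_hi = int64_t(a) * b_hi;  // second partial product
    // Recombine: the 2^17 weight corresponds to the 17-bit cascade shift.
    return p_hi * (int64_t(1) << 17) + p_lo;
}
```

In hardware, registering each partial product (the PREG=1 the DRC message asks for) before the two are summed typically breaks this path, at the cost of extra clocks of latency.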

There just needs to be one more register after the multiply, but I'm really not familiar with Verilog and not sure what else needs to be changed after adding the delay. Do you want to look into fixing this and see what the actual Fmax your design can achieve is?
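The general rule when adding a register stage is latency matching: every signal that travels alongside the product must be delayed by the same number of clocks, or it will no longer line up with the data it describes. The C++ cycle-level model below is a generic illustration, not derived from the core's source; DelayLine is a hypothetical name, and the "sync" signal stands in for whatever side-band flags accompany the data in the butterfly.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Cycle-level model of an N-deep chain of pipeline registers.
template <typename T, std::size_t N>
class DelayLine {
    static_assert(N > 0, "a delay line needs at least one register");
    std::array<T, N> regs_{};  // regs_[0] holds the newest value
public:
    // Advance one clock: capture `in` and return the value from N clocks ago.
    T clock(const T &in) {
        T out = regs_[N - 1];
        for (std::size_t i = N - 1; i > 0; --i)
            regs_[i] = regs_[i - 1];
        regs_[0] = in;
        return out;
    }
};
```

If the product path gains one register, its companion flags need a DelayLine one stage longer as well; any downstream logic that counts clocks from the input must also account for the extra cycle.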

Also, the generated code is broken when the output reorderer is disabled: I had to edit the generated main file and connect the last stage to the output.

Related: check out my FPGA FFT implementation (also programmatically generated) and tell me what you think.

How would I use this FFT for my 20 kHz sampled audio signal?

I am using Windows, but I use the Windows Subsystem for Linux (WSL) to build the fftgen program and to generate my FFTs.

I am using a BASYS3 board with a 100 MHz clock. I want my FFT to bin an audio signal sampled at 20 kHz with a frequency resolution of around 20 Hz. Could I get some guidance on how to use this core?

./fftgen -f 1024 -n 12 -m 6 -p 20 -k 5000

This is the command I currently use to generate my FFT. I chose 1024 points because I believe that for 20 Hz bin resolution I need 20 kHz / 1024, which gives me bins of about 20 Hz.

My Pmod MIC3 outputs 12-bit data, and I have an OLED screen that is 64 pixels tall, so I set the output to 6 bits to be able to map my output onto the screen.

-p 20, since I saw that the BASYS3 has DSP elements and hence this is more efficient.

-k 5000, since I am using my 100 MHz clock while pushing in the 20 kHz samples (100 MHz / 20 kHz = 5000 clocks per sample).

Is this the correct implementation, or am I doing something wrong?
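The arithmetic behind the -f and -k choices above can be checked in a few lines. The sketch assumes, as the poster does, that -f sets the FFT length and -k the number of clocks per input sample; plan_fft and FftPlan are hypothetical names, and the frame time is simply how long it takes to collect one full frame of samples.

```cpp
#include <cassert>
#include <cmath>

struct FftPlan {
    double bin_hz;             // frequency spacing between adjacent bins
    long   clocks_per_sample;  // system clocks available per input sample
    double frame_ms;           // time to collect one full frame of samples
};

// Hypothetical helper: derive the basic timing numbers for an FFT of
// `npoints` points fed at `fs_hz` samples/s from an `fclk_hz` system clock.
FftPlan plan_fft(double fs_hz, long npoints, double fclk_hz) {
    FftPlan p;
    p.bin_hz = fs_hz / npoints;
    p.clocks_per_sample = std::lround(fclk_hz / fs_hz);
    p.frame_ms = 1000.0 * npoints / fs_hz;
    return p;
}
```

For the poster's numbers, plan_fft(20000.0, 1024, 100e6) gives 19.53125 Hz bins, 5000 clocks per sample, and a new frame every 51.2 ms.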

Compiling under MINGW-w64/MSYS2

It works after adding the following at the beginning of fftgen.cpp:

#ifdef __MINGW32__
#include <direct.h> // mkdir
#define mkdir(A,B)  _mkdir(A)
#define lstat(p,s)   stat(p,s)
#endif

Could I just use the .v file in rtl to build my core?

Hello, thank you for providing this as open source. I am not sure where and how to use the code to build the core. Could I use Code::Blocks to run the code and build the core on Windows 10? I tried to use the .v files in the rtl folder directly, but it does not work, I think because some .hex files are missing. Could you give me some suggestions on how to use the source? Thank you.
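The missing .hex files are most likely the twiddle-factor tables that fftgen writes alongside the Verilog when it runs (an assumption here); running fftgen is the reliable way to obtain a matching set. To illustrate what such a table contains, the sketch below computes quantized twiddle factors W_N^k = exp(-j*2*pi*k/N) for k = 0..N/2-1 and formats one hex word per line for Verilog's $readmemh. twiddle_hex_lines is a hypothetical helper, and its {real, imag} packing and scaling are assumptions that may not match fftgen's actual file layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch: quantized forward-FFT twiddles for k = 0..N/2-1, each
// packed as {real, imag} with tw_bits bits per component (tw_bits <= 32).
std::vector<std::string> twiddle_hex_lines(int npoints, int tw_bits) {
    const double pi = std::acos(-1.0);
    const double scale = std::ldexp(1.0, tw_bits - 1) - 1.0;  // avoids +1.0 overflowing
    const uint64_t mask = (uint64_t(1) << tw_bits) - 1;
    const int digits = (2 * tw_bits + 3) / 4;                 // hex digits per word
    std::vector<std::string> lines;
    for (int k = 0; k < npoints / 2; ++k) {
        const double ang = -2.0 * pi * k / npoints;
        const int64_t re = std::llround(scale * std::cos(ang));
        const int64_t im = std::llround(scale * std::sin(ang));
        const uint64_t word = ((uint64_t(re) & mask) << tw_bits) | (uint64_t(im) & mask);
        char buf[32];
        std::snprintf(buf, sizeof buf, "%0*llx", digits, (unsigned long long)word);
        lines.push_back(buf);
    }
    return lines;
}
```

An inverse FFT would use the opposite sign on the angle.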

X propagation: 4096-point FFT

A 4096-point FFT was created with a 4-bit input data width and a 4-bit output data width. The output (o_result) shows X values throughout. Checking the waveform in detail, X values propagate through the FFT from stage_4096 onwards, and only X values appear at the output.
0.txt

Testbench
```verilog
module main_tb();
    reg        clk;
    reg        ce;
    reg        rst;
    reg  [7:0] sample;    // two 4-bit components
    wire [7:0] result;
    wire       sync;

    reg  [7:0] IN [0:1];  // input samples, read from 0.txt
    integer    i;

    fftmain dut(.i_clk(clk),
        .i_reset(rst),
        .i_ce(ce),
        .i_sample(sample),
        .o_sync(sync),
        .o_result(result)
    );

    always #2 clk = ~clk;

    initial
    begin
        $dumpfile("main.vcd");
        $dumpvars(0, main_tb);
    end

    initial
    begin
        clk = 0;
        rst = 1;
        ce = 0;
        sample = 0;  // start from a known value instead of X
        i = 0;
        $readmemb("0.txt", IN, 0, 1);
        #6
        rst = 0;
        ce = 1;
    end

    // Feed IN[0], then IN[1], then keep repeating IN[1] while ce is high
    always @(posedge clk)
    if (ce)
    begin
        sample <= IN[i];
        if (i != 1)
            i <= i + 1;
    end

    initial
    begin
        #100 $finish;
    end
endmodule
```

Request to add a License to the project

@ZipCPU, could you please add a valid license to the project? Without one I can't really use it for an official purpose. I saw that gplv3 has been tagged in the topics for this repo, but I am not sure that's the same as actually putting a license on GitHub, so I guess I most likely can't use it?

Support FP32 format

Is it possible to support single-precision floating-point format supported by Xilinx Versal?

Undefined behaviour of o_result

Good Day,

I am trying to use this FFT core for a music spectrum analyzer, so the FFT operation doesn't need to be extremely precise.
I configured the core to take 128 samples, 8 bits wide, one sample per clock cycle. I prepared test samples of a 50 Hz sine wave as a quick sanity check of the core. I used Modelsim to simulate the FFT core and to check whether the sample timing is correct, and I inserted the following sample data into the core.

Please excuse the hex format; Modelsim needs it that way. I split the i_sample input in half: these are the real data samples, and the imaginary halves are permanently set to 0.

Samples 1-16:    00 3E 9F 5B D3 EC 4C 9C 51 E5 D9 57 9D 44 F9 C8
Samples 17-32:   5F A2 34 C0 B9 63 AB 22 20 AC 64 B7 F0 32 A3 60
Samples 33-48:   C6 FB 42 9E 59 D7 E8 4F 9C 4E EA D5 5A 9E 04 FE
Samples 49-64:   C4 61 A4 2F 11 B5 64 AE 1D 24 AA 63 BB 0A 36 A2
Samples 65-80:   5E CA F6 45 9D 56 DC E3 52 9C 4B EF D1 5C 9F 3C
Samples 81-96:   02 0C 62 A6 2B 16 B2 64 B1 18 29 A7 62 BE 05 3A
Samples 97-112:  A0 5D CE F1 49 9C 54 E0 DE 55 9D 47 F4 CC 5E A1
Samples 113-128: 38 07 BC 63 A9 27 1B AF 64 B4 14 2D A5 61 C2 00

The strange thing is that the o_result output is mostly undefined ('x') in the simulation, regardless of how long the FFT runs.

Output index 1: real 00, imaginary 00
Output index 33: real 00, imaginary 00
Output index 65: real fe, imaginary 00
Output index 97: real 00, imaginary 00
All other output indices (2-32, 34-64, 66-96, 98-128): real xx, imaginary xx

This is my first project that uses an FFT core, and I can't really tell what the problem could be. I only know that the problem starts in the butterfly module of fftstage_128.
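To reproduce the input layout described in this issue (8-bit real part in the upper half of i_sample, imaginary half held at zero), the test vector can be generated rather than typed by hand. A minimal C++ sketch: pack_sample and sine_vector are hypothetical helper names, and the {real, imag} ordering is an assumption, consistent with the later "unexpected output" report where the lowest bits hold the imaginary component. Because the sample rate behind the 50 Hz tone isn't given, the frequency is expressed as sine cycles per frame.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Pack one complex sample into a 2*bits-wide word (bits < 32): real part in
// the upper half, imaginary part in the lower half (assumed layout).
uint32_t pack_sample(int re, int im, int bits) {
    const uint32_t mask = (1u << bits) - 1;
    return ((uint32_t(re) & mask) << bits) | (uint32_t(im) & mask);
}

// Real-only sine test vector, one hex word per line for $readmemh.
std::vector<std::string> sine_vector(int npoints, double cycles, int bits) {
    const double pi = std::acos(-1.0);
    const double amp = std::ldexp(1.0, bits - 1) - 1.0;  // 127 for 8 bits
    const int digits = (2 * bits + 3) / 4;               // hex digits per word
    std::vector<std::string> out;
    for (int n = 0; n < npoints; ++n) {
        const int re = int(std::lround(amp * std::sin(2.0 * pi * cycles * n / npoints)));
        char buf[16];
        std::snprintf(buf, sizeof buf, "%0*x", digits, unsigned(pack_sample(re, 0, bits)));
        out.push_back(buf);
    }
    return out;
}
```

Choosing an integer number of cycles per frame keeps the tone in a single bin, which makes the expected FFT output easy to recognize.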

Don't care output

Hello!

There's a problem with the FFT output: whenever the output is nonzero, it gives don't-care values. I've attached a picture.

This FFT core was generated with the command:

./fftgen -f 16 -n 5 -m 5

Output scaling factor

Hello!

In the spec, it is mentioned that the core calculates the regular DFT "to within a scale factor".

After trying out the FFT core with several different configurations, it is not quite clear to me what this scaling factor is or where it comes from.

As a reference, I compared the FFT core's results to a Python FFT (NumPy, as well as a manual DFT loop) and came to the following conclusion:

Input bitwidth (N)	Output bitwidth	FFT length (F)	Scale factor
8	10	8	2
8	11	16	1
8	11	32	4
8	12	64	2
8	12	128	8
8	13	256	4
8	13	512	16

(Command for core generation: fft-gen -f F -n N -p 1024 -x 4)

The output bitwidth was not manually capped, so the generator decided how many bits are required. The same test was performed with an input bitwidth of 16, leading to the same scale factors. So I assume that these are deterministic, and since they are all powers of 2, I also assume that they are the result of certain implementation details (shifting, truncating, etc.).

Unfortunately, I didn't find more information on that topic in other issues, the blog post, spec, or README.

Would it be possible to elaborate on that?

I'm also a little surprised that no one has mentioned this already, since it seems pretty important to me to have the correct scaling factor in order to get "correct" results (i.e., the same as in a co-simulation, for example).

Thanks a lot!
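The comparison described in this issue (core output versus a manual DFT loop) can also be done without NumPy. The C++ sketch below implements the direct O(N^2) DFT as a reference and a least-squares estimate of the real scale factor between that reference and the core's output. Both function names are hypothetical; the sketch assumes the core's forward transform uses the exp(-j...) sign convention and that its output has already been put into natural (not bit-reversed) order.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Direct O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N).
std::vector<cplx> reference_dft(const std::vector<cplx> &x) {
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cplx> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        cplx acc = 0.0;
        for (std::size_t m = 0; m < n; ++m) {
            // Reducing k*m modulo n keeps the angle small for better accuracy.
            const double ang = -2.0 * pi * double((k * m) % n) / double(n);
            acc += x[m] * std::polar(1.0, ang);
        }
        X[k] = acc;
    }
    return X;
}

// Real factor s minimizing sum |ref[i] - s * core[i]|^2, i.e. what the
// core's output must be multiplied by to best match the reference.
double estimate_scale(const std::vector<cplx> &ref, const std::vector<cplx> &core) {
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < ref.size() && i < core.size(); ++i) {
        num += std::real(ref[i] * std::conj(core[i]));
        den += std::norm(core[i]);  // |core[i]|^2
    }
    return num / den;
}
```

Passing the estimate through std::exp2(std::round(std::log2(s))) snaps it to the nearest power of two, which is the form of every factor in the table above.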

Return value of `printf`.

I noticed that the return values of printf and fclose are not checked, but those functions can fail (e.g., if the disk becomes full or a quota is reached). For maximal robustness, I recommend checking them.

p.s. This software has been packaged in Guix recently: https://issues.guix.gnu.org/57291.
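A minimal sketch of the checking suggested above, using only standard C I/O: fprintf() returns a negative value on an output error, ferror() reports errors from earlier writes on the stream, and fclose() can fail when flushing buffered data, which is often where a full disk first shows up. write_file_checked is a hypothetical helper, not a function in fftgen.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: write `text` to `path`, reporting failure instead of
// ignoring it. Returns true only if every step succeeded.
bool write_file_checked(const char *path, const char *text) {
    std::FILE *fp = std::fopen(path, "w");
    if (fp == nullptr) {
        std::perror(path);
        return false;
    }
    bool ok = std::fprintf(fp, "%s", text) >= 0;  // negative return means an output error
    if (std::ferror(fp))                          // catches errors from earlier writes too
        ok = false;
    // fclose() flushes buffered data, so a full disk may only show up here.
    if (std::fclose(fp) != 0)
        ok = false;
    if (!ok)
        std::fprintf(stderr, "%s: write failed\n", path);
    return ok;
}
```

Since the generator writes many files, returning a status and letting the caller abort (rather than exiting deep inside a helper) keeps the error handling in one place.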

unexpected output

Hello!

I ran a testbench with random input data and found that about half the time the output doesn't match the expected result (ignoring some rounding error). In some runs the output data matches the expected DFT result, while in others the output samples alternate in the form "right-wrong-right-wrong-...".

If I calculate the difference between the output and the expected value for the wrong samples, the abs() is always about 16384 (2^14, roughly the maximum value of a 14-bit unsigned or 15-bit signed number), which makes me suspect an overflow somewhere, but I'm not really sure.

Here is a rough dump of the first values in a test example, where the complex numbers on the left are the output samples and the ones on the right are the expected values:

d, r = ((-21120+44291j), (-21120+44291j)) (err = 0.0)
d, r = ((-77852-20660j), (-77855-37045j)) (err = 16385.00027464144)
d, r = ((-30079+14j), (-30078+14j)) (err = 1.0)
d, r = ((-33996-5932j), (-43099+7690j)) (err = 16383.634914145274)
d, r = ((29684-18098j), (29684-18097j)) (err = 1.0)
d, r = ((-21408-9626j), (-6272-15895j)) (err = 16382.883049085103)
d, r = ((31267+13250j), (31268+13250j)) (err = 1.0)
d, r = ((283+31702j), (-15786+28506j)) (err = 16383.747343022596)
d, r = ((10194-12038j), (10194-12039j)) (err = 1.0)
d, r = ((-6952-10386j), (4633+1200j)) (err = 16384.371242131936)
d, r = ((-27051+23808j), (-27052+23807j)) (err = 1.4142135623730951)
d, r = ((-5240-9459j), (-8436-25528j)) (err = 16383.747343022596)
d, r = ((-6422+174j), (-6422+174j)) (err = 0.0)
d, r = ((-656+28250j), (-6926+43386j)) (err = 16383.265730616713)
d, r = ((18130+40819j), (18130+40819j)) (err = 0.0)
d, r = ((-4588-12407j), (9036-21510j)) (err = 16385.297830677355)

Something that might also help: if I use real inputs (with the lowest 16 bits, which correspond to the imaginary component, set to 0), the algorithm works flawlessly.

I'm leaving the testcase here and also an example with the input/output data and waveforms: testcase.tar.gz

Let me know if I can help with more info or by checking something else!
Thanks!
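The err column in the dump above is |d - r|. A small checker that flags every index whose error exceeds a rounding tolerance makes an alternating failure pattern easy to spot over a full frame. mismatches is a hypothetical helper, and the tolerance is a parameter, not a value taken from the core.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Indices i where |d[i] - r[i]| exceeds `tol` (e.g. a couple of LSBs of rounding).
std::vector<std::size_t> mismatches(const std::vector<cplx> &d,
                                    const std::vector<cplx> &r, double tol) {
    std::vector<std::size_t> bad;
    const std::size_t n = d.size() < r.size() ? d.size() : r.size();
    for (std::size_t i = 0; i < n; ++i)
        if (std::abs(d[i] - r[i]) > tol)
            bad.push_back(i);
    return bad;
}
```

With a tolerance of 2, the rows of the dump above that get flagged are exactly the odd-numbered ones (counting from 0), matching the "right-wrong" alternation described in the report.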

longmpy declared as a wire instead of a reg in hwbfly.v

Hello,

When generating the Verilog code (with "./fftgen -f 16", for instance), longmpy in the "sw/fft-core/hwbfly.v" file should be declared as a reg and not a wire (line 238), since it is used in a procedural block later on (line 277).

Thanks!
