A floating-point matrix multiplication implemented in hardware.
This repo describes the implementation of a floating-point matrix multiplication on a PYNQ-Z1 development board.
The hardware module implements the matrix product C = AB, where A, B, and C are 128 x 128 matrices.
This hardware accelerator provides a 2.8x speedup compared to NumPy. Note that NumPy is vectorized and presumably uses a more efficient algorithm than the naive one implemented in this example.
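For reference, the naive algorithm mentioned above can be sketched in Python. This is only an illustration of the O(n^3) triple loop, not the actual HLS kernel in [hls]:

```python
import numpy as np

def naive_matmul(A, B):
    """Naive O(n^3) matrix product C = AB (illustration only)."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m), dtype=A.dtype)
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i, p] * B[p, j]
            C[i, j] = acc
    return C
```

NumPy's `A @ B` dispatches to an optimized BLAS routine, which is why the naive loop is a much weaker software baseline.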
A 3.5x speedup can be achieved by using the 64-bit AXI-Stream interface instead, at the cost of additional logic to pack and unpack the matrices.
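The pack/unpack idea can be illustrated with NumPy on the host side (a hypothetical sketch; in the design the packing is done by extra hardware logic, and the array names here are made up):

```python
import numpy as np

# A 128x128 float32 matrix, flattened row-major as it would be streamed.
A = np.arange(128 * 128, dtype=np.float32).reshape(128, 128)

# Pack: reinterpret each pair of adjacent 32-bit floats as one 64-bit word,
# halving the number of AXI-Stream beats needed to transfer the matrix.
packed = A.reshape(-1).view(np.uint64)  # 8192 words instead of 16384

# Unpack: reinterpret the words back to float32 and restore the shape.
unpacked = packed.view(np.float32).reshape(A.shape)
```

The round trip is lossless because packing only reinterprets the bits; no conversion takes place.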
- [hls] contains the accelerator C++ source code for high-level synthesis.
- [boards/Pynq-Z1/matmult] contains the Vivado project.
- [notebooks] contains the Jupyter Notebook to evaluate the design. This notebook uses the Xilinx/PYNQ Python library.
- [overlay] contains the generated hardware files. These files were generated using `vivado` and `vivado_hls` version 2019.2.
- Copy overlay/matmult to the PYNQ-Z1 device.
- Copy notebooks/matmult.ipynb to the Jupyter notebooks directory on the PYNQ-Z1 device.
Requires Xilinx `vivado` and `vivado_hls` version 2019.2. If necessary, a different version can be configured in the Tcl scripts: script_solution1.tcl and matmult.tcl.
- Build the `matmult` module:

  ```shell
  cd hls
  make clean && make solution1
  ```
- Build the Vivado project:

  ```shell
  cd boards/Pynq-Z1/matmult
  make clean && make all
  ```
- This implementation borrows ideas and code from this application note, and the PYNQ hello world example.
- Schematic of matrix multiplication taken from Wikipedia