Microprocessor-Application Term project (Park Jonghyuk, Ko Ryeowook)
- Pure-SW implementation (No HW block allowed to accelerate the function)
- Reproducibility
- Accuracy (NSR)
- Performance (speedup relative to the reference)
- Novelty
- Reference C code of QR Decomposition, LDPC Decoding algorithm
- Benchmarking C code
- Accuracy and time
- Understanding of reference C code
- Debugged down to the assembly level to find the parts that take the most time.
- C programming language
- Vivado 2017.4, Vivado SDK
- ZYNQ Z7-20
- Loop optimization (Loop order change, loop merge) - best performance improvement
- Declared new variables in every loop
- Minimize access to external global H arrays
- Used temporary variables
- Replaced multiplication and division operations with shift operations
- Took the reciprocal of the divisor and converted the division into a multiplication
- Used temporary variables (best performance improvement)
- Used a faster "sqrt()" algorithm (floating-point square root approximation)
- Loop unrolling, loop order change
- Tried loop tiling, but it gave little performance improvement.
- More details : 16team_final.pdf
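
The reciprocal and square-root items above can be illustrated with a short sketch. This is not the project's code: `approx_sqrtf` and `normalize` are hypothetical names, the bit-level approximation assumes IEEE 754 single-precision floats, and the single Newton refinement step is an assumption chosen to keep the relative error at roughly 0.2% or below.

```c
#include <stdint.h>
#include <string.h>

/* Approximate sqrt(x) for positive, normal IEEE 754 floats.
 * Shifting the bit pattern right by one roughly halves the exponent;
 * adding 0x1FC00000 (127 << 22) restores half of the exponent bias.
 * One Newton step then refines the estimate. Hypothetical helper. */
static float approx_sqrtf(float x)
{
    uint32_t bits;
    float y;

    if (x <= 0.0f) {
        return 0.0f;              /* sketch only: no NaN handling */
    }
    memcpy(&bits, &x, sizeof bits);
    bits = (bits >> 1) + 0x1FC00000u;
    memcpy(&y, &bits, sizeof y);
    return 0.5f * (y + x / y);    /* one Newton iteration */
}

/* Scale v to unit length using one division instead of n divisions.
 * The sum of squares is kept in a local variable so the compiler can
 * hold it in a register. Hypothetical helper. */
static void normalize(float *v, int n)
{
    float sum = 0.0f;
    float inv;
    int i;

    for (i = 0; i < n; ++i) {
        sum += v[i] * v[i];
    }
    if (sum <= 0.0f) {
        return;                   /* leave a zero vector unchanged */
    }
    inv = 1.0f / approx_sqrtf(sum);   /* reciprocal computed once */
    for (i = 0; i < n; ++i) {
        v[i] *= inv;              /* multiply instead of divide */
    }
}
```

Trading an exact `sqrt()` for an approximation costs accuracy, so a change like this would need to be checked against the NSR target.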
- 5.44 times faster than the reference with NSR of -inf [dB]
- 2.16 times faster than the reference with NSR of -57.682 [dB]
- More details : 16team_final.pdf
- SW implementation on ZYNQ Z7-20
- Usage of Vivado SDK
- Improved my C programming skills
- Learned ways to optimize C code
- Cache optimization
- Loop optimization
- Assembly level coding
- Temporary variables
- Understanding of assembly language (+ Neon)
- Read IEEE articles related to software optimization
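
As an illustration of the loop-order change and temporary-variable techniques listed above, the following sketch compares two orderings of a small matrix multiply. It is a generic example, not the project's QR or LDPC kernel; `N`, `matmul_ijk`, and `matmul_ikj` are hypothetical names.

```c
#define N 4

/* Baseline i-j-k order: the inner loop reads b[k][j] down a column,
 * so consecutive accesses are N floats apart in memory. */
static void matmul_ijk(float a[N][N], float b[N][N], float c[N][N])
{
    int i, j, k;

    for (i = 0; i < N; ++i) {
        for (j = 0; j < N; ++j) {
            float acc = 0.0f;             /* temporary accumulator */
            for (k = 0; k < N; ++k) {
                acc += a[i][k] * b[k][j];
            }
            c[i][j] = acc;
        }
    }
}

/* Reordered i-k-j: the inner loop walks b[k][*] and c[i][*] along
 * rows (unit stride), and a[i][k] is loaded once into a temporary. */
static void matmul_ikj(float a[N][N], float b[N][N], float c[N][N])
{
    int i, j, k;

    for (i = 0; i < N; ++i) {
        for (j = 0; j < N; ++j) {
            c[i][j] = 0.0f;
        }
        for (k = 0; k < N; ++k) {
            const float aik = a[i][k];    /* hoisted out of the j loop */
            for (j = 0; j < N; ++j) {
                c[i][j] += aik * b[k][j];
            }
        }
    }
}
```

Both functions add the products for each output element in the same order, so they produce identical results. The reordered version only changes the memory access pattern, which tends to matter more as the matrices outgrow the cache.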