If you use this repository, we would appreciate a citation for the following article:
@ARTICLE{9861394,
author={Li, Zhen and Zheng, Su and Zhang, Jide and Lu, Yao and Gao, Jingbo and Tao, Jun and Wang, Lingli},
journal={IEEE Transactions on Very Large Scale Integration (VLSI) Systems},
title={Adaptable Approximate Multiplier Design Based on Input Distribution and Polarity},
year={2022},
volume={30},
number={12},
pages={1813-1826},
doi={10.1109/TVLSI.2022.3197229}}
This repository contains:
- software: the MATLAB code of the optimization method.
- multipliers: Verilog models of reproduced multipliers and generated multipliers.
- ApproxFlow: a toolbox to evaluate the DNN accuracy with the approximate multiplier.
- accelerators: three DNN accelerators with unsigned 8-bit multipliers.
- fir: an adaptive least mean square (LMS)-based finite impulse response (FIR) filter implemented in Python.
- scripts: the scripts to synthesize multipliers and accelerators with the Arizona State Predictive PDK (ASAP) 7nm process library in Synopsys Design Compiler (DC).
The goal of the method is to generate approximate multipliers based on the data distributions extracted from the target application with consideration of the input polarity.
The 'software' folder contains the MATLAB code of the method. Follow these steps to generate multipliers:
- Step-1: select the unsigned multiplier or the signed multiplier: sign = 0 or 1 in 'LogicCompress.m'.
- Step-2: decide the number of rows of the partial products $l$ to be compressed.
- Step-3: extract the data distributions from the target application.
- Step-4: combine the data distributions from Step-3 in 'LogicCompress.m' (default: uniform).
- Step-5: run 'LogicCompress.m' to generate '.mat'.
- Step-6: find a control parameter $\lambda$ for a given desired percent reduction of area $R$ using 'findLamb.m'.
- Step-7: run 'GA.m' to solve the optimization objective and directly generate Verilog and C models of multipliers.
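As an illustration of Step-3, the following is a minimal Python sketch of how an empirical distribution of unsigned 8-bit operands could be collected from an application trace. The function name `operand_distribution` and the sample values are hypothetical, and the format that 'LogicCompress.m' actually expects for the distributions is not described here, so treat this only as a starting point.

```python
from collections import Counter


def operand_distribution(samples, bits=8):
    """Return the empirical probability of each unsigned `bits`-wide value.

    `samples` is any iterable of integer operand values observed in the
    target application (hypothetical input; not a format used by the repo).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    size = 1 << bits
    counts = Counter(samples)
    if any(not 0 <= value < size for value in counts):
        raise ValueError("sample outside the unsigned %d-bit range" % bits)
    total = len(samples)
    # One probability per representable value, in increasing order.
    return [counts.get(value, 0) / total for value in range(size)]


# Example: a tiny made-up trace of multiplier operands.
dist = operand_distribution([0, 0, 1, 255])

# The uniform distribution, which the steps above name as the default.
uniform = [1 / 256] * 256
```

The two operands of a multiplier generally have different distributions (for example, weights and activations in a DNN), so in practice one distribution would be collected per operand.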
By modifying the number of rows of the partial products to be compressed, reversing the input polarity, or adding different
V. Mrazek, R. Hrbacek, Z. Vasicek and L. Sekanina, "EvoApprox8b: Library of Approximate Adders and Multipliers for Circuit Design and Benchmarking of Approximation Methods," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017, pp. 258-261, doi: 10.23919/DATE.2017.7926993.
V. Mrazek, Z. Vasicek, L. Sekanina, H. Jiang and J. Han, "Scalable Construction of Approximate Multipliers With Formally Guaranteed Worst Case Error," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 11, pp. 2572-2576, Nov. 2018, doi: 10.1109/TVLSI.2018.2856362.
- DRUM
S. Hashemi, R. I. Bahar and S. Reda, "DRUM: A Dynamic Range Unbiased Multiplier for approximate applications," 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 2015, pp. 418-425, doi: 10.1109/ICCAD.2015.7372600.
Available at: https://github.com/scale-lab/DRUM and https://github.com/phyzhenli/DRUM.
Multipliers for three different-scale quantized DNNs including LeNet, AlexNet, and VGG16.
Multipliers with different settings of l and
Multipliers for the half-normal distribution applications.
Multipliers for an adaptive least mean square (LMS)-based finite impulse response (FIR) filter.
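For readers unfamiliar with the LMS-based FIR filter used as a target application, the following is a minimal floating-point sketch of the standard LMS update with a pluggable multiplier. It is not the code in the 'fir' folder: the function `lms_fir`, its `mul` parameter, and the test signal are assumptions made for illustration, and an approximate multiplier would normally operate on quantized integers rather than floats.

```python
import random


def lms_fir(x, d, taps, mu, mul=lambda a, b: a * b):
    """Adapt an FIR filter with the LMS rule and return (weights, errors).

    x   : input samples
    d   : desired output samples
    mu  : step size
    mul : multiplier used in the filter output (hypothetical hook where an
          approximate multiplier model could be substituted)
    """
    w = [0.0] * taps
    window = [0.0] * taps  # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        window = [xn] + window[:-1]
        y = sum(mul(wi, xi) for wi, xi in zip(w, window))  # filter output
        e = dn - y                                          # estimation error
        w = [wi + mu * e * xi for wi, xi in zip(w, window)]  # LMS update
        errors.append(e)
    return w, errors


# Identify a made-up 3-tap system from noise-free data.
random.seed(0)
h = [0.5, -0.3, 0.1]
x = [random.uniform(-1, 1) for _ in range(2000)]
d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
     for n in range(len(x))]
w, errors = lms_fir(x, d, taps=3, mu=0.05)
```

With the exact multiplier the weights `w` converge towards `h`; passing a different `mul` shows how multiplier error affects the adaptation.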
- AC
S. Venkatachalam and S. -B. Ko, "Design of Power and Area Efficient Approximate Multipliers," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 5, pp. 1782-1786, May 2017, doi: 10.1109/TVLSI.2016.2643639.
- CR
C. Liu, J. Han and F. Lombardi, "A low-power, high-performance approximate multiplier with configurable partial error recovery," 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2014, pp. 1-4, doi: 10.7873/DATE.2014.108.
- KMap
P. Kulkarni, P. Gupta and M. Ercegovac, "Trading Accuracy for Power with an Underdesigned Multiplier Architecture," 2011 24th International Conference on VLSI Design, 2011, pp. 346-351, doi: 10.1109/VLSID.2011.51.
- OU
C. Chen, S. Yang, W. Qian, M. Imani, X. Yin and C. Zhuo, "Optimally Approximated and Unbiased Floating-Point Multiplier with Runtime Configurability," 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 2020, pp. 1-9.
- RoBA
R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha and M. Pedram, "RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 25, no. 2, pp. 393-401, Feb. 2017, doi: 10.1109/TVLSI.2016.2587696.
- SDLC
I. Qiqieh, R. Shafik, G. Tarawneh, D. Sokolov and A. Yakovlev, "Energy-efficient approximate multiplier design using bit significance-driven logic compression," Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017, 2017, pp. 7-12, doi: 10.23919/DATE.2017.7926950.
I. Haddadi, I. Qiqieh, R. Shafik, F. Xia, M. Al-hayanni and A. Yakovlev, "Run-time Configurable Approximate Multiplier using Significance-Driven Logic Compression," 2021 IEEE 39th International Conference on Computer Design (ICCD), 2021, pp. 117-124, doi: 10.1109/ICCD53106.2021.00029.
- TOSAM
S. Vahdat, M. Kamal, A. Afzali-Kusha and M. Pedram, "TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable Approximate Multiplier," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 27, no. 5, pp. 1161-1173, May 2019, doi: 10.1109/TVLSI.2018.2890712.
- PPAM
G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris and K. Pekmestzi, "Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 24, no. 10, pp. 3105-3117, Oct. 2016, doi: 10.1109/TVLSI.2016.2535398.
- Wallace
An exact multiplier implemented with the Wallace tree technique.
- DesignW
An exact multiplier implemented using the Verilog star ('*') operator, which is usually built from the DesignWare library in the Synopsys Design Compiler tool.
ApproxFlow is a toolbox to evaluate the DNN accuracy with the approximate multiplier. In ApproxFlow, each approximate multiplier is described by a look-up table. A DNN is represented by a directed acyclic graph (DAG), where each vertex denotes a DNN layer and the edges indicate the data flow. When a vertex in the DAG is executed, its dependencies are executed automatically.
Available at: https://github.com/FDU-ME-ARC/ApproxFlow.
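The two ideas above, a multiplier described by a look-up table and a DAG whose vertices execute their dependencies on demand, can be sketched in a few lines of Python. This is not ApproxFlow's API: `build_lut`, `truncated_multiply`, and `Node` are hypothetical names, and the truncating multiplier is a stand-in rather than one of the multipliers in this repository.

```python
def build_lut(multiply, bits=8):
    """Tabulate an unsigned multiplier as lut[a][b] for all operand pairs."""
    size = 1 << bits
    return [[multiply(a, b) for b in range(size)] for a in range(size)]


def truncated_multiply(a, b):
    """Stand-in approximate multiplier: drops the 2 LSBs of each operand."""
    return ((a >> 2) * (b >> 2)) << 4


class Node:
    """A DAG vertex; running it first runs the vertices it depends on."""

    def __init__(self, name, op, inputs=()):
        self.name = name
        self.op = op
        self.inputs = tuple(inputs)
        self._done = False
        self._value = None

    def run(self):
        if not self._done:
            # Dependencies are evaluated automatically, each at most once.
            args = [node.run() for node in self.inputs]
            self._value = self.op(*args)
            self._done = True
        return self._value


lut = build_lut(truncated_multiply)

# A one-layer "network": a dot product whose products come from the table.
x = Node("x", lambda: [3, 200, 17])
w = Node("w", lambda: [5, 100, 8])
dot = Node("dot",
           lambda xs, ws: sum(lut[a][b] for a, b in zip(xs, ws)),
           inputs=[x, w])
result = dot.run()
exact = sum(a * b for a, b in zip([3, 200, 17], [5, 100, 8]))
```

Comparing `result` with `exact` shows the error introduced by the tabulated multiplier; because the table is indexed by operand values, any 8-bit multiplier model can be swapped in without changing the graph.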
- Systolic Array
Systolic Array (SA) is a popular accelerator architecture adopted by the Google Tensor Processing Unit (TPU). We implement a 16×16 SA. The top module is systolic_array and the multiplier can be changed in 'multiplier.v'. The names of the clock and the reset signals are 'clk' and 'rst_n' respectively.
- Systolic Cube
Systolic Cube (SC) is an efficient accelerator of convolution operations in DNNs. The top module is systolic_cube_without_fifo and the multiplier can be changed in 'mad_unit_test.v'. The names of the clock and the reset signals are 'iClk' and 'iRst' respectively.
- TASU
TASU is a DNN accelerator for DoReFa-Net. The top module is conv0 and the multiplier can be changed in 'mad_unit_test.v'. The names of the clock and the reset signals are 'clk' and 'rst_n' respectively.
The files 'constraints_comb.tcl' and 'constraints_seq.tcl' are used for the synthesis of multipliers and accelerators respectively.