# Multiplierless Design of Low-Complexity and High-Speed DSP Systems



Paulo Flores and José Monteiro, INESC-ID/IST TU Lisbon, Portugal Levent Aksoy, INESC-ID, Portugal Eduardo Costa, UCPel, Brazil



#### **Multiplierless Design of Constant Multiplications**

Multiplication of data samples with constant coefficients is a ubiquitous operation and performance bottleneck in Digital Signal Processing (DSP) systems such as, digital Finite Impulse Response (FIR) filters and linear DSP transforms.



### **Optimization of Gate-Level Area Under a Delay Constraint**

The delay in constant multiplications is generally defined as the maximal number of operations in series, known as the number of adder-steps.

The **problem of optimizing gate-level area under a delay constraint** is to find a set of **A-operations** that does not violate the delay constraint and that yields a design with optimal gate-level area [3].

The high-level algorithms [3,4] can explore the tradeoff between area and delay in constant multiplications by changing the delay constraint.

(a) Multiple Constant Multiplications (MCM) frequently occur in FIR filters; (b) Constant Matrix Vector Multiplication (CMVM) exists in linear DSP transforms.

Since the constant coefficients are determined beforehand by DSP algorithms, constant multiplications can be realized using only **adders/subtracters and shifts**.

A-operation: w = A(u,v) =  $|u << l_1 + (-1)^s v << l_2| >> r = |2^{l_1}u + (-1)^s 2^{l_2}v|2^{-r}$ w: output u, v: inputs  $l_1$ ,  $l_2$ : left shifts, r: right shift, s: sign (0 or 1)

The **fundamental optimization problem** is to find the minimum number of **A-operations** that realize the constant multiplications [1,2].

The high-level algorithms [1,2] aim to **maximize the sharing of partial products** among the constant multiplications.



# **Optimization of Gate-Level Area In High-Throughput Constant Multiplications**

The throughput in multiplierless design of constant multiplications is increased using pipeline registers.

The **optimization problem** is to find a set of **A-operations** that yields a pipelined design with optimal area at gate-level [5].



Pipelined designs of 59x and 89x: (a) Exact GB algorithm [2]; (b) The algorithm of [5].

## **Digit-Serial Design of Constant Multiplications**

In digit-serial design, the data is divided into **d** bits and is processed one at a time.





Shift-adds designs of 59x and 89x: (a) Digit-based recoding; (b) Exact Common Subexpresion Elimination (CSE) algorithm [1]; (c) Exact Graph-Based (GB) algorithm [2];

Optimization of the number of operations does not guarantee a design with optimal area at gate-level.

## **Optimization of Gate-Level Area in Constant Multiplications**

Each addition and subtraction operation realizing a constant multiplication has a different gate-level implementation cost [3].



(a) An addition  $u + 2^{l_2}v$ ; (b) A subtraction  $2^{l_1}u - v$ ; (c) A subtraction  $(u - v)2^{-r}$  all under unsigned input.

The gate-level area optimization problem is to find a set of **A-operations**, that violds a design with optimal gate level area, realizing constant multiplications [2]

The digit-serial operations when the digit size *d* is 3: (a) an addition operation; (b) a subtration operation; (c) a left shift by 4 times.

The **problem of optimizing gate-level area in digit-serial design** is to find a set of **A-operations** that yields a digit-serial design with optimal gate-level area [6,7].

#### Design of a digit-serial FIR filter under different digit sizes.

| d  | Shift-Adds Design [6] |            |           |            |             | Design with Constant Multipliers [6] |            |           |            |             |
|----|-----------------------|------------|-----------|------------|-------------|--------------------------------------|------------|-----------|------------|-------------|
|    | area (mm²)            | delay (ns) | lat. (ns) | power (µW) | energy (fJ) | area (mm²)                           | delay (ns) | lat. (ns) | power (µW) | energy (fJ) |
| 1  | 201,7                 | 5,5        | 190,8     | 0,503      | 95,947      | 252,0                                | 4,0        | 139,0     | 0,619      | 86,010      |
| 2  | 214,8                 | 6,2        | 110,9     | 0,593      | 65,752      | 264,8                                | 5,8        | 104,2     | 0,706      | 73,579      |
| 4  | 228,9                 | 6,9        | 62,5      | 0,694      | 43,347      | 269,7                                | 6,9        | 62,3      | 0,779      | 48,516      |
| 8  | 281,1                 | 7,7        | 38,5      | 0,923      | 35,536      | 377,9                                | 12,0       | 60,0      | 1,023      | 61,380      |
| 16 | 322,9                 | 9,9        | 9,9       | 1,060      | 10,494      | 439,0                                | 9,0        | 36,0      | 1,220      | 43,920      |

# Acknowledgment

This work was partially supported by the Portuguese Foundation for Science and Technology (FCT) research project "Multicon - Architectural Optimization of DSP Systems with Multiple Constants Multiplications" PTDC/EIA-EIA/103532/2008. http://algos.inesc-id.pt/multicon/

#### yields a design with optimal gate-level area, realizing constant multiplications [3].



Summary of gate-level area results: (a) on MCM instances {3]; (b) on CMVM instances [4].

#### References

[1] L. Aksoy, E. Costa, P. Flores, J. Monteiro 'Exact and Approximate Algorithms for the Optimization of Area and Delay in Multiple Constant Multiplications', IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 6, pp. 1013-1026, 2008.

[2] L. Aksoy, E. O. Gunes, P. Flores 'Search Algorithms for the Multiple Constant Multiplications Problem: Exact and Approximate', Elsevier Journal on Microprocessors and Microsystems: Embedded Hardware Design, vol. 34, no. 5, pp. 151-162, 2010.

[3] L. Aksoy, E. Costa, P. Flores, J. Monteiro 'Finding the Optimal Tradeoff Between Area and Delay in Multiple Constant Multiplications', Elsevier Journal on Microprocessors and Microsystems: Embedded Hardware Design, vol. 35, no.8, pp. 729-741, 2011.

[4] L. Aksoy, E. Costa, P. Flores, J. Monteiro 'Optimization Algorithms for the Multiplierless Realization of Linear Transforms', ACM Transactions on Design Automation of Electronic Systems, vol. 17, no. 1, Article 3, 2012.
[5] L. Aksoy, E. Costa, P. Flores, J.Monteiro 'Optimization of Gate-Level Area in High Throughput Multiple Constant Multiplications', European Conference on Circuit Theory and Design, pp. 588-591, 2011.

[6] L. Aksoy, C. Lazzari, E. Costa, P. Flores, J. Monteiro 'High-Level Algorithms for the Optimization of Gate-Level Area in Digit-Serial Multiple Constant Multiplications', Elsevier Integration, the VLSI Journal, to appear, 2012.
[7] L. Aksoy, C. Lazzari, E. Costa, P. Flores, J. Monteiro 'Design of Digit-Serial FIR Filters: Algorithms, Architectures, and a CAD Tool', IEEE Transactions on Very Large Scale Integration Systems, to appear, 2012.