COPYRIGHT NOTICE (as suggested by IEEE)
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Thesis, Books and Books Chapters

6 references, last updated Tue Jan 20 17:02:18 2015

Aksoy-SpringerBookChap12.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"VLSI-SoC: The Advanced Research for Systems on Chip", volume 379 of IFIP Advances in Information and Communication Technology, chapter Multiplierless Design of Linear DSP Transform, pages 73-93. Springer, S. Mir, C.-Y. Tsui, R. Reis and O.C.S. Choy edition, July 2012. (doi:10.1007/978-3-642-32770-4_5)
Abstract
The last two decades have seen tremendous effort on the development of high-level algorithms for the multiplierless design of constant multiplications with data samples, i.e., using only addition, subtraction, and shift operations. Among different types of constant multiplications, the multiplication of a constant matrix with an input vector, i.e., the Constant Matrix-Vector Multiplication (CMVM) operation, is the most general case and occurs in many Digital Signal Processing (DSP) systems. However, it has received less attention than it deserves. Hence, this chapter addresses the problem of minimizing the number of addition and subtraction operations in the multiplierless realization of a CMVM operation and introduces a hybrid algorithm. This chapter also describes how the hybrid algorithm can be modified to handle a delay constraint. The experimental results on a comprehensive set of instances show the efficiency of the hybrid algorithms, at both high-level and gatelevel, in comparison to previously proposed algorithms.

Keywords: DSP Transforms; SCM, MCM, CAVM, CMVM; Hybrid Algorithms

Lazzari-SpringerBookChap11.pdf Cristiano Lazzari, Jorge Fernandes, Paulo Flores, and José Monteiro.
"Integrated Circuit and System Design. Power and Timing Modeling, Optimization, and Simulation", volume 6448 of Lecture Notes in Computer Science, chapter An Efficient Low Power Multiple-Value Look-up Table Targeting Quaternary FPGAs, pages 84-93. Springer, René van Leuken and Gilles Sicard edition, September 2011. (doi:10.1007/978-3-642-17752-1_9)
Abstract
FPGA structures are widely used as they enable early time-to-market and reduced non-recurring engineering costs in comparison to ASIC designs. Interconnections play a crucial role in modern FPGAs, because they dominate delay, power and area. Multiple-valued logic allows the reduction of the number of interconnections in the circuit, hence can serve as a mean to effectively curtail the impact of interconnections. In this work we propose a new look-up table structure based on a low-power high-speed quaternary voltage-mode device. The most important characteristics of the proposed architecture are that it is a voltage-mode structure, which allows reduced power consumption, and it is implemented with a standard CMOS technology. Our quaternary implementation overcomes previous proposed techniques with simple and efficient CMOS structures. Moreover, results show significant reductions on power consumption and timing in comparison to binary implementations with similar functionality.

Keywords: Multiple-value Logic, Quaternary Logic, Look-up Tables, FPGAs, Standard CMOS Technology

Aksoy-Emicro09.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Advanced Topics on VLSI Design", chapter Optimization Algorithms for Multiple Constant Multiplications, pages 71-99. Number 2 in Inovation Series. UFRGS, Ricardo Reis edition, January 2009.
Abstract
We describe efficient parallel architecture implemententations of digital processing systems, namely FIR filters, that require the multiplication of each input sample by a set of constant coefficients. These architectures allow for significant reductions in hardware, and consequently power, by sharing the partial products of the input among the set of multiplications. We present an exact algorithm for common subexpression elimination that finds the optimum sharing of partial terms in Multiple Constant Multiplications (MCM). We model this problem as a Boolean network that covers all possible partial terms which may be used to generate the set of coefficients in the MCM instance. This problem is cast into a 0-1 Integer Linear Programming (ILP) problem by requiring that the single output of this network is asserted while minimizing the number of gates representing operations in the MCM implementation that evaluate to one. A SAT-based 0-1 ILP solver is used to obtain the exact solution. We argue that for many real problems the size of the problem is within the capabilities of current SAT solvers. Because performance is often a primary design parameter, we describe how this algorithm can be modified to target the minimum area solution under a user-specified delay constraint. Additionally, we describe approximate algorithms based on the exact approach with extremely competitive results. We have applied these algorithms on the design of digital filters and present a comprehensive set of results, that evaluate ours and existing approximation schemes against exact solutions.

Keywords: FIR Filter Design, Number Representation, Delay Optimization, Multiple Constant Multiplications

Silva-SCEE07.pdf João M. S. Silva, Jorge Fernández Villena, Paulo Flores, and L. Miguel Silveira.
"Scientific Computing in Electrical Engineering", volume 11 of Mathematics in Industry, chapter Outstanding Issues in Model Order Reduction, pages 139-152. Springer Verlag, Berlin Heidelberg, Gabriela Ciuprina and Daniel Ioan editors edition, May 30, 2007. (doi:10.1007/978-3-540-71980-9_13)
Abstract
With roots dating back to many years ago and applications in a wide variety of areas, model order reduction has emerged in the last few decades as a crucial step in the simulation, control, and optimization of complex physical systems. Reducing the order or dimension of models of such systems, is paramount to enabling their simulation and verification. While much progress has been achieved in the last few years regarding the robustness, efficiency and applicability of these techniques, certain problems of relevance still pose difficulties or renewed challenges that are not satisfactorily solved with the existing approaches. Furthermore, new applications for which dimension reduction is crucial, are becoming increasingly relevant, raising new issues in the quest for increased performance.

Keywords: Model order reduction, massively coupled systems, orthogonal projection, parametric systems, circuit simulation.

Flores-PhD01.pdf Paulo Flores.
"Models and Algorithms for Optimization Problems in Digital Circuits Testing". PhD thesis, Instituto Superior Técnico - Universidade Técnica de Lisboa, December 11, 2001.
Abstract
The field of automatic test pattern generation (ATPG) presents a large number of challenging optimization problems of key significance that impact testing time, built-in self-test (BIST), power dissipation, among others. Unfortunately, the vast majority of these problems are most often solved using heuristic approaches that do not guarantee an optimum solution. Discrete algorithms, in particular satisfiability search algorithms (SAT), are a promising technique for solving optimization problems cast as integer linear programming (ILP) instances, however, for most ATPG problems no formal optimization models existed, so far. In this dissertation we derive several optimization models for ATPG problems concerning test set compaction, BIST and power dissipation. We present models for test set compaction which minimize the total number of test patterns that detect all detectable faults in a combinational circuit. We define a new optimization model for computing test patterns with don't cares, that identifies test patterns with the least number of specified input assignments that detect a target fault. We propose a model to identify a minimal BIST test generator circuit, which detects all detectable faults, assuming that some primary circuit inputs can be declared equivalent for testing purposes. Finally, we develop a model for optimum pattern sequence reordering and don't care assignment that impacts directly the power dissipation during test set application. We have implemented all the models and algorithms that have reasonable representation size and present results using the ISCAS and IWLS benchmark circuits which confirm the practical applicability of our models.

Keywords: Automatic test pattern generation (ATPG); Optimization models; Satisfiability (SAT); Integer Linear Programming (ILP); Test compaction/compression; Built-In Self-Test (BIST); Power reduction

Flores-MSc93.pdf Paulo Flores.
"Especificação funcional de sistemas electrónicos digitais em ambiente de síntese". Master's thesis, Instituto Superior Técnico - Universidade Técnica de Lisboa, June 1993.
Abstract
The project of integrated circuits in a computer aided design environment traditionally begins with the use of a schematic editor. In this way, a representation, at the logic level, of the circuit to be implemented is supplied to the production system. With the growing complexity and dimension of the circuits to be designed, this method tends to be time consuming and error prone. Only the specification of circuits in levels of abstraction higher than the logic level, allows the designer to deal with the complexity of the actual circuits. The use of hardware description languages is the most common way to specify circuits at an higher level of abstraction, making possible the description of the behavior of the circuit without concern about the details of the logic level implementation. This representation also permits a verification of its functionality (simulation) and the realization of ``an automatic transformation'' in order to obtain the circuit represented as a set of cells of a given library (synthesis). Nevertheless, the performance of the synthesized circuit depends both on the quality of the synthesis tool and on the style of description used. In this work a methodology for the design of digital circuits in a synthesis environment is presented, using the IEEE standard language (VHDL) for the specification of the circuit. Special attention is given to how a VHDL description should be written so that it can be synthesized, and so that the resulting circuit has the required performance. A subset of VHDL instructions which can be used in a synthesis environment is presented and a description style proposed which, although restrictive, it tries to ``guarantee'' the quality of the final circuit. The design methodology presented and the style of description proposed were validated through the development of a VHDL library and the design of a medium-sized digital circuit.

Keywords: Digital integrated circuits; Automatic synthesis; Hardware description languages; VHDL; Description styles for synthesis; Synthesized library of circuits

International Journals

20 references, last updated Thu Aug 18 17:12:03 2016

Aksoy-EURASIP16.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"A novel method for the approximation of multiplierless constant matrix vector multiplication". EURASIP Journal on Embedded Systems, 2016(1):1-11, 2016. (doi:10.1186/s13639-016-0033-y)
Abstract
Since human beings have limited perceptual abilities, in many digital signal processing (DSP) applications, e.g., image and video processing, the outputs do not need to be computed accurately. Instead, their computation can be approximated so that the area, delay, and/or power dissipation of the design can be reduced. This paper presents an approximation algorithm, called aura, for the multiplierless design of the constant matrix vector multiplication (CMVM) which is a ubiquitous operation in DSP systems. aura aims to tune the constants such that the resulting matrix leads to a CMVM design which requires the fewest adders/subtractors, satisfying the given error constraints. This paper also introduces its modified version, called aura-dc, which can reduce the delay of the CMVM operation with a small increase in the number of adders/subtractors. Experimental results show that the proposed algorithms yield significant reductions in the number of adders/subtractors with respect to the original realizations without violating the error constraints, and consequently lead to CMVM designs with less area, delay, and power dissipation. Moreover, they can generate alternative CMVM designs under different error constraints, enabling a designer to choose the one that fits best in an application.

Keywords: Approximate computing, constant matrix vector multiplication, multiplierless design, 0-1 integer linear programming, area and delay optimization, discrete cosine transform.

Neves-TVLSI15.pdf Nuno Neves, Nuno Sebastião, David Matos, Pedro Tomás, Paulo Flores, and Nuno Roma.
"Multicore SIMD ASIP for next-generation sequencing and alignment biochip platforms". IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 23(7):1287-1300, July 2015. (doi:10.1109/TVLSI.2014.2333757)
Abstract
Targeting the development of new biochip platforms capable of autonomously sequencing and aligning biological sequences, a new multicore processing structure is proposed in this manuscript. This multicore structure makes use of a shared memory model and multiple instantiations of a novel application-specific instruction-set processor (ASIP) to simultaneously exploit both fine and coarse-grained parallelism and to achieve high performance levels at low-power consumption. The proposed ASIP is built by extending the instruction set architecture of a synthesizable processor, including both general and special-purpose single-instruction multiple-data instructions. This allows an efficient exploitation of fine-grained parallelism on the alignment of biological sequences, achieving over 30x speedup when compared with sequential algorithmic implementations. The complete system was prototyped on different field-programmable gate array platforms and synthesized with a 90-nm CMOS process technology. Experimental results demonstrate that the multicore structure scales almost linearly with the number of instantiated cores, achieving performances similar to a quad-core Intel Core i7 3820 processor, while using 25x less energy.

Keywords: Application Specific Instruction-Set Architecture; Single-Instruction Multiple-Data; Multi-Core Architecture, Biological Sequences Alignment; Biochip Platforms.

Brito-TVLSI15.pdf Diogo Brito, Taimur Rabuske, Jorge Fernandes, Paulo Flores, and José Monteiro.
"Quaternary logic look-up table in standard CMOS". IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 23(2):306-316, February 2015. (doi:10.1109/TVLSI.2014.2308302)
Abstract
Interconnections are increasingly the dominant contributor to delay, area and energy consumption in CMOS digital circuits. Multiple-valued logic can decrease the average power required for level transitions and reduces the number of required interconnections, hence also reducing the impact of interconnections on overall energy consumption. In this paper, we propose a quaternary lookup table (LUT) structure, designed to replace or complement binary LUTs in field programmable gate arrays. The circuit is compatible with standard CMOS processes, with a single voltage supply and employing only simple voltage-mode structures. A clock boosting technique is used to optimize the switches resistance and power consumption. The proposed implementation overcomes several limitations found in previous quaternary implementations published so far, such as the need for special features in the CMOS process or power-hungry current-mode cells. We present a full adder prototype based on the designed LUT, fabricated in a standard 130-nm CMOS technology, able to work at 100 MHz while consuming 122 W. The experimental results demonstrate the correct quaternary operation and confirm the power efficiency of the proposed design.

Keywords: Field-Programmable Gate Array (FPGA); Lookup Table (LUT); Multiple-Valued Logic (MVL); Quaternary Logic; Standard CMOS Technology.

Aksoy-TSP15.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Exact and approximate algorithms for the filter design optimization problem". IEEE Transactions on Signal Processing, 63(1):142-154, January 2015. (doi:10.1109/TSP.2014.2366713)
Abstract
The filter design optimization (FDO) problem is For Revi defined as finding a set of filter coefficients which yields a filter design with minimum complexity, satisfying the filter constraints. It has received a tremendous interest due to the widespread application of filters. Assuming that the coefficient multiplications in the filter design are realized under a shift-adds architecture, the complexity is generally defined in terms of the total number of adders and subtractors. In this article, we present an exact FDO algorithm that can guarantee the minimum design complexity under the minimum quantization value, but can only be applied to filters with a small number of coefficients. We also introduce an approximate algorithm that can handle filters with a large number of coefficients using less computational resources than the exact FDO algorithm and find better solutions than existing FDO heuristics. We describe how these algorithms can be modified to handle a delay constraint in the shift-adds designs of the multiplier blocks and to target different filter constraints and filter forms. Experimental results show the effectiveness of the proposed algorithms with respect to prominent FDO algorithms and explore the impact of design parameters, such as the filter length, quantization value, and filter form, on the complexity and performance of filter designs.

Keywords: Finite Impulse Response (FIR) filters; Direct and trans- posed forms; Filter design optimization problem; Multiplierless design; Depth-first and local search methods; Delay reduction; Filter design and structures; DSP Algorithm implementation in hardware and software.

Aksoy-TODAES14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Multiplierless design of folded DSP blocks". ACM Transactions on Design Automation of Electronic Systems (TODAES), 20(1):14:1-14:24, November 2014. (doi:10.1145/2663343)
Abstract
This article addresses the problem of optimizing the implementation cost of the time-multiplexed constant multiplication (TMCM) operation which realizes the multiplication of an input variable by a single constant selected from a set of multiple constants at a time. It presents an efficient algorithm, called ORPHEUS, which finds a multiplierless TMCM design by sharing logic operators, i.e., adders, subtractors, adders/subtractors, and multiplexors (MUXes). Moreover, this article introduces folded design architectures for the digital signal processing (DSP) blocks, such as finite impulse response (FIR) filters and linear DSP transforms, and describes how these folded DSP blocks can be efficiently realized using TMCM operations optimized by ORPHEUS. Experimental results indicate that ORPHEUS can find better solutions than existing TMCM algorithms, yielding TMCM designs requiring less area. They also show that the folded architectures lead to alternative designs with significantly less area, but incurring an increase in latency and energy consumption, compared to the parallel architecture.

Keywords: Time-multiplexed constant multiplication; Folded architecture, multiplierless design; Area optimization; Finite Impulse Response (FIR) filter, Discrete Cosine Transform (DCT); Integer Cosine Transform (ICT); Folded architectures.

Aksoy-JCSSP14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"A tutorial on multiplierless design of FIR filters: Algorithms and architectures". Journal of Circuits, Systems, and Signal Processing, 33(6):1689-1719, June 2014. (doi:10.1007/s00034-013-9727-8)
Abstract
Finite impulse response (FIR) filtering is a ubiquitous operation in digital signal processing systems and is generally implemented in full custom circuits due to high-speed and low-power design requirements. The complexity of an FIR filter is dominated by the multiplication of a large number of filter coefficients by the filter input or its time-shifted versions. Over the years, many high-level synthesis algorithms and filter architectures have been introduced in order to design FIR filters efficiently. This article reviews how constant multiplications can be designed using shifts and adders/subtractors that are maximally shared through a high-level synthesis algorithm based on some optimization criteria. It also presents different forms of FIR filters, namely, direct, transposed, and hybrid and shows how constant multiplications in each filter form can be realized under a shift-adds architecture. More importantly, it explores the impact of the multiplierless realization of each filter form on area, delay, and power dissipation of both custom (ASIC) and reconfigurable (FPGA) circuits by carrying out experiments with different bitwidths of filter input, design libraries, reconfigurable target devices, and optimization criteria in high-level synthesis algorithms.

Keywords: FIR filter;Direct, transposed, and hybrid forms; Multiplierless design; High-level synthesis; Area and delay optimization; Custom and reconfigurable circuits.

Sebastiao-CCPE13.pdf Nuno Sebastião, Nuno Roma, and Paulo Flores.
"Configurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment". Concurrency and Computation: Practice and Experience, 25(10):1319-1339, July 2013. Special Issue: High performance computing and simulation: architectures, systems, algorithms, technologies, services, and applications. (doi:10.1002/cpe.2934)
Abstract
A new class of efficient and flexible hardware accelerators for DNA local sequence alignment based on the widely used Smith-Waterman algorithm is proposed in this paper. This new class of accelerating structures exploits an innovative technique that tracks the origin coordinates of the best alignment to allow a significant reduction of the size of the dynamic programming matrix that needs to be recomputed during the subsequent traceback phase, providing a considerable reduction of the resulting time and memory requirements. The significant performance of the enhanced class of accelerators is attained by also providing support for an additional level of parallelism: the capability to concurrently align several query sequences with one or more reference sequences, according to the specific application requisites. Moreover, the accelerator class also includes specially designed processing elements that improve the resource usage when implemented in a Field Programmable Gate Array (FPGA), and easily provide several different configurations in an Application Specific Integrated Circuit (ASIC) implementation. Obtained results demonstrated that speedups as high as 278 can be obtained in ASIC accelerating structures. A FPGA-based prototyping platform, operating at a 40 times lower clock frequency and incorporating a complete alignment embedded system, still provides significant speedups as high as 27, compared with a pure software implementation.

Keywords: Hardware accelerator; DNA; Local Sequence Alignment; Traceback; FPGA; ASIC.

Aksoy-TVLSI13.pdf Levent Aksoy, Cristiano Lazzari, Eduardo Costa, Paulo Flores, and José Monteiro.
"Design of digit-serial FIR filters: Algorithms, architectures, and a CAD tool". IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 21(3):498-511, March 2013. (doi:10.1109/TVLSI.2012.2188917)
Abstract
In the last two decades, many efficient algorithms and architectures have been introduced for the design of low-complexity bit-parallel multiple constant multiplications (MCM) operation which dominates the complexity of many digital signal processing systems. On the other hand, little attention has been given to the digit-serial MCM design that offers alternative low-complexity MCM operations albeit at the cost of an increased delay. In this paper, we address the problem of optimizing the gate-level area in digit-serial MCM designs and introduce high-level synthesis algorithms, design architectures, and a computer-aided design tool. Experimental results show the efficiency of the proposed optimization algorithms and of the digit-serial MCM architectures in the design of digit-serial MCM operations and finite impulse response filters.

Keywords: 0-1 Integer Linear Programming (ILP); Digit-Serial Arithmetic; Finite Impulse Response (FIR) Filters; Gate-Level Area Optimization; Multiple Constant Multiplications.

Sebastiao-TVLSI12.pdf Nuno Sebastião, Nuno Roma, and Paulo Flores.
"Integrated hardware architecture for efficient computation of the n-best bio-sequence local alignments in embedded platforms". IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 20(7):1262 - 1275, July 2012. (doi:10.1109/TVLSI.2011.2157541)
Abstract
A flexible hardware architecture that implements a set of new and efficient techniques to significantly reduce the computational requirements of the commonly used Smith-Waterman sequence alignment algorithm is presented. Such innovative techniques use information gathered by the hardware accelerator during the computation of the alignment scores to constrain the size of the subsequence that has to be post-processed in the traceback phase using a general purpose processor (GPP). Moreover, the proposed structure is also capable of computing the n-best local alignments according to the Waterman-Eggert algorithm, becoming the first hardware architecture that is able to simultaneously evaluate the n-best alignments of a given sequence pair, by incorporating a set of ordering units that work in parallel with the systolic array. A complete alignment system was developed and implemented in a Virtex-4 FPGA, by integrating the proposed accelerator architecture with a Leon3 GPP. The obtained experimental results demonstrate that the proposed systm is flexible and allows the alignment of large sequences in memory constrained systems. As an example, a speedup of 17 was obtained with the conceived system when compared to a regular implementation of the LALIGN35 program running on an Intel Core2 Duo processor running at a 40 times higher frequency.

Keywords: DNA sequence alignment; Smith-Waterman (S-W); Waterman-Eggert (W-E); Hardware accelerator.

Aksoy-INTEGRATION12.pdf Levent Aksoy, Cristiano Lazzari, Eduardo Costa, Paulo Flores, and José Monteiro.
"High-level algorithms for the optimization of gate-level area in digit-serial multiple constant multiplications". Integration, the VLSI Journal, 45(3):294-306, June 2012. Special Issue of GLSVLSI 2011: Current Trends on VLSI and Ultra Low-Power Design. (doi:10.1016/j.vlsi.2011.11.008)
Abstract
The last two decades have seen tremendous effort on the development of high-level synthesis algorithms for efficient realization of the multiplication of a variable by a set of constants using only addition, subtraction, and shift operations. These algorithms generally target the minimization of the number of adders and subtractors, assuming that shifts are realized using only wires due to the bit-parallel processing of the input data. On the other hand, digit-serial architectures offer alternative low-complexity designs since digit-serial operators occupy less area and are independent of the data wordlength. However, in this case, shifts are no longer free in terms of hardware and require D flip-flops. Moreover, each digit-serial addition, subtraction, and shift operation has different implementation cost at gate-level. Hence, this article introduces high-level algorithms that optimize the area of digit-serial constant multiplications under the shift-adds architecture by taking into account the implementation cost of each operation at gate-level. Experimental results indicate that our high-level algorithms obtain better solutions than prominent algorithms designed for the minimization of the number of operations in terms of gate-level area and their solutions lead to less complex digit-serial MCM designs. It is also shown that the use of shift-adds architecture yields significant area reductions when compared to the constant multiplications designed using generic digit-serial constant multipliers.

Keywords: Multiple constant multiplications; Digit-serial architectures; Gate-level area optimization; 0-1 Integer linear programming; Graph-based algorithms.

Sebastiao-MICPRO12.pdf Nuno Sebastião, Nuno Roma, and Paulo Flores.
"Hardware accelerator architecture for simultaneous short-read DNA sequences alignment with enhanced traceback phase". Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), 36(2):96-109, March 2012. Special Issue - Exploitation Of Hardware Accelerators. (doi:10.1016/j.micpro.2011.05.003)
Abstract
Dynamic programming algorithms are widely used to find the optimal sequence alignment between any two DNA sequences. This manuscript presents a new, flexible and scalable hardware accelerator architecture to speedup the implementation of the frequently used Smith-Waterman algorithm. When integrated with a general purpose processor, the developed accelerator significantly reduces the computation time and memory space requirements of alignment tasks. Such efficiency mainly comes from two innovative techniques that are proposed. First, the usage of the maximum score cell coordinates, gathered during the computation of the alignment scores in the matrix-fill phase, in order to significantly reduce the time and memory requirements of the traceback phase. Second, the exploitation of an additional level of parallelism in order to simultaneously align several query sequences with the same reference sequence, targeting the processing of short-read DNA sequences. The results obtained from the implementation of a complete alignment system based on the new accelerator architecture in a Virtex-4 FPGA showed that the proposed techniques are feasible and the developed accelerator is able to provide speedups as high as 16 for the considered test sequences. Moreover, it was also shown that the proposed approach allows the processing of larger DNA sequences in memory restricted environments.

Keywords: DNA; Hardware accelerator; Local sequence alignment; Traceback.

Aksoy-TODAES12.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization algorithms for the multiplierless realization of linear transforms". ACM Transactions on Design Automation of Electronic Systems (TODAES), 17(1):3:1 - 3:27, January 2012. (doi:10.1145/2071356.2071359)
Abstract
This article addresses the problem of finding the fewest number of addition and subtraction operations in the multiplication of a constant matrix with an input vector, a fundamental operation in many linear digital signal processing transforms. We first introduce an exact Common Subexpression Elimination (CSE) algorithm that formalizes the minimization of the number of operations as a 0-1 integer linear programming problem. Since there are still instances that the proposed exact algorithm cannot handle due to the NP-completeness of the problem, we also introduce a CSE heuristic algorithm that iteratively finds the most common 2-term subexpressions with the minimum conflicts among the expressions. Furthermore, since the main drawback of CSE algorithms is their dependency on a particular number representation, we propose a hybrid algorithm that initially finds promising realizations of linear transforms using a numerical difference method, and then applies the proposed CSE algorithm to utilize the common subexpressions iteratively. The experimental results on a comprehensive set of instances indicate that the proposed approximate algorithms find competitive results with those of the exact CSE algorithm and obtain better solutions than prominent previously proposed heuristics. It is also observed that our solutions yield significant area reductions in the design of linear transforms after circuit synthesis compared to direct realizations of linear transforms.

Keywords: 0-1 Integer Linear Programming; Constant Matrix-Vector Multiplication; Common Subexpression Elimination; Numerical Difference Method.

Aksoy-MICPRO11.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Finding the optimal tradeoff between area and delay in multiple constant multiplications". Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), 35(8):729-741, November 2011. (doi:10.1016/j.micpro.2011.08.009)
Abstract
Over the years many efficient algorithms for the multiplierless design of multiple constant multiplications (MCM) have been introduced. These algorithms primarily focus on finding the fewest number of addition/subtraction operations that generate the MCM. Although the complexity of an MCM design is decreased by reducing the number of operations, their solutions may not lead to an MCM design with optimal area at gate-level since they do not consider the implementation costs of the operations in hardware. This article introduces two approximate algorithms that aim to optimize the area of the MCM operation by taking into account the gate-level implementation of each addition and subtraction operation which realizes a constant multiplication. To find the optimal tradeoff between area and delay, the proposed algorithms are further extended to find an MCM design with optimal area under a delay constraint. Experimental results clearly indicate that the solutions of the proposed algorithms lead to significantly better MCM designs at gate-level when compared to those obtained by the solutions of algorithms designed for the optimization of the number of operations.

Keywords: Multiple constant multiplications; Addition and subtraction architectures; Gate-level area optimization; Delay aware area optimization; 0-1 integer linear programming; Graph-based algorithms.

Lazzari-JOLPE11.pdf Cristiano Lazzari, Jorge Fernandes, Paulo Flores, and José Monteiro.
"Low power multiple-value voltage-mode look-up table for quaternary field programmable gate arrays". Journal of Low Power Electronics (JOLPE), 7(2):294-301, April 2011. (doi:10.1166/jolpe.2011.1138)
Abstract
FPGA structures are widely used as they enable early time-to-market and reduced non-recurring engineering costs in comparison to ASIC designs. Interconnections play a crucial role in modern FPGAs, because they dominate delay, power and area. Multiple-valued logic allows the reduction of the number of interconnections in the circuit, hence can serve as a mean to effectively curtail the impact of interconnections. In this work we propose a new look-up table structure based on a low-power high-speed quaternary voltage-mode device. The most important characteristics of the proposed architecture are that it is a voltage-mode structure, which allows reduced power consumption, and it is implemented using a standard CMOS technology. Our quaternary implementation overcomes previous proposed techniques with simple and efficient CMOS structures. Moreover, results show significant reductions on power consumption and timing in comparison to binary implementations with similar functionality.

Keywords: Multiple-value Logic; Quaternary Logic; Look-up Tables; FPGAs; Standard CMOS Technology.

Aksoy-MICPRO10.pdf Levent Aksoy, Ece Olcay Gunes, and Paulo Flores.
"Search algorithms for the multiple constant multiplications problem: Exact and approximate". Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), 34(5):151-162, August 2010. Special issue on selected papers from NORCHIP 2008. (doi:10.1016/j.micpro.2009.10.001)
Abstract
This article addresses the multiplication of one data sample with multiple constants using addition/subtraction and shift operations, i.e., the multiple constant multiplications (MCM) operation. In the last two decades, many efficient algorithms have been proposed to implement the MCM operation using the fewest number of addition and subtraction operations. However, due to the NP-hardness of the problem, almost all the existing algorithms have been heuristics. The main contribution of this article is the proposal of an exact depth-first search algorithm that, using lower and upper bound values of the search space for the MCM problem instance, finds the minimum solution consuming less computational resources than the previously proposed exact breadth-first search algorithm. We start by describing the exact breadth-first search algorithm that can be applied on real mid-size instances. We also present our recently proposed approximate algorithm that finds solutions close to the minimum and is able to compute better bounds for the MCM problem. The experimental results clearly indicate that the exact depth-first search algorithm can be efficiently applied to large size hard instances that the exact breadth-first search algorithm cannot handle and the heuristics can only find suboptimal solutions.

Keywords: Multiple constant multiplications problem; Depth-first search; Breadth-first search; Graph-based algorithms; Finite Impulse Response filters.

Morgado-TODAES09.pdf P. Marques Morgado, Paulo F. Flores, and L. Miguel Silveira.
"Generating realistic stimuli for accurate power grid analysis". ACM Transactions on Design Automation of Electronic Systems (TODAES), 14(3):40:1-40:26, May 2009. Special issue on Demonstrable Sfotware Systems and Hardware Platforms II. (doi:10.1145/1529255.1529262)
Abstract
Power Analysis tools are an integral component of any current power sign-off methodology. The performance of a design's power grid affects the timing and functionality of a circuit directly impacting the overall performance. Ensuring power grid robustness implies taking into account, among others, static and dynamic effects of voltage drop, ground bounce and electromigration. This type of verification is usually done by simulation, targeting a worst-case scenario where devices, switching almost simultaneously, could impose stern current demands on the power grid. While determination of the exact worst-case switching conditions from the grid perspective is usually not practical, the choice of simulation stimuli has a critical effect on the results of the analysis. Targetting safe but unrealistic settings could lead to pessimistic results and costly over-designs in terms of die area. In this paper we describe a software tool that generates a reasonable, realistic, set of stimuli for simulation. The approach proposed accounts for timing and spatial restrictions that arise from the circuit's netlist and placement and generates an approximation to the worst-case condition. The resulting stimuli indicates that only a fraction of the gates change in any given timing window, leading to a more robust verification methodology, specially in the dynamic case. Generating such stimuli is akin to performing a standard static timing analysis, so the tool fits well within conventional design frameworks. Furthermore, the tool can be used for hot-spot detection in early design stages.

Keywords: Power grid; Verification; Simulation; Voltage drop; Ground bounce; Stimuli Generationy.

Aksoy-TCADICS08.pdf Levent Aksoy, Eduardo da Costa, Paulo Flores, and José Monteiro.
"Exact and approximate algorithms for the optimization of area and delay in multiple constant multiplications". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 27(6):1013-1026, June 2008. (doi:10.1109/TCAD.2008.923242)
Abstract
The main contribution of this paper is an exact common subexpression elimination algorithm for the optimum sharing of partial terms in multiple constant multiplications (MCMs). We model this problem as a Boolean network that covers all possible partial terms that may be used to generate the set of coefficients in the MCM instance. We cast this problem into a 0ó1 integer linear programming (ILP) problem by requiring that the single output of this network is asserted while minimizing the number of gates representing operations in the MCM implementation that evaluate to one. A satisfiability (SAT)-based 0ó1 ILP solver is used to obtain the exact solution. We argue that for many real problems, the size of the problem is within the capabilities of current SAT solvers. Because performance is often a primary design parameter, we describe how this algorithm can be modified to target the minimum area solution under a user-specified delay constraint. Additionally, we propose an approximate algorithm based on the exact approach with extremely competitive results. We have applied these algorithms on the design of digital filters and present a comprehensive set of results that evaluate ours and existing approximation schemes against exact solutions under different number representations and using different SAT solvers.

Keywords: Canonical signed digit (CSD); common subexpression elimination (CSE); delay constraints; minimal signed digit (MSD); multiple constant multiplications (MCMs); pseudo- Boolean optimization (PBO).

Gil-JSAT08.pdf Luís Gil, Paulo Flores, and Luís Miguel Silveira.
"PMSat: a parallel version of MiniSAT". Journal on Satisfiability, Boolean Modeling and Computation (JSAT), 6:71-98, 2008.
Abstract
Parallel computing has become an affordable reality forcing a shift in the programming paradigm from sequential to concurrent applications, specially those who demand much computational power or with large search spaces like SAT-solvers. In this context we present the research, planning and implementation of PMSat: a parallel version of MiniSAT with MPI (Message Passing Interface) technology, to be executed in clusters or grids of computers. The main features of the program are described: search modes, assumptions pruning and share of learnt clauses. An analysis of its performance and load balance is also presented.

Keywords: Parallel Computing; SAT-solver; Satisfiability; Message Passing Interface.

Flores-TODAES01.pdf Paulo F. Flores, Horácio C. Neto, and João P. Marques-Silva.
"An exact solution to the minimum size test pattern problem". ACM Transactions on Design Automation of Electronic Systems (TODAES), 6(4):629-644, October 2001. (doi:10.1145/502175.502186)
Abstract
This article addresses the problem of test pattern generation for single stuck-at faults in combinational circuits, under the additional constraint that the number of specified primary input assignments is minimized. This problem has different applications in testing, including the identification of ``don't care'' conditions to be used in the synthesis of Built-In Self-Test (BIST) logic. The proposed solution is based on an integer linear programming (ILP) formulation which builds on an existing Propositional Satisfiability (SAT) model for test pattern generation. The resulting ILP formulation is linear on the size of the original SAT model for test generation, which is linear on the size of the circuit. Nevertheless, the resulting ILP instances represent complex optimization problems, that require dedicated ILP algorithms. Preliminary results on benchmark circuits validate the practical applicability of the test pattern minimization model and associated ILP algorithm.

Keywords: Automatic Test Pattern Generation (ATPG); Built-In Self-Test (BIST); Integer Linear Programming (ILP); propositional satisfiability (SAT); verification and test.

Sousa-Ingenium92.pdf Ana Sousa, José Abreu, and Paulo Flores.
"VHDL novo suporte às metodologias de CAD". Ingenium. Revista da Ordem dos Engenheiros), pages 49-55, April 1992.

International Conferences

55 references, last updated Wed Jul 12 16:33:10 2017

Liacha-NEWCAS17.pdf Ahmed Liacha, Abdelkrim K. Oudjida, Farid Ferguene, José Monteiro, and Paulo Flores.
"A variable radix-2r algorithm for single constant multiplication". In 15th IEEE International New Circuits and Systems Conference (NEWCAS), pages ??-??, June 25-28, 2017.
Abstract
In previous work, a fully predictable sub-linear runtime heuristic for the multiplication by a constant based on Radix-2r arithmetic using a fixed radix was developed, called RADIX-2r. In this paper, we introduce a new constant multiplication algorithm based also on Radix-2r arithmetic but considering a variable radix. The new version is named RADIX-2r-VAR. Using a variable radix allows to optimize the average number of additions in the constant multiplication since a larger search space is explored. The new RADIX-2r-VAR recoding requires an average of 4.4% and 2.2% less additions than RADIX-2r for 24 and 32 bits, respectively. The RADIX-2r-VAR algorithm is combined with RADIX-2r for more improvements of the average number of Liacha-NEWCAS17 additions. An overall saving of 6.7% and 5.5% is obtained over RADIX-2r for 24 and 32 bits, respectively.

Keywords: High-Speed and Low-Power Design, Linear-Time-Invariant (LTI) Systems, Multiplierless Single/Mutiple Constant Multiplication (SCM/MCM). Radix-2r Arithmetic

Aksoy-EUC15.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"A novel method for the approximation of multiplierless constant matrix vector multiplication". In 13th IEEE/IFIP International Conference on Embedded and Ubiquitous Computing (EUC), pages 98-105. IEEE, October 21-23, 2015. Best Paper Award. (doi:10.1109/EUC.2015.27)
Abstract
Since human beings have limited perceptual abilities, in many digital signal processing (DSP) applications, e.g., image and video processing, the outputs do not need to be computed accurately. Instead, they can be approximated so that the area, delay, and/or power dissipation of the design can be reduced. This paper presents an approximation algorithm, called AURA, for the multiplierless design of the constant matrix vector multiplication (CMVM) which is a ubiquitous operation in DSP systems. AURA aims to tune the constants such that the resulting matrix leads to a CMVM design which requires the fewest adders/subtractors, satisfying the given error constraints. This paper also introduces its modified version, called AURA-DC, which can reduce the delay of the CMVM operation with a small increase in the number of adders/subtractors. Experimental results show that the proposed algorithms yield significant reductions in the number of adders/subtractors with respect to the original realizations without violating the error constraints, and consequently, lead to CMVM designs with less area, delay, and power dissipation. Moreover, they can generate alternative CMVM designs under different error constraints, enabling a designer to choose the one that fits best in an application.

Keywords: Approximate computing, constant matrix vector multiplication, multiplierless design, 0-1 integer linear programming, area and delay optimization, discrete cosine transform.

Aksoy-ISCAS15.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Approximation of multiple constant multiplications using minimum look-up tables on FPGA". In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 2884-2887, May 24-27, 2015. *. (doi:10.1109/ISCAS.2015.7169289)
Abstract
In many digital signal processing (DSP) systems, computations can be carried out within a tolerable error range rather than finding the exact output, enabling significant reductions in area, delay, or power dissipation of the design. This paper addresses the problem of approximating the multiple constant multiplications (MCM) operation which occurs frequently in DSP applications. We consider the realization of constant multiplications using look-up tables (LUTs) on field programmable gate arrays (FPGA) and introduce an exact algorithm, called THETIS, that can find a minimum number of distinct LUTs required to realize the partial products of constant multiplications, satisfying an error constraint. Experimental results show that THETIS can achieve significant reductions in number of LUTs on MCM instances and its solutions lead to less complex filter designs on FPGA than those realized using original filter coefficients.

Keywords: Aproximating computing, multiple constant multiplications (MCM), FPGA

Aksoy-ICCD14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Efficient design of FIR filters using hybrid multiple constant multiplications on FPGA". In Proceedings of the IEEE International Conference on Computer Design (ICCD), pages 42-47. IEEE, October 19-22, 2014. (doi:10.1109/ICCD.2014.6974660)
Abstract
The multiple constant multiplication (MCM) block, that realizes the multiplication of constants by a variable, is a ubiquitous operation in digital signal processing (DSP) systems. It can be implemented using generic multipliers or shifts and adders/subtractors. This paper addresses the problem of finding the minimum number of adders/subtractors to realize the MCM block while a number of multipliers are available to realize some constant multiplications. Such a situation appears in the design of DSP systems on field programmable gate arrays (FPGAs) which also include generic multipliers. We present a 0-1 integer linear programming (ILP) formulation of this problem, yielding an exact common subexpression elimination (CSE) method. Due to the NP-completeness of this problem, we also introduce an approximate graph-based (GB) algorithm. Experimental results show that the proposed methods can find better solutions than a state-of-art algorithm and the use of different numbers of multipliers in the MCM block leads to filter designs with different number of slices, delay, and power dissipation which enable a designer to choose the one that fits best in an application.

Keywords: Multiple Constant Multiplication (MCM); FPGAs; Exact Common Subexpression elimination (CSE) method; Approximate Graph-Based (GB) algorithm; Integer Linear Programming (ILP).

Sebastiao-HPCS14.pdf Nuno Sebastião, Paulo Flores, and Nuno Roma.
"Optimized ASIP architecture for compressed BWT-indexed search in bioinformatics applications". In International Conference on High Performance Computing & Simulation (HPCS), pages 527-534. IEEE, July 21-25, 2014. (doi:10.1109/HPCSim.2014.6903731)
Abstract
Compressed indexes are adopted by a vast set of bioinformatics applications that deal with extremely large datasets, mainly due to the inherently high memory requirements of uncompressed alternatives. However, the additional computational overhead that is imposed by the usage of such indexes makes them harder to implement in embedded computational platforms, such as biochips, with strict processing and power restrictions. Furthermore, compressed indexes are often characterized by a significant usage of bit-level operations, some of which are not commonly available on General Purpose Processors (GPPs). To circumvent this limitation, an Application-Specific Instruction-set Processor (ASIP) architecture is proposed to accelerate the processing of biological sequences (e.g., alignment, mapping, etc.) using compressed full-text indexes based on the Burrows-Wheeler Transform (BWT). The proposed processor was built over a RISC micro-architecture and extends the Xilinx MicroBlaze ISA with additional bit-level operations, especially tailored for compressed indexes. When used to perform search operations over the considered compressed index, the proposed architecture provides a reduction of the number of required instructions by about one half. Furthermore, when prototyped on a Xilinx Virtex-7 FPGA, the ASIP proved to offer an overall speedup between 3.1x and 4.5x for the execution of a single threaded operation. To ensure a further processing scalability, the proposed ASIP was designed in order to be easily used as the basic processing unit of multi-core systems, especially tuned for the parallel processing of massive datasets of biological reads.

Keywords: Bioinformatics applications; Embedded computational platforms; Processing of biological sequences: alignment, mapping; Application-Specific Instruction-set Processor (ASIP); Compressed full-text indexes; Burrows-Wheeler Transform (BWT); MicroBlaze ISA extension.

Aksoy-ISCAS14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"ECHO: A novel method for the multiplierless design of constant array vector multiplication". In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 1456-1459. IEEE, June 1-5, 2014. (doi:10.1109/ISCAS.2014.6865420)
Abstract
The constant array vector multiplication (CAVM) operation realizes the multiplication of a constant array by a vector of variables and occurs in the design of direct form finite impulse response (FIR) and infinite impulse response (IIR) filters. In this paper, for the first time, we directly target the optimization of the multiplierless design of a CAVM operation and introduce a novel algorithm, called ECHO, that can find the fewest number of adders and subtractors required for its implementation. We also describe some hardware optimization techniques that can reduce the gate-level area and delay of the CAVM design. It is shown that the solutions of ECHO include significantly less number of operations and yield less area in FIR filter designs than those of previously proposed state-of-art algorithms.

Keywords: FIR, IIR filters; Constant Array Vector Multiplication (CAVM); Multiplierless design; Cirucuit/hardware optimization techniques; DSP; Gate-level implementation.

Aksoy-IWLS14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"On the multiplierless design of correctly rounded multiple constant divisions". In IEEE/ACM International Workshop on Logic Synthesis (IWLS), pages ??-??, May 30, 2014. *.
Abstract
The previously proposed algorithms designed for the constant divisions use the multiply-add architecture which is expensive in terms of area and power dissipation in hardware. This paper introduces the problem of finding the minimum number of adders/subtractors which realize the correctly rounded multiple constant divisions (MCD) under the shift-adds architecture. It proposes a depth-first search method which can explore the whole search space, ensuring the minimum solution. It also presents a local search method which can handle the instances that the exact algorithm cannot cope with due to the NP-completeness of the MCD problem. Both algorithms are equipped with techniques that maximize the sharing of common partial products among the constant multiplications. To the best of our knowledge, these are the first algorithms proposed for the multiplierless design of the correctly rounded MCD. Experimental results include the results of algorithms, indicating that the local search method can find solutions close to the minimum using little computational resources and can obtain better solutions than a state-of-art algorithm. They also include the results of MCD designs under different architectures, showing that the MCD designs under the shift-adds architecture occupy less area and consume less power than those under the multiply-add architecture.

Keywords: Constant division; multiply-add architecture; shift-adds architecture; rounded multiple constant divisions (MCD).

Aksoy-DATE14.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Optimization of design complexity in time-multiplexed constant multiplications". In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference (DATE), pages 1-4. IEEE, March 24-28, 2014. (doi:10.7873/DATE.2014.313)
Abstract
The multiplication of constants by a data input is an essential operation in digital signal processing (DSP) systems. For applications requiring a large number of constant multiplications under stringent hardware constraints, it is generally realized under a folded architecture, where a single constant selected from a set of multiple constants is multiplied by the data input at each time, called time-multiplexed constant multiplication (TMCM). This paper addresses the problem of optimizing the complexity of a TMCM design and introduces an algorithm that finds the least complex TMCM design by sharing the logic operators, i.e., adders, subtractors, adders/subtractors, and multiplexors (MUXes). It includes efficient search methods, yielding better results than existing TMCM algorithms.

Keywords: Time-Multiplexed Constant Multiplication (TMCM); Folded architecture; shift-adds architecture optimization; Digital Signal Processing (DSP).

Aksoy-VLSI-SoC13.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"Towards the least complex time-multiplexed constant multiplication". In IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC), pages 328-331, October 7-9, 2013. (doi:10.1109/VLSI-SoC.2013.6673302)
Abstract
The multiplication of a variable by a single constant selected from a set of fixed constants at a time, called the time-multiplexed constant multiplication (TMCM), is frequently used in digital signal processing (DSP) systems. Existing algorithms implement the TMCM operation using multiplexers (MUXes), adders/subtractors, and shifts, and reduce its complexity by merging single/multiple constant multiplication graphs and by sharing the basic structures. This paper introduces ARION, that exploits the most common partial terms in the TMCM design on top of the previously proposed DAGfusion algorithm, which merges the single constant multiplication graphs. Experimental results show that ARION obtains significantly better solutions than prominent TMCM methods.

Aksoy-ECCTD13.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Exploration of tradeoffs in the design of integer cosine transforms for image compression". In Proceedings of European Conference on Circuit Theory and Design (ECCTD), pages 1-4, September 8-12, 2013. (doi:10.1109/ECCTD.2013.6662223)
Abstract
In image data compression, integer cosine transforms (ICTs) have been preferred to discrete cosine transforms (DCTs) due to their similar transform efficiency and lower implementation cost. However, there exist many alternative ICTs with different performance measures and implementation costs. In this work, we explore all possible ICTs, compute their performance measures, and find their implementation costs in terms of the number of adders/subtractors, where a state-of-art technique is used to realize ICTs under a shift-adds architecture. We also investigate the tradeoff between performance and implementation cost, present the pareto-optimal points of this tradeoff, and introduce promising ICTs that were not considered before.

Keywords: Integer Cosine Transforms (ICT); Discrete Cosine Transforms (DCT); Shift-Adds Architecture; Multiplier-Less Hardware Logic Gates.

Brito-NEWCAS13.pdf Diogo Brito, Jorge Fernandes, Paulo Flores, and José Monteiro.
"Standard CMOS voltage-mode QLUT using a clock boosting technique". In Proceedings of the IEEE International New Circuits and Systems Conference (NEWCAS), pages 1-4, June 16-19, 2013. (doi:10.1109/NEWCAS.2013.6573573)
Abstract
Interconnect has become preponderant in many aspects of digital circuit design, namely delay, power and area. This effect is particularly true for FPGAs, where interconnection is often the most limiting factor. Multiple-valued logic allows to reduce interconnections, within logic cells and between them, hence effectively mitigating the impact of interconnections. In this paper we propose a new look-up table structure based on a low-power high-speed quaternary voltage-mode device. Our quaternary implementation overcomes the drawbacks of previously proposed techniques by using a standard CMOS technology and a clock boosting technique to enhance speed without increasing consumption. Moreover, we present an ASIC prototype of a full adder based on the designed look-up table and experimental results are obtained and compared with simulation. The prototype is designed to work at 100 MHz and it consumes 128 uW.

Keywords: Quaternary Logic, Look-up Table, FPGA, Clock Boosting.

Neves-ASAP13.pdf Nuno Neves, Nuno Sebastião, André Patrício, David Matos, Pedro Tomás, Paulo Flores, and Nuno Roma.
"BioBlaze: Multi-core SIMD ASIP for DNA sequence alignment". In Proceedings of the IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP), pages 241-244. IEEE, June 5-7, 2013. (doi:10.1109/ASAP.2013.6567581)
Abstract
A new Application-Specific Instruction-set Processor (ASIP) architecture for biological sequences alignment is proposed in this manuscript. This architecture achieves high processing throughputs by exploiting both fine and coarse-grained parallelism. The former is achieved by extending the Instruction Set Architecture (ISA) of a synthesizable processor to include multiple specialized SIMD instructions that implement vector-vector and vector-scalar arithmetic, logic, load/store and control operations. Coarse-grained parallelism is achieved by using multiple cores to cooperatively align multiple sequences in a shared memory architecture, comprising proper hardware-specific synchronization mechanisms. To ease the programming, a compilation framework based on an adaptation of the GCC back-end was also implemented. The proposed system was prototyped and evaluated on a Xilinx Virtex-7 FPGA, achieving a 200MHz working frequency. A sequential and a state-of-the-art SIMD implementations of the Smith-Waterman algorithm were programmed in both the proposed ASIP and an Intel Core i7 processor. When comparing the achieved speedups, it was observed that the proposed ISA achieves a 40x speedup, which contrasts with the 11x speedup provided by SSE2 in the Intel Core i7 processor. The scalability of the multi-core system was also evaluated and proved to scale almost linearly with the number of cores.

Marques-FLAIRS13.pdf Ricardo Marques, Luis Guerra e Silva, Paulo Flores, and L. Miguel Silveira.
"Improving SAT solver efficiency using a multi-core approach". In Proceedings of International Florida Artificial Intelligence Research Society Conference (FLAIRS), pages 94-99, May 22-24, 2013.
Abstract
Many practical problems in multiple fields can be converted to a SAT problem, or a sequence of SAT problems, such that their solution immediately implies a solution to the original problem. Despite the enormous progress achieved over the last decade in the development of SAT solvers, there is strong demand for higher algorithm efficiency to solve harder and larger problems. The widespread availability of multi-core, shared memory parallel environments provides an opportunity for such improvements. In this paper we present our results on improving the effectiveness of standard SAT solvers on such architectures, through a portfolio approach. Multiple instances of the same basic solver using different heuristic strategies, for search-space exploration and problem analysis, share information and cooperate towards the solution of a given problem. Results from the application of our methodology to known problems from SAT competitions show relevant improvements over the state of the art and yield the promise of further advances.

Aksoy-GLSVLSI13.pdf Levent Aksoy, Paulo Flores, and José Monteiro.
"SIREN: A depth-first search algorithm for the filter design optimization problem". In Proceedings of Great Lakes Symposium on VLSI (GLS-VLSI), pages 179-184, May 2-3, 2013. (doi:10.1145/2483028.2483087)
Abstract
This paper addresses the filter design optimization (FDO) problem that is to find a set of filter coefficients which yields the least design complexity while meeting the required filter constraints. The design complexity of a filter is defined in terms of the total number of adders/subtracters, assuming that the multiplication of coefficients by the filter input is realized under a shift-adds architecture. Existing algorithms use efficient search methods, but none of them can guarantee the minimum design complexity. Hence, we propose an exact algorithm, called SIREN, that finds an optimum solution of the FDO problem under the minimum quantization value. It is based on a depth-first search method equipped with an exact technique, that finds the minimum number of adders/subtracters in the multiplier block of the filter, and search pruning techniques that enable it to be applicable to practical instances. Experimental results show that SIREN can still find better solutions than efficient FDO algorithms and its solutions lead to filters with significantly less area when compared to a straightforward filter design technique.

Keywords: Filter design optimization; Depth-first search; Linear programming; Multiplierless filter design.

Brito-ICECS12.pdf Diogo Brito, Jorge Fernandes, Paulo Flores, and José Monteiro.
"Design and characterization of a QLUT in a standard CMOS process". In Proceedings of the IEEE International Conference on Electronics, Circuits and Systems (ICECS), pages 288-291, December 9-12, 2012. (doi:10.1109/ICECS.2012.6463744)
Abstract
Interconnect has become preponderant in many aspects of circuit design, namely delay, power and area. This effect is particularly true for FPGAs, where interconnect is often the most limiting factor. Quaternary logic offers a means to reduce interconnect since each circuit wire can, in principle, carry the same information as two binary wires. We have proposed in [1] a design implementing a quaternary low-power highspeed look-up table. The main features of this circuit are being based on a voltage-mode structure and using only standard CMOS technology. In this paper we present the design of a prototype implementation and experimental results. These results are discussed and conclusions are drawn that provide further guidelines for improvement.

Keywords: Multi-value logic; Quaternary circuits; Circuit Layout, Fabrication and Test.

Aksoy-ICCAD12.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Multiple tunable constant multiplications: Algorithms and applications". In Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 473-479, November 5-8, 2012. (doi:10.1145/2429384.2429482)
Abstract
The multiple constant multiplications (MCM) problem, that is defined as finding the minimum number of addition and subtraction operations required for the multiplication of multiple constants by an input variable, has been the subject of great interest since the complexity of many digital signal processing (DSP) systems is dominated by an MCM operation. This paper introduces a variant of the MCM problem, called multiple tunable constant multiplications (MTCM) problem, where each constant is not fixed as in the MCM problem, but can be selected from a set of possible constants. We present an exact algorithm that formalizes the MTCM problem as a 0-1 integer linear programming (ILP) problem when constants are defined under a number representation. We also introduce a local search method for the MTCM problem that includes an efficient MCM algorithm. Furthermore, we show that these techniques can be used to solve various optimization problems in finite impulse response (FIR) filter design and we apply them to one of these problems. Experimental results clearly show the efficiency of the proposed methods when compared to prominent algorithms designed for the MCM problem

Keywords: MTCM - Multiple Tunable Constant Multiplications; 0-1 ILP; Exact algorithm; Local search algorithm.

Aksoy-DATE12.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Design of low-complexity digital finite impulse response filters on FPGAs". In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference (DATE), pages 1197-1202, March 12-16, 2012. (doi:10.1109/DATE.2012.6176675)
Abstract
The multiple constant multiplications (MCM) operation, which realizes the multiplication of a set of constants by a variable, has a significant impact on the complexity and performance of the digital finite impulse response (FIR) filters. Over the years, many high-level algorithms and design methods have been proposed for the efficient implementation of the MCM operation using only addition, subtraction, and shift operations. The main contribution of this paper is the introduction of a high-level synthesis algorithm that optimizes the area of the MCM operation and, consequently, of the FIR filter design, on field programmable gate arrays (FPGAs) by taking into account the implementation cost of each addition and subtraction operation in terms of the number of fundamental building blocks of FPGAs. It is observed from the experimental results that the solutions of the proposed algorithm yield less complex FIR filters on FPGAs with respect to those whose MCM part is implemented using prominent MCM algorithms and design methods.

Keywords: FIR filter design;FPGA;MCM algorithms;field programmable gate arrays;high-level algorithms;low-complexity digital finite impulse response filters design;multiple constant multiplications operation;operations level synthesis algorithm;shift operations;FIR filters;field programmable gate arrays;high level synthesis.

Aksoy-VLSI-SoC11.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"A hybrid algorithm for the optimization of area and delay in linear DSP transforms". In IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC), pages 148-153, October 3-5, 2011. (doi:10.1109/VLSISoC.2011.6081637)
Abstract
This paper addresses the problem of multiplierless realization of linear transforms using the fewest number of addition and subtraction operations and introduces a hybrid algorithm that incorporates a graph-based technique, called the difference method, and a Common Subexpression Elimination (CSE) algorithm. In the proposed algorithm, while the difference method extracts the most promising realizations of linear transforms in each iteration, the CSE algorithm achieves the most common minimum conflicting subexpressions in each solution of the difference method. This paper also describes how the hybrid algorithm can be modified in order to find a solution with the fewest number of operations under a delay constraint. The experimental results on a comprehensive set of instances show the efficiency of the hybrid algorithms, at both high-level and gate-level, in comparison to previously proposed algorithms.

Keywords: Multiplierless Linear Transforms, Hybrid algorithms, Graph-based techniques, Common Subexpression Elimination (CSE), High-level and gate-level optimization.

Aksoy-ECCTD11.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization of gate-level area in high throughput multiple constant multiplications". In Proceedings of European Conference on Circuit Theory and Design (ECCTD), pages 588-591. IEEE, August 29-31, 2011. (doi:10.1109/ECCTD.2011.6043602)
Abstract
This paper addresses the problem of optimizing gate-level area in a pipelined Multiple Constant Multiplications (MCM) operation and introduces a high-level synthesis algorithm, called HCUB-DC+ILP. In the HCUB-DC+ILP algorithm, initially, a solution with the fewest number of operations under a minimum delay constraint is found by the Hcub-DC algorithm. Then, the area around this local minimum point is explored exactly using a 0-1 Integer Linear Programming (ILP) technique that considers the gate-level implementation of the pipelined MCM operation. The experimental results at both high-level and gate-level clearly show the efficiency of HCUB-DC+ILP over previously proposed prominent MCM algorithms.

Keywords: Algorithm Design and Analysis, Optimization, Pipeline Processing, Signal Processing Algorithms, Pipelined Multiple Constant Multiplications (MCM), Integer Linear Programming (ILP), Gate-Level Optimization, HCUB-DC+ILP and Hcub-DC algorithm.

Aksoy-ISCAS11.pdf Levent Aksoy, Cristiano Lazzari, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization of area in digit-serial multiple constant multiplications at gate-level". In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 2737- 2740. IEEE, May 15-18, 2011. (doi:10.1109/ISCAS.2011.5938171)
Abstract
The last two decades have seen many efficient algorithms and architectures for the design of low-complexity bit-parallel Multiple Constant Multiplications (MCM) operation, that dominates the complexity of Digital Signal Processing (DSP) systems. On the other hand, digit-serial architectures offer alternative low-complexity designs, since digit-serial operators occupy less area and are independent of the data wordlength. This paper introduces the problem of designing a digit-serial MCM operation with minimal area at gate-level and presents the exact formalization of the area optimization problem as a 0-1 Integer Linear Programming (ILP) problem. Experimental results show the efficiency of the proposed algorithm and digit-serial MCM designs in terms of area at gate-level.

Keywords: Multiple Constant Multiplications (MCM), Digit-Serial Multiple Constant Multiplication operation, Low-complexity design, Area optimization, 0-1 Integer Linear Programming problem, Area optimization problem, Gate-level optimization.

Aksoy-GLSVLSI11-mcmSerial.pdf Levent Aksoy, Cristiano Lazzari, Eduardo Costa, Paulo Flores, and José Monteiro.
"Efficient shift-adds design of digit-serial multiple constant multiplications". In Proceedings of Great Lakes Symposium on VLSI (GLS-VLSI), pages 61-66. ACM, May 2-4, 2011. (doi:10.1145/1973009.1973023)
Abstract
The bit-parallel realization of the multiplication of a variable by a set of constants using only addition/subtraction and shift operations has been explored extensively over the years, as large number of constant multiplications dominate the complexity of many Digital Signal Processing (DSP) systems. On the other hand, digit-serial architectures offer alternative low-complexity designs, since digit-serial operators occupy less area and are independent of the data wordlength. This paper introduces an approximate algorithm that targets the optimization of the gate-level area of digit-serial constant multiplications under a shift-adds architecture. Experimental results on a comprehensive set of instances indicate that the approximate algorithm presented in this paper gives better solutions than the previously proposed algorithms in terms of area at gate-level and yields alternative low-complexity designs relatively to the bit-parallel design. Furthermore, it is observed on the digit-serial filter designs that the use of shift-adds architecture yields area reduction up to 43.6% with respect to the filter designs that use generic digit-serial constant multipliers.

Keywords: Multiple constant multiplications (MCM), digit-serial arithmetic, gate-level area optimization, finite impulse response (FIR) filters.

Aksoy-GLSVLSI11-mcmLowpower.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Design of low-power multiple constant multiplications using low-complexity minimum depth operations". In Proceedings of Great Lakes Symposium on VLSI (GLS-VLSI), pages 79-84. ACM, May 2-4, 2011. (doi:10.1145/1973009.1973026)
Abstract
Existing optimization algorithms for the multiplierless realization of multiple constant multiplications (MCM) typically target the minimization of the number of addition and subtraction operations. Since power dissipation is directly related to the amount of hardware, some power reduction is indirectly achieved by these algorithms. However, in many cases, glitching plays an equally important role in defining the power consumption. This is specially true for arithmetic circuits, and in particular to MCM due to high logic depth and large number of re-convergent paths. This paper introduces exact algorithms that search the optimal area of an MCM design at gate-level where each constant multiplication is implemented in its minimum depth. Experimental results show that the proposed algorithms lead to MCM designs consuming significantly less power with respect to those obtained by the MCM algorithms.

Keywords: Multiple constant multiplications (MCM), 0-1 integer linear programming, high-level synthesis, low-power MCM design.

Jaccottet-VLSI-SoC10.pdf Diego Jaccottet, Eduardo Costa, Levent Aksoy, Paulo Flores, and José Monteiro.
"Design of low-complexity and high-speed digital finite impulse response filters". In IEEE/IFIP International Conference on VLSI and System-on-Chip (VLSI-SoC), pages 292-297, September 27-29, 2010. (doi:10.1109/VLSISOC.2010.5642676)
Abstract
In this paper, we introduce a design methodology to implement low-complexity and high-speed digital Finite Impulse Response (FIR) filters. Since FIR filters suffer from a large number of constant multiplications, in the proposed method the constant multiplications are replaced by addition/subtraction and shift operations. Also, based on the design objective, i.e., low-complexity or high-speed, the addition/subtraction operations are implemented using Ripple Carry Adder (RCA) or Carry-Save Adder (CSA) architectures respectively. Furthermore, high-level algorithms designed for the optimization of the number of RCA and CSA blocks are used to reduce the complexity of the FIR filter. Thus, a Computer-Aided Design (CAD) tool that synthesizes low-complexity and high-speed FIR filters in a shift-adds architecture is developed. It is observed from the experimental results on FIR filter instances that the developed CAD tool can find better FIR filter designs in terms of area and delay than those obtained using efficient general multipliers.

Keywords: Multiple constant multiplications, addition and subtraction architectures, low complexity FIR filter, ripple carry adder, carry saver adder.

Aksoy-DSD10.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization of area and delay at gate-level in multiple constant multiplications". In Proceedings of Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), IEEE Computer Society, pages 3-10, September 1-3, 2010. (doi:10.1109/DSD.2010.32)
Abstract
Although many efficient high-level algorithms have been proposed for the realization of Multiple Constant Multiplications (MCM) using the fewest number of addition and subtraction operations, they do not consider the low-level implementation issues that directly affect the area, delay, and power dissipation of the MCM design. In this paper, we initially present area efficient addition and subtraction architectures used in the design of the MCM operation. Then, we propose an algorithm that searches an MCM design with the smallest area taking into account the cost of each operation at gate-level. To address the area and delay tradeoff in MCM design, the proposed algorithm is improved to find the smallest area solution under a delay constraint. The experimental results show that the proposed algorithms yield low-complexity and high-speed MCM designs with respect to those obtained by the prominent algorithms designed for the optimization of the number of operations and the optimization of area at gate-level.

Keywords: Multiple constant multiplications, addition and subtraction architectures, delay aware area optimization, gate-level area optimization, graph-based algorithms.

Sebastiao-HPCS10.pdf Nuno Sebastião, Tiago Dias, Nuno Roma, and Paulo Flores.
"Integrated accelerator architecture for DNA sequences alignment with enhanced traceback phase". In International Conference on High Performance Computing & Simulation (HPCS), pages 16-23, June 28-July 2, 2010. (doi:10.1109/HPCS.2010.5547154)
Abstract
Dynamic programming algorithms are widely used to find the optimal sequence alignment between any two DNA sequences. This paper presents an innovative technique to significantly reduce the computation time and memory space requirements of the traceback phase of the Smith-Waterman algorithm, together with a flexible and scalable hardware architecture to accelerate the overall procedure. The results obtained from an implementation using a Virtex-4 FPGA showed that the proposed technique is feasible and is able to provide a significant speedup. For the considered test sequences, a speedup of about 6000 was obtained.

Keywords: DNA, Hardware Accelerator, Local Sequence Alignment, Traceback.

Lazzari-ISCAS10.pdf Cristiano Lazzari, Paulo Flores, José Monteiro, and Luigi Carro.
"Voltage-mode quaternary FPGAs: An evaluation of interconnections". In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), pages 869-972, May 30-June 2, 2010. (doi:10.1109/ISCAS.2010.5537423)
Abstract
This work presents a study about FPGA interconnections and evaluates their effects on voltage-mode binary and quaternary FPGA structures. FPGAs are widely used due to the fast time-to-market and reduced non-recurring engineering costs in comparison to ASIC designs. Interconnections play a crucial role in modern FPGAs, because they dominate delay, power and area. The use of multiple-valued logic allows the reduction of the number of signals in the circuit, hence providing a mean to effectively curtail the impact of interconnections. The most important characteristic of the results are the reduced fanout, fewer number of wires and the smaller wire length presented by the quaternary devices. We use a set of arithmetic circuits to compare binary and quaternary implementations. This work presents the first step on developing quaternary circuits by mapping any binary random logic onto quaternary devices.

Keywords: FPGAs, Mutiple-valued Logic, Low-power Quaternary Voltage-mode Device, Interconnectios, Arithmetic Circuits.

Lazzari-DATE10.pdf Cristiano Lazzari, Paulo Flores, José Monteiro, and Luigi Carro.
"A new quaternary FPGA based on a voltage-mode multi-valued circuit". In Proceedings of IEEE/ACM Design, Automation and Test in Europe Conference (DATE), pages 1797-1802, March 8-12, 2010. (doi:10.1109/DATE.2010.5457105)
Abstract
FPGA structures are widely used due to early time-to-market and reduced non-recurring engineering costs in comparison to ASIC designs. Interconnections play a crucial role in modern FPGAs, because they dominate delay, power and area. Multiple-valued logic allows the reduction of the number of signals in the circuit, hence can serve as a mean to effectively curtail the impact of interconnections. In this work we propose a new FPGA structure based on a low-power quaternary voltage-mode device. The most important characteristics of the proposed architecture are the reduced fanout, low number of wires and switches, and the small wire length. We use a set of FIR filters as a demonstrator of the benefits of the quaternary representation in FPGAs. Results show a significant reduction on power consumption.

Keywords: FPGAs, Mutiple-valued Logic, Low-power Quaternary Voltage-mode Device, FIR filters.

Lazzari-SCS09.pdf Cristiano Lazzari, Paulo Flores, and José Carlos Monteiro.
"Power and delay comparison of binary and quaternary arithmetic circuits". In International Conference on Signals, Circuits and Systems (SCS), pages 1-6, November 6-8, 2009. (doi:10.1109/ICSCS.2009.5412586)
Abstract
Interconnections play a crucial role in deep sub-micron designs because they dominate the delay, power and area. This is especially critical for modern million-gates FPGAs, where as much as 90% of chip area is devoted to interconnections. Multiple-valued logic allows for the reduction of the required number of signals in the circuit, hence can serve as a means to effectively curtail the impact of interconnections. We present in this paper a comparison of binary and quaternary implementations of arithmetic modules based on lookup table structures using a voltage-mode circuits. Our assessment demonstrates that significant power reduction is possible through the use of quaternary structures, with very low delay penalties.

Keywords: Quaternary Logic, Arithmetic Circuits, FPGA Synthesis, Delay and Power Consumption.

Aksoy-ICC09.pdf Levent Aksoy, Ece Olcay Gunes, and Paulo Flores.
"Optimization of area under a delay constraint in multiple constant multiplications". In Proceedings of the WSEAS International Conference on Circuits (ICC), pages 81-86. World Scientific and Engineering Academy and Society (WSEAS), July 22-24, 2009.
Abstract
The Multiple Constant Multiplications (MCM), i.e., the multiplication of a variable by a set of constants, has been a central operation and performance bottleneck in many digital signal processing applications such as, image and video processing, digital television, and wireless communications. Since the design of multiplications is expensive in terms of area, delay, and power consumption in hardware, the area-delay optimization of the MCM operation has often been accomplished by using the shift-adds architecture. However, most of the previously proposed algorithms have focused on the optimization of area ignoring the crucial tradeoff between area and delay of the computation. In this paper, we introduce an approximate algorithm that can find near optimal area solutions under the user specified delay constraint. It is shown by the experimental results that the proposed algorithm finds better area-delay solutions than the previously proposed efficient algorithms.

Keywords: Multiple constant multiplications, area and delay optimization, high-level synthesis, digital FIR filters.

Aksoy-NORCHIP08.pdf Levent Aksoy, Ece Olcay Gunes, and Paulo Flores.
"An exact breadth-first search algorithm for the multiple constant multiplications problem". In NORCHIP - The Nordic Microelectronics event (NORCHIP), pages 41-46, November 16-17, 2008. (doi:10.1109/NORCHP.2008.4738280)
Abstract
This paper addresses the multiplication of one data sample with multiple constants using addition/subtraction and shift operations, i.e., the multiple constant multiplications (MCM) problem. The MCM problem finds itself and its variants in many applications, such as digital finite impulse response (FIR) filters, linear signal transforms, and computer arithmetic. Although many efficient algorithms have been proposed to implement the MCM using the fewest number of operations, they have been heuristics, i.e., they cannot guarantee the minimum solution. In this work, we propose an exact algorithm based on the exhaustive search that finds the minimum number of operations solution of an MCM problem. We have tested the exact algorithm on a set of instances including FIR filter and randomly generated instances, and compared with the previously proposed efficient heuristics. It is observed from the experimental results that, even though the exact algorithm can be applied only on less complex instances, it finds better solutions than the prominent heuristics when applied on real size instances

Keywords: Multiple constant multiplications problem, exact algorithm, breadth-first exhaustive search.

Daitx-SCS08.pdf Fábio Daitx, Vagner Rosa, Eduardo Costa, Paulo Flores, and Sérgio Bampi.
"VHDL generation of optimized FIR filters". In International Conference on Signals, Circuits and Systems (SCS), pages 1-5, November 7-9, , 2008. (doi:10.1109/ICSCS.2008.4746944)
Abstract
This work proposes an VHDL generation software for optimized FIR filters. In this paper a near optimum algorithm for constant coefficient FIR filters was used. This algorithm uses general coefficient representation for the optimal sharing of partial products in Multiple Constants Multiplications (MCM). The developed tool was compared to Matlab FDA toolbox. Synthesis results show that our tool is able to produce significantly better hardware than FDA toolbox, doubling the speed and reducing the silicon area by 75%. The software produces a generic VHDL output, synthesizable to ASIC or FPGA.

Keywords: FIR filters; MCM optimization; FPGA implementation.

Morgado-PATMOS08.pdf P. Marques Morgado, Paulo F. Flores, José C. Monteiro, and L. Miguel Silveira.
"Generating worst-case stimuli for accurate power grid analysis". In Lars Svebsson and José Monteiro, editors, International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), pages 247-257. Springer, September 12-14, 2008.
Abstract
Power distribution systems provide the voltages and currents that devices in a circuit need to operate properly and silicon success requires its careful design and verification. However, problems like voltage drop, ground bounce and electromigration which may cause chip failures, are worsening, as more devices, operating at higher frequencies, are placed closer together. Verification of this type of systems is usually done by simulation, a costly endeavor given the size of current grids, making the determination of the worst-case input setting a crucial task. Current methodologies are based on supposedly safe settings targeting either unrealistic simultaneous switching on all signals or heuristic accounts of the joint switching probability of nearby signals. In this paper we propose a methodology for computation of the worst-case stimuli for power grid analysis. This is accomplished by determining the input vector that maximizes the number of gates in close proximity to each other that can switch in a given time window. The addition of these temporal and spatial restrictions makes the solution of the underlying optimization problem feasible. Comparisons with existing alternatives show that only a fraction of the gates change in any given timing window, leading to a more robust and efficient verification methodology.

Keywords: Power Grid Analysis; Wort-case Stimuli.

Sebastiao-DSD08.pdf Nuno Sebastião and Tiago Dias and Nuno Roma and Paulo Flores and Leonel Sousa.
"Application specific programmable IP core for motion estimation: Technology comparison targeting efficient embedded co-processing units". In Proceedings of Euromicro Conference on Digital System Design: Architectures, Methods and Tools (DSD), IEEE Computer Society, pages 181-188, September 3-5, 2008. (doi:10.1109/DSD.2008.66)
Abstract
The implementation of a recently proposed IP core of an efficient motion estimation co-processor is considered. Some significant functional improvements to the base architecture are proposed, as well as the presentation of a detailed description of the interfacing between the coprocessor and the main processing unit of the video encoding system. Then, a performance analysis of two distinct implementations of this IP core is presented, considering two different target technologies: a high performance FPGA device, from the Xilinx Virtex-II Pro family, and an ASIC based implementation, using a 0.18um CMOS StdCell library. Experimental results have shown that the two alternative implementations have quite similar performance levels and allow the estimation of motion vectors in real-time

Keywords: Motion Estimation IP core; Technology Comparison; Embedded Co-Processing.

Sebastiao-ACACES08.pdf Nuno Sebastião, Tiago Dias, Nuno Roma, Paulo Flores, and Leonel Sousa.
"Specialized motion estimation processor for heterogeneous multicore video coding systems". In International Summer School on Advanced Computer Architecture and Compilation for Embedded Systems (ACACES) Poster Abstracts, pages 63-66, July 13-19, 2008.
Abstract
In the last few years, several highly efficient processor architectures (e.g.: ARM, Power-PC, etc.) have been proposed to implement computational demanding multimedia applications. This paper proposes to use these capabilities to extend the main processor computational resources, in order to efficiently execute one of the most critical parts of the video coding algorithm: motion estimation. This extension is accomplished by embedding a specialized Application Specific Instruction Set Processor (ASIP) core for motion estimation that works together with a General Purpose Processor in a heterogeneous multicore environment.

Keywords: Motion Estimation; Multicore; Video Encoding.

Aksoy-SIPS07.pdf Levent Aksoy, Ece Olcay Gunes, Eduardo Costa, Paulo Flores, and José Monteiro.
"Effect of number representation on the achievable minimum number of operations in multiple constant multiplications". In IEEE Workshop on Signal Processing Systems (SIPS), pages 424-429, October 17-19, 2007. (doi:10.1109/SIPS.2007.4387585)
Abstract
In this work, we analyze the effect of representing constants under binary, CSD, and MSD representations on the minimum number of operations required in a multiple constant multiplications problem. To this end, we resort to a recently proposed algorithm that computes the exact minimum solution. To extend the applicability of this algorithm to much larger instances, we propose problem reduction and model simplification techniques that significantly reduce the search space. We have conducted experiments on a rich set of instances including randomly generated and FIR filter instances. The results show that, contrary to common belief, the binary representation clearly yields better solutions than CSD, and even provides slightly better solutions than MSD. Moreover, the superiority of the binary solutions increases as the number and bit-width of the constants increase.

Keywords: Canonical Signed Digit (CSD), Common Subexpression Elimination (CSE), Minimal Signed Digit (MSD), Multiple Constant Multiplication (MCM).

Aksoy-ECCTD07.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Minimum number of operations under a general number representation for digital filter synthesis". In Proceedings of European Conference on Circuit Theory and Design (ECCTD), pages 252-255, August 26-30, 2007. (doi:10.1109/ECCTD.2007.4529584)
Abstract
In this work, we introduce an algorithm for the optimization of the number of operations in the multiplier block of a digital filter based on a general number representation for the coefficients. In common subexpression elimination algorithms, constants are generally represented with the minimum number of non-zero digits based on their CSD, or MSD representations. We observe that these representations may yield a solution far from the minimum. The general number representation used in our algorithm considers a much larger set of alternative implementations of a constant, which includes the CSD and MSD representations. To cope with the increased search space, we propose model simplification and problem reduction techniques. In this paper, we show that the proposed exact algorithm using general number representation achieves a significant reduction in the number of operations, which can be up to 15 with respect to the solutions obtained under MSD representation.

Aksoy-DAC07.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization of area in digital FIR filters using gate-level metrics". In Proceedings of Design Automation Conference (DAC), pages 420-423, New York, NY, USA, June 4-8, 2007. ACM Press. (doi:10.1145/1278480.1278588)
Abstract
In the paper, we propose a new metric for the minimization of area in the generic problem of multiple constant multiplications, and demonstrate its effectiveness for digital FIR filters. Previous methods use the number of required additions or subtractions as a cost function. We make the observation that not all of these operations have the same design cost. In the proposed algorithm, a minimum area solution is obtained by considering area estimates for each operation. To this end, we introduce accurate hardware models for addition and subtraction operations in terms of gate-level metrics, under both signed and unsigned representations. Our algorithm not only computes the best design solution among those that have the same number of operations, but is also able to find better area solutions using a non-minimum number of operations. The results obtained by the proposed exact algorithm are compared with the results of the exact algorithm designed for the minimum number of operations on FIR filters and it is shown that the area of the design can be reduced by up to 18%. General Terms Algorithms, design.

Keywords: FIR, Multiple Constant Multiplication, Area Optimization.

Morgado-ISVLSI07.pdf P. Marques Morgado, Paulo F. Flores, and L. Miguel Silveira.
"Generating realistic stimuli for accurate power grid analysis". In IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pages 233-238, March 9-11, 2007. (doi:10.1109/ISVLSI.2007.47)
Abstract
Power distribution systems in integrated circuits are used to provide the voltages and currents the devices need to operate properly. As the semiconductor industry moves into deep nanometer nodes, problems like voltage drop, ground bounce and electromigration which may cause chip failures, are worsening, as more devices, operating at higher frequencies are placed closer together. Verification of a power distribution system is therefore paramount to silicon success. This type of verification is usually done by simulation, targeting a worst-case scenario, typically characterized by the almost simultaneous switching of several devices in the circuit. The definition of the worst-case situation is therefore crucial, since it influences the result of the simulation and ultimately the design target. Supposedly safe but unrealistic settings such as assuming that all signals switch simultaneously, could lead to costly over-designs in terms of die area. In this paper we describe a software tool that generates a reasonable, realistic, worst-case set of stimuli for simulation, based on timing and spatial restrictions that arise from the circuit's netlist and placement. Generating such stimuli is akin to performing a standard static timing analysis, as is done before signoff so the tool fits well within conventional design frameworks. The resulting stimuli indicates that only a fraction of the gates change in any given timing window, leading to a more robust verification methodology, specially in the dynamic case.

Keywords: Circuit CAD, Formal Verification, Integrated Circuit Design, Power Distribution, Power Grids, Software Tools, Timing.

Aksoy-ICECS06.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"ASSUMEs: Heuristic algorithms for optimization of area and delay in digital filter synthesis". In Proceedings of the IEEE International Conference on Electronics, Circuits and Systems (ICECS), pages 748-751, December 10-13, 2006. (doi:10.1109/ICECS.2006.379897)
Abstract
In this work two heuristic algorithms are presented for the problems of optimization of area and optimization of area under a delay constraint in digital filter synthesis. The heuristics search for a solution on a combinational network that represents a covering problem using a greedy method for partial term selection. The methods start from the outputs towards the inputs for each coefficient. This top-down approach considers a much larger solution space than existing bottom-up heuristic algorithms. We present results on a wide range of instances and compare them with exact and prominent heuristic algorithms. The results demonstrate that the solutions obtained by the proposed heuristics are extremely close to the exact solutions and are significantly better than the existing heuristic algorithms.

Costa-SBCCI06.pdf Eduardo Costa, Paulo Flores, and José Monteiro.
"Exploiting general coefficient representation for the optimal sharing of partial products in MCMs". In IEEE/ACM Symposium on Integrated Circuit Design and System Design (SBCCI), pages 161-166, New York, NY, USA, August 28-31, 2006. ACM Press. (doi:10.1145/1150343.1150387)
Abstract
We propose a new algorithm that maximizes he sharing of partial terms in Multiple Cons an Multiplication (MCM) operations under a general number representation for the coefficients. MCM operations are required by many algorithms in digital signal processing and have been the subject of extensive research. By making no assumptions as to the number representation, the algorithm described in his paper is able to perform a better search for the optimal sharing of partial terms than previous methods based on MSD or CSD representations. We have applied our algorithm for the hardware minimization of FIR filers.The results show that we can ob ain solutions that require between 20% to 50 less hardware when compared agains he solutions using he MSD representation.

Keywords: Common Subexpression Elimination (CSE), Digital Filter Design, Minimal Signed Digit (MSD), Multiple Constant Multiplication (MCM).

Aksoy-DAC06.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Optimization of area under a delay constraint in digital filter synthesis using SAT-based integer linear programming". In Proceedings of Design Automation Conference (DAC), pages 669-674, New York, NY, USA, July 10-14, , 2006. ACM Press. (doi:10.1145/1146909.1147079)
Abstract
In this paper, we propose an exact algorithm for the problem of area optimization under a delay constraint in the synthesis of multiplierless FIR filters. To the best of our knowledge, the method presented in this paper is the only exact algorithm designed for this problem. We present the results of the algorithm on real-sized filter instances and compare with an improved version of a recently proposed exact algorithm designed for the minimization of area. We show that in many cases delay can be minimized without any area penalty. Additionally, we describe two approximate algorithms that can be applied to instances which cannot be solved, or take too long, with the exact algorithm. We show that these algorithms find similar solutions to the exact algorithm in less CPU time.

Flores-ICCAD05.pdf Paulo Flores, José Monteiro, and Eduardo Costa.
"An exact algorithm for the maximal sharing of partial terms in multiple constant multiplication". In Proceedings of IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 13-16, Washington, DC, USA, November 6-10, 2005. IEEE Computer Society. (doi:10.1109/ICCAD.2005.1560032)
Abstract
In this paper, we propose an exact algorithm that maximizes the sharing of partial terms in multiple constant multiplication (MCM) operations. We model this problem as a Boolean network that covers all possible partial terms which may be used to generate the set of coefficients in the MCM instance. The PIs to this network are shifted versions of the MCM input. An AND gate represents an adder or a subtracter, i.e., an AND gate generates a new partial term. All partial terms that have the same numerical value are ORed together. There is a single output which is an /spl and/ over all the coefficients in the MCM. We cast this problem into a 0-1 integer linear programming (ILP) problem by requiring that the output is asserted while minimizing the total number of AND gates that evaluate to one. A SAT-based solver is used to obtain the exact solution. We argue that for many real problems the size of the problem is within the capabilities of current SAT solvers. We present results using binary, CSD and MSD representations. Two main conclusions can be drawn from the results. One is that, in many cases, existing heuristics perform well, computing the best solution, or one close to it. The other is that the flexibility of the MSD representation does not have a significant impact in the solution obtained.

Costa-ECCTD05.pdf Eduardo da Costa, Paulo Flores, and José Monteiro.
"Maximal sharing of partial terms in MCM under minimal signed digit representation". In Proceedings of European Conference on Circuit Theory and Design (ECCTD), volume II, pages 221-224, August 28-September 2, 2005. (doi:10.1109/ECCTD.2005.1523033)
Abstract
We propose a new algorithm that maximizes the sharing of partial terms in Multiple Constant Multiplication (MCM) operations. MCM operations are required by many algorithms in digital signal processing and have been the subject of extensive research. Recently, the Minimal Signed Digit (MSD) number representation has been proposed as an extension to the Canonical Signed Digit (CSD) representation. By properly exploiting the redundancy of the MSD representation, the hardware implementation can be significantly optimized. The initial algorithm described in this paper is able to perform a better search for the optimal sharing of the redundant coefficient representations under MSD than previous methods. However, during its search the depth of adder-steps is not considered. We present a modified version of this algorithm that is able to reduce the maximum depth of partial terms at the expense of some extra hardware. The results show that for more complex problems our algorithm performs significantly better than previous approaches, in some cases obtaining solutions that require 25% less hardware.

Keywords: Adders, Digital Arithmetic, Digital Signals, Multiplying Circuits.

Flores-ISCAS99.pdf Paulo Flores, Horácio Neto, Krishnendu Chakrabarty, and João Marques-Silva.
"Test pattern generation for width compression in BIST". In Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS), volume 1, pages 114-118, May 30 - June 2, 1999. (doi:10.1109/ISCAS.1999.777818)
Abstract
The main objectives of built-in self test (BIST) are the design of test pattern generator circuits which achieve the highest fault coverage, require the shortest sequence of test vectors and utilize the minimum circuit area. This paper targets the problem of generating test patterns for stuck-at faults that induce compatibility relations between the primary inputs of the circuit under test. These compatibility relations can be used for designing counter-based test generator circuits with a reduced number of bits, thus requiring smaller testing time and smaller area. The proposed solution is based on an integer linear programming (ILP) formulation that builds on existing propositional satisfiability (SAT) models for test pattern generation. An ATPG tool for minimum test pattern generation for width compression (MTP-C) is described, which illustrates the practical applicability of our approach for a wide range of benchmark circuits

Keywords: VLSI, Automatic Test Pattern Generation, Built-In Self Test, Fault Diagnosis, Integer Programming, Linear Programming, Logic Testing; BIST, Counter-Based Test Generator Circuits, Fault Coverage, Integer Linear Programming, Minimum Circuit Area, Minimum Test Pattern Generation, Propositional Satisfiability Models, Width Compression.

Flores-GLS99.pdf Paulo Flores, Horácio Neto, and João Marques-Silva.
"On applying set covering models to test set compaction". In Proceedings of Great Lakes Symposium on VLSI (GLS-VLSI), pages 8-11. IEEE Computer Society, March 4-6, 1999. (doi:10.1109/GLSV.1999.757365)
Abstract
Test set compaction is fundamental problem in digital system testing. In recent years, many competitive solutions have been proposed, most of which based on heuristics approaches. This paper studies the application of set covering models to the compaction of test sets, which can be used with any heuristic test set compaction procedure. For this purpose, recent and highly effective set covering algorithms are used. Experimental evidence suggests that the size of computed test sets can often be reduced by using set covering models and algorithms. Moreover a noteworthy empirical conclusion is that it may be preferable not to use fault simulation when the final objective is test set compaction

Keywords: Automatic Test Pattern Generation, Integrated Circuit Testing, Logic Testing, Minimisation.

Flores-VLSID99.pdf Paulo Flores, José Costa, Horácio Neto, José Monteiro, and João Marques-Silva.
"Assignment and reordering of incompletely specified pattern sequences targetting minimum power dissipation". In VLSID '99: Proceedings of the 12th International Conference on VLSI Design - 'VLSI for the Information Appliance', pages 37-41, Washington, DC, USA, January 10-13, 1999. IEEE Computer Society.
Flores-ICCD98.pdf Paulo F. Flores, Horácio C. Neto, and João P. Marques-Silva.
"An exact solution to the minimum size test pattern problem". In Proceedings of the IEEE International Conference on Computer Design (ICCD), pages 510-515, October 5-7, 1998. (doi:10.1109/ICCD.1998.727097)
Abstract
This paper addresses the problem of test pattern generation for single stuck-at faults in combinational circuits, under the additional constraint that the number of specified primary input assignments is minimized. This problem has different applications in testing including the identification of don't care conditions to be used in the synthesis of Built-In Self- Test (BIST) logic. The proposed solution is based on an integer linear programming (ILP) formulation which builds on an existing propositional satisfiability (SAT) model for test pattern generation. The resulting ILP formulation is linear on the size of the original SAT model for test generation, which is linear on the size of the circuit. Nevertheless, the resulting ILP instances represent complex optimization problems, that require dedicated ILP algorithms. Preliminary results on benchmark circuits validate the practical applicability of the test pattern minimization model and associated ILP algorithm

Keywords: Automatic Test Pattern Generation, Built-In Self Test, Combinational Circuits, Computability, Integer Programming, Linear Programming, Logic Testing.

Flores-IWLS98-power.pdf José C. Costa, Paulo F. Flores, Horácio C. Neto, José C. Monteiro, and João P. Marques-Silva.
"Power reduction in BIST by exploiting don't cares in test patterns". In IEEE/ACM International Workshop on Logic Synthesis (IWLS), pages 375-386, June 1998.
Flores-IWLS98-mtp.pdf Paulo F. Flores, Horácio C. Neto, and João P. Marques-Silva.
"An exact solution to the minimum-size test pattern problem". In IEEE/ACM International Workshop on Logic Synthesis (IWLS), pages 452-470, June 1998.
Flores-ETW98-power.pdf José C. Costa, Paulo F. Flores, Horácio C. Neto, José C. Monteiro, and João P. Marques-Silva.
"Exploiting don't cares in test patterns to reduce power during BIST". In IEEE European Test Workshop (ETW), pages 34-36, May 1998.
Flores-ETW98-mtp.pdf Paulo F. Flores, Horácio C. Neto, Krishnendu Chakrabarty, and João P. Marques-Silva.
"A model and algorithm for computing minimum-size test patterns". In IEEE European Test Workshop (ETW), pages 147-148, May 1998.
Flores-CTC98-power.pdf José C. Costa, Paulo F. Flores, José C. Monteiro, and João P. Marques-Silva.
"Power reduction in BIST by exploiting don't cares in test patterns". In Proceedings of Cadence Technical Conference (CTC), May 1998.
Flores-CTC98-mtp.pdf Paulo F. Flores, Horácio C. Neto, Krishnendu Chakrabarty, and João P. Marques-Silva.
"A model and algorithm for computing minimum-size test patterns". In Proceedings of Cadence Technical Conference (CTC), May 1998.
Manquinho-ICTAI97.pdf Vasco Manquinho, Paulo Flores, João Marques-Silva, and Arlindo L. Oliveira.
"Prime implicant computation using satisfiability algorithms". In Proceedings of the IEEE International Conference on Tools with Artificial Intelligence (ICTAI), pages 232-239, November 3-8, 1997. (doi:10.1109/TAI.1997.632261)
Abstract
The computation of prime implicants has several and significant applications in different areas, including automated reasoning, non-monotonic reasoning, electronic design automation, among others. The authors describe a new model and algorithm for computing minimum-size prime implicants of propositional formulas. The proposed approach is based on creating an integer linear program (ILP) formulation for computing the minimum-size prime implicant, which simplifies existing formulations. In addition, they introduce two new algorithms for solving ILPs, both of which are built on top of an algorithm for propositional satisfiability (SAT). Given the organization of the proposed SAT algorithm, the resulting ILP procedures implement powerful search pruning techniques, including a non-chronological backtracking search strategy, clause recording procedures and identification of necessary assignments. Experimental results, obtained on several benchmark examples, indicate that the proposed model and algorithms are significantly more efficient than other existing solutions

Keywords: Backtracking, Computability, Integer Programming, Linear Programming, Nonmonotonic Reasoning, Search Problems.

Flores-DCIS96.pdf Paulo Flores and Horácio Neto.
"Register transfer level VHDL block generation". In Proceedings of Design of Integrated Circuits and Systems Conference (DCIS), November 1996.

National Conferences

5 references, last updated Tue Jan 20 17:02:18 2015

Cabrita-REC13.pdf José Cabrita, Gilberto Rodrigues, and Paulo Flores.
"Hardware accelerator for biological sequence alignment using coreworks® processing engine". In IX Jornadas sobre Sistemas Reconfiguráveis (REC), pages 83-88, February 7-8, 2013.
Abstract
Several algorithms exist for biological sequence alignment. The Smith-Waterman (S-W) algorithm is an exact algorithm that uses dynamic programming for local sequence alignment. Some implementations in software for General Purpose Processors (GPP) as well in hardware (using Field Programmable Gate Array (FPGA)) exist. In this paper it is proposed an implementation of the S-W algorithm for DNA, RNA and amino acids sequence alignment that uses the Coreworks® processing engine. The processor FireWorksTM will be used to control a hardware accelerator named SideWorksTM both developed by Coreworks®. In this paper is proposed an architecture based on Process Elements (PE) to be implemented in SideWorksTM accelerator template with the purpose of accelerating the S-W algorithm. The developed application is able to read sequences from a file, align them with a library of sequences and present the results for the best local alignments using the Coreworks® processing engine.

Keywords: DNA; Bioinformatics; Sequence Alignment; Smith-Waterman algorithm; Field Programmable Gate Array (FPGA); Cell Updates Per Second (CUPS); Platform Design; SideWorks; FireWorks

Sebastiao-REC10.pdf Nuno Sebastião, Nuno Roma, and Paulo Flores.
"Scalable accelerator architecture for local alignment of DNA sequences". In VI Jornadas sobre Sistemas Reconfiguráveis (REC), pages 59-66. Universidade de Aveiro / IEETA, February 4-5, 2010. BEST PAPER AWARD.
Abstract
The Smith-Waterman algorithm is widely used to determine the optimal sequence alignment between two DNA sequences. This paper presents an innovative method to significantly reduce the computation time and memory space requirements of the traceback phase of this alignment algorithm. It also presents a flexible and scalable hardware architecture for accelerating such method, which can be easily expandable by the interconnection of several FPGAs. The results obtained from an implementation using a Virtex-5 FPGA showed that the proposed method is highly feasible in order to provide significant gains in terms of the overall performance of the whole alignment procedure when long sequences are processed. The obtained results also showed that it is preferable to span the array of processing elements through several FPGAs, rather than reusing the hardware resources of the individual array.

Keywords: DNA; Local Sequence Alignment; Hardware Accelerator; FPGA

Dias-REC08.pdf Tiago Dias, Nuno Sebastião, Nuno Roma, Paulo Flores, and Leonel Sousa.
"Programmable IP core for motion estimation: Comparison of FPGA and ASIC based implementations". In IV Jornadas sobre Sistemas Reconfiguráveis (REC). Universidade do Minho, February 7-8, 2008.
Abstract
A performance analysis of two distinct implementations of a recently proposed quite efficient motion estimation coprocessor is presented. This comparison considers two distinct implementation technologies: a high performance FPGA device, from Xilinx Virtex-II Pro family, and an ASIC based implementation, using a 0.18µ m CMOS standard cells library. Experimental results have shown that the two considered implementations present quite similar performance levels and allow the estimation of motion vectors in real-time. Nevertheless, the reconfigurability properties of the FPGA implementation allow the motion estimator to dynamically adapt the video encoder to the characteristics of the target application and/or of the communication channel.

Flores-OE95.pdf Paulo Flores and Horácio Neto.
"Verificação formal de circuitos digitais". In Encontro Nacional do Colégio de Engenharia Electrotécnica, pages 295-307. Ordem dos Engenheiros, December 1995.
Sousa-OE91.pdf Ana Sousa, José Abreu, and Paulo Flores.
"VHDL - novo suporte às metodologias de CAD". In Jornadas Nacionais de Projecto, Planeamento e Produção, (Projecto Assistido por Computador), pages 79-83. Ordem dos Engenheiros, December 1991.

Technical Reports

26 references, last updated Tue Jan 20 17:02:18 2015

Gomes-RT2-2014.pdf Ana Gomes, Jorge R. Fernandes, and Paulo Flores.
"Quaternary logic look-up table: an alternative circuit". Technical Report RT 2/2014, INESC-ID, February 2014.
Abstract
The objective of this work is to make a brief introduction to the multi-value logic and the quaternary FPGA state-of-the-art. Furthermore we also propose the research and implementation of additional structures needed to achieve a fully functional quaternary FPGA.

Marques-INESCid12-28.pdf Ricardo Marques, Luis G. Silva, Paulo Flores, and L. Miguel Silveira.
"Improving SAT solver efficiency using a cooperative multicore approach". Technical Report TR No. 28/2012, INESC-ID, October 2012.
Abstract
Many problems in Computer-Aided Design of Electronic Systems are addressed by converting them to a sequence of SAT problems solved with a state of the art SAT solver. Typical applications include problems in testing, timing, verification, routing, etc. Despite the enourmous progress achieved over the last decade in the development of SAT solvers, there is strong demand for higher algorithm efficiency to solve harder and larger problems. The widespread availability of multi-core, shared memory parallel environments provides an opportunity for such improvements. In this paper we present our results on improving the effectiveness of standard SAT solvers on such architectures, through a portfolio approach. Multiple instances of the same basic solver using different heuristic strategies for search-space exploration and problem analysis share information and cooperate towards the solution of a given problem. Results from application of our methodology to known problems from SAT competitions and EDA problems show relevant improvements over the state of the art and yield the promise of further advances in CAD of electronic systems.

Marques-INESCid12-20.pdf Ricardo Marques, Luis G. Silva, Paulo Flores, and L. Miguel Silveira.
"cmcSAT - a cooperative multicore SAT solver". Technical Report TR No. 20/2012, INESC-ID, July 2012.
Abstract
Abstract-In this paper we present our results on exploiting multi-core shared memory architectures to improve the effectiveness of standard SAT solvers. Many problems in Electronic Design Automation (EDA) as well as many other fields can be converted to a SAT problem or a sequence of SAT problems and solved with a state of the art SAT solver. In EDA typical applications include problems in testing, timing, verification, routing, etc. Despite the enourmous progress achieved over the last decade in the development of SAT solvers, there is strong demand for higher algorithm efficiency to solve harder and larger problems. The widespread availability of multi-core, shared memory parallel environments provides an opportunity for such improvements and in this paper we present a cooperative approach to improving SAT solver efficiency and capacity. Multiple instances of the same basic solver using different heuristic strategies for search-space exploration and problem analysis share information and cooperate towards the solution of a given problem. Results from application of our methodology to known problems from SAT competitions and EDA problems show relevant improvements over the state of the art and yield the promise of further advances.

Sebastiao-INESCid09-amep.pdf Nuno Sebastião, Nuno Roma, and Paulo Flores.
"Insertion and improvement of testability mechanisms on a specialized multimedia ip core". Technical Report TR No. 32/2009, INESC-ID, May 2009.
Abstract
The set of custom validation mechanisms and test structures that were integrated in an ASIC implementation of a specialized multimedia IP core are presented in this paper. The dedicated test structures include a custom memory BIST controllers, which are capable of providing the addresses of all the faulty memory positions, a set of two scan chains and a JTAG compliant interface. Moreover, dedicated hardware structures were also proposed and included in the original IP core design, in order to provide extra test control points that significantly improve the controllability on specific locations of the circuit. An AT91SAM9263-EK board was used as an ATE equipment which required the development of dedicated software to support the STIL format for test vector generation. The obtained results demonstrated that the original circuit test coverage may be increased by a factor of 11.6, due to the use of scan chains and additional test control points placed at specific locations within the IP core. Moreover, the corresponding test pattern generation time was reduced by a factor of 1560 when these test structures were used. Despite the complexity of the circuit, the obtained results also show that the entire ASIC can be tested in less than 70ms (using a 100MHz clock).

Keywords: IP Core Test; Scan Chains; BIST; JTAG;

Flores-INESCid05-mcm.pdf Paulo Flores, José Monteiro, and Eduardo Costa.
"Maximal sharing of partial terms in multiple constant multiplications: Analysis of an exact algorithm". Technical Report TR/14/2005, INESC-ID, April 2005.
Flores-INESC00-mtp.pdf Paulo Flores, Horácio Neto, and João Marques-Silva.
"On computing minimum size test patterns". Technical Report RT/008/2000, INESC, September 2000.
Flores-EL98-icb.pdf Paulo Flores and Horácio Neto.
"Biblioteca de blocos de CI 1a versão". Actividade 3.1 - relatório d3.9, Projecto EuroLasic (Praxis XXI 2/2.1/TIT/1643/95), May 1998.
Flores-EL98-fluxo.pdf Paulo Flores.
"Fluxo de projecto para circuitos mistos". Actividade 1.1 - relatório d1.1, Projecto EuroLasic (Praxis XXI 2/2.1/TIT/1643/95), May 1998.
Flores-INESC97-mtp.pdf Paulo F. Flores, João P. Marques-Silva, Horácio Neto, and Krishnendu Chakrabarty.
"An exact solution to the minimum-size test pattern problem". Technical Report RT/12/97, INESC/IST, December 1997.
Flores-INESC97-ilp.pdf Paulo Flores.
"Síntese de alto nível utilizando programação linear inteira". Technical report, Relatório interno INESC/IST, May 1997.
Flores-INESC96-ctl.pdf Paulo Flores.
"Utilização de diagramas de decisão binária (BDDs) para representar máquinas de estado (FSM) e verificar fórmulas de lógica temporal (CTL)". Technical report, Relatório interno INESC/IST, July 1996.
Flores-INESC96-vhdl2blif.pdf Paulo Flores.
"Mapeamento de VHDL para BLIF-MV". Technical report, Relatório interno INESC/IST, January 1996.
Flores-QC95-tools.pdf Paulo Flores.
"PC based systhesis tool". Technical report, Project QuickChips (ESPRIT 6043), April 1995.
Flores-QC95-icb.pdf Paulo Flores and Leonel Simões.
"IC blocks reference manual". Technical report, Project QuickChips (ESPRIT 6043), April 1995.
Flores-QC94-tools.pdf Paulo Flores.
"Demonstration of tools". Technical report, Project QuickChips (ESPRIT 6043), May 1994.
Flores-QC94-sci.pdf Paulo Flores.
"Smart card interface". Technical report, Project QuickChips (ESPRIT 6043), May 1994. CONFIDENTIAL.
Flores-QC94-lang.pdf Paulo Flores.
"Selected languages and formats". Technical report, Project QuickChips (ESPRIT 6043), May 1994.
Flores-QC93-tools.pdf Paulo Flores.
"Set of synyhesis tools". Technical report, Project QuickChips (ESPRIT 6043), November 1993.
Flores-JNICT93.pdf Paulo Flores.
"Ambiente de sítese para projecto de circuitos digitais". Technical report, Projecto JNCIT PMCT/856/90, October 1993.
Flores-JNICT92.pdf Paulo Flores.
"Metodologias de síntese em VHDL". Technical report, Projecto JNICT PMCT/856/90, October 1992.
Flores-JNICT91-vhdl.pdf Paulo Flores.
"Aplicação da linguagem VHDL na especificação funcional de sistemas em ambiente de síntese". Technical report, Projecto JNICT PMCT/856/90, October 1991.
Flores-JNICT91-map.pdf Paulo Flores.
"Mapeamento em células standard". Technical report, Projecto JNICT PMCT/856/90, October 1991.
Flores-QC91-mic30.pdf Paulo Flores.
"MIC30 - a telecom circuit design to teste the CAD-VTQL interface constraints". Technical report, Projecto ESPRIT 5047, September 1991.
Flores-QC91-vhdl.pdf Paulo Flores.
"VHDL - a short review of basic concepts". Technical report, Projecto QuickChips (ESPRIT 5047), March 1991.
Silveira-PART88.pdf Luís M. Silveira, Paulo Flores, and Paulo Bicudo.
"A static partitioning tool for circuit simulation". Technical report, Projecto ESPRIT 1059, 1988.
Silveira-CANELA88.pdf Luís M. Silveira and Paulo Flores.
"Parallel circuit simulation - an accelerated of cinnamon". Technical report, Projecto ESPRIT 1058, 1988.

Industrial Related

1 references, last updated Tue Jan 20 17:02:19 2015

Patente-PT105282A-INPI12.pdf Jorge Fernandes, Cristiano Lazzari, Paulo Flores, and José Monteiro.
"Tabela multi-valor para dispositivos lógicos programáveis". Patent Nº 105282, Instituto Nacional da Propriedade Industrial (INPI), Portugal, July 7, 2012.
Abstract
A presente invenção consiste num novo produto para implementação de tabelas multi-valor para dispositivos lógicos programáveis que tem como entrada e saída sinais digitais multi-valor enquanto o processamento interno É realizado por funções lógicas digitais binárias. O novo produto aqui proposto consiste em utilizar circuitos de natureza analógica para fazer a conversão de sinais digitais multi-valor em sinais digitais binários, utilizando tecnologias mos convencionais com apenas um valor de Vth e um valor de tensão de alimentação. Os sinais binários atuam nos interruptores da tabela que permitem selecionar um dos multi-valores lógicos a colocar na saída, maximizando o seu desempenho por garantir níveis de tensão máximos.

Keywords: Multi-value lookup table for programable devices

Other Publications

15 references, last updated Tue Jan 20 17:02:19 2015

Marques-SATcompetiton14.pdf Ricardo Marques, Luis Guerra e Silva, Paulo Flores, and L. Miguel Silveira.
"pmcSAT". (Version 2.0) Participation in the SAT Competition 2014, July 2014. Bronze Medeal in the category: Core Solvers, Parallel, Hard-combinatorial SAT+UNSAT.
Abstract
This document describes the SAT solver PMC SAT, a conflict-driven clause learning (CDCL) portfolio solver that launches multiple instances of the same basic solver using different heuristic strategies, for search-space exploiting and problem analysis, which share information and cooperate towards the solution of a given problem.

Marques-SATcompetiton13.pdf Ricardo Marques, Luis Guerra e Silva, Paulo Flores, and L. Miguel Silveira.
"pmcSAT". (Version 1.0) Participation in the SAT Competition 2013, July 2013. Bronze Medeal in the category: Core Solvers, Parallel, Hard-combinatorial SAT+UNSAT.
Abstract
This document describes the SAT solver PMC SAT, a conflict-driven clause learning (CDCL) portfolio solver that launches multiple instances of the same basic solver using different heuristic strategies, for search-space exploiting and problem analysis, which share information and cooperate towards the solution of a given problem.

Aksoy-UnivBooth-DATE12.pdf Levent Aksoy, Eduardo Costa, Paulo Flores, and José Monteiro.
"Multicon - Multiplierless Design of Low-Complexity and High-Speed DSP Systems". Poster at University Booth: Session 5. In IEEE/ACM Design, Automation and Test in Europe Conference (DATE), March 12-16, 2012.
Abstract
The multiplication of data samples with constant coefficients is a ubiquitous operation and performance bottleneck in many Digital Signal Processing (DSP) systems, such as, digital Finite Impulse Response (FIR) filters, Discrete Cosine Transforms (DCTs), filter banks, and error correcting codes. Since the realization of a multiplication operation in hardware is expensive in terms of area, delay, and power dissipation and the constants to be multiplied by variable(s) are determined by the DSP algorithms beforehand, the constant multiplications are generally realized in a shift-adds architecture where each constant multiplication is realized using addition, subtraction, and shift operations. Hence, the fundamental optimization problem is defined as finding the minimum number of addition and subtraction operations that realize the constant multiplications. Over the years, many efficient high-level algorithms have been introduced for the multiplierless design of constant multiplications. However, the optimization of the number of operations does not always yield a design with the minimum area at gate-level. Thus, the high-level algorithms should take into account the gate-level implementation cost of each addition and subtraction operation considering the design platform, i.e., ASIC or FPGA. Moreover, performance is also a crucial parameter in many DSP systems and circuit area is generally expandable in order to achieve a given performance target. Although the delay parameter is dependent on several implementation issues, such as placement and routing, the delay of a shift-adds design of constant multiplications is generally considered in terms of the number of adder-steps, which denotes the maximal number of adders and subtracters in series. Hence, to find the optimal tradeoff between area and delay, the high-level algorithms should take into account both the gate-level implementation costs of operations and the adder-step of the design. Furthermore, power consumption has become a matter of concern with the increasing popularity of portable electronic devices that include many DSP systems. As indicated in the switching activity and power dissipation estimation models, power consumption in the multiplierless design of constant multiplications is highly related with the depth of each operation and its gate-level area at gate-level. Hence, the high-level algorithms should consider these metrics in order to reduce the power dissipation. In the U Booth of DATE12, we would like to present our high-level algorithms designed for the multiplierless realization of constant multiplications considering the area, delay, and power dissipation parameters. We will demonstrate the efficiency of high-level algorithms at both high-level and gate-level on DSP systems, such as FIR filters and DCTs, compared with the previously proposed algorithms. This work is partially supported by the Portuguese Foundation for Science and Technology (FCT) research project PTDC/EIA-EIA/103532/2008 Multicon - Architectural Optimization of DSP Systems with Multiple Constants Multiplications. Our expertise on this research area can be found in the Multicon project website, http://algos.inesc-id.pt/multicon.

Daitx-UFRGS07.pdf Fábio Fabian Daitx, Eduardo Costa, Paulo Flores, and Sergio Bampi.
"Uma ferramenta para geração automática de VHDL para filtros FIR otimizados". Reserch poster on the Universidade Federal do Rio Grande do Sul (UFRGS), August 2007.
Keywords: FIR, Filter design, CAD filter tool

Morgado-DAC07.pdf P. Marques Morgado, Paulo F. Flores, and L. Miguel Silveira.
"Generating realistic stimuli for accurate power grid analysis". Poster on SIGDA University Booth at the Design Automation Conference (DAC), June 4-8, 2007.
Keywords: Circuit CAD, Formal Verification, Integrated Circuit Design, Power Distribution, Power Grids, Software Tools, Timing

Flores-IST03-webpack.pdf Paulo Flores and Horácio Neto.
"Tutorial sobre o Xilinx WebPACK". IST, 2003. Para as cadeiras de CID e SID.
Flores-IST94-vhdl.pdf Paulo Flores.
"VHDL para síntese". Guia de programação para alunos de TFC e Mestrado, Instituto Superior Técnico, October 1994.
Flores-IST94-synopsys.pdf Paulo Flores.
"Introdução ao sistema da synopsys". Guia para alunos de TFC e Mestrado, Instituto Superior Técnico, October 1994.
Flores-IST93-lab-me.pdf Paulo Flores.
"Introdução ao sistema de viewlogic (simulação VHDL, síntese, simulação lógica)". Guia de laboratório para a cadeira de Microelectrónica, Instituto Superior Técnico, April 1993.
Flores-II92.pdf Paulo Flores.
"Metodologias de síntese". Relatório para a cadeira de Introdução à Investigação, September 1992.
Tavares-COMP92.pdf Emília Tavares and Paulo Flores.
"Alindador de VHDL". Relatório para a cadeira de Compiladores, April 1992.
Monteiro-TASC91.pdf José Monteiro and Paulo Flores.
"Simulador orientado por objectos". Relatório para a cadeira de Tópicos Avançados de Simulação de Circuitos, June 1991.
Monteiro-POO91.pdf José Monteiro and Paulo Flores.
"Simulador orientado por objectos". Relatório para a cadeira de Programação Orientada para Objectos, June 1991.
Monteiro-DB90.pdf José Monteiro, Maria Moreno, Paulo Flores, and Sérgio Marcos.
"Sistemas de informação para biblioteca". Relatório para a cadeira de Base de Dados, September 1990.
Monteiro-TAAC90.pdf José Monteiro and Paulo Flores.
"Algoritmos de mapeamento dinâmico num hipercubo". Relatório para cadeira de Tópicos Avançados em Arquitecturas de Computadores, March 1990.