# FPGA Implementation of 32 Tap FIR Filter with Multi Hierarchy Pipeline Architecture

Sridevi Sriadibhatla<sup>1</sup> and P.L.H. Varaprasad<sup>2</sup>

<sup>1</sup>Associate Professor, Dept. of Electronics & Communication Engg., Sri Prakash College of Engg., Rajupeta, Tuni, India

<sup>2</sup>Professor, Dept. of Instrumentation Engg., Andhra University College of Engg., Visakhapatnam, India

E-mail: sridevisriadibhatla@gmail.com

#### Abstract

This paper presents an efficient FIR filter architecture based on multi hierarchy pipelining. By choosing an inner clock frequency several times higher than the input sampling frequency, the multiply and accumulate (MAC) components can be shared, which optimizes area and delay. An N tap FIR filter is divided into N/M groups. Two hierarchies of pipeline stages are inserted, one between the N/M groups and one within each group, and a further five stages are placed inside a pipelined MAC. The design is implemented on a Virtex v50efg256-7 FPGA, and the results show that the proposed filter is optimized in area and delay.

Keywords: FIR Filter, Multi hierarchy pipelining, MAC, FPGA.

## Introduction

Finite impulse response (FIR) filters are widely used in applications such as digital communications and digital signal processing. They are always stable and can achieve a linear phase response.

The output of an FIR filter is described by the following equation

$$y(n) = \sum_{k=0}^{N-1} h_k\, x(n-k)$$

where N is the number of taps of the filter, x is the input data stream, $h_k$ is the k<sup>th</sup> tap coefficient and y is the output data stream. Each of the N most recent input samples is multiplied by its coefficient, and the sum of these products gives the filter output.
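The equation above can be evaluated directly in software as a reference model. The following is a minimal sketch, assuming that input samples before n = 0 are zero; the function name `fir_direct` and the example coefficients are illustrative and do not come from the paper.

```python
def fir_direct(h, x):
    """Return y(n) = sum over k of h[k] * x(n-k), taking x(n) = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(n_taps):
            if n - k >= 0:              # samples before the start are treated as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# An impulse at the input reproduces the coefficients at the output.
h = [1, 2, 3, 4]
impulse = [1, 0, 0, 0, 0, 0]
print(fir_direct(h, impulse))  # [1, 2, 3, 4, 0, 0]
```

Each output sample costs N multiplications, which is the multiplier count that the hardware structures discussed next try to reduce.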

The critical problem in designing an FIR filter is the large number of multipliers, which leads to excessive area and power consumption. Methods such as simplifying the FIR architecture, improving the multiplier and reordering the coefficients are used to optimize area and power [1].

There are two popular structures for implementing FIR filters, direct and transposed, shown in figure 1.1. In the direct form, delay units are placed between the multipliers. At any time, the present input x(n) and the previous input samples are fed to the multiplier inputs, and the output y(n) is the sum of all the multiplier products.

In the transposed form, delay units are placed between the adders so that the input can be fed to all multipliers simultaneously. In both forms, however, the number of multipliers required equals the number of taps, which increases cost, area and power consumption.
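The difference between the two structures can be illustrated with a small behavioural model. This is a sketch under simple assumptions (integer arithmetic, all registers initially zero); the names `direct_form` and `transposed_form` are illustrative only.

```python
def direct_form(h, x):
    """Direct form: a delay line holds x(n), x(n-1), ..., one value per multiplier."""
    delay_line = [0] * len(h)
    y = []
    for sample in x:
        delay_line = [sample] + delay_line[:-1]        # shift the input delay line
        y.append(sum(c * d for c, d in zip(h, delay_line)))
    return y


def transposed_form(h, x):
    """Transposed form: every multiplier sees x(n); registers sit between adders."""
    n_taps = len(h)
    state = [0] * (n_taps - 1)                         # registers between the adders
    y = []
    for sample in x:
        products = [c * sample for c in h]             # all taps multiply the same input
        y.append(products[0] + (state[0] if state else 0))
        for i in range(n_taps - 2):                    # move partial sums toward the output
            state[i] = products[i + 1] + state[i + 1]
        if state:
            state[-1] = products[-1]
    return y


h = [1, 2, 3, 4]
x = [5, -1, 2, 0, 7, 3]
print(direct_form(h, x) == transposed_form(h, x))  # True
```

Both functions perform one multiplication per tap per input sample, which matches the observation that neither form reduces the multiplier count.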

In the proposed architecture the number of multipliers and adders is reduced, and a novel pipelined multiply and accumulate (MAC) component is used.

The paper is organised as follows: section 2 describes the proposed multi hierarchy scheme, section 3 presents the design of the 32 tap FIR filter and the FPGA synthesis results, and section 4 gives the conclusion.



Figure 1.1: FIR filter architecture. (a) Direct (b) Transposed.

## FIR Filter Design

### Single pipeline architecture

In the single pipeline implementation of the FIR filter, one pipeline stage is inserted between the multipliers and the adder tree, as shown in figure 2.1. This shortens the critical path, but the number of multipliers required is still equal to the number of taps of the filter.



Figure 2.1: Single Pipeline Architecture.

### Two hierarchy pipeline architecture

This design uses two clock frequencies: the sampling frequency for the delay units and a higher inner clock frequency for the multiply and add computations, so that multipliers and adders can be shared. In the two hierarchy pipeline architecture the N tap FIR filter is separated into N/M groups by adding (N/M)-1 delay units, one every M taps. Each group shares a single multiply and accumulate component (MAC) [2]. Within a group, the multiplications and additions are computed on the faster clock, i.e., the multiply and add operations of one group are completed in M cycles of the inner clock, which is one cycle of the sampling clock. The number of groups therefore determines the number of MACs [2].

The two hierarchy pipeline architecture is shown in figure 2.2. The first hierarchy of pipelining lies between the groups and has N/M stages, as shown in figure 2.3. The second hierarchy lies within each group, between the multiplexers and the MAC.
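The sharing of one MAC among the M taps of a group can be sketched behaviourally as follows. This is a simplified model under stated assumptions: it reproduces the arithmetic of the grouping only and does not model the pipeline registers between groups or their latency. The name `fir_grouped` and its arguments are hypothetical.

```python
def fir_grouped(h, x, m):
    """N tap FIR computed by N/M groups, each reusing one MAC for M inner cycles."""
    n_taps = len(h)
    if n_taps % m != 0:
        raise ValueError("the number of taps N must be a multiple of M")
    n_groups = n_taps // m
    delay_line = [0] * n_taps
    y = []
    for sample in x:                          # one iteration = one sampling-clock period
        delay_line = [sample] + delay_line[:-1]
        total = 0
        for g in range(n_groups):             # the groups work in parallel in hardware
            acc = 0                           # accumulator of the group's single MAC
            for cycle in range(m):            # M inner-clock cycles per sample
                tap = g * m + cycle           # mux select picks the data/coefficient pair
                acc += h[tap] * delay_line[tap]
            total += acc                      # partial sums of the groups are combined
        y.append(total)
    return y


h = list(range(1, 33))                        # 32 illustrative coefficients
impulse = [1] + [0] * 31
print(fir_grouped(h, impulse, 8) == h)        # True: 4 groups of 8 taps
```

With N = 32 and M = 8 the inner loop runs eight times per group and sample, so four MACs do the work of thirty-two multipliers.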



Figure 2.2: Two Level Pipeline Architecture.



Figure 2.3: First Hierarchy of FIR Filter.

### Multi hierarchy pipeline architecture

We propose a multi hierarchy pipeline architecture in which a pipelined MAC is used in place of the earlier MAC. A high speed, high throughput multiplier-accumulator (MAC) is essential to achieve a high performance digital signal processing system [3].

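A conventional MAC unit consists of a multiplier and an accumulator that holds the sum of the previous consecutive products. The function of the MAC unit is given by the equation

$$F = \sum_{i} A_i B_i$$

where $A_i$ and $B_i$ are the i<sup>th</sup> pair of operands presented to the multiplier.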



Figure 2.4: MAC Architecture.

A pipelined MAC achieves higher speed and throughput than non-pipelined structures [5]. The proposed pipelined MAC is shown in figure 2.5. The multiplier is an array multiplier built from carry save adders with three levels of pipeline stages. The adder is also of carry save type, and the accumulator uses a carry propagate adder. Two further pipeline registers are placed, one between the multiplier and the adder and one between the adder and the accumulator. The proposed pipelined MAC thus has five levels of pipelining.
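The effect of the five pipeline levels on timing can be illustrated with a cycle-level behavioural model. This is only a sketch: it assumes that a product reaches the accumulator five inner-clock cycles after its operands are issued and that one new operand pair can be issued every cycle. It does not model the carry save internals, and the class `PipelinedMAC` and helper `run` are hypothetical names.

```python
from collections import deque


class PipelinedMAC:
    """Behavioural model: a product needs `depth` cycles to reach the accumulator."""

    def __init__(self, depth=5):
        self.pipe = deque([None] * depth)     # None marks an empty (bubble) stage
        self.acc = 0

    def clock(self, a=None, b=None):
        """Advance one inner-clock cycle, optionally issuing a new operand pair."""
        leaving = self.pipe.pop()             # value leaving the last pipeline stage
        if leaving is not None:
            self.acc += leaving
        self.pipe.appendleft(None if a is None else a * b)
        return self.acc


def run(pairs, depth=5):
    """Issue one pair per cycle, then flush; return (result, cycles used)."""
    mac = PipelinedMAC(depth)
    cycles = 0
    for a, b in pairs:
        mac.clock(a, b)
        cycles += 1
    for _ in range(depth):                    # drain the products still in flight
        mac.clock()
        cycles += 1
    return mac.acc, cycles


print(run([(1, 2), (3, 4), (5, 6)]))          # (44, 8)
```

In this model the pipeline depth adds a fixed latency of five cycles, while the throughput stays at one product per inner-clock cycle.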

The multi hierarchy pipelined structure of the 32 tap FIR filter is shown in figure 2.6. It consists of four groups. Each group contains two multiplexers, a data mux and a coefficient mux, which select the input data sample and the coefficient h under the control of the inner clock. The initial values of the sum and carry from the MAC are zero, and after N/M stages the final sum and carry are computed. A carry propagate adder then gives the filter output y.
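Keeping the running result as a separate sum word and carry word, and resolving the carries only once at the end, is the standard carry save technique. The sketch below shows the idea on non-negative integers. It is a generic illustration and not the paper's circuit; the function names are hypothetical.

```python
def carry_save_add(a, b, c):
    """3:2 compression of three non-negative integers into (sum_word, carry_word)."""
    sum_word = a ^ b ^ c                                  # bitwise sum without carries
    carry_word = ((a & b) | (b & c) | (a & c)) << 1       # carries, shifted to their weight
    return sum_word, carry_word


def accumulate_carry_save(products):
    """Accumulate products in carry save form, then do one carry propagate add."""
    s, c = 0, 0                                           # initial sum and carry are zero
    for p in products:
        s, c = carry_save_add(s, c, p)                    # no carry chain in this step
    return s + c                                          # final carry propagate addition


print(accumulate_carry_save([3, 5, 7, 11]))               # 26
```

Because each accumulation step is only one full-adder delay deep regardless of word length, the slow carry propagation is paid once per output rather than once per product.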



Figure 2.5: Pipelined MAC.



Figure 2.6: Multi Hierarchy pipelined FIR filter.

Reducing the number of MAC components from N to N/M reduces the area, and using a pipelined MAC reduces the delay.

## Simulation and synthesis results

A 32 tap FIR filter with the multi hierarchy pipeline structure is designed and implemented on a Virtex v50efg256-7 FPGA, and the results are compared with those of the two hierarchy pipelined FIR filter. Table 1 gives the specifications of the 32 tap FIR low pass filter.

The inner clock frequency chosen for controlling the MAC components is fi = 120 MHz, so M = inner clock frequency / sampling frequency = fi/fs = 120 MHz / 15 MHz = 8. The total number of groups is N/M = 32/8 = 4. Hence (N/M)-1 = 3 delay units are added between the four groups.
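The same arithmetic can be wrapped in a small helper to explore other clock ratios. This is an illustrative sketch; the function `grouping` and its dictionary keys are assumptions, and the 60 MHz case is a hypothetical example that does not appear in the paper.

```python
def grouping(n_taps, f_inner_hz, f_sample_hz):
    """Derive M, the number of groups (and MACs) and the inter-group delay units."""
    if f_inner_hz % f_sample_hz != 0:
        raise ValueError("inner clock must be an integer multiple of the sampling clock")
    m = f_inner_hz // f_sample_hz             # taps served by one MAC per sample period
    if n_taps % m != 0:
        raise ValueError("number of taps must be a multiple of M")
    groups = n_taps // m
    return {"M": m, "groups": groups, "mac_units": groups, "delay_units": groups - 1}


print(grouping(32, 120_000_000, 15_000_000))
# {'M': 8, 'groups': 4, 'mac_units': 4, 'delay_units': 3}

print(grouping(32, 60_000_000, 15_000_000))   # hypothetical slower inner clock
# {'M': 4, 'groups': 8, 'mac_units': 8, 'delay_units': 7}
```

A faster inner clock trades clock speed for area: doubling fi halves the number of MACs needed for the same filter length.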

| Parameter                    | Value    |
|------------------------------|----------|
| Filter type                  | Low pass |
| Normalized cut off frequency | 0.5      |
| Window used                  | Hamming  |
| Sampling frequency           | 15 MHz   |
| Cut-off frequency            | 3.2 MHz  |

Table 1: Specifications of 32 tap FIR filter.
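Coefficients for a filter of this kind are commonly obtained with the windowed sinc method. The sketch below assumes that the normalized cut off of 0.5 is expressed as a fraction of the Nyquist frequency and that the taps are scaled to unity gain at DC. The paper does not state how its coefficients were generated or quantized, so this is an illustration rather than the authors' procedure, and `hamming_lowpass` is a hypothetical name.

```python
import math


def hamming_lowpass(n_taps, cutoff):
    """Windowed sinc low pass taps; cutoff is a fraction of the Nyquist frequency."""
    centre = (n_taps - 1) / 2
    taps = []
    for n in range(n_taps):
        t = n - centre
        # Ideal low pass impulse response sin(pi*cutoff*t) / (pi*t), with its limit at t = 0.
        ideal = cutoff if t == 0 else math.sin(math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_taps - 1))   # Hamming window
        taps.append(ideal * window)
    gain = sum(taps)
    return [tap / gain for tap in taps]       # scale to unity gain at DC


h = hamming_lowpass(32, 0.5)
print(len(h))                                 # 32
```

The resulting taps are symmetric about the centre of the filter, which is the property that gives the linear phase response mentioned in the introduction.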

Table II gives the macro statistics and device utilization of the multi hierarchy pipeline FIR filter.

| Macro                      | Count |
|----------------------------|-------|
| Registers (total)          | 55    |
| 3-bit register             | 4     |
| 4-bit register             | 47    |
| 9-bit register             | 4     |
| Multiplexers (total)       | 8     |
| 4-bit 8 to 1 multiplexer   | 8     |
| Adders/Subtractors (total) | 4     |
| 9-bit adder                | 4     |
| Xors                       | 4     |

Table II (a): Macro Statistics.

Table II (b): Device Utilization Summary.

| Resource                   | Utilization           |
|----------------------------|-----------------------|
| Number of slices           | 135 out of 768 (17%)  |
| Number of slice flip flops | 175 out of 1536 (11%) |
| Number of 4 input LUTs     | 136 out of 1536 (8%)  |
| Number of bonded IOBs      | 108 out of 180 (60%)  |
| Number of GCLKs            | 2 out of 4 (50%)      |

Table III shows the FPGA synthesis results for the two hierarchy and multi hierarchy pipelined FIR filters.

Table III: Synthesis Results for two hierarchy and multi hierarchy FIR filter.

|                            | Multi hierarchy                         | Two hierarchy                            |
|----------------------------|-----------------------------------------|------------------------------------------|
| Number of slices           | 135 out of 768 (17%)                    | 181 out of 768 (23%)                     |
| Number of slice flip flops | 175 out of 1536 (11%)                   | 226 out of 1536 (14%)                    |
| Number of 4 input LUTs     | 136 out of 1536 (8%)                    | 187 out of 1536 (12%)                    |
| Number of bonded IOBs      | 108 out of 180 (60%)                    | 141 out of 180 (78%)                     |
| Number of GCLKs            | 2 out of 4 (50%)                        | 2 out of 4 (50%)                         |
| Minimum period             | 5.003 ns (maximum frequency 199.80 MHz) | 6.741 ns (maximum frequency 148.346 MHz) |
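The relative savings implied by Table III can be worked out directly from the tabulated figures. The short script below is an illustrative calculation. The dictionary keys and the helper `percent_reduction` are hypothetical names, and the numbers are copied from the table.

```python
def percent_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Figures taken from Table III.
two_hierarchy = {"slices": 181, "flip_flops": 226, "luts": 187, "iobs": 141, "min_period_ns": 6.741}
multi_hierarchy = {"slices": 135, "flip_flops": 175, "luts": 136, "iobs": 108, "min_period_ns": 5.003}

reductions = {
    key: percent_reduction(two_hierarchy[key], multi_hierarchy[key])
    for key in two_hierarchy
}

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% lower")
```

On these figures every listed resource and the minimum clock period come out between roughly 22% and 28% lower for the multi hierarchy design.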

Figure 4 shows the RTL schematic and floor plan of the FIR filter.



Figure 4(a): RTL Schematic of FIR filter.


Figure 4(b): Floor Plan of FIR filter.

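The results show that, compared with the two hierarchy design, the multi hierarchy pipelined FIR filter occupies a smaller area (135 slices against 181) and reduces the delay (a minimum period of 5.003 ns against 6.741 ns).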

## Conclusion

We proposed a multi hierarchy pipelined FIR filter, implemented it on a Virtex v50efg256-7 FPGA and presented the synthesis report. The results show that the proposed design occupies less area, minimizes delay and is flexible for FPGA implementation.

## References

- [1] Sung-Mo Kang and Yusuf Leblebici, "CMOS Digital Integrated Circuits", Tata McGraw-Hill Publishing Company Limited, 2003.
- [2] Wang Qin, Li Zhancai and Qi Yue, "FIR Filter Design Based on Two-hierarchy Pipeline Structure", Acta Electronica Sinica, 33, 2, 367, 2005.
- [3] Shanthala S., Cyril Prasanna Raj and S.Y. Kulkarni, "Design and Implementation of Pipelined Multiply Accumulate Unit", Second International Conference on Emerging Trends in Engineering and Technology, ICETET-09.
- [4] "Accumulator using a high speed, low power static and dynamic full adder designs", IEEE Custom Integrated Circuit Conference, 1995, pp. 593-5961.
- [5] Ichiro Kuroda, Eri Murata, Kouhei Nadehara, Kazumasa Suzuki, Tomohisa Arai and Atsushi Okamura, "A 16-bit Parallel MAC Architecture for a Multimedia RISC Processor", IEEE Trans. VLSI Systems, vol. 83, no. 83, pp. 103-112, 1995.
- [6] Jae Sung Lee, Young Seop Jeon and Myung H. Sunwoo, "Design of New DSP Instructions and their Hardware Architecture for High-Speed FFT", IEEE Trans. VLSI Systems, pp. 80-90, 2001.
- [7] Dusan Suvakovic and C. Andre Salama, "A Pipelined Multiply-Accumulate Unit Design for Energy Recovery DSP Systems", IEEE International Symposium on Circuits and Systems, May 28-31, 2000.
- [8] Shyh-Jye Jou, Chang-Yu Chen, En-Chung and Chau-Chin Su, "A Pipeline Multiplier-Accumulator Using a High Speed Low-Power Static and Dynamic Full Adder", Journal of Solid State Circuits, vol. 32, no. 1, January 2000.
- [9] Sung-Mo Kang and Yusuf Leblebici, "CMOS Digital Integrated Circuits", Third Edition, Tata McGraw-Hill Publishing Company Limited, 2003.
- [10] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, "Digital Integrated Circuits", Second Edition, Prentice Hall Electronics and VLSI Series, 2004.
- [11] L. Mintzer, "FIR Filters with FPGA", Journal of VLSI Signal Processing, 6, 1993, 119-127.