|
Abstract:
|
Floating-point arithmetic is attractive for implementing a variety of Digital Signal Processing (DSP) applications because it allows the designer and user to concentrate on the algorithms and architecture without worrying about numerical issues. In the past, many DSP applications used fixed-point arithmetic due to the high cost (in delay, silicon area, and power consumption) of floating-point arithmetic units. In the realization of modern general-purpose processors, fused floating-point multiply-add units have become attractive since their delay and silicon area are often less than those of a discrete floating-point multiplier followed by a floating-point adder. Further, the accuracy is improved by the fused implementation since rounding is performed only once (after the multiplication and addition). This work extends the consideration of fused floating-point arithmetic to operations that are frequently encountered in DSP. The Fast Fourier Transform (FFT) is a case in point since it uses a complex butterfly operation. For a radix-2 implementation, the butterfly consists of a complex multiply and the complex addition and subtraction of the same pair of data. For a radix-4 implementation, the butterfly consists of three complex multiplications and eight complex additions and subtractions. Both of these butterfly operations can be implemented with two fused primitives: a fused two-term dot-product unit and a fused add-subtract unit. The fused two-term dot-product unit multiplies two pairs of operands and adds the products as a single operation. The two products do not need to be rounded (only the sum is normalized and rounded), which reduces the delay by about 15% while reducing the silicon area by about 33%. For the add-subtract unit, much of the complexity of a discrete implementation comes from the need to compare the operand exponents and align the significands prior to the add and the subtract operations.
For the fused implementation, sharing the comparison and alignment greatly reduces the complexity. The delay and the arithmetic results are the same as if the operations were performed in the conventional manner with a floating-point adder and a separate floating-point subtracter. In this case, the fused implementation is about 20% smaller than the discrete equivalent. |
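
To make the two primitives concrete, the following Python sketch models their arithmetic behavior in software. It is an illustration under stated assumptions, not the hardware design described in this work. The names `fused_dot2`, `discrete_dot2`, `fused_add_sub`, and `radix2_butterfly` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) stands in for the unrounded internal products of the fused unit, and the butterfly is assumed to be in decimation-in-time form (X = A + W·B, Y = A − W·B).

```python
from fractions import Fraction


def fused_dot2(a, b, c, d):
    """Model of a fused two-term dot product: a*b + c*d with a single rounding.

    The two products and their sum are formed exactly as rationals and then
    rounded once to the nearest double. Assumes finite inputs.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)
    return float(exact)


def discrete_dot2(a, b, c, d):
    """Discrete equivalent: each product is rounded, then the sum is rounded."""
    return a * b + c * d


def fused_add_sub(a, b):
    """Model of a fused add-subtract: returns (a + b, a - b).

    In hardware the exponent comparison and significand alignment are shared;
    here only the results are modeled, and they match the discrete operations.
    """
    return a + b, a - b


def radix2_butterfly(a, b, w):
    """Radix-2 butterfly (assumed decimation-in-time) from the two primitives.

    The complex product w*b uses two fused dot products; the complex add and
    subtract use one fused add-subtract each for the real and imaginary parts.
    """
    wb_re = fused_dot2(w.real, b.real, -w.imag, b.imag)
    wb_im = fused_dot2(w.real, b.imag, w.imag, b.real)
    x_re, y_re = fused_add_sub(a.real, wb_re)
    x_im, y_im = fused_add_sub(a.imag, wb_im)
    return complex(x_re, x_im), complex(y_re, y_im)


if __name__ == "__main__":
    # Operands chosen so that rounding the product a*b loses information.
    a = 1.0 + 2.0**-30
    b = 1.0 - 2.0**-30
    print("fused   :", fused_dot2(a, b, -1.0, 1.0))
    print("discrete:", discrete_dot2(a, b, -1.0, 1.0))
    print("butterfly:", radix2_butterfly(1 + 2j, 3 + 4j, 1j))
```

With these operands the exact product a·b is 1 − 2⁻⁶⁰, which rounds to 1.0 in double precision. The discrete version therefore returns 0.0, while the single-rounding model returns −2⁻⁶⁰, illustrating the accuracy benefit of rounding only once.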