DIRECT DIGITAL FREQUENCY SYNTHESIS USING A MODIFIED CORDIC Eugene Grayver, Babak Daneshrad Integrated Circuits and Systems Laboratory UCLA, Electrical Engineering Department [email protected]
ABSTRACT This paper introduces a new approach to direct digital frequency synthesis (DDFS) based on the Coordinate Rotation (CORDIC) algorithm. The modifications to the standard CORDIC algorithm introduced in this paper allow fine frequency resolution, and exhibit significant potential for low power applications. The new architecture does not need a large ROM and can be implemented on a general purpose processor, or on a flexible ASIC architecture.
1 DIRECT DIGITAL FREQUENCY SYNTHESIS
All passband communication systems employ some form of up/down conversion. Frequency conversion is required to transmit the data in the desired frequency band. Different frequency bands are also used to allow efficient use of the allocated spectrum when using FDMA. The baseband signal is up/down converted by either multiplication by a sinusoid of controllable frequency (e.g. QAM) or by directly modulating the frequency of the sinusoid (e.g. FM, GMSK) [1]. A fully digital implementation of any communication system requires direct digital frequency synthesis (DDFS) [1],[2] . Digital frequency synthesis is also preferred over the analog approach due to lower phase noise, fine frequency resolution and the ability to rapidly change frequency. Conventional methods for digital frequency synthesis use a phase accumulation technique, as shown in Figure 1. The phase control word, W, is continuously increased in constant increments of α. W is used as an argument to a sine lookup table or generator. Since frequency is defined as the derivative of the phase, the output of the sine generator is a sinusoid of a constant frequency, determined by α. This derivation is summarized in equation (1) [1],[2].
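[Figure 1: block diagram. The N-bit increment α feeds an adder and register (Reg) that accumulate the phase word W; W drives a Sin() lookup or generation block whose output is Sin(W).]
Figure 1. Conventional frequency synthesis architecture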
$$W(t) = f_{clk}\,\alpha\,t; \qquad \theta(t) = \pi\,\frac{W}{2^{N}}; \qquad f_{out} = \frac{1}{2\pi}\frac{d\theta}{dt} = \frac{\alpha}{2^{N+1}}\,f_{clk} \tag{1}$$
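The phase accumulation scheme of Figure 1 and equation (1) can be sketched in a few lines of Python. This is only an illustrative model, not the authors' implementation: the function names `phase_accumulator_ddfs` and `output_frequency` are hypothetical, the sine is computed with `math.sin` rather than a ROM, and the accumulator is assumed to wrap modulo 2^(N+1) so that θ = πW/2^N covers a full 2π.

```python
import math

def phase_accumulator_ddfs(alpha, n_bits, num_samples):
    """Generate sine samples from a phase accumulator (illustrative model)."""
    # Assumption: theta = pi*W/2**N spans 2*pi over 2**(N+1) codes of W.
    modulus = 1 << (n_bits + 1)
    w = 0
    samples = []
    for _ in range(num_samples):
        theta = math.pi * w / (1 << n_bits)
        samples.append(math.sin(theta))
        w = (w + alpha) % modulus  # wraps like a fixed-width register
    return samples

def output_frequency(alpha, n_bits, f_clk):
    """Output frequency predicted by equation (1)."""
    return alpha * f_clk / (1 << (n_bits + 1))
```

For example, `output_frequency(1, n_bits, f_clk)` gives the smallest non-zero frequency such a model can produce for a given accumulator width.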
Most of the DDFS designs used today store pre-computed samples of a sinusoid in a ROM lookup table [2]. A major disadvantage of this approach is the requirement of a rather large ROM in order to achieve acceptable spectral purity. In traditional ROM based DDFS systems, the size of the ROM grows exponentially with spectral purity.
An alternative method for generation of a sinusoid is based on trigonometric definition and properties of the sine and cosine. This method, known as coordinate rotation (CORDIC) [3],[6], requires very few constant coefficients and is more suitable for implementation in a flexible ASIC architecture or a general purpose processor. Two major problems have prevented use of the CORDIC algorithm in DDFS architectures, namely poor frequency resolution and potentially high power consumption. The architecture proposed in this paper introduces modifications to the classical CORDIC algorithm that circumvent both of these problems.
2 CONVENTIONAL CORDIC
In the CORDIC algorithm [6], the sine and cosine of the desired angle are calculated using a cascade of N 'sub-rotation' stages. The kth stage rotates the input complex number, considered as a 2-element vector (2-vector), by $\pm\delta/2^{k}$ radians ($\delta = \pi/2$), with the direction depending on the kth bit of W. By changing the phase control word we can rotate an initial vector by any angle in the range $[0 \,..\, \pi - \delta/2^{N+1}]$ in increments of $\delta/2^{N}$ radians. Each stage implements a Givens plane rotation of the form:
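$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\Theta & -\sin\Theta \\ \sin\Theta & \cos\Theta \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \cos\Theta \begin{bmatrix} 1 & -\tan\Theta \\ \tan\Theta & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{2}$$
CORDIC based designs recognize that the multiplications by cos(Θ) for all sub-rotation stages can be collected together into a single constant
$$\kappa = \prod_{k=0}^{N-1} \cos(\Theta_k) \tag{3}$$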
which is independent of the overall angle. Using the definitions in equation (4),
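$$x_0 = \kappa; \qquad y_0 = 0; \qquad N = \text{number of stages}; \qquad T_k = \tan\frac{\delta}{2^{k}}$$
$$\theta = \pi \sum_{k=2}^{N} a_k \frac{1}{2^{k}}; \qquad a_k = \begin{cases} 1 & \text{when the } k\text{th bit of } W \text{ is 1} \\ -1 & \text{when the } k\text{th bit of } W \text{ is 0} \end{cases} \tag{4}$$
the rotation by the angle set by W is summarized in equation (5).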
$$(x_k, y_k) = \begin{cases} \left(x_{k-1} - T_k\,y_{k-1},\; y_{k-1} + T_k\,x_{k-1}\right) & \text{when } W[k] = 1 \\ \left(x_{k-1} + T_k\,y_{k-1},\; y_{k-1} - T_k\,x_{k-1}\right) & \text{when } W[k] = 0 \end{cases} \qquad x_N = \cos(\theta),\; y_N = \sin(\theta) \tag{5}$$
By continuously incrementing W we can obtain Sin/Cos(ωt), thereby generating a sinusoid, as shown in Figure 2. Note that equation (1) applies to this architecture as well.
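The cascade of equations (3) to (5) can be checked numerically with a short floating-point sketch. The indexing here is an assumption made for illustration only: stages are numbered k = 1..N, stage k rotates by ±δ/2^k, and κ is the product of the cosines of those same angles. The function name `cordic_rotate` is hypothetical.

```python
import math

DELTA = math.pi / 2

def cordic_rotate(bits):
    """Run the sub-rotation cascade; bits[k-1] controls stage k.

    Returns (x_N, y_N, theta), where theta is the signed sum of stage angles.
    """
    n = len(bits)
    # Collect all cos() factors into one constant, as in equation (3).
    kappa = 1.0
    for k in range(1, n + 1):
        kappa *= math.cos(DELTA / 2**k)

    x, y = kappa, 0.0
    theta = 0.0
    for k, bit in enumerate(bits, start=1):
        t_k = math.tan(DELTA / 2**k)
        if bit:   # rotate by +delta/2**k
            x, y = x - t_k * y, y + t_k * x
            theta += DELTA / 2**k
        else:     # rotate by -delta/2**k
            x, y = x + t_k * y, y - t_k * x
            theta -= DELTA / 2**k
    return x, y, theta
```

Because each stage only multiplies by T_k and adds, the pre-scaling by κ is what makes the final vector land on the unit circle, so `x` and `y` should match `cos(theta)` and `sin(theta)` to within floating-point error.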
[Figure 2: block diagram. An adder and register (Reg) accumulate the N-bit increment α into the phase word W, which controls a cascade of sub-rotation stages with inputs κ and 0 and outputs Cos(W) and Sin(W).]
Figure 2. Traditional CORDIC architecture for DDFS
A DDFS system implemented using this method suffers from two problems. First, the frequency resolution is determined by the number of stages, and is given by
$$\Delta f_{min} = \frac{\delta}{\pi}\,\frac{1}{2^{N}}\,f_{clk} \tag{6}$$
where $f_{clk}$ is the clock frequency and N is the number of stages. For high clock frequencies, a large number of CORDIC stages is needed to obtain sufficiently fine frequency resolution (e.g. 26 stages are needed to achieve a frequency resolution of 1 Hz at a clock frequency of 50 MHz). Increasing the number of stages has a negative impact on both the power consumption and the chip area, making the traditional architecture impractical. Second, since W changes on every clock, all of the sub-rotation stages have to operate at the clock frequency, using a significant amount of energy.
3 PROPOSED ARCHITECTURE
3.1 Recursive Computation
The periodic nature of a sinusoid makes it ideally suited for recursive computation. By continuously executing equation (7), with the angle of rotation, θ, determined by the desired frequency, a sustained sinusoid can be obtained.
$$\begin{aligned} x(n+1) &= x(n)\cos\theta - y(n)\sin\theta \\ y(n+1) &= y(n)\cos\theta + x(n)\sin\theta \end{aligned} \qquad \theta = 2\pi\,\frac{f_{desired}}{f_{clk}} \tag{7}$$
Once Sin(θ) and Cos(θ) corresponding to the desired frequency have been computed, a feedback circuit implementing equation (7) can be used to generate the sinusoid at the desired frequency.
The first modification to the standard CORDIC algorithm introduced in this paper is shown in Figure 3. The first N-1 stages, identical to those used in Figure 2, are used to compute Sin(θ) and Cos(θ). The last stage implements the recursive equation (7), generating a sustained sinusoid.
[Figure 3: block diagram. Pre-compute stages with inputs κ and 0 feed a feedback stage with outputs Cos and Sin.]
Figure 3. CORDIC with a feedback stage (first modification)
It is important to note that the pre-compute stages are only used for at most N clock cycles every time W changes. Generation of a constant frequency sinusoid uses only the last stage, making this architecture significantly more power efficient than the classical CORDIC DDFS architecture. A simple circuit can be used to enable the pre-compute stages only when W changes.
3.2 Frequency Resolution
The traditional CORDIC DDFS architecture requires a large number of stages to achieve fine frequency resolution at high clock frequencies since $\Delta f_{min} \propto f_{clk}$, as can be seen from equation (6).
The second modification to the CORDIC DDFS architecture introduced in this paper allows fine frequency resolution without requiring a large number of stages. To achieve fine frequency resolution an additional sub-rotation stage is introduced, as shown in Figure 4. This stage rotates by the same angle as the last precompute stage ($\delta/2^{N-2}$ radians). Unlike the previous stages, it is controlled by the MSB of an accumulator. Every time the accumulator reaches its maximum value, the vector is rotated by an additional $\delta/2^{N-2}$ radians. An M-bit accumulator provides a frequency resolution of
$$\Delta f_{min} = f_{clk}\,\frac{\delta}{2^{N-2}}\,\frac{1}{2^{M}}\,\frac{1}{\pi} = \frac{1}{2^{N+M-1}}\,f_{clk} \tag{8}$$
Since M can be increased at very little cost, extremely fine frequency resolution is available. The frequency control word, W, is N-2+M bits wide, with the N-2 most significant bits controlling the sub-rotation stages, and the remaining M bits going to the accumulator. The addition of this stage has very little impact on the overall power consumption of the circuit. Just like the other pre-compute stages, the fine frequency resolution stage is off most of the time. The only extra power is dissipated in the multiplexer every time the accumulator overflows. A complete architecture, incorporating the two modifications introduced above, is presented in Figure 4.
[Figure 4: block diagram. The control word W is split into W[1..N-2], which drives the N-2 precompute stages (inputs κ and 0), and W[N-1..N-2+M], which feeds an M-bit accumulator (adder and Reg) controlling the fine frequency resolution stage; the feedback stage produces the Cos and Sin outputs.]
Figure 4. DDFS using modified CORDIC
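The interaction between the feedback stage and the accumulator-driven fine resolution stage can be illustrated with a small floating-point model. This is a sketch under stated assumptions rather than the circuit of Figure 4: the names `modified_ddfs` and `mean_rotation_per_clock` are hypothetical, the coarse angle `theta_coarse` and the extra rotation `phi_fine` are passed in directly instead of being derived from the stage indexing, rotations use `cos`/`sin` multiplications instead of the T_k shift-and-add form, and the extra rotation is triggered on accumulator overflow.

```python
import math

def rotate(x, y, c, s):
    """One plane rotation, as in equation (7)."""
    return x * c - y * s, y * c + x * s

def modified_ddfs(theta_coarse, phi_fine, w_fine, m_bits, num_clocks):
    """Return the (cos, sin) state after each clock, starting from (1, 0)."""
    # Pre-computed once per change of W (the pre-compute stages).
    c, s = math.cos(theta_coarse), math.sin(theta_coarse)
    cf, sf = math.cos(phi_fine), math.sin(phi_fine)

    x, y = 1.0, 0.0
    acc = 0
    states = [(x, y)]
    for _ in range(num_clocks):
        x, y = rotate(x, y, c, s)          # feedback stage, every clock
        acc += w_fine                       # M-bit accumulator
        if acc >= (1 << m_bits):            # overflow: one extra small rotation
            acc -= 1 << m_bits
            x, y = rotate(x, y, cf, sf)
        states.append((x, y))
    return states

def mean_rotation_per_clock(theta_coarse, phi_fine, w_fine, m_bits):
    """Average phase advance per clock, in radians."""
    return theta_coarse + phi_fine * w_fine / (1 << m_bits)
```

In this model the average output frequency is `mean_rotation_per_clock(...) * f_clk / (2 * math.pi)`, so each extra accumulator bit halves the smallest frequency step while adding no rotation stages.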
3.3 Spectral Characteristics
A number of papers [4],[5] have addressed the issue of spurs in digitally synthesized waveforms. Derivations shown in these papers prove that the spurious free dynamic range (SFDR) of a sinusoid obtained using direct digital frequency synthesis depends on the smallest angle that can be resolved by the DDFS system and on the finite precision calculations. The spurs due to finite precision calculations are negligible. If the smallest angle is $2^{-N}$, the SFDR is approximately given by 6N dB. For the purpose of spectral analysis, the architecture presented above is equivalent to a ROM based DDFS with a $2^{N}$ word ROM. In the proposed architecture, the SFDR can be increased by adding more stages to the pre-compute part of the circuit. However, the number of stages is limited by the width of the databus. Since
$$\tan(x) \sim x \;\text{ for }\; x \sim 0 \;\;\rightarrow\;\; T_k = \tan\!\left(\frac{\delta}{2^{k}}\right) \sim 2^{-(k+1)} \tag{9}$$
at least k bits are required to represent $T_k$. Thus, in order to effectively utilize more stages, the width of the databus must be increased. If 14 precompute stages are used, with a 16-bit wide databus, the expected SFDR is 14*6 = 84 dB. Results of a bit and cycle accurate simulation, shown in Figure 5, confirm this calculation.
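The databus-width argument can be checked numerically. The sketch below is illustrative only: `min_fraction_bits` and `expected_sfdr_db` are hypothetical helper names, and "bits required" is interpreted here as the smallest number of fractional bits for which a truncated T_k is non-zero.

```python
import math

DELTA = math.pi / 2

def min_fraction_bits(k):
    """Smallest number of fractional bits at which truncated T_k is non-zero."""
    t_k = math.tan(DELTA / 2**k)
    bits = 0
    while int(t_k * 2**bits) == 0:
        bits += 1
    return bits

def expected_sfdr_db(num_precompute_stages):
    """Rule of thumb from the text: about 6 dB per resolved bit of angle."""
    return 6 * num_precompute_stages
```

Under this interpretation, stage k needs k fractional bits before its coefficient stops truncating to zero, which is why adding stages beyond the databus width brings no benefit.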
[Figure 5: spectrum plot with an 80dB annotation.]
Figure 5. Spectrum for N=14
To summarize the points made above: SFDR depends only on the number of precompute stages and is independent of the output frequency and the main clock frequency. The frequency resolution is determined entirely by the sum of the number of precompute stages and the width of the accumulator for the fine frequency resolution stage. Thus, the number of precompute stages can be decreased without affecting the frequency resolution, provided the accumulator width is increased. It is therefore possible to trade off SFDR for circuit size by removing precompute stages while retaining the same frequency resolution. The power consumption is determined only by the main clock frequency. It is independent of the frequency of the output sinusoid, and shows very weak dependence on the number of precompute stages.
[Figure 6: two time-domain plots of amplitude (-1 to 1) versus sample number: (a) sinusoid decays, (b) overflow corruption.]
Figure 6. Finite Precision Effects
3.4 Finite Precision Effects
An inherent difficulty in realizing recursive architectures is the accumulation of error caused by finite precision calculation. A multiplication of two L-bit numbers results in a 2L-bit number. Since a constant data bus width is desirable, multiplication results must be truncated, thereby incurring a loss of precision. If truncation decreases the number, the sinusoid generated by this architecture will decay, as shown in Figure 6a; if truncation increases the number, the sinusoid will be corrupted by overflows, as shown in Figure 6b.
A similar problem is solved in [7] by periodically resetting the system to a known state. Use of this method in the proposed architecture has a negative impact on the spectral characteristic of the sinusoid (Figure 7b). This can be alleviated by introducing a correction instead of resetting. Figure 7a shows a 15dB SFDR gain when using correction instead of resetting.
[Figure 7: two plots of power spectrum magnitude (dB): (a) correction, annotated 50dB; (b) resetting, annotated 35dB.]
Figure 7. SFDR using the correction and resetting methods
The correction circuit is inserted in the feedback path of the last CORDIC stage as shown in Figure 8. The details of this circuit are discussed in the ensuing sections.
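The decaying case described in this section can be reproduced with a small fixed-point model of the recursion in equation (7). The sketch makes several assumptions that are not taken from the paper: the names `truncate_toward_zero` and `fixed_point_oscillator` are hypothetical, the coefficients and the rotated outputs are truncated toward zero, and the data path uses `frac_bits` fractional bits.

```python
import math

def truncate_toward_zero(value, frac_bits):
    """Drop frac_bits low-order bits of an integer, rounding toward zero."""
    if value >= 0:
        return value >> frac_bits
    return -((-value) >> frac_bits)

def fixed_point_oscillator(theta, frac_bits, num_samples):
    """Return the amplitude sqrt(x^2 + y^2) of the state at each sample."""
    scale = 1 << frac_bits
    c = int(math.cos(theta) * scale)   # int() truncates toward zero
    s = int(math.sin(theta) * scale)
    x, y = scale - 1, 0                # largest value below 1.0
    amplitudes = []
    for _ in range(num_samples):
        amplitudes.append(math.hypot(x, y) / scale)
        # Full-precision products, then truncation back to the bus width.
        x, y = (truncate_toward_zero(x * c - y * s, frac_bits),
                truncate_toward_zero(y * c + x * s, frac_bits))
    return amplitudes
```

With truncation toward zero every step can only shrink the vector, so the amplitude decays steadily; a rounding mode that increases magnitudes would instead let the state grow until it overflows.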
[Figure 8: block diagram. The precomputed Sin & Cos feed the feedback stage, with the correction circuit in its feedback path; the outputs are Cos and Sin.]
Figure 8. Feedback stage
3.4.1 Correcting Decaying Sinusoid
If the sinusoid is decaying, the correction necessary to prevent the accumulation of error in the feedback DDFS is based on the relationship $\cos^2\theta + \sin^2\theta = 1$. Rewriting in terms of the variables in equation (5) and applying a Taylor expansion we get
$$x = \sqrt{1 - y^{2}} \approx 1 - \frac{y^{2}}{2}.$$
The approximation is very good for small values of y. The correction is therefore applied once a preset delay ($t_{delay}$) has elapsed and the value of y is below a preset value ($y_{max}$). Suitable values for these parameters were determined experimentally to be: $t_{delay}$ = 60 cycles, $y_{max}$ = 1/16. The entire correction circuit is shown in Figure 9.
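The effect of this correction can be illustrated with a floating-point sketch. Several details are assumptions rather than the circuit of Figure 9: the function name `oscillator` is hypothetical, the truncation loss is modeled by a constant per-step gain `loss` slightly below 1, and the correction is applied only when x is positive (the branch on which x = sqrt(1 - y^2) holds).

```python
import math

def oscillator(theta, loss, num_samples, correct, t_delay=60, y_max=1 / 16):
    """Run the recursion of equation (7) with a lossy gain; return final amplitude."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = 1.0, 0.0
    counter = 0
    for _ in range(num_samples):
        # 'loss' stands in for the magnitude lost to truncation each step.
        x, y = loss * (x * c - y * s), loss * (y * c + x * s)
        counter += 1
        # Assumed rule: correct once t_delay cycles have elapsed and y is small.
        if correct and counter >= t_delay and x > 0 and abs(y) < y_max:
            x = 1.0 - y * y / 2.0   # Taylor approximation of sqrt(1 - y^2)
            counter = 0
    return math.hypot(x, y)
```

At y = 1/16 the approximation error is roughly y^4/8, about 2e-6, so the corrected vector is restored to unit length far more accurately than a 16-bit databus can represent.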
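[Figure 9: block diagram of the correction circuit. Recoverable labels: W (N bits), W[0], W[1], inputs κ and 0, outputs Cos(ωt) and Sin(ωt), Cos and Sin feedback signals, an adder, a subtractor, a divide-by-2 block, a counter with reset, and a comparison condition involving B>0 and C.]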