In the introduction of acoustic books and literature, we will introduce books that our R & D representators are always referring to and papers that are attracting attention.

The links to the books and literature introduced are also posted, so please use it if you are interested.

The links to the books and literature introduced are also posted, so please use it if you are interested.

**James W. Cooley and John W. Tukey**

"An Algorithm for the Machine Calculation of Complex Fourieer Series"

"An Algorithm for the Machine Calculation of Complex Fourieer Series"

Mathematics of Computation

19 (90), pp.297-301, 1965

Doi: https://doi.org/10.1090/S0025-5718-1965-0178586-1

The paper introduced this time is free access. Click "Full-text PDF" on the linked page above to download the pdf of the dissertation.

Keywords: High -speed Fourier conversion (FFT), discrete Fourier conversion (DFT), finite impulse response (FIR)

Do you know the FIR filter?

FIR stands for Finite Impulse Response, and is called a limited length impulse response in Japanese. Mathematically, the output when the impulse signal is entered in a system is an impalse response, and the impulse response is cut out at a finite time.

If you compare it with sound, when you hit your hand in the bath or hall (although the sound when you hit your hand is not strictly impulse signal) or when you hit your hand in front of the dummy head microphone. The time waveform is the impulse response. The output is observed as a time waveform when (input) when (input) to the sound space such as a bath and hall, and the human body shape such as the head and external ears (entered). It is an impulse response.

So what is delicious when you know FIR?

In fact, in an immutable system that is inevitable, if you know the FIR, you can see the output of the system for any input. If you compare it with the sound you mentioned earlier, what kind of sound waves will come to the eardrum when you use the sound space or dummy head FIR to play your favorite sound (enter the system) in the sound space (input to the system). You can calculate the output from).

The reason for such a thing is that the signal of the sound entered in the system can be regarded as a collection of many impals. By calculating the impulse response to each of the many impulse, you can calculate the output to the input sound signal by overlapping it.

This is mathematically called folding and is defined in the following formula. \ begin {Align*} Y [n] = \ sum_ {k = 0}^{n-1} h [k] x [n-k] \ end {Align*} Here, $ h [t] $ is $ n $, and $ x [n] $ and $ Y [n] $ are each input and output. $ h [k] x [n-k] $ is an impalice response to each impulse, which is overlapped by $ \ sum $.

In this way, combining a FIR to a certain sound sign is called a FIR filter.

The above explanations are quite broken, so please refer to specialized books for more detailed parts.

It is very attractive to be able to reproduce the sound in various sound environments using a FIR filter, but there is one problem.

That is, the calculation of folding requires a huge amount of calculation. For example, suppose there are sound sources and FIRs sampled with $ 44,100 \ text {Hz} $, and each length is $ 1 $ seconds. (In fact, FIR is often shorter than 1 second.) When calculating this folding, there is $ 44,100 $ quantified sound signals, and each sound signal $ 1 $ Impulse response per $ one. Since it is $ 44,100 $, it is necessary to calculate $ 1,944,810,000 $ for $ 44,100 \ TIMES 44,100 $ to overlap all. As the number of samples increases, the amount of calculation increases. When it comes to generalization, the calculation amount required for folding the sample of $ n $ and the FIR fold is $ \ mathcal {o} (n^2) $.

It is not realistic to calculate this amount, even though the modern computers (PCs) have become high performance. In recent years, games such as shooting games have been released in addition to images, and games that rely on the direction in which sounds come in three -dimensional sound space have been released. In these games, the calculation is performed to make the three -dimensional sound space a 3D binaural signal, and you can enjoy it with headphones. However, these games require as much as possible sound delays in action, so there is no room for calculating folding as defined.

There is a DFT (Discrete Fourier Transform). The reason why DFT appears here is that you can easily make folding calculations by using DFT. DFT is defined in the following formula: \ begin {Align*} A [k] = \ flac {1} {n} \ sum_ {n = 0}^{n-1} x [n] w^{-nk}, k = 0, 1, \ cdots, n-1 \ end {Align*} However, $ x [n] $ and $ a [k] $ here are complex numbers, $ W = e^\ frac {2 \ pi i} {n} $. The DFT can be considered to be converted to $ X [n] $ in the time area to $ N $ data in the frequency area $ a [k] $. For example, it is an image of converting one chord's sound signal into the data of which keyboard you pressed. Use the following discrete Fourier reverse to return data $ a [k] $ in the frequency area $ a [k] $ $ x [n] $. \ begin {Align*} x [n] = \ sum_ {k = 0}^{n-1} a [k] w^{nk} \ end {Align*} Furthermore, if the $ y [n] $ that appeared in the definition formula of the folded calculation was calculated in accordance with the definition of DFT, \ begin {Align*} Y [k] & = \ frac {1} {n} \ sum_ {n = 0}^{n-1} y [n] w^{-nk} \\ & = \ Frac {1} {n} \ sum_ {n = 0}^{n-1} \ sum_ {m = 0}^{n-1} h [m] x [n-m] w^{-nk} \\ & = \ Frac {1} {n} \ sum_ {m = 0}^{n-1} h [m] \ sum_ {n = 0}^{n-1} x [n-m] w^{-nk} \\ & = \ Frac {1} {n} \ sum_ {m = 0}^{n-1} h [m] \ sum_ {l = -m}^{n-m-1} x [l] w^{-( l+m) k} (\ text {here,} l = n-m) \\ & = N \ cdot \ Frac {1} {n} \ sum_ {m = 0}^{n-1} h [m] w^{-mk} \ sum_ {l = 0}^{n-1} x [L] w^{-LK} \\ & = N \ cdot h [k] x [k] \ end {Align*} Will be. However, $ y [k], h [k], x [k] $ each is $ Y [n], h [n], x [n] $ DFT, and $ x [n] $ Assuming that it is a signal \ begin {Align*} x [-l] w^{-(-l) k} = x [n-l] w^{-(n-l) k} \ end {Align*} I am using. In other words, it is a circulating folding.

From this equation, the folding calculation, which required the calculation amount of $ n^2 $ in the time area, can be calculated by multiplying the frequency area by multiplying the $ n times.

But there is a pitfall here. DFT definition \ begin {Align*} A [k] = \ flac {1} {n} \ sum_ {n = 0}^{n-1} x [n] w^{-nk}, k = 0,1, \ cdots, n-1 \ end {Align*} If you look closely, there is a $ N $ $ pawning calculation for a certain $ A [k] $.

$ K $ has $ n $, so the overall calculation is $ n^2 $. In other words, the amount of calculation when converting the time area and frequency region by DFT remains the same as the original folding calculation. It seems that there is no point in performing DFT as it is.

The introduction has become longer, but here is the main subject of this story.

The idea of solving the problem that the DFT itself is the same as the original folding calculation is the Cooley and Tukey algorithms introduced this time.

It seems that the algorithm itself was previously discovered before the Cooley and Tukey papers, but in light of the achievements of spreading to the general public, I will introduce their papers.

Their algorithms are as follows. The Complex Fourier Series is determined as follows. (In accordance with their papers, the discrete Fourier conversion / reverse conversion is the same as Complex Fourier Series, except for the difference between the constant and complex conjunction.) \ begin {Align*} x [n] = \ sum_ {k = 0}^{n-1} a [k] w^{nk}, n = 0,1, \ cdots, n-1 \ end {Align*} On the other hand, when $ n = R_1 \ Times R_2 $ $ $ R_1, R_2 $ exists. \ begin {Align*} n & n_1 r_1+n_0, n_0 = 0,1, \ cdots, r_1-1, n_1 = 0,1, \ cdots, r_2-1 \\ K & = k_2+k_0, k_0 = 0,1, \ cdots, r_2-1, k_1 = 0,1, \ cdots, r_1-1 \ end {Align*} By doing so, the original Complex Fourier Series \ begin {Align*} x [n] = x [n_1, n_0] & = \ sum_ {k = 0}^{n-1} a [k] w^{nk} \\ & = \ sum_ {k_0 = 0}^{r_2-1} \ sum_ {k_1 = 0}^{r_1-1} a [k_1, k_0] w^{nk_1 r_2} w^{nk_0} \ end {Align*} Can be transformed. Furthermore, from the official official \ begin {Align*} W^{nk_1 r_2} = w^{n_0 k_1 r_2} \ end {Align*} By using \ begin {Align*} A_1 [n_0, k_0] & = \ sum_ {k_1 = 0}^{r_1-1} a [k_1, k_0] w^{n_0 k_1 r_2} \\ x [n_1, n_0] & = \ sum_ {k_0 = 0}^{r_2-1} _1 [n_0, k_0] w^{(n_1 r_1+n_0) k_0} \ end {Align*} And can be divided into two stages.

In the first stage, $ R_1 $ $ is taken for $ [n_0, k_0] $ n $ pieces, so the number of calculations is $ (apart from the $ W $ riding calculation). NR_1 $, and in the same way, the number of calculations in the second stage is $ n R_2 $, so the total calculation is $ n (R_1+R_2) $.

Furthermore, when it can be disassembled as $ n = r_1 \ times r_2 \ times \ cdots \ times R_m $, the total number of calculations is $ N (R_1+R_2+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ CDOTS+\ R_m) $), and in $ n = r^m $, the calculation is $ nR \ log_r n $.

In this way, I was impressed by the fact that the calculation amount could be reduced from the original $ n^2 $, so I introduced a paper described this algorithm.

The place where this algorithm feels amazing is not only the decrease in the amount of calculation. I was impressed by the following three points.

First, in each stage, it is independent except for the parameters that take over the sum, so it can be calculated in parallel.

Next, the length of the array required to save data during calculation must be $ n $. (At the time of this paper, a calculator using transistors appeared, so the amount of memory was so small that the amount of memory was incomparable. Therefore. It was a big advantage.

In the case of $ n = 2^m $, the data can be specified with the $ m $ bit index, and for the calculation of $ L $ stage $ (1 \ LEQ L \ LEQ M) $, $. (M-L) $ $ 2 It is easy to specify a data index because two indexes with different only bits are different. Personally, I think this is the most beautiful.

The above is the summary of the dissertation. This is the Cooley-Tukey FFT algorithm (FFT: FAST FOURIER TRANSFORM, high-speed Fourier conversion), a method for calculating DFT at high speed. Until now, many FFT algorithms have been developed, such as FFT for real value data and FFT for distributed memory type calculators, most of which are based on Cooley-Tukey FFT algorithms.

Many of the FFT libraries currently used are $ n $ $ n $ as $ n = 2^m $, and the calculation at that time is $ \ mathcal {o} (n \ log_2 n) $. Masu. For example, if the number of input data is $ n = 2^{15} = 32,768 $, $ n^2 $ will be $ 1,073,741,824 $, while $ N log_2 n $ will be $ 491,520 $, so the calculation is dramatic. You can see that it has been improved.

The folding calculation of the FIR filter described at the beginning is the amount of calculation of $ \ mathcal {o} (n \ log_2 n) $ by using FFT, which is very practical. At present, FIR filters are used in various devices and applications, such as digital audio equipment and computer games.

Beyond the application we usually use, a beautiful theory is hidden.

This time, we talked about one of these, Cooley and Tukey algorithm.

η