English Pages 8 Year 2020
C Solutions to Exercises to Speech and Audio Processing with Matlab by Paul Hill
Solution 1.1
    a = pi*ones(50,1);

Solution 1.2
    b = [1:n-1];

Solution 1.3
    c = rand(100,1);

Solution 1.4
    e = d(1:2:end);
    f = d(2:2:end);

Solution 1.5
    A = rand(100);
    ROIIndex = [45:45+10-1,45:45+10-1];
    ROI = [45:45+10-1,45:45+10-1];
    ROI = ROI';
    A(ROIIndex) = ROI;

Solution 1.6
    magic(4)
    magic(5)
    magic(6)

Solution 1.7
    ind = [0:99];
    g = sin(ind*(pi/180));

Solution 1.8
    [y,f,t,p] = spectrogram(x,256,250,F,1E3,'yaxis');

Solution 2.1
    >> x0 = [1 -2 3];
    >> x1 = [4 2 -5];
    >> conv(x0,x1)
    ans =
         4    -6     3    16   -15

Solution 2.2
    >> syms z x0 x1
    >> x0 = 1 - 2*z^(-1) + 3*z^(-2);
    >> x1 = 4 + 2*z^(-1) - 5*z^(-2);
    >> simplify(x0*x1)
    ans =
    (4*z^4 - 6*z^3 + 3*z^2 + 16*z - 15)/z^4

Solution 2.3
    >> x0 = [1 -1 2];
    >> x1 = [3 1 -4];
    >> conv(x0,x1)
    ans =
         3    -2     1     6    -8

Solution 2.4
    >> syms z
    >> x0 = 1 - z^(-1) + 2*z^(-2);
    >> x1 = 3 + z^(-1) - 4*z^(-2);
    >> simplify(x0*x1)
    ans =
    (3*z^4 - 2*z^3 + z^2 + 6*z - 8)/z^4
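
Solutions 2.1 to 2.4 rely on the fact that multiplying two z-transforms is equivalent to convolving their coefficient sequences. The following is a minimal sketch of a numerical check that does not need the Symbolic Math Toolbox. The test point z0 is a hypothetical value chosen only for illustration; any non-zero complex number should do.

```matlab
% Coefficients of X0(z) and X1(z) in ascending powers of z^-1 (Solution 2.1).
x0 = [1 -2 3];
x1 = [4 2 -5];

% Time-domain route: convolve the two sequences.
y = conv(x0, x1);

% z-domain route: evaluate each polynomial at an arbitrary test point.
% z0 is an assumed value used only for this check.
z0 = 1.5 + 0.5i;
X0 = sum(x0 .* z0.^(-(0:numel(x0)-1)));
X1 = sum(x1 .* z0.^(-(0:numel(x1)-1)));
Y  = sum(y  .* z0.^(-(0:numel(y)-1)));

disp(y);                       % coefficients of the product
disp(abs(Y - X0*X1) < 1e-12);  % 1 if both routes agree at z0
```

Swapping in the sequences from Solution 2.3 should confirm Solution 2.4 in the same way.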

Solution 3.1
    % Generate spectrogram of the Aphex Twin track "equation".
    urlwrite('http://www.aphextwin.nu/visuals/hiddenfaces/equation9sec.wav', 'equation9sec.wav');
    [Y, FS] = audioread('equation9sec.wav');
    Y = mean(Y,2); % average the channels to obtain a mono signal
    F = logspace(log10(100),log10(22050),32);
    [y,f,t,p] = spectrogram(Y,1024*32,1023*16,F,FS);
    imagesc(t,f,10*log10(abs(p)));
    axis xy;
    xlabel('Time (s)', 'fontsize', 16);
    ylabel('Frequency (Hz)', 'fontsize', 16);
    set(gca,'Yscale','log');

Solution 3.2
See the last chapter for a similar implementation to this.

Solution 4.1
    clear
    L = 1000;

    x = 0:L;
    k = 0.5;
    y1 = sin(k*pi*x/L);
    k = 1;
    y2 = sin(k*pi*x/L);
    k = 1.5;
    y3 = sin(k*pi*x/L);
    k = 2;
    y4 = sin(k*pi*x/L);
    k = 2.5;
    y5 = sin(k*pi*x/L);
    k = 3;
    y6 = sin(k*pi*x/L);
    plot(x,y1,'r:');
    hold on;
    plot(x,y2,'k');
    plot(x,y3,'b:');
    plot(x,y4,'r');
    plot(x,y5,'k:');
    plot(x,y6,'k');
    set(gca, 'XTick', [0 1000]);
    set(gca, 'XTickLabel', {'0','L'});
    set(gca, 'FontSize', 18);
    legend('k=0.5','k=1','k=1.5','k=2','k=2.5','k=3');
    grid on;
    xlabel('x', 'fontsize', 18);
    ylabel('\xi', 'fontsize', 18);

Solution 4.2
    clear
    L = 1000;

    x = 0:L;
    k = 1;
    y1 = cos(k*pi*x/(2*L));
    k = 2;
    y2 = cos(k*pi*x/(2*L));
    k = 3;
    y3 = cos(k*pi*x/(2*L));
    k = 4;
    y4 = cos(k*pi*x/(2*L));
    k = 5;
    y5 = cos(k*pi*x/(2*L));
    plot(x,y1,'r');
    hold on;
    plot(x,y2,'k:');
    plot(x,y3,'b');
    plot(x,y4,'r:');
    plot(x,y5,'k');
    set(gca, 'XTick', [0 1000]);
    set(gca, 'XTickLabel', {'0','L'});
    set(gca, 'FontSize', 18);
    legend('k=1','k=2','k=3','k=4','k=5');
    grid on;
    xlabel('x', 'fontsize', 18);
    ylabel('\xi', 'fontsize', 18);
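
The curves plotted in Solutions 4.1 and 4.2 differ mainly in how they behave at the far end, x = L. As a rough numerical check (a sketch only, not part of the original solutions), the end-point values can be evaluated directly. The variable names kSin, kCos, endSin and endCos are introduced here for illustration.

```matlab
kSin = [0.5 1 1.5 2 2.5 3];   % k values used in Solution 4.1
kCos = 1:5;                   % k values used in Solution 4.2

% At x = L the arguments reduce to k*pi and k*pi/2 respectively.
endSin = sin(kSin*pi);        % sin(k*pi*x/L) evaluated at x = L
endCos = cos(kCos*pi/2);      % cos(k*pi*x/(2*L)) evaluated at x = L

% Round away floating-point residue before displaying.
disp(round(endSin*1e6)/1e6);
disp(round(endCos*1e6)/1e6);
```

In Solution 4.1 the integer values of k return to zero at x = L and the half-integer values end on an extremum; in Solution 4.2 the odd values of k end on zero and the even values end on an extremum.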

Solution 5.1
    >> bw = [500,1000,1500];
    >> f = sqrt(((((bw-25)/75).^(1/0.69)-1)/1.4))*1000
    f =
       1.0e+03 *
        1.3132    2.5779    5.3555

Solution 5.2
    >> b = [5,6,7];
    >> f = 600*sinh(b/6)
    f =
      559.9133  705.1207  869.9602

Solution 5.3
Draw a diagram that joins the intersections with the filters shown at the top of the figure, in the same way as at the bottom of the figure.

Solution 6.1
    >> f = [100,1000,10000];
    >> RA = (12200^2 * f.^4)./((f.^2+20.6^2).*(((f.^2+107.7^2).*(f.^2+737.9.^2)).^0.5).*(f.^2+12200^2));
    >> AF = 2.0+20*log10(RA)
    AF =
      -19.1450    0.0002   -2.4881
1000 Hz is weighted at approximately 0 dB, so it is perceptually more important than the lower (100 Hz) and higher (10000 Hz) frequencies.

Solution 6.2
The largest difference between the two models is at approximately 30 Hz on the 100 dB equal loudness curves.

Solution 6.3
    N = 2.^((LN-40)/10);
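
The conversion in Solution 6.3 (loudness level in phon to loudness in sone) and its inverse in Solution 6.5 can be checked against each other with a round trip. This is a minimal sketch: the function handles phon2sone and sone2phon are names introduced here, and the input levels are assumed values chosen for illustration, not taken from the exercise text.

```matlab
% Loudness level LN (phon) to loudness N (sone), as in Solution 6.3.
phon2sone = @(LN) 2.^((LN-40)/10);
% Inverse mapping, as in Solution 6.5.
sone2phon = @(N) 40 + 10*log2(N);

% Assumed example inputs (illustrative only).
LN = [50 60 90 130];      % loudness levels in phon
N  = phon2sone(LN);       % loudness in sone
LNback = sone2phon(N);    % round trip back to phon

disp(N);
disp(LNback);
```

Every 10 phon increase doubles the loudness in sone, which is why the mapping uses a base-2 exponent and its inverse uses log2.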

Solution 6.4
    [2 4 32 512];

Solution 6.5
    Ln = 40 + 10*log2(N);

Solution 6.6
    [40 50 55.8496 60];

Solution 7.1
The solution should look like Figure 7.13.

Solution 7.2
Computational complexity.

Solution 9.1
The answer should provide a Matlab figure and code that show how the instantaneous frequency is reflected by the zero crossing rate.

Solution 9.2
    f1 = 0:100:8000;
    % MFCC Pre-Emphasis Filter Definition
    a = 1;
    b = [1 -0.7];
    MELFS = 16000; % typical sampling rate for HTK MFCC
    radiansW1 = 2*pi*f1/MELFS;
    h = freqz(b,a,radiansW1);
    plot(f1/1000,20*log10(abs(h)),'r');
    hold on;
    a = 1;
    b = [1 -0.8];
    h = freqz(b,a,radiansW1);
    plot(f1/1000,20*log10(abs(h)),'r');
    a = 1;
    b = [1 -0.9];
    h = freqz(b,a,radiansW1);
    plot(f1/1000,20*log10(abs(h)),'r');
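
The pre-emphasis filter in Solution 9.2 is the first-order FIR filter H(z) = 1 - alpha*z^(-1), whose magnitude response has the closed form |H(e^(jw))| = sqrt(1 + alpha^2 - 2*alpha*cos(w)). The sketch below evaluates the response directly, without freqz, and compares it with that closed form. The names alphas, HdB and closedForm are introduced here for illustration.

```matlab
f1 = 0:100:8000;            % frequency grid in Hz, as in Solution 9.2
fs = 16000;                 % sampling rate in Hz (MELFS in Solution 9.2)
w  = 2*pi*f1/fs;            % normalised frequency in radians per sample
alphas = [0.7 0.8 0.9];     % pre-emphasis coefficients from Solution 9.2

% Row i of HdB holds the magnitude response in dB for alphas(i).
HdB = zeros(numel(alphas), numel(w));
for i = 1:numel(alphas)
    al = alphas(i);
    H = 1 - al*exp(-1j*w);                      % H evaluated on the unit circle
    closedForm = sqrt(1 + al^2 - 2*al*cos(w));  % analytic magnitude
    assert(max(abs(abs(H) - closedForm)) < 1e-12);
    HdB(i,:) = 20*log10(abs(H));
end

% Gain at 0 Hz and at the Nyquist frequency (8 kHz) for each coefficient.
disp([HdB(:,1) HdB(:,end)]);
```

The gain runs from 1 - alpha at 0 Hz to 1 + alpha at the Nyquist frequency, so a larger coefficient gives a steeper high-frequency boost.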

Solution 10.1
The partial probabilities are the same as in the figure apart from the last column. In this column, the probabilities are:
    (0.1498*0.5 + 0.0411*0.2 + 0.0552*0.3)*0.7 = 0.0698
    (0.1498*0.25 + 0.0411*0.6 + 0.0552*0.15)*0.3 = 0.0211
    (0.1498*0.35 + 0.0411*0.2 + 0.0552*0.55)*0.3 = 0.0273
Total probability (the sum of the above) = 0.0698 + 0.0211 + 0.0273 = 0.1182

Solution 10.2
The partial probabilities are the same as in the figure apart from the last column. In this column, the probabilities are:
    (0.1498*0.5 + 0.0411*0.2 + 0.0552*0.3)*0.2 = 0.0199
    (0.1498*0.25 + 0.0411*0.6 + 0.0552*0.15)*0.6 = 0.0422
    (0.1498*0.35 + 0.0411*0.2 + 0.0552*0.55)*0.5 = 0.0455
Total probability (the sum of the above) = 0.0199 + 0.0422 + 0.0455 = 0.1076

Solution 10.3
HMMs are only able to model a modestly sized number of hidden states relative to LSTM methods. Also, the use of LSTM methods removes the need for hierarchical HMM models. LSTM methods are also able to use the output of an STFT directly and learn characterising representations rather than relying on hand-crafted features.

Solution 11.1
A(z) = (P(z) + Q(z))/2
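
The last-column calculations in Solutions 10.1 and 10.2 are one step of the forward algorithm: a weighted sum over the previous partial probabilities, multiplied by the observation probability of each target state. The sketch below reproduces that step in matrix form. The matrix A is simply read off the bracketed sums in those solutions, and the names alphaPrev, b1 and b2 are introduced here for illustration.

```matlab
% Partial probabilities in the previous column (quoted in Solution 10.1).
alphaPrev = [0.1498 0.0411 0.0552];

% Weights read off the bracketed sums: row i multiplies alphaPrev(i),
% column j is the target state.
A = [0.50 0.25 0.35;
     0.20 0.60 0.20;
     0.30 0.15 0.55];

% Observation probabilities applied in the last column.
b1 = [0.7 0.3 0.3];   % Solution 10.1
b2 = [0.2 0.6 0.5];   % Solution 10.2

% One forward step: sum over previous states, then weight by the observation.
alpha1 = (alphaPrev * A) .* b1;
alpha2 = (alphaPrev * A) .* b2;

disp(alpha1); disp(sum(alpha1));
disp(alpha2); disp(sum(alpha2));
```

Totals computed this way use unrounded terms, so they may differ in the fourth decimal place from totals obtained by adding the rounded values.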

Solution 11.2
LPC parameters are very susceptible to quantisation noise. High order LPC parameters (higher values of j) are much more susceptible to quantisation. Finally, LPC parameters are not easily interpolated from one frame to the next.

Solution 12.1
Where the 0 is entered in the listing, an array of random numbers in the range -pi to pi should be generated, and the element-wise (dot) product made at the same point in the code where the phase is zeroed.

Solution 12.2
From the insights in the acoustics chapter, the code should produce sidebands only at odd-numbered harmonics in order to generate a "clarinet" type sound. This is what the DX7 synthesiser did.
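
One way to obtain sidebands only at odd-numbered harmonics, as Solution 12.2 requires, is simple two-operator FM with the modulator at twice the carrier frequency: the sidebands then fall at fc + n*2*fc, which are all odd multiples of fc. The sketch below is an illustration under that assumption, not the book's listing. The sampling rate, carrier frequency and modulation index are hypothetical values.

```matlab
fs = 8000;            % sampling rate in Hz (assumed)
fc = 200;             % carrier frequency in Hz (assumed)
fm = 2*fc;            % modulator at twice the carrier frequency
I  = 2;               % modulation index (assumed)
t  = (0:fs-1)/fs;     % exactly one second, so FFT bins are 1 Hz apart

% Two-operator FM: the modulator varies the phase of the carrier.
y = sin(2*pi*fc*t + I*sin(2*pi*fm*t));

Y = abs(fft(y))/(fs/2);   % single-sided amplitude estimate
harm = (1:6)*fc;          % first six harmonics of fc
amp  = Y(harm + 1);       % bin k+1 holds frequency k Hz

disp([harm; amp]);        % even harmonics should be close to zero
```

Changing fm to fc instead should populate every harmonic, which gives a quick way to hear and see the difference.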

Solution 12.3
For amplitude modulation, the carrier signal's frequency content is present in the output, whereas for ring modulation it is not.
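
The difference can be seen numerically. Amplitude modulation multiplies the carrier by (1 + modulator), so the carrier survives alongside the two sidebands; ring modulation multiplies the carrier by the modulator alone, leaving only the sidebands. The sketch below assumes sinusoidal carrier and modulator signals with hypothetical frequencies chosen for illustration.

```matlab
fs = 8000;                 % sampling rate in Hz (assumed)
t  = (0:fs-1)/fs;          % one second, so FFT bins are 1 Hz apart
fc = 1000;                 % carrier frequency in Hz (assumed)
fm = 100;                  % modulator frequency in Hz (assumed)

carrier   = cos(2*pi*fc*t);
modulator = cos(2*pi*fm*t);

am   = (1 + modulator) .* carrier;   % amplitude modulation
ring = modulator .* carrier;         % ring modulation

spec = @(x) abs(fft(x))/(fs/2);      % single-sided amplitude estimate
AM = spec(am);
RM = spec(ring);

bins = [fc-fm, fc, fc+fm] + 1;       % bin k+1 holds frequency k Hz
disp(AM(bins));                      % lower sideband, carrier, upper sideband
disp(RM(bins));
```

The middle entry (the carrier bin) is non-zero for the amplitude-modulated signal and vanishes for the ring-modulated one.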