The example at the above link is as follows:
from scipy import signal import matplotlib.pyplot as plt # Generate a test signal, a 2 Vrms sine wave whose frequency linearly # changes with time from 1 kHz to 2 kHz, corrupted by 0.001 V ** 2 / Hz of # white noise sampled at 10 kHz. fs = 10e3 N = 1e5 amp = 2 * np.sqrt(2) noise_power = 0.001 * fs / 2 time = np.arange(N) / fs freq = np.linspace(1e3, 2e3, N) x = amp * np.sin(2 * np.pi * freq * time) x += np.random.normal(scale = np.sqrt(noise_power), size = time.shape) #Compute and plot the spectrogram. f, t, Sxx = signal.spectrogram(x, fs) plt.pcolormesh(t, f, Sxx) plt.ylabel('Frequency [Hz]') plt.xlabel('Time [sec]') plt.show()
I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:
N = length of data input
N_FFT = no of samples per block( == FFT size, e.g.1024)
i = 0;;
i = index of spectrum within 3 D output array
for block_start = 0 to N - block_start
block_end = block_start + N_FFT
get samples from block_start..block_end
apply window
function to block(e.g.Hamming)
apply FFT to windowed block
calculate magnitude spectrum(20 * log10(re * re + im * im))
store spectrum in output array at index i
block_start += N_FFT / 2;;
NB: 50 % overlap
i++
end
Edit: Oh, so it seems this returns values, but they don't fit to the audio file at all. Even though they can be used as magnitude on spectrogram, they won't work for example in those classic audio visualizers which you can see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is same.
import os
import wave
import pylab
import math
from numpy
import amax
from numpy
import amin
def get_wav_info(wav_file, mi, mx):
wav = wave.open(wav_file, 'r')
frames = wav.readframes(-1)
sound_info = pylab.fromstring(frames, 'Int16')
frame_rate = wav.getframerate()
wav.close()
spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT = 1024, Fs = frame_rate)
n = 0
while n < 20:
for index, power in enumerate(spectrum[n]):
print("%s,%s,%s" % (n, int(round(t[index] * 1000)), math.ceil(power * 100) / 100))
n += 1
get_wav_info("wave.wav", 1, 20)
Fixed: Case completely solved.
import os
import pylab
import math
from numpy
import amax
from numpy
import amin
from scipy.io
import wavfile
frame_rate, snd = wavfile.read(wav_file)
sound_info = snd[: , 0]
spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT = 1024, Fs = frame_rate, noverlap = 5, mode = 'magnitude')
EXAMPLE: Use fft and ifft function from scipy to calculate the FFT amplitude spectrum and inverse FFT to obtain the original signal. Plot both results. Time the fft function using this 2000 length signal.,EXAMPLE: Use fft and ifft function from numpy to calculate the FFT amplitude spectrum and inverse FFT to obtain the original signal. Plot both results. Time the fft function using this 2000 length signal.,EXAMPLE: We can use the signal we generated at the beginning of this section (the mixed sine waves with 1, 4, and 7 Hz), and high-pass filter this signal at 6 Hz. Plot the filtered signal and the FFT amplitude before and after the filtering.,Now we can see that the built-in fft functions are much faster and easy to use, especially for the scipy version. Here is the results for comparison:
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn-poster') %
matplotlib inline
# sampling rate sr = 2000 # sampling interval ts = 1.0 / sr t = np.arange(0, 1, ts) freq = 1. x = 3 * np.sin(2 * np.pi * freq * t) freq = 4 x += np.sin(2 * np.pi * freq * t) freq = 7 x += 0.5 * np.sin(2 * np.pi * freq * t) plt.figure(figsize = (8, 6)) plt.plot(t, x, 'r') plt.ylabel('Amplitude') plt.show()
from numpy.fft
import fft, ifft
X = fft(x)
N = len(X)
n = np.arange(N)
T = N / sr
freq = n / T
plt.figure(figsize = (12, 6))
plt.subplot(121)
plt.stem(freq, np.abs(X), 'b', \
markerfmt = " ", basefmt = "-b")
plt.xlabel('Freq (Hz)')
plt.ylabel('FFT Amplitude |X(freq)|')
plt.xlim(0, 10)
plt.subplot(122)
plt.plot(t, ifft(X), 'r')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.tight_layout()
plt.show()
/Users/qingkaikong / miniconda3 / lib / python3 .6 / site - packages / ipykernel_launcher.py: 13: UserWarning: In Matplotlib 3.3 individual lines on a stem plot will be added as a LineCollection instead of individual lines.This significantly improves the performance of a stem plot.To remove this warning and
switch to the new behaviour, set the "use_line_collection"
keyword argument to True.
del sys.path[0] /
Users / qingkaikong / miniconda3 / lib / python3 .6 / site - packages / numpy / core / numeric.py: 538: ComplexWarning: Casting complex values to real discards the imaginary part
return array(a, dtype, copy = False, order = order)
% timeit fft(x)
36.2 µs± 775 ns per loop(mean± std.dev.of 7 runs, 10000 loops each)
Of course, a bird sings many notes throughout the song, so we’d also like to know when each note occurs. The Fourier transform takes a signal in the time domain (i.e., a set of measurements over time) and turns it into a spectrum—a set of frequencies with corresponding (complex2) values. The spectrum does not contain any information about time!3,The Fourier transform takes us from the time to the frequency domain, and this turns out to have a massive number of applications. The fast Fourier transform (FFT) is an algorithm for computing the DFT; it achieves its high speed by storing and reusing results of computations as it progresses.,So, to find both the frequencies and the time at which they were sung, we’ll need to be somewhat clever. Our strategy is as follows: take the audio signal, split it into small, overlapping slices, and apply the Fourier transform to each (a technique known as the short time Fourier transform).,Compute the DFT of a real sequence, exploiting the symmetry of the resulting spectrum for increased performance. Also used by fft internally when applicable.
# Make plots appear inline, set custom plotting style % matplotlib inline import matplotlib.pyplot as plt plt.style.use('style/elegant.mplstyle')
# Make plots appear inline, set custom plotting style
%
matplotlib
inline
import
matplotlib.pyplot
as
plt
plt
.
style
.
use
(
'style/elegant.mplstyle'
)
import numpy as np
The discrete1 Fourier transform (DFT) is a mathematical technique used to convert temporal or spatial data into frequency domain data. Frequency is a familiar concept, due to its colloquial occurrence in the English language: the lowest notes your headphones can rumble out are around 20 Hz, whereas middle C on a piano lies around 261.6 Hz; Hertz, or oscillations per second, in this case literally refers to the number of times per second at which the membrane inside the headphone moves to-and-fro. That, in turn, creates compressed pulses of air which, upon arrival at your eardrum, induces a vibration at the same frequency. So, if you take a simple periodic function, sin(10 × 2πt), you can view it as a wave:
f = 10 # Frequency, in cycles per second, or Hertz f_s = 100 # Sampling rate, or number of measurements per second t = np.linspace(0, 2, 2 * f_s, endpoint = False) x = np.sin(f * 2 * np.pi * t) fig, ax = plt.subplots() ax.plot(t, x) ax.set_xlabel('Time [s]') ax.set_ylabel('Signal amplitude');
from
scipy
import
fftpack
X
=
fftpack
.
fft
(
x
)
freqs
=
fftpack
.
fftfreq
(
len
(
x
))
*
f_s
fig
,
ax
=
plt
.
subplots
()
ax
.
stem
(
freqs
,
np
.
abs
(
X
))
ax
.
set_xlabel
(
'Frequency in Hertz [Hz]'
)
ax
.
set_ylabel
(
'Frequency Domain (Spectrum) Magnitude'
)
ax
.
set_xlim
(
-
f_s
/
2
,
f_s
/
2
)
ax
.
set_ylim
(
-
5
,
110
)
(-5, 110)
Listen to this snippet of the nightingale birdsong (released under CC BY 4.0):
from IPython.display
import Audio
Audio('data/nightingale.wav')
We load the audio file, which gives us the sampling rate (number of measurements per second) as well as audio data as an (N, 2)
array—two columns because this is a stereo recording.
from scipy.io
import wavfile
rate, audio = wavfile.read('data/nightingale.wav')
We convert to mono by averaging the left and right channels.
audio = np.mean(audio, axis = 1)
N
=
audio
.
shape
[
0
]
L
=
N
/
rate
(
f
'Audio length: {L:.2f} seconds'
)
f
,
ax
=
plt
.
subplots
()
ax
.
plot
(
np
.
arange
(
N
)
/
rate
,
audio
)
ax
.
set_xlabel
(
'Time [s]'
)
ax
.
set_ylabel
(
'Amplitude [unknown]'
);
Audio length: 7.67 seconds
Start by chopping up the signal into slices of 1024 samples, each slice overlapping the previous by 100 samples. The resulting slices
object contains one slice per row.
from skimage
import util
M = 1024
slices = util.view_as_windows(audio, window_shape = (M, ), step = 100)
print(f 'Audio shape: {audio.shape}, Sliced audio shape: {slices.shape}')
Let us illustrate:
import time from scipy import fftpack from sympy import factorint K = 1000 lengths = range(250, 260) # Calculate the smoothness for all input lengths smoothness = [max(factorint(i).keys()) for i in lengths] exec_times = [] for i in lengths: z = np.random.random(i) # For each input length i, execute the FFT K times # and store the execution time times = [] for k in range(K): tic = time.monotonic() fftpack.fft(z) toc = time.monotonic() times.append(toc - tic) # For each input length, remember the * minimum * execution time exec_times.append(min(times)) f, (ax0, ax1) = plt.subplots(2, 1, sharex = True) ax0.stem(lengths, np.array(exec_times) * 10 ** 6) ax0.set_ylabel('Execution time (µs)') ax1.stem(lengths, smoothness) ax1.set_ylabel('Smoothness of input length\n(lower is better)') ax1.set_xlabel('Length of input');
For historical reasons, most implementations return an array where frequencies vary from low to high to low (see “Discrete Fourier Transforms” for further explanation of frequencies). For example, when we do the real Fourier transform of a signal of all ones, an input that has no variation and therefore only has the slowest, constant Fourier component (also known as the “DC,” or Direct Current, component—just electronics jargon for “mean of the signal”), appearing as the first entry:
from scipy import fftpack N = 10 fftpack.fft(np.ones(N)) # The first component is np.mean(x) * N
from
scipy
import
fftpack
N
=
10
fftpack
.
fft
(
np
.
ones
(
N
))
# The first component is np.mean(x) * N
array([10. + 0. j, 0. + 0. j, 0. + 0. j, 0. + 0. j, 0. + 0. j, 0. + 0. j, 0. - 0. j, 0. - 0. j, 0. - 0. j, 0. - 0. j ])
When we try the FFT on a rapidly changing signal, we see a high-frequency component appear:
z = np.ones(10)
z[::2] = -1
print(f 'Applying FFT to {z}')
fftpack.fft(z)
Applying FFT to [-1. 1. -1. 1. -1. 1. -1. 1. -1. 1.]
array([0. + 0. j, 0. + 0. j, 0. + 0. j, 0. + 0. j, 0. + 0. j, -10. + 0. j, 0. - 0. j, 0. - 0. j, 0. - 0. j, 0. - 0. j ])
Note that the FFT returns a complex spectrum that, in the case of real inputs, is conjugate symmetrical (i.e., symmetric in the real part and antisymmetric in the imaginary part):
x = np.array([1, 5, 12, 7, 3, 0, 4, 3, 2, 8])
X = fftpack.fft(x)
np.set_printoptions(precision = 2)
print("Real part: ", X.real)
print("Imaginary part:", X.imag)
np.set_printoptions()
If we examine the Fourier transform of a rectangular pulse, we see significant side lobes in the spectrum:
x = np.zeros(500) x[100: 150] = 1 X = fftpack.fft(x) f, (ax0, ax1) = plt.subplots(2, 1, sharex = True) ax0.plot(x) ax0.set_ylim(-0.1, 1.1) ax1.plot(fftpack.fftshift(np.abs(X))) ax1.set_ylim(-5, 55);
We measure the signal for only a short time, labeled Teff. The Fourier transform assumes that x(8) = x(0), and that the signal is continued as the dashed, rather than the solid, line. This introduces a big jump at the edge, with the expected oscillation in the spectrum:
t = np.linspace(0, 1, 500) x = np.sin(49 * np.pi * t) X = fftpack.fft(x) f, (ax0, ax1) = plt.subplots(2, 1) ax0.plot(x) ax0.set_ylim(-1.1, 1.1) ax1.plot(fftpack.fftfreq(len(t)), np.abs(X)) ax1.set_ylim(0, 190);
We can counter this effect by a process called windowing. The original function is multiplied with a window function such as the Kaiser window K(N, β). Here we visualize it for β ranging from 0 to 100:
f, ax = plt.subplots()
N = 10
beta_max = 5
colormap = plt.cm.plasma
norm = plt.Normalize(vmin = 0, vmax = beta_max)
lines = [
ax.plot(np.kaiser(100, beta), color = colormap(norm(beta)))
for beta in np.linspace(0, beta_max, N)
]
sm = plt.cm.ScalarMappable(cmap = colormap, norm = norm)
sm._A = []
plt.colorbar(sm).set_label(r 'Kaiser $\beta$');
First, we take the FFTs of our three signals (synthetic single target, synthetic multi-target, and real) and then display the positive frequency components (i.e., components 0 to N/2; see Figure 4-12). These are called the range traces in radar terminology.
fig, axes = plt.subplots(3, 1, sharex = True, figsize = (4.8, 2.4)) # Take FFTs of our signals.Note the convention to name FFTs with a # capital letter. V_single = np.fft.fft(v_single) V_sim = np.fft.fft(v_sim) V_actual = np.fft.fft(v_actual) N = len(V_single) with plt.style.context('style/thinner.mplstyle'): axes[0].plot(np.abs(V_single[: N // 2])) axes[0].set_ylabel("$|V_\mathrm{single}|$") axes[0].set_xlim(0, N // 2) axes[0].set_ylim(0, 1100) axes[1].plot(np.abs(V_sim[: N // 2])) axes[1].set_ylabel("$|V_\mathrm{sim} |$") axes[1].set_ylim(0, 1000) axes[2].plot(np.abs(V_actual[: N // 2])) axes[2].set_ylim(0, 750) axes[2].set_ylabel("$|V_\mathrm{actual}|$") axes[2].set_xlabel("FFT component $n$") for ax in axes: ax.grid()
This result is quite satisfying—but the dynamic range is so large that we could very easily miss some peaks. Let’s take the log as before with the spectrogram:
c = 3e8 # Approximately the speed of light and of # electromagnetic waves in air fig, (ax0, ax1, ax2) = plt.subplots(3, 1) def dB(y): "Calculate the log ratio of y / max(y) in decibel." y = np.abs(y) y /= y.max() return 20 * np.log10(y) def log_plot_normalized(x, y, ylabel, ax): ax.plot(x, dB(y)) ax.set_ylabel(ylabel) ax.grid() rng = np.arange(N // 2) * c / 2 / Beff with plt.style.context('style/thinner.mplstyle'): log_plot_normalized(rng, V_single[: N // 2], "$|V_0|$ [dB]", ax0) log_plot_normalized(rng, V_sim[: N // 2], "$|V_5|$ [dB]", ax1) log_plot_normalized(rng, V_actual[: N // 2], "$|V_{\mathrm{actual}}|$ [dB]" , ax2) ax0.set_xlim(0, 300) # Change x limits for these plots so that ax1.set_xlim(0, 300) # we are better able to see the shape of the peaks.ax2.set_xlim(0, len(V_actual) // 2) ax2.set_xlabel('range')
The returned float array f contains the frequency bin centers in cycles per unit of the sample spacing (with zero at the start). For instance, if the sample spacing is in seconds, then the frequency unit is cycles/second.,Return the Discrete Fourier Transform sample frequencies.,Array of length n containing the sample frequencies.,Sample spacing (inverse of the sampling rate). Defaults to 1.
f = [0, 1, ..., n / 2 - 1, -n / 2, ..., -1] / (d * n) if n is even f = [0, 1, ..., (n - 1) / 2, -(n - 1) / 2, ..., -1] / (d * n) if n is odd
>>> signal = np.array([-2, 8, 6, 4, 1, 0, 3, 5], dtype = float) >>> fourier = np.fft.fft(signal) >>> n = signal.size >>> timestep = 0.1 >>> freq = np.fft.fftfreq(n, d = timestep) >>> freq array([0., 1.25, 2.5, ..., -3.75, -2.5, -1.25])