To run the experiments, data has been collected from several \gls{dm} albums.
The exact data used is available in Appendix~\ref{app:data}. The albums are
extracted from the audio CD and converted to a mono channel waveform with the
-correct samplerate \emph{SoX}\footnote{\url{http://sox.sourceforge.net/}}.
-Every file is annotated using
-Praat\cite{boersma_praat_2002} where the utterances are manually aligned to
-the audio. Examples of utterances are shown in
-Figure~\ref{fig:bloodstained} and Figure~\ref{fig:abominations} where the
-waveform, $1-8000$Hz spectrals and annotations are shown. It is clearly visible
-that within the genre of death metal there are a different spectral patterns
-visible.
+correct sample rate using \emph{SoX}%
+\footnote{\url{http://sox.sourceforge.net/}}. Every file is annotated using
+Praat\cite{boersma_praat_2002} where the utterances are manually aligned to the
+audio. Examples of utterances are shown in Figure~\ref{fig:bloodstained} and
+Figure~\ref{fig:abominations} where the waveform, $1$--$8000$\,Hz spectrogram and
+annotations are shown. The figures show that within the genre of death
+metal different spectral patterns occur over time.
\begin{figure}[ht]
\centering
\emph{Enthroned Abominations}}\label{fig:abominations}
\end{figure}
-The data is collected from three studio albums. The
-first band is called \emph{Cannibal Corpse} and has been producing \gls{dm} for
-almost 25 years and have been creating the same type every album. The singer of
-\emph{Cannibal Corpse} has a very raspy growls and the lyrics are quite
-comprehensible. The vocals produced by \emph{Cannibal Corpse} are bordering
-regular shouting.
+The data is collected from three studio albums. The first band is called
+\emph{Cannibal Corpse} and has been producing \gls{dm} for almost 25 years,
+creating the same type of music on every album. The singer of \emph{Cannibal
+Corpse} has very raspy growls and the lyrics are quite comprehensible. The
+vocals produced by \emph{Cannibal Corpse} border on regular shouting.
The second band is called \emph{Disgorge} and makes even more violent-sounding
music. The growls of the lead singer sound like a coffee grinder and are more
\emph{python\_speech\_features}%
\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
-\gls{MFCC} features are nature inspired and built incrementally in several
-steps.
+\gls{MFCC} features are inspired by human auditory processing and are built
+incrementally in several steps.
\begin{enumerate}
\item The first step in the process is converting the time representation
of the signal to a spectral representation using a sliding window with
impossible so it is arguable that the window size is very small.
\item The standard \gls{FT} gives a spectral representation that has
linearly scaled frequencies. This scale is converted to the \gls{MS}
- using triangular overlapping windows.
- \item The log is taken of the Mel frequencies. This step is inspired by the
- \emph{Weber-Fechner} law that describes how humans perceive physical
+ using triangular overlapping windows to obtain a more tonotopic
+ representation that tries to match the representation in the cochlea
+ of the human ear.
+ \item The \emph{Weber-Fechner} law describes how humans perceive physical
magnitudes\footnote{Fechner, Gustav Theodor (1860). Elemente der
- Psychophysik}
- \item To decorrelate the signal a \gls{DCT} is applied. The \gls{MFCC}
- features are then the amplitudes of the spectrum.
+ Psychophysik} and it was found that energy is perceived in logarithmic
+ increments. This means that twice the amount of energy does not mean
+ twice the amount of perceived loudness. Therefore, in this step the log
+ is taken of the energy or amplitude of the \gls{MS} frequency spectrum
+ to match human hearing more closely.
+ \item The amplitudes of the spectrum are highly correlated, and therefore
+ the last step is a decorrelation step. \Gls{DCT} is applied to the
+ amplitudes interpreted as a signal. \Gls{DCT} is a technique for
+ describing a signal as a combination of several primitive cosine
+ functions.
\end{enumerate}
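The four steps above can be sketched in formulas. This is an illustration
under stated assumptions and not necessarily the exact computation used in the
experiments: the symbols $x$, $w$, $N$, $H_k$ and $K$ are introduced here only
for this sketch. Let $x[n]$ be one frame of $N$ samples and $w[n]$ the window
function. The spectral representation of the frame is
\[
    X[j] = \sum_{n=0}^{N-1} x[n]\,w[n]\,e^{-2\pi i jn/N}.
\]
A commonly used mapping from a frequency $f$ in Hz to the \gls{MS}, assumed
here, is
\[
    m(f) = 2595\log_{10}\left(1+\frac{f}{700}\right).
\]
With $K$ triangular filters $H_k$ spaced evenly on this scale, the filterbank
energies $E_k$ and the resulting coefficients $c_n$, assuming an unnormalised
type-II \gls{DCT}, are
\[
    E_k = \sum_{j=0}^{N/2} H_k[j]\,|X[j]|^2, \qquad
    c_n = \sum_{k=1}^{K} \log(E_k)
          \cos\left(\frac{\pi n}{K}\left(k-\frac{1}{2}\right)\right).
\]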
\section{\gls{ANN} Classifier}