+\section{Data \& Preprocessing}
+To run the experiments, data was collected from several \gls{dm} albums.
+The exact data used is listed in Appendix~\ref{app:data}. The albums were
+extracted from the audio CD and converted to a mono-channel waveform with the
+correct sample rate using \emph{SoX}\footnote{\url{http://sox.sourceforge.net/}}.
+Every file was annotated using
+Praat~\cite{boersma_praat_2002}, where the utterances were manually aligned to
+the audio. Examples of utterances are shown in
+Figure~\ref{fig:bloodstained} and Figure~\ref{fig:abominations}, which show the
+waveform, the $1$--$8000$\,Hz spectrogram and the annotations. It is clearly
+visible that different spectral patterns occur within the genre of death metal.
+
+\begin{figure}[ht]
+ \centering
+ \includegraphics[width=.7\linewidth]{cement}
+ \caption{A vocal segment of the \emph{Cannibal Corpse} song
+ \emph{Bloodstained Cement}}\label{fig:bloodstained}
+\end{figure}
+
+\begin{figure}[ht]
+ \centering
+ \includegraphics[width=.7\linewidth]{abominations}
+ \caption{A vocal segment of the \emph{Disgorge} song
+ \emph{Enthroned Abominations}}\label{fig:abominations}
+\end{figure}
+
+The data is collected from three studio albums. The
+first band is called \emph{Cannibal Corpse}. It has been producing \gls{dm} for
+almost 25 years and has kept the same style on every album. The singer of
+\emph{Cannibal Corpse} has very raspy growls and the lyrics are quite
+comprehensible. The vocals produced by \emph{Cannibal Corpse} border on
+regular shouting.
+
+The second band is called \emph{Disgorge} and makes even more violent-sounding
+music. The growls of the lead singer sound like a coffee grinder and are more
+shallow. In the spectrogram it is clearly visible that overtones are
+produced during some parts of the growling. The lyrics are completely
+incomprehensible and therefore some parts were not annotated with the actual
+lyrics because it was not possible to make out what was being sung.
+
+Lastly, a band from Moscow is chosen bearing the name \emph{Who Dies in
+Siberian Slush}. This band is a little odd compared to the previous \gls{dm}
+bands because they create \gls{dom}. \gls{dom} is characterized by a very
+slow tempo and low-tuned guitars. The vocalist has a very characteristic growl
+and performs in several Muscovite bands. This band also stands out because it
+uses pianos and synthesizers. The droning synthesizers often operate in the
+same frequency range as the vocals.
+
+\section{\gls{MFCC} Features}
+Waveforms in themselves are not very suitable to be used as features due to their
+high dimensionality and correlation. Therefore we use the widely used
+\glspl{MFCC} feature vectors.\todo{cite which papers use this} The actual
+conversion is done using the \emph{python\_speech\_features}%
+\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
+
+\gls{MFCC} features are inspired by nature and are built up incrementally in
+several steps.
+\begin{enumerate}
+ \item The first step in the process is converting the time representation
+ of the signal to a spectral representation using a sliding window with
+ overlap. The width of the window and the step size are two important
+ parameters in the system. In classical phonetic analysis, window sizes
+ of $25$\,ms with a step of $10$\,ms are often chosen because they are small
+ enough to only contain subphone entities. Singing for $25$\,ms is
+ impossible, so it is arguable that this window size is very small for sung material.
+ \item The standard \gls{FT} gives a spectral representation that has
+ linearly scaled frequencies. This scale is converted to the \gls{MS}
+ using triangular overlapping windows.
+ \item
+\end{enumerate}
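+
+As a worked illustration of the first two steps, and not necessarily the exact
+computation performed by the package used here, assume a signal of duration $T$
+that is framed without padding, a window width $w$ and a step size $s$. The
+number of frames is then
+\begin{equation}
+	N = 1 + \left\lfloor\frac{T - w}{s}\right\rfloor,
+\end{equation}
+so that one second of audio ($T = 1000$\,ms) with $w = 25$\,ms and
+$s = 10$\,ms yields $N = 1 + \lfloor 97.5 \rfloor = 98$ frames. For the second
+step, a commonly used definition of the mel scale, assumed here because several
+variants exist, maps a frequency $f$ in Hz to
+\begin{equation}
+	m(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right).
+\end{equation}
+Under this definition $m(1000) \approx 1000$ and $m(8000) \approx 2840$, so the
+upper part of the spectrum is compressed considerably relative to the lower
+part.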
+
+
+\todo{Explain why MFCC and which parameters}
+
+\section{\gls{ANN} Classifier}
+\todo{Spectrals might be enough, no decorrelation}
+
+\section{Model training}
+
+\section{Experiments}
+
+\section{Results}
+