-comprehensible. The second band is called \emph{Disgorge} and make even more
-violent music. The growls of the lead singer sound more like a coffee grinder
-and are more shallow. The lyrics are completely incomprehensible and therefore
-some parts are not annotated with lyrics because it was too difficult to hear
-what was being sung.
-
-\section{Methods}
-\todo{To remove in final thesis}
-The initial planning is still up to date. About one and a half album has been
-annotated and a framework for setting up experiments has been created.
-Moreover, the first exploratory experiments are already been executed and
-promising. In April the experimental dataset will be expanded and I will try to
-mimic some of the experiments done in the literature to see whether it performs
-similar on Death Metal
-\begin{table}[ht]
- \centering
- \begin{tabular}{cll}
- \toprule
- Month & Description\\
- \midrule
- March
- & Preparing the data\\
- & Preparing an experiment platform\\
- & Literature research\\
- April
- & Running the experiments\\
- & Fiddle with parameters\\
- & Explore the possibilities for forced alignment\\
- May
- & Write up the thesis\\
- & Possibly do forced alignment\\
- June
- & Finish up thesis\\
- & Wrap up\\
- \bottomrule
- \end{tabular}
- \caption{Outline}
-\end{table}
+comprehensible. The vocals produced by \emph{Cannibal Corpse} border on
+regular shouting.
+
+The second band is called \emph{Disgorge} and makes even more violent-sounding
+music. The growls of the lead singer sound like a coffee grinder and are more
+shallow. In the spectrograms it is clearly visible that overtones are
+produced during some parts of the growling. The lyrics are completely
+incomprehensible and therefore some parts were not annotated with the actual
+lyrics because it was not possible to hear what was being sung.
+
+Lastly, a band from Moscow named \emph{Who Dies in Siberian Slush} was chosen.
+This band is a little odd compared to the previous \gls{dm} bands because it
+plays \gls{dom}. \gls{dom} is characterized by a very slow tempo and low-tuned
+guitars. The vocalist has a very characteristic growl and performs in several
+Moscow-based bands. This band also stands out because it uses pianos and
+synthesizers. The droning synthesizers often operate in the same frequency
+range as the vocals.
+
+\section{\gls{MFCC} Features}
+Raw waveforms are not very suitable as features because of their high
+dimensionality and the strong correlation between samples. Therefore we use
+the widely used \gls{MFCC} feature vectors.\todo{cite which papers use this}
+The actual conversion is done using the \emph{python\_speech\_features}%
+\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
+
+\gls{MFCC} features are nature-inspired and are built incrementally in several
+steps.
+\begin{enumerate}
+ \item The first step in the process is converting the time representation
+ of the signal to a spectral representation using a sliding window with
+ overlap. The width of the window and the step size are two important
+ parameters in the system. In classical phonetic analysis a window size
+ of $25$\,ms with a step of $10$\,ms is often chosen because such a
+ window is small enough to contain only sub-phone entities. Singing a
+ sound that lasts only $25$\,ms is practically impossible, so for
+ singing voice this window size is arguably very small.
+ \item The standard \gls{FT} gives a spectral representation that has
+ linearly scaled frequencies. This scale is converted to the \gls{MS}
+ using overlapping triangular windows.
+ \item
+\end{enumerate}
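
The first two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the conversion performed by \emph{python\_speech\_features}: all function names are hypothetical, the 16 kHz sample rate, the 512-point transform size and the 26 filters are example values that do not come from the text, and the mel formula $2595 \log_{10}(1 + f/700)$ is only one common variant. The Fourier transform of each frame is left out; the sketch shows where the frames fall and how the triangular windows are placed.

```python
import math


def hz_to_mel(hz):
    # One common Hz-to-mel mapping (assumed here, other variants exist).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def frame_bounds(n_samples, sample_rate, win_len=0.025, win_step=0.010):
    """Return (start, end) sample indices of overlapping analysis frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(win_len * sample_rate))
    frame_step = int(round(win_step * sample_rate))
    last_start = n_samples - frame_len
    return [(s, s + frame_len) for s in range(0, last_start + 1, frame_step)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Build overlapping triangular windows, equally spaced on the mel scale.

    Each row holds the weights of one filter over the n_fft // 2 + 1
    linearly spaced frequency bins of a real-valued Fourier transform.
    """
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: each filter uses three consecutive points.
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mels]
    bank = []
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):       # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):      # falling slope, peak at centre
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


if __name__ == "__main__":
    frames = frame_bounds(n_samples=16000, sample_rate=16000)
    bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=16000)
    print(len(frames), frames[:2])
    print(len(bank), len(bank[0]))
```

Multiplying the power spectrum of each frame with every row of the filter bank and summing gives one energy per mel band. The narrow filters at low frequencies and the wide ones at high frequencies are what turn the linear frequency axis into a mel-scaled one.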
+