X-Git-Url: https://git.martlubbers.net/?a=blobdiff_plain;f=asr.tex;h=3f5eaaf8f02197197b5e58c1fec243ed80cdfb15;hb=0ada197a78af4323b8cf5efc508a5aab3d80e4b2;hp=f149b1d8c235c5b32edb97fddc711de83a3ab9df;hpb=ffa8517ae9d919b4da3ebeace34bc7897b56142b;p=asr1617.git

diff --git a/asr.tex b/asr.tex
index f149b1d..3f5eaaf 100644
--- a/asr.tex
+++ b/asr.tex
@@ -13,6 +13,14 @@
 \newglossaryentry{dm}{name={Death Metal},
 	description={is an extreme heavy metal music style with growling vocals and
 	pounding drums}}
+\newglossaryentry{dom}{name={Doom Metal},
+	description={is an extreme heavy metal music style with growling vocals and
+	pounding drums played very slowly}}
+\newglossaryentry{FT}{name={Fourier Transform},
+	description={is a technique for converting a time-domain representation of a
+	signal to a frequency-domain representation}}
+\newglossaryentry{MS}{name={Mel-Scale},
+	description={is a scale for spectral signals inspired by the human ear}}
 
 \begin{document}
 \frontmatter{}
@@ -141,12 +149,7 @@ To run the experiments data has been collected from several \gls{dm} albums.
 The exact data used is available in Appendix~\ref{app:data}. The albums are
 extracted from the audio CD and converted to a mono channel waveform with the
 correct samplerate \emph{SoX}\footnote{\url{http://sox.sourceforge.net/}}.
-When the waveforms are finished they are converted to \glspl{MFCC} vectors
-using the \emph{python\_speech\_features}%
-\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
-All these steps combined results in thirteen tab separated features per line in
-a file for every source file. Technical info about the processing steps is
-given in the following sections. Every file is annotated using
+Every file is annotated using
 Praat\cite{boersma_praat_2002} where the utterances are manually aligned to
 the audio. Examples of utterances are shown in
 Figure~\ref{fig:bloodstained} and Figure~\ref{fig:abominations} where the
@@ -168,62 +171,70 @@ visible.
 		\emph{Enthroned Abominations}}\label{fig:abominations}
 \end{figure}
 
-The data is collected from two\todo{more in the future}\ studio albums. The first
-band is called \emph{Cannibal Corpse} and has been producing \gls{dm} for almost
-25 years and have been creating the same type every album. The singer of
+The data is collected from three studio albums. The
+first band is called \emph{Cannibal Corpse} and has been producing \gls{dm} for
+almost 25 years, creating the same type of music on every album. The singer of
 \emph{Cannibal Corpse} has a very raspy growls and the lyrics are quite
-comprehensible. The second band is called \emph{Disgorge} and make even more
-violent music. The growls of the lead singer sound more like a coffee grinder
-and are more shallow. The lyrics are completely incomprehensible and therefore
-some parts are not annotated with lyrics because it was too difficult to hear
-what was being sung.
-
-\section{Methods}
-\todo{To remove in final thesis}
-The initial planning is still up to date. About one and a half album has been
-annotated and a framework for setting up experiments has been created.
-Moreover, the first exploratory experiments are already been executed and
-promising. In April the experimental dataset will be expanded and I will try to
-mimic some of the experiments done in the literature to see whether it performs
-similar on Death Metal
-\begin{table}[ht]
-	\centering
-	\begin{tabular}{cll}
-		\toprule
-		Month & Description\\
-		\midrule
-		March
-			& Preparing the data\\
-			& Preparing an experiment platform\\
-			& Literature research\\
-		April
-			& Running the experiments\\
-			& Fiddle with parameters\\
-			& Explore the possibilities for forced alignment\\
-		May
-			& Write up the thesis\\
-			& Possibly do forced alignment\\
-		June
-			& Finish up thesis\\
-			& Wrap up\\
-		\bottomrule
-	\end{tabular}
-	\caption{Outline}
-\end{table}
+comprehensible. The vocals produced by \emph{Cannibal Corpse} border on
+regular shouting.
+
+The second band is called \emph{Disgorge} and makes even more violent-sounding
+music. The growls of the lead singer sound like a coffee grinder and are more
+shallow. In the spectrograms it is clearly visible that overtones are
+produced during some parts of the growling. The lyrics are completely
+incomprehensible and therefore some parts were not annotated with the actual
+lyrics because it was not possible to hear what was being sung.
+
+Lastly, a band from Moscow was chosen, bearing the name \emph{Who Dies in
+Siberian Slush}. This band is a little odd compared to the previous \gls{dm}
+bands because they create \gls{dom}. \gls{dom} is characterized by a very
+slow tempo and low-tuned guitars. The vocalist has a very characteristic growl
+and performs in several Moscow-based bands. This band also stands out because it
+uses pianos and synthesizers. The droning synthesizers often operate in the
+same frequency range as the vocals.
+
+\section{\gls{MFCC} Features}
+The waveforms themselves are not very suitable to be used as features due to
+their high dimensionality and correlation. Therefore we use the often used
+\glspl{MFCC} feature vectors.\todo{cite which papers use this} The actual
+conversion is done using the \emph{python\_speech\_features}%
+\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
 
-\section{Features}
+\gls{MFCC} features are inspired by nature and are built incrementally in
+several steps.
+\begin{enumerate}
+	\item The first step in the process is converting the time representation
+		of the signal to a spectral representation using a sliding window with
+		overlap. The width of the window and the step size are two important
+		parameters in the system. In classical phonetic analysis window sizes
+		of $25$\,ms with a step of $10$\,ms are often chosen because they are
+		small enough to only contain subphone entities. Singing a sound in only
+		$25$\,ms is impossible, so this window size is arguably very small.
+	\item The standard \gls{FT} gives a spectral representation that has
+		linearly scaled frequencies. This scale is converted to the \gls{MS}
+		using triangular overlapping windows.
+	\item
+\end{enumerate}
 
 
 \todo{Explain why MFCC and which parameters}
+
+\section{\gls{ANN} Classifier}
 \todo{Spectrals might be enough, no decorrelation}
 
+\section{Model training}
+
 \section{Experiments}
 
 \section{Results}
 
 
 \chapter{Conclusion \& Discussion}
+\section{Conclusion}
 %Discussion section
+
+\section{Discussion}
+
 \todo{Novelty}
 \todo{Weaknesses}
 \todo{Dataset is not very varied but\ldots}
@@ -263,6 +274,15 @@ similar on Death Metal
 		19 & Disgorge & Parallels of Infinite Torture & Parallels of Infinite Torture & 05:03.33\\
 		20 & Disgorge & Parallels of Infinite Torture & Asphyxiation of Thee Oppressed & 05:42.37\\
 		21 & Disgorge & Parallels of Infinite Torture & Ominous Sigils of Ungodly Ruin & 04:59.15\\
+		22 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & Leave Me & 06:35.60\\
+		23 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & The Woman We Are Looking For & 06:53.63\\
+		24 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & M\"obius Ring & 07:20.56\\
+		25 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & Interlude & 04:26.49\\
+		26 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & Завещание Гумилёва & 08:46.76\\
+		27 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & An Old Road Through The Snow & 02:31.56\\
+		28 & Who Dies In Siberian Slush & Bitterness Of The Years That Are Lost & Bitterness Of The Years That Are Lost & 09:10.49\\
+		\midrule
+		& & & Total: & 02:13:40\\
 		\bottomrule
 	\end{tabular}
 	\caption{Songs used in the experiments}