From: Mart Lubbers
Date: Wed, 7 Jun 2017 11:44:19 +0000 (+0200)
Subject: process comments for 2.1A
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=47b70522dd357550c5894a14fd95e52c3411a32f;p=asr1617.git

process comments for 2.1A
---

diff --git a/methods.tex b/methods.tex
index 96ee578..74c60af 100644
--- a/methods.tex
+++ b/methods.tex
@@ -2,12 +2,13 @@
 %Experiment(s) (set-up, data, results, discussion)
 \section{Data \& Preprocessing}
-To run the experiments data has been collected from several \gls{dm} albums.
-The exact data used is available in Appendix~\ref{app:data}. The albums are
-extracted from the audio CD and converted to a mono channel waveform with the
-correct samplerate utilizing \emph{SoX}%
-\footnote{\url{http://sox.sourceforge.net/}}. Every file is annotated using
-Praat\cite{boersma_praat_2002} where the utterances are manually aligned to the
+Several experiments have been performed to gain insight into the research
+question. To run the experiments, data has been collected from several \gls{dm}
+albums. The exact data used is available in Appendix~\ref{app:data}. The
+albums are extracted from the audio CD and converted to a mono channel waveform
+with the correct sample rate utilizing \emph{SoX}%
+\footnote{\url{http://sox.sourceforge.net/}}. Every file is annotated using
+Praat~\cite{boersma_praat_2002} where the lyrics are manually aligned to the
 audio. Examples of utterances are shown in Figure~\ref{fig:bloodstained} and
 Figure~\ref{fig:abominations} where the waveform, $1-8000$Hz spectrals and
 annotations are shown. It is clearly visible that within the genre of death
@@ -29,31 +30,33 @@ metal there are different spectral patterns visible over time.
 
 The data is collected from three studio albums. The first band is called
 \gls{CC} and has been producing \gls{dm} for almost 25 years and has been
-creating album with a consistent style. The singer of \gls{CC} has a very raspy
-growl and the lyrics are quite comprehensible. The vocals produced by \gls{CC}
-border regular shouting.
+creating albums with a consistent style. The singer of \gls{CC} has a very
+raspy growl and the lyrics are quite comprehensible. The vocals produced by
+\gls{CC} are very close to regular shouting.
 
 The second band is called \gls{DG} and makes even more violently
 sounding music. The growls of the lead singer sound like a coffee grinder and
-are more shallow. In the spectrals it is clearly visible that there are
+sound less full. In the spectrals it is clearly visible that there are
 overtones produced during some parts of the growling. The lyrics are completely
 incomprehensible and therefore some parts were not annotated with the actual
 lyrics because it was impossible to hear what was being sung.
 
-Lastly a band from Moscow is chosen bearing the name \gls{WDISS}. This band is
-a little odd compared to the previous \gls{dm} bands because they create
-\gls{dom}. \gls{dom} is characterized by the very slow tempo and low tuned
-guitars. The vocalist has a very characteristic growl and performs in several
-Muscovite bands. This band also stands out because it uses piano's and
-synthesizers. The droning synthesizers often operate in the same frequency as
-the vocals.
+The third band --- originating from Moscow --- is chosen bearing the name
+\gls{WDISS}. This band is a little odd compared to the previous \gls{dm} bands
+because they create \gls{dom}. \gls{dom} is characterized by the very slow
+tempo and low-tuned guitars. The vocalist has a very characteristic growl and
+performs in several Muscovite bands. This band also stands out because it uses
+pianos and synthesizers. The droning synthesizers often operate in the same
+frequency as the vocals.
 
-The training and test data is divided as follows:
+The data is labeled as singing and instrumental and labeled per band. The
+distribution for this is shown in Table~\ref{tbl:distribution}. A random $10\%$
+of the data is extracted for a held-out test set.
 \begin{table}[H]
     \centering
     \begin{tabular}{lcc}
         \toprule
-        Singing & Instrumental\\
+        Instrumental & Singing\\
         \midrule
         0.59 & 0.41\\
         \bottomrule
@@ -66,15 +69,16 @@ The training and test data is divided as follows:
         0.59 & 0.16 & 0.19 & 0.06\\
         \bottomrule
     \end{tabular}
+    \caption{Data distribution}\label{tbl:distribution}
 \end{table}
 
 \section{\acrlong{MFCC} Features}
 The waveforms in itself are not very suitable to be used as features due to the
 high dimensionality and correlation. Therefore we use the often used
-\glspl{MFCC} feature vectors which have shown to be suitable%
+\glspl{MFCC} feature vectors which have been shown to be suitable~%
 \cite{rocamora_comparing_2007}. It has also been found that altering the mel
 scale to better suit singing does not yield a better
-performance\cite{you_comparative_2015}. The actual conversion is done using the
+performance~\cite{you_comparative_2015}. The actual conversion is done using the
 \emph{python\_speech\_features}%
 \footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
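
For readers unfamiliar with the mel scale mentioned in the MFCC paragraph, the following is a minimal sketch of the commonly used Hz-to-mel mapping and of how filter centre frequencies can be spaced evenly on it. It is an illustration only, not the code used in this project: the function names are hypothetical, and the filter count of 26 and the 0 to 8000 Hz range are assumptions chosen for the example.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels using the common 2595*log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: convert a value in mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, low_hz, high_hz):
    """Return the centre frequencies (Hz) of n_filters filters spaced evenly in mel.

    The range [low_hz, high_hz] is divided into n_filters + 1 equal steps on the
    mel scale; the interior points are the filter centres.
    """
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (n_filters + 1)
    return [mel_to_hz(low_mel + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Assumed values for illustration: 26 filters between 0 Hz and 8000 Hz.
    centers = mel_center_frequencies(n_filters=26, low_hz=0.0, high_hz=8000.0)
    for index, freq in enumerate(centers):
        print(f"filter {index:2d}: {freq:8.1f} Hz")
```

Because the spacing is uniform in mel rather than in Hz, the centres lie close together at low frequencies and spread out towards the top of the range. That distribution is what an altered mel scale would change.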