%&experiment
\begin{document}
\maketitle

\section{Setup}
A minimal framework for running the experiments has been set up and is
operational.

As of now, one full album by the death metal band \emph{Cannibal Corpse} has
been annotated. Figure~\ref{fig:bloodstained} shows a segment of the song
\emph{Bloodstained Cement}. The spectrogram clearly shows that during
growling the region around $100$\,Hz has increased intensity.

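The observation above can be checked numerically by comparing the fraction of
spectral energy in a low-frequency band against the rest of the spectrum. The
following is a minimal sketch, not part of the actual pipeline; the
\texttt{band\_energy} helper, the $80$--$120$\,Hz band limits, and the
synthetic test signals are assumptions for illustration.

```python
import numpy as np

def band_energy(signal, rate, lo, hi):
    """Fraction of total spectral energy inside [lo, hi] Hz.

    Hypothetical helper: computes a single FFT over the whole signal;
    the real analysis would use a framed spectrogram instead.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    band = spectrum[(freqs >= lo) & (freqs <= hi)].sum()
    return band / spectrum.sum()

rate = 44100
t = np.arange(rate) / rate                 # one second of samples
growl = np.sin(2 * np.pi * 100 * t)        # synthetic energy at 100 Hz
clean = np.sin(2 * np.pi * 440 * t)        # synthetic energy at 440 Hz
```

On these synthetic signals, nearly all energy of \texttt{growl} falls inside
the $80$--$120$\,Hz band, while \texttt{clean} contributes almost nothing
there.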
\begin{itemize}
    \item SoX\footnote{\url{https://sox.sourceforge.net}} is used to convert
        the stereo CD audio to mono $44.1$\,kHz waveforms.
    \item Using \texttt{python\_speech\_features}\footnote{\url{%
        https://github.com/jameslyons/python_speech_features}},
        the waveforms are converted to $13$ MFCC coefficients with the
        default $25$\,ms window every $10$\,ms.
    \item The features are matched with the annotation files using
        \texttt{pympi}\footnote{\url{https://github.com/dopefishh/pympi}}.
    \item The framework Keras\footnote{\url{https://keras.io}} is used to
        train the models and classify the data.
\end{itemize}
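The matching step pairs every $10$\,ms MFCC frame with the annotation
interval it falls in. A minimal sketch of that idea, independent of the
actual \texttt{pympi} data structures; the \texttt{label\_frames} function,
the flat \texttt{(start, end, tier)} tuples, and the frame-onset rule are
assumptions for illustration.

```python
def label_frames(n_frames, step, annotations):
    """Assign each fixed-step frame a binary growl label.

    annotations: list of (start, end, tier) tuples in seconds, as one
    might extract from an annotation tier. A frame is labelled 1 when
    its onset time falls inside a "growl" interval.
    """
    labels = []
    for i in range(n_frames):
        t = i * step  # frame onset time in seconds
        label = 0
        for start, end, tier in annotations:
            if tier == "growl" and start <= t < end:
                label = 1
                break
        labels.append(label)
    return labels

# Example: one annotated growl from 0.5 s to 1.2 s, 2 s of audio
ann = [(0.5, 1.2, "growl")]
y = label_frames(200, 0.010, ann)
```

With $10$\,ms frames, the $0.7$\,s growl interval covers frames $50$
through $119$, i.e.\ $70$ positive labels.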

\section{Preliminary results}
The simplest models, with only one hidden layer, already score around $85\%$
accuracy. In the coming weeks more data from different bands will be
annotated to assess the robustness of the models. Moreover, smoothing needs
to be applied because the predictions are very noisy, probably due to short
pauses within growling. This can easily be smoothed out by disallowing
extremely short growling segments.
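The proposed smoothing, dropping growl runs shorter than some minimum length,
can be sketched as a simple post-processing pass over the frame-wise
predictions. The \texttt{smooth} function and the minimum-length threshold
are assumptions; in practice the threshold would be tuned on held-out data.

```python
def smooth(pred, min_len):
    """Zero out runs of 1s shorter than min_len frames.

    pred: list of frame-wise 0/1 growl predictions (one per 10 ms frame).
    Returns a copy in which every growl segment shorter than min_len
    frames is removed.
    """
    out = pred[:]
    i = 0
    while i < len(out):
        if out[i] == 1:
            j = i
            while j < len(out) and out[j] == 1:
                j += 1          # scan to the end of this growl run
            if j - i < min_len:
                out[i:j] = [0] * (j - i)  # drop the too-short run
            i = j
        else:
            i += 1
    return out

# Example: a spurious 2-frame blip next to a genuine 4-frame segment
noisy = [0, 1, 1, 0, 1, 1, 1, 1, 0]
```

With \texttt{min\_len = 3}, the two-frame blip is removed while the
four-frame segment survives. The same pass could be run on the inverted
predictions to fill short non-growl gaps instead, which may suit the
pause-induced noise better.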

\begin{figure}[ht]
    \centering
    \includegraphics[width=.7\linewidth]{cement}
    \caption{A vocal segment of the \emph{Cannibal Corpse} song
        \emph{Bloodstained Cement}}\label{fig:bloodstained}
\end{figure}
\end{document}