From: Mart Lubbers Date: Tue, 21 Mar 2017 21:17:09 +0000 (+0100) Subject: add experiment X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=a99d3cf3048fef93b08756d748f3ebc93804c2e6;p=asr1617.git add experiment --- diff --git a/Makefile b/Makefile index ea2399a..f40e93c 100644 --- a/Makefile +++ b/Makefile @@ -1,4 +1,4 @@ -DOCS:=asr proposal +DOCS:=asr proposal experiment GREP?=grep LATEX?=pdflatex BIBTEX?=bibtex diff --git a/experiment.pre b/experiment.pre new file mode 100644 index 0000000..c25962c --- /dev/null +++ b/experiment.pre @@ -0,0 +1,23 @@ +\documentclass[a4paper]{article} + +\usepackage[british]{babel} + +\usepackage{geometry} % Papersize +\usepackage{hyperref} % Hyperlinks +\usepackage{graphicx} % Images +\graphicspath{{img/}} +\urlstyle{same} +\hypersetup{% + pdftitle={}, + pdfauthor={Mart Lubbers}, + pdfsubject={}, + pdfcreator={Mart Lubbers}, + pdfproducer={Mart Lubbers}, + pdfkeywords={}, + hidelinks=true +} + +\title{(Automatic) Speech Recognition\\{\large Experiment setup}} +\author{Mart Lubbers\\ + {\small\href{mailto:mart@martlubbers.net}{mart@martlubbers.net}}} +\date{\today} diff --git a/experiment.tex b/experiment.tex new file mode 100644 index 0000000..328c089 --- /dev/null +++ b/experiment.tex @@ -0,0 +1,41 @@ +%&experiment +\begin{document} +\maketitle + +\section{Setup} +At the moment a minimal framework for running experiments has been set-up and +is up and running. + +As of now a full album of the death metal band \emph{Cannibal Corpse} has been +annotated. Figure~\ref{fig:bloodstained} shows a segment of the song +\emph{Bloodstained Cement}. From the spectrals it is clearly visible that +during growling the regions around $100$Hz have an increased intensity. + +\begin{itemize} + \item Sox~\footnote{\url{https://sox.sourceforge.net}} is used to convert + the stereo CD audio to mono $44.1Khz$ waveforms + \item Using the \texttt{python\_speech\_features}~% + \footnote{\url{https://github.com/jameslyons/python_speech_features}} + the waveforms are converted to $13$ $MFCC$ cepstrals with the default + $25ms$ window every $10ms$. + \item The data is matched with the annotated files using + \texttt{pympi}~\footnote{\url{https://github.com/dopefishh/pympi}}. + \item The framework Keras~\footnote{\url{https://keras.io}} is used to + train models and classify the data +\end{itemize} + +\section{Preliminary results} +The simplest models with only one hidden layer already score around $85\%$ +accuracy. In the comings week more data will be annotated from different bands +to see the robustness of the models. Moreover, smoothing needs to be applied +because the predictions are very noisy. This is probably due to pauses in +growling. This can easily be smoothed out by not allowing extremely short +growling segments. + +\begin{figure}[h] + \centering + \includegraphics[width=.7\linewidth]{cement} + \caption{A vocal segment of the \emph{Cannibal Corpse} song + \emph{Bloodstained Cement}}\label{fig:bloodstained} +\end{figure} +\end{document} diff --git a/img/cement.png b/img/cement.png new file mode 100644 index 0000000..576e7df Binary files /dev/null and b/img/cement.png differ