X-Git-Url: https://git.martlubbers.net/?a=blobdiff_plain;f=asr.tex;h=9cdbaf5582067cf91e3a6b83b320d68e1fbf6ded;hb=5945b2bce63d92454882cb7c66fb1c8d87c3a271;hp=4899bd135279fbcb2704186ea0697f17316d707d;hpb=a70106a1b0f0d504f3fd311f237673ace5751b5c;p=asr1617.git

diff --git a/asr.tex b/asr.tex
index 4899bd1..9cdbaf5 100644
--- a/asr.tex
+++ b/asr.tex
@@ -1,5 +1,5 @@
 %&asr
-\usepackage[nonumberlist,acronyms]{glossaries}
+\usepackage[toc,nonumberlist,acronyms]{glossaries}
 \makeglossaries%
 \newacronym{ANN}{ANN}{Artificial Neural Network}
 \newacronym{HMM}{HMM}{Hidden Markov Model}
@@ -9,10 +9,27 @@
 \newacronym{FA}{FA}{Forced alignment}
 \newacronym{MFC}{MFC}{Mel-frequency cepstrum}
 \newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
+\newacronym{PPF}{PPF}{Posterior Probability Features}
+\newacronym{MLP}{MLP}{Multi-layer Perceptron}
+\newacronym{PLP}{PLP}{Perceptual Linear Prediction}
+\newacronym{ZCR}{ZCR}{Zero-crossing Rate}
+\newacronym{LPC}{LPC}{Linear Prediction Coefficients}
+\newacronym{LPCC}{LPCC}{\acrlong{LPC} derived cepstrum}
 \newacronym{IFPI}{IFPI}{International Federation of the Phonographic Industry}
 \newglossaryentry{dm}{name={Death Metal},
 	description={is an extreme heavy metal music style with growling vocals and
 	pounding drums}}
+\newglossaryentry{dom}{name={Doom Metal},
+	description={is an extreme heavy metal music style with growling vocals and
+	pounding drums played very slowly}}
+\newglossaryentry{FT}{name={Fourier Transform},
+	description={is a technique for converting a signal from its time
+	representation to its frequency representation}}
+\newglossaryentry{MS}{name={Mel-Scale},
+	description={is a scale for spectral signals inspired by the human ear}}
+\newglossaryentry{Viterbi}{name={Viterbi},
+	description={is a dynamic programming algorithm for finding the most likely
+	sequence of hidden states in a \gls{HMM}}}
 
 \begin{document}
 \frontmatter{}
@@ -26,10 +43,6 @@
 
 \tableofcontents
 
-%Glossaries
-%\glsaddall{}
-%\printglossaries
-
 \mainmatter{}
 %Berenzweig and Ellis use acoustic classifiers from speech recognition as a
 %detector for singing lines.  They achive 80\% accuracy for forty 15 second
@@ -51,185 +64,26 @@
 
 %Introduction, leading to a clearly defined research question
 \chapter{Introduction}
-\section{Introduction}
-The \gls{IFPI} stated that about $43\%$ of music revenue rises from digital
-distribution. The overtake on physical formats took place somewhere in 2015 and
-since twenty years the music industry has seen significant
-growth~\footnote{\url{http://www.ifpi.org/facts-and-stats.php}}.
-
-A lot of this musical distribution goes via non-official channels such as
-YouTube~\footnote{\url{https://youtube.com}} in which fans of the musical group
-accompany the music with synchronized lyrics so that users can sing or read
-along. Because of this interest it is very useful to device automatic
-techniques for segmenting instrumental and vocal parts of a song and
-apply forced alignment or even lyrics recognition on the audio file.
-
-
-%A majority of the music is not only instrumental but also contains vocal
-%segments.
-%
-%Music is a leading type of data distributed on the internet. Regular music
-%distribution is almost entirely digital and services like Spotify and YouTube
-%allow one to listen to almost any song within a few clicks. Moreover, there are
-%myriads of websites offering lyrics of songs.
-%
-%\todo{explain relevancy, (preprocessing for lyric alignment)}
-%
-%This leads to the following research question:
-%\begin{center}\em%
-%	Are standard \gls{ANN} based techniques for singing voice detection
-%	suitable for non-standard musical genres like Death metal.
-%\end{center}
-
-%Literature overview / related work
-\section{Related work}
-The field of applying standard speech processing techniques on music started in
-the late 90s~\cite{saunders_real-time_1996,scheirer_construction_1997} and it
-was found that music has different discriminating features compared to normal
-speech.
-
-Berenzweig and Ellis expanded on the aforementioned research by trying to
-separate singing from instrumental music\cite{berenzweig_locating_2001}.
-
-\todo{Incorporate this in literary framing}
-~\cite{fujihara_automatic_2006}
-~\cite{fujihara_lyricsynchronizer:_2011}
-~\cite{fujihara_three_2008}
-~\cite{mauch_integrating_2012}
-~\cite{mesaros_adaptation_2009}
-~\cite{mesaros_automatic_2008}
-~\cite{mesaros_automatic_2010}
-~%\cite{muller_multimodal_2012}
-~\cite{pedone_phoneme-level_2011}
-~\cite{yang_machine_2012}
-
-\section{Research question}
-This leads to the following research question:
-\begin{center}\em%
-	Are standard \gls{ANN} based techniques for singing voice detection
-	suitable for non-standard musical genres like Death metal.
-\end{center}
+\input{intro.tex}
 
 \chapter{Methods}
-%Methodology
-
-%Experiment(s) (set-up, data, results, discussion)
-\section{Data \& Preprocessing}
-To run the experiments we have collected data from several \gls{dm} albums. The
-exact data used is available in Appendix~\ref{app:data}. The albums are
-extracted from the audio CD and converted to a mono channel waveform with the
-correct samplerate \emph{SoX}~\footnote{\url{http://sox.sourceforge.net/}}.
-When the waveforms are finished they are converted to \glspl{MFCC} vectors
-using the \emph{python\_speech\_features}%
-~\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
-All these steps combined results in thirteen tab separated features per line in
-a file for every source file. Every file is annotated using
-Praat~\cite{boersma_praat_2002} where the utterances are manually
-aligned to the audio. An example of an utterances are shown in
-Figures~\ref{fig:bloodstained,fig:abominations}. It is clearly visible that
-within the genre of death metal there are a lot of different spectral patterns
-visible.
-
-\begin{figure}[ht]
-	\centering
-	\includegraphics[width=.7\linewidth]{cement}
-	\caption{A vocal segment of the \emph{Cannibal Corpse} song
-		\emph{Bloodstained Cement}}\label{fig:bloodstained}
-\end{figure}
-
-\begin{figure}[ht]
-	\centering
-	\includegraphics[width=.7\linewidth]{abominations}
-	\caption{A vocal segment of the \emph{Disgorge} song
-		\emph{Enthroned Abominations}}\label{fig:abominations}
-\end{figure}
-
-The data is collected from two\todo{more in the future}\ studio albums. The first
-band is called \emph{Cannibal Corpse} and has been producing \gls{dm} for almost
-25 years and have been creating the same type every album. The singer of
-\emph{Cannibal Corpse} has a very raspy growls and the lyrics are quite
-comprehensible. The second band is called \emph{Disgorge} and make even more
-violent music. The growls of the lead singer sound more like a coffee grinder
-and are more shallow. The lyrics are completely incomprehensible and therefore
-some parts are not annotated with lyrics because it was too difficult to hear
-what was being sung.
-
-\section{Methods}
-\todo{To remove in final thesis}
-The initial planning is still up to date. About one and a half album has been
-annotated and a framework for setting up experiments has been created.
-Moreover, the first exploratory experiments are already been executed and
-promising. In April the experimental dataset will be expanded and I will try to
-mimic some of the experiments done in the literature to see whether it performs
-similar on Death Metal
-\begin{table}[ht]
-	\centering
-	\begin{tabular}{cll}
-		\toprule
-		Month & Description\\
-		\midrule
-		March
-			& Preparing the data\\
-			& Preparing an experiment platform\\
-			& Literature research\\
-		April
-			& Running the experiments\\
-			& Fiddle with parameters\\
-			& Explore the possibilities for forced alignment\\
-		May
-			& Write up the thesis\\
-			& Possibly do forced alignment\\
-		June
-			& Finish up thesis\\
-			& Wrap up\\
-		\bottomrule
-	\end{tabular}
-	\caption{Outline}
-\end{table}
-
-\section{Results}
-
+\input{methods.tex}
 
 \chapter{Conclusion \& Discussion}
-%Discussion section
-%Conclusion section
-%Acknowledgements
-%Statement on authors' contributions
+\input{conclusion.tex}
+
 %(Appendices)
 \appendix
-\chapter{Experimental data}\label{app:data}
-\begin{table}[h]
-	\centering
-	\begin{tabular}{cllll}
-		\toprule
-		Num. & Artist & Album & Song & Duration\\
-		\midrule
-		00 & Cannibal Corpse & A Skeletal Domain & High Velocity Impact Spatter & 04:06.91\\
-		01 & Cannibal Corpse & A Skeletal Domain & Sadistic Embodiment & 03:17.31\\
-		02 & Cannibal Corpse & A Skeletal Domain & Kill or Become & 03:50.67\\
-		03 & Cannibal Corpse & A Skeletal Domain & A Skeletal Domain & 03:38.77\\
-		04 & Cannibal Corpse & A Skeletal Domain & Headlong Into Carnage & 03:01.25\\
-		05 & Cannibal Corpse & A Skeletal Domain & The Murderer's Pact & 05:05.23\\
-		06 & Cannibal Corpse & A Skeletal Domain & Funeral Cremation & 03:41.89\\
-		07 & Cannibal Corpse & A Skeletal Domain & Icepick Lobotomy & 03:16.24\\
-		08 & Cannibal Corpse & A Skeletal Domain & Vector of Cruelty & 03:25.15\\
-		09 & Cannibal Corpse & A Skeletal Domain & Bloodstained Cement & 03:41.99\\
-		10 & Cannibal Corpse & A Skeletal Domain & Asphyxiate to Resuscitate & 03:47.40\\
-		11 & Cannibal Corpse & A Skeletal Domain & Hollowed Bodies & 03:05.80\\
-		12 & Disgorge & Parallels of Infinite Torture & Revealed in Obscurity & 05:13.20\\
-		13 & Disgorge & Parallels of Infinite Torture & Enthroned Abominations & 04:05.39\\
-		14 & Disgorge & Parallels of Infinite Torture & Atonement & 02:57.36\\
-		15 & Disgorge & Parallels of Infinite Torture & Abhorrent Desecration of Thee Iniquity & 04:17.20\\
-		16 & Disgorge & Parallels of Infinite Torture & Forgotten Scriptures & 02:01.72\\
-		17 & Disgorge & Parallels of Infinite Torture & Descending Upon Convulsive Devourment & 04:38.85\\
-		18 & Disgorge & Parallels of Infinite Torture & Condemned to Sufferance & 04:57.59\\
-		19 & Disgorge & Parallels of Infinite Torture & Parallels of Infinite Torture & 05:03.33\\
-		20 & Disgorge & Parallels of Infinite Torture & Asphyxiation of Thee Oppressed & 05:42.37\\
-		21 & Disgorge & Parallels of Infinite Torture & Ominous Sigils of Ungodly Ruin & 04:59.15\\
-		\bottomrule
-	\end{tabular}
-	\caption{Songs used in the experiments}
-\end{table}
+\input{appendices.tex}
+
+\newpage
+%Glossaries
+\glsaddall{}
+\begingroup
+\let\clearpage\relax
+\let\cleardoublepage\relax
+\printglossaries{}
+\endgroup
 
 \bibliographystyle{ieeetr}
 \bibliography{asr}