X-Git-Url: https://git.martlubbers.net/?a=blobdiff_plain;f=asr.tex;h=9cdbaf5582067cf91e3a6b83b320d68e1fbf6ded;hb=5945b2bce63d92454882cb7c66fb1c8d87c3a271;hp=4899bd135279fbcb2704186ea0697f17316d707d;hpb=a70106a1b0f0d504f3fd311f237673ace5751b5c;p=asr1617.git

diff --git a/asr.tex b/asr.tex
index 4899bd1..9cdbaf5 100644
--- a/asr.tex
+++ b/asr.tex
@@ -1,5 +1,5 @@
 %&asr
-\usepackage[nonumberlist,acronyms]{glossaries}
+\usepackage[toc,nonumberlist,acronyms]{glossaries}
 \makeglossaries%
 \newacronym{ANN}{ANN}{Artificial Neural Network}
 \newacronym{HMM}{HMM}{Hidden Markov Model}
@@ -9,10 +9,27 @@
 \newacronym{FA}{FA}{Forced alignment}
 \newacronym{MFC}{MFC}{Mel-frequency cepstrum}
 \newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
+\newacronym{PPF}{PPF}{Posterior Probability Features}
+\newacronym{MLP}{MLP}{Multi-layer Perceptron}
+\newacronym{PLP}{PLP}{Perceptual Linear Prediction}
+\newacronym{ZCR}{ZCR}{Zero-crossing Rate}
+\newacronym{LPC}{LPC}{Linear Prediction Coefficients}
+\newacronym{LPCC}{LPCC}{\acrlong{LPC} derivec cepstrum}
 \newacronym{IFPI}{IFPI}{International Federation of the Phonographic Industry}
 \newglossaryentry{dm}{name={Death Metal},
  description={is an extreme heavy metal music style with growling vocals and
  pounding drums}}
+\newglossaryentry{dom}{name={Doom Metal},
+ description={is an extreme heavy metal music style with growling vocals and
+ pounding drums played very slowly}}
+\newglossaryentry{FT}{name={Fourier Transform},
+ description={is a technique of converting a time representation signal to a
+ frequency representation}}
+\newglossaryentry{MS}{name={Mel-Scale},
+ description={is a human ear inspired scale for spectral signals.}}
+\newglossaryentry{Viterbi}{name={Viterbi},
+ description={is a dynamic programming algorithm for finding the most likely
+ sequence of hidden states in a \gls{HMM}}}
 \begin{document}
 \frontmatter{}
@@ -26,10 +43,6 @@
 \tableofcontents
-%Glossaries
-%\glsaddall{}
-%\printglossaries
-
 \mainmatter{}
 %Berenzweig and Ellis use acoustic classifiers from speech recognition as a
 %detector for singing lines. They achive 80\% accuracy for forty 15 second
@@ -51,185 +64,26 @@
 %Introduction, leading to a clearly defined research question
 \chapter{Introduction}
-\section{Introduction}
-The \gls{IFPI} stated that about $43\%$ of music revenue rises from digital
-distribution. The overtake on physical formats took place somewhere in 2015 and
-since twenty years the music industry has seen significant
-growth~\footnote{\url{http://www.ifpi.org/facts-and-stats.php}}.
-
-A lot of this musical distribution goes via non-official channels such as
-YouTube~\footnote{\url{https://youtube.com}} in which fans of the musical group
-accompany the music with synchronized lyrics so that users can sing or read
-along. Because of this interest it is very useful to device automatic
-techniques for segmenting instrumental and vocal parts of a song and
-apply forced alignment or even lyrics recognition on the audio file.
-
-
-%A majority of the music is not only instrumental but also contains vocal
-%segments.
-%
-%Music is a leading type of data distributed on the internet. Regular music
-%distribution is almost entirely digital and services like Spotify and YouTube
-%allow one to listen to almost any song within a few clicks. Moreover, there are
-%myriads of websites offering lyrics of songs.
-%
-%\todo{explain relevancy, (preprocessing for lyric alignment)}
-%
-%This leads to the following research question:
-%\begin{center}\em%
-% Are standard \gls{ANN} based techniques for singing voice detection
-% suitable for non-standard musical genres like Death metal.
-%\end{center}
-
-%Literature overview / related work
-\section{Related work}
-The field of applying standard speech processing techniques on music started in
-the late 90s~\cite{saunders_real-time_1996,scheirer_construction_1997} and it
-was found that music has different discriminating features compared to normal
-speech.
-
-Berenzweig and Ellis expanded on the aforementioned research by trying to
-separate singing from instrumental music\cite{berenzweig_locating_2001}.
-
-\todo{Incorporate this in literary framing}
-~\cite{fujihara_automatic_2006}
-~\cite{fujihara_lyricsynchronizer:_2011}
-~\cite{fujihara_three_2008}
-~\cite{mauch_integrating_2012}
-~\cite{mesaros_adaptation_2009}
-~\cite{mesaros_automatic_2008}
-~\cite{mesaros_automatic_2010}
-~%\cite{muller_multimodal_2012}
-~\cite{pedone_phoneme-level_2011}
-~\cite{yang_machine_2012}
-
-\section{Research question}
-This leads to the following research question:
-\begin{center}\em%
- Are standard \gls{ANN} based techniques for singing voice detection
- suitable for non-standard musical genres like Death metal.
-\end{center}
+\input{intro.tex}
 \chapter{Methods}
-%Methodology
-
-%Experiment(s) (set-up, data, results, discussion)
-\section{Data \& Preprocessing}
-To run the experiments we have collected data from several \gls{dm} albums. The
-exact data used is available in Appendix~\ref{app:data}. The albums are
-extracted from the audio CD and converted to a mono channel waveform with the
-correct samplerate \emph{SoX}~\footnote{\url{http://sox.sourceforge.net/}}.
-When the waveforms are finished they are converted to \glspl{MFCC} vectors
-using the \emph{python\_speech\_features}%
-~\footnote{\url{https://github.com/jameslyons/python_speech_features}} package.
-All these steps combined results in thirteen tab separated features per line in
-a file for every source file. Every file is annotated using
-Praat~\cite{boersma_praat_2002} where the utterances are manually
-aligned to the audio. An example of an utterances are shown in
-Figures~\ref{fig:bloodstained,fig:abominations}. It is clearly visible that
-within the genre of death metal there are a lot of different spectral patterns
-visible.
-
-\begin{figure}[ht]
- \centering
- \includegraphics[width=.7\linewidth]{cement}
- \caption{A vocal segment of the \emph{Cannibal Corpse} song
- \emph{Bloodstained Cement}}\label{fig:bloodstained}
-\end{figure}
-
-\begin{figure}[ht]
- \centering
- \includegraphics[width=.7\linewidth]{abominations}
- \caption{A vocal segment of the \emph{Disgorge} song
- \emph{Enthroned Abominations}}\label{fig:abominations}
-\end{figure}
-
-The data is collected from two\todo{more in the future}\ studio albums. The first
-band is called \emph{Cannibal Corpse} and has been producing \gls{dm} for almost
-25 years and have been creating the same type every album. The singer of
-\emph{Cannibal Corpse} has a very raspy growls and the lyrics are quite
-comprehensible. The second band is called \emph{Disgorge} and make even more
-violent music. The growls of the lead singer sound more like a coffee grinder
-and are more shallow. The lyrics are completely incomprehensible and therefore
-some parts are not annotated with lyrics because it was too difficult to hear
-what was being sung.
-
-\section{Methods}
-\todo{To remove in final thesis}
-The initial planning is still up to date. About one and a half album has been
-annotated and a framework for setting up experiments has been created.
-Moreover, the first exploratory experiments are already been executed and
-promising. In April the experimental dataset will be expanded and I will try to
-mimic some of the experiments done in the literature to see whether it performs
-similar on Death Metal
-\begin{table}[ht]
- \centering
- \begin{tabular}{cll}
- \toprule
- Month & Description\\
- \midrule
- March
- & Preparing the data\\
- & Preparing an experiment platform\\
- & Literature research\\
- April
- & Running the experiments\\
- & Fiddle with parameters\\
- & Explore the possibilities for forced alignment\\
- May
- & Write up the thesis\\
- & Possibly do forced alignment\\
- June
- & Finish up thesis\\
- & Wrap up\\
- \bottomrule
- \end{tabular}
- \caption{Outline}
-\end{table}
-
-\section{Results}
-
+\input{methods.tex}
 \chapter{Conclusion \& Discussion}
-%Discussion section
-%Conclusion section
-%Acknowledgements
-%Statement on authors' contributions
+\input{conclusion.tex}
+
 %(Appendices)
 \appendix
-\chapter{Experimental data}\label{app:data}
-\begin{table}[h]
- \centering
- \begin{tabular}{cllll}
- \toprule
- Num. & Artist & Album & Song & Duration\\
- \midrule
- 00 & Cannibal Corpse & A Skeletal Domain & High Velocity Impact Spatter & 04:06.91\\
- 01 & Cannibal Corpse & A Skeletal Domain & Sadistic Embodiment & 03:17.31\\
- 02 & Cannibal Corpse & A Skeletal Domain & Kill or Become & 03:50.67\\
- 03 & Cannibal Corpse & A Skeletal Domain & A Skeletal Domain & 03:38.77\\
- 04 & Cannibal Corpse & A Skeletal Domain & Headlong Into Carnage & 03:01.25\\
- 05 & Cannibal Corpse & A Skeletal Domain & The Murderer's Pact & 05:05.23\\
- 06 & Cannibal Corpse & A Skeletal Domain & Funeral Cremation & 03:41.89\\
- 07 & Cannibal Corpse & A Skeletal Domain & Icepick Lobotomy & 03:16.24\\
- 08 & Cannibal Corpse & A Skeletal Domain & Vector of Cruelty & 03:25.15\\
- 09 & Cannibal Corpse & A Skeletal Domain & Bloodstained Cement & 03:41.99\\
- 10 & Cannibal Corpse & A Skeletal Domain & Asphyxiate to Resuscitate & 03:47.40\\
- 11 & Cannibal Corpse & A Skeletal Domain & Hollowed Bodies & 03:05.80\\
- 12 & Disgorge & Parallels of Infinite Torture & Revealed in Obscurity & 05:13.20\\
- 13 & Disgorge & Parallels of Infinite Torture & Enthroned Abominations & 04:05.39\\
- 14 & Disgorge & Parallels of Infinite Torture & Atonement & 02:57.36\\
- 15 & Disgorge & Parallels of Infinite Torture & Abhorrent Desecration of Thee Iniquity & 04:17.20\\
- 16 & Disgorge & Parallels of Infinite Torture & Forgotten Scriptures & 02:01.72\\
- 17 & Disgorge & Parallels of Infinite Torture & Descending Upon Convulsive Devourment & 04:38.85\\
- 18 & Disgorge & Parallels of Infinite Torture & Condemned to Sufferance & 04:57.59\\
- 19 & Disgorge & Parallels of Infinite Torture & Parallels of Infinite Torture & 05:03.33\\
- 20 & Disgorge & Parallels of Infinite Torture & Asphyxiation of Thee Oppressed & 05:42.37\\
- 21 & Disgorge & Parallels of Infinite Torture & Ominous Sigils of Ungodly Ruin & 04:59.15\\
- \bottomrule
- \end{tabular}
- \caption{Songs used in the experiments}
-\end{table}
+\input{appendices.tex}
+
+\newpage
+%Glossaries
+\glsaddall{}
+\begingroup
+\let\clearpage\relax
+\let\cleardoublepage\relax
+\printglossaries{}
+\endgroup
 \bibliographystyle{ieeetr}
 \bibliography{asr}