From: Mart Lubbers
Date: Thu, 13 Apr 2017 08:43:57 +0000 (+0200)
Subject: brush up intro
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=12f24814de4a22aa484bc1e845aa078881e6e76c;p=asr1617.git

brush up intro
---

diff --git a/asr.tex b/asr.tex
index 2b812d1..2c557de 100644
--- a/asr.tex
+++ b/asr.tex
@@ -52,27 +52,40 @@
 %Introduction, leading to a clearly defined research question
 \chapter{Introduction}
 \section{Introduction}
-The \gls{IFPI} stated that about $43\%$ of music revenue rises from digital
-distribution. The overtake on physical formats took place somewhere in 2015 and
-since twenty years the music industry has seen significant
-growth~\footnote{\url{http://www.ifpi.org/facts-and-stats.php}}.
+The primary medium for music distribution is rapidly changing from physical
+media to digital media. The \gls{IFPI} stated that about $43\%$ of music
+revenue rises from digital distribution. Another $39\%$ arises from the
+physical sale and the remaining $16\%$ is made through performance and
+synchronisation revenieus. The overtake of digital formats on physical formats
+took place somewhere in 2015. Moreover, ever since twenty years the music
+industry has seen significant growth
+again~\footnote{\url{http://www.ifpi.org/facts-and-stats.php}}.
+
+There has always been an interest in lyrics to music alignment to be used in
+for example karaoke. As early as in the late 1980s karaoke machines were
+available for consumers. While the lyrics for the track are almost always
+available, a alignment is not and it involves manual labour to create such an
+alignment.
 
 A lot of this musical distribution goes via non-official channels such as
-YouTube~\footnote{\url{https://youtube.com}} in which fans of the musical group
-accompany the music with synchronized lyrics so that users can sing or read
-along. Because of this interest it is very useful to device automatic
-techniques for segmenting instrumental and vocal parts of a song and
-apply forced alignment or even lyrics recognition on the audio file.
+YouTube~\footnote{\url{https://youtube.com}} in which fans of the performers
+often accompany the music with synchronized lyrics. This means that there is an
+enormous treasure of lyrics-annotated music available but not within our reach
+since the subtitles are almost always hardcoded into the video stream and thus
+not directly usable as data. Because of this interest it is very useful to
+device automatic techniques for segmenting instrumental and vocal parts of a
+song, apply forced alignment or even lyrics recognition on the audio file.
 
 Such techniques are heavily researched and working systems have been created.
-However, these techniques are designed to detect a clean singing voice. Extreme
-genres such as \gls{dm} are using more extreme vocal techniques such as
-grunting or growling. It must be noted that grunting is not a technique only
-used in extreme metal styles. Similar or equal techniques have been used in
-\emph{Beijing opera}, Japanese \emph{Noh} and but also more western styles like
-jazz singing by Louis Armstrong~\cite{sakakibara_growl_2004}. It might even be
-traced back to viking times. An arab merchant wrote in the tenth
-century~\cite{friis_vikings_2004}:
+However, these techniques are designed to detect a clean singing voice and have
+not been testen on so-called \emph{extended vocal techniques} such as grunting
+or growling. Growling is heavily used in extreme metal genres such as \gls{dm}
+but it must be noted that grunting is not a technique only used in extreme
+metal styles. Similar or equal techniques have been used in \emph{Beijing
+opera}, Japanese \emph{Noh} and but also more western styles like jazz singing
+by Louis Armstrong~\cite{sakakibara_growl_2004}. It might even be traced back
+to viking times. For example, an arab merchant visiting a village in Denmark
+wrote in the tenth century~\cite{friis_vikings_2004}:
 
 \begin{displayquote}
 Never before I have heard uglier songs than those of the Vikings in
@@ -80,21 +93,7 @@ century~\cite{friis_vikings_2004}:
 howling, only more untamed.
 \end{displayquote}
 
-%A majority of the music is not only instrumental but also contains vocal
-%segments.
-%
-%Music is a leading type of data distributed on the internet. Regular music
-%distribution is almost entirely digital and services like Spotify and YouTube
-%allow one to listen to almost any song within a few clicks. Moreover, there are
-%myriads of websites offering lyrics of songs.
-%
-%\todo{explain relevancy, (preprocessing for lyric alignment)}
-%
-%This leads to the following research question:
-%\begin{center}\em%
-% Are standard \gls{ANN} based techniques for singing voice detection
-% suitable for non-standard musical genres like Death metal.
-%\end{center}
+\section{\gls{dm}}
 
 %Literature overview / related work
 \section{Related work}
@@ -148,9 +147,9 @@ using the \emph{python\_speech\_features}%
 All these steps combined results in thirteen tab separated features per line in
 a file for every source file. Every file is annotated using
 Praat~\cite{boersma_praat_2002} where the utterances are manually aligned to
-the audio. An example of an utterances are shown in
+the audio. Examples of utterances are shown in
 Figures~\ref{fig:bloodstained,fig:abominations}. It is clearly visible that
-within the genre of death metal there are a lot of different spectral patterns
+within the genre of death metal there are a different spectral patterns
 visible.
 
 \begin{figure}[ht]