add excerpt for second dzhambazov paper

diff --git a/asr.tex b/asr.tex

index e4c126d..42af838 100644
--- a/asr.tex
+++ b/asr.tex
@@ -1,4 +1,16 @@
  %&asr
+\usepackage[nonumberlist,acronyms]{glossaries}
+\makeglossaries%
+\newacronym{HMM}{HMM}{Hidden Markov Model}
+\newacronym{GMM}{GMM}{Gaussian Mixture Model}
+\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
+\newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
+\newacronym{FA}{FA}{Forced alignment}
+\newacronym{MFC}{MFC}{Mel-frequency cepstrum}
+\newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
+%\newglossaryentry{mTask}{name=mTask,
+%      description={is an abstraction for \glspl{Task} living on \acrshort{IoT} devices}}
+
  \begin{document}
  %Titlepage
  \maketitleru[
@@ -7,31 +19,38 @@
         authorstext={Author:}]
  \listoftodos[Todo]
  
-t\cite{muller_multimodal_2012}
+\tableofcontents
  
-t\cite{pedone_phoneme-level_2011}
+%Glossaries
+\glsaddall{}
+\printglossaries%
  
-t\cite{fujihara_automatic_2006}
+Berenzweig and Ellis use acoustic classifiers from speech recognition as a
+detector for singing lines.  They achieve 80\% accuracy on forty 15-second
+excerpts. They mention earlier work on signal features that discriminate
+between speech and music. Neural net
+\glspl{HMM}~\cite{berenzweig_locating_2001}.
  
-t\cite{mesaros_adaptation_2009}
-
-t\cite{mesaros_automatic_2010}
-
-t\cite{dzhambazov_automatic_2016}
-
-t\cite{mesaros_automatic_2008}
-
-t\cite{berenzweig_locating_2001}
-
-t\cite{dzhambazov_automatic_2014}
-
-t\cite{fujihara_three_2008}
-
-t\cite{yang_machine_2012}
+In 2014 Dzhambazov et al.\ applied state-of-the-art segmentation methods to
+polyphonic Turkish music; this might be interesting to use for heavy metal.
+They mention that Fujihara (2011) has a similar \gls{FA} system. This method
+uses phone-level segmentation and the first 12 \glspl{MFCC}. They first do
+vocal/non-vocal detection, then melody extraction, then alignment. They compare
+results with Mesaros \& Virtanen, 2008~\cite{dzhambazov_automatic_2014}. Later they
+specialize in long syllables in a cappella singing. They use \glspl{DHMM} with
+\glspl{GMM} and show that adding knowledge improves alignment accuracy (Beijing
+opera has long syllables)~\cite{dzhambazov_automatic_2016}.
  
+t\cite{fujihara_automatic_2006}
  t\cite{fujihara_lyricsynchronizer:_2011}
-
+t\cite{fujihara_three_2008}
  t\cite{mauch_integrating_2012}
+t\cite{mesaros_adaptation_2009}
+t\cite{mesaros_automatic_2008}
+t\cite{mesaros_automatic_2010}
+t\cite{muller_multimodal_2012}
+t\cite{pedone_phoneme-level_2011}
+t\cite{yang_machine_2012}
  
  \bibliographystyle{ieeetr}
  \bibliography{asr}