From: Mart Lubbers
Date: Thu, 2 Mar 2017 20:32:39 +0000 (+0100)
Subject: add excerpt for second dzhambazov paper
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=844805e280d5d10d0e088dec8c938c00d941b753;p=asr1617.git

add excerpt for second dzhambazov paper
---

diff --git a/asr.tex b/asr.tex
index bfbcf86..42af838 100644
--- a/asr.tex
+++ b/asr.tex
@@ -2,6 +2,8 @@
 \usepackage[nonumberlist,acronyms]{glossaries}
 \makeglossaries%
 \newacronym{HMM}{HMM}{Hidden Markov Model}
+\newacronym{GMM}{GMM}{Gaussian Mixture Model}
+\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
 \newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
 \newacronym{FA}{FA}{Forced alignment}
 \newacronym{MFC}{MFC}{Mel-frequency cepstrum}
@@ -27,17 +29,18 @@
 Berenzweig and Ellis use acoustic classifiers from speech recognition as a
 detector for singing lines. They achieve 80\% accuracy on forty 15-second
 excerpts. They cite earlier work on signal features that discriminate
 between speech and music, and combine a neural-net acoustic model with
-\glspl{HMM}.\cite{berenzweig_locating_2001}.
+\glspl{HMM}~\cite{berenzweig_locating_2001}.
 In 2014 Dzhambazov et al.\ applied state-of-the-art segmentation methods to
 polyphonic Turkish music; this might be interesting for heavy metal as well.
 They mention that Fujihara et al.\ (2011) have a similar \gls{FA} system. The
 method uses phone-level segmentation on the first 12 \glspl{MFCC}. They first
 do vocal/non-vocal detection, then melody extraction, then alignment. They
 compare their results with
-Mesaros \& Virtanen, 2008.
+Mesaros \& Virtanen (2008)~\cite{dzhambazov_automatic_2014}. Later they
+specialize in long syllables in a cappella singing. They use \glspl{DHMM} with
+\glspl{GMM} and show that adding duration knowledge improves alignment (Beijing
+opera has long syllables)~\cite{dzhambazov_automatic_2016}.

-t\cite{dzhambazov_automatic_2014}
-t\cite{dzhambazov_automatic_2016}
 t\cite{fujihara_automatic_2006}
 t\cite{fujihara_lyricsynchronizer:_2011}
 t\cite{fujihara_three_2008}
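
For the pipeline the excerpt summarizes (first 12 MFCCs as features, then
vocal/non-vocal detection before alignment), here is a minimal sketch of such
a front-end. It assumes librosa and scikit-learn as tools; the file names,
component counts, and thresholding are illustrative assumptions, not
Dzhambazov et al.'s actual setup.

# Sketch: extract the first 12 MFCCs per frame and score frames with two
# GMMs (vocal vs. non-vocal). Not the paper's code; all parameters and
# file names below are hypothetical.
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=22050, n_mfcc=12):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for one audio file."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, n_frames)
    return mfcc.T

# Train one GMM per class on labelled frames (hypothetical training files).
vocal_gmm = GaussianMixture(n_components=8, covariance_type='diag')
nonvocal_gmm = GaussianMixture(n_components=8, covariance_type='diag')
vocal_gmm.fit(mfcc_frames('vocal_examples.wav'))
nonvocal_gmm.fit(mfcc_frames('nonvocal_examples.wav'))

# Classify each frame of a new song by comparing per-frame log-likelihoods.
X = mfcc_frames('song.wav')
is_vocal = vocal_gmm.score_samples(X) > nonvocal_gmm.score_samples(X)

A real system would smooth these frame-level decisions over time, e.g. with
an \gls{HMM}, much as Berenzweig and Ellis do for singing detection.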
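On the \glspl{DHMM} mentioned for the 2016 paper: a duration-explicit
\gls{HMM} (also known as a hidden semi-Markov model) replaces the implicit
geometric state-duration of a standard \gls{HMM} with an explicit duration
distribution. A generic way to write the likelihood of one segmentation into
$K$ segments with states $s_k$ and durations $d_k$ (symbols chosen here for
illustration, not taken from the paper):

\[
P(O, S, D \mid \lambda) = \prod_{k=1}^{K} a_{s_{k-1}s_k}\,
p_{s_k}(d_k) \prod_{t \in \text{segment } k} b_{s_k}(o_t),
\]

where the $k=1$ transition factor is read as the initial-state probability,
$p_j(d)$ is the explicit duration distribution for state $j$ (this is what
lets long syllables, as in Beijing opera, be modeled directly), and
$b_j(o_t)$ is the observation likelihood, e.g. a \gls{GMM} over the acoustic
features.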