add excerpt for second dzhambazov paper

author Mart Lubbers <mart@martlubbers.net>

Thu, 2 Mar 2017 20:32:39 +0000 (21:32 +0100)

committer Mart Lubbers <mart@martlubbers.net>

Thu, 2 Mar 2017 20:32:39 +0000 (21:32 +0100)
diff --git a/asr.tex b/asr.tex

index bfbcf86..42af838 100644
--- a/asr.tex
+++ b/asr.tex
@@ -2,6 +2,8 @@
  \usepackage[nonumberlist,acronyms]{glossaries}
  \makeglossaries%
  \newacronym{HMM}{HMM}{Hidden Markov Model}
+\newacronym{GMM}{GMM}{Gaussian Mixture Models}
+\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
  \newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
  \newacronym{FA}{FA}{Forced alignment}
  \newacronym{MFC}{MFC}{Mel-frequency cepstrum}
@@ -27,17 +29,18 @@ Berenzweig and Ellis use acoustic classifiers from speech recognition as a
  detector for singing lines.  They achive 80\% accuracy for forty 15 second
  exerpts. They mention people that wrote signal features that discriminate
  between speech and music. Neural net
-\glspl{HMM}.\cite{berenzweig_locating_2001}.
+\glspl{HMM}~\cite{berenzweig_locating_2001}.
  
  In 2014 Dzhambazov et al.\ applied state of the art segmentation methods to
  polyphonic turkish music, this might be interesting to use for heavy metal.
  They mention Fujihara(2011) to have a similar \gls{FA} system. This method uses
  phone level segmentation, first 12 \gls{MFCC}s. They first do vocal/non-vocal
  detection, then melody extraction, then alignment. They compare results with
-Mesaros \& Virtanen, 2008.
+Mesaros \& Virtanen, 2008~\cite{dzhambazov_automatic_2014}. Later they
+specialize in long syllables in a cappella. They use \glspl{DHMM} with
+\glspl{GMM} and show that adding knowledge increases alignment (Beijing opera
+has long syllables)~\cite{dzhambazov_automatic_2016}.
  
-t\cite{dzhambazov_automatic_2014}
-t\cite{dzhambazov_automatic_2016}
  t\cite{fujihara_automatic_2006}
  t\cite{fujihara_lyricsynchronizer:_2011}
  t\cite{fujihara_three_2008}