\newacronym{HMM}{HMM}{Hidden Markov Model}
+\newacronym{GMM}{GMM}{Gaussian Mixture Model}
+\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
\newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
\newacronym{FA}{FA}{Forced alignment}
\newacronym{MFC}{MFC}{Mel-frequency cepstrum}
detector for singing lines. They achieve 80\% accuracy for forty 15-second
excerpts. They mention people that wrote signal features that discriminate
between speech and music. Neural net
In 2014 Dzhambazov et al.\ applied state-of-the-art segmentation methods to
polyphonic Turkish music; this might be interesting to use for heavy metal.
They mention that Fujihara (2011) has a similar \gls{FA} system. This method
uses phone-level segmentation and the first 12 \gls{MFCC}s. They first do
vocal/non-vocal detection, then melody extraction, then alignment. They
compare results with
-Mesaros \& Virtanen, 2008.
+Mesaros \& Virtanen, 2008~\cite{dzhambazov_automatic_2014}. Later they
+specialize in long syllables in a cappella singing. They use \glspl{DHMM} with
+\glspl{GMM} and show that adding knowledge improves alignment accuracy
+(Beijing opera has long syllables)~\cite{dzhambazov_automatic_2016}.