asr.tex

%&asr
\usepackage[nonumberlist,acronyms]{glossaries}
\makeglossaries%
\newacronym{HMM}{HMM}{Hidden Markov Model}
\newacronym{GMM}{GMM}{Gaussian Mixture Model}
\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
\newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
\newacronym{FA}{FA}{Forced alignment}
\newacronym{MFC}{MFC}{Mel-frequency cepstrum}
\newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
%\newglossaryentry{mTask}{name=mTask,
%       description={is an abstraction for \glspl{Task} living on \acrshort{IoT} devices}}

\begin{document}
%Titlepage
\maketitleru[
        course={(Automatic) Speech Recognition},
        institute={Radboud University Nijmegen},
        authorstext={Author:}]
\listoftodos[Todo]

\tableofcontents

%Glossaries
\glsaddall{}
\printglossaries%

Berenzweig and Ellis use acoustic classifiers from speech recognition as a
detector for singing lines. They achieve 80\% accuracy on forty 15-second
excerpts. They also mention earlier work on signal features that discriminate
between speech and music. They use neural-net
\glspl{HMM}~\cite{berenzweig_locating_2001}.

In 2014, Dzhambazov et al.\ applied state-of-the-art segmentation methods to
polyphonic Turkish music; this might be interesting to use for heavy metal.
They mention that Fujihara (2011) has a similar \gls{FA} system. Their method
uses phone-level segmentation on the first 12 \glspl{MFCC}. They first do
vocal/non-vocal detection, then melody extraction, and then alignment. They
compare their results with
Mesaros \& Virtanen (2008)~\cite{dzhambazov_automatic_2014}. In later work they
focus on long syllables in a cappella singing. They use \glspl{DHMM} with
\glspl{GMM} and show that adding prior knowledge improves alignment accuracy
(Beijing opera has long syllables)~\cite{dzhambazov_automatic_2016}.
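
As a rough sketch of what alignment with a \gls{DHMM} involves (the notation
is a generic textbook formulation and an assumption of these notes, not taken
from the cited papers): let the lyrics give a fixed left-to-right sequence of
phone states $j = 1, \dots, N$, let $b_j(o_s)$ be the \gls{GMM} likelihood of
frame $o_s$ in state $j$, and let $p_j(d)$ be the probability that state $j$
lasts $d$ frames, with a maximum duration of $D$ frames. The score of the best
path that leaves state $j$ at frame $t$ is then
\[
  \delta_t(j) = \max_{1 \le d \le \min(D, t)}
    \delta_{t-d}(j-1)\, p_j(d) \prod_{s=t-d+1}^{t} b_j(o_s),
\]
with $\delta_0(0) = 1$ and $\delta_t(j) = 0$ for all other pairs with $t = 0$
or $j = 0$. For a recording of $T$ frames, backtracking the maximizing $d$
from $\delta_T(N)$ yields the phone boundaries. With a long maximum duration
$D$ and a $p_j(d)$ that favours long durations, such a model can represent
long syllables explicitly.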

t\cite{fujihara_automatic_2006}
t\cite{fujihara_lyricsynchronizer:_2011}
t\cite{fujihara_three_2008}
t\cite{mauch_integrating_2012}
t\cite{mesaros_adaptation_2009}
t\cite{mesaros_automatic_2008}
t\cite{mesaros_automatic_2010}
t\cite{muller_multimodal_2012}
t\cite{pedone_phoneme-level_2011}
t\cite{yang_machine_2012}


%Introduction, leading to a clearly defined research question
%Literature overview / related work
%Methodology
%Experiment(s) (set-up, data, results, discussion)
%Discussion section
%Conclusion section
%Acknowledgements
%Statement on authors' contributions
%(Appendices)

\bibliographystyle{ieeetr}
\bibliography{asr}
\end{document}