%&asr
\usepackage[nonumberlist,acronyms]{glossaries}
\makeglossaries%
\newacronym{HMM}{HMM}{Hidden Markov Model}
\newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
\newacronym{FA}{FA}{Forced alignment}
\newacronym{MFC}{MFC}{Mel-frequency cepstrum}
\newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}

\begin{document}
%Titlepage
\maketitleru[
	course={(Automatic) Speech Recognition},
	institute={Radboud University Nijmegen},
	authorstext={Author:}]
\listoftodos[Todo]

\tableofcontents

%Glossaries
\glsaddall{}
\printglossaries%
Berenzweig and Ellis use acoustic classifiers from speech recognition as a
detector for singing lines. They achieve 80\% accuracy on forty 15-second
excerpts. They point to earlier work on signal features designed to
discriminate between speech and music; their own system combines a
neural-network acoustic classifier with
\glspl{HMM}~\cite{berenzweig_locating_2001}.
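
A minimal sketch of that idea, as an illustration rather than code from the
paper: frame-wise vocal posteriors from some acoustic classifier are smoothed
by Viterbi decoding over a two-state (non-vocal/vocal) \gls{HMM}. The
posterior array, the uniform prior, and the self-transition probability
\texttt{p\_stay} are all assumptions here.

\begin{verbatim}
import numpy as np

def smooth_vocal(p_vocal, p_stay=0.99):
    """Viterbi-decode a two-state (0 = non-vocal, 1 = vocal) HMM
    over frame-wise vocal posteriors p_vocal, shape (T,)."""
    p_vocal = np.asarray(p_vocal, dtype=float)
    T = len(p_vocal)
    # Emission log-likelihoods per state and frame, shape (2, T).
    log_emit = np.log(np.stack([1.0 - p_vocal, p_vocal]) + 1e-12)
    log_trans = np.log(np.array([[p_stay, 1.0 - p_stay],
                                 [1.0 - p_stay, p_stay]]))
    delta = np.empty((2, T))            # best log-score per state
    back = np.zeros((2, T), dtype=int)  # backpointers
    delta[:, 0] = np.log(0.5) + log_emit[:, 0]  # uniform prior (assumed)
    for t in range(1, T):
        scores = delta[:, t - 1][:, None] + log_trans  # (from, to)
        back[:, t] = scores.argmax(axis=0)
        delta[:, t] = scores.max(axis=0) + log_emit[:, t]
    path = np.empty(T, dtype=int)
    path[-1] = delta[:, -1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = back[path[t + 1], t + 1]
    return path  # per-frame labels: 1 = singing, 0 = no singing
\end{verbatim}

The self-transition probability controls how aggressively short classifier
glitches are smoothed away; higher values yield longer contiguous segments.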
In 2014, Dzhambazov et al.\ applied state-of-the-art segmentation methods to
polyphonic Turkish music; this might be interesting for heavy metal as well.
They mention that Fujihara et al.\ (2011) built a similar \gls{FA} system.
Their method uses phone-level segmentation based on the first 12
\glspl{MFCC}. They first perform vocal/non-vocal detection, then melody
extraction, then alignment, and compare their results with those of Mesaros
\& Virtanen (2008)~\cite{dzhambazov_automatic_2014}.
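
On the feature side, a minimal sketch of extracting the first 12
\glspl{MFCC} per frame; the use of librosa and the file name are assumptions
(the papers do not prescribe a toolkit, and \gls{HTK} computes comparable
features):

\begin{verbatim}
import librosa

# Hypothetical input file; any mono recording works.
y, sr = librosa.load("song.wav", sr=22050, mono=True)

# First 12 MFCCs per analysis frame, as in the phone-level
# segmentation described above; result has shape (12, n_frames).
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)
\end{verbatim}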

Literature collected so far:
\cite{dzhambazov_automatic_2014,dzhambazov_automatic_2016,%
	fujihara_automatic_2006,fujihara_lyricsynchronizer:_2011,%
	fujihara_three_2008,mauch_integrating_2012,%
	mesaros_adaptation_2009,mesaros_automatic_2008,%
	mesaros_automatic_2010,muller_multimodal_2012,%
	pedone_phoneme-level_2011,yang_machine_2012}.

\bibliographystyle{ieeetr}
\bibliography{asr}
\end{document}