%&asr
\usepackage[nonumberlist,acronyms]{glossaries}
\makeglossaries%
\newacronym{HMM}{HMM}{Hidden Markov Model}
\newacronym{GMM}{GMM}{Gaussian Mixture Model}
\newacronym{DHMM}{DHMM}{Duration-explicit \acrlong{HMM}}
\newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
\newacronym{FA}{FA}{Forced alignment}
\newacronym{MFC}{MFC}{Mel-frequency cepstrum}
\newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
%\newglossaryentry{mTask}{name=mTask,
% description={is an abstraction for \glspl{Task} living on \acrshort{IoT} devices}}

\begin{document}
%Titlepage
\maketitleru[
course={(Automatic) Speech Recognition},
institute={Radboud University Nijmegen},
authorstext={Author:}]
\listoftodos[Todo]

\tableofcontents

%Glossaries
\glsaddall{}
\printglossaries%

Berenzweig and Ellis use acoustic classifiers from speech recognition as a
detector for singing lines. They achieve 80\% accuracy on forty 15-second
excerpts. They also refer to earlier work on signal features that discriminate
between speech and music. Their system combines a neural network with
\glspl{HMM}~\cite{berenzweig_locating_2001}.

In 2014, Dzhambazov et al.\ applied state-of-the-art segmentation methods to
polyphonic Turkish music; this might be interesting to apply to heavy metal.
They mention that Fujihara (2011) has a similar \gls{FA} system. Their method
uses phone-level segmentation and the first 12 \glspl{MFCC}. They first perform
vocal/non-vocal detection, then melody extraction, then alignment. They compare
their results with Mesaros \& Virtanen
(2008)~\cite{dzhambazov_automatic_2014}. Later they specialize in long
syllables in a cappella singing. They use \glspl{DHMM} with \glspl{GMM} and
show that adding knowledge improves alignment accuracy (Beijing opera has long
syllables)~\cite{dzhambazov_automatic_2016}.
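
As a rough sketch of what such an alignment computes (the notation here is our
own assumption and is not taken from the cited papers): given feature vectors
$o_1,\dots,o_T$ and the lyrics expanded into a fixed left-to-right state
sequence $s_1,\dots,s_K$, \gls{FA} with a duration-explicit model searches for
the state durations $d_1,\dots,d_K$, with $\sum_{k} d_k = T$, that maximize
\[
  (\hat{d}_1,\dots,\hat{d}_K) = \arg\max_{d_1,\dots,d_K}
  \prod_{k=1}^{K} p_{s_k}(d_k) \prod_{t=t_k}^{t_k+d_k-1} b_{s_k}(o_t),
  \qquad t_k = 1 + \sum_{j<k} d_j,
\]
where $p_{s}(d)$ is the duration distribution of state $s$ and the emission
likelihood is a Gaussian mixture,
\[
  b_{s}(o) = \sum_{m=1}^{M} c_{s,m}\,\mathcal{N}(o;\mu_{s,m},\Sigma_{s,m}).
\]
In this formulation, knowledge about expected (long) syllable durations would
enter through $p_{s}(d)$.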

t\cite{fujihara_automatic_2006}
t\cite{fujihara_lyricsynchronizer:_2011}
t\cite{fujihara_three_2008}
t\cite{mauch_integrating_2012}
t\cite{mesaros_adaptation_2009}
t\cite{mesaros_automatic_2008}
t\cite{mesaros_automatic_2010}
t\cite{muller_multimodal_2012}
t\cite{pedone_phoneme-level_2011}
t\cite{yang_machine_2012}


%Introduction, leading to a clearly defined research question
%Literature overview / related work
%Methodology
%Experiment(s) (set-up, data, results, discussion)
%Discussion section
%Conclusion section
%Acknowledgements
%Statement on authors' contributions
%(Appendices)

\bibliographystyle{ieeetr}
\bibliography{asr}
\end{document}