\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for singing-voice detection.
Performance is reported as the fraction of correctly classified samples
(accuracy), with the loss in parentheses. The rows give the number of hidden
nodes; the columns give the analysis window step size and the analysis window
length used in the \gls{MFCC} extraction. A ceiling effect was observed after
two epochs for every hidden-node configuration.

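As a reference for reading the accuracy values, a minimal formulation is given
here. The symbols $N$, $y_i$ and $\hat{y}_i$ are notation assumed for this
illustration: $N$ is the number of evaluated samples, $y_i$ the annotated label
of sample $i$ and $\hat{y}_i$ the label predicted by the model.
\begin{equation}
\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}[\hat{y}_i = y_i]
\end{equation}
Here $\mathbf{1}[\cdot]$ equals $1$ when its argument holds and $0$ otherwise,
so an accuracy of $0.88$ means that $88\%$ of the samples received the correct
label.
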
\begin{table}[H]
\centering
\begin{tabular}{rrccc}
\toprule
& & \multicolumn{3}{c}{Parameters (step/length)}\\
& & 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
& 3 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
& 5 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
& 8 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
& 13 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
\bottomrule
\end{tabular}
\caption{Binary classification results (accuracy (loss))}%
\label{tbl:singing}
\end{table}

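The step/length pairs in Table~\ref{tbl:singing} determine how many
\gls{MFCC} frames a recording yields. As a rough sketch, assume that the step
$S$ and length $L$ are expressed in the same time unit as the signal duration
$T$ and that no padding is applied. The symbols and the no-padding assumption
are illustrative and are not taken from the extraction code. The number of
complete analysis windows is then
\begin{equation}
F = \left\lfloor \frac{T - L}{S} \right\rfloor + 1, \qquad T \geq L.
\end{equation}
For a segment of $T = 1000$ units this gives $F = 98$ for $10/25$, $F = 23$ for
$40/100$ and $F = 11$ for $80/200$, so the coarser settings yield far fewer,
but longer, samples for the same stretch of audio.
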
Figure~\ref{fig:bclass} shows a segment of a song with the classifier output
plotted underneath. For this illustration the $13$-node model is used with an
analysis window step and length of $40$ and $100$ respectively. The output is
smoothed using a Hann window. The figure shows that the model focuses on the
frequencies around $300$\,Hz, which contain the growling. When there is a short
silence between the growls, the classifier output drops immediately. This
phenomenon is visible throughout the songs.

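The smoothing step can be sketched as follows. The window length $M$ and the
normalisation are assumptions made for illustration; the text does not state
the values that were used. With $p[t]$ the raw classifier output for frame $t$,
a symmetric Hann window of odd length $M$ is
\begin{equation}
w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{M-1}\right), \qquad 0 \leq n \leq M-1,
\end{equation}
and the smoothed output is the weighted moving average
\begin{equation}
\tilde{p}[t] = \frac{\sum_{n=0}^{M-1} w[n]\, p\left[t + n - \frac{M-1}{2}\right]}{\sum_{n=0}^{M-1} w[n]}.
\end{equation}
Dividing by the sum of the weights keeps $\tilde{p}[t]$ in the same range as
$p[t]$. A wider window suppresses more frame-to-frame jitter but also blurs
short silences.
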
\begin{figure}[H]
\centering
\includegraphics[width=1\linewidth]{bclass}
\caption{Classifier output plotted beneath the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for singer-voice detection. The
same metrics are used as for \emph{Singing}-voice detection. In these
experiments a ceiling effect was observed after two to three epochs for every
hidden-node configuration.

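For the multiclass case the predicted label can be written out explicitly. As
a sketch, assume that the network emits one score $p_{i,k}$ per singer class
$k \in \{1, \dots, K\}$ for sample $i$. The notation and the
one-score-per-class output layer are assumptions, not details given in this
section. The prediction is then the highest-scoring class
\begin{equation}
\hat{y}_i = \arg\max_{k} \; p_{i,k},
\end{equation}
and the accuracy is the fraction of samples for which $\hat{y}_i$ equals the
annotated singer $y_i$. With $K$ classes a uniformly random guess is correct
with probability $1/K$, so the same accuracy value represents a larger margin
over such a guess than in the binary task whenever $K > 2$.
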
\begin{table}[H]
\centering
\begin{tabular}{rrccc}
\toprule
& & \multicolumn{3}{c}{Parameters (step/length)}\\
& & 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
& 3 & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
& 5 & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
& 8 & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
& 13 & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
\bottomrule
\end{tabular}
\caption{Multiclass classification results (accuracy (loss))}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is evaluated on alien
data. The data was retrieved from the album \emph{The Desperation} by
\emph{Godless Truth}. \emph{Godless Truth} is a so-called old-school \gls{dm}
band with very raspy vocals that sit very up front in the mastering. Because
the vocals are so prominent in the recording, no difficulty is expected for the
classifier. Figure~\ref{fig:alien1} shows that the classifier is indeed very
accurate on this material. Note that the spectrogram settings have been
adjusted slightly to make the picture clearer. The spectrogram shows the
frequency range from $0$ to $3000$\,Hz.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien1}
\caption{Classifier output on alien data containing familiar vocal
styles}\label{fig:alien1}
\end{figure}

To really test the limits, a song from the highly atmospheric doom metal band
\emph{Catacombs} was run through the system. The album \emph{Echoes
Through the Catacombs} has a lot of synthesizers, heavy droning guitars and
bass lines. The vocals are not mixed in a way that makes them stand out. The
models have never seen training data that is even remotely similar to this type
of metal. Figure~\ref{fig:alien2} shows a segment of the data. It is visible
that the classifier cannot distinguish singing from non-singing.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien2}
\caption{Classifier output on alien data containing unobserved
vocal styles}\label{fig:alien2}
\end{figure}