\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for singing-voice detection.
Performance is reported as accuracy, the fraction of correctly classified
samples, with the loss in parentheses. The rows give the number of hidden nodes
and the columns give the analysis window step size and window length used in
the \gls{MFCC} extraction.
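Written out, and using notation that is introduced here only for illustration
($N$ for the number of classified samples, $y_i$ for the true label and
$\hat{y}_i$ for the predicted label of sample $i$), the accuracy is
\[
	\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N} [\hat{y}_i = y_i],
\]
where $[\cdot]$ equals $1$ if the condition holds and $0$ otherwise. An
accuracy of $0.89$ thus means that $89\%$ of the samples were classified
correctly.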

\begin{table}[H]
	\centering
	\begin{tabular}{rrccc}
		\toprule
		& & \multicolumn{3}{c}{Parameters (step/length)}\\
		& & 10/25 & 40/100 & 80/200\\
		\midrule
		\multirow{4}{*}{Hidden Nodes}
		& 3 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
		& 5 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
		& 8 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
		& 13 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
		\bottomrule
	\end{tabular}
	\caption{Binary classification results (accuracy (loss))}%
	\label{tbl:singing}
\end{table}

Figure~\ref{fig:bclass} shows an example segment of a song with the classifier
output plotted underneath. For this illustration the $13$-node model is used
with an analysis window step and length of $40$ and $100$ respectively. The
output is smoothed using a Hann window. The figure shows that the model
focuses on the frequencies around $300$\,Hz, which contain the growling. When
there is a short silence between the growls, the classifier output drops
immediately. This phenomenon is visible throughout the songs.
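For reference, a Hann window of length $M$ is defined as
\[
	w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{M-1}\right),
	\qquad 0 \leq n \leq M-1.
\]
The window length used for the figure is not stated here, so $M$ is left as a
free parameter. Assuming that the smoothing is a normalized convolution of the
raw classifier output $y$ with $w$ (an assumption, other normalizations are
possible) and taking $M$ odd so that the window is centred on the current
sample, the smoothed output $\tilde{y}$ would be
\[
	\tilde{y}[t] = \frac{\sum_{n=0}^{M-1} w[n]\,
	y\left[t + n - \frac{M-1}{2}\right]}{\sum_{n=0}^{M-1} w[n]}.
\]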

\begin{figure}[H]
	\centering
	\includegraphics[width=1\linewidth]{bclass}
	\caption{Plotting the classifier under the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for the singer-voice detection. The
same metrics are used as in \emph{Singing}-voice detection.

\begin{table}[H]
	\centering
	\begin{tabular}{rrccc}
		\toprule
		& & \multicolumn{3}{c}{Parameters (step/length)}\\
		& & 10/25 & 40/100 & 80/200\\
		\midrule
		\multirow{4}{*}{Hidden Nodes}
		& 3 & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
		& 5 & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
		& 8 & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
		& 13 & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
		\bottomrule
	\end{tabular}
	\caption{Multiclass classification results (accuracy
		(loss))}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is tested on alien data.
The data was retrieved from the album \emph{The Desperation} by \emph{Godless
Truth}. \emph{Godless Truth} is a so-called old-school \gls{dm} band with very
raspy vocals that are placed very up front in the mastering. The vocals are
therefore prominent in the recording and no difficulty is expected for the
classifier. Figure~\ref{fig:alien1} shows that the classifier indeed scores
very accurately. Note that the spectrogram settings have been adapted slightly
to make the picture clearer. The spectrogram shows the frequency range from $0$
to $3000$\,Hz.

\begin{figure}[H]
	\centering
	\includegraphics[width=.7\linewidth]{alien1}
	\caption{Plotting the classifier with alien data containing familiar vocal
		styles}\label{fig:alien1}
\end{figure}

To really test the limits, a song from the highly atmospheric doom metal band
\emph{Catacombs} has been run through the system. The album \emph{Echoes
Through the Catacombs} features a lot of synthesizers, heavy droning guitars
and bass lines. The vocals are not mixed in a way that makes them stand out.
The models have never seen training data that is even remotely similar to this
type of metal. Figure~\ref{fig:alien2} shows a segment of the data. It is
visible that the classifier cannot distinguish singing from non-singing.

\begin{figure}[H]
	\centering
	\includegraphics[width=.7\linewidth]{alien2}
	\caption{Plotting the classifier with alien data containing strange vocal
		styles}\label{fig:alien2}
\end{figure}