\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for the singing-voice detection. The
performance is given as accuracy (and loss); the accuracy is the percentage of
correctly classified samples.
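Written out, for $N$ evaluated samples with predicted labels $\hat{y}_i$ and
reference labels $y_i$ (notation introduced here for illustration only):
\[
	\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i = y_i\right]
\]
where $[\cdot]$ is $1$ when the condition holds and $0$ otherwise.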

Figure~\ref{fig:bclass} shows a segment of a song with the output of the
classifier plotted underneath. For this illustration the $13$-node model is
used with an analysis window step and length of $40$ and $100\,ms$
respectively. The output is smoothed using a Hanning window.
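The smoothing step can be sketched as follows. This is a minimal sketch: the
window width of $5$ frames and the example signal are illustrative values, not
the ones used in the experiments.

```python
import numpy as np

def smooth(pred, width=5):
    """Smooth framewise classifier output with a normalized Hanning window.

    pred  -- 1-D array of per-frame probabilities
    width -- window length in frames (illustrative value)
    """
    win = np.hanning(width)
    win /= win.sum()  # normalize so the output stays within [0, 1]
    return np.convolve(pred, win, mode='same')

# a noisy on/off frame sequence becomes a smooth activation curve
noisy = np.array([0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0],
                 dtype=float)
smoothed = smooth(noisy, width=5)
```

Because the window weights sum to one, the smoothed values remain valid
probabilities while frame-to-frame jumps are damped.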
10
11 \begin{table}[H]
12 \centering
13 \begin{tabular}{rccc}
14 \toprule
15 & \multicolumn{3}{c}{Parameters (step/length)}\\
16 & 10/25 & 40/100 & 80/200\\
17 \midrule
18 \multirow{4}{*}{Hidden Nodes}
19 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
20 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
21 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
22 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
23 \bottomrule
24 \end{tabular}
25 \caption{Binary classification results (accuracy (loss))}%
26 \label{tbl:singing}
27 \end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1\linewidth]{bclass}
\caption{The classifier output plotted under the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for the singer-voice detection. The
same metrics are used as in the \emph{singing}-voice detection.

\begin{table}[H]
\centering
\begin{tabular}{rccc}
\toprule
& \multicolumn{3}{c}{Parameters (step/length in ms)}\\
& 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
& 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
& 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
& 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
& 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
\bottomrule
\end{tabular}
\caption{Multiclass classification results, given as accuracy
(loss)}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is tested on alien
data. The data was retrieved from the album \emph{The Desperation} by
\emph{Godless Truth}. \emph{Godless Truth} is a so-called old-school \gls{dm}
band with very raspy vocals that are placed up front in the mastering. This
means that the vocals are very prevalent in the recording, so little
difficulty is expected for the classifier. Figure~\ref{fig:alien1} shows that
the classifier indeed scores very accurately. Note that the spectrogram
settings have been adapted slightly to make the picture clearer. The
spectrogram shows the frequency range from $0$ to $3000\,Hz$.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien1}
\caption{The classifier output on alien data containing familiar vocal
styles}\label{fig:alien1}
\end{figure}

To really test the limits, a song from the highly atmospheric doom metal band
\emph{Catacombs} has been run through the system. The album \emph{Echoes
Through the Catacombs} features many synthesizers and heavy, droning guitar
and bass lines. The vocals are not mixed in a way that makes them stand out.
The models have never seen training data that is even remotely similar to
this type of metal. Figure~\ref{fig:alien2} shows a segment of the data. It
is visible that the classifier cannot distinguish singing from non-singing.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien2}
\caption{The classifier output on alien data containing unfamiliar vocal
styles}\label{fig:alien2}
\end{figure>