results.tex

\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for the singing-voice detection.
Performance is reported as accuracy, the fraction of correctly classified
samples, with the loss given in parentheses. The rows give the number of
hidden nodes; the columns give the analysis window step size and the analysis
window length used in the \gls{MFCC} extraction. A ceiling effect was observed
after two epochs for every hidden node configuration.
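As a reading aid, the accuracy reported here can be written out as follows.
The symbols are introduced only for this illustration and are assumptions,
not notation used elsewhere in this text: $N$ is the number of classified
samples, $y_i$ the true label of sample $i$ and $\hat{y}_i$ the label
predicted by the model.
\[
        \mathrm{accuracy} = \frac{\left|\{\, i : \hat{y}_i = y_i \,\}\right|}{N}
\]
An accuracy of $0.88$ thus means that $88$ out of every $100$ samples receive
the correct label.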

\begin{table}[H]
        \centering
        \begin{tabular}{rrccc}
                \toprule
                   & & \multicolumn{3}{c}{Parameters (step/length)}\\
                   & & 10/25 & 40/100 & 80/200\\
                \midrule
                \multirow{4}{*}{Hidden Nodes}
                 & 3 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
                 & 5 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
                 & 8 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
                 & 13 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
                \bottomrule
        \end{tabular}
        \caption{Binary classification results (accuracy (loss))}%
        \label{tbl:singing}
\end{table}

Figure~\ref{fig:bclass} shows an example segment of a song with the
classifier output plotted underneath. For this illustration the $13$-node
model is used with an analysis window step and length of $40$ and $100$
respectively. The output is smoothed using a Hann window. The figure shows
that the model focusses on the frequencies around $300$\,Hz, which contain the
growling. When there is a short silence between the growls, the classifier
output immediately drops. This phenomenon is visible throughout the songs.
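The smoothing step can be sketched as follows, assuming a standard symmetric
Hann window. The window length $M$ (taken to be odd so that the window is
symmetric around frame $t$), the symbols $p[t]$ for the raw classifier output
at frame $t$ and $\tilde{p}[t]$ for its smoothed version, and the
normalization by the window sum are all assumptions made for this
illustration; the actual settings are not stated here.
\[
        w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{M-1}\right),
        \qquad 0 \le n \le M-1
\]
\[
        \tilde{p}[t] = \frac{\sum_{n=0}^{M-1} w[n]\,
        p\!\left[t + n - \frac{M-1}{2}\right]}{\sum_{n=0}^{M-1} w[n]}
\]
Dividing by the window sum keeps the smoothed output in the same range as the
raw output.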

\begin{figure}[H]
        \centering
        \includegraphics[width=1\linewidth]{bclass}
        \caption{Classifier output plotted beneath the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for the singer-voice detection. The
same metrics are used as in \emph{Singing}-voice detection. In this
experiment a ceiling effect was observed after two to three epochs for every
hidden node configuration.

\begin{table}[H]
        \centering
        \begin{tabular}{rrccc}
                \toprule
                   & & \multicolumn{3}{c}{Parameters (step/length)}\\
                   & & 10/25 & 40/100 & 80/200\\
                \midrule
                \multirow{4}{*}{Hidden Nodes}
                 & 3 & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
                 & 5 & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
                 & 8 & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
                 & 13 & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
                \bottomrule
        \end{tabular}
        \caption{Multiclass classification results (accuracy
                (loss))}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is tested on alien data.
The data was retrieved from the album \emph{The Desperation} by \emph{Godless
Truth}. \emph{Godless Truth} is a so-called old-school \gls{dm} band with very
raspy vocals that are very up front in the mastering. The vocals are therefore
very prominent in the recording and no difficulty is expected for the
classifier. Figure~\ref{fig:alien1} shows that the classifier indeed scores
very accurately. Note that the spectrogram settings have been adapted slightly
to make the picture clearer. The spectrogram shows the frequency range from
$0$ to $3000$\,Hz.

\begin{figure}[H]
        \centering
        \includegraphics[width=.7\linewidth]{alien1}
        \caption{Plotting the classifier with alien data containing familiar vocal
        styles}\label{fig:alien1}
\end{figure}

To really test the limits, the system has also been tested on a song from the
highly atmospheric doom metal band \emph{Catacombs}. The album \emph{Echoes
Through the Catacombs} features a lot of synthesizers, heavy droning guitars
and bass lines. The vocals are not mixed in a way that makes them stand out.
The models have never seen training data that is even remotely similar to this
type of metal. Figure~\ref{fig:alien2} shows a segment of the data. It is
visible that the classifier cannot distinguish singing from non-singing.

\begin{figure}[H]
        \centering
        \includegraphics[width=.7\linewidth]{alien2}
        \caption{Plotting the classifier with alien data containing unobserved
        vocal styles}\label{fig:alien2}
\end{figure}