results.tex

\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for singing-voice detection.
Performance is reported as accuracy, the fraction of correctly classified
samples, with the loss in parentheses. The rows give the number of hidden nodes;
the columns give the analysis window step size and length used in the
\gls{MFCC} extraction.
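Written out, with notation introduced here purely for illustration, let $N$ be
the number of evaluated samples, $y_i$ the true label of sample $i$ and
$\hat{y}_i$ the predicted label. Accuracy is then
\[
        \mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N} [\hat{y}_i = y_i],
\]
where $[\cdot]$ equals $1$ if the condition holds and $0$ otherwise.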

\begin{table}[H]
        \centering
        \begin{tabular}{rrccc}
                \toprule
                   & & \multicolumn{3}{c}{Parameters (step/length)}\\
                   & & 10/25 & 40/100 & 80/200\\
                \midrule
                \multirow{4}{*}{Hidden Nodes}
                 & 3 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
                 & 5 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
                 & 8 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
                 & 13 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
                \bottomrule
        \end{tabular}
        \caption{Binary classification results (accuracy (loss))}%
        \label{tbl:singing}
\end{table}

Figure~\ref{fig:bclass} shows a segment of a song with the classifier output
plotted underneath. This illustration uses the $13$-node model with an analysis
window step and length of $40$ and $100$ respectively. The output is smoothed
using a Hann window. The figure shows that the model focusses on the
frequencies around $300$\,Hz, which contain the growling. When there is a
short silence between the growls, the classifier output drops immediately. This
behaviour is visible throughout the songs.
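The window length and normalisation of the smoothing step are not stated here.
As a sketch under assumptions, take an odd window length $M > 1$ (hypothetical)
and normalise the window to unit sum; the symbols $p$, $\tilde{p}$ and $w$ are
notation introduced only for this illustration. With $p[t]$ the raw classifier
output at frame $t$ and $k = 0, \dots, M-1$, the smoothed output can be written
as
\[
        w[k] = \frac{1}{2}\left(1 - \cos\frac{2\pi k}{M-1}\right),
        \qquad
        \tilde{p}[t] = \frac{\sum_{k=0}^{M-1} w[k]\,
        p\!\left[t + k - \frac{M-1}{2}\right]}{\sum_{k=0}^{M-1} w[k]}.
\]
A longer window suppresses brief spikes more strongly, but it also blurs sharp
drops in the output during short silences.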

\begin{figure}[H]
        \centering
        \includegraphics[width=1\linewidth]{bclass}
        \caption{Plotting the classifier under the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for singer-voice detection. The same
metrics are used as for \emph{Singing}-voice detection.

\begin{table}[H]
        \centering
        \begin{tabular}{rrccc}
                \toprule
                   & & \multicolumn{3}{c}{Parameters (step/length)}\\
                   & & 10/25 & 40/100 & 80/200\\
                \midrule
                \multirow{4}{*}{Hidden Nodes}
                 & 3 & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
                 & 5 & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
                 & 8 & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
                 & 13 & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
                \bottomrule
        \end{tabular}
        \caption{Multiclass classification results (accuracy
                (loss))}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is evaluated on alien
data. The data was taken from the album \emph{The Desperation} by \emph{Godless
Truth}, a so-called old-school \gls{dm} band with very raspy vocals that are
very up front in the mastering. Because the vocals are so prominent in the
recording, the classifier is not expected to have any difficulty.
Figure~\ref{fig:alien1} shows that the classifier indeed performs accurately on
this material. Note that the spectrogram settings have been adjusted slightly
to make the picture clearer; the spectrogram shows the frequency range from $0$
to $3000$\,Hz.

\begin{figure}[H]
        \centering
        \includegraphics[width=.7\linewidth]{alien1}
        \caption{Plotting the classifier with alien data containing familiar vocal
        styles}\label{fig:alien1}
\end{figure}

To really test the limits, the system is also run on a song by the highly
atmospheric doom metal band \emph{Catacombs}. The album \emph{Echoes Through
the Catacombs} features many synthesizers, heavy droning guitars and bass
lines, and the vocals are not mixed in a way that makes them stand out. The
models have never seen training data that is even remotely similar to this type
of metal. Figure~\ref{fig:alien2} shows a segment of the data, in which the
classifier clearly cannot distinguish singing from non-singing.

\begin{figure}[H]
        \centering
        \includegraphics[width=.7\linewidth]{alien2}
        \caption{Plotting the classifier with alien data containing strange vocal
        styles}\label{fig:alien2}
\end{figure}