\section{\emph{Singing}-voice detection}
Table~\ref{tbl:singing} shows the results for the singing-voice detection. The
performance is given as accuracy (and loss); the accuracy is the percentage of
correctly classified samples.
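Written out, for $N$ evaluated samples with predicted labels $\hat{y}_i$ and
reference labels $y_i$ (notation introduced here for illustration only):
\[
	\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N}\left[\hat{y}_i = y_i\right]
\]
where $[\cdot]$ is $1$ when the condition holds and $0$ otherwise.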

Figure~\ref{fig:bclass} shows a segment of a song with the output of the
classifier plotted underneath. For this illustration the $13$-node model is
used with an analysis window step and length of $40$ and $100\,ms$
respectively. The output is smoothed using a Hanning window.
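The smoothing step can be sketched as follows. This is a minimal sketch: the
window width of $5$ frames and the example signal are illustrative values, not
the ones used in the experiments.

```python
import numpy as np

def smooth(pred, width=5):
    """Smooth framewise classifier output with a normalized Hanning window.

    pred  -- 1-D array of per-frame probabilities
    width -- window length in frames (illustrative value)
    """
    win = np.hanning(width)
    win /= win.sum()  # normalize so the output stays within [0, 1]
    return np.convolve(pred, win, mode='same')

# a noisy on/off frame sequence becomes a smooth activation curve
noisy = np.array([0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0],
                 dtype=float)
smoothed = smooth(noisy, width=5)
```

Because the window weights sum to one, the smoothed values remain valid
probabilities while frame-to-frame jumps are damped.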
10
11 \begin{table}[H]
12 \centering
13 \begin{tabular}{rccc}
14 \toprule
15 & \multicolumn{3}{c}{Parameters (step/length)}\\
16 & 10/25 & 40/100 & 80/200\\
17 \midrule
18 \multirow{4}{*}{Hidden Nodes}
19 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
20 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
21 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
22 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
23 \bottomrule
24 \end{tabular}
25 \caption{Binary classification results (accuracy (loss))}%
26 \label{tbl:singing}
27 \end{table}

\begin{figure}[H]
\centering
\includegraphics[width=1\linewidth]{bclass}
\caption{The classifier output plotted under the audio signal}\label{fig:bclass}
\end{figure}

\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for the singer-voice detection. The
same metrics are used as in the \emph{singing}-voice detection.

\begin{table}[H]
\centering
\begin{tabular}{rccc}
\toprule
& \multicolumn{3}{c}{Parameters (step/length in ms)}\\
& 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
& 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
& 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
& 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
& 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
\bottomrule
\end{tabular}
\caption{Multiclass classification results, given as accuracy
(loss)}\label{tbl:singer}
\end{table}

\section{Alien data}
To test the generalizability of the models, the system is tested on alien
data. The data was retrieved from the album \emph{The Desperation} by
\emph{Godless Truth}. \emph{Godless Truth} is a so-called old-school \gls{dm}
band with very raspy vocals that are placed up front in the mastering. This
means that the vocals are very prevalent in the recording, so little
difficulty is expected for the classifier. Figure~\ref{fig:alien1} shows that
the classifier indeed scores very accurately. Note that the spectrogram
settings have been adapted slightly to make the picture clearer. The
spectrogram shows the frequency range from $0$ to $3000\,Hz$.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien1}
\caption{The classifier output on alien data containing familiar vocal
styles}\label{fig:alien1}
\end{figure}

To really test the limits, a song from the highly atmospheric doom metal band
\emph{Catacombs} has been run through the system. The album \emph{Echoes
Through the Catacombs} features many synthesizers and heavy, droning guitar
and bass lines. The vocals are not mixed in a way that makes them stand out.
The models have never seen training data that is even remotely similar to
this type of metal. Figure~\ref{fig:alien2} shows a segment of the data. It
is visible that the classifier cannot distinguish singing from non-singing.

\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien2}
\caption{The classifier output on alien data containing unfamiliar vocal
styles}\label{fig:alien2}
\end{figure>