\section{\emph{Singing}-voice detection}
-Table~\ref{tbl:singing} shows the results for the singing-voice detection.
-Figure~\ref{fig:bclass} shows an example of a segment of a song with the
-classifier plotted underneath to illustrate the performance. The performance is
-given by the accuracy and loss. The accuracy is the percentage of correctly
-classified samples.
+Table~\ref{tbl:singing} shows the results for the singing-voice detection. The
+performance is given by the fraction of correctly classified samples
+(accuracy), with the loss in parentheses. The rows give the number of hidden
+nodes; the columns give the analysis window step size and length used in the
+\gls{MFCC} extraction. A ceiling effect was observed after two epochs for
+every hidden node configuration.
\begin{table}[H]
\centering
- \begin{tabular}{rccc}
+ \begin{tabular}{rrccc}
\toprule
- & \multicolumn{3}{c}{Parameters (step/length)}\\
- & 10/25 & 40/100 & 80/200\\
+ & & \multicolumn{3}{c}{Parameters (step/length)}\\
+ & & 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
- & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
- & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
- & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
- & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
+ & 3 & 0.86 (0.34) & 0.87 (0.32) & 0.85 (0.35)\\
+ & 5 & 0.87 (0.31) & 0.88 (0.30) & 0.87 (0.32)\\
+ & 8 & 0.88 (0.30) & 0.88 (0.31) & 0.88 (0.29)\\
+ & 13 & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
\bottomrule
\end{tabular}
- \caption{Binary classification results (accuracy
- (loss))}\label{tbl:singing}
+ \caption{Binary classification results (accuracy (loss))}%
+ \label{tbl:singing}
\end{table}
-Plotting the classifier under a segment of the data results in
-Figure~\ref{fig:bclass}.
+Figure~\ref{fig:bclass} shows an example of a segment of a song with the
+classifier output plotted underneath. For this illustration the $13$-node
+model is used with an analysis window step and length of $40$ and $100$
+respectively. The output is smoothed using a Hann window. This figure shows
+that the model focusses on the frequencies around $300$\,Hz, which contain the
+growling. When there is a short silence between growls, the classifier output
+drops immediately. This phenomenon is visible throughout the songs.
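+As a sketch of the Hann smoothing mentioned above, assuming a symmetric window
+of odd length $N$ (the window length is not stated here, so $N$ is a
+placeholder), the window $w$ and the smoothed classifier output $\tilde{y}$
+for a raw output $y$ could be written as
+\[
+	w[n] = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right),\qquad
+	\tilde{y}[t] = \frac{\sum_{n=0}^{N-1} w[n]\,
+	y\left[t + n - \frac{N-1}{2}\right]}{\sum_{n=0}^{N-1} w[n]},
+\]
+where dividing by the sum of the window weights keeps $\tilde{y}$ in the same
+range as $y$.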
\begin{figure}[H]
\centering
- \includegraphics[width=.7\linewidth]{bclass}.
+ \includegraphics[width=1\linewidth]{bclass}
\caption{Plotting the classifier under the audio signal}\label{fig:bclass}
\end{figure}
\section{\emph{Singer}-voice detection}
Table~\ref{tbl:singer} shows the results for the singer-voice detection. The
-same metrics are used as in \emph{Singing}-voice detection.
+same metrics are used as in \emph{Singing}-voice detection. In these
+experiments a ceiling effect was observed after two to three epochs for every
+hidden node configuration.
\begin{table}[H]
\centering
- \begin{tabular}{rccc}
+ \begin{tabular}{rrccc}
\toprule
- & \multicolumn{3}{c}{Parameters (step/length)}\\
- & 10/25 & 40/100 & 80/200\\
+ & & \multicolumn{3}{c}{Parameters (step/length)}\\
+ & & 10/25 & 40/100 & 80/200\\
\midrule
\multirow{4}{*}{Hidden Nodes}
- & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
- & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
- & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
- & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
+ & 3 & 0.83 (0.48) & 0.82 (0.48) & 0.82 (0.48)\\
+ & 5 & 0.85 (0.43) & 0.84 (0.44) & 0.84 (0.44)\\
+ & 8 & 0.86 (0.41) & 0.86 (0.39) & 0.86 (0.40)\\
+ & 13 & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
\bottomrule
\end{tabular}
\caption{Multiclass classification results (accuracy
\begin{figure}[H]
\centering
\includegraphics[width=.7\linewidth]{alien2}.
- \caption{Plotting the classifier with alien data containing strange vocal
- styles}\label{fig:alien2}
+ \caption{Plotting the classifier with alien data containing unobserved
+ vocal styles}\label{fig:alien2}
\end{figure}