between the performance of a model with 8 or 13 nodes. Moreover, contrary to
expectations, the window size does not seem to have much effect on the performance.
-\subsection{Future research}
+\section{Future research}
\paragraph{Forced alignment: }
Interesting future research includes performing the actual forced alignment. This
probably requires entirely different models. The models used for real speech
\section{\emph{Singing}-voice detection}
-Table~\ref{tbl:singing} shows the results for the singing-voice detection.
+Table~\ref{tbl:singing} shows the results for the singing-voice detection. The
+performance is reported as the accuracy, with the loss in parentheses. The
+accuracy is the fraction of correctly classified samples.
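+
+As a sketch, using notation that is assumed here and not defined elsewhere in
+the text: for $N$ samples with true labels $y_i$ and predicted labels
+$\hat{y}_i$, the accuracy can be written as
+\begin{equation}
+	\mathrm{accuracy} = \frac{1}{N}\sum_{i=1}^{N} [\hat{y}_i = y_i],
+\end{equation}
+where $[\cdot]$ equals $1$ if the condition holds and $0$ otherwise. A value
+of $0.89$ thus corresponds to $89\%$ of the samples being classified correctly.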
+
Figure~\ref{fig:bclass} shows an example of a segment of a song with the
-classifier plotted underneath to illustrate the performance. The performance is
-given by the accuracy and loss. The accuracy is the percentage of correctly
-classified samples.
+classifier plotted underneath. For this illustration the $13$-node model is
+used with an analysis window size and step of $40$\,ms and $100$\,ms
+respectively. The output is smoothed using a Hann window.
\begin{table}[H]
\centering
& 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
\bottomrule
\end{tabular}
- \caption{Binary classification results (accuracy
- (loss))}\label{tbl:singing}
+ \caption{Binary classification results (accuracy (loss))}%
+ \label{tbl:singing}
\end{table}
-Plotting the classifier under a segment of the data results in
-Figure~\ref{fig:bclass}.
-
\begin{figure}[H]
\centering
- \includegraphics[width=.7\linewidth]{bclass}.
+ \includegraphics[width=1\linewidth]{bclass}
\caption{Plotting the classifier under the audio signal}\label{fig:bclass}
\end{figure}