From: Mart Lubbers
Date: Tue, 30 May 2017 15:35:51 +0000 (+0200)
Subject: elaborate on results and fix typo in acronyms
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=b9f787ae8d33ca830f0e3b28aef0a020544f5271;p=asr1617.git

elaborate on results and fix typo in acronyms
---

diff --git a/acronyms.tex b/acronyms.tex
index 5d478e0..d26f028 100644
--- a/acronyms.tex
+++ b/acronyms.tex
@@ -6,7 +6,7 @@
 \newacronym{HMM}{HMM}{Hidden Markov Model}
 \newacronym{HTK}{HTK}{\acrlong{HMM} Toolkit}
 \newacronym{IFPI}{IFPI}{International Federation of the Phonographic Industry}
-\newacronym{LPCC}{LPCC}{\acrlong{LPC} derivec cepstrum}
+\newacronym{LPCC}{LPCC}{\acrlong{LPC} derived cepstrum}
 \newacronym{LPC}{LPC}{Linear Prediction Coefficients}
 \newacronym{MFCC}{MFCC}{\acrlong{MFC} coefficient}
 \newacronym{MFC}{MFC}{Mel-frequency cepstrum}
diff --git a/conclusion.tex b/conclusion.tex
index 97ba7de..96d4fba 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -9,6 +9,12 @@ on alien data that uses similar singing techniques as the training set.
 However, the model does not cope very well with different singing techniques or
 with data that contains a lot of atmospheric noise and accompaniment.
 
+From the results we conclude that the model generalizes well over the trainings
+set, even with little hidden nodes. The models with 3 or 5 hidden nodes score a
+little worse than their bigger brothers but there is hardly any difference
+between the performance of a model with 8 or 13 nodes. Moreover, contrary than
+expected the window size does not seem to be doing much in the performance.
+
 \subsection{Future research}
 \paragraph{Forced aligment: }
 Future interesting research includes doing the actual forced alignment. This
diff --git a/methods.tex b/methods.tex
index 210e135..cb65d48 100644
--- a/methods.tex
+++ b/methods.tex
@@ -212,6 +212,9 @@ batch size of $32$.
 
 \section{Results}
 \subsection{\emph{Singing} voice detection}
+Table~\ref{tbl:singing} shows the results for the singing-voice detection.
+Figure~\ref{fig:bclass} shows an example of a segment of a song with the
+classifier plotted underneath to illustrate the performance.
 
 \begin{table}[H]
 	\centering
@@ -226,16 +229,18 @@ batch size of $32$.
 		13h & 0.89 (0.28) & 0.89 (0.29) & 0.88 (0.30)\\
 		\bottomrule
 	\end{tabular}
-	\caption{Binary classification results (accuracy (loss))}
+	\caption{Binary classification results (accuracy
+		(loss))}\label{tbl:singing}
 \end{table}
 
 \begin{figure}[H]
 	\centering
 	\includegraphics[width=.6\linewidth]{bclass}.
-	\caption{Plotting the classifier under the audio signal}
+	\caption{Plotting the classifier under the audio signal}\label{fig:bclass}
 \end{figure}
 
 \subsection{\emph{Singer} voice detection}
+Table~\ref{tbl:singer} shows the results for the singer-voice detection.
 
 \begin{table}[H]
 	\centering
@@ -250,7 +255,8 @@ batch size of $32$.
 		13h & 0.87 (0.37) & 0.87 (0.38) & 0.86 (0.39)\\
 		\bottomrule
 	\end{tabular}
-	\caption{Multiclass classification results (accuracy (loss))}
+	\caption{Multiclass classification results (accuracy
+		(loss))}\label{tbl:singer}
 \end{table}
 
 \subsection{Alien data}