From: Mart Lubbers
Date: Wed, 7 Jun 2017 14:41:47 +0000 (+0200)
Subject: conclusion
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=d6a373ce14c802c67199d5e6dba904faea5cf317;p=asr1617.git

conclusion
---

diff --git a/conclusion.tex b/conclusion.tex
index 6bd8087..700b5e8 100644
--- a/conclusion.tex
+++ b/conclusion.tex
@@ -1,22 +1,26 @@
 \section{Conclusion}
 This study shows that existing techniques for singing-voice detection
-designed for regular singing-voices also work respectably on extreme singing
-styles like grunting. With a standard \gls{ANN} classifier using \gls{MFCC}
-features a performance of $85\%$ can be achieved which is similar to the same
-techniques on regular singing. This means that it might be suitable as a
-pre-processing step for lyrics forced alignment. The model performs pretty well
-on alien data that uses similar singing techniques as the training set.
-However, the model does not cope very well with different singing techniques or
-with data that contains a lot of atmospheric noise and accompaniment.
+designed for regular singing-voices also work on \gls{dm} and \gls{dom} that
+contain extreme singing styles like grunting. With a standard \gls{ANN}
+classifier using \gls{MFCC} features, a performance of $85\%$ can be achieved,
+which is similar to that of the same techniques used on regular singing. This
+means that it might also be suitable as a pre-processing step for lyrics forced
+alignment.
 
-From the results we conclude that the model generalizes well over the trainings
-set, even with little hidden nodes. The models with 3 or 5 hidden nodes score a
-little worse than their bigger brothers but there is hardly any difference
-between the performance of a model with 8 or 13 nodes. Moreover, contrary than
-expected the window size does not seem to be doing much in the performance.
+To determine whether the model generalizes, alien data was offered to the
+model to see how it performs. For similar singing styles the models perform
+similarly. Alien data containing different singing styles, atmospheric noise
+and accompaniment is classified less well.
+
+From the results we can conclude that the model generalizes well over the
+training set, even with few hidden nodes. The models with 3 or 5 hidden
+nodes score a little worse than their larger counterparts, but there is hardly
+any difference in performance between models with 8 and 13 nodes. Moreover,
+contrary to expectations, the window size does not seem to have much effect on
+performance.
 
 \section{Future research}
-\paragraph{Forced aligment: }
+\paragraph{Forced alignment: }
 Future interesting research includes doing the actual forced alignment. This
 probably requires entirely different models. The models used for real speech
 are probably not suitable because the acoustic properties of a regular