conclusion.tex

\section{Conclusion \& Future Research}
This study shows that existing singing-voice detection techniques, designed for
regular singing voices, also work respectably on extreme singing styles such as
grunting. A standard \gls{ANN} classifier using \gls{MFCC} features achieves a
performance of $85\%$, which is similar to what the same techniques achieve on
regular singing. This suggests that the approach might be suitable as a
pre-processing step for forced alignment of lyrics. The model performs well on
alien data that uses singing techniques similar to those in the training set.
However, it does not cope well with different singing techniques or with data
that contains a lot of atmospheric noise and accompaniment.

\subsection{Future research}
\paragraph{Forced alignment: }
An interesting direction for future research is performing the actual forced
alignment. This probably requires entirely different models. Models built for
regular speech are probably not suitable, because the acoustic properties of a
growling voice are very different from those of a regular singing voice, let
alone speech.

\paragraph{Generalization: }
Secondly, it would be interesting to train a single model that can detect a
singing voice in all styles of singing, including growling. Moreover, one could
investigate how well models trained on regular singing voices detect growling,
and the other way around.

\paragraph{Decorrelation: }
Another interesting continuation would be to investigate whether the
decorrelation step of the feature extraction is necessary. This transformation
might be inefficient or unnatural. The first layer of weights in the model
could be seen as a first processing step. If another layer is added, that layer
could take over the role of the decorrelation. The downside is that the model
becomes harder to train because there are many more weights to train.

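As a rough illustration of this idea, assume that the decorrelation step is the
discrete cosine transform commonly used when computing \gls{MFCC} features, so
that it is a fixed linear map. The notation below is introduced only for this
sketch and is not part of the study: $\mathbf{m}$ is the vector of log
filterbank energies of one frame, $D$ is the transform matrix, and $W_1$ and
$\mathbf{b}_1$ are the weights and biases of the first layer.
\[
  \mathbf{c} = D\mathbf{m}, \qquad
  W_1\mathbf{c} + \mathbf{b}_1
    = (W_1 D)\,\mathbf{m} + \mathbf{b}_1
    = W_1'\,\mathbf{m} + \mathbf{b}_1 .
\]
Under this assumption, a layer with weights $W_1' = W_1 D$ applied directly to
$\mathbf{m}$ computes the same pre-activations, so in principle a learned layer
can reproduce the transformation or find a more suitable one. When the
transform keeps fewer coefficients than there are filterbank channels, $D$ is
not square and $W_1'$ has more entries than $W_1$, which is consistent with the
extra training cost mentioned in this paragraph.
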
\paragraph{Genre detection: }
\emph{Singing}-voice detection and \emph{singer}-voice detection can be seen as
a crude form of genre detection. It might therefore be interesting to find out
whether this generalizes to genre recognition in general. This requires adding
more data from different genres to the dataset and retraining the models.

\paragraph{\glspl{HMM}: }
Much of the similar research on singing-voice detection uses \glspl{HMM} and
existing phone models. It would be interesting to try the same approach on
extreme singing styles to see whether the phone models can say anything about a
growling voice.

%Discussion section
\section{Discussion}
The dataset used is not very big: only three albums are annotated and used as
training data. The albums were chosen to represent the ends of the spectrum, so
the resulting model can be very general. However, it could also mean that the
model only recognizes three islands in the entire space of grunting. This does
not seem to be the case, since the results show that the model also performs
well on almost all alien data. Still, testing on \emph{Catacombs} suggested
that the chosen albums do not cover the entire spectrum, since the performance
on it was very poor. Adding \emph{Catacombs} or a similar style to the training
set can probably overcome this limitation.