conclusion.tex

\section{Conclusion \& Future Research}
This research shows that existing techniques for singing-voice detection,
designed for regular singing voices, also work respectably on extreme singing
styles such as grunting. A standard \gls{ANN} classifier using \gls{MFCC}
features achieves a performance of $85\%$, which is similar to what the same
techniques achieve on regular singing. This suggests that the approach might
be suitable as a pre-processing step for forced alignment of lyrics. The model
performs well on alien data that uses singing techniques similar to those in
the training set. However, it does not cope well with different singing
techniques or with data that contains a lot of atmospheric noise and
accompaniment.

Interesting future research includes performing the actual forced alignment.
This probably requires entirely different models. The models used for real
speech are probably not suitable because the acoustic properties of a regular
singing voice are very different from those of a growling voice, let alone
those of speech.

Secondly, it would be interesting to see whether a single model could be
trained to discriminate the singing voice across all styles of singing,
including growling. Moreover, one could investigate how well models trained on
regular singing voices detect growling, and the other way around.

Another interesting continuation would be to investigate whether the
decorrelation step of the feature extraction is necessary. This transformation
might be inefficient or unnatural. The first layer of weights in the model
could be seen as a first processing step. If another layer is added, that layer
could take over the role of the decorrelation step. The downside is that
training the model becomes tougher because there are many more weights to
train.
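To make this idea concrete, the following sketch assumes that the decorrelation
step is the discrete cosine transform (DCT) that is conventionally applied to
the log filter bank energies in \gls{MFCC} extraction; the symbols below are
introduced only for illustration and are not taken from the implementation.
Writing $\mathbf{e}$ for the vector of log energies of the $N$ mel filters, the
$K$ cepstral coefficients are, up to a scaling and indexing convention,
\begin{equation}
	c_k = \sum_{n=0}^{N-1} e_n
	\cos\left[\frac{\pi k}{N}\left(n + \frac{1}{2}\right)\right],
	\qquad k = 0, \ldots, K-1,
\end{equation}
which is a linear map $\mathbf{c} = D\mathbf{e}$ with a fixed $K \times N$
matrix $D$. A first layer with weights $W$, bias $\mathbf{b}$ and activation
$\sigma$ then computes
\begin{equation}
	\sigma(W\mathbf{c} + \mathbf{b}) = \sigma(WD\,\mathbf{e} + \mathbf{b}),
\end{equation}
so, under these assumptions, a network that is fed $\mathbf{e}$ directly can
represent the same function, either through an extra linear layer that learns
$D$ or through first-layer weights $W' = WD$. The price is the larger number of
parameters: a first layer with $H$ hidden units has $H \cdot N$ instead of
$H \cdot K$ weights, or the extra layer adds $K \cdot N$ weights.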

\emph{Singing}-voice detection and \emph{singer}-voice detection can be seen as
a crude form of genre detection. Therefore it might be interesting to find out
whether this generalizes to general genre recognition. This requires adding
more data from different genres to the dataset and retraining the models.

A lot of similar research on singing-voice detection uses \glspl{HMM} and
existing phone models. It would be worthwhile to try the same approach on
extreme singing styles to see whether the phone models can say anything about a
growling voice.

%Discussion section
\section{Discussion}
The dataset used is not very big: only three albums are annotated and used as
training data. The albums were chosen to represent the ends of the spectrum,
and therefore the resulting model could be very general. However, it could also
mean that the model only recognizes three islands in the entire space of
grunting. This does not seem to be the case, since the results show good
performance on almost all alien data. Nevertheless, testing on
\emph{Catacombs} suggested that the chosen albums do not cover the whole
spectrum, since the performance there was very poor. Adding \emph{Catacombs} or
an album in a similar style to the training set can probably overcome this
limitation.