\section{Conclusion \& Future Research}
This research shows that existing techniques for singing-voice detection
designed for regular singing voices also work respectably on extreme singing
styles like grunting. With a standard \gls{ANN} classifier using \gls{MFCC}
features, a performance of $85\%$ can be achieved, which is similar to what the
same techniques achieve on regular singing. This means that the approach might
be suitable as a pre-processing step for forced alignment of lyrics. The model
performs well on alien data that uses singing techniques similar to those in
the training set. However, the model does not cope well with different singing
techniques or with data that contains a lot of atmospheric noise and
accompaniment.

Interesting future research includes doing the actual forced alignment. This
probably requires entirely different models. The models used for real speech
are probably not suitable, because the acoustic properties of a growling voice
are very different from those of a regular singing voice, let alone from those
of speech.

Secondly, it would be interesting to train a model that can discriminate a
singing voice for all styles of singing, including growling. Moreover, one
could investigate how well models trained on regular singing voices detect
growling, and the other way around.

Another interesting continuation would be to investigate whether the
decorrelation step of the feature extraction is necessary. This transformation
might be inefficient or unnatural. The first layer of weights in the model
could be seen as a first processing step. If another layer is added, that layer
could take over the role of the decorrelation step. The downside of this is
that training the model is tougher because there are many more weights to
train.

\emph{Singing}-voice detection and \emph{singer}-voice detection can be seen
as crude forms of genre detection. Therefore it might be interesting to find
out whether this generalizes to genre recognition in general. This requires
more data from different genres to be added to the dataset and the models to
be retrained.

Much of the similar research on singing-voice detection uses \glspl{HMM} and
existing phone models. It would be worthwhile to try the same approach on
extreme singing styles to see whether the phone models can say anything about
a growling voice.

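% Sketch: the decorrelation step written as a linear map

To make the remark about the decorrelation step concrete, the following is a
sketch under the assumption that the decorrelation is the discrete cosine
transform commonly used to compute \gls{MFCC} features. The symbols $E_m$, $M$,
$N$, $D$ and $W$ are introduced here for illustration only and are not taken
from the feature extraction used in this work. With $E_m$ the energy in the
$m$-th of $M$ mel filter bank channels, the $n$-th coefficient would be
\[
	c_n = \sum_{m=1}^{M} \log(E_m)
	\cos\left(\frac{\pi n}{M}\left(m - \frac{1}{2}\right)\right),
	\qquad n = 1, \ldots, N.
\]
Written as a matrix product this is $\mathbf{c} = D\mathbf{x}$ with
$x_m = \log(E_m)$ and
$D_{nm} = \cos\left(\frac{\pi n}{M}\left(m - \frac{1}{2}\right)\right)$, that
is, a fixed linear map. An extra layer with weight matrix $W$ placed directly
on the log filter bank energies computes $W\mathbf{x}$ before its activation
function, so choosing $W = D$ would reproduce the transform, while training is
free to find a different $W$. The price is that the entries of $W$ become
additional weights to train.
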
%Discussion section
\section{Discussion}
The dataset used is small: only three albums are annotated and used as
training data. The chosen albums represent the ends of the spectrum, so the
resulting model can be very general. However, it could also mean that the
model only recognizes three islands in the entire space of grunting. This does
not seem to be the case, since the model also performs well on almost all
alien data. However, although the data was picked to represent the edges of
the spectrum, testing on \emph{Catacombs} suggested that it does not fully do
so, since the performance there was very poor. Adding \emph{Catacombs} or a
similar style to the training set can probably overcome this limitation.