+the singers and a probability for the instrumental label. This results in an
+\gls{ANN} of the shape described in Figure~\ref{fig:mcann}. The input dimension
+is again thirteen and the output dimension equals the number of categories. The
+output is one-hot encoded, meaning the four categories are labeled
+\texttt{1000}, \texttt{0100}, \texttt{0010} and \texttt{0001}.
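+
+As a minimal sketch of this encoding, assuming the Keras utility
+\texttt{to\_categorical} (the variable names are illustrative, not taken from
+the actual implementation):
+
+\begin{verbatim}
+import numpy as np
+from keras.utils import to_categorical
+
+# Integer labels for the four categories (e.g. three singers
+# plus the instrumental label).
+labels = np.array([0, 1, 2, 3])
+
+# One-hot encoding: category i becomes a vector with a 1 at index i.
+print(to_categorical(labels, num_classes=4))
+# [[1. 0. 0. 0.]
+#  [0. 1. 0. 0.]
+#  [0. 0. 1. 0.]
+#  [0. 0. 0. 1.]]
+\end{verbatim}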
+
+\subsection{\acrlong{ANN}}
+The data is classified using standard \gls{ANN} techniques, namely \glspl{MLP}.
+Since the classification problems are only binary and four-class, it is
+interesting to see where the bottleneck lies and how far the abstraction can be
+pushed. The \gls{ANN} is built with Keras\footnote{\url{https://keras.io}}
+using the TensorFlow\footnote{\url{https://github.com/tensorflow/tensorflow}}
+backend; Keras provides a high-level interface to the underlying networks.
+
+The general architecture of the networks is shown in Figure~\ref{fig:bcann}
+and Figure~\ref{fig:mcann} for the binary and the multiclass classification
+respectively. The inputs are fully connected to the hidden layer, which is
+fully connected to the output layer. The activation function used is a
+\gls{RELU}. The \gls{RELU} function is a monotonic one-sided function that is
+also known as the ramp function; its definition is given in
+Equation~\ref{eq:relu}. \gls{RELU} was chosen because of its simplicity and
+efficient computation. The activation function between the hidden layer and
+the output layer is the sigmoid function in the case of binary classification,
+of which the definition is shown in Equation~\ref{eq:sigmoid}. The sigmoid is
+a monotonic function that is differentiable for all values of $x$ and always
+yields a non-negative derivative. For the multiclass classification the
+softmax function is used between the hidden layer and the output layer.
+Softmax is an activation function suitable for multiple output nodes; its
+definition is given in Equation~\ref{eq:softmax}.
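+
+A minimal sketch of these two architectures in Keras is given below; the
+hidden-layer size of $32$ and the \texttt{adam} optimiser are assumptions for
+illustration, as neither is fixed by the description above:
+
+\begin{verbatim}
+from keras.models import Sequential
+from keras.layers import Dense
+
+# Multiclass classifier: 13 inputs, one ReLU hidden layer,
+# softmax over the 4 output categories.
+multi = Sequential([
+    Dense(32, activation='relu', input_dim=13),
+    Dense(4, activation='softmax'),
+])
+multi.compile(optimizer='adam', loss='categorical_crossentropy')
+
+# Binary classifier: identical, except for a single
+# sigmoid output node.
+binary = Sequential([
+    Dense(32, activation='relu', input_dim=13),
+    Dense(1, activation='sigmoid'),
+])
+binary.compile(optimizer='adam', loss='binary_crossentropy')
+\end{verbatim}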
+
+The data is shuffled before being fed to the network to mitigate the risk of
+overfitting on a single album. Every model was trained for $10$ epochs with a
+batch size of $32$.
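+
+Expressed as code, again as a hedged sketch in which \texttt{x\_train},
+\texttt{y\_train} and \texttt{model} are placeholders for the actual data and
+one of the models built above:
+
+\begin{verbatim}
+import numpy as np
+
+# Shuffle features and labels together so that samples from one
+# album do not arrive in contiguous runs.
+perm = np.random.permutation(len(x_train))
+x_train, y_train = x_train[perm], y_train[perm]
+
+# Training regime as described: 10 epochs, batch size 32.
+model.fit(x_train, y_train, epochs=10, batch_size=32)
+\end{verbatim}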
+
+\begin{equation}\label{eq:relu}
+ f(x) = \left\{\begin{array}{rcl}
+ 0 & \text{for} & x<0\\
+ x & \text{for} & x \geq 0\\
+ \end{array}\right.
+\end{equation}
+
+\begin{equation}\label{eq:sigmoid}
+ f(x) = \frac{1}{1+e^{-x}}
+\end{equation}
+
+\begin{equation}\label{eq:softmax}
+	\sigma(\boldsymbol{z})_j = \frac{e^{z_j}}{\sum\limits^{K}_{k=1}e^{z_k}}
+\end{equation}
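+
+For concreteness, a direct NumPy transcription of these three definitions (a
+sketch, not the Keras implementation; the max-subtraction in the softmax is a
+standard numerical-stability step that does not change the result):
+
+\begin{verbatim}
+import numpy as np
+
+def relu(x):
+    # Ramp function: 0 for x < 0, x otherwise (Equation eq:relu).
+    return np.maximum(0.0, x)
+
+def sigmoid(x):
+    # Logistic sigmoid (Equation eq:sigmoid).
+    return 1.0 / (1.0 + np.exp(-x))
+
+def softmax(z):
+    # Normalised exponentials over all K outputs (Equation eq:softmax).
+    e = np.exp(z - np.max(z))
+    return e / e.sum()
+\end{verbatim}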
+
+\begin{figure}[H]
+ \begin{subfigure}{.5\textwidth}
+ \centering
+ \includegraphics[width=.8\linewidth]{bcann}
+ \caption{Binary classifier network architecture}\label{fig:bcann}
+ \end{subfigure}%
+	\begin{subfigure}{.5\textwidth}
+		\centering
+		\includegraphics[width=.8\linewidth]{mcann}
+ \caption{Multiclass classifier network architecture}\label{fig:mcann}
+ \end{subfigure}
+ \caption{\acrlong{ANN} architectures.}
+\end{figure}