\documentclass[a4paper]{article}

\usepackage{amssymb}
\usepackage{booktabs}
\usepackage{geometry}

\title{Exercise: Machine Learning}
\author{Mart Lubbers}
\date{\today}

\begin{document}
\maketitle
\subsection*{Chapter 5: Machine Learning}
Table~\ref{t2} shows that the choice of parameter set clearly affects the
classification results. Adding the focus word is a particularly valuable
feature, since it raises the percentage of correctly classified instances
considerably. Knowing only the focus word, however, still gives comparatively
low performance.

Adding the next or the previous words gives some further improvement, but not
a large one.

Table~\ref{t1} shows that the choice of classification method makes little
difference.

Using ten-fold cross validation slightly decreases the percentage of
correctly classified instances.
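
Purely as an illustration of the difference between the two evaluation modes
(a small scikit-learn sketch on random stand-in data, not the setup used to
produce the tables below):
\begin{verbatim}
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Random stand-in for the real word-window features:
# 40 instances, 5 count-valued features, 2 classes.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(40, 5))
y = np.repeat([0, 1], 20)

# Scoring on the data the model was trained on gives an
# optimistic estimate.
clf = MultinomialNB().fit(X, y)
print("accuracy on training data:", clf.score(X, y))

# Ten-fold cross validation trains on nine folds and scores the
# held-out tenth, averaged over the ten splits; the result is
# usually a little lower, but more honest.
scores = cross_val_score(MultinomialNB(), X, y, cv=10)
print("10-fold CV accuracy:", scores.mean())
\end{verbatim}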

\begin{table}
\centering
\begin{tabular}{lll}
\toprule
Method & Correctly classified & Root relative squared error\\
\midrule
\emph{NaiveBayes} & $96.6449\%$ & $35.7222\%$\\
\emph{NaiveBayes (10-fold CV)} & $96.4352\%$ & $37.1926\%$\\
\emph{J48} & $96.6449\%$ & $34.9136\%$\\
\emph{J48 (10-fold CV)} & $96.4352\%$ & $36.5122\%$\\
\bottomrule
\end{tabular}
\caption{Results for \texttt{P1D} and \texttt{FD}\label{t1}}
\end{table}

\begin{table}
\centering
\begin{tabular}{llllll}
\toprule
\texttt{P2D} & \texttt{P1D} & \texttt{N2D} & \texttt{N1D} & \texttt{FW} & Correctly classified\\
\midrule
\checkmark{} & \checkmark{} & \checkmark{} & \checkmark{} & \checkmark{} & $98.3225\%$\\
\checkmark{} & \checkmark{} & \checkmark{} & \checkmark{} & & $97.3905\%$\\
\checkmark{} & \checkmark{} & & \checkmark{} & \checkmark{} & $98.5555\%$\\
\checkmark{} & \checkmark{} & & \checkmark{} & & $97.507\%$\\
\checkmark{} & \checkmark{} & & & \checkmark{} & $98.0429\%$\\
\checkmark{} & \checkmark{} & & & & $95.5732\%$\\
\addlinespace
& \checkmark{} & \checkmark{} & \checkmark{} & \checkmark{} & $98.6486\%$\\
& \checkmark{} & \checkmark{} & \checkmark{} & & $97.5769\%$\\
& \checkmark{} & & \checkmark{} & \checkmark{} & $98.5555\%$\\
& \checkmark{} & & \checkmark{} & & $97.973\%$\\
& \checkmark{} & & & \checkmark{} & $98.2992\%$\\
& \checkmark{} & & & & $96.6449\%$\\
\addlinespace
& & \checkmark{} & \checkmark{} & \checkmark{} & $91.8919\%$\\
& & \checkmark{} & \checkmark{} & & $85.5079\%$\\
& & & \checkmark{} & \checkmark{} & $92.579\%$\\
& & & \checkmark{} & & $85.2516\%$\\
& & & & \checkmark{} & $88.4436\%$\\
\bottomrule
\end{tabular}
\caption{\emph{NaiveBayes} on all sensible combinations\label{t2}}
\end{table}

\subsection*{Chapter 6: Exercises}
\begin{itemize}
\item\emph{If we look at the Viterbi algorithm, we see that the
probability of a state at a given position is calculated on the basis of
the preceding $n$ states. However, it is claimed that the algorithm
takes into account the whole sequence. Explain in your own words (at
most $100$) how the probability is influenced by the rest of the
sequence, i.e.\ both the positions more than $n$ back and the following
positions.}

At every position the probability of a state is based only on the most
probable path leading up to it, and a backpointer records which
predecessor achieved that maximum. The probability is therefore not based
on all possible previous paths, and only the single most likely path can
be recovered. The following positions also have an influence: if a state
does not lie on the most likely sequence, it is never reached by a
backpointer during the backtrace from the final position and is therefore
discarded. (See the recursion sketched after this list.)

\item\emph{Explain in your own words (at most $50$) how the EM algorithm
works. I don't mean the mathematics, but the underlying concept.}

The \emph{Expectation-Maximization} (EM) algorithm searches for the
parameter settings for which the likelihood of the data is (locally)
maximal. It does this iteratively: given the current parameters it
computes expectations for the hidden variables, and it then re-estimates
the parameters so that the expected likelihood is maximized, typically
using derivatives.
\end{itemize}
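
As a point of reference for the first exercise, one standard way of writing
the first-order Viterbi recursion and its backpointer is (this is the usual
HMM notation, which may differ slightly from the book's):
\[
\delta_t(j) = \max_i\, \delta_{t-1}(i)\, a_{ij}\, b_j(o_t),
\qquad
\psi_t(j) = \arg\max_i\, \delta_{t-1}(i)\, a_{ij},
\]
where $a_{ij}$ is the transition probability from state $i$ to state $j$ and
$b_j(o_t)$ the probability of observation $o_t$ in state $j$. The best final
state is $\arg\max_j \delta_T(j)$; the backpointers $\psi$ are then followed
backwards from there, so states that are never reached in this backtrace do
not appear in the recovered sequence.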
\end{document}