From: Mart Lubbers <mart@martlubbers.net>
Date: Thu, 19 Jan 2017 13:15:50 +0000 (+0100)
Subject: int
X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=b87c438eb5a79ff0de9f85b69159ee4c25430a48;p=itlast1617.git

int
---

diff --git a/exam2/exam.tex b/exam2/exam.tex
index 2a00c61..8b0c6b7 100644
--- a/exam2/exam.tex
+++ b/exam2/exam.tex
@@ -1,17 +1,19 @@
 %&exam
 \begin{document}
-\maketitleru[course={Introduction to Language and Speech Technology}]
-%\begin{enumerate}
-%	% Question 1
-%	\item\input{q1.tex}
-%
-%	\newpage
-%	% Question 2
-%	\item\input{q2.tex}
-%
-%	\newpage
-%	% Question 3
-%	\item\input{q3.tex}
-%\end{enumerate}
+\maketitleru[%
+	course={Introduction to Language and Speech Technology},
+	authorstext={Author:}]
+\begin{enumerate}
+	% Question 1
+	\item\input{q1.tex}
+
+	\newpage
+	% Question 2
+	\item\input{q2.tex}
+
+	\newpage
+	% Question 3
+	\item\input{q3.tex}
+\end{enumerate}
 		
 \end{document}
diff --git a/exam2/preamble.tex b/exam2/preamble.tex
index a369782..58c52bb 100644
--- a/exam2/preamble.tex
+++ b/exam2/preamble.tex
@@ -3,8 +3,8 @@
 \usepackage{rutitlepage}
 \usepackage{geometry}
 \usepackage{enumitem}
-\usepackage{listings}
+%\usepackage{listings}
 
-\title{Exam}
+\title{Final exam}
 \author{Mart Lubbers\\s4109503}
 \date{\today}
diff --git a/exam2/q1.tex b/exam2/q1.tex
new file mode 100644
index 0000000..a715220
--- /dev/null
+++ b/exam2/q1.tex
@@ -0,0 +1,92 @@
+\begin{enumerate}[label=\alph*.]
+	% 1a
+	\item
+		There are several differences between dialogues of native speakers
+		without a translator and with an intermediate translator.
+
+		When native speakers are aware of the fact that there is a translation
+		process going on, they will adapt the conversation style to be more
+		suitable for translation. This is because translation suffers from
+		common problems such as overlap between words and differences in
+		lexical structure that can cause confusion between the speakers.
+		
+		For one, the native speakers might use a lot more grounding to make sure
+		the translation was correct and the partner understands the thing that
+		has been said. Moreover, the grounding is probably a lot more
+		deliberate in the case of machine translation because a simple
+		backchannel like \emph{uhu} might not be enough to convince the partner
+		that the utterance was understood. This elaborate grounding might be
+		necessary since translation happens both ways and an utterance in
+		language \emph{A} translated to language \emph{B} and back to \emph{A}
+		could be different from the utterance that they started with.  When two
+		native speakers converse the grounding can be very automatic and
+		simple. Speakers can use nuances in the grounding to determine whether
+		the listener understood the utterance and adapt thereupon.
+
+		Secondly, regarding conversation turn taking there is also likely a
+		change in conversation. Since intricate turn taking behaviour is a lot
+		more difficult to translate the turn-taking will likely be much more
+		concrete and structured. Because the turn-taking is so concrete it is
+		much harder to use references that span over turns.
+
+		Thirdly, the language use will be much simpler in the case of a
+		translator intervening. Lexical divergence can only be resolved by a
+		translator when the context is explicit enough. When native speakers
+		converse the context might be implicit and subtle, whereas with a translator
+		the context must be concrete. Speakers will adapt to this and provide
+		more context, for example in the form of more adjectives. This remark
+		also includes the use of linguistic constructions that are very
+		difficult to translate such as metaphors and complex referencing.
+
+		Fourth and lastly, the use of conversational implicature will have to
+		be minimised since mistranslations might make it unclear what the
+		implicature is. Moreover, the speakers are probably not sure what to
+		imply since the partner might come from a very different culture with a
+		very different language structure. For example when an English speaker
+		says \emph{a couple} it probably means \emph{two or more}. In languages
+		that also possess a dual number, such as Slovene, next to the common
+		singular and plural this might be translated as \emph{exactly two}.
+		Because of errors of this kind the maxims proposed by Grice are extra
+		important.
+
+	% 1b
+	\item
+		The dialogue manager component should use the aforementioned techniques
+		to improve the understanding and clarity of the conversation.
+		It must use sophisticated grounding techniques such as \emph{explicit
+		confirmation} and \emph{rejection} when there is even a minimal amount
+		of doubt.
+
+		When the system is still not sure it can use rapid reprompting to get
+		the details clear, even when in the native language of the user it
+		might have been clear from other signals such as context.
+
+		This means that the dialogue manager probably should not produce a lot
+		of references, must expect very little implicature from the user and
+		must not use ambiguities.
+
+		When the user reprompts the system it should use a different, maybe a
+		little bit more illogical, construction to say the same thing. By doing this
+		the translation might be a bit better and therefore easier to
+		understand for the non-native speaker.
+
+	% 1c
+	\item
+		Evaluating dialogue systems can be done from multiple perspectives.
+
+		Expectations on task completion are pretty high. When an error occurs
+		the system can just reprompt. Moreover, when the user does not
+		understand an utterance they can also ask the system to reprompt,
+		and it will then hopefully reformulate the utterance.
+
+		Expectations on efficiency will probably be a lot worse than for a system
+		without translation. There will be a lot more grounding, reprompting
+		and other clarification and confirmation techniques. All these
+		techniques increase understanding at the cost of efficiency. Luckily in
+		an information-providing system the conversations are often short and
+		therefore the overhead will not be as devastating.
+
+		Quality cost expectations will also not be as good as without
+		translation since due to all the problems mentioned above the system
+		has to use more recovery techniques which lower the quality.
+\end{enumerate}
diff --git a/exam2/q2.tex b/exam2/q2.tex
new file mode 100644
index 0000000..c15c231
--- /dev/null
+++ b/exam2/q2.tex
@@ -0,0 +1,48 @@
+\begin{enumerate}[label=\alph*.]
+	% 2a
+	\item
+		Using the existing (English language) infrastructure to process foreign
+		queries might work better than one might expect. A lot of languages
+		share linguistic structures with English such as word positioning.
+		Moreover a lot of specialised domain words and proper names in foreign
+		languages are borrowed from English.
+
+		Of course there are also major problems. A big problem would
+		be not translating certain (question) words. Take for example the Dutch query
+		\emph{Stierf Michael Jackson in 2009?} (\emph{Did Michael Jackson die in 2009?}). When we put this query in the
+		engine it will know that something happened to \emph{MJ} in 2009 but it
+		will not know whether that something is the same as what the user
+		wanted to ask which leads to confusion.
+
+		Moreover, there are several seemingly simple structural
+		divergences (Section 25.1.2), such as date notation, that can cause
+		major problems when not translating.
+
+		In conclusion, when the language is similar to English, using no
+		translation might yield surprisingly good results. However, when the
+		difference is bigger, the question classification in particular will be
+		wrong and that will result in strange answers.
+
+	% 2b
+	\item
+		Translated material is hardly ever exactly the same as the original
+		material: it either has more details that were not in the original
+		query, fewer details or wrong details.
+
+		More details can occur because of lexical gaps (Section 25.1.3). Some
+		language might have developed in a region where there are hardly any
+		fish, such as in the desert, and therefore the need for specialised
+		words in fishing was not there. Maybe this language only has one word
+		for fish whereas English has many. In this way extra details can be
+		inserted. Of course this also works the other way around. A popular but
+		dubious claim is that some Inu{\"\i}t language has over a
+		hundred words for snow. When such a specialised word is used it might
+		not be possible to correctly translate it at all to English and
+		therefore we lose detail.
+
+	% 2c
+	\item
+		The quality of the knowledge extraction depends heavily on the user's
+		language because of the aforementioned lexical gaps. However, these
+		lexical gaps might be bridged with a suitable translation system.
+\end{enumerate}
diff --git a/exam2/q3.tex b/exam2/q3.tex
new file mode 100644
index 0000000..e3b4f9e
--- /dev/null
+++ b/exam2/q3.tex
@@ -0,0 +1,28 @@
+\begin{enumerate}[label=\alph*.]
+	% 3a
+	\item
+		The \emph{Levenshtein} algorithm for edit distance is a very useful
+		tool to detect spelling variants; however, there are certain situations
+		where it will not work out of the box. One such case is when there
+		is a difference in script. Transliteration between scripts often
+		introduces extra letters.
+
+		For example, the Russian form of
+		\emph{Muhammad} is transliterated as \emph{Mukhammed}. The \emph{kh} is a
+		construction that is not used in the English language but it sounds a
+		lot like the \emph{ch} in the Scottish \emph{loch}. Such added
+		characters can introduce higher edit distances. We can possibly
+		overcome this problem by using a broader notion of characters and
+		looking at phonemes, for example.
+
+		\emph{Viterbi} on the other hand 
+
+
+	% 3b
+	\item
+		
+
+	% 3c
+	\item
+
+\end{enumerate}