From: Mart Lubbers Date: Thu, 19 Jan 2017 13:15:50 +0000 (+0100) Subject: int X-Git-Url: https://git.martlubbers.net/?a=commitdiff_plain;h=b87c438eb5a79ff0de9f85b69159ee4c25430a48;p=itlast1617.git int --- diff --git a/exam2/exam.tex b/exam2/exam.tex index 2a00c61..8b0c6b7 100644 --- a/exam2/exam.tex +++ b/exam2/exam.tex @@ -1,17 +1,19 @@ %&exam \begin{document} -\maketitleru[course={Introduction to Language and Speech Technology}] -%\begin{enumerate} -% % Question 1 -% \item\input{q1.tex} -% -% \newpage -% % Question 2 -% \item\input{q2.tex} -% -% \newpage -% % Question 3 -% \item\input{q3.tex} -%\end{enumerate} +\maketitleru[% + course={Introduction to Language and Speech Technology}, + authorstext={Author:}] +\begin{enumerate} + % Question 1 + \item\input{q1.tex} + + \newpage + % Question 2 + \item\input{q2.tex} + + \newpage + % Question 3 + \item\input{q3.tex} +\end{enumerate} \end{document} diff --git a/exam2/preamble.tex b/exam2/preamble.tex index a369782..58c52bb 100644 --- a/exam2/preamble.tex +++ b/exam2/preamble.tex @@ -3,8 +3,8 @@ \usepackage{rutitlepage} \usepackage{geometry} \usepackage{enumitem} -\usepackage{listings} +%\usepackage{listings} -\title{Exam} +\title{Final exam} \author{Mart Lubbers\\s4109503} \date{\today} diff --git a/exam2/q1.tex b/exam2/q1.tex new file mode 100644 index 0000000..a715220 --- /dev/null +++ b/exam2/q1.tex @@ -0,0 +1,92 @@ +\begin{enumerate}[label=\alph*.] + % 1a + \item + There are several differences between dialogues of native speakers + without a translator and with an intermediate translator. + + When native speakers are aware of the fact that there is a translation + process going on, they will adapt the conversation style to be more + suitable for translation. This is because translation suffers from + common problems such as overlap between words and differences in + lexical structure that can cause confusion between the speakers. 
+ + For one, the native speakers might use a lot more grounding to make sure + the translation was correct and the partner understands the thing that + has been said. Moreover, the grounding is probably a lot more + deliberate in the case of machine translation because a simple + backchannel like \emph{uhu} might not be enough to convince the partner + that the utterance was understood. This elaborate grounding might be + necessary since translation happens both ways and an utterance in + language \emph{A} translated to language \emph{B} and back to \emph{A} + could be different from the utterance that they started with. When two + native speakers converse, the grounding can be very automatic and + simple. Speakers can use nuances in the grounding to determine whether + the listener understood the utterance and adapt accordingly. + + Secondly, the turn-taking in the conversation is also likely to + change. Since intricate turn-taking behaviour is a lot + more difficult to translate, the turn-taking will likely be much more + concrete and structured. Because the turn-taking is so concrete, it is + much harder to use references that span multiple turns. + + Thirdly, the language use will be much simpler in the case of a + translator intervening. Lexical divergence can only be solved by a + translator when the context is explicit enough. When native speakers + converse, the context might be implicit and subtle, whereas with a translator + the context must be concrete. Speakers will adapt to this and provide + more context, for example in the form of more adjectives. This + also applies to the use of linguistic constructions that are very + difficult to translate, such as metaphors and complex referencing. + + Fourth and lastly, the use of conversational implicature will have to + be minimised since mistranslation can make it unclear what the + implicature is. 
Moreover, the speakers are probably not sure what to + imply since the partner might come from a very different culture with a + very different language structure. For example, when an English speaker + says \emph{a couple} it probably means \emph{two or more}. In languages + that also possess a dual number, such as Slovene, next to the common + singular and plural this might be translated as \emph{exactly two}. + Because of errors of this kind the maxims proposed by Grice are extra + important. + + % 1b + \item + The dialogue manager component should use the aforementioned techniques + to improve the understanding and clarity of the conversation. + It must use sophisticated grounding techniques such as \emph{explicit + confirmation} and \emph{rejection} when there is even a minimal amount + of doubt. + + When the system is still not sure, it can use rapid reprompting to get + the details clear, even when in the native language of the user it + might have been clear from other signals such as context. + + This means that the dialogue manager probably should not produce a lot + of references, must expect very little implicature from the user and + must avoid ambiguity. + + When the user reprompts the system, it should use a different, maybe a + little bit more illogical, construction to say the same thing. By doing this + the translation might be a bit better and therefore easier to + understand for the non-native speaker. + + % 1c + \item + Evaluating dialogue systems can be done from multiple perspectives. + + Expectations on task completion are pretty high. When an error occurs + the system can just reprompt. Moreover, when the user does not + understand an utterance, they can also ask the system for a reprompt, + which will then hopefully reformulate the utterance. + + Expectations on efficiency will probably be a lot lower than for a system + without translation. There will be a lot more grounding, reprompting + and other clarification and confirmation techniques. 
All these + techniques increase understanding at the cost of efficiency. Luckily, in + an information-providing system the conversations are often short and + therefore the overhead will not be as devastating. + + Quality cost expectations will also not be as good as without + translation since, due to all the problems mentioned above, the system + has to use more recovery techniques, which lower the quality. +\end{enumerate} diff --git a/exam2/q2.tex b/exam2/q2.tex new file mode 100644 index 0000000..c15c231 --- /dev/null +++ b/exam2/q2.tex @@ -0,0 +1,48 @@ +\begin{enumerate}[label=\alph*.] + % 2a + \item + Using the existing (English language) infrastructure to process foreign + queries might work better than one might expect. A lot of languages + share linguistic structures with English such as word order. + Moreover, a lot of specialised domain words and proper names in foreign + languages are borrowed from English. + + Of course there are also major problems. A very big problem would + be not translating certain (question) words. Take for example the Dutch query + \emph{Stierf Michael Jackson in 2009?} (\emph{Did Michael Jackson die in 2009?}). When we put this query in the + engine it will know that something happened to \emph{MJ} in 2009 but it + will not know whether that something is the same as what the user + wanted to ask, which leads to confusion. + + Moreover, there are several seemingly simple structural + divergences (Section 25.1.2) that can cause major problems when not + translating, such as date notation. + + In conclusion, when the language is similar to + English, using no translation might yield surprisingly good results. However, when the + difference is bigger, especially the question classification will be + wrong and that will result in strange answers. + + % 2b + \item + Translated material is hardly ever exactly the same as the original + material: it either has more details that were not in the original + query, fewer details or wrong details. 
+ + More details can occur because of lexical gaps (Section 25.1.3). Some + language might have developed in a region where there are hardly any + fish, such as in the desert, and therefore the need for specialised + words for fishing was not there. Maybe this language only has one word + for fish whereas English has many. In this way extra details can be + inserted. Of course this also works the other way around. A popular + but dubious claim is that some Inu{\"\i}t language has over a + hundred words for snow. When such a specialised word is used it might + not be possible to correctly translate it at all to English and + therefore we lose detail. + + % 2c + \item + The quality of the knowledge extraction depends heavily on the user's + language because of the aforementioned lexical gaps. However, these + lexical gaps might be bridged with a suitable translation system. +\end{enumerate} diff --git a/exam2/q3.tex b/exam2/q3.tex new file mode 100644 index 0000000..e3b4f9e --- /dev/null +++ b/exam2/q3.tex @@ -0,0 +1,28 @@ +\begin{enumerate}[label=\alph*.] + % 3a + \item + The \emph{Levenshtein} algorithm for edit distance is a very useful + tool to detect spelling variants; however, there are certain situations + where it will not work out of the box. One such case is when there + is a difference in script. Transliteration between scripts often + introduces extra letters. + + For example, the Russian form of + \emph{Muhammad} is transliterated as \emph{Mukhammed}. The \emph{kh} is a + construction that is not used in the English language but it sounds a + lot like the \emph{ch} in the Scottish \emph{loch}. Such added + characters can introduce higher edit distances. We can possibly + overcome this problem by using a broader notion of characters and looking + at phonemes, for example. + + \emph{Viterbi} on the other hand + + + % 3b + \item + + + % 3c + \item + +\end{enumerate}
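
The remark in answer 3a about transliteration raising the edit distance can be checked with a small sketch. This is not part of the commit: it is a minimal illustration of the standard dynamic-programming Levenshtein distance, assuming unit costs for insertion, deletion and substitution. The function name `levenshtein` is a hypothetical choice made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    # previous[j] holds the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free on a match
            ))
        previous = current
    return previous[-1]


# The example pair from answer 3a: one inserted "k" and one "a" -> "e".
print(levenshtein("Muhammad", "Mukhammed"))  # 2
```

Under these assumed costs the pair from the answer is two edits apart, even though both spellings refer to the same name. This is the kind of inflation that the answer suggests could be reduced by comparing phonemes instead of letters.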