This paper reports on a preliminary study testing
the use of eye tracking as a method for evaluating
machine translation (MT) output. Fifty French machine-translated sentences,
25 rated as excellent and 25 rated as poor in an earlier human evaluation,
were selected. Ten native speakers of French were asked to read the MT sentences for comprehension. Their eye gaze data were recorded non-invasively using a Tobii 1750 eye tracker. They were also asked to provide retrospective verbal protocols while watching a replay of their own eye gaze reading data. The average gaze time and fixation count
were found to be significantly higher for the sentences
rated as poor, while average fixation duration did not
differ significantly between the two groups. Evaluative comments made during the retrospective protocols were also found to agree to a satisfactory degree with the earlier human evaluation. Overall, we found that the eye-tracking method correlates reasonably well with human evaluation of MT output.
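
As a rough illustration of the group comparison described above, the sketch below applies Welch's t-test to hypothetical per-sentence average gaze times for the two rating groups. The data values, the choice of test, and the variable names are all assumptions made for illustration; the abstract does not specify which statistical procedure was used.

    # Illustrative sketch only: the gaze-time values below are hypothetical,
    # not the study's data, and the abstract does not name its statistical
    # test; Welch's t-test is assumed here for the group comparison.
    from scipy import stats

    # Hypothetical average gaze time (in seconds), one value per sentence,
    # for the "excellent" and "poor" rating groups.
    gaze_time_excellent = [2.1, 1.8, 2.4, 1.9, 2.0, 2.2, 1.7, 2.3, 1.9, 2.1]
    gaze_time_poor = [3.4, 2.9, 3.8, 3.1, 3.6, 2.8, 3.3, 3.7, 3.0, 3.5]

    # Welch's variant does not assume equal variances between the groups.
    t_stat, p_value = stats.ttest_ind(gaze_time_excellent, gaze_time_poor,
                                      equal_var=False)
    print(f"gaze time: t = {t_stat:.2f}, p = {p_value:.4f}")

    # The same comparison would be repeated for fixation count and for
    # average fixation duration.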