Comparable corpora have been shown to
be useful in several multilingual natural
language processing (NLP) tasks. Many
previous papers have focused on how to
improve the extraction of parallel data
from this kind of corpus at different levels. In this paper, we aim to improve the quality of bilingual comparable corpora, as measured by document alignment score. We describe our
participation in the bilingual document
alignment shared task of the First Conference on Machine Translation (WMT16).
We propose a technique based on source-to-target sentence- and word-based scores
and the fraction of matched source named
entities. We performed our experiments on
English-to-French document alignment
for this shared task.