DCU-SEManiacs at SemEval-2016 task 1: synthetic paragram embeddings for semantic textual similarity
Hokamp, Chris ORCID: 0000-0002-7850-9398 and Arora, Piyush ORCID: 0000-0002-4261-2860 (2016) DCU-SEManiacs at SemEval-2016 task 1: synthetic paragram embeddings for semantic textual similarity. In: 10th International Workshop on Semantic Evaluation (SemEval-2016), 16-17 June 2016, San Diego, CA, USA.
We experiment with learning word representations designed to be combined into sentence-level semantic representations. Our objective function does not directly use the supervised scores provided with the training data; instead, we opt for a simpler objective which encourages similar phrases to be close together in the embedding space. This simple objective lets us start with high-quality embeddings trained using the Paraphrase Database (PPDB) (Wieting et al., 2015; Ganitkevitch et al., 2013), and then tune these embeddings using the official STS task training data, as well as synthetic paraphrases for each test dataset, obtained by pivoting through machine translation. Our submissions include runs which compare phrases only by their similarity in the embedding space, directly using the similarity score to produce predictions, as well as a run which combines vector similarity with a suite of features we investigated for our 2015 SemEval submission.
For the crosslingual task, we simply translate the Spanish sentences to English, and use the same system we designed for the monolingual task.
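As a rough illustration of scoring a sentence pair purely by similarity in the embedding space, the following is a minimal sketch and not the authors' implementation. It assumes that word vectors are combined by simple averaging and compared with cosine similarity; the abstract does not specify either choice. The vectors in `TOY_EMBEDDINGS` are invented for the example, and the function names are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# Real paragram embeddings are learned from PPDB and are much larger.
TOY_EMBEDDINGS = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [0.9, 0.1, 0.0],
    "sleeps": [0.0, 1.0, 0.0],
    "naps": [0.0, 0.9, 0.1],
    "car": [0.0, 0.0, 1.0],
    "stops": [0.1, 0.0, 0.9],
}


def sentence_vector(tokens, embeddings):
    """Average the vectors of all tokens that have an embedding.

    Averaging is an assumption made for this sketch. Returns None
    when no token is in the vocabulary.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(sentence_a, sentence_b, embeddings):
    """Score a sentence pair by the cosine of their averaged word vectors."""
    vec_a = sentence_vector(sentence_a.lower().split(), embeddings)
    vec_b = sentence_vector(sentence_b.lower().split(), embeddings)
    if vec_a is None or vec_b is None:
        return 0.0
    return cosine(vec_a, vec_b)


if __name__ == "__main__":
    print(similarity("cat sleeps", "kitten naps", TOY_EMBEDDINGS))
    print(similarity("cat sleeps", "car stops", TOY_EMBEDDINGS))
```

With these toy vectors, the paraphrase-like pair scores higher than the unrelated pair. Under the same assumptions, the crosslingual setting would only add a translation step before calling `similarity`.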
This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: EXPERT (EU Marie Curie ITN No. 317471), Science Foundation Ireland (SFI) as a part of the ADAPT Centre at Dublin City University (Grant No: 12/CE/I2267)
ID Code: 22800
Deposited On: 23 Nov 2018 14:27 by Piyush Arora. Last Modified 31 Jan 2019 12:20