Towards language-agnostic alignment of product titles and descriptions: a neural approach
Stein, Daniel, Shterionov, Dimitar (ORCID: 0000-0001-6300-797X) and Way, Andy (ORCID: 0000-0001-5736-5930)
(2019)
Towards language-agnostic alignment of product titles and descriptions: a neural approach.
In: 2019 World Wide Web Conference, 13-17 May 2019, San Francisco, USA.
ISBN 978-1-4503-6675-5
The quality of e-commerce services largely depends on the accessibility of product content as well as its completeness and correctness. Nowadays, many sellers target cross-country and cross-lingual markets via active or passive cross-border trade, fostering the desire for seamless user experiences. While machine translation (MT) is very helpful for crossing language barriers, automatically matching an existing item for sale (e.g. the smartphone in front of me) to the corresponding product (all smartphones of the same brand/type/colour/condition) can be challenging, especially because the seller’s description is often erroneous or incomplete. We refer to this task as item alignment in multilingual e-commerce catalogues. To facilitate it, we develop a pipeline of tools for item classification based on cross-lingual text similarity, exploiting recurrent neural networks (RNNs) with and without pre-trained word embeddings. Furthermore, we combine our language-agnostic RNN classifiers with an in-domain MT system to further reduce the linguistic and stylistic differences between the investigated data, aiming to boost performance. The quality of the methods, as well as their training speed, is compared on an in-domain data set of English–German products.
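
As a rough illustration of scoring a title against a description with a shared recurrent encoder, the following Python sketch runs a plain Elman RNN over hashed tokens and compares the final hidden states by cosine similarity. It is not the authors' pipeline: the weights are random and untrained, and all names and sizes (`encode`, `match_score`, `HIDDEN`, `EMBED`, `VOCAB`, the token hashing scheme) are assumptions made for this example, so the scores only show the data flow and carry no meaning.

```python
import math
import random

# Hypothetical sizes, chosen only to keep the example small.
HIDDEN = 8   # hidden state size of the RNN
EMBED = 8    # token embedding size
VOCAB = 64   # number of hash buckets for tokens

rng = random.Random(0)  # fixed seed so the sketch is deterministic


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Random, untrained parameters. In a real system the embedding table
# could be initialised from pre-trained word embeddings instead.
E = rand_matrix(VOCAB, EMBED)
W_x = rand_matrix(HIDDEN, EMBED)
W_h = rand_matrix(HIDDEN, HIDDEN)


def token_id(token):
    """Map a token to a bucket with a simple deterministic hash (assumption)."""
    return sum(ord(c) for c in token.lower()) % VOCAB


def encode(text):
    """Run an Elman RNN over whitespace tokens; return the final hidden state."""
    h = [0.0] * HIDDEN
    for token in text.split():
        x = E[token_id(token)]
        h = [
            math.tanh(
                sum(W_x[i][j] * x[j] for j in range(EMBED))
                + sum(W_h[i][k] * h[k] for k in range(HIDDEN))
            )
            for i in range(HIDDEN)
        ]
    return h


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def match_score(title, description):
    """Score a title/description pair with the same encoder on both sides."""
    return cosine(encode(title), encode(description))


if __name__ == "__main__":
    print(match_score("Apple iPhone 8 64GB",
                      "Smartphone von Apple mit 64 GB Speicher"))
```

In a trained system the pair score (or the two encodings) would feed a classifier that decides whether the item matches the product; that step, and any MT preprocessing, is left out here.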