Portable extraction of partially structured facts from the web
Salway, Andrew and Kelly, Liadh and Skadiņa, Inguna and Jones, Gareth J.F. (2010) Portable extraction of partially structured facts from the web. In: IceTAL 2010 - 7th International Conference on Natural Language Processing, 16-18 August 2010, Reykjavik, Iceland. ISBN 978-3-642-14769-2
A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful, partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.), and it was ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts, and for the utility of the extracted facts in enhancing image captions.
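As an illustration of the template-based generation mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes a partially structured fact can be modelled as an entity, a fixed cue phrase that signals the kind of fact, and an unanalysed free-text remainder. The class `PartialFact`, the `TEMPLATES` table, the function `enhance_caption`, and the example fact are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class PartialFact:
    """Hypothetical representation of a partially structured fact."""
    entity: str     # the place the fact is about
    cue: str        # fixed phrase signalling the kind of fact
    remainder: str  # unanalysed free text that completes the fact


# Hypothetical caption templates, keyed by cue phrase (assumed, not from the paper).
TEMPLATES = {
    "was built to": "{caption} {entity} was built to {remainder}.",
    "is famous for": "{caption} It is famous for {remainder}.",
}


def enhance_caption(caption: str, fact: PartialFact) -> str:
    """Append a fact to a caption using the template for its cue phrase.

    If no template exists for the cue, the caption is returned unchanged.
    """
    template = TEMPLATES.get(fact.cue)
    if template is None:
        return caption
    return template.format(
        caption=caption, entity=fact.entity, remainder=fact.remainder
    )


# Invented example fact, used only to exercise the sketch.
fact = PartialFact(
    entity="The Example Tower",
    cue="was built to",
    remainder="guard the harbour entrance",
)
print(enhance_caption("The Example Tower at sunset.", fact))
```

Only the entity and the cue phrase need to be identified for the template to apply; the remainder is passed through as plain text, which is one plausible reading of how partial structure keeps generation simple.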