DORAS | DCU Research Repository

Portable extraction of partially structured facts from the web

Salway, Andrew, Kelly, Liadh (ORCID: 0000-0003-1131-5238), Skadiņa, Inguna and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2010) Portable extraction of partially structured facts from the web. In: IceTAL 2010 - 7th International Conference on Natural Language Processing, 16-18 August 2010, Reykjavik, Iceland. ISBN 978-3-642-14769-2

Abstract
A novel fact extraction task is defined to fill a gap between current information retrieval and information extraction technologies. It is shown that it is possible to extract useful partially structured facts about different kinds of entities in a broad domain, i.e. all kinds of places depicted in tourist images. Importantly, the approach does not rely on existing linguistic resources (gazetteers, taggers, parsers, etc.) and it ported easily and cheaply between two very different languages (English and Latvian). Previous fact extraction from the web has focused on the extraction of structured data, e.g. (Building-LocatedIn-Town). In contrast, we extract richer and more interesting facts, such as a fact explaining why a building was built. Enough structure is maintained to facilitate subsequent processing of the information. For example, this partial structure enables straightforward template-based text generation. We report positive results for the correctness and interest of English and Latvian facts and for the utility of the extracted facts in enhancing image captions.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Additional Information: The original publication is available at www.springerlink.com
Subjects: Computer Science > Information retrieval
DCU Faculties and Centres: Research Institutes and Centres > Centre for Digital Video Processing (CDVP)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in: Advances in Natural Language Processing. Lecture Notes in Computer Science 6233. ISBN 978-3-642-14769-2
Official URL: http://dx.doi.org/10.1007/978-3-642-14770-8_38
Copyright Information: Copyright 2010 Springer Berlin / Heidelberg
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
ID Code: 15910
Deposited On: 30 Nov 2010 12:29 by Shane Harper. Last Modified: 25 Oct 2018 10:43
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
365kB