Experiments in Structure-Preserving Grammar Compaction
Hepple, Mark and van Genabith, Josef (ORCID: 0000-0003-1322-7944)
(2000)
Experiments in Structure-Preserving Grammar Compaction.
In: 1st Meeting on Speech Technology Transfer, 6-10 Nov 2000, Universidad de Sevilla and Universidad de Granada, Sevilla, Spain.
Structure-preserving grammar compaction (SPC) is a simple CFG compaction technique originally described in (van Genabith et al., 1999a, 1999b). It works by generalising category labels and in so doing plugs holes in the grammar. To date the method has been tested on small corpora only. In the present research we apply SPC to a large grammar extracted from the Penn Treebank and examine its effects on treebank grammar size and on rule accession rates (as an indicator of grammar completeness).

1 Introduction

Treebanks and resources compiled from treebanks are potentially very useful in NLP. Grammars extracted from treebanks, so-called treebank grammars (Charniak, 1996), can form the basis of large-coverage NLP systems. Such treebank grammars, however, can suffer from several shortcomings: they commonly feature a large number of flat, highly specific rules that may be rarely used, with ensuing costs for processing (load) under the grammar.
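To make the notions of a treebank grammar and of rule accession concrete, the following minimal Python sketch reads CFG rules off bracketed trees and records how many distinct rules have been seen after each tree. It is an illustration only and does not implement SPC. The three toy trees, the function names (`parse_tree`, `rules`, `accession`) and the decision to skip lexical rules are assumptions made for this example and are not taken from the paper.

```python
def parse_tree(text):
    """Parse a bracketed tree such as '(S (NP (DT the) (NN dog)) (VP (VBD barked)))'
    into nested (label, children) tuples; words are kept as plain strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                      # skip "("
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1                      # skip ")"
            return (label, children)
        word = tokens[pos]
        pos += 1
        return word

    return read()


def rules(tree):
    """Yield one (lhs, rhs) rule per phrasal node.
    Nodes that dominate only words (lexical rules) are skipped; this is an
    assumption of the sketch, not a claim about the paper's setup."""
    label, children = tree
    if all(isinstance(c, str) for c in children):
        return
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules(child)


def accession(trees):
    """Return the cumulative number of distinct rules after each tree, in order."""
    seen = set()
    curve = []
    for tree in trees:
        seen.update(rules(tree))
        curve.append(len(seen))
    return curve


# Hypothetical toy treebank, for illustration only.
TREES = [
    "(S (NP (DT the) (NN dog)) (VP (VBD barked)))",
    "(S (NP (DT a) (NN cat)) (VP (VBD slept)))",
    "(S (NP (DT the) (JJ old) (NN dog)) (VP (VBD saw) (NP (DT a) (NN cat))))",
]

parsed = [parse_tree(t) for t in TREES]
curve = accession(parsed)
print(curve)  # [3, 3, 5]
```

In this toy run the second tree contributes no new rules, while the third adds two flatter, more specific ones (NP -> DT JJ NN and VP -> VBD NP). A curve that keeps rising as more trees are processed suggests that the extracted grammar is still incomplete.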