On distributing the analysis process of a broad-coverage unification-based grammar of Spanish
Collaborator: Theofilidis, Axel; Bel Rafecas, Núria; Martín Rioja, Josep Andreu; Universitat Politècnica de Catalunya. Institut de Ciències de l'Educació
Document type: Doctoral thesis
Publisher: Universitat Politècnica de Catalunya
Rights access: Open Access
This thesis describes research into the development and deployment of an engineered large-scale unification-based grammar to provide more robust and efficient deep grammatical analysis of linguistic expressions in real-world applications, while maintaining the accuracy of the grammar (i.e. the percentage of input sentences that receive the correct analysis) and keeping its precision at a reasonable level (i.e. the percentage of input sentences that receive no superfluous analysis).

In tackling the efficiency problem, our approach has been to prune the search space of the parser by integrating shallow and deep processing. We propose and implement an NLP system which integrates a Part-of-Speech (PoS) tagger and a chunker as a pre-processing module of a broad-coverage unification-based grammar of Spanish. This allows us to release the parser from certain tasks that can be dealt with efficiently and reliably by these computationally less expensive processing techniques. On the one hand, by integrating the morpho-syntactic information delivered by the PoS tagger, we reduce the number of morpho-syntactic ambiguities of the linguistic expression to be analyzed. On the other hand, by integrating the chunk mark-ups delivered by the partial parser, we not only avoid generating irrelevant constituents that will not contribute to the final parse tree, but also provide part of the structure that the analysis component has to compute, thus avoiding duplicated effort.

In addition, we want our system to be able to maintain the accuracy of the high-level grammar. In the integrated architecture we propose, the ambiguities which cannot be reliably resolved by the PoS tagger are kept, to be dealt with by the linguistic components of the grammar performing deep analysis.

Besides improving the efficiency of the overall analysis process and maintaining the accuracy of the grammar, our system provides both structural and lexical robustness to the high-level processing.
Structural robustness is obtained by integrating into the linguistic components of the high-level grammar the structures which have already been parsed by the chunker, so that they do not need to be rebuilt by phrase structure rules. This allows us to extend the coverage of the grammar to deal with very low-frequency constructions whose treatment would drastically increase the parsing search space and create spurious ambiguity. To provide lexical robustness to the system, we have implemented default lexical entries. Default lexical entries are lexical entry templates that are activated when the system cannot find a particular lexical entry to apply. Here, the integration of the tagger, which supplies the PoS information to the linguistic processing modules of our system, allows us to increase robustness while avoiding an increase in morphological ambiguity. Better precision is achieved by extending the PoS tags of our external lexicon so that they include syntactic information, for instance subcategorization information.
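
The default-entry mechanism described above can be illustrated with a minimal sketch. It is not the thesis implementation: the tag names, the toy lexicon, the template contents and the functions `lookup` and `lookup_without_tagger` are all hypothetical, chosen only to show why a tagger-supplied PoS keeps the fallback from multiplying ambiguity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    form: str
    pos: str
    subcat: str            # subcategorization information (illustrative values)
    is_default: bool = False


# Hypothetical toy lexicon, keyed by (word form, PoS tag).
LEXICON = {
    ("come", "VERB"): LexicalEntry("come", "VERB", "transitive"),
}

# Hypothetical default templates: one underspecified entry per PoS tag.
DEFAULT_TEMPLATES = {
    "NOUN": "none",
    "VERB": "intransitive",
    "ADJ": "none",
}


def lookup(form, pos):
    """Return the lexicon entry, or one default entry for the tagged PoS."""
    entry = LEXICON.get((form, pos))
    if entry is not None:
        return [entry]
    if pos in DEFAULT_TEMPLATES:
        # The tagger's PoS selects a single template for the unknown word.
        return [LexicalEntry(form, pos, DEFAULT_TEMPLATES[pos], is_default=True)]
    return []


def lookup_without_tagger(form):
    """Without PoS information, an unknown word activates every template."""
    known = [e for (f, _), e in LEXICON.items() if f == form]
    if known:
        return known
    return [LexicalEntry(form, pos, subcat, is_default=True)
            for pos, subcat in DEFAULT_TEMPLATES.items()]
```

In this sketch an unknown form such as `tuitea` tagged as `VERB` yields a single default entry through `lookup`, whereas `lookup_without_tagger` yields one candidate per template, which is the kind of added morphological ambiguity the tagger integration is said to avoid.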