The LinGO Grammar Matrix: Supporting the development of machine-readable grammars for language documentation and linguistic hypothesis testing.
Emily M. Bender, University of Washington
Dan Flickinger, CSLI Stanford
Stephan Oepen, CSLI Stanford and Universitetet i Oslo
|Project / Software Title :||LinGO Grammar Matrix|
|Project / Software URL:||http://depts.washington.edu/uwcl/matrix/|
|Access / Availability:||The software is open source and available from http://depts.washington.edu/uwcl/matrix/|
Machine-readable grammars are essential components of software systems for a diverse range of applications, including machine translation and computer-assisted language learning. They are also useful tools in linguistic hypothesis testing: linguists can encode grammatical analyses and then let the computer check whether the analyses make the intended predictions about grammaticality, ambiguity, and structure over the original data set as well as new data.
Building machine-readable grammars is a time-consuming, labor-intensive task. However, to the extent that languages are similar to each other, we can speed up the creation of machine-readable grammars by reusing code from the grammar of one language in the grammar of another. In practice, it is extremely difficult (and time-consuming) to separate what is cross-linguistically relevant from what is not, and this task need not be repeated for each new grammar.
The LinGO Grammar Matrix (Bender, Flickinger & Oepen 2002) aims to jump-start the development of linguistically precise grammars for diverse languages by providing a starter-kit encoding the cross-linguistically reusable information in such a way that it can easily be extended and specialized for particular languages.
The Grammar Matrix is a declarative resource comprising underspecified grammatical rules, ontologies of lexical and grammatical types, and 'collateral' files for interfacing to several processing components, including the LKB (a grammar development environment that includes a parser, a generator, and tools for debugging grammars; Copestake 2002). The Grammar Matrix and the grammars derived from it are written in a format that is also interpreted by many other parsing systems (Oepen et al. 2003).
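To make the idea of an "underspecified" rule concrete, the following is a minimal sketch in Python, not the Matrix's own formalism or code. It models feature structures as nested dictionaries and shows how a shared rule that says nothing about word order can be specialized by unifying it with a language-particular constraint. All feature names and values here (MOTHER, DAUGHTERS, ORDER, "head-first", and so on) are hypothetical and chosen only for illustration.

```python
def unify(a, b):
    """Return the unification of two toy feature structures, or None on conflict.

    Feature structures are nested dicts; atomic values are plain strings.
    This ignores types and re-entrancy, which real grammar formalisms rely on.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy, so neither input is modified
        for feature, b_value in b.items():
            if feature in result:
                merged = unify(result[feature], b_value)
                if merged is None:
                    return None  # incompatible values somewhere below
                result[feature] = merged
            else:
                result[feature] = b_value
        return result
    # Atomic values (or a dict against an atom) unify only if they are equal.
    return a if a == b else None


# Hypothetical shared rule: a head combines with one other daughter.
# The order of the daughters is deliberately left unspecified.
UNDERSPECIFIED_RULE = {
    "MOTHER": {"CAT": "phrase"},
    "DAUGHTERS": {"HEAD": {"CAT": "word"}, "NON_HEAD": {"CAT": "phrase"}},
}

# Hypothetical language-particular constraints.
HEAD_INITIAL = {"ORDER": "head-first"}
HEAD_FINAL = {"ORDER": "head-last"}

# Specializing the shared rule for two different kinds of language.
head_initial_rule = unify(UNDERSPECIFIED_RULE, HEAD_INITIAL)
head_final_rule = unify(UNDERSPECIFIED_RULE, HEAD_FINAL)

# Contradictory constraints cannot be combined.
conflict = unify(head_initial_rule, HEAD_FINAL)

print(head_initial_rule["ORDER"])
print(head_final_rule["ORDER"])
print(conflict)
```

In this sketch the shared rule is written once and each language contributes only the information that distinguishes it, which is the kind of reuse the starter-kit approach aims for. A contradictory specialization fails outright rather than silently overriding the shared information.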
Cross-linguistic hypotheses encoded in the Grammar Matrix include things like "words and phrases can be combined to make larger phrases", "the semantics of a phrase is determined by the words in the phrase and the way they are combined" (Frege's principle), and "words can have internal structure which is related to both their syntactic distribution and their meaning". None of these hypotheses are surprising, but by encoding them, we can significantly facilitate the development of grammars for new languages, and also reduce the amount of computational linguistics expertise required to embark on such a project.
In future work, we plan to extend the Matrix as part of a suite of software tools to aid in the documentation of endangered languages (Montage; Bender et al. 2004). In addition to facilitating the development of machine-readable grammars, the Matrix will provide a standardized format, making the data and analyses of languages documented in this way more accessible. Furthermore, we intend to link the ontologies of types in the Grammar Matrix to standard ontologies, such as GOLD (Farrar & Langendoen 2003), to further enhance the accessibility of these resources.
This demo will present the functionality of grammars derived from the Matrix as well as what it takes to specialize the Matrix to build a language-particular grammar.
Bender, E.M., D. Flickinger, J. Good and I.A. Sag. 2004. 'Montage: Leveraging Advances in Grammar Engineering, Linguistic Ontologies, and Mark-Up for the Documentation of Underdescribed Languages.' In Proceedings of the Workshop on First Steps for Language Documentation of Minority Languages: Computational Linguistic Tools for Morphology, Lexicon and Corpus Compilation, LREC 2004, Lisbon, Portugal.
Bender, E.M., D. Flickinger and S. Oepen. 2002. 'The Grammar Matrix: An Open-Source Starter-Kit for the Rapid Development of Cross-Linguistically Consistent Broad-Coverage Precision Grammars.' In Proceedings of the Workshop on Grammar Engineering and Evaluation, COLING 2002, Taipei, Taiwan. pp. 8-14.
Copestake, A. 2002. Implementing Typed Feature Structure Grammars. Stanford, CA: CSLI Publications.
Farrar, S. and T. Langendoen. 2003. 'A linguistic ontology for the semantic web.' GLOT International 7:97-100.
Oepen, S., D. Flickinger, J. Tsujii and H. Uszkoreit. 2003. Collaborative Language Engineering: A Case Study in Efficient Grammar-based Processing. Stanford, CA: CSLI Publications.