Joseph E. Grimes, SIL International, University of Hawaii at Manoa
Designing Tools That Promote Archiving
Tools for linguistics can be designed to promote consistent archiving. This includes structures for metadata on the persons who collect and interpret the data, the works they produce, and the speech varieties they include. Three areas of linguistics highlight diverse metadata patterns:
Comparative Linguistics: Wordcorr, with 447 downloads, helps linguists apply the comparative method to parallel word lists. It requests metadata for each user: an ID, name, email address, and institution. Each data collection has a title prefixed with the creator's ID, along with collaborators, language of description, publication and copyright information, and information on accessibility. Each speech variety being compared appears as a subject language; it carries metadata on published and unpublished sources, language names and affiliations, Ethnologue code, and other details relevant to comparativists but not yet in the OLAC schema. Wordcorr transforms its metadata into OLAC form for incorporation into a repository.
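The metadata layers described for Wordcorr can be pictured as nested records. What follows is a minimal Python sketch, not Wordcorr's actual code: the class names, the field names, the hyphen that joins the creator's ID to the title, and the flat Dublin Core-style keys returned by `to_flat_record` are all assumptions made for illustration. The real OLAC serialization is not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class User:
    """Per-user metadata: ID, name, email address, institution."""
    user_id: str
    name: str
    email: str
    institution: str


@dataclass
class SubjectLanguage:
    """One speech variety being compared (hypothetical field names)."""
    name: str
    ethnologue_code: str
    sources: List[str] = field(default_factory=list)


@dataclass
class Collection:
    """One data collection owned by a user (hypothetical field names)."""
    creator: User
    short_title: str
    description_language: str
    collaborators: List[str] = field(default_factory=list)
    rights: str = ""
    varieties: List[SubjectLanguage] = field(default_factory=list)

    @property
    def title(self) -> str:
        # The title is prefixed with the creator's ID; the hyphen
        # separator is an assumption of this sketch.
        return f"{self.creator.user_id}-{self.short_title}"


def to_flat_record(collection: Collection) -> Dict[str, object]:
    """Flatten a collection into Dublin Core-style keys (illustrative only)."""
    return {
        "title": collection.title,
        "creator": collection.creator.name,
        "contributor": list(collection.collaborators),
        "language": collection.description_language,
        "rights": collection.rights,
        "subject": [v.ethnologue_code for v in collection.varieties],
    }
```

Keeping the user, collection, and speech-variety records separate means a tool can prompt for each layer once and reuse it, which is the kind of consistency the abstract argues for.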
Sociolinguistics: Multilingual situations cannot be assessed without information on how proficient the different segments of a community are. One test is based on the observation that you have to know a language well in order to repeat whole sentences in it. The sentence repetition test discriminates well among lower levels of proficiency but poorly among higher levels.
The test is simple, but setting it up correctly is not. One computational tool from the 1980s gets the algorithms right but is thoroughly user-unfriendly, so a redesign is needed that emphasizes the user interfaces and metadata for
Lexicography: A Web-based tool for producing dictionaries of endangered or underdocumented languages is in the early design stage. It will probably use a factory design pattern to accommodate diverse structures: for example, alphabetic versus semantic arrangement of entries, internal structuring of entries by sense or by part of speech, and different uses of subentries.
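To make the factory idea concrete, here is a minimal Python sketch covering only the first of those choices, alphabetic versus semantic arrangement. It is not the planned tool's design: the class names, the `make_arrangement` function, and the `headword` and `domain` entry keys are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Entry = Dict[str, str]  # hypothetical entry shape: {"headword": ..., "domain": ...}


class Arrangement(ABC):
    """Common interface for every way of ordering dictionary entries."""

    @abstractmethod
    def order(self, entries: List[Entry]) -> List[Entry]:
        ...


class AlphabeticArrangement(Arrangement):
    def order(self, entries: List[Entry]) -> List[Entry]:
        # Sort by headword, ignoring case.
        return sorted(entries, key=lambda e: e["headword"].lower())


class SemanticArrangement(Arrangement):
    def order(self, entries: List[Entry]) -> List[Entry]:
        # Group by semantic domain, then sort by headword within each domain.
        return sorted(
            entries,
            key=lambda e: (e["domain"].lower(), e["headword"].lower()),
        )


_ARRANGEMENTS = {
    "alphabetic": AlphabeticArrangement,
    "semantic": SemanticArrangement,
}


def make_arrangement(kind: str) -> Arrangement:
    """Factory: return the arrangement object registered under `kind`."""
    try:
        return _ARRANGEMENTS[kind]()
    except KeyError:
        raise ValueError(f"unknown arrangement: {kind!r}") from None
```

Calling code asks the factory for an arrangement by name and never refers to a concrete class, so a further structure (entries split by part of speech, say) could be added by registering one more class.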
Different granularity is needed for different presentations: