Biao Min Lexicon
The Biao Min Lexicon, housed on the E-MELD site, consists of nearly 3,000 lexical items from Biao Min documentation collected by David Solnit. Each lexical item consists of a phonological form including tone, Chinese and English glosses, and, where applicable, loan word information. Each compound word is accompanied by a word-by-word analysis, with information about tonal variation. This information has been entered into a database and can be output in a text file format with XML markup. This file format is the recommended best practice because it has the best chance of remaining intelligible over time, despite changes in technology and annotation conventions. Although long-term intelligibility is one of the primary benefits of digitizing and archiving linguistic documentation in an approved format, it is not the only benefit.
The Biao Min lexicon was input into a database via the Field Input Tool developed by E-MELD. It has been archived in an XML text file, created according to a lexicon schema (shown as an XSD file) that is being developed by E-MELD. This lexicon schema is partly an outgrowth of the 2002 E-MELD Workshop on Digitizing Lexical Information.
One of the reasons that XML has been chosen as the archival format is that it is an open standard, meaning that the complete specification is available to the public. This, in turn, means that anyone who has a text editor can create valid XML files, which can be correctly displayed or manipulated by any XML-aware software program. (To see the difference, imagine editing a text file in a proprietary program like MS Word: all formatting must be added through Word itself--not a text editor--because Word format codes are not an open standard.) XML files can also be easily transformed into many different display and file formats through the use of XSL.
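To make the point about open standards concrete, the following is a minimal sketch of how any XML-aware tool (here, Python's standard library) can read a lexical entry stored as plain text. The element and attribute names (`entry`, `form`, `tone`, `gloss`, `loan`) and all values are hypothetical placeholders chosen for illustration; they are not taken from the E-MELD lexicon schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: the element names, attribute names, and values below are
# illustrative placeholders, NOT the actual E-MELD lexicon schema or real data.
ENTRY_XML = """\
<lexicon>
  <entry id="e1">
    <form tone="1">FORM</form>
    <gloss lang="en">ENGLISH GLOSS</gloss>
    <gloss lang="zh">CHINESE GLOSS</gloss>
    <loan source="SOURCE LANGUAGE"/>
  </entry>
</lexicon>
"""


def read_entries(xml_text):
    """Return one dict per <entry>, built only from the plain-text XML."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        form = entry.find("form")
        loan = entry.find("loan")  # optional: only present for loan words
        entries.append({
            "id": entry.get("id"),
            "form": form.text,
            "tone": form.get("tone"),
            "glosses": {g.get("lang"): g.text for g in entry.findall("gloss")},
            "loan_source": loan.get("source") if loan is not None else None,
        })
    return entries


if __name__ == "__main__":
    for e in read_entries(ENTRY_XML):
        print(e)
```

Because the archive is ordinary text, the same file could be opened in a text editor, validated against a schema, or read by a different XML library without any change to the data itself.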
The Biao Min Lexicon was created primarily to exhibit the ways in which linguistic documentation can be preserved using best practices in digitization, but another E-MELD objective is to increase awareness of the existence of linguistic resources through the use of search engines such as the OLAC Harvester. Metadata is information about data (in this case linguistic resources) which is used by search engines to identify resources accurately and return results relevant to the query. It is important to realize that the metadata covering a project or linguistic resource can be made available to a search engine without making the resource itself available to the public. Often it is useful to the linguistic community simply to know that some documentation of a given language exists, even though it might require visiting an archive or making special arrangements with a language community to see the materials. This is particularly true of most endangered languages, which are often under-documented. For this reason, E-MELD strongly encourages linguists to create metadata describing their language documentation projects and to make it available to the OLAC search engine.
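As a rough illustration of what such a metadata description might look like, the sketch below builds a small record using Dublin Core element names, on which OLAC metadata is based. The wrapper element `metadata`, the choice of fields, and the mapping of David Solnit to `contributor` are assumptions made for this example; a real OLAC record follows the OLAC metadata schema and would carry more detail.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def build_metadata(fields):
    """Build a record with one Dublin Core element per (name, value) pair."""
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Field values are drawn from the description of the lexicon given earlier;
# the mapping onto these particular elements is an assumption.
record = build_metadata([
    ("title", "Biao Min Lexicon"),
    ("contributor", "David Solnit"),
    ("description", "Lexical items with tone, Chinese and English glosses"),
])

if __name__ == "__main__":
    print(ET.tostring(record, encoding="unicode"))
```

Note that the record describes the resource without containing any of the lexical data, which is what allows it to be published even when the resource itself is not.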
One of the benefits of digitizing linguistic documentation in a best-practice format is that it can be displayed and distributed in many different ways. As can be seen from the raw XML file of the Biao Min lexicon, XML is accessible, but raw XML is not always the best way to display linguistic documentation. However, through the use of XSL stylesheets, XML files can be easily transformed into different formats. That means that from one archival form, several different displays can be created.
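The idea of deriving several displays from one archival form can be sketched as follows. This example uses plain Python in place of XSL stylesheets, purely to show the principle; it is not the project's stylesheets, and the element names (`entry`, `form`, `pos`, `gloss`) and placeholder values are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical archival file: names and values are placeholders, not the
# E-MELD lexicon schema or real Biao Min data.
ARCHIVE_XML = """\
<lexicon>
  <entry>
    <form>FORM-1</form>
    <pos>n</pos>
    <gloss lang="en">GLOSS-1</gloss>
  </entry>
  <entry>
    <form>FORM-2</form>
    <pos>v</pos>
    <gloss lang="en">GLOSS-2</gloss>
  </entry>
</lexicon>
"""


def load(xml_text):
    """Parse the archive once into (form, part of speech, gloss) tuples."""
    root = ET.fromstring(xml_text)
    return [
        (e.findtext("form"), e.findtext("pos"), e.findtext("gloss"))
        for e in root.iter("entry")
    ]


def as_word_list(entries):
    """Plain-text display: one tab-separated line per entry."""
    return "\n".join(f"{form}\t{pos}\t{gloss}" for form, pos, gloss in entries)


def as_html_list(entries):
    """HTML display: a definition list built from the same entries."""
    items = "".join(
        f"<dt>{escape(form)}</dt><dd><i>{escape(pos)}</i> {escape(gloss)}</dd>"
        for form, pos, gloss in entries
    )
    return f"<dl>{items}</dl>"


if __name__ == "__main__":
    entries = load(ARCHIVE_XML)
    print(as_word_list(entries))
    print(as_html_list(entries))
```

Both outputs come from the same parsed archive, so adding another display means writing another rendering function (or, in the XSL approach, another stylesheet) while the archived file stays untouched.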
Four different displays have been created for the Biao Min Lexicon to demonstrate how much can be done with a lexicon in an XML archival format. We have called the first display a Linguistic Description, since it displays the lexical items in a dictionary format, with cognate forms in several languages. We have called the second display a Simple Reference List, since it shows only part of speech and definition but links to a page with more information. The third display is a Word List that shows the same information in a slightly different format. The fourth display is a Learners Dictionary. This display includes images illustrating each Biao Min term, as well as English glosses and an icon which the learner can click to hear the word pronounced.
These four displays illustrate how different presentation formats can be created from one archival format. The XSL stylesheets displayed here can be applied to any XML file which conforms to the FIELD lexicon schema; thus they can be used to present lexical documentation of many different languages. To see stylesheets applied to a language with more complex morphology, see the presentation format of the Potawatomi lexicon.
- Get Started: Summary of Biao-Min Conversion
- Digitize Images: Digitizing Images page (Classroom)
- OCR or Keyboard Entry: OCR or Keyboard page (Classroom)
- Digitize Text: Lexical Analysis page (Workroom)
- Store Text: XML page (Classroom)
- Present Text: Stylesheets page (Classroom)
- Create Metadata: Metadata page (Classroom)