Potawatomi Case Study: From Filemaker Data to the Web
- Scanning and OCR
- Filemaker Pro: The legacy database
- The FIELD database
- Data conversion
- Terminology mapping
- Text storage
- Text presentation
- Follow the path of the Potawatomi data
In the fall of 2003, several years after developing a Filemaker Pro database for her Potawatomi data, Dr. Laura Buszard-Welcher began working with E-MELD to use her data to showcase Potawatomi as one of the ten languages in the E-MELD School of Best Practices. A sample set of data was exported from Filemaker Pro as a tab-delimited file and uploaded to the FIELD database. This served as a test of the FIELD upload function, and it also presented special challenges in data conversion: the Potawatomi documentation required modifications to both the FIELD database and the GOLD linguistic ontology. The outcome was a lexicon in XML format that can be printed as a dictionary through the use of XSLT stylesheets.
Dr. Buszard-Welcher received the initial data as a printout of the database of Dr. John Nichols, an Algonquianist who worked on the Potawatomi language in the 1970s. She scanned the printout and ran an OCR application to convert the page images into machine-readable text. The file was then manually formatted into tab-delimited text. In Filemaker Pro, she created a flat file data structure to house the imported data, and she then used the program to edit and expand the lexicon while working with Potawatomi speakers.
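The step from OCR output to tab-delimited text is easy to check programmatically. The Python sketch below reads tab-delimited lexicon text with the standard `csv` module and sets aside any row whose field count does not match the header, a typical symptom of OCR errors or slips in manual formatting. The function name and the column names in any sample data are hypothetical; the actual layout of the Nichols printout and of the Filemaker Pro export is not described here.

```python
import csv
import io


def read_tab_delimited(text):
    """Parse tab-delimited lexicon text into a list of dicts.

    Returns (records, problems). Rows whose field count differs from the
    header row are collected in `problems` as (line_number, fields) pairs
    so they can be checked against the scanned original.
    """
    # QUOTE_NONE: treat quotation marks in glosses as ordinary characters.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        return [], []
    records, problems = [], []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != len(header):
            problems.append((line_no, row))
        else:
            records.append(dict(zip(header, row)))
    return records, problems
```

Passing `quoting=csv.QUOTE_NONE` keeps quotation marks inside glosses as literal text instead of treating them as field delimiters, which also guarantees that each row corresponds to exactly one line of the file.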
Although Filemaker Pro can be a useful software tool, it provides an interesting test case for best practice standards. Its drawbacks include its lack of Unicode support and the fact that it is proprietary software. Furthermore, Filemaker Pro does not restrict the linguistic terminology used, e.g. for concepts such as parts of speech and feature values. Although linguists often prize this flexibility, idiosyncratic terminology jeopardizes the long-term intelligibility of the documentation. For that reason, recommended best practice is to define terminology with reference to a standard ontology of linguistic concepts, such as GOLD (General Ontology for Linguistic Description).
FIELD was developed specifically for entering lexical data in best practice format: it is Unicode-compliant, it can output the data as an XML document, and because it is linked to GOLD, it automates the task of defining terminology with reference to the ontology.
The structure of the legacy Filemaker Pro database created some challenges for uploading the data into FIELD. Because the database was designed as a flat file, every record had the same underlying structure, with a total of six possible fields for inflected forms (see the entry display figure). During data entry in Filemaker Pro, the researcher had to select a particular part of speech (see the entry forms figure), and this choice determined the labels for the inflected forms (see the animate intransitive verb example). Each part of speech had different requirements for inflected forms: for example, the first inflected form for an animate noun would be a plural form, but the first inflected form for an animate intransitive verb would be a first person singular form. As a result, different kinds of data were stored in the same database column. Such mixing is not recommended in database design, and it meant that each part of speech had to be uploaded into FIELD separately. Second, columns that had been created to hold the labels of these fields in the flat file (such as 'ni' for inanimate noun) had to be ignored during the upload.
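The problem of one column holding different kinds of data can be illustrated with a small Python sketch. It assumes hypothetical generic column names `form1` through `form6`, a hypothetical label-only column `pos_label`, and hypothetical part-of-speech abbreviations; only the first slot meaning for each part of speech (plural for animate nouns, first person singular for animate intransitive verbs) comes from the description above. The code groups records by part of speech, renames the generic columns to what they mean for that part of speech, and skips the label column, so that each group could be uploaded on its own.

```python
from collections import defaultdict

# Hypothetical names for the six generic inflected-form columns of the flat file.
GENERIC_COLUMNS = [f"form{i}" for i in range(1, 7)]

# Hypothetical column that only held a display label (e.g. 'ni') and is ignored.
LABEL_COLUMNS = {"pos_label"}

# What each generic column means for a given part of speech. Only the first
# slot of each list follows the case study; a real table would name all six.
SLOT_MEANINGS = {
    "na": ["plural"],           # animate noun
    "vai": ["first_singular"],  # animate intransitive verb
}


def split_by_pos(records):
    """Group flat-file records by part of speech, renaming generic columns.

    Returns {pos: [record, ...]}. Raises ValueError if a part of speech has
    no slot table, or if a filled generic column has no known meaning.
    """
    groups = defaultdict(list)
    for rec in records:
        pos = rec["pos"]
        meanings = SLOT_MEANINGS.get(pos)
        if meanings is None:
            raise ValueError(f"no slot meanings defined for part of speech {pos!r}")
        out = {}
        for key, value in rec.items():
            if key in LABEL_COLUMNS:
                continue  # label-only column: skip during upload
            if key in GENERIC_COLUMNS:
                slot = GENERIC_COLUMNS.index(key)
                if slot < len(meanings):
                    out[meanings[slot]] = value
                elif value:
                    raise ValueError(f"{key} is filled but has no meaning for {pos!r}")
            else:
                out[key] = value
        groups[pos].append(out)
    return dict(groups)
```

Raising an error for a filled but unmapped column, rather than silently dropping it, keeps data from being lost when the slot table is incomplete.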
Although it was possible to overcome the challenges presented by the Filemaker Pro database, the experience serves as a reminder that language documentation presents special challenges to database designers. Wherever possible, it is important to follow recognized principles of database design. (For more on the use of databases in linguistics, see the E-MELD 2004 Workshop on Linguistic Databases and Best Practice.)
Figure: Example of the Filemaker Pro entry display.
Figure: The different entry forms created to accommodate the flat file database.
Figure: Example of an entry for an animate intransitive verb.
A major part of the uploading process involved mapping Potawatomi grammatical terms to the GOLD ontology being developed by E-MELD. A number of areas of the ontology were modified and expanded by the GOLD team based on the requirements of Potawatomi grammatical features, including gender, size, evaluation, polarity, proclitics, phrase units (main and subordinate clause forms) and participles.
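Terminology mapping can be pictured as a lookup from each project-specific label to a shared concept, with every label that has no mapping collected for review; in the Potawatomi project, gaps of that kind were resolved by extending GOLD itself. In this minimal Python sketch the abbreviations, the function name, and the `hypothetical:` concept names are placeholders, not actual GOLD identifiers.

```python
# Project-specific labels mapped to ontology concepts. The abbreviations and
# the "hypothetical:" concept names are placeholders, not real GOLD identifiers.
TERM_MAP = {
    "na": "hypothetical:AnimateNoun",
    "ni": "hypothetical:InanimateNoun",
    "vai": "hypothetical:AnimateIntransitiveVerb",
}


def map_terms(records, field="pos"):
    """Attach a concept to each record and list the labels that have none.

    Returns (mapped_records, unmapped_labels). Unmapped labels need a
    decision: map them to an existing concept or extend the ontology.
    """
    mapped, unmapped = [], set()
    for rec in records:
        label = rec[field]
        concept = TERM_MAP.get(label)
        if concept is None:
            unmapped.add(label)
        else:
            mapped.append({**rec, "concept": concept})
    return mapped, sorted(unmapped)
```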
With the Potawatomi sample data housed in the FIELD database, Dr. Laura Buszard-Welcher can now modify and edit the lexicon using the FIELD tool. The data can also be exported from FIELD as an XML file. XML (eXtensible Markup Language) defines a standard way of encoding the structure of information in plain text. It is an open standard of the World Wide Web Consortium based on extensible tags ("extensible" meaning that the tags are not predefined but can be defined by the creator). XML is currently considered best practice for the archival encoding of textual data because it does not depend on any particular software and can be formatted through an XSL stylesheet for display in many formats. Furthermore, it is generally more self-descriptive than other electronic formats, which should make it more accessible to future generations.
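As an illustration of XML export, this sketch uses Python's standard `xml.etree.ElementTree` module to serialize lexicon entries. The function name and the element names (`lexicon`, `entry`, `headword`, `pos`, `gloss`) are hypothetical and are not FIELD's actual export schema. Serializing with `encoding="unicode"` returns a Python string, so non-ASCII characters are preserved, and markup characters such as `&` are escaped automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; FIELD's real export schema may differ.
FIELDS = ("headword", "pos", "gloss")


def lexicon_to_xml(entries):
    """Serialize a list of entry dicts to an XML string (no XML declaration)."""
    root = ET.Element("lexicon")
    for item in entries:
        entry = ET.SubElement(root, "entry")
        for name in FIELDS:
            if name in item:  # omit elements for missing fields
                ET.SubElement(entry, name).text = item[name]
    # encoding="unicode" returns str, keeping non-ASCII characters as-is.
    return ET.tostring(root, encoding="unicode")
```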
XSL stylesheets are used to create example displays of the documentation, and they make it easy to produce documents in other presentation formats. A stylesheet transforms an XML document into a different file format (for instance, HTML, plain text, or PDF) without changing the original XML document. Different stylesheets could transform the same XML lexicon into a learner's dictionary or an academic dictionary, in online or printed versions. The project thus demonstrates the flexibility afforded by best practices: the first Potawatomi dictionary can be created digitally from the FIELD database, the exported XML file, and stylesheets. Below, you can see the same XML document transformed by XSL to provide print and online versions of the lexicon.
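Python's standard library does not include an XSLT processor, so the sketch below substitutes two plain Python rendering functions to illustrate the single-source idea rather than XSL itself. Both read the same XML: one produces an HTML fragment for online display, the other a plain-text listing for print, and neither modifies the source. The function names and the element names `entry`, `headword`, `pos`, and `gloss` are hypothetical. In the actual project these roles are played by XSL stylesheets applied by an XSLT processor.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_html(xml_text):
    """Online view: an HTML definition list with one item per entry."""
    root = ET.fromstring(xml_text)
    items = []
    for entry in root.iter("entry"):
        headword = escape(entry.findtext("headword", ""))
        gloss = escape(entry.findtext("gloss", ""))
        items.append(f"<dt>{headword}</dt><dd>{gloss}</dd>")
    return "<dl>" + "".join(items) + "</dl>"


def render_text(xml_text):
    """Print view: headword, part of speech and gloss on one plain-text line."""
    root = ET.fromstring(xml_text)
    lines = []
    for entry in root.iter("entry"):
        headword = entry.findtext("headword", "")
        pos = entry.findtext("pos", "")
        gloss = entry.findtext("gloss", "")
        lines.append(f"{headword} ({pos}): {gloss}")
    return "\n".join(lines)
```

Keeping presentation logic entirely outside the data is the design choice that matters here: a new dictionary format means writing a new renderer (or stylesheet), not editing the lexicon.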
- Get started: Summary of the Potawatomi conversion
- Scan and OCR: OCR or Keyboard page (Classroom)
- Linguistic Review of Filemaker Pro: Filemaker Pro (offsite)
- Add Text to Database: FIELD page (Workroom)
- Convert Data: Conversion page (Classroom)
- Map Terminology: GOLD Ontology (Workroom)
- Store Text: XML page (Classroom)
- Present Text: Stylesheets page (Classroom)