Chris Hellmuth, Alexander Nakhimovsky & Tom Myers, Colgate University
Linguist's Toolbox and XML Technologies
Toolbox (available at http://www.sil.org/computing/toolbox/) is a powerful linguistic tool, but its facilities for viewing data on screen and for printing are somewhat limited and depend on proprietary Microsoft formats (.DOC, .RTF). The rapid development of Web and XML technologies has created many new opportunities for information storage, query, and display. Thanks to the new XML-export feature of Toolbox, these technologies can now be deployed to make Toolbox data available on the Web, presented in a structured way within the browser, and stored in a relational database that can be queried by the back-end code of the web server. This paper presents two web applications that demonstrate possible uses.
The second application adds a web server and a relational database to the framework. (The browser, the server, and the database can all be on the same laptop, or they can be on three different computers separated by great distances: the same software functions in both situations.) Like the first application, it converts Toolbox XML exports (e.g., an interlinear text) to a structured XHTML format that conforms to a "microformat" to be developed for that purpose (see http://microformats.org). OLAC metadata collected within Toolbox will be converted to the OLAC-established XML format, validated, and also converted to an XHTML microformat for Web display. Both the linguistic data and the OLAC metadata will be stored in a database for querying and display. The relational database can serve both as an archive and as a scholarly resource available on the Web. The ultimate goal of our effort is to provide a smooth, almost completely automated path from field data and analysis to a Web-accessible, OLAC-compatible repository that is also a convenient resource for linguists everywhere (subject to intellectual property protections).
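The pipeline described above (XML export, then structured XHTML, then a queryable relational store) can be illustrated with a minimal Python sketch. It is not the implementation presented in this paper. The element names (refGroup, ref, word, tx, mb, ge, ft) only loosely mirror common Toolbox field markers and are assumptions about the export, the "igt-" class names stand in for a microformat that has yet to be defined, the table layout is hypothetical, and the sample record is invented for illustration. OLAC metadata handling is omitted.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample record. Element names are assumptions loosely based on
# Toolbox field markers (\ref, \tx, \mb, \ge, \ft), not the real export schema.
SAMPLE_EXPORT = """
<database>
  <refGroup>
    <ref>T1.001</ref>
    <word><tx>kitabu</tx><mb>ki-tabu</mb><ge>CL7-book</ge></word>
    <word><tx>kizuri</tx><mb>ki-zuri</mb><ge>CL7-good</ge></word>
    <ft>a good book</ft>
  </refGroup>
</database>
"""

# Hypothetical mapping from export elements to microformat-style class names.
LINE_CLASSES = (("tx", "igt-text"), ("mb", "igt-morphemes"), ("ge", "igt-gloss"))


def to_xhtml(ref_group):
    """Render one interlinear unit as an XHTML fragment with class-based markup."""
    div = ET.Element("div", {"class": "igt", "id": ref_group.findtext("ref", default="")})
    for word in ref_group.findall("word"):
        column = ET.SubElement(div, "span", {"class": "igt-word"})
        for tag, css_class in LINE_CLASSES:
            line = ET.SubElement(column, "span", {"class": css_class})
            line.text = word.findtext(tag, default="")
    free = ET.SubElement(div, "p", {"class": "igt-translation"})
    free.text = ref_group.findtext("ft", default="")
    return ET.tostring(div, encoding="unicode")


def store(conn, root):
    """Store each unit (with its rendered XHTML) and each word in the database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS unit "
        "(ref TEXT PRIMARY KEY, translation TEXT, xhtml TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word "
        "(ref TEXT, position INTEGER, form TEXT, morphemes TEXT, gloss TEXT)"
    )
    for group in root.findall("refGroup"):
        ref = group.findtext("ref", default="")
        conn.execute(
            "INSERT INTO unit VALUES (?, ?, ?)",
            (ref, group.findtext("ft", default=""), to_xhtml(group)),
        )
        for position, word in enumerate(group.findall("word")):
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?, ?, ?)",
                (
                    ref,
                    position,
                    word.findtext("tx", default=""),
                    word.findtext("mb", default=""),
                    word.findtext("ge", default=""),
                ),
            )
    conn.commit()


def refs_with_gloss(conn, pattern):
    """Return the references of all units containing a gloss that matches a LIKE pattern."""
    rows = conn.execute(
        "SELECT DISTINCT ref FROM word WHERE gloss LIKE ? ORDER BY ref", (pattern,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    store(connection, ET.fromstring(SAMPLE_EXPORT))
    for ref in refs_with_gloss(connection, "%book%"):
        xhtml = connection.execute(
            "SELECT xhtml FROM unit WHERE ref = ?", (ref,)
        ).fetchone()[0]
        print(ref, xhtml)
```

Storing the rendered XHTML next to the word-level rows lets the server answer a query from the relational data and return display-ready markup without re-running the conversion. In a deployed setting the in-memory SQLite connection would be replaced by whichever database the web server uses.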
We greatly appreciate the generous help of Alan and Karen Buseman, Joan Spanne, and Gary Simons, all of the Summer Institute of Linguistics. Tom Myers of N-Topus Software (www.n-topus.com) has, as so often, provided invaluable advice. This research was supervised by Dr. Alexander Nakhimovsky and supported in part by Colgate University.