Audio Digitization Quality Issues

Introduction

Unlike paper records, which have a long lifespan, analog sound recording media are subject to deterioration within a few decades. Much of our information about endangered languages has been recorded on magnetic tapes, whether cassette or reel-to-reel, and these are far from permanent storage materials. Even DAT tapes, which store audio as digital data, are vulnerable to the same physical problems that affect analog tapes. Until recently, audio archivists handled this situation by periodically re-recording audio on fresh tapes. But as technology becomes increasingly digital, it is becoming difficult to purchase reel-to-reel tapes, and even cassettes are becoming less common. Furthermore, there is always a loss of quality when analog recordings are copied, while digital copies are identical in quality, provided that an error-checking routine is used to verify that the copy matches the original. Digitization provides a better method for preservation of audio recordings, but it is important to realize that digital files will also require periodic migration to new physical media (e.g., from CDs to DVDs, or from one server to another) and to new file formats (e.g., from WAV to whatever format succeeds it).
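One common way to carry out the error checking mentioned above is to compare checksums of the original file and its copy. The following is a minimal sketch, assuming SHA-256 as the checksum (the text does not name a particular routine); the function names `file_sha256` and `copies_are_identical` are hypothetical and chosen only for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large audio files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_are_identical(original, copy):
    """True when both files have the same size and the same SHA-256 digest."""
    original, copy = Path(original), Path(copy)
    if original.stat().st_size != copy.stat().st_size:
        return False  # Different sizes: no need to hash.
    return file_sha256(original) == file_sha256(copy)
```

Storing the digest alongside the archival file also makes it possible to re-check the file after each later migration, not only at the moment of copying.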

To preserve the quality of the original recording, audio digitization requires an analog-to-digital converter (the sound cards available for personal computers are inadequate for this purpose). Just as the quality of digital images depends on resolution, color depth, and storage format, the quality of digital audio depends on the sampling rate and bit depth settings chosen for the procedure, and on the choice of a compressed or uncompressed storage format. Archival formats differ from the ones best suited to presentation on the Web, and phonetic research requires much higher quality than discourse analysis. These factors need to be considered when choosing a format.

Sampling Rate and Bit Depth

We hear sounds by perceiving continuously changing air pressure waves around us. When analog sound recordings are digitized, the pressure is sampled at frequent intervals, and the amplitude at each interval is captured as a number and stored as bits and bytes. The quality of the digital audio file depends on two factors. The sampling rate determines how many times per second the wave is measured. The bit depth is the sample size, that is, the range of possible numbers used to express each sample, and it affects the dynamic range of the audio file. Thus, a higher sampling rate and bit depth will provide a more faithful reproduction of the original recording. Minimal recommended standards are a sampling rate of 44.1 or 48 kHz (that is, 44,100 or 48,000 samples per second) and a 16-bit depth. 96 kHz and 24-bit depth are advised wherever possible, since this captures about three times as much data as 48 kHz at 16 bits (and requires correspondingly more storage space). The additional data will be especially useful for phonetic analysis. Even if you do not intend to use the audio file for this purpose, it is best to record at the highest quality possible, because there is no way of knowing what future researchers might want to do with it. Given limited time and resources, it is rarely possible to digitize an item more than once; therefore, it is desirable to use the best procedures available during the initial process.
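The trade-off between quality and storage space can be worked out with simple arithmetic. The sketch below assumes uncompressed PCM audio and ignores file headers; the function names are hypothetical, and the dynamic range figure uses the standard approximation of about 6 dB per bit, which is not stated in the text above.

```python
import math


def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels=1):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate_hz * bit_depth * channels // 8


def pcm_megabytes_per_hour(sample_rate_hz, bit_depth, channels=1):
    """Storage needed for one hour of audio, in megabytes (10**6 bytes)."""
    return pcm_bytes_per_second(sample_rate_hz, bit_depth, channels) * 3600 / 1_000_000


def approx_dynamic_range_db(bit_depth):
    """Theoretical dynamic range: 20 * log10 of the number of sample values."""
    return 20 * math.log10(2 ** bit_depth)


for rate, depth in [(44_100, 16), (48_000, 16), (96_000, 24)]:
    print(
        f"{rate} Hz / {depth}-bit mono: "
        f"{pcm_megabytes_per_hour(rate, depth):.1f} MB per hour, "
        f"about {approx_dynamic_range_db(depth):.1f} dB dynamic range"
    )
```

For mono audio this gives about 345.6 MB per hour at 48 kHz / 16-bit and 1036.8 MB per hour at 96 kHz / 24-bit; stereo recordings double these figures.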

Storage Format

After the initial digitization, the file can be stored in a number of formats. Uncompressed formats, such as WAV and AIFF, retain all of the original data. Another uncompressed format is the NIST SPHERE format, used by the Linguistic Data Consortium; the raw data is stored just as in WAV, but the header is different. These are the best formats for archival copies, but because of their size, they are unsuitable for presentation on the Web. Compressed formats, such as MP3, RealAudio (.ra), and Windows Media Audio (.wma), use different compression algorithms to discard parts of the signal judged to be inaudible to humans, resulting in much smaller files. It is important to realize that uncompressed files can later be compressed and stored in other formats for presentation, but the reverse is not true. That is, a WAV file can be compressed and stored as MP3 for presentation, but while it is technically possible to convert an MP3 file to WAV format, the data lost in the initial compression will not be restored. Compression also discards data that is important for phonetic analysis.

For long-term storage, choose an archival format that offers LOTS:

FORMAT   FILE EXT.   L   O   T   S   ACCEPTABLE BEST PRACTICE?
WAV      .wav        +   +   +   +   YES
AIFF     .aif        +   -   +   -   NO
MP3      .mp3        -   +   -   +   NO

Recording Equipment

When choosing equipment for fieldwork, look for the following features:

More on choosing equipment for fieldwork

Metadata for Digitized Audio

Metadata provides information about resources. Along with the general forms of metadata recommended for linguistic resources, it is often useful to include technical metadata specific to audio, including original medium (e.g., reel-to-reel), sampling rate and bit depth, digitization date, digitization software used, etc.
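Some of this technical metadata can be read directly from the file rather than typed in by hand. The following is a minimal sketch using the `wave` module from the Python standard library, which reads uncompressed PCM WAV files; the function name `wav_technical_metadata` and the dictionary keys are hypothetical, not part of any metadata standard.

```python
import wave


def wav_technical_metadata(path):
    """Read sampling rate, bit depth, channels, and duration from a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "sampling_rate_hz": rate,
            # getsampwidth() returns bytes per sample, so multiply by 8 for bits.
            "bit_depth": wav_file.getsampwidth() * 8,
            "channels": wav_file.getnchannels(),
            "duration_seconds": frames / rate if rate else 0.0,
        }
```

Details such as the original medium, the digitization date, and the software used are not stored in the WAV header and still need to be recorded separately.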

More on digitization of audio cassettes


The content of this page was developed following the recommendations from The NINCH Guide to Good Practice, The Collaborative Digitization Program, The Vincent Voice Library Digital Audio Specifications, and Audio Digitization for Archival Purposes.

See also Sound Directions: Best Practices for Audio Preservation.
