LINGUIST List 35.727

Sat Mar 02 2024

Review: The Typological Diversity of Morphomes: Herce (2023)

Editor for this issue: Justin Fuller <justin@linguistlist.org>



Date: 02-Mar-2024
From: Michael Maxwell <mmaxwell@umd.edu>
Subject: Morphology, Syntax, Typology: Herce (2023)

Book announced at https://linguistlist.org/issues/34.2138

AUTHOR: Borja Herce
TITLE: The Typological Diversity of Morphomes
SUBTITLE: A Cross-Linguistic Study of Unnatural Morphology
PUBLISHER: Oxford University Press
YEAR: 2023

REVIEWER: Michael Maxwell

SUMMARY

As the title suggests, this book is a discussion of morphomes. In particular, this work takes a typologically wider view than most discussions of the topic, which have mostly emphasized morphomes in Romance languages. The author summarizes his goal (pg.263): "...to advance our understanding of precisely which conditions and forces are operating when unnatural morphosyntactic patterns do manage to get established and successfully replicated in a language."

I will start with a diversion, by defining "morphome", lest some readers consider my spell checker (along with me) to be defective. Morphemes (with an 'e' in the second syllable) are usually considered to be minimal sequences of phonemes that bear some meaning, to include roots and affixes. Immediately there are issues, such as affixes that consist of suprasegmentals, discontinuous affixes (like circumfixes) and roots (as known from Semitic languages), perhaps zero (null) affixes, and so on. Morphemes may also take the form of allomorphs, which are often phonologically conditioned, although they may also be lexically conditioned (as in inflection classes). All this should be familiar to most readers of this review.

Crucially for the present discussion, affixal morphemes are usually considered to have a meaning which reflects natural classes of morphosyntactic features. Inflectional affixes which mark person and number, for instance, typically indicate a contiguous part of the inflectional paradigm: a language may have one suffix marking singular and another marking plural for all persons; or a first person suffix, a second person suffix, and a third person suffix; or affixes marking a combination of features, such as first person singular, and so forth. Likewise, where stems have allomorphs, those allomorphs are usually phonologically distributed, or else distributed (like affixes) according to natural morphosyntactic classes. (In the latter case, the stem allomorphs are sometimes treated as a single allomorph modified by an infix, suprafix, or some other kind of affix.)

One might expect that this is not only commonplace, but that it is the only way languages work--that each affix (I will return to stems momentarily) represents a single set of feature values, i.e. "this feature value AND this feature value AND...", such as [+A -B]. One would be wrong. There are many ways this expectation is violated. Perhaps most familiarly, there are affixes which mark the "elsewhere" case; that is, a paradigm will encode some set of feature combinations, but only one or a few of these combinations will be marked by a single affix, and the remaining feature combinations will be marked by another affix. English present tense verbal morphology comes close to this: there is one -s suffix that marks the third person singular present tense, while all other present tense person/ number combinations are either unmarked or marked by a null affix, depending on your theory. Shuar (a Chicham language of Ecuador) is even clearer; there are distinct possessive suffixes for first person singular and second person singular, plus an additional suffix for everything else.

More startling are affixes which encode more than one set of morphosyntactic features, like [[+A +B -C] OR [+A -B +C]]. This is a form of syncretism ("cases in which the same phonological string is used to express distinct combinations of morphological features"--Albright and Fuß 2012), although sometimes analyzed as two or more distinct but (accidentally) homophonous affixes. An example would be Latin neuter second declension nouns, where the suffix for the nominative, vocative and accusative is -um, while the remaining cases--genitive, dative and ablative--have other suffixes. Assuming a simple feature system, this -um suffix would encode
[Number singular
  [ [Case nominative] OR
    [Case vocative] OR
    [Case accusative]
  ]
]
Crucially, this disjunction of nominative or vocative or accusative case appears to be a non-natural class, i.e. there is nothing that these three cases have in common that the remaining cases do not--hence the need for the "OR". (One could of course argue that some funky features make this a natural class, or that Latin had three different but homophonous -um suffixes.)
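The notion of a non-natural class can be made concrete with a minimal Python sketch. It assumes a deliberately simple feature system (six cases, two numbers) and treats a set of paradigm cells as a natural class only if a single conjunction of feature values picks out exactly those cells. The names `ALL_CELLS`, `shared_features` and `is_natural_class` are hypothetical, introduced here for illustration; they come neither from Herce's book nor from any library.

```python
from itertools import product

# Assumption: a simplified feature system in which each cell is a case/number pair.
CASES = ["nominative", "vocative", "accusative", "genitive", "dative", "ablative"]
NUMBERS = ["singular", "plural"]
ALL_CELLS = [{"case": c, "number": n} for c, n in product(CASES, NUMBERS)]


def shared_features(cells):
    """Return the feature values common to every cell in the list."""
    shared = dict(cells[0])
    for cell in cells[1:]:
        shared = {f: v for f, v in shared.items() if cell.get(f) == v}
    return shared


def is_natural_class(cells, universe=ALL_CELLS):
    """True if one conjunction of feature values selects exactly these cells."""
    shared = shared_features(cells)
    matching = [c for c in universe if all(c[f] == v for f, v in shared.items())]
    return len(matching) == len(cells) and all(c in cells for c in matching)


# The cells marked by Latin -um in the example above.
um_cells = [
    {"case": "nominative", "number": "singular"},
    {"case": "vocative", "number": "singular"},
    {"case": "accusative", "number": "singular"},
]
all_singular = [c for c in ALL_CELLS if c["number"] == "singular"]

um_is_natural = is_natural_class(um_cells)            # needs the "OR"
singular_is_natural = is_natural_class(all_singular)  # plain [Number singular]
```

Under these assumptions the three -um cells share only [Number singular], which also matches the genitive, dative and ablative singular, so the check fails. The result depends entirely on the feature system chosen: adding a feature that groups nominative, vocative and accusative together would make the same cells a natural class.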

Perhaps still more surprising are instances where stem forms--not affixes--are used in paradigm cells which do not constitute natural classes. A commonly cited example is in Spanish, where for a subset of verbs, all person/ number combinations of the present subjunctive, plus the first person present indicative, contain a velar consonant at the boundary between the stem and the suffix, which consonant is not found in the rest of the paradigm: 'konoses', "you (sg.) know (indicative)", but 'konosko' "I know (indicative)", 'konoskas' "you (sg.) know (subjunctive)". (I use a phonemic rather than orthographic transcription of non-Castilian dialects.) Again, the portion of the paradigm where this happens is clearly not a natural class; if it were only the present subjunctive forms, it would be natural, but the inclusion of the first singular present indicative results in a non-natural class. Under the usual analysis, the velar consonant is treated as part of the stem, so the stem allomorph containing the velar consonant is a morphome. An alternative analysis is that the velar consonant is part of the suffix, but this would still be a morphome because of its distribution. This is only one of several such morphomes in Spanish and in Romance languages more generally, which seem surprisingly stable over centuries of language change.

So much for a brief explanation of the concept of morphome. The Romance language morphomes have been well studied, and their historical evolution is reasonably well understood. What the book being reviewed here brings to the table is brief descriptions of morphomes in other languages, indeed 79 languages across a wide variety of language families.

The brief first chapter introduces the concept of morphome in rather more detail than I have done, mentioning some of the issues that will come up, such as what a natural class is. The much longer second chapter discusses problems in identifying morphomes. Since the concept of morphome is pre-theoretical, belonging more to typology than to any particular theory of morphology or phonology, Herce casts a wide net, highlighting some of the questions that will come up when deciding whether a paradigm in some language does or does not contain morphome(s).

The third chapter, "Morphomes in diachrony", discusses how morphomes come about through language change. A number of different origins are described and illustrated by short case studies, with sound changes being perhaps the most common cause. The diachronic origins are revisited for most of the 79 languages of the next chapter, being omitted when there is not enough data (e.g. for language isolates).

The fourth chapter, "Morphomes in synchrony" (its title playing off that of the preceding chapter), is mostly a "database" (more on that term in my evaluation, later) of morphomes in 79 different languages of many different language families. The chapter begins with a brief description of Herce's criteria for inclusion (recapitulating some of the discussion in the second chapter). The bulk of the chapter consists of descriptions of morphomic patterns of individual languages, illustrated by slices of the paradigms, with cells containing morphomes highlighted (more on this formatting later). The chapter ends with a discussion that quantifies various properties of the languages' morphomes, including statistical properties abstracted across the language sample. One take-away here is that some morphomic patterns, expressed as disjunctions of morphosyntactic features, are much more common than others.

The fifth (very brief) chapter, "Implications," brings up some theory-based considerations, such as the place of morphosyntactic features and the resulting non-natural classes in morphology. (Interestingly, similar questions about features and non-natural classes have arisen in phonology, see for example Mielke 2008.)

The final chapter, "Conclusions", summarizes findings based on the data of Chapter Four, such as the fact that morphomes are found in many language families (not just Romance languages, where they had been widely studied); and the fact that morphomes are diachronically resilient, that is, they appear to last for generations of speakers--perhaps militating against the notion that they are somehow peripheral to language. (Probably the last part of Chapter Four and all of Chapter Five could have been combined with the contents of Chapter Six.)

EVALUATION

There is already a substantial literature on morphomes, much of it concerning the synchronic issues, dating back to before Mark Aronoff coined the term around 1994; in the older literature, it often comes under the rubrics of "irregularity", "exceptions", "rule features" and "diacritic features" in phonology and/or morphology (see e.g. Zonneveld 1978, and Harris 1978). In terms of stem-based morphomes, much of that discussion concerned Romance languages--to be sure, an important topic. The added value of this book is that it presents morphomes in many other languages. What the book does not attempt to do is to devise a theoretical explanation for the synchronic analysis of morphomes, although the synchronic place of morphosyntactic features (and thus the theory of such features) is briefly touched on in Chapter Five. This focus on the data seems to me a perfectly laudable goal.

That said, there is considerable discussion in this book of the diachronic origins of morphomes in many of the 79 languages examined. I am not a historical linguist, but most of the explanations appear at least plausible to me.

One thing that I found odd--although I understand the motivation--is that for the most part Herce does not distinguish lexical (root, stem, whole word) morphomes from affixal morphomes. To me, this distinction is crucial, since in most cases of affixal morphomes, there will be one or at most a couple such morphomes, and the homophonies can therefore often be argued to be unimportant accidents (syncretisms) which need not be explicitly addressed in the grammar. With lexical morphomes, by contrast, there is generally a much larger number, and some account must be given of them.

For example, in the Spanish case I brought up earlier, if the velar consonant is part of the suffix, then there are a handful of such morphomes: the first person singular present indicative '-ko', and the present subjunctive affixes, which are at most five, and which could possibly be reduced to one (-ka) under a more agglutinative analysis. (There are also voiced and unvoiced allomorphs, but these are phonologically predictable.) If the consonant is instead part of the stem, there are dozens of verbs whose stems take different forms in different parts of the paradigm. (Depending on your theory of phonology, the velar consonant might also be epenthetic, belonging neither to the stem nor the affix.) In fact in many cases, the affix vs. stem distinction is quite clear; another morphome in Spanish has to do with diphthongization of the stem-internal vowel, and it would take quite a contortion to call this monophthong--diphthong alternation an affix.

Moreover, in the case of affixal morphomes, the puzzle is why the same phonological string is used in distinct paradigm cells; whereas in the case of stem morphomes, the puzzle is the opposite: why the same phonological string is not used in all paradigm cells. Hence I would have categorized the instances of morphomes as stem/ root vs. affix vs. ambiguous, with possibly a separate category for suppletive whole word morphomes.

Some languages' descriptions are less clear than others; for example, I found the discussion of the Biak language confusing until I consulted the original source (Heuvel 2006). Herce refers to vowel length, but it is not obvious in his table (4.78) that there is vowel length--it turns out that the forms are cited in the Biak orthography, which uses an acute accent mark to represent length, a fact mentioned in the original source. Herce's discussion also describes an epenthetic vowel as unique to certain forms, but this is in fact phonologically predictable (Heuvel 2006: 27). (Herce's discussion also refers to this paradigm's affixes as suffixes; they are in fact prefixes.)

I have alluded above to the table formatting. Given that most readers (myself included) will know only a few of the languages, the choice of how the paradigms are "sliced" (you can't easily show the entire paradigm of the Spanish verb, for example, and it would only be confusing) is crucial; in this, I believe Herce has been eminently successful. Less successful is the use of shading. For tables where there is more than one morphome, the shading is inconsistent between tables and can be confusing. Coloring is used in only a handful of tables, and would have been welcome in many more. Coloring is used in a few figures, but could also have been used more widely. I realize that colored ink can be expensive, but the PDF (where color would be free) is like the printed book. A few tables show morpheme breaks (or the stem is bolded, but only in one table), which is also helpful; more (where these are more or less unambiguous) would have been even better.

There is a slightly misleading discussion of Zipf's Law on pg. 88, where it says "...more frequent words and meanings tend to be shorter. This is known as Zipf's (1935) law." I'm not sure what it means for a "meaning" to be shorter, but in any case Zipf's 1935 law does not refer to the length of words; rather, it is strictly about the relation between a word's frequency rank and its token frequency (specifically, the relative frequency of the Nth most frequent word is approximately 0.1/N, although there are other mathematical formulations). Zipf did discuss the length of words in his later (1945) work, claiming that more frequent words tend to be shorter, as measured in phonemes (English) or syllables (Latin); this principle has since been extended to many other languages, and has come to be known as "Zipf's Law of Abbreviation", or the "Brevity Law." (To be fair, Herce is not the only writer to collapse Zipf's Law based on rank with Zipf's Law of Abbreviation.)
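The difference between the two laws can be seen in a small Python sketch of the rank-based formulation. It uses the 0.1/N approximation quoted above; the function names and the toy token list are assumptions made purely for illustration, not data from Herce or Zipf.

```python
from collections import Counter


def zipf_expected(rank, constant=0.1):
    """Approximate relative frequency of the word at the given frequency rank."""
    return constant / rank


def rank_frequency(tokens):
    """Return (rank, word, relative token frequency) triples, most frequent first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [
        (rank, word, count / total)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


# Toy input, invented for illustration only; a real test needs a large corpus.
toy_tokens = "the the the of of a".split()
observed = rank_frequency(toy_tokens)

# Compare each observed frequency with the rank-based prediction.
comparison = [(rank, word, freq, zipf_expected(rank)) for rank, word, freq in observed]
```

Nothing in this computation looks at how long a word is: only rank and token frequency enter it. A check of the Law of Abbreviation would instead have to relate each word's frequency to its length in phonemes or syllables.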

Perhaps my greatest criticism of this work is that the data on individual languages (the bulk of Chapter 4) is referred to as a "database", but it is not. A database is contained in some clearly laid out format, a format which is computationally processable by sorting, filtering, extracting, adding and deleting entries, and perhaps other computations, depending on the kind of data (numeric data allows different sorts of processing than text data). Examples of such database formats include relational databases (of which SQL databases are the most common), spreadsheets, tab- or comma-delimited tables, and XML- and JSON-formatted data. Print documents and even PDFs are not databases. I emphasize this because the data gathered here is a goldmine, but because it is not a database, it is far less easy to work with than it deserves to be.
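As a rough illustration of what a computationally processable version of such data might look like, here is a minimal Python sketch using JSON. The record layout (the field names `language`, `exponent_type`, `description`, `cells`) and the cell labels are hypothetical, not Herce's; the two records simply restate the Spanish and Latin examples from the summary above.

```python
import json

# Assumption: one record per morphome, with its paradigm cells listed explicitly.
PERSON_NUMBER = [p + n for n in ("SG", "PL") for p in ("1", "2", "3")]

records = [
    {
        "language": "Spanish",
        "exponent_type": "stem",  # under the usual analysis
        "description": "velar consonant at the stem-suffix boundary",
        "cells": ["1SG.PRS.IND"] + [pn + ".PRS.SBJV" for pn in PERSON_NUMBER],
    },
    {
        "language": "Latin",
        "exponent_type": "affix",
        "description": "-um in neuter second declension nouns",
        "cells": ["NOM.SG", "VOC.SG", "ACC.SG"],
    },
]


def filter_records(records, **criteria):
    """Keep the records whose fields equal all of the given values."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def languages_with_cell(records, cell):
    """List the languages that have some morphome covering the given cell."""
    return sorted({r["language"] for r in records if cell in r["cells"]})


# Round-trip through JSON text, as one would when saving and reloading a file.
serialized = json.dumps(records, indent=2, ensure_ascii=False)
restored = json.loads(serialized)

stem_morphomes = filter_records(restored, exponent_type="stem")
```

With the data in this form, sorting, filtering and extraction become one-line operations, and a field such as `exponent_type` could record whether a morphome is a stem, an affix, or ambiguous between the two.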

There are also inconsistencies in the information given for each language in Chapter 4. Some of this is to be expected; it is difficult to surmise the diachronic origins of morphomes in language isolates, for example. But some omissions appear to be accidental: for many languages, there is a summary of the morphomic distribution, e.g. "Chinantec, L2: 1PL/2.Completive/3" (meaning the second alternation for "L", where "L" appears to refer to the Lealao variety of Chinantec, and the morphome exists in the three stated regions of the paradigm); but for many other languages, including Palantla Chinantec, there is no such summary.

There is at least one mention (pg.256) of "the supplementary materials that accompany this book." I did not find any indication of where these supplementary materials might be. The book in its entirety is available as an open-access PDF from https://academic.oup.com/book/45787, but there does not appear to be any link from there (including the citation on pg.256 in the PDF) to any supplementary materials. Herce's Google Scholar page (https://scholar.google.com/citations?user=FZ4EX7kAAAAJ) includes a link to his 2023 open access article in the journal Morphology, and this contains a few supplementary materials, but apparently not all the material that was used in this book. That journal article also includes a link to the searchable 2010 Oxford Online Database of Romance Verb Morphology, attributed to Martin Maiden and others, although as the title suggests this is restricted to Romance languages.

Typos appear to be minor, although obviously I could not check most of the language data for accuracy. There were a few errors in the bibliography and citations. The citation to "Harbour 2019" on pg. 257 does not appear in the bibliography, although there is a bibliographic entry for Harbour 2008, which may be the intended reference. There are a few places where an author's name has been given differently in different entries, resulting in misplaced entries.

Bottom line: Herce is exactly right to expand the discussion of morphomes to more languages and language families. Having read many grammars myself, I am amazed that he has managed to read so many, and astounded that he has condensed them in such a brief and insightful manner. No doubt there will be re-analyses of the data for some of these languages (it would not be the first time that someone mis-read or mis-transcribed a grammar), but in general Herce's descriptions appear sound, and for the few languages that I am familiar with, I can say that his descriptions are accurate. I do hope that the descriptions can be ported into a real database, probably in XML or JSON (spreadsheets are probably a poor way to represent the data in a searchable fashion, and a relational database would doubtless be a mess, to use the technical term).

Now it's time for the theorists to take into account the fruits of Herce's research. In particular, these results should inform work on natural classes in morphology (might one hope for spill-over into work on natural classes in phonology?), and on morphomes specifically. I have already mentioned the question of morphomic affixes vs. morphomic stems or other lexemes; while Herce's work does not immediately separate those cases (as discussed above), it would not be difficult to add that information, and the distinction will prove relevant to many theories. Another line of effort would be to investigate the extent to which the "elsewhere" principle can explain some of the patterns, since "elsewhere" is almost by definition not a natural class. I look forward to other theoretical advances coming out of this work--or equally, disconfirmations of previous theoretical proposals, perhaps on the typology of person/ number features.

REFERENCES

Albright, Adam, and Eric Fuß. 2012. "Syncretism". Pp. 236--288 in Trommer, Jochen (ed.) The Morphology and Phonology of Exponence. Oxford Studies in Theoretical Linguistics. Oxford: Oxford University Press.

Harris, James. 1978. "Two theories of non-automatic morphophonological alternations." Language 54(1): 41--60.

Herce, Borja. 2023. "Morphological autonomy and the long-term vitality of morphomes: stem-final consonant loss in Romance verbs and paradigmatic analogy." Morphology 33(2): 153--187. https://rdcu.be/dusPD.

Heuvel, Wilco van den. 2006. Biak: Description of an Austronesian Language of Papua. LOT 138. Utrecht: LOT. https://research.vu.nl/ws/portalfiles/portal/42174909/complete+dissertation.pdf (https://hdl.handle.net/1871/10282)

Mielke, Jeff. 2008. The Emergence of Distinctive Features. Oxford: Oxford University Press.

Zonneveld, Wim. 1978. A Formal Theory of Exceptions in Generative Phonology. Lisse: The Peter de Ridder Press.

ABOUT THE REVIEWER

Dr. Maxwell is a retired researcher in computational morphology and other computational resources for low density languages, formerly at the Center for Advanced Study of Language (later the Applied Research Laboratory for Intelligence and Security) at the University of Maryland. Before that he did research at the Linguistic Data Consortium at the University of Pennsylvania, and studied endangered languages of Ecuador and Colombia with the Summer Institute of Linguistics.



