NZSM Online

Feature

How Fat is an Idiom?

Prepackaged phrases for all occasions are part of everyone's vocabulary. Is there a natural limit on how big they can be?

Koenraad Kuiper

Idioms, such as drive someone up the wall, are one of many kinds of linguistic fragments that we, as speakers of a natural language, carry in our heads. These fragments include speech formulae used for particular social purposes such as greeting and apologising, slogans, advertising jingles, proverbs and the like. Some such fragments may be poems learned at school, the complete lyrics of the Beatles or the libretto of The Pirates of Penzance.

Such longer texts are not themselves used to construct sentences, although certain phrases from them might be. Many phrases from Shakespeare and the Bible have come to be used as constituent parts of sentences which are otherwise new and original. It is useful, on those grounds, to make a distinction between those linguistic fragments which are, like words, used to construct new sentences and those which are held in memory for their own sake. Memorised phrases used in constructing new sentences can be called "phrasal lexical items".

Such phrasal lexical items are normally perfectly ordinary grammatical phrases. Look at cases such as take notice of something, shoot from the hip, stick it up your jumper. A speaker of English could make up innumerable phrases with the same grammatical structure. In that respect these idioms are no different from freely created phrases.

What makes them different is that idioms have an unpredictable meaning as well as their literal meaning. In a Western there might be a cowboy who literally shoots from the hip. But the same expression is also used when someone says something which is direct, rapid and perhaps not well considered. That meaning has become institutionalised so that speakers of English learn it as a special meaning of this expression. That means the expression must be stored in memory along with its specialised and unpredictable meaning.

Phrasal lexical items are essentially like words in having entries in a speaker's mental dictionary. Clearly all the words anyone knows as a speaker must be held in memory in what we can think of as a mental dictionary or lexicon. That is what it means to say that we know a word. What makes phrasal lexical items different from words is that they have grammatical structure where words do not.

That's an Interesting Question

There are many interesting questions to be answered about phrasal lexical items. How many of them do we know? How do we learn them? What do we need them for? One question which interests me is how big they can be. Why bother asking about something's size? Consider an analogy. In the universe there are heavenly bodies of great variety. It is interesting to ask just how big the biggest of these are. If we look at the mental lexicon as a kind of universe, namely the universe of natural lexical items, then asking how big the biggest of these might be is just as interesting to a linguist. But it is also interesting to a psychologist, since it would tell a psychologist something about the size of "chunks" of information held in memory. Humans hold everything in memory in chunks of various sorts. The size of such chunks tells us something about the way in which human memory stores and retrieves the items in it.

If we were to look at the size of phrasal lexical items it might be that any grammatically possible phrase in a language could be "lexicalised", that is, placed in the speaker's lexicon. That would be to suppose that idioms could be very "fat" indeed. Or it might be that only some such phrases are possible phrasal lexical items, which is to suppose that idioms might be "thin" in some way.

One way to check is to use a database of phrasal lexical items and test theories against this data. To do that the data should show the structural properties of these items, since it might turn out that the size of idioms is a property of their structure.

Some years ago Brenda Zanetti and I set about annotating the idioms contained in the Oxford Dictionary of Current Idiomatic English. This involved taking each of the 20,000-odd entries and providing it with a labelled bracketed grammatical annotation. Enough of this has now been done to provide a reasonable testing ground for hypotheses about the structure of idioms.

What is a labelled bracketed notation? The grammar of a phrase consists of two components: a label for every grammatical constituent of the phrase and a location for that constituent unit in a hierarchy of units. For example, the phrase the very old man is a noun phrase which consists of the definite article, the, followed by an adjective phrase, very old, followed by a noun, man. The adjective phrase consists of an adverb, very, and an adjective, old. Such structure is normally visualised as a tree diagram.

The same structural information can be represented by putting each constituent of the phrase within brackets and labelling the brackets with the label of the unit within the brackets: [Noun Phrase the very old man]. Abbreviating the labels the full structural information would look like this:

[NP[Art the][AP[Adv very][Adj old]][N man]].

While this is harder to "read" than a tree diagram, it is easier to search by computer: string searching can automatically find, for example, all the noun phrases which have an adjective phrase within them. On the strength of this capacity, a set of idioms with this kind of annotation can be used to test hypotheses about the structural properties of phrasal lexical items. Specifically, if a given search returns no cases then one might be led to think that such structural configurations are unlikely to occur or perhaps impossible for humans to store in their lexicons.
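As a rough illustration of the kind of search described above, here is a minimal Python sketch. It is not the software used for the annotated dictionary; the names Node, parse, subtrees and contains are invented for this example, and it assumes that labels contain no spaces or brackets and that the bracketing is well formed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # e.g. "NP", "Adj"
    children: list = field(default_factory=list)   # nested constituents
    words: list = field(default_factory=list)      # words directly inside this bracket


def parse(text):
    """Parse a labelled bracketing such as '[NP[Art the][N man]]' into a Node."""
    pos = 0

    def read_token():
        # Read characters up to the next bracket or space.
        nonlocal pos
        start = pos
        while text[pos] not in "[] ":
            pos += 1
        return text[start:pos]

    def read_node():
        nonlocal pos
        pos += 1                          # skip the opening bracket
        node = Node(read_token())         # the label follows immediately
        while text[pos] != "]":
            if text[pos] == "[":
                node.children.append(read_node())
            elif text[pos] == " ":
                pos += 1
            else:
                node.words.append(read_token())
        pos += 1                          # skip the closing bracket
        return node

    return read_node()


def subtrees(node):
    """Yield the node itself and every constituent nested inside it."""
    yield node
    for child in node.children:
        yield from subtrees(child)


def contains(tree, outer, inner):
    """True if some `outer` constituent has an `inner` constituent within it."""
    for n in subtrees(tree):
        if n.label == outer:
            if any(d.label == inner for c in n.children for d in subtrees(c)):
                return True
    return False


tree = parse("[NP[Art the][AP[Adv very][Adj old]][N man]]")
print(contains(tree, "NP", "AP"))   # a noun phrase with an adjective phrase inside
print(contains(tree, "AP", "NP"))   # the reverse configuration
```

Run over a whole collection of annotated idioms, a search like contains that never succeeds for some configuration would be the kind of negative result the text describes.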

The Theory

We might begin by supposing that humans can place in memory any phrase which is grammatically well-formed in their language and use it as a lexical item. Such phrases would be potentially infinitely fat. Fortunately that theory is wrong in one very simple way -- phrases can be infinitely long and a brain is finite, so it will not store infinitely long phrases. But the theory that a lexicon can store any phrase so long as it is not infinite is not a very interesting theory because it does not limit the set of possible phrasal lexical items greatly. Possible phrasal lexical items could still be very fat indeed, for example, ten million words long. It seems unlikely that humans are going to place in memory and use as lexical items phrases ten million words long.

Another possibility is to suppose that there is a string length limit on phrasal lexical items. Possibly lexical items might be no longer than x words where x is less than some arbitrary number. That is one way of thinning idioms down. Selecting a smallish number such as five will not be plausible since it is easy to come up with phrasal lexical items which are longer -- get one's knickers in a twist. In fact the selection of some number just because we can find no phrasal lexical items which are longer seems rather arbitrary. We might pick 100 words, or 98, as the upper limit, but that does not explain much. What we want is a mechanism which predicts why phrasal lexical items are as thin as they seem to be and no fatter.

One plausible approach is to look to the grammatical structure of phrases and see if some constraints on that are operating to keep phrasal lexical items thin. To do that involves looking at the general structural properties of phrases. According to many theories of phrase structure, all phrases have a single constituent which is the head of the phrase. For example, in a noun phrase, one noun is the head of the phrase. In the case of the tree's very large branches it is branches and not tree which is the head of the phrase. Heads determine many of the grammatical properties of the phrase, for example its number. The example phrase is plural because branches is plural.

Another property of heads of phrases is that they have the capacity to select other parts of the phrase. The classic case of such selection is the way in which certain verbs select their objects. Take, for example, the verb avoid. It will not do in English to just say someone avoided. If there is avoiding being done, then something has to be avoided. The something being avoided is termed the complement of the verb or, in traditional terminology, the direct object of avoid. So what is being avoided is an obligatory part of the verb phrase and is selected by the verb.

One simple proposal which would limit the structural complexity of phrasal lexical items is that such lexical items consist only of a word which is the head of the phrase and another word which is the head of a phrase the first word selects. That would create a minimal skeleton since phrases must have heads and for anything to be a phrasal lexical item rather than a single word lexical item there must be a second word.

If that were the head of a phrase as well, and if the head of the phrasal lexical item had to select it anyway then one would have as thin a phrasal lexical item as possible with only two obligatory constituents, both of them single words.

Interestingly, quite a large proportion of phrasal lexical items have exactly this structure. Look at phrases like see stars, hold hands, pull rank and lock horns. Each of these is a verb phrase where the head verb is selecting the head of its object, the thing being seen, held, pulled and locked respectively. We can call this kind of selection "minimal lexical selection" since the head of the phrasal lexical item is not only selecting the grammatical form of its complement but also its lexical content, the actual word which is the head of the complement. A theory which proposes that the only kinds of phrasal lexical items which humans store in their heads are ones where there is lexical selection of heads is a very restrictive theory, since lots of phrases are ruled out.
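The shape that minimal lexical selection picks out can be stated as a small check. The Python sketch below is only an illustration under stated assumptions: phrases are written as nested (label, content) pairs rather than bracketed strings, the head is assumed to come first, and the table HEAD_OF, the function names and the bracketing given for shoot from the hip are all invented for this example rather than taken from the annotated dictionary.

```python
# Assumed head category for each phrase type (illustrative only).
HEAD_OF = {"VP": "V", "NP": "N", "PP": "P", "AP": "Adj"}


def is_word(node, category):
    """True if node is a single word of the given category, e.g. ("V", "see")."""
    label, content = node
    return label == category and isinstance(content, str)


def is_minimal_selection(phrase):
    """True if the phrase is just a head word plus a complement phrase
    that itself consists of nothing but its own head word."""
    label, parts = phrase
    if label not in HEAD_OF or isinstance(parts, str) or len(parts) != 2:
        return False
    head, complement = parts
    comp_label, comp_parts = complement
    if comp_label not in HEAD_OF or isinstance(comp_parts, str) or len(comp_parts) != 1:
        return False
    return is_word(head, HEAD_OF[label]) and is_word(comp_parts[0], HEAD_OF[comp_label])


see_stars = ("VP", [("V", "see"), ("NP", [("N", "stars")])])
shoot_from_the_hip = ("VP", [("V", "shoot"),
                             ("PP", [("P", "from"),
                                     ("NP", [("Art", "the"), ("N", "hip")])])])

print(is_minimal_selection(see_stars))            # head verb plus head noun only
print(is_minimal_selection(shoot_from_the_hip))   # the complement has further structure
```

On this strict reading see stars passes and shoot from the hip does not, which shows how much such a theory excludes.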

Theory Up the Creek

Unfortunately the theory is too strong: it rules out too many phrases. If we look at a phrase like up the creek without a paddle then there are two problems. First, creek does not require a following phrase such as without a paddle. Nor does the idiom, since we can just use up the creek. So in this respect the data suggests that phrasal lexical items can have optional constituents also held in memory. How many such optional but lexically complete constituents can an idiom have? The answer appears to be very few. The data suggests that, if an idiom has them at all, it will almost invariably have one and only one optional addition.

What of "the" in up the creek? It is not optional. (The idiom up the creek could not be changed to up a creek or up ten creeks and still be an idiom.) Here it depends which grammatical theory one follows. There are grammatical theories which say that "the" is also the head of a phrase and that such phrases have noun phrases as obligatory complements. So then creek would be the complement of "the". In that case up would select the head of its complement, which would be "the" and, in turn, "the" would select the head of its complement phrase, creek, creating a selection chain, one head selecting the next. Other grammatical theories have words like "the" as non-head parts of noun phrases and, if one accepts these grammatical theories, then "the" is not required to be selected but happens to be in this particular case.
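The selection chain under the first kind of theory can be traced mechanically. The Python sketch below is illustrative only: the labels D and DP for the article and the phrase it heads are assumptions (the text does not name them), phrases are written as nested (label, content) pairs with the head assumed to come first, and selection_chain is a name invented for this example.

```python
def selection_chain(phrase):
    """Follow head-to-complement links and return the head words in order.

    Each phrase is (label, parts), where parts[0] is the head word
    and parts[1], if present, is the complement phrase the head selects.
    """
    chain = []
    while True:
        label, parts = phrase
        head_label, head_word = parts[0]
        chain.append(head_word)
        if len(parts) < 2:        # no complement: the chain ends here
            break
        phrase = parts[1]
    return chain


# up the creek, analysed with the article as head of its own phrase (assumed labels)
up_the_creek = ("PP", [("P", "up"),
                       ("DP", [("D", "the"),
                               ("NP", [("N", "creek")])])])

print(selection_chain(up_the_creek))
```

Under the second kind of theory the article would not be a head, and the chain would run from up straight to creek.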

If "the" is normally an optional constituent which happens to be lexically selected as a fixed part of this phrasal lexical item, then how many such constituents are permitted in a phrasal lexical item? Here again the data is rather clear. In the great majority of cases there are no optional constituents, and phrasal lexical items are just cases of lexical head selection. Fewer than 20% of phrases have optional constituents lexicalised and when they do there is almost invariably one and only one such constituent.

So how fat is an idiom? Idioms are, from a grammatical point of view, skeletal. They consist of only the bare bones of grammatical structure and are, for the most part, without optional fat.

We can speculate on why this might be. Let us suppose that phrasal lexical items are entered in the lexicon for the same reason as single words are -- to represent concepts. Concepts, very roughly speaking, pick out significant aspects of our experience. So do phrasal lexical items. Being up the creek is a particular kind of experience rather like being in trouble. While the single word lexical items pick out the concepts they are associated with, phrasal lexical items pick out subsets of the concept picked out by the head of the phrase. Seeing stars is a particular kind of seeing, pulling rank is a particular kind of pulling.

It is probably part of a conceptual economy to create concepts which are subsets of others but not to make those too specific, and the fatter an idiom gets -- that is, the more lexical material it contains -- the more specific its conceptual content will tend to be. This has implications not just for the way in which language is structured, but also for how language concepts are held in our memory.

Koenraad Kuiper is in Canterbury University's Department of Linguistics.