NZSM Online

Quick Dips

Glottogenealogy

Is it possible to quantify the relatedness of spoken languages in some meaningful way? How useful would such a measure be to the historical linguist in tracing how languages have formed and developed? These are some of the issues considered by a new interdisciplinary research project at Massey University studying glottogenealogy, or the relationships between languages.

The research team, consisting of Professor Jon Patrick from Information Systems, Dr John Newman from the Department of Linguistics and Second Language Teaching, and Anand Raman from Computer Science, is looking at applying artificial intelligence techniques to the problem of sorting out the degree of similarity between natural languages.

Linguists make use of phonological rules which describe systematic sound changes in a language over a period of time. Applied in sequence, these rules trace the evolution of certain core words from one language to another. The traces are then incorporated into a "probabilistic finite state automaton" (PFSA), which can act as a model of the evolution of sounds from the first language to the second.

A PFSA is a theoretical model of some phenomenon, usually a process that proceeds from start to finish in a finite number of steps. It starts in a particular "state" representing, in this case, the original language. From that state, it can move to one of several nearby states (each representing a slightly different language), from there to another state, and so on. Each possible move from one state to a nearby one has an associated probability, which indicates how likely that move is to occur.
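The description above can be made concrete with a minimal Python sketch. It is only an illustration, not the research team's implementation: the class name `PFSA`, the state names "A", "B" and "C", the sound-change labels and all the probabilities are invented for the example.

```python
class PFSA:
    """Minimal probabilistic finite state automaton (illustrative sketch only)."""

    def __init__(self, start, final):
        self.start = start
        self.final = final
        # transitions[state] -> list of (symbol, next_state, probability)
        self.transitions = {}

    def add(self, state, symbol, next_state, prob):
        """Register a move from `state` to `next_state` labelled `symbol`."""
        self.transitions.setdefault(state, []).append((symbol, next_state, prob))

    def path_probability(self, symbols):
        """Probability of following `symbols` from the start state to the final state."""
        # Probability mass currently sitting in each reachable state.
        current = {self.start: 1.0}
        for sym in symbols:
            reached = {}
            for state, p in current.items():
                for label, target, q in self.transitions.get(state, []):
                    if label == sym:
                        reached[target] = reached.get(target, 0.0) + p * q
            current = reached
        return current.get(self.final, 0.0)


# Hypothetical toy machine: two sound-change steps, each with two options.
machine = PFSA(start="A", final="C")
machine.add("A", "p>p", "B", 0.7)  # sound kept unchanged (made-up probability)
machine.add("A", "p>f", "B", 0.3)  # sound shifted (made-up probability)
machine.add("B", "t>t", "C", 0.6)
machine.add("B", "t>s", "C", 0.4)

print(machine.path_probability(["p>f", "t>s"]))
```

The probability of a whole path is the product of the probabilities of its individual moves, so a sequence of unlikely sound changes yields a small overall value.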

The principle is that the more complex the model needed to describe the evolution from one language to the other, the further apart the two languages are phonologically. This is an application of "Occam's Razor", which says that, other things being equal, a simpler, shorter hypothesis is preferable to a more complicated, longer one.
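One way to picture "complexity" is to count the bits needed to write down a model together with the sound changes it has to explain. The sketch below does this in a deliberately crude way. The article does not say which measure the project actually uses, so the cost formulas, the function names and the two example language pairs are all assumptions made for illustration.

```python
import math


def model_cost_bits(n_transitions, n_states, n_symbols):
    """Crude cost of stating the automaton itself.

    Assumption: each transition names a source state, a target state
    and a symbol, each encoded with a fixed-length code.
    """
    bits_per_transition = 2 * math.log2(n_states) + math.log2(n_symbols)
    return n_transitions * bits_per_transition


def data_cost_bits(step_probabilities):
    """Cost of encoding the observed steps given the model: -log2(p) per step."""
    return sum(-math.log2(p) for p in step_probabilities)


def total_cost_bits(n_transitions, n_states, n_symbols, step_probabilities):
    """Total description length: model cost plus data cost."""
    return (model_cost_bits(n_transitions, n_states, n_symbols)
            + data_cost_bits(step_probabilities))


# Hypothetical "close" pair: a small machine explains the changes.
close_pair = total_cost_bits(2, 2, 2, [0.5, 0.5])

# Hypothetical "distant" pair: more states, symbols and transitions are needed.
distant_pair = total_cost_bits(6, 4, 4, [0.25, 0.25, 0.5])

print(close_pair, distant_pair)
```

Under these made-up numbers the distant pair needs the longer description, which is the sense in which a more complex model signals languages that are further apart.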

The work will initially be tested on languages known to be closely related, such as those within the Germanic family. The team ultimately plans to use the models to study linguistic anomalies such as Basque, which is not currently known to be related to any family of languages.