What am I doing?

Optimality Theory for non-linguists


This introduction is intended for my friends, who always ask me what my research is about. When I tell them that I work on "language learning in Optimality Theory and Finite-State Automata", they are shocked. So here is a description for all of you...

(I apologize in advance for any inaccuracies, in case a linguist happens to read this...)

1. What does a linguist do?

I have had to realize that people usually don't know what linguistics is about, although they believe they do. Probably the major goal of linguistics is "understanding" language (after having described it). In other words, "explaining" the phenomena encountered in "the" human language in general (this is general or theoretical linguistics), or in a given language or set of languages (the linguistics of that language, or of that set of languages). But the big question is: what does "explaining" mean?

Following Thomas Kuhn (whose book I am reading at the moment, and which has strongly influenced my way of thinking), every paradigm in the history of a given science may have different criteria for what counts as an explanation. For instance, a correct explanation in historical linguistics (which flourished in the 19th century, but still has many followers) has to show how the present state developed from previous states. Why, for example, does the English word "ox" have the irregular plural form "oxen"? Because it goes back to old, Proto-Germanic roots, and has parallels in related languages. But this way of explaining things is as irrelevant for a linguist following another paradigm as the explanation given by a primary school English teacher: "because that is the way it is written in the textbook". We should acknowledge that this last answer is the most relevant one in certain contexts, such as for beginners learning English.

The paradigm followed by the group of linguists who educated me (and who consider themselves the modern ones) expects a relevant explanation to trace the given phenomenon back to general principles that hold universally across the world's languages. In other words, theoretical linguists try to set up general theories (including models, principles, usually some kind of formalism, etc.) that will hopefully hold for all the relevant phenomena (usually only one type of phenomenon is investigated, such as differences in syllable structure, the stress pattern of a word, or the word order within a sentence) in all the languages of the world (or at least in most of the languages investigated). Such a theory is successful if it can be applied to many phenomena in many languages; and linguists are even happier if it also fits observations of other kinds, such as the development of child language, second language acquisition, or even non-linguistic psychological theories.

2. Optimality Theory (OT)

This model has been very fashionable since it appeared in the early 1990s. It is mostly used in phonology (which deals with the sound systems of languages), but other fields of linguistics have also made use of it.

To understand it, take the following example. How does one buy, let's say... hmm... let's say, chocolate? (For those who don't know me, I should explain: I love chocolate!) You enter the supermarket and go to the aisle with all the kinds of chocolate. And then? You look at the different types: their price, their quality, the quantity in one package, etc. And then you make your choice. Let's suppose you can buy only one (that doesn't usually happen with me, it is just an example...). You have to set up a hierarchy of values: which property you look at first, which second, and so on. Suppose you look first at quality, which means that after the first round only the best ones remain. Then you look at the price of those remaining, and keep the cheapest. And if there are still several of the same quality and the same price, you look at which one contains the most chocolate. This means that your hierarchy is: 1. quality, 2. price, 3. quantity.
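For those who prefer to see such things as a program, here is a minimal sketch in Python of the round-by-round choice just described. The chocolates, their numbers, and the names `criteria` and `choose` are all invented for illustration; this is only one simple way to write it down, not a standard implementation.

```python
# Hypothetical chocolates; all names and numbers are made up for illustration.
chocolates = [
    {"name": "A", "quality": 3, "price": 2.0, "grams": 100},
    {"name": "B", "quality": 3, "price": 1.5, "grams": 100},
    {"name": "C", "quality": 3, "price": 1.5, "grams": 125},
    {"name": "D", "quality": 2, "price": 0.9, "grams": 200},
]

# Each criterion maps a chocolate to a score where smaller is better.
criteria = {
    "quality": lambda c: -c["quality"],   # higher quality is better
    "price": lambda c: c["price"],        # lower price is better
    "quantity": lambda c: -c["grams"],    # more chocolate is better
}

def choose(candidates, ranking):
    """Filter the candidates round by round, most important criterion first."""
    remaining = list(candidates)
    for name in ranking:
        score = criteria[name]
        best = min(score(c) for c in remaining)
        # Only the candidates that are best on this criterion survive the round.
        remaining = [c for c in remaining if score(c) == best]
    return remaining

winner = choose(chocolates, ["quality", "price", "quantity"])
print([c["name"] for c in winner])  # prints ['C']
```

With the same chocolates but the ranking `["price", "quality", "quantity"]`, the winner is D instead: a different hierarchy, a different choice.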

One can claim that the shopping behavior of all (rational) people in the world can be described by these three universal, conflicting factors (price, quality, quantity), and that the different types of customers can be explained by the different hierarchies (rankings) they form. If this is not the case, then either you have to improve the model, or you have to prove that no better model exists.

The same is done for languages. As a simplified example, let us suppose there are different "constraints" on the word order within a sentence, like "the subject should be at the beginning", "the verb should be at the end", "the verb should be as early as possible", "no noun should appear at the end of the sentence", etc. These conflicting constraints are universal, that is, they exist in all the languages of the world. Only their ranking (ordering) differs, and this leads to the differences between languages. There is a given number of language types in the world with respect to word order (let's say: 1. English, French, Modern Hebrew, etc.; 2. German, Dutch, etc.; 3. Hungarian, Russian, etc.; 4. etc.), and, according to OT, each type represents a given ordering of these constraints. Each type chooses, from the set of all possible sentences, the one that is best (optimal) according to the language's hierarchy, in the same way that the customer types reflected different orderings of the factors mentioned above, when the choice was about chocolate rather than sentences.

Summing up: explaining the different behavior of the world's languages now means finding the appropriate constraints, and showing that their possible rankings reflect the typology observed.

For a better introduction to OT, see: http://cognet.mit.edu/MITECS/Entry/smolensky2

3. Learning OT

Let us suppose now that you have a new girlfriend, and that you want to know what her hierarchy of values is. (It should fit yours, of course!) How can you find out? You take her to the supermarket, and you observe how she chooses among the different kinds of chocolate.

Let us suppose you put two kinds in front of her: one is cheaper, the other is better. If she takes the cheaper one, it means that price is ranked higher in her hierarchy than quality. But if she takes the better one, then you learn that quality is more important to her than price. (Or, alternatively, you understand that she expects you to pay...)
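This kind of detective work can be sketched in a few lines of Python. The version below simply tries every possible hierarchy and keeps those that agree with what you observed; it is a brute-force illustration only, and the names `FACTORS`, `prefers` and `consistent_rankings`, as well as the scores, are my own inventions for this example. Real learning programs are smarter than this.

```python
from itertools import permutations

FACTORS = ("quality", "price", "quantity")

def prefers(ranking, a, b):
    """True if a shopper with this ranking picks a over b.

    Each chocolate is a dict of scores where a higher number is better
    (so a cheap chocolate has a high "price" score).
    """
    for factor in ranking:
        if a[factor] != b[factor]:
            # The highest-ranked factor on which they differ decides.
            return a[factor] > b[factor]
    return False  # a complete tie: no strict preference

def consistent_rankings(observations):
    """Return every ranking that agrees with all observed (chosen, rejected) pairs."""
    return [
        ranking
        for ranking in permutations(FACTORS)
        if all(prefers(ranking, chosen, rejected) for chosen, rejected in observations)
    ]

# Hypothetical data: one chocolate is cheaper, the other is better.
cheaper = {"quality": 1, "price": 2, "quantity": 1}
better = {"quality": 2, "price": 1, "quantity": 1}

# She takes the cheaper one.
for ranking in consistent_rankings([(cheaper, better)]):
    print(ranking)
```

Three of the six possible hierarchies survive this single observation, and in all three price comes before quality. Where quantity belongs is still unknown, so you will need more trips to the supermarket.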

Similar experiments can be carried out for languages. Computer programs can be written to model the way children learn their mother tongue. They do the same thing: they learn the language, that is, they deduce the correct ranking of the constraints from the data you feed into the computer.

4. Finite-State Automata (or Transducers)

A Finite-State Automaton is the way men usually try to understand their wives. They suppose that women can be characterized by a number of "states": well, tired, in love, angry, very angry, very very angry,... ("Having their monthly cycle" is also an often used state in men's model.) Then, if you tell her something, she will change her state, depending on this input and on her current state. For example, men believe that when she is in the state of "being very angry", telling her "I love you" will bring her into the state of "being in love with you". (Experienced husbands report this is not the case...)

A Finite-State Transducer is the same, but you also consider your wife's reaction, which may likewise depend on her current state and on your input. For instance, the input "bring me a beer" will lead to different reactions: if she is in the state of love, she will stay in it, and tell you "of course, darling". But if she is in the state of being very angry, she will change her state to "very very angry", and her output will be: "bring one for yourself!"
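Written as a program, such a transducer is little more than a lookup table. Here is a minimal sketch in Python containing only the two transitions from the beer example; the names `TRANSITIONS` and `run` are made up for illustration, and a real model would of course need many more entries.

```python
# Transition table: (current state, input) -> (next state, output).
# Only the two transitions from the beer example are filled in.
TRANSITIONS = {
    ("in love", "bring me a beer"): ("in love", "of course, darling"),
    ("very angry", "bring me a beer"): ("very very angry", "bring one for yourself!"),
}

def run(state, inputs):
    """Feed the inputs one by one; return the final state and the outputs."""
    outputs = []
    for symbol in inputs:
        # A missing entry raises KeyError: the machine has no such transition.
        state, output = TRANSITIONS[(state, symbol)]
        outputs.append(output)
    return state, outputs

print(run("in love", ["bring me a beer", "bring me a beer"]))
print(run("very angry", ["bring me a beer"]))
```

If you throw away the outputs and keep only the states, you get back a plain automaton. Note that each step looks only at the current state and the current input, and at nothing else.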

(An important remark: all of this is just theoretical. I don't have a wife, I don't like beer, and my girlfriend is in a constant state of love.)

This has a well-defined mathematical theory, which helps you understand why men are wrong when they model their wives this way. First of all, unlike the simple machine described above, women are non-deterministic, i.e. in a given state the same input can cause different changes of state and different reactions. Secondly, women have more than a finite number of states. And the last point: Finite-State Automata have no (long-term) memory, that is, what they do depends only on the current state and the current input, and not on anything else that happened in the past,... which is usually not the case for women.

5. Combining Optimality Theory with Finite-State Automata

How can you model your girlfriend buying chocolate by supposing that she has a finite number of states, and that seeing the price, the quality, etc. of the different kinds of chocolate causes different state transitions? That is a good question...

For an answer, see: http://odur.let.rug.nl/~vannoord/papers/ (Dale Gerdemann, Gertjan van Noord. Approximation and Exactness in Finite State Optimality Theory. In: Jason Eisner, Lauri Karttunen, Alain Thériault (editors), SIGPHON 2000)

...and my dissertation in Autumn 2005!
