hu.birot.OTKit.otBuildingBlocks
Class GenExamples

java.lang.Object
  extended by hu.birot.OTKit.otBuildingBlocks.GenExamples

public class GenExamples
extends java.lang.Object

Contains a number of static methods that help you quickly create simple instances of Gen.


Constructor Summary
GenExamples()
           
 
Method Summary
static Gen alphabetPower(java.lang.String[] alphabet, int exp)
          Provide Gen function that maps any underlying form to the set alphabet to the power exp.
static Gen alphabetStar(java.lang.String[] alphabet)
          Provide Gen function that maps any underlying form to the set alphabet* (Kleene star).
static Gen composition(Gen gen1, Gen gen2)
          Composition of two "Gens".
static Gen listOfSurfaceForms(Form[] cand)
          Return a Gen that maps any underlying form to the surface forms contained in cand.
static Gen listOfSurfaceForms(java.lang.String[] cand)
          Return a Gen that maps any underlying form to the surface forms contained in cand.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GenExamples

public GenExamples()
Method Detail

listOfSurfaceForms

public static Gen listOfSurfaceForms(java.lang.String[] cand)

Return a Gen that maps any underlying form to the surface forms contained in cand. The surface forms are Forms of type Form.String with their field string set to the elements in the array cand.

Method Gen.randomCandidate(Form uf, double rnd) returns a candidate whose underlying form is uf and whose surface form is Form(cand[rnd*l]) (where l is the length of the array cand). If rnd == 1, the last element of the array is used. If rnd < 0, then the first element of the array is used.
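
The index rule described above can be sketched in isolation. This is a minimal standalone sketch, not OTKit code: the class and method names are hypothetical, rounding rnd*l down to an integer is an assumption, and so is the clamping of values above 1 (the text only specifies rnd == 1 and rnd < 0).

```java
// Standalone sketch; does not use any OTKit class.
final class RandomIndexSketch {
    /** Maps rnd to an index in [0, length - 1], clamping out-of-range values. */
    static int indexFor(double rnd, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        int i = (int) Math.floor(rnd * length); // assumed rounding rule
        if (i < 0) {
            return 0;              // rnd < 0: first element
        }
        if (i >= length) {
            return length - 1;     // rnd == 1 (or larger, by assumption): last element
        }
        return i;
    }
}
```

With an array of four surface forms, rnd = 0.5 selects index 2 and rnd = 1 selects index 3.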

Method Gen.firstCandidate(Form uf) returns Candidate(uf, new Form(cand[0])).

Method Gen.nextCandidate(Candidate c) first searches for c.sf.string in the array cand. If the string is found, the method returns the candidate constructed from the subsequent element of that array. If the array contains this string several times, then the last instance is used. If the string is not found, then the surface form of the returned candidate is MapForm.NotInRange. If the string is the last one in the array (thus, there is no subsequent string), then the surface form of the returned candidate is MapForm.NoMoreForm.
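
The lookup rule above can be illustrated with plain strings. This is a standalone sketch under stated assumptions, not the OTKit implementation: the class name is hypothetical, and the two sentinel strings are placeholders standing in for MapForm.NotInRange and MapForm.NoMoreForm.

```java
import java.util.Arrays;

// Standalone sketch; does not use any OTKit class.
final class NextInListSketch {
    // Hypothetical placeholders for MapForm.NotInRange and MapForm.NoMoreForm.
    static final String NOT_IN_RANGE = "<NotInRange>";
    static final String NO_MORE_FORM = "<NoMoreForm>";

    /** Returns the element following the LAST occurrence of current in cand. */
    static String next(String[] cand, String current) {
        int i = Arrays.asList(cand).lastIndexOf(current);
        if (i < 0) {
            return NOT_IN_RANGE;   // current is not in the array
        }
        if (i == cand.length - 1) {
            return NO_MORE_FORM;   // current is the final element
        }
        return cand[i + 1];
    }
}
```

Under this reading, an array with repeated entries such as {"pa", "ta", "pa", "ka"} cannot be traversed completely: the successor of "pa" is always "ka", so "ta" is skipped when starting from the first element.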

Finally, method Gen.allCandidates(Form uf) returns a vector of candidates created by constructing Candidate(uf, sf) for all sf built using the strings in the array cand.

Parameters:
cand - Array of strings containing the surface forms.
Returns:
A Gen containing the surface forms created from the elements of the array in the argument.
See Also:
Gen

listOfSurfaceForms

public static Gen listOfSurfaceForms(Form[] cand)

Return a Gen that maps any underlying form to the surface forms contained in cand.

Method Gen.randomCandidate(Form uf, double rnd) returns a candidate whose underlying form is uf and whose surface form is cand[rnd*l] (where l is the length of the array cand). If rnd == 1, the last element of the array is used. If rnd < 0, then the first element of the array is used.

Method Gen.firstCandidate(Form uf) returns Candidate(uf, cand[0]).

Method Gen.nextCandidate(Candidate c) first searches for c.sf in the array cand. If the form is found, the method returns the candidate constructed from the subsequent element of that array. If the array contains this form several times, then the last instance is used. If the form is not found, then the surface form of the returned candidate is MapForm.NotInRange. If the form is the last one in the array (thus, there is no subsequent form), then the surface form of the returned candidate is MapForm.NoMoreForm.

Finally, method Gen.allCandidates(Form uf) returns a vector of candidates created by constructing Candidate(uf, sf) for all sf in the array cand.

Parameters:
cand - Array of surface forms.
Returns:
A Gen containing the surface forms created from the elements of the array in the argument.
See Also:
Gen

alphabetPower

public static Gen alphabetPower(java.lang.String[] alphabet,
                                int exp)

Provide Gen function that maps any underlying form to the set alphabet to the power exp. That is, to the set of the strings obtained by concatenating exp elements of the set alphabet (not necessarily different ones).

This Gen function maps any underlying form to the same surface forms. The first surface form is the concatenation of the string alphabet[0] exp times. This form, together with uf, becomes the output of the firstCandidate(uf) method.

The nextCandidate(cand) method returns a candidate whose underlying form is cand.uf (obviously), and whose surface form is obtained by replacing the last substring of cand.sf: if the latter ends with alphabet[i], then it is replaced by alphabet[i+1]. Yet, if i+1 == alphabet.length, then it is replaced by alphabet[0], while the previous substring is "increased"; and so on.

The allCandidates(uf) method lists all the candidates in this order.
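
The order described above is that of an odometer whose last position turns fastest. The following standalone sketch lists the strings in that order; it does not use any OTKit class, the class and method names are hypothetical, and returning a single empty string for exp == 0 is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; does not use any OTKit class.
final class AlphabetPowerSketch {
    /** Lists all concatenations of exp alphabet elements, last position varying fastest. */
    static List<String> enumerate(String[] alphabet, int exp) {
        if (alphabet.length == 0 || exp < 0) {
            throw new IllegalArgumentException("need a non-empty alphabet and exp >= 0");
        }
        List<String> out = new ArrayList<>();
        int[] idx = new int[exp];   // all zeros: alphabet[0] repeated exp times
        while (true) {
            StringBuilder sb = new StringBuilder();
            for (int i : idx) {
                sb.append(alphabet[i]);
            }
            out.add(sb.toString());
            // Increase the last position; on overflow reset it and carry leftwards.
            int pos = exp - 1;
            while (pos >= 0 && ++idx[pos] == alphabet.length) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return out;         // carried past the first position: enumeration complete
            }
        }
    }
}
```

For alphabet = {"a", "b"} and exp = 2, the sketch yields aa, ab, ba, bb.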

The randomCandidate(uf, rnd) method returns each candidate with equal probability, provided that rnd is uniformly distributed. The higher the rnd, the later the returned candidate appears in allCandidates(uf).
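
One way to obtain this behaviour is to scale rnd to a position among the alphabet.length^exp candidates and decode that position as a number written in base alphabet.length. The sketch below is standalone and hypothetical: the names, the rounding rule, and the clamping of out-of-range values are assumptions, and it ignores overflow for very large exp.

```java
// Standalone sketch; does not use any OTKit class.
final class PowerRandomSketch {
    /** Picks the string at position floor(rnd * k^exp) in odometer order. */
    static String pick(String[] alphabet, int exp, double rnd) {
        int k = alphabet.length;
        if (k == 0 || exp < 0) {
            throw new IllegalArgumentException("need a non-empty alphabet and exp >= 0");
        }
        long total = 1;
        for (int i = 0; i < exp; i++) {
            total *= k;                               // k^exp candidates in all
        }
        long index = (long) Math.floor(rnd * total);  // assumed rounding rule
        index = Math.max(0, Math.min(total - 1, index)); // clamp (assumption)
        // Decode index into exp base-k digits, filling from the last position.
        String[] parts = new String[exp];
        for (int pos = exp - 1; pos >= 0; pos--) {
            parts[pos] = alphabet[(int) (index % k)];
            index /= k;
        }
        return String.join("", parts);
    }
}
```

With alphabet = {"a", "b"} and exp = 2, rnd = 0.3 maps to position 1, that is, to "ab".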

The simplest and, hopefully, most widespread use of this Gen function family will be based on alphabet[] consisting of single-character strings. That is, the elements of alphabet[] will really be atomic "letters". However, the method also allows for "alphabets" containing complex substrings, in which case a string may have several parses. If alphabet = {"a", "b", "ba", "ab"} and exp == 2, then the string "bab" can be parsed either as ba+b or as b+ab. Remember that the nextCandidate() method uses the first parse of a string that it finds, and so it becomes impossible to iterate over Gen using this method. Moreover, this string will occur twice in the vector returned by allCandidates(), and will have twice the probability of being returned by the randomCandidate() method.
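
The ambiguity can be checked by counting how often each string arises as a concatenation of two alphabet elements. This is a standalone sketch with hypothetical names, covering only the case exp == 2.

```java
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch; does not use any OTKit class.
final class AmbiguitySketch {
    /** Counts in how many ways each string arises as alphabet[i] + alphabet[j]. */
    static Map<String, Integer> countPairs(String[] alphabet) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String first : alphabet) {
            for (String second : alphabet) {
                counts.merge(first + second, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For alphabet = {"a", "b", "ba", "ab"}, the 16 concatenations produce only 14 distinct strings: "bab" arises twice (ba+b and b+ab), and so does "aba" (a+ba and ab+a).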

Parameter exp must be non-negative, alphabet must be of positive length, and alphabet must not contain the empty string; otherwise, exceptions are thrown.

Parameters:
alphabet - "Set" of strings (ideally, of "letters") over which the language is being generated.
exp - Exponent, how many elements of the alphabet are concatenated to get an element of the language.
Returns:
An instance of Gen, as explained above.

alphabetStar

public static Gen alphabetStar(java.lang.String[] alphabet)

Provide Gen function that maps any underlying form to the set alphabet* (Kleene star). That is, to the set of the strings obtained by concatenating any number of elements from the set alphabet (not necessarily different ones, and also including zero concatenations resulting in the empty string).

This Gen function maps any underlying form to the same surface forms. The first surface form is the empty string (corresponding to zero concatenations from the set alphabet). Together with uf, it becomes the output of the firstCandidate(uf) method.

The nextCandidate(cand) method returns a candidate whose underlying form is cand.uf (obviously), and whose surface form is obtained by replacing the last substring of cand.sf: if the latter ends with alphabet[i], then it is replaced by alphabet[i+1]. Yet, if i+1 == alphabet.length, then it is replaced by alphabet[0], while the previous substring is "increased"; and so on. If cand.sf was the concatenation of the last element in alphabet n times, then the surface form in the next candidate is the concatenation of alphabet[0] n+1 times.
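
The successor rule above can be sketched on index sequences, where each entry is a position in alphabet. This is a standalone sketch, not the OTKit implementation: the names are hypothetical, and it works on an already known parse rather than recovering the parse from a string.

```java
import java.util.Arrays;

// Standalone sketch; does not use any OTKit class.
final class StarSuccessorSketch {
    /** Returns the index sequence that follows idx, each entry being in [0, k). */
    static int[] next(int[] idx, int k) {
        int[] out = Arrays.copyOf(idx, idx.length);
        for (int pos = out.length - 1; pos >= 0; pos--) {
            if (++out[pos] < k) {
                return out;        // no overflow at this position: done
            }
            out[pos] = 0;          // overflow: reset and carry leftwards
        }
        // Every position overflowed: continue with n + 1 copies of alphabet[0].
        return new int[idx.length + 1];
    }

    /** Concatenates the alphabet elements selected by idx. */
    static String render(int[] idx, String[] alphabet) {
        StringBuilder sb = new StringBuilder();
        for (int i : idx) {
            sb.append(alphabet[i]);
        }
        return sb.toString();
    }
}
```

Starting from the empty sequence with alphabet = {"a", "b"}, repeated calls produce the empty string, then a, b, aa, ab, ba, bb, aaa, and so on.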

The allCandidates(uf) method returns a vector with a single element, reminding you that Gen maps to an infinite set. Instead, you can iterate over the first n elements of the set using the previous two methods (with a caveat, see below!).

The randomCandidate(uf, rnd) method returns a candidate of length n with probability pow(2, -(n+1)). In other words, if rnd < 0.5, you get the candidate with the empty string. If 0.5 <= rnd < 0.75, you receive a candidate whose surface form is a string from alphabet. If 0.75 <= rnd < 0.875, then the surface form is the concatenation of two strings from alphabet. And so forth. Within such an interval, the surface forms of the same length have equal probability (provided that rnd is uniformly distributed). The surface forms produced while rnd is gradually increased follow the order described for the nextCandidate() method.
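
The interval scheme above can be sketched as a search for the interval that contains rnd, followed by a linear scaling inside it. This is a standalone sketch with hypothetical names; restricting rnd to [0, 1) and the rounding inside an interval are assumptions, since the text does not say how other values are handled.

```java
// Standalone sketch; does not use any OTKit class.
final class StarRandomSketch {
    /** Returns {length n, position among the k^n strings of that length} for rnd in [0, 1). */
    static long[] lengthAndIndex(double rnd, int k) {
        if (!(rnd >= 0.0 && rnd < 1.0)) {
            throw new IllegalArgumentException("rnd must be in [0, 1)");
        }
        double lower = 0.0;   // start of the interval for the current length
        double width = 0.5;   // size of that interval: 2^-(n+1)
        int n = 0;
        while (rnd >= lower + width) {
            lower += width;   // skip the interval of length n
            width /= 2;       // the next interval is half as large
            n++;
        }
        long count = 1;
        for (int i = 0; i < n; i++) {
            count *= k;       // k^n strings of length n
        }
        long index = (long) ((rnd - lower) / width * count);
        return new long[] { n, Math.min(index, count - 1) };
    }
}
```

With a two-element alphabet, rnd = 0.25 gives length 0, rnd = 0.7 gives length 1 at position 1, and rnd = 0.8 gives length 2 at position 1.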

The simplest and, hopefully, most widespread use of this Gen function family will be based on alphabet[] consisting of single-character strings. That is, the elements of alphabet[] will really be atomic "letters". However, the method also allows for "alphabets" containing complex substrings, in which case a string may have several parses. If alphabet = {"a", "b", "ba", "ab"}, then the string "bab" can be parsed as ba+b, as b+ab, or as b+a+b. Note that the nextCandidate() method uses the first parse of a string that it finds, and so it becomes impossible to iterate over Gen using this method: the second time "bab" is produced, the iteration falls back to the first parse. Similarly, candidates with ambiguous parses have an increased chance of being produced by the randomCandidate(uf, rnd) method.

Parameter alphabet must be of positive length, and alphabet must not contain the empty string; otherwise, exceptions are thrown.

Parameters:
alphabet - "Set" of strings (ideally, of "letters") over which the language is being generated.
Returns:
An instance of Gen, as explained above.

composition

public static final Gen composition(Gen gen1,
                                    Gen gen2)
Composition of two "Gens". In fact, this is the composition of two MapForms: the first one maps the underlying form to some intermediate form, whereas the second one maps this intermediate form to the surface form.
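
The idea of composing two maps can be sketched with plain Java functions. This is a standalone illustration, not the OTKit API: modelling each map as a function from a string to a list of strings is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Standalone sketch; does not use any OTKit class.
final class CompositionSketch {
    /** Applies first, then applies second to every intermediate form it produced. */
    static Function<String, List<String>> compose(
            Function<String, List<String>> first,
            Function<String, List<String>> second) {
        return uf -> {
            List<String> out = new ArrayList<>();
            for (String intermediate : first.apply(uf)) {
                out.addAll(second.apply(intermediate));
            }
            return out;
        };
    }
}
```

If first maps "x" to the intermediate forms xa and xb, and second upper-cases its input, the composed map sends "x" to XA and XB.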

Parameters:
gen1 - First map to be applied.
gen2 - Second map to be applied.
Returns:
A Gen that is the composition of the two gens appearing in the arguments.
See Also:
MapFormExamples.composition(MapForm, MapForm).