java.lang.Object hu.birot.OTKit.otBuildingBlocks.GenExamples
public class GenExamples
Contains a number of static methods that help you quickly create simple instances of Gen.
Constructor Summary |
---|
GenExamples() |
Method Summary | |
---|---|
static Gen | alphabetPower(java.lang.String[] alphabet, int exp) Provides a Gen function that maps any underlying form to the set alphabet to the power exp. |
static Gen | alphabetStar(java.lang.String[] alphabet) Provides a Gen function that maps any underlying form to the set alphabet* (Kleene star). |
static Gen | composition(Gen gen1, Gen gen2) Composition of two "Gens". |
static Gen | listOfSurfaceForms(Form[] cand) Returns a Gen that maps any underlying form to the surface forms contained in cand. |
static Gen | listOfSurfaceForms(java.lang.String[] cand) Returns a Gen that maps any underlying form to the surface forms contained in cand. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public GenExamples()
Method Detail |
---|
public static Gen listOfSurfaceForms(java.lang.String[] cand)
Returns a Gen that maps any underlying form to the surface forms contained in cand. The surface forms are Forms of type Form.String with their field string set to the elements of the array cand.
Method Gen.randomCandidate(Form uf, double rnd) returns a candidate whose underlying form is uf and whose surface form is new Form(cand[rnd*l]) (where l is the length of the array cand). If rnd == 1, the last element of the array is used. If rnd < 0, the first element of the array is used.
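The index rule for randomCandidate can be illustrated with a small stand-alone sketch. It works on plain strings instead of OTKit's Form and Candidate objects, and the class RandomIndexSketch is hypothetical. It assumes that rnd*l is truncated to an integer index, and it extends the documented rnd == 1 case by also clamping larger values to the last element; both are assumptions, not documented behavior.

```java
/**
 * Hypothetical sketch (not OTKit code): maps rnd to an element of cand
 * following the rules described for randomCandidate.
 */
final class RandomIndexSketch {

    /** Returns the element of cand selected by rnd. */
    static String pick(String[] cand, double rnd) {
        int l = cand.length;
        if (rnd < 0) {
            return cand[0];             // documented: rnd < 0 selects the first element
        }
        int index = (int) (rnd * l);    // assumption: rnd*l is truncated toward zero
        if (index >= l) {
            index = l - 1;              // documented for rnd == 1; extended to rnd > 1 (assumption)
        }
        return cand[index];
    }
}
```

With cand = {"pa", "ta", "ka"}, values of rnd in [0, 1/3) select "pa", [1/3, 2/3) select "ta", and [2/3, 1] select "ka", so a uniformly distributed rnd picks each element with equal probability.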
Method Gen.firstCandidate(Form uf) returns Candidate(uf, new Form(cand[0])).
Method Gen.nextCandidate(Candidate c) first searches for c.sf.string in the array cand. If the string is found, the method returns the candidate constructed from the subsequent element of the array. If the array contains this string several times, the last occurrence is used. If the string is not found, the surface form of the returned candidate is MapForm.NotInRange. If the string is the last one in the array (so there is no subsequent string), the surface form of the returned candidate is MapForm.NoMoreForm.
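A minimal sketch of this lookup, again on plain strings: the hypothetical class NextSurfaceFormSketch returns placeholder strings where OTKit would return MapForm.NotInRange and MapForm.NoMoreForm.

```java
/**
 * Hypothetical sketch of the successor rule described for nextCandidate.
 * The sentinel strings only stand in for MapForm.NotInRange and
 * MapForm.NoMoreForm; they are not the OTKit objects.
 */
final class NextSurfaceFormSketch {

    static final String NOT_IN_RANGE = "<NotInRange>";
    static final String NO_MORE_FORM = "<NoMoreForm>";

    /** Returns the surface form that follows sf in cand. */
    static String next(String[] cand, String sf) {
        int last = -1;
        for (int i = 0; i < cand.length; i++) {
            if (cand[i].equals(sf)) {
                last = i;               // keep scanning: the last occurrence wins
            }
        }
        if (last == -1) {
            return NOT_IN_RANGE;        // sf does not occur in cand
        }
        if (last == cand.length - 1) {
            return NO_MORE_FORM;        // sf is the last element: no successor
        }
        return cand[last + 1];
    }
}
```

Because the last occurrence wins, duplicates affect iteration: with cand = {"a", "b", "a", "c"}, starting from the first candidate "a" the next surface form is "c", so "b" is never reached by repeated nextCandidate calls.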
Finally, method Gen.allCandidates(Form uf) returns a vector of candidates created by constructing Candidate(uf, sf) for all sf built from the strings in the array cand.
cand - Array of strings containing the surface forms.
Gen
public static Gen listOfSurfaceForms(Form[] cand)
Returns a Gen that maps any underlying form to the surface forms contained in cand.
Method Gen.randomCandidate(Form uf, double rnd) returns a candidate whose underlying form is uf and whose surface form is cand[rnd*l] (where l is the length of the array cand). If rnd == 1, the last element of the array is used. If rnd < 0, the first element of the array is used.
Method Gen.firstCandidate(Form uf) returns Candidate(uf, cand[0]).
Method Gen.nextCandidate(Candidate c) first searches for c.sf in the array cand. If the form is found, the method returns the candidate constructed from the subsequent element of the array. If the array contains this form several times, the last occurrence is used. If the form is not found, the surface form of the returned candidate is MapForm.NotInRange. If the form is the last one in the array (so there is no subsequent form), the surface form of the returned candidate is MapForm.NoMoreForm.
Finally, method Gen.allCandidates(Form uf) returns a vector of candidates created by constructing Candidate(uf, sf) for all sf in the array cand.
cand - Array of surface forms.
Gen
public static Gen alphabetPower(java.lang.String[] alphabet, int exp)
Provides a Gen function that maps any underlying form to the set alphabet to the power exp, that is, to the set of strings obtained by concatenating exp elements of the set alphabet (not necessarily different ones).
This Gen function maps every underlying form to the same surface forms. The first surface form is the string alphabet[0] concatenated exp times. This form, together with uf, becomes the output of the firstCandidate(uf) method.
The nextCandidate(cand) method returns a candidate whose underlying form is cand.uf (obviously) and whose surface form is obtained by replacing the last substring of cand.sf: if cand.sf ends with alphabet[i], then that substring is replaced by alphabet[i+1]. If, however, i+1 == alphabet.length, then it is replaced by alphabet[0], while the preceding substring is "increased" in the same way, and so on.
The allCandidates(uf) method lists all the candidates in this order.
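The ordering can be sketched by treating a surface form as a sequence of exp indices into alphabet, like the digits of an odometer whose last position turns fastest. The class AlphabetPowerOrderSketch is hypothetical and builds strings directly from index sequences; it is not OTKit's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the order in which the candidates of alphabetPower
 * are listed: the last index is "increased" first, carrying to the left.
 */
final class AlphabetPowerOrderSketch {

    static List<String> allSurfaceForms(String[] alphabet, int exp) {
        List<String> result = new ArrayList<>();
        int[] digits = new int[exp];            // all zeros: alphabet[0] repeated exp times
        while (true) {
            StringBuilder sb = new StringBuilder();
            for (int d : digits) {
                sb.append(alphabet[d]);
            }
            result.add(sb.toString());
            // Increase the last position; on overflow reset it to 0 and carry left.
            int pos = exp - 1;
            while (pos >= 0 && digits[pos] == alphabet.length - 1) {
                digits[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                return result;                  // every position overflowed: enumeration complete
            }
            digits[pos]++;
        }
    }
}
```

With alphabet = {"a", "b"} and exp == 2 this yields "aa", "ab", "ba", "bb". Because strings are built from index sequences, an ambiguous alphabet such as {"a", "b", "ba", "ab"} yields "bab" twice for exp == 2 (from b+ab and from ba+b).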
The randomCandidate(uf, rnd) method returns each candidate with equal probability, provided that rnd is uniformly distributed. The higher rnd is, the later the returned candidate appears in allCandidates(uf).
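One way to obtain equal probabilities together with this ordering is to read the base-alphabet.length digits of rnd as the indices of the concatenated elements. The following is a hedged sketch of that idea with a hypothetical class name; OTKit may compute the candidate differently, and the handling of rnd < 0 and rnd >= 1 is an assumption.

```java
/**
 * Hypothetical sketch (not the OTKit implementation) of a uniform,
 * order-preserving mapping from rnd to a member of alphabet^exp.
 */
final class AlphabetPowerRandomSketch {

    static String pick(String[] alphabet, int exp, double rnd) {
        int k = alphabet.length;
        double f = Math.max(rnd, 0.0);          // assumption: rnd < 0 behaves like rnd == 0
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < exp; i++) {
            int d;
            if (f >= 1.0) {
                d = k - 1;                      // assumption: rnd >= 1 selects the last candidate
            } else {
                f *= k;                         // move the next base-k digit before the point
                d = Math.min((int) f, k - 1);   // guard against floating-point rounding up to k
                f -= d;
            }
            sb.append(alphabet[d]);
        }
        return sb.toString();
    }
}
```

Since a larger rnd never yields a lexicographically smaller digit sequence, higher values of rnd produce candidates that appear later in the enumeration order, and each of the k^exp digit sequences covers an interval of width k^(-exp).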
The simplest, and hopefully most widespread, use of this family of Gen functions will be based on an alphabet[] consisting of single-character strings; that is, the elements of alphabet[] will really be atomic "letters". However, the method also allows for "alphabets" containing complex substrings, and then a string may have several parses. If alphabet = {"a", "b", "ba", "ab"} and exp == 2, then the string "bab" can be parsed either as ba+b or as b+ab.
Remember that the nextCandidate() method uses the first parse of a string that it finds, so it becomes impossible to iterate over Gen using this method. Moreover, "bab" occurs twice in the vector returned by allCandidates() and has double the probability of being returned by the randomCandidate() method.
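The number of times a string occurs among the candidates equals the number of ways it can be split into exactly exp alphabet elements. The hypothetical helper PowerParseCounter counts these splits recursively; it is not part of OTKit.

```java
/**
 * Hypothetical helper (not part of OTKit): counts in how many ways s can be
 * written as a concatenation of exactly exp elements of alphabet.
 */
final class PowerParseCounter {

    static int countParses(String s, String[] alphabet, int exp) {
        return count(s, 0, alphabet, exp);
    }

    private static int count(String s, int from, String[] alphabet, int piecesLeft) {
        if (piecesLeft == 0) {
            return from == s.length() ? 1 : 0;  // success only if all of s is consumed
        }
        int total = 0;
        for (String piece : alphabet) {
            if (s.startsWith(piece, from)) {    // piece matches at position from
                total += count(s, from + piece.length(), alphabet, piecesLeft - 1);
            }
        }
        return total;
    }
}
```

For alphabet = {"a", "b", "ba", "ab"}, countParses("bab", alphabet, 2) is 2, matching the two parses ba+b and b+ab.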
Parameter exp must be non-negative, alphabet must be of positive length, and alphabet must not contain the empty string; otherwise, exceptions are thrown.
alphabet - "Set" of strings (ideally, of "letters") over which the language is being generated.
exp - Exponent: how many elements of the alphabet are concatenated to get an element of the language.
public static Gen alphabetStar(java.lang.String[] alphabet)
Provides a Gen function that maps any underlying form to the set alphabet* (Kleene star), that is, to the set of strings obtained by concatenating any number of elements of the set alphabet (not necessarily different ones, and including zero concatenations, which result in the empty string).
This Gen function maps every underlying form to the same surface forms. The first surface form is the empty string (corresponding to zero concatenations of elements of alphabet). Together with uf, it becomes the output of the firstCandidate(uf) method.
The nextCandidate(cand) method returns a candidate whose underlying form is cand.uf (obviously) and whose surface form is obtained by replacing the last substring of cand.sf: if cand.sf ends with alphabet[i], then that substring is replaced by alphabet[i+1]. If, however, i+1 == alphabet.length, then it is replaced by alphabet[0], while the preceding substring is "increased" in the same way, and so on.
If cand.sf is the last element of alphabet concatenated n times, then the surface form of the next candidate is alphabet[0] concatenated n+1 times.
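The successor rule, including the step from n to n+1 concatenated elements, can be sketched on index sequences. The class AlphabetStarSuccessorSketch is hypothetical; working on indices rather than strings sidesteps the parsing problem that nextCandidate() faces with ambiguous alphabets.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the successor rule described for alphabetStar's
 * nextCandidate, working on index sequences instead of strings.
 */
final class AlphabetStarSuccessorSketch {

    /** digits[j] is the index into alphabet of the j-th concatenated element. */
    static int[] next(int[] digits, int alphabetSize) {
        int[] result = Arrays.copyOf(digits, digits.length);
        for (int pos = result.length - 1; pos >= 0; pos--) {
            if (result[pos] + 1 < alphabetSize) {
                result[pos]++;                 // increase this position and stop
                return result;
            }
            result[pos] = 0;                   // overflow: reset to alphabet[0], carry left
        }
        return new int[digits.length + 1];     // all positions overflowed: alphabet[0] repeated n+1 times
    }

    /** Concatenates the alphabet elements selected by digits. */
    static String render(int[] digits, String[] alphabet) {
        StringBuilder sb = new StringBuilder();
        for (int d : digits) {
            sb.append(alphabet[d]);
        }
        return sb.toString();
    }
}
```

Starting from the empty sequence with alphabet = {"a", "b"}, repeated calls produce "", "a", "b", "aa", "ab", "ba", "bb", "aaa", and so on.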
The allCandidates(uf) method returns a vector with a single element, as a reminder that this Gen maps to an infinite set. Instead, you can iterate over the first n elements of the set using the previous two methods (with a caveat; see below).
The randomCandidate(uf, rnd) method returns a candidate of length n (that is, a concatenation of n elements of alphabet) with probability pow(2, -(n+1)), which corresponds to rnd falling in the interval [1 - pow(2, -n), 1 - pow(2, -(n+1))). In other words, if rnd < 0.5, you get the candidate with the empty string. If 0.5 <= rnd < 0.75, you receive a candidate whose surface form is a single string from alphabet. If 0.75 <= rnd < 0.875, the surface form is the concatenation of two strings from alphabet. And so forth. Within such an interval, surface forms of the same length have equal probability (provided that rnd is uniformly distributed). As rnd gradually increases, the surface forms produced follow the order described for the nextCandidate() method.
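The distribution can be sketched in two steps: first locate the interval containing rnd to fix n, then expand the relative position of rnd inside that interval in base alphabet.length to pick one of the strings of that length in enumeration order. The class name is hypothetical, this is not OTKit's code, and rejecting rnd outside [0, 1) is an assumption, since the documentation does not cover that case.

```java
/**
 * Hypothetical sketch (not the OTKit implementation): rnd in
 * [1 - 2^-n, 1 - 2^-(n+1)) yields n concatenated elements, chosen uniformly
 * and in enumeration order by the position of rnd within that interval.
 */
final class AlphabetStarRandomSketch {

    static String pick(String[] alphabet, double rnd) {
        if (rnd < 0 || rnd >= 1) {
            // assumption: the documentation does not specify rnd outside [0, 1)
            throw new IllegalArgumentException("rnd must be in [0, 1)");
        }
        int k = alphabet.length;
        int n = 0;
        double lower = 0.0;                    // 1 - 2^-n
        double width = 0.5;                    // 2^-(n+1)
        while (rnd >= lower + width) {         // rnd lies beyond the interval for length n
            lower += width;
            width /= 2;
            n++;
        }
        double f = (rnd - lower) / width;      // relative position inside the interval, in [0, 1)
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            f *= k;                            // the next base-k digit of f selects the element
            int d = Math.min((int) f, k - 1);  // guard against floating-point rounding up to k
            f -= d;
            sb.append(alphabet[d]);
        }
        return sb.toString();
    }
}
```

With alphabet = {"a", "b"}, rnd = 0.3 gives the empty string, 0.74 gives "b", and 0.75 gives "aa".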
The simplest, and hopefully most widespread, use of this family of Gen functions will be based on an alphabet[] consisting of single-character strings; that is, the elements of alphabet[] will really be atomic "letters". However, the method also allows for "alphabets" containing complex substrings, and then a string may have several parses. If alphabet = {"a", "b", "ba", "ab"}, then the string "bab" can be parsed as ba+b, as b+ab, or as b+a+b.
Note that the nextCandidate() method uses the first parse of a string that it finds, so it becomes impossible to iterate over Gen using this method: the second time "bab" is produced, the iteration falls back to the first parse. Similarly, candidates with ambiguous parses have an increased chance of being produced by the randomCandidate(uf, rnd) method.
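The number of parses of a string into any number of alphabet elements can be counted with a standard dynamic program over prefixes. The hypothetical helper StarParseCounter is not part of OTKit; it confirms that "bab" has three parses over {"a", "b", "ba", "ab"}.

```java
/**
 * Hypothetical helper (not part of OTKit): counts all ways of writing s as a
 * concatenation of any number of alphabet elements.
 */
final class StarParseCounter {

    static long countParses(String s, String[] alphabet) {
        long[] ways = new long[s.length() + 1];   // ways[i]: number of parses of s.substring(0, i)
        ways[0] = 1;                              // the empty prefix has exactly one (empty) parse
        for (int i = 0; i < s.length(); i++) {
            if (ways[i] == 0) {
                continue;                         // the prefix of length i cannot be parsed
            }
            for (String piece : alphabet) {
                if (!piece.isEmpty() && s.startsWith(piece, i)) {
                    ways[i + piece.length()] += ways[i];
                }
            }
        }
        return ways[s.length()];
    }
}
```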
Parameter alphabet must be of positive length and must not contain the empty string; otherwise, exceptions are thrown.
alphabet - "Set" of strings (ideally, of "letters") over which the language is being generated.
public static final Gen composition(Gen gen1, Gen gen2)
Composition of two "Gens".
gen1 - First map to be applied.
gen2 - Second map to be applied.
MapFormExamples.composition(MapForm, MapForm).
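As a rough illustration of the idea (apply gen1, then apply gen2 to each of its outputs and collect the results), the hypothetical class GenCompositionSketch models a generator as a Java Function from an underlying form to a finite list of surface forms. This modelling is an assumption made for the sketch only: OTKit's Gen also deals with infinite candidate sets and iteration, which the sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical sketch of composing two candidate generators, each modelled
 * as a function from an underlying form to a finite list of surface forms.
 * This is not the OTKit Gen interface.
 */
final class GenCompositionSketch {

    static Function<String, List<String>> compose(
            Function<String, List<String>> gen1,
            Function<String, List<String>> gen2) {
        return uf -> {
            List<String> result = new ArrayList<>();
            for (String intermediate : gen1.apply(uf)) {   // gen1 is applied first
                result.addAll(gen2.apply(intermediate));   // gen2 takes each output of gen1 as its input
            }
            return result;
        };
    }
}
```

For example, if gen1 maps "u" to {"ua", "ub"} and gen2 maps any x to {x, x + "!"}, the composition maps "u" to {"ua", "ua!", "ub", "ub!"}.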