Practicum - week 8


Write a shell script that produces a frequency list of the N-grams in a text document. The name of the document is given as the argument of your script. Don't do it the way Cavnar & Trenkle do (i.e. by inserting additional spaces before and after each word). Just move a window that is N characters long along your text, and count the frequencies. E.g. the following sentence

This is the way to build N-grams

should produce the following bi-grams (with _ standing for a space): _t, th, hi, is, s_, _i, is, s_, _t, th, he, e_, _w, wa,... And the following tri-grams: __t, _th, thi, his, is_, s_i, _is, is_, s_t, _th, etc.

Look at the previous lecture notes and previous exercises for some tips on how to build N-grams at the character level.

Now, I would like you to produce a single list of these N-grams, combining N = 1, 2, 3, and 4, in decreasing order of frequency. (In a next step this list could serve as the profile used by the algorithm of Cavnar & Trenkle.)
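The sliding window and the combined list can be sketched roughly as follows. This is only one possible approach, not the reference solution: the function names windows and ngram_profile are made up for illustration, and turning newlines (as well as spaces) into _ is an assumption about how line ends should be treated.

```shell
#!/bin/sh
# Hypothetical helper: print every window of $1 characters in file $2,
# one window per line. Assumption: spaces and newlines both become "_",
# so the whole text is handled as one long line.
windows() {
    tr ' \n' '__' < "$2" |
        awk -v n="$1" '{
            for (i = 1; i + n - 1 <= length($0); i++)
                print substr($0, i, n)
        }'
}

# Hypothetical helper: pool the windows for N = 1, 2, 3, 4, count identical
# ones, and list them in decreasing order of frequency.
ngram_profile() {
    for n in 1 2 3 4; do
        windows "$n" "$1"
    done | LC_ALL=C sort | uniq -c | sort -rn
}
```

If this were saved in a script called ngrams with ngram_profile "$1" as its last line, it could be called as in the example run. The exact counts depend on how line ends and repeated spaces are treated, so small differences from a reference result are to be expected.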

To check whether you have done a good job, here are my results for fed1.txt (you can try it yourself...):

birot@hagen:~> ngrams ~/Federalist/fed1.txt
   5342 _
   3576 __
   3417 ___
   2936 ____
    946 e
    752 t
    618 o
    527 i
    522 n
    506 a
    445 s
    416 r
    363 h
    302 e_
    297 _t
    278 l
    246 d
    229 th
    218 c
    214 _th
    209 u
    203 f
    191 _a
    175 he
    169 m
    165 p
    164 _o
    156 the
    146 _the
    142 s_
    135 y
    133 t_
    133 er
    129 n_
    125 on
    122 he_
    118 the_
    117 b
    113 w
    110 _of
    108 g
    106 d_
    106 _i
    105 of
    105 in
    105 ,_
    105 ,
    103 an
    102 f_
    100 y_
     99 re
     98 of_
     98 _of_
     97 ti
     95 en
     90 _w
     89 v
     88 es
     86 at
     83 r_
     83 o_
     79 T
     75 io
     74 te
     73 it
etc... (Remark: if you do this for serious research, it would be helpful to collapse every run of more than one space into a single space.)
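The clean-up mentioned in the remark could look roughly like this. It is a sketch only: squeeze_spaces is a made-up name, and treating newlines and tabs as spaces too is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: turn newlines and tabs in file $1 into spaces
# (an assumption), then collapse every run of spaces into a single space.
squeeze_spaces() {
    tr '\n\t' '  ' < "$1" | tr -s ' '
}
```

The output of such a filter could be written to a temporary file and the N-grams counted on that file instead of the original text.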

(3 points)


Create a model of a desk calculator by writing a Perl program. It should read one argument, then the arithmetic operator (+ , - , * or /), then the second argument, and return the result. Then it should go back to the beginning and wait for a new argument. This should go on until one enters an argument that makes the program halt (e.g. stop, exit, quit, halt, bye). When you have to divide, check beforehand that the second argument is not 0 (in case of an error you can use the bell of the computer!).
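The control flow described above can be sketched as follows. Note that this sketch is in shell, not in Perl as the exercise requires, so it only illustrates the logic. The names calc and calc_loop are made up, each item is assumed to be typed on its own line, and only integers are handled.

```shell
#!/bin/sh
# Hypothetical helper: apply operator $2 to the integers $1 and $3.
calc() {
    case $2 in
        +) echo $(( $1 + $3 )) ;;
        -) echo $(( $1 - $3 )) ;;
        '*') echo $(( $1 * $3 )) ;;
        /)
            # Check the divisor before dividing.
            if [ "$3" -eq 0 ]; then
                printf '\a' >&2                  # ring the terminal bell
                echo "error: division by zero" >&2
                return 1
            fi
            echo $(( $1 / $3 )) ;;               # integer division
        *)
            echo "error: unknown operator $2" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: read argument, operator, argument (one per line),
# print the result, and start again until a halting word is entered.
calc_loop() {
    while read -r a; do
        case $a in
            stop|exit|quit|halt|bye) break ;;
        esac
        read -r op || break
        read -r b || break
        calc "$a" "$op" "$b" || echo "please try again" >&2
    done
}
```

For instance, feeding the lines 6, /, 3 and quit to calc_loop prints 2 and then stops.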

Look at the file /users1/birot/calcul for an example of how my program runs. Or come to me during the practicum and I will let you try out my program.

(4 points)


Write a program in Perl that is a first approximation of the Master Mind game. It should do the following:

- You are asked for the number to be found out (colours are replaced with digits 0 to 9; e.g. 4321 would stand for blue-red-green-yellow, if you had colours).

- Then you are asked to guess a number (suppose the other player has not seen the previously given number).

- For each guess, the number of good digits is given: this is the number of digits that occur both in the guess and in the first given number (the one to be found out).

- If the guess is exactly the same as the first given number, then the game stops.

1. Use a scalar variable (a string) for both the first given number and the guesses.

2. Don't worry about the number of digits (suppose that both players know the number of digits in the number to be found out).

3. Suppose each digit occurs only once in both the number to be found out and in the guesses. (So 3692 is okay, but don't worry about cases like 3355.)

4. Use regular expressions to check if a given digit occurs in a string.

5. Don't worry about telling how many colours (digits) are in the correct place.
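The counting of good digits can be sketched as follows. Again, this is in shell rather than Perl, so it only shows the logic; in Perl the same check would be written with Perl's own pattern-matching operator. The name good_digits is made up, and the sketch relies on point 3 above (no digit occurs twice).

```shell
#!/bin/sh
# Hypothetical helper: print how many digits of the guess ($2) occur
# anywhere in the number to be found out ($1).
good_digits() {
    secret=$1
    rest=$2
    count=0
    while [ -n "$rest" ]; do
        digit=${rest%"${rest#?}"}    # first character of what is left
        rest=${rest#?}               # drop that character
        # Regular-expression match: does this digit occur in the secret?
        if printf '%s\n' "$secret" | grep -q "$digit"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

For example, good_digits 3692 3015 prints 1, since only the digit 3 occurs in both numbers. The position of the digits plays no role here, in line with point 5.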

If you want to, you can go on, and write a real Master Mind program....

Look at the file /users1/birot/mastermind for an example of how my program runs. Or come to me during the practicum and I will let you try out my program.

(3 points)