Practicum - week 14

Please mention in your mails the regular e-mail address to which I can send my corrections.

You will find some of the scripts we had on Wednesday under: /users1/birot/Examples-week14 . Read the README file.

It seems that the $PATH variable on Hagen (or on Hilde{01..14}) does not include '.' (that is, your working directory). Check it by typing 'echo $PATH'. This means that even if you have made your shell script executable (chmod +x), you cannot run it simply by typing its name, because the shell cannot locate it. So you need to launch all your scripts by giving the path, e.g. ./myscript (or type: bash myscript; in that case chmod is not necessary).
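
Here is a small sketch of that check, assuming a bash-like shell. The function name path_has_dot is made up for this illustration (it is not a standard command), and myscript is just the example name used above.

```shell
# path_has_dot is a hypothetical helper, not a standard command.
# It succeeds if the PATH-style string given as $1 has '.' as one entry.
# Note: an empty entry in PATH also stands for the working directory;
# this sketch does not look for that case.
path_has_dot() {
    case ":$1:" in
        *:.:*) return 0 ;;
        *)     return 1 ;;
    esac
}

if path_has_dot "$PATH"; then
    echo "'.' is in PATH: typing 'myscript' is enough"
else
    echo "'.' is not in PATH: type './myscript' or 'bash myscript'"
fi
```

Surrounding the string with colons lets a single pattern match '.' at the beginning, in the middle, or at the end of the list.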

Good luck for the last set of assignments! Enjoy!

The following is the man page of the command "loveletter":

NAME: loveletter - creates a love letter, and prints it on the standard output

SYNOPSIS: loveletter [addressed] [sender] [FILE]

DESCRIPTION: creates a love letter, beginning with the phrase "Dear [addressed],", followed by some standard message, and ending with the name (signature) of [sender]. If [FILE] is also specified, then the content of [FILE] is read, and inserted somewhere into the body of the message.

For example, the command line:

loveletter Judy Joe message-to-Judy

will print something like this:

Dear Judy,

I love you so much! I miss you so much! I can't live without you!

[the following lines are read from the file called 'message-to-Judy':] Yesterday, when we were in the cinema, and I could hold your hand, you know, it was the most beautiful moment of my life. I would like to meet you again. [end of the file 'message-to-Judy']

I hope to see you again, very soon!

Love,

Joe

Can you write this command as a shell script?

(3 points)

Please read the section on N-grams on the web site of week 14. For instance, if you want to create the tri-grams of the words in the previous sentence, you get "Please + read + the", "read + the + section", "the + section + on", and so on. If you imagine that you put empty words before the first word, you can also consider "___ + ___ + Please" and "___ + Please + read".

How do you create the bi-grams of the words occurring in a given file? First, you need to put each word on a new line:

tr " " '\012' < file > file.words

Then, you create another file, which is exactly the same, except that it has an empty first line added:

(echo ; cat file.words) > file.words.shifted

Finally, you can paste the two columns of words together. Because you have shifted one of them, you get a list of bi-grams:

paste file.words.shifted file.words > file.bigrams
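
The three steps above can be collected into one piece of shell code. This is only a sketch for bi-grams: the function name make_bigrams is invented for the illustration, and the intermediate files simply reuse the .words, .words.shifted and .bigrams endings from above.

```shell
# make_bigrams is an illustrative name; $1 is the name of the input file.
make_bigrams() {
    file=$1
    # Step 1: one word per line (\012 is the newline character).
    tr " " '\012' < "$file" > "$file.words"
    # Step 2: the same list, preceded by one empty line.
    (echo; cat "$file.words") > "$file.words.shifted"
    # Step 3: put the two columns side by side, separated by a tab.
    paste "$file.words.shifted" "$file.words" > "$file.bigrams"
}
```

After 'make_bigrams myfile', the result is in myfile.bigrams. Note that the first line pairs an empty word with the first word, and the last line pairs the last word with nothing, because the shifted file is one line longer.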

Can you write a script that receives a file name as its argument, and creates the list of tri-grams from this input file?

Note: if your script creates some intermediate files, you can remove them at the end of the script. It is not necessary, but nice. Also, you probably don't want to remove them before you are sure that your script works, because these files can help you to locate possible problems.

(3 points)

Can you use this shell script in a pipe-line (command | command | your-shell-script | command | command...), in order to extract all the trigrams that form a USA acrostic? An acrostic is when the first letter of the first word, the first letter of the second word and the first letter of the third word form something meaningful. In our case, the first word in the tri-gram starts with a U, the second word starts with an S, and the third one with an A.

(1 point)

I would like to know which are the 20 most frequent trigrams on the level of the characters in fed12.txt. For instance, I guess that the trigram 't+h+e' is very frequent (cf. words like: the, these, there, they, them,...), whereas trigrams like 'z+q+g' or 'a+u+e' are not so frequently met.

In other words, you need to create the trigrams on the level of the characters, and not on the level of the words, this time. A tip: why not put each character on a new line now? Furthermore, once you have the tri-grams, you need to create a frequency list. Remember the way you create a word frequency list; it can help you (cf. sort, uniq -c,...).
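
As a reminder, a word frequency list can be sketched as follows. The function name word_freq is invented for the illustration; it reads text from the standard input and assumes that the words are separated by single spaces.

```shell
# word_freq is an illustrative name; it reads text on standard input.
word_freq() {
    # tr puts each word on its own line, sort brings identical words
    # next to each other, and uniq -c counts each run of identical lines.
    tr " " '\012' | sort | uniq -c
}

# Example with a made-up sentence:
echo "to be or not to be" | word_freq
```

Each output line holds a count followed by the word. The 'sort' step is needed because 'uniq' only merges lines that are adjacent.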

Once you have the frequency of the trigrams, you can sort them by decreasing frequency. You may want to use the option of 'sort' that compares numbers by their value, so that 15 > 2 > 1. Otherwise, 'sort' will compare them as if they were strings and order them 1, 15, 2, the same way as A, AN, I are ordered alphabetically.
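
The difference can be seen on three made-up counts. This sketch assumes that the option meant above is -n (numeric comparison), combined here with -r (reverse order) so that the largest count comes first.

```shell
# Three made-up frequency counts, one per line.
counts=$(printf '2\n15\n1\n')

# Default comparison, character by character: prints 1, 15, 2.
printf '%s\n' "$counts" | sort

# Numeric comparison in reverse order: prints 15, 2, 1.
printf '%s\n' "$counts" | sort -rn
```

With the output of 'uniq -c', the count is the first thing on each line, so the same options apply to a frequency list.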

(3 points)