Practicum - week 12


1.

Give the command line that will give the following permissions to some file:

(You can check whether your solution is correct by looking at the information in the long list of that file.)
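As a reminder of how such a check works, here is a minimal sketch. The mode used here (read and write for the owner, read for the group, nothing for others) is only a hypothetical example and is not the set of permissions asked for in this exercise; the scratch file created with mktemp is likewise just for illustration.

```shell
f=$(mktemp)                        # hypothetical scratch file
chmod u=rw,g=r,o= "$f"             # hypothetical mode, same as: chmod 640 "$f"
perms=$(ls -l "$f" | cut -c1-10)   # first 10 characters: file type + permissions
printf '%s\n' "$perms"
rm -f "$f"
```

The first character of the permission string is the file type ('-' for a regular file); the remaining nine are the permissions of the owner, the group and the others.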

(1 point)



2.

Let's build the vocabulary of the Federalist papers:

With one (long) command line (a pipeline), create a file that contains all the words of all the Federalist papers, in alphabetical order, each of them only once. Save the file under the name wordlist.

However, you need some "preprocessing" before you create the word list. First, you don't want numbers to appear in your list. Second, strings like "abolition", "ABOLITION", "ABOLITION." and "abolition," represent the same word, so you may want to change all upper case characters to lower case characters, and to delete punctuation marks (full stop, comma, quotation marks, etc.). What else?
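The preprocessing steps can be tried out on a small piece of text before running them on the real files. The sketch below is only an illustration under assumptions: the sample sentence is made up, and the particular choice of tr options is one possibility among several, not necessarily the intended solution.

```shell
# Hypothetical sample text standing in for the Federalist papers.
sample='ABOLITION. The abolition, of 12 "abolition" laws'
words=$(printf '%s\n' "$sample" |
  tr '[:upper:]' '[:lower:]' |   # upper case to lower case
  tr -d '[:digit:]' |            # delete digits
  tr -d '[:punct:]' |            # delete punctuation marks
  tr -s ' ' '\n' |               # one word per line, squeezing repeated breaks
  sort -u)                       # alphabetical order, each word once
printf '%s\n' "$words"
```

On this sample the three spellings of "abolition" collapse into a single entry. Real text may need further cleaning (tabs, empty lines, hyphens), which is what the "What else?" above is asking about.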

(2 points)



3.

Let's look for palindrome words in the Federalist papers, using the word list that you have just created.

A palindrome is a word (or a phrase) that reads the same backwards as forwards, such as madam or eye. (An example of a palindromic phrase is nurses run, but we are not concerned with phrases in this assignment.)

One way to look for palindromes in your word list of the Federalist papers is the following. First, create a list that contains both the original words and their mirror images (e.g. both 'table' and 'elbat'). In your original list, each word occurs exactly once. A palindrome is identical to its mirror image, so it occurs twice in the combined list; if you sort the combined list alphabetically, the two copies will follow each other. Thus, using 'uniq -c', you will obtain a list in which a '2' precedes the palindromes and a '1' precedes the other words (remember what 'uniq' and 'uniq -c' do!). Lastly, use grep to collect all palindromes.

In fact, the result will contain not only palindromes, but also words whose mirror image is another word occurring in the Federalist papers, such as the mirror pairs live and evil, or time and emit. These are also worth finding. Furthermore, a one-letter word is always a palindrome.
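The idea can be tried out on a miniature list first. The following is a sketch under assumptions: the five-word list is made up, and the rev utility (which reverses every line of its input) is assumed to be installed; it is one possible way to produce the mirror images, not necessarily the one used in class.

```shell
# Hypothetical miniature word list: one word per line, each word once.
list=$(printf '%s\n' eye live evil table madam)
pairs=$( (printf '%s\n' "$list"; printf '%s\n' "$list" | rev) |
  sort |            # identical lines become neighbours
  uniq -c |         # prefix each distinct line with its count
  grep '^ *2 ' )    # keep the lines whose count is 2
printf '%s\n' "$pairs"
```

On this sample the lines kept are those for evil, eye, live and madam: the two palindromes plus the mirror pair live/evil, while table and elbat each occur only once.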

Your task now is to carry out this whole procedure within one command line. (You do not have to create the word list once again; you can use the file that you built for assignment 2.) You may want to use parentheses -- the characters '(' and ')' -- and semi-colons (;).

(2 points)



4.

Let us collect all sentences in fed21.txt that start with the word "The" (but not with "Therefore", "Then", "They", "Them", "There", etc.).

You may want to use the trick that Lonneke showed you in class: because with Unix you can search for lines and not for sentences, it is handy first to transform your file so that one line corresponds to one sentence.

Can you send me one command line that returns these sentences (or, if you prefer: the number of such sentences)?

A remark, in case no such sentences are returned: don't forget the space that follows a punctuation mark (except at the beginning of a paragraph).

Another remark: assume that each full stop (.), question mark and exclamation mark marks the end of a sentence (don't worry about abbreviations, etc.).
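The "one sentence per line" idea can be sketched on a made-up paragraph as follows. This is only an illustration under assumptions: the sample text is not taken from fed21.txt, and the use of sed to break the line after each punctuation mark is one possible approach, which may differ from the trick shown in class. Joining all lines first also loses the paragraph boundaries, which a real solution may want to treat separately.

```shell
# Hypothetical sample text with a line break inside the first sentence.
sentences=$(printf 'The first\none. Then a second! They left? The last one.\n' |
  tr '\n' ' ' |                # join all lines into one long line
  sed 's/\([.?!]\) /\1\
/g' |                          # start a new line after ". ", "? " or "! "
  grep '^The ')                # keep lines starting with "The" plus a space
printf '%s\n' "$sentences"
count=$(printf '%s\n' "$sentences" | grep -c '^The ')
printf '%s\n' "$count"
```

The space in the pattern '^The ' is what excludes "Then" and "They" on this sample; grep -c gives the number of matching sentences instead of the sentences themselves.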

A last remark:

It can be confusing if the output of a command contains lines that are too long for the screen. Some programs that display the content of a file break such lines automatically, so you cannot tell whether the left edge of the screen is really the beginning of a line in the file (i.e. it follows a "new line character"), or just the continuation of the previous line. What can be useful (while you are experimenting) is to redirect the output of the command line into a file, and then look at that file with XEmacs or pico.

The lines in pico really correspond to the lines in the file. If a line is too long, there is a '$' sign at the right end of the screen, and you can read the rest of the line by moving the cursor beyond the '$' sign. (Or you can position your cursor at the beginning of the next line, and then move it left.) Be careful: if you insert a character in pico, then pico will automatically break the line (that is, it inserts an end-of-line character, a '\n', i.e. a '\012').
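If you are unsure how many real lines some output has, you can also count the end-of-line characters directly. The sketch below uses made-up input; it counts lines with wc -l and uses od to confirm that the end-of-line character '\n' has the octal code 012.

```shell
# Count the real lines (end-of-line characters) of a hypothetical two-line input.
lines=$(printf 'one rather long line of text\nsecond line\n' | wc -l | tr -d ' ')
printf '%s\n' "$lines"

# Show the octal code of the end-of-line character.
code=$(printf '\n' | od -An -b | tr -d ' ')
printf '%s\n' "$code"
```

However the lines are wrapped on the screen, wc -l reports only the line breaks that are really in the data.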

XEmacs will show you the whole line by "breaking it visually": no end-of-line character is inserted, but the continuation of the line appears below. You can nevertheless recognize that this is not a new line by the small curved arrow at the right end of the screen.

Try typing a line that is too long for the window in both XEmacs and pico. Enlarge the window, and try again...

(2 points)



5.

You remember that the long list contains information about the owner of a file, its permissions and its file type (whether it is a symbolic link, a regular file or a directory). You also remember that you can create a T-junction in a pipeline with the command tee.
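As a reminder of how tee behaves, here is a minimal sketch under assumptions: the scratch directory, the two empty files and the name of the saved listing are all made up for the illustration and are unrelated to the tasks of this exercise.

```shell
dir=$(mktemp -d)                        # hypothetical scratch directory
listing=$(mktemp)                       # hypothetical file for the saved copy
touch "$dir/a.txt" "$dir/b.txt"

# tee writes a copy of the long list to $listing and passes it on to grep,
# which counts the regular files (lines starting with '-').
count=$(ls -l "$dir" | tee "$listing" | grep -c '^-')
saved=$(cat "$listing")

printf '%s\n' "$count"
rm -rf "$dir" "$listing"
```

The same long list thus serves two purposes at once: one copy is stored in a file, while the other continues down the pipeline.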

Give one (long) command line that performs all of the following tasks:

(3 points)