1. Some examples
Here is a program implementing the Master Mind game. It is more sophisticated
than the one you were supposed to write for the week 8 assignment, and
it uses arrays instead of strings.
You can find this file, as well as the example files from the previous week, on Hagen under /users1/birot/Examples. You can even run these programs there.
#!/usr/bin/perl -w
$N = 5;   # number of digits
$M = 5;   # each digit to be found out is between 0 and $M-1
# ------------------------------------ #
# Creating the numbers to be found out #
# ------------------------------------ #
srand;    # initializing the random number generator:
          # this should be done in order to get different
          # values when running your program again and again
for ($j = 0; $j < $N; $j++)
{
    $prob[$j] = int(rand($M));
}
# rand($M) returns a floating point (real number)
# random value between 0 and $M ($M itself excluded).
# int($x) returns the integer part of $x.
# The numbers to be found out are stored in the array @prob.
# ------------------------------------ #
# Guesses                              #
# ------------------------------------ #
$bingo = 0;          # 0 = false
system("clear");     # clears the screen
print "@prob \n";    # prints the solution: useful for testing, delete this line to play for real
while (! $bingo)
{
    print "What is your guess?\nEach digit in a new line, followed by ^D\n";
    @guess = <STDIN>;    # reading the guess directly into an array
    chomp(@guess);       # removing the newline characters
    @prob1 = @prob;      # a copy: I will mark the digits that have a match in @guess
    $white = $black = 0;
    # white : same colour at a different place
    # black : same colour, same place
    for ($i = 0; $i < $N; $i++)   # check for black
    {
        if ($prob1[$i] == $guess[$i])
        {
            $black++;
            $prob1[$i] = -1;   # mark both digits as used, with two different
            $guess[$i] = -2;   # values, so that they cannot match again
        }
    }
    for ($i = 0; $i < $N; $i++)   # check for white
    {
        for ($j = 0; $j < $N; $j++)
        {
            if ($prob1[$i] == $guess[$j])
            {
                $white++;
                $prob1[$i] = -1;
                $guess[$j] = -2;
                last;          # stop checking for this digit
            } # end of if
        } # end of for (j)
    } # end of for (i)
    print "Black: $black White: $white\n";
    $bingo = ($black == $N);
    # $bingo becomes 1 = true
    # if and only if $black == $N (maximal)
} # end of while (! $bingo)
print "\n\n BINGO !!! \n\n";
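The scoring part of the program above (counting black and white) is easier to test on its own if it is put into a subroutine. The following is a minimal sketch, not the original program: the subroutine name score is my own, it uses use strict and my variables, it assumes the digits are non-negative, and it works on copies so that the marker values -1 and -2 do not overwrite the caller's arrays.

```perl
use strict;
use warnings;

# score(\@secret, \@guess) returns (black, white) for one Master Mind guess:
# black = right digit in the right place,
# white = right digit, but in a different place.
sub score {
    my ($secret_ref, $guess_ref) = @_;
    my @secret = @$secret_ref;   # work on copies, so that the markers
    my @guess  = @$guess_ref;    # below do not change the caller's arrays
    my ($black, $white) = (0, 0);

    # First pass: exact matches. Both digits are then marked as used,
    # with two different values (-1 and -2) that can never match each other.
    for my $i (0 .. $#secret) {
        if ($secret[$i] == $guess[$i]) {
            $black++;
            ($secret[$i], $guess[$i]) = (-1, -2);
        }
    }
    # Second pass: the same digit somewhere else in the guess.
    for my $i (0 .. $#secret) {
        for my $j (0 .. $#guess) {
            if ($secret[$i] == $guess[$j]) {
                $white++;
                ($secret[$i], $guess[$j]) = (-1, -2);
                last;    # this digit of the secret is used up
            }
        }
    }
    return ($black, $white);
}

my ($black, $white) = score([1, 2, 3, 4, 0], [1, 3, 2, 0, 0]);
print "Black: $black White: $white\n";    # Black: 2 White: 2
```

Marking both sides of a match is what keeps repeated digits from being counted twice: with the secret 1 1 2 2 3 and the guess 1 2 1 1 1, the result is one black and two white, not more.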
2. Some more regular expressions
(Maybe useful for some of you in the final assignment...)
Pre-defined classes:
\d = [0-9] (digits)
\D = [^0-9] (non-digits)
\w = [a-zA-Z0-9_] (word characters)
\W = [^a-zA-Z0-9_] (non-word characters)
A new meaning of the =~ operator: if the right-hand side is a substitution (s/.../.../) or a transliteration (tr/.../.../), it alters the string on the left-hand side.
For example, you can make changes to strings similar to what the tr and sed commands do in Unix:
$name =~ tr/A-Z/a-z/;       # replace all upper case letters with the corresponding lower case letters
$name =~ s/[aou]e/German/;  # replace the first 'ae', 'oe' or 'ue' with the string 'German'
$name =~ s/[\.,;]//g;       # delete full stops, commas and semicolons (replace them with the empty string);
                            # 'g' stands for "global", that is, all of them, not only the first one
$name =~ s/\W.*//;          # get rid of everything after the first word (starting with the first non-word character)
More multipliers (besides +, ? and *):
/x{5,10}/     # 5, 6, ... or 10 occurrences of the character 'x'
/[a-n]{3,}/   # 3 or more characters from [a-n], e.g. 'bcd' or 'kka'
/\w{0,4}/     # at most 4 word characters
/.{5}/        # exactly five characters
Note that these patterns are not anchored: they only need to match a part of the string. So /x{5,10}/ also matches a string with 20 x's, /.{5}/ succeeds on any string that is at least five characters long, and /\w{0,4}/ matches every string (zero characters are allowed). Use ^ and $, as in /^.{5}$/, to constrain the whole string.
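To try out the substitutions and multipliers above, here is a small sketch that applies each of them to a sample string (the string "Mueller, Hans; Dr." is made up). Each substitution works on its own copy of $name, so the results can be compared; the loop at the end shows the difference between unanchored and anchored patterns.

```perl
use strict;
use warnings;

# Each substitution from the notes is applied to its own copy of $name.
my $name = "Mueller, Hans; Dr.";
(my $lower      = $name) =~ tr/A-Z/a-z/;      # all upper case letters to lower case
(my $german     = $name) =~ s/[aou]e/German/; # only the first 'ae', 'oe' or 'ue'
(my $no_punct   = $name) =~ s/[\.,;]//g;      # every full stop, comma and semicolon
(my $first_word = $name) =~ s/\W.*//;         # everything from the first non-word character on
print "$lower\n$german\n$no_punct\n$first_word\n";

# Multipliers: unanchored patterns only need to match a part of the string.
foreach my $s ("xxxx", "xxxxxxx", "abcde", "abcdefg") {
    my $x5to10   = ($s =~ /x{5,10}/) ? "yes" : "no";
    my $atleast5 = ($s =~ /.{5}/)    ? "yes" : "no";
    my $exactly5 = ($s =~ /^.{5}$/)  ? "yes" : "no";
    print "$s: /x{5,10}/ $x5to10, /.{5}/ $atleast5, /^.{5}\$/ $exactly5\n";
}
```

The first four lines printed are "mueller, hans; dr.", "MGermanller, Hans; Dr.", "Mueller Hans Dr" and "Mueller". In the loop, "abcdefg" matches /.{5}/ but not /^.{5}$/.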
A useful trick to build N-grams:
#!/usr/bin/perl -w
$N = 2;   # here you can set the value of N
$string = <STDIN>;
chomp($string);
for ($l = 1; $string =~ /.{$l}/; $l++) {};
$l--;     # $l equals the length of the input string
print "$l";   # check whether it is really true...
for ($k = 0; $k < $l-$N+1; $k++)
{
    $chunk = $string;
    $chunk =~ s/^.{$k}//;      # cut the first $k characters
    $last = $l-$k-$N;
    $chunk =~ s/.{$last}$//;   # cut the last characters ($last of them)
    print "\n $chunk";         # this is the $k-th $N-gram of the input string
}
print "\n";
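The regular-expression trick above works, but Perl also has the built-in functions length and substr, which do the same job more directly: substr($string, $k, $n) returns the $n characters starting at position $k. A minimal sketch of this alternative (the subroutine name ngrams is my own, and it returns the list instead of printing it):

```perl
use strict;
use warnings;

# ngrams($string, $n) returns the list of all $n-grams of $string,
# i.e. every substring of length $n, from left to right.
sub ngrams {
    my ($string, $n) = @_;
    my @result;
    for my $k (0 .. length($string) - $n) {   # empty range if the string is shorter than $n
        push @result, substr($string, $k, $n); # $n characters starting at position $k
    }
    return @result;
}

my $line = "My dear, I love you";
print length($line), "\n";
print " $_\n" foreach ngrams($line, 2);
```

Returning a list instead of printing makes the subroutine reusable, for example for counting the n-grams in a hash.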
3. Perl-programs in context
Why not put a Perl program in a pipeline?
Here is the output of the previous program, saved as an executable file called ngrams, when it is put into a pipeline:
birot@hagen:~/> echo "My dear, I love you" | ngrams
19
 My
 y 
  d
 de
 ea
 ar
 r,
 , 
  I
 I 
  l
 lo
 ov
 ve
 e 
  y
 yo
 ou
Or what about:
birot@hagen:~/> echo "My dear, I love you" | ngrams > filename
birot@hagen:~/> echo "My dear, I love you" | ngrams | sort > sorted_file
Just as you can run a Perl program from the command line
simply by typing its name, you can of course run a Perl program from
a shell script, too.
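A program in the middle of a pipeline should usually read all of its input lines, not only the first one as the n-gram program above does. Here is a minimal sketch of such a filter, which prints every word of its input on a separate line; the name words_filter is hypothetical, and the demo reads from an in-memory string instead of STDIN so that it can be run as it is.

```perl
use strict;
use warnings;

# words_filter($in, $out) copies every word of every line read from $in
# to $out, one word per line: a typical building block for a pipeline.
sub words_filter {
    my ($in, $out) = @_;
    while (my $line = <$in>) {               # read ALL lines, not only the first
        chomp $line;
        foreach my $word (split /\W+/, $line) {
            print $out "$word\n" if length $word;   # skip the empty field split may give
        }
    }
}

# Demo on an in-memory "file"; in a real program you would write
# words_filter(\*STDIN, \*STDOUT) instead.
my $text = "My dear, I love you\nI love Perl too\n";
open(my $in, '<', \$text) or die "cannot open the string: $!";
words_filter($in, \*STDOUT);
```

Saved as an executable file (say, words) whose main part calls words_filter(\*STDIN, \*STDOUT), it could be used as echo "My dear, I love you" | words | sort | uniq -c to count how often each word occurs.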
4. Undefined structures
If you try to use something that has not been defined, for instance a hash element for a key that has never been given a value, Perl run with the -w switch will give you a warning (something like "Use of uninitialized value"). How can you avoid this problem?
Example: you have two hashes, each of them containing a phone book,
and you want to check whether they contain the same phone number for each
person:
# this is a loop in which $name takes its values from the list
# of the keys of the hash %phonebook1, sorted alphabetically
foreach $name (sort keys (%phonebook1))
{
    if ($phonebook1{$name} ne $phonebook2{$name})
    { print "There is a difference at $name\n"; }
}
If there is a name which is stored in %phonebook1 but not in %phonebook2, then you will get a warning (and the program will also report a difference for that name).
Another example: the ranks of the n-grams of one file are stored in one
hash, the ranks of the n-grams of another file in another hash, and you
want to add up the differences of the ranks, n-gram by n-gram:
$sum += $file1{$ngram} - $file2{$ngram}. But suppose one of the n-grams
does not occur in one of the files. What to do? First of all, you have
to decide what the program should do in that case. This is an
important rule: define exactly what your program should do before
you write the program! For instance, in such a case the program could
add a given predefined maximum value to $sum. Informally speaking:
if ($file2{$ngram} is defined) # This is not Perl code, obviously!
{
$sum += $file1{$ngram} - $file2{$ngram};
}
else
{
$sum += $maximum_value;
}
How can you check whether a hash has a value for a given key?
There are two options:
Either use a double foreach loop: this is slow (because of the double loop, which includes plenty of unnecessary tries), but safe and simple:
foreach $name1 (sort keys (%phonebook1))
{
    foreach $name2 (sort keys (%phonebook2))
    {
        if ($name1 eq $name2)
        {
            if ($phonebook1{$name1} ne $phonebook2{$name2})
            {
                print "Different numbers for $name1\n";
            } # end of inner if
        } # end of outer if
    } # end of inner foreach
} # end of outer foreach
Or learn about the defined function. When a variable (a scalar variable, or a hash at a certain key) has not been given any value, it is said to contain the undef value. In an expression like $var++, undef counts as zero.
Remark: That is why you can count the frequencies of n-grams simply by writing $frequency{$ngram}++. (Have a look at it among the example files shown in class in week 9: /users1/birot/Examples/oct30/p10 on Hagen.) If this is the first time a given n-gram is encountered (i.e. the value of the hash at that key was undef until now), the value becomes 1; otherwise it is incremented by one.
In other cases (for example, when an undefined hash value is compared with ne) you will get warnings. To avoid them, first use the defined function, which checks whether the given expression is undef: it returns false for undef, and true otherwise. For instance, our previous example becomes:
foreach $name1 (sort keys (%phonebook1))
{
    if (defined($phonebook2{$name1}))
    {
        if ($phonebook1{$name1} ne $phonebook2{$name1})
        {
            print "Different numbers for $name1\n";
        } # end of inner if
    } # end of outer if
} # end of foreach
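Here is the defined-based check as a complete program that can be run as it is. It is a sketch with made-up phone numbers, and the subroutine name compare_phonebooks is hypothetical. Unlike the loops above, it also reports names that are missing from the second phone book, and it needs only one hash lookup per name instead of trying every pair of names. (Perl also has an exists function, which checks whether a key is present at all, even if its value is undef.)

```perl
use strict;
use warnings;

# compare_phonebooks(\%book1, \%book2) returns a list of messages about
# names whose numbers differ, or that are missing from the second book.
sub compare_phonebooks {
    my ($book1, $book2) = @_;
    my @messages;
    foreach my $name (sort keys %$book1) {
        if (!defined $book2->{$name}) {          # no value for this key: skip the comparison
            push @messages, "$name is missing from the second phone book";
        }
        elsif ($book1->{$name} ne $book2->{$name}) {
            push @messages, "Different numbers for $name";
        }
    }
    return @messages;
}

# Hypothetical example data.
my %phonebook1 = (Anna => "050-1234567", Bert => "050-7654321", Cees => "050-1111111");
my %phonebook2 = (Anna => "050-1234567", Bert => "050-0000000");
print "$_\n" foreach compare_phonebooks(\%phonebook1, \%phonebook2);
```

This prints one line for Bert (different numbers) and one for Cees (missing), and no warnings.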
For hashes containing the rank of n-grams in given files, it is
very similar. (But I don't want to give you the entire solution of the
final assignment...)
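The remark about $frequency{$ngram}++ can also be tried out directly. A minimal sketch (count_ngrams is a hypothetical name; this is not the p10 example file) that counts the n-grams of a string in a hash and lists them from the most frequent to the least frequent:

```perl
use strict;
use warnings;

# count_ngrams($text, $n) returns a reference to a hash that maps each
# $n-gram of $text to the number of times it occurs.
sub count_ngrams {
    my ($text, $n) = @_;
    my %frequency;
    for my $k (0 .. length($text) - $n) {
        # The first time an n-gram is seen, its hash value is undef;
        # ++ treats undef as 0, so it becomes 1 without any warning.
        $frequency{ substr($text, $k, $n) }++;
    }
    return \%frequency;
}

my $freq = count_ngrams("banana", 2);
# Most frequent first; equal frequencies in alphabetical order.
foreach my $ngram (sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq) {
    print "$ngram $freq->{$ngram}\n";
}
```

For "banana" this prints "an 2", "na 2" and "ba 1"; the position of an n-gram in this sorted list is its rank.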
5. More about UNIX
A remark about the command line:
If your command line is too long, type a \ at the end of the line and
press Enter. Then you will get the so-called "secondary prompt" (usually
'>'), and you can go on typing your command line. If you happen
to type \ by accident just before pressing Enter (it
happens pretty often to me...), just press Enter again.
Processes:
UNIX is designed to have several processes running at the same time. For each person logged in, there is a copy of the shell running. If you start a program, you start a new process, running in parallel with the previous ones. Some programs start new "child processes" themselves.
If you put a & sign at the end of your command line, the process will run "in the background". That is, the shell that has launched the process is ready to receive new commands (going on with the rest of the command line, or giving you the prompt back) before that process ends.
The ps command prints your running processes (those started from your current terminal); if you run into trouble, use ps -ef to list all the processes running on the system.
If you want to stop a program, or "kill a process" in UNIX slang, then use:
- ^C (Ctrl+C) if the process runs in the foreground
- ' kill -9 <pid> ' if you want to kill a process running in the background (where pid is the process identification number that you can learn from ps); a plain ' kill <pid> ' asks the process to terminate, while -9 forces it to
Suppose you want a process to run at a certain time, but not now. (Imagine you want to run a huge calculation that needs a lot of resources, but is not urgent. So you want to run it at night, when other people are not disturbed if your calculation slows down the machine.) Then use the ' at ' command.
The ' nohup ' (= "no hang-up") command (combined with '&') is useful when you want your program
to keep running even after you have logged out.
What happens when you log in?
A lot of things... The interesting part is that (if your shell is bash)
the first file that is run is the script /etc/profile,
then the .bash_profile file in
your home directory. The latter calls the file .bashrc,
and this one then calls /etc/bashrc.
These files set your environment variables, etc. It is worth looking at
them. If you feel self-confident enough in Unix, you can alter the ones
in your home directory (after having made a backup copy!!! so
that you can restore the original state at the end!!!) by adding some
lines. For example, add an "echo Hi, how are you\?" line, log out and
then log in again. Look at the consequence of this line.
File structure above your home directory
As I mentioned in the first week, your home directory is a crucial point: below it (within it) you can do whatever you want, but you cannot change anything above it. Although different Unix systems are set up in different ways, there are still some standards that are followed by most system managers.
The home directories of the users are usually within a directory called /home or /users (or 'users1', 'users2', 'users3', etc., if there is some reason to distinguish among them).
Executable files (e.g. the programs that execute the standard Unix commands) are almost always found in /bin and /usr/bin. You can run most commands simply by typing their names at the prompt because these paths are set in your $PATH variable. Some people keep their own executable files in a ~/bin directory within their home directory.
/dev contains files representing devices, /etc contains configuration files, /root contains the files of root (the system administrator), /var contains variable data (such as log files), /tmp contains temporary files, etc.
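To see what the $PATH variable does, here is a rough Perl imitation of the way the shell looks for a command: it tries the directories listed in $PATH (separated by ':') in order, and takes the first executable file with the given name. This is a simplified sketch and find_in_path is a hypothetical name; real shells also check their built-in commands and aliases first, and remember where they found a command.

```perl
use strict;
use warnings;

# find_in_path($command) returns the full name of the first executable file
# called $command in the directories of $PATH, or undef if there is none.
sub find_in_path {
    my ($command) = @_;
    foreach my $dir (split /:/, $ENV{PATH} // '') {
        my $d = ($dir eq '') ? '.' : $dir;   # an empty entry means the current directory
        my $file = "$d/$command";
        return $file if -f $file && -x $file; # a plain file that we may execute
    }
    return;                                   # not found: undef in scalar context
}

foreach my $command ("ls", "perl", "no-such-command") {
    my $where = find_in_path($command);
    print "$command: ", (defined $where ? $where : "not found"), "\n";
}
```

This is also why a program in your ~/bin directory can only be run by its name if ~/bin is listed in $PATH.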
There are some funny files in /etc. For instance, if the file /etc/nologin
exists, then you cannot log in, but get a message on the screen instead (this is
the case if the system administrator wants to halt the system). The passwords
are sometimes stored in /etc/passwd (don't worry, in an encrypted form); many
systems keep them in /etc/shadow instead, which only root can read.
6. FTP
The abbreviation ftp stands for File Transfer Protocol. Look at the remarks about different protocols in the lecture notes of week 5. This is the simplest way to transfer files from one machine to another, provided that you can log in to the remote machine as well. The ftp program offers an interface that only allows the basic steps you need to transfer files. (The interfaces of other protocols let you do more: with 'telnet' you can do on the remote machine whatever you could do from a non-graphical terminal, while web browsers, like lynx and others, automatically present the transferred files to you.)
You will need it if, for instance, a friend of yours who is a Windows fan sends you some funny attachments that you are not able to open on Linux. Something that happens pretty often...
In the early and mid-1990s, before the web became so popular, there used to be a lot of "anonymous ftp servers". If you knew their addresses, you could just log in as 'anonymous' (traditionally giving your e-mail address as the password) and download public files, programs, images, etc. that others had made available. Nowadays people prefer putting these files on their web pages.
Unless you have a fancy ftp program that shows you your local directory and your remote directory and lets you transfer files by clicking on a button, you have the following commands:
ftp machines.name.nl - start ftp and connect to the given machine (inside ftp, 'open machines.name.nl' does the same)
disconnect - disconnect from the remote machine
bye - exit the ftp program (not 'exit', but 'bye'; most ftp programs also accept 'quit')
bin - switch to binary transfer mode (files are copied byte for byte; very important when transferring images, Word files, etc., because in text mode line endings may be converted, which corrupts such files)
put - put a file from the local system to the remote system
get - get a file from the remote system to the local system
mput, mget - the same for several files, allowing standard UNIX wildcards (e.g. mget *.jpg)
cd, pwd - change directory, print working directory on the remote system
lcd, lpwd - the same on the local system
help - for more commands
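The same steps can also be done from a Perl program, with the Net::FTP module, which comes with standard Perl installations (as part of libnet). A minimal sketch, assuming an ftp server is running on the remote machine and you have an account there; fetch_binary and all the names in the usage line are hypothetical:

```perl
use strict;
use warnings;

# fetch_binary($host, $user, $password, $remote_file, $local_file)
# connects, logs in, switches to binary mode and downloads one file:
# the same steps as typing ftp, bin, get and bye by hand.
sub fetch_binary {
    my ($host, $user, $password, $remote_file, $local_file) = @_;
    require Net::FTP;                      # loaded only when the subroutine is called
    my $ftp = Net::FTP->new($host)
        or die "Cannot connect to $host: $@";
    $ftp->login($user, $password)
        or die "Cannot log in: ", $ftp->message;
    $ftp->binary;                          # like the 'bin' command
    $ftp->get($remote_file, $local_file)   # like the 'get' command
        or die "get failed: ", $ftp->message;
    $ftp->quit;                            # like 'bye'
    return $local_file;
}

# Hypothetical usage (the host, account and file names are made up):
# fetch_binary("machines.name.nl", "birot", "secret", "photo.jpg", "photo.jpg");
```

Calling binary before get matters for the same reason as typing bin by hand: images and Word files must be copied byte for byte.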