Text Manipulation (Tekstmanipulatie), week 7

This week we extend our repertoire of Unix techniques, especially for writing shell scripts.

1. Variables, 'set'

The shell keeps track of a large number of variables. You can list them with the command ' set '. A useful way of using it is to pipe its output into grep, for example:

set | grep a=
set | grep PATH=

The system itself defines many variables. Their names are always in upper case. Here are some of them:

SHELL : gives the path of the running shell
PATH : a list of directories that are searched (in this order) when you give a command (i.e. the name of a program) and the shell looks for it in the file system
HOME : the path of the home directory of the current user (you)
MAIL : the path where your mail is located
PWD : the current working directory
OLDPWD : the previous working directory (before the last cd command)
LOGNAME : your login name
HISTFILE : the file where your 'history' is kept (the list of your previous commands, max. HISTSIZE / HISTFILESIZE of them; you can read them with the ' history ' command)
PS1, PS2 : the settings of your primary and secondary prompt
TERM: the type of your terminal

You can check their settings on your account.

You give a variable a new value as follows:

PWD=Federalist

N.B.: no space before and after the = symbol. (Try out what happens if you put one.)
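
What goes wrong with a space can be sketched as follows. The names pear and my_fruit are just illustrations, and the sketch assumes that no command called my_fruit exists on your system. The || operator is not covered in these notes: it runs the command on its right only if the command on its left fails.

```shell
# Correct: no spaces around the = symbol.
pear=apple
echo "pear is $pear"

# Wrong: with spaces the shell sees a command named 'my_fruit'
# followed by the two arguments '=' and 'apple'. Assuming no such
# command exists, the shell reports an error, and the part after ||
# (run only when the command fails) records that.
attempt=ok
my_fruit = apple || attempt=failed
echo "the second attempt $attempt"
```

So with spaces nothing is assigned at all: the shell only complains that it cannot find the command.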

Changing the PWD variable changes your prompt, but it does not actually change your directory. Change the other system variables only if you are sure of yourself, or if there is a system administrator standing just behind you... (Not during lab time, please...)

You can define new variables yourself, just by giving them values. It is important to remember that all variables in UNIX are strings. (Remember: metacharacters, quotes, escapes,...)

Referring to a variable (be it a system variable or one you have just defined) is done by putting the $ symbol before the name of the variable: the shell then replaces the string $<var_name> by the value of the variable, in its pre-processing phase. This replacement happens within double quotation marks ("..."), but not within single quotation marks ('...').

Examples:

birot@hagen:~> pear=apple
birot@hagen:~> set | grep pear=
pear=apple
birot@hagen:~> echo $pear
apple
birot@hagen:~> echo "$pear"tree
appletree
birot@hagen:~> echo '$pear'tree
$peartree
birot@hagen:~> echo $TERM
xterm
birot@hagen:~> echo '$TERM'
$TERM

Now it is logical that if you want to give the value of one variable to another variable, the way to do it is:

birot@hagen:~> bannana=$pear
birot@hagen:~> echo $bannana
apple

Remark: If you want to use a variable in a different shell from the one in which you defined it (for instance in a program started from your shell), then you have to 'export' it. Consult any Unix book or ' man export ' on how to do that.
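
A minimal sketch of the difference, under the assumption that a child shell is started with ' sh -c '. The variable names local_fruit and shared_fruit are made up for this illustration.

```shell
# local_fruit is an ordinary variable, shared_fruit is exported.
local_fruit=apple
shared_fruit=prune
export shared_fruit

# sh -c starts a new (child) shell. The single quotes make sure that
# the variables are expanded by the child, not by the current shell.
in_child=`sh -c 'echo "local_fruit=[$local_fruit] shared_fruit=[$shared_fruit]"'`
echo "$in_child"
# prints: local_fruit=[] shared_fruit=[prune]
```

Only the exported variable is visible in the child shell; the other one is simply empty there.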

2. Expressions

It has been mentioned (week 5) that the arguments of shell scripts (the so-called 'positional parameters') can be referred to as: $1 for the first parameter, $2 for the second one, etc. (up to $9).

$0 refers to the name of the program itself. (E.g. on some systems several commands, like cp, ln and mv, are in fact one program, linked to several names. Using $0, one can check under which name the program has actually been called, and this can influence what the program will do.)

$# is replaced by the number of parameters, while $* gets replaced by all the parameters.
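
These parameters can be tried out without writing a separate script file. The sketch below uses a shell function, which is not covered in these notes: inside a function $1, $# and $* describe the arguments of the function in the same way as they describe the arguments of a script. The name show_params is hypothetical.

```shell
# show_params is a hypothetical helper; inside it $1, $# and $*
# describe the arguments the function was called with.
show_params() {
    echo "$# parameters, the first is $1, all of them: $*"
}

# The quotes keep 'gamma delta' together as one parameter.
result=`show_params alfa beta 'gamma delta'`
echo "$result"
# prints: 3 parameters, the first is alfa, all of them: alfa beta gamma delta
```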

The variables are always treated as strings. How can I use numerical variables then? The command expr has been introduced earlier. How to use it here? With backquotes: the shell replaces a command between backquotes (`...`) by the output of that command. Examples:

birot@hagen:~> a=3
birot@hagen:~> b=4
birot@hagen:~> c=`expr $a \* $a + $b \* $b`
birot@hagen:~> echo $c
25

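N.B.: Don't forget to escape those symbols of expr that are also shell metacharacters (such as *, <, >, |, & and the parentheses), and to put spaces around every operator: expr expects each operand and each operator as a separate argument.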
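
A few more cases, as a sketch of which symbols need an escape. The variable names are chosen for this illustration only.

```shell
a=3
b=4

# * is a shell wildcard, so it is escaped; + needs no escape.
sum_of_squares=`expr $a \* $a + $b \* $b`

# Parentheses and < are shell metacharacters too.
doubled_sum=`expr \( $a + $b \) \* 2`
is_smaller=`expr $a \< $b`

echo $sum_of_squares $doubled_sum $is_smaller
# prints: 25 14 1
```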

3. Conditions

There are two ways of branching your program in a shell script.

The first one is 'case'. Its syntax is:

case <selector> in
   <value1> ) <commands1> ;;
   <value2> ) <commands2> ;;
   <value3> ) <commands3> ;;
...
   <valueN> ) <commandsN> ;;
esac

Notice the ) parenthesis after the values, as well as the double ;; semicolons after the commands.

The way this works is the following: the shell takes the value of <selector> (e.g. a variable or a positional parameter of your script) and compares it with <value1>. (The values can include the same wildcards as seen with ls, etc.) If there is a match, then <commands1> are executed and the script continues with the line after 'esac' (= 'case' reversed). Otherwise the shell tries to match the selector with <value2>: if there is a match, then <commands2> are executed and the script continues with the line after 'esac'; otherwise we go on to <value3>, etc. If there is no match even with the last value, then nothing is executed and the script simply continues after 'esac'. Notice that only the commands of the first match are executed, and not all of them! An example:

birot@hagen:~> cat branch
case $1 in
al?a) echo 'al?a';;
????) echo '????';;
[ab]*) echo '[ab]*';;
*) echo else;;
esac
birot@hagen:~> branch alfa
al?a
birot@hagen:~> branch beta
????
birot@hagen:~> branch apple
[ab]*
birot@hagen:~> branch everything else
else

Notice the way that we have constructed the last branching value: it is an "anything else" branch.

The other way to insert conditional branching into your shell script is by using the if command. Its syntax is:

if <commands1> ; then <commands2> ; else <commands3> ; fi

How does this work? How come we have commands after 'if', where one would expect conditions? The answer is that the execution of each command line has three 'results' (see also week 3): the output (on the standard output, possibly redirected, which usually carries the real results), the "error" output (not necessarily real errors, but a second stream of text, usually used for messages about the way the program runs), and the return value. For most commands a return value of 0 means that the program ran without any problems, while other values are characteristic of different sorts of errors or other special cases. For example, the return value of grep is 0 if the pattern was found in the given file, 1 if the pattern was not found (a case that is not a real error), and 2 if there was a syntax error in the pattern or the input file was inaccessible.

If there is a pipeline or a series of commands (separated by ; for instance), then the return value of the last command is the one taken into consideration. It is also important to remember that these are real commands that are really run: they can change your file system and write on the screen (to avoid the latter, you can redirect the output to a trash file or to /dev/null, see the tip at the bottom of this page).

Let's come back to 'if'. This is where it gets a little confusing. If the return value is 0 (i.e. the command line didn't encounter any trouble) then <commands2> are executed. Otherwise <commands3> are executed. In other words: in the condition 0 counts as "true" and any other value as "false", the opposite of the usual convention (e.g. the way expr prints logical values). To reconcile the two, the return value of expr is 0 if its output (the value of the evaluated expression) is neither 0 nor empty, and 1 if the output is 0 or empty. (A return value of 2 means that a syntax error has occurred.)
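
You can inspect return values yourself. The sketch below relies on two things that are not covered in these notes: the special parameter $?, which holds the return value of the most recent command, and the || operator, which runs the command on its right only if the command on its left returned a non-0 value. The variable names are made up for the illustration.

```shell
# grep finds the pattern: the return value is 0.
found=0
echo apple | grep apple > /dev/null || found=$?

# grep does not find the pattern: the return value is 1.
# (The return value of a pipeline is that of its last command.)
not_found=0
echo apple | grep pear > /dev/null || not_found=$?

# The output of expr would be 0 here, so its return value is 1.
zero_result=0
expr 2 - 2 > /dev/null || zero_result=$?

echo "$found $not_found $zero_result"
# prints: 0 1 1
```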

Therefore you can use the following construction:

if expr "$1" = apple > /dev/null; then echo Good; else echo Not good; fi

If the first argument of your script is 'apple' then you will get the text 'Good' on your screen, otherwise the text 'Not good' will appear.

Notice the syntax: all commands are terminated by a semicolon (;), or alternatively by an end-of-line. So the keywords 'then', 'else' and 'fi' should either be preceded by a semicolon (;) or appear on a new line.

The entire construction is terminated by the keyword 'fi', which is 'if' reversed. It marks the end of the structure: once either the 'then' branch or the 'else' branch has been executed (or nothing has been done, if the else-branch was missing and the condition was false), we go on to the command appearing after ' fi '.

If the 'else'-branch starts with another 'if' then 'else if' can be abbreviated as 'elif'. (With 'elif' the whole construction still needs only one closing 'fi'.)
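
A small sketch of a three-way branch with 'elif', written in the multi-line form. The variables fruit and answer are made up for this illustration.

```shell
fruit=pear

# Each expr prints 0 or 1 (thrown away via /dev/null);
# only its return value decides which branch is taken.
if expr "$fruit" = apple > /dev/null
then answer='an apple'
elif expr "$fruit" = pear > /dev/null
then answer='a pear'
else answer='something else'
fi

echo "This is $answer"
# prints: This is a pear
```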

A last example:

birot@hagen:~> cat example-if
#!/bin/bash
if echo $*; echo These are your arguments; expr $1 + $2 = 2; then echo You are lucky; fi
echo Bye...
birot@hagen:~> example-if 1 1
1 1
These are your arguments
1
You are lucky
Bye...
birot@hagen:~> example-if 2 3
2 3
These are your arguments
0
Bye...
birot@hagen:~> example-if 3 -1
3 -1
These are your arguments
1
You are lucky
Bye...
birot@hagen:~> example-if 3 -1 2 tree house and so on...
3 -1 2 tree house and so on...
These are your arguments
1
You are lucky
Bye...

Notice that:

1. The else-branch is not compulsory, so if you have only one branch, try to formulate your condition in such a way that you would use just the then-branch.
2. The command line between 'if' and 'then' is always executed,...
3. ...including the outputting of the 'expr' command (0 stands for false, 1 for true). If you want to avoid this, use > /dev/null (everything is a file... even the nothing is a file...).
4. $* is replaced by all the arguments of the script.

4. Cycles (loops)

There are three types of cycle in Unix, and all of them check the condition before the core of the cycle. Here they are:

for <variable> in <list> ; do <command command ...> ; done

The variable takes all the values in the list, and the commands are executed for each value. Example:

birot@hagen:~> for a in hello 'how are you' fine thanks ; do echo $a; done
hello
how are you
fine
thanks

In a shell script the ; symbol can be replaced by just putting the next command on a new line.

The second type of cycle is:

while <command command...> ; do <command command ...> ; done

The third one is:

until <command command...> ; do <command command ...> ; done

The way they work is the following:

1. They check the return value of the first command line (the return value of the last command there), as seen in the case of "if".
2. If it is true for the while-cycle, or false for the until-cycle, then the second command line is executed (from 'do' to 'done'), and then we go back to the first commands (the conditions).
3. If the condition was false for the while-cycle, or true for the until-cycle, then the second command line is not executed any more and we continue with the command after 'done'.

The condition is called "cycle condition" for the while-cycle (the core of the cycle gets executed while / as long as the cycle condition is true). The condition is called "exit condition" for the until-cycle, because we leave the cycle when it is true (the core of the cycle gets executed as long as the condition is false).

An example:

birot@hagen:~> a=1; while expr $a \< 10 > /dev/null; do echo $a ; a=`expr $a + 1`; done
1
2
3
4
5
6
7
8
9
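
The same count can be sketched with an until-cycle: the exit condition is simply the negation of the cycle condition used above. Here it is written in the multi-line form.

```shell
a=1
# Leave the cycle as soon as a is greater than or equal to 10.
until expr $a \>= 10 > /dev/null
do
    echo $a
    a=`expr $a + 1`
done
# prints the numbers 1 to 9, one per line; afterwards a is 10
```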

A useful tip: if you use expr (or something else) but you don't want its result on the screen because you just need the return value (e.g. it is a command in an if-condition or in the condition of a cycle in a shell script), you should redirect its output. You can do that either into a file of yours (e.g. > trash) or to the file /dev/null. The latter resembles the "terminal files" (/dev/tty1,...), but is just a "black hole": whatever you put into it never appears anywhere.

E.g.:

while expr $fibo \< 10000 > /dev/null ; do ... ; done

Summary of the syntax:

It is always very important to pay attention to syntax, whatever programming language or operating system you are working with. Pay attention to the exact spelling of the commands, the order of the arguments, to compulsory semicolons or other symbols, to what is compulsory and what is optional, etc.

In the following <commands> means a list of one or more commands (or pipelines) separated either by a semicolon (;) or by a new-line character. Although it is possible to write the commands on one line, I recommend using the multi-line syntax, in order to keep your script readable. (Otherwise it will become "write only"...)

case <selector> in
<value1> ) <commands> ;; // Notice the unusual closing parenthesis and the double semicolon!
<value2> ) <commands> ;;
...
<valueN> ) <commands> ;;
esac

if <commands> ; then <commands> ; [else <commands> ;] fi

if <commands>
then <commands>
[else <commands>] // The [..] brackets stand for optionality
fi

for <variable> in <list> ; do <commands> ; done

for <variable> in <list>
do <commands>
done

while <commands> ; do <commands> ; done

while <commands>
do <commands>
done

until <commands> ; do <commands> ; done

until <commands>
do <commands>
done