Tekstmanipulatie, week 14


1. expr and bc


For simple calculations you can use the 'bc' command (the "basic calculator"). Type bc <RETURN> and you can immediately type in expressions such as 3+4 or (45/3400)*100. Just as when we wrote short files with 'cat', we are relying on the fact that this command needs an input file, and if none is specified it reads the standard input. The program can therefore be ended with ^d (CTRL + D: end-of-file) or, alternatively, with ^c (CTRL + C: stop the running program).

So why not do things like:

echo 3+4 | bc
echo 23/46 | bc
Hey! Why is 23 / 46 = 0 ?! Because, unless told otherwise, bc truncates the result of a division to an integer. Type ' scale = 4 ' to get your results with four decimals.
How to do this within one command line? You need an input of two lines:
(echo scale = 4; echo 5/8) | bc
What does
echo 13 % 3 | bc
mean? The remainder of the division. And what is the problem with this one:
 echo (13/26)*4 | bc
Try the following instead, and remember what you know about escape characters:
 echo \(13/26\)*4 | bc

What is the difference between echo and cat?

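You can find the same dichotomy among the commands dealing with mathematical expressions: bc reads its expression from the standard input (as cat does), while expr takes it from its arguments (as echo does). Examples for expr: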
expr 3 + 4
7

expr 3+4
3+4

expr \( 3 + 4 \) \/ 4
1

expr 2 * 3
expr: syntax error

expr '-2' \* 3
-6

expr 13 \% 3
1

expr 8 = 8
1

expr 15 = 2
0

expr \( 8 = 8 \) \& \( 3 = 3 \)
1

expr '(' 8 = 8 ')' '|' '(' 3 = 4 + 5 ')'
1

Remarks: The numbers, parentheses and arithmetic symbols are separate arguments, so you must separate them with spaces (if you don't, see the second example). Some of the arithmetic symbols are shell metacharacters, so they must be protected with quotes or the escape character ('\') (what is the reason for the error message in the fourth example?). Division is integer division, and % gives the remainder of the division. The last four examples show how logical statements are evaluated: 0 stands for the logical value FALSE, while 1 stands for the logical value TRUE. The '&' symbol means AND, '|' means OR. Check man expr for further possibilities (e.g. what happens if you use these logical operations between numerals, and not between statements?).
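
As a hint for the last question, here is a small sketch of what expr does when '&' and '|' stand between numerals rather than comparisons. According to the expr manual, '|' yields its first argument if that is neither empty nor 0, and otherwise its second argument; '&' yields its first argument if neither argument is empty or 0, and otherwise 0. The variable names are made up for the illustration.

```shell
and_nonzero=`expr 5 '&' 3`          # neither side is 0: the first argument
or_zero=`expr 0 '|' 7`              # first side is 0: the second argument
# expr exits with status 1 when its result is 0, hence the "|| true".
and_zero=`expr 0 '&' 3 || true`     # one side is 0: the result is 0
echo "$and_nonzero $or_zero $and_zero"
```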

The expr command, combined with back quotes (which the shell replaces with the output of the command line between them), gives us an easier way to calculate the type-token ratio or word frequencies. How would you calculate, for instance, the frequency of the word "the" in a given file?
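
One possible approach, as a rough sketch only: split the text into one word per line with tr, count the lines with wc and the occurrences of "the" with grep, and let expr do the arithmetic. The sample sentence, the temporary file and the variable names are assumptions made for this illustration. Because expr works with integers, the frequency is expressed as a whole percentage.

```shell
sample=`mktemp`
echo "the cat saw the dog and the bird" > "$sample"

# One word per line: every run of non-letters becomes a single newline.
total=`tr -cs 'A-Za-z' '\n' < "$sample" | wc -l`
# -x matches whole lines only, -i ignores case, -c counts matching lines.
the_count=`tr -cs 'A-Za-z' '\n' < "$sample" | grep -c -i -x 'the'`

# Multiply first, otherwise the integer division would give 0.
percent=`expr $the_count \* 100 / $total`
echo "$the_count out of $total words, about $percent percent"
rm "$sample"
```

For a real text you would replace the temporary file with the name of your own file.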


2. Variables



 

Unix can and does handle a large number of variables. You can list them with the ' set ' command. A useful way of using it is to pipe its output into grep, like:

set | grep a=
set | grep PATH=
The system itself has a large number of variables. Their names are conventionally written in upper case. Here are some of them:
SHELL : gives the path of the running shell
PATH : a set of paths that are checked (in this order) when you give a command (i.e. the name of a program), and the shell looks for it in the file system
HOME : the path of the home directory of the current user (you)
MAIL : the path where your mail is located
PWD : the current working directory
OLDPWD : the previous working directory (before the last cd command)
LOGNAME : your login name
HISTFILE : the file where your 'history' is kept (the list of your previous commands, at most HISTSIZE / HISTFILESIZE of them; you can read them with the ' history ' command)
PS1, PS2 : the settings of your primary and secondary prompt
TERM: the type of your terminal
You can check their settings on your account.

The way you can give them a new value is the following:

PWD=Federalist
N.B.: no space before or after the = symbol. (Try out what happens if you put one.)

Changing the PWD variable changes your prompt, but does not actually change your directory. Change the other system variables only if you are sure of yourself, or if there is a system administrator standing just behind you... (Not in practicum time, please...)

You can define new variables yourself, just by giving them values. It is important to remember that all variables in UNIX are strings. (Remember: metacharacters, quotes, escapes,...)

Referring to a variable (be it a system variable or a variable you have just defined) is done by putting the $ symbol before the name of the variable: the shell then replaces the string $<var_name> by the value of the variable, in the shell's pre-processing phase. This happens within double quotation marks ("..."), but not within single quotation marks ('...').

Examples:

birot@hagen:~> pear=apple
birot@hagen:~> set | grep pear=
pear=apple
birot@hagen:~> echo $pear
apple
birot@hagen:~> echo "$pear"tree
appletree
birot@hagen:~> echo '$pear'tree
$peartree
birot@hagen:~> echo $TERM
xterm
birot@hagen:~> echo '$TERM'
$TERM
Now it is logical that if you want to give the value of one variable to another variable, the way to do it is:
birot@hagen:~> bannana=$pear
birot@hagen:~> echo $bannana
apple
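
When the text following a variable could itself be part of a name, quoting as in the "$pear"tree example is one way to mark where the name ends; curly braces are another. A small sketch, reusing the variable from the examples above (the variable tree is made up for the illustration):

```shell
pear=apple
# Without the braces the shell would look for a variable called peartree.
tree=${pear}tree
echo "$tree"
echo "${pear}s and ${pear}sauce"
```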


Remark: If you want to use a variable in one shell that you defined in another one (for instance in a running program), then you have to 'export' it. Consult any Unix book or ' man export ' on how to do that.
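
The effect of export can be seen with a small experiment. This is only a sketch: it starts a second shell with sh -c and asks it to print the variable, once before and once after the export. The variable names before and after are made up for the illustration.

```shell
pear=apple
# A child shell does not see an ordinary (unexported) variable.
before=`sh -c 'echo "pear is: $pear"'`
export pear
# After export, the variable is passed on to the programs started from here.
after=`sh -c 'echo "pear is: $pear"'`
echo "$before"
echo "$after"
```

The single quotes matter here: they stop the current shell from replacing $pear itself, so that the replacement is left to the child shell.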
 

3. Type-token ratio



4. Shell scripts



 

After having solved a number of assignments, you might want to save some of them so that you won't need to reinvent them each time you need them. You can save them in a file (that is what you do when sending the solution to Mariette), and just check that file each time before retyping the long chain of commands. But why not let the computer itself read this file and execute it? To make a long story short, can we write programs using UNIX?

There are two arguments pointing toward this possibility:


Is Unix a programming language? It has been designed as an operating system, but it has so many possibilities that you can even write simple programs using it. What is a program?

All of these are possible within UNIX. We shall come back to some of these later.

At the moment what we want is to put a sequence of commands into a file, and then just run it.

How do we get a sequence of (complex) commands? If you simply want to combine a sequence of commands, pipes, etc., write them on separate lines, or separate them with a semicolon (;).

For instance:

cat > a_simple_shell_script
echo Now I will list the subdirectories of the directories whose name contains exactly 4 characters.
ls -l ???? | grep ^d
echo Thank you for your waiting.
echo What about an alphabetical order of these?
ls -l ???? | grep ^d | sort
echo Here you have it.
^d
Now we have a file named a_simple_shell_script that contains six lines. What can we do with it? We want to run it. Let's type the file name after the prompt, press enter, and... we get an error message:
bash: a_simple_shell_script: command not found
What is wrong? Let's type './a_simple_shell_script'; on some systems this is the way to run programs that are in your own directory. Did it help? No, you get an error message again (typically 'Permission denied'), because the machine doesn't know that this file was written to be run (and is not just a text file that can, e.g., be sent to Mariette as the solution of your assignment). What to do? There are two steps:
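
One common way to make such a file runnable, which may or may not match the two steps intended here, is to give yourself execute permission with chmod and then call the file with an explicit path. A minimal sketch, using a temporary directory and a one-line script so that nothing in your own directories is touched (both are assumptions made for the illustration):

```shell
workdir=`mktemp -d`
cd "$workdir"
echo 'echo Hello from the script.' > a_simple_shell_script

chmod u+x a_simple_shell_script       # u+x: the owner (user) may execute it
output=`./a_simple_shell_script`      # ./ means: look in the current directory
echo "$output"

cd "$OLDPWD"                          # go back to where we started
rm -r "$workdir"
```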


When you have a file that you want to use pretty often, it can be tedious to always give the entire path. Why not make it into a "real" command? There is a system variable, PATH (see the section on variables), that holds a set of paths: when you type the name of a program to be run without giving its exact (absolute or relative) path, the shell looks in the directories listed in this variable. You can add further paths to this variable by typing:

PATH=$PATH:$HOME/shellscripts
The meaning of this is the following: the new value of the variable PATH is its current value, followed by a colon (which separates the different paths within the variable) and then the new path to be added. Here that is a directory called shellscripts within your own home directory. You can save typing the exact path of your home directory by referring to the system variable HOME.
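
The whole procedure can be tried out without touching your home directory. In the following sketch a temporary directory stands in for $HOME/shellscripts, and hello_script is a made-up one-line script; both are assumptions made for the illustration.

```shell
scriptdir=`mktemp -d`                 # stands in for $HOME/shellscripts
echo 'echo Found via PATH.' > "$scriptdir/hello_script"
chmod u+x "$scriptdir/hello_script"

PATH=$PATH:$scriptdir                 # append the directory to the search path
found=`hello_script`                  # no path needed any more
echo "$found"
rm -r "$scriptdir"
```

Note that a change made this way only lasts for the current shell session.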

You might want to add arguments to your shell scripts, similar to the arguments of the standard Unix commands. You do this by referring to them within your shell script as $1, $2, etc. These refer respectively to the first, second, etc. argument given after the script's name. The arguments are separated by spaces (unless a space is neutralized by an escape character or quotes).

Thus a shell script containing

ls -l $1 | grep $2
will search the long listing of the directory given as the first argument for the second argument, treated as a regular expression.

$* refers to all arguments, $# gives the number of arguments, while $0 gives the zeroth argument, which is the file's name itself.
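
These special variables can be inspected with a small test script. The sketch below writes such a script to a temporary file and runs it with sh, so that no execute permission is needed; the file and the three arguments are made up for the illustration.

```shell
script=`mktemp`
cat > "$script" <<'EOF'
echo "number of arguments: $#"
echo "first argument: $1"
echo "second argument: $2"
echo "all arguments: $*"
EOF

# The quotes keep "two words" together as one argument.
result=`sh "$script" one "two words" three`
echo "$result"
rm "$script"
```

The quoted 'EOF' keeps the current shell from replacing $1, $# and so on while the file is being written, so they reach the script unchanged.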
 
 
 

N-gram-based text categorization



Bíró Tamás

Last modified: Thu Jul 3 11:39:17 METDST 2003