Text manipulation (Tekstmanipulatie), week 1
What is a computer, in fact?
Computing (almost never used in humanities) vs. other tasks:
building and manipulating databases, texts, etc. This course will deal
with manipulating texts.
Examples: searches, statistics (number of words, sentences,
N-grams, etc.), concordances, systematic transformation of texts,... Machine
Is an iron (strijkijzer) a computer? And a calculator or
Main characteristics of a computer:
John von Neumann - Herman Goldstine (1948): binary system
+ controlled by a program
mechanical vs. electronic
analogue vs. digital
multi-level organization (hardware vs. software)
Several levels between the machine and the user:
machine code, assembly, assembler
operating system: connection between the different parts of the hardware
and the user(s) / software
user-friendly applications
Why use PCs at post offices or for typing simple letters? The advantages
of this kind of modularity are:
... therefore a wider market to sell them in, so they are cheaper.
Easier to program
Portability between different types of computers
Higher degree of flexibility (e.g. including new options)
Unix as an operating system. And also a culture, a way of thinking.
Starting at AT&T, in 1969... standards and plenty of variations (LINUX).
(Miles Osborne's Unix slides.)
The Unix philosophy:
Multi-user: one host - many terminals (nowadays: X Windows and virtual terminals)
Use the drive, rather than the memory
Shell (e.g. bash)
Standard functions, different buttons on the keyboard (^h = delete, ^l = redraw
the screen, ^c = stop the command, ^d = eof, ...)
Standard processes with commands (possibility to redefine them)
Logging in, login name, password, changing password ('passwd'), logging
out ('logout', 'exit', ^d)
Root privileges vs. 'regular users'.
A shell is really the envelope around the computer: it receives the commands
and executes them. How does it work?
Different types of shells [with their usual default prompts, which can be
reconfigured]: Korn shell (ksh) [$], C shell (csh) [%], Bourne shell (sh) [$],
Bourne Again shell (bash) [$]
Printing the "prompt" and waiting for a command (or a group of commands)
from the input, e.g. from the keyboard (standard input). (In fact it reads
from a file, such as a shell script or the standard input, which is also
a file, ending with ^d... "everything is a file!", so now we understand why
^d = eof = exit!)
Preprocessing it (see later, e.g. *, ?,...)
Executing it (line by line: errors further down do not become obvious at the beginning!)
(Additional information possibly before the prompt.)
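The read-and-execute cycle described above can be tried without a keyboard. The following is only a minimal sketch, assuming a POSIX `sh` is installed; the command name `no_such_command_xyz` is made up so that it is guaranteed to fail.

```shell
# The shell reads commands from its standard input, which is just a file:
# here a pipe plays the role of the keyboard.
piped=$(printf 'echo first\necho second\n' | sh)
echo "$piped"

# A script is executed line by line: the broken second line is only
# noticed after the first line has already run, and the shell then
# carries on with the third line.
script=$(mktemp)
printf 'echo step1\nno_such_command_xyz\necho step3\n' > "$script"
ran=$(sh "$script" 2>/dev/null)   # the error message on stderr is discarded
echo "$ran"
rm -f "$script"
```

The second part shows why a mistake near the end of a long script can go unnoticed until everything before it has already been executed.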
The bash shell is practical and saves you a lot of typing:
- cursor up: previous commands
- TAB: fills in the file name, if there is only one possibility
- TAB TAB: if there are more possibilities, you can get a list of them
by typing TAB a second time
The general syntax of commands is:
command [-options] [arguments]
Remark: [...] always means that this part is optional. What does,
e.g., abc [de [fg]] mean?
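Read this way, nested brackets mean that the inner part can only appear together with the outer one: `abc`, `abc de` and `abc de fg` fit the pattern, whereas `abc fg` does not. As an illustration of the command / option / argument pattern with a real command, here is a small sketch using `ls`; the scratch directory and the file names `visible` and `.hidden` are invented for the example.

```shell
# Set up a scratch directory with one ordinary and one hidden file.
dir=$(mktemp -d)
touch "$dir/visible" "$dir/.hidden"

# command [arguments]: list the directory; hidden files are skipped.
plain=$(ls "$dir")

# command [-options] [arguments]: -A also lists names starting with '.'.
all=$(ls -A "$dir")

echo "$plain"
echo "$all"
```

Both the option and the argument are optional: a bare `ls` simply lists the current working directory.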
Getting help / information about a given command:
<command's_name> --help | more
The UNIX file system
What is a file? ...
Finding the one you need among thousands...
Everything is a hierarchical tree: root + branches (=directories) +
Path. Absolute and relative paths.
/ : the root directory
.../... : subdirectories (cf. \ in DOS!)
~ : home directory (crucial border between the system and the user)
. : the current working directory
.. : the parent directory of .
../.. : the grandparent directory, and so on
/abc : a file in the root directory
~/abc : a file in my home directory
./abc : a file in the current working directory
../abc : a file in the parent directory of the current working directory
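The difference between absolute and relative paths can be checked in a throwaway directory tree. This is a sketch under assumptions: the directory names `parent` and `child` and the file name `abc` are chosen only for illustration, and `pwd -P` is used so that symbolic links in the temporary path do not confuse the comparison.

```shell
# Build a small tree: <base>/parent/child
base=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$base/parent/child"
cd "$base/parent/child"

here=$(cd . && pwd -P)        # .     : the current working directory
up=$(cd .. && pwd -P)         # ..    : its parent
upup=$(cd ../.. && pwd -P)    # ../.. : its grandparent
echo "$here" "$up" "$upup"

# The same file reached by a relative and by an absolute path.
echo "hello" > ../abc
via_relative=$(cat ../abc)
via_absolute=$(cat "$base/parent/abc")
echo "$via_relative $via_absolute"
```

A relative path only makes sense with respect to the current working directory, while the absolute path names the same file from anywhere.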
'pwd' : print working directory
'ls' : list
mkdir : make directory
cd : change (working) directory
cat : catenate (concatenate) (Lat. catena = 'chain'): for now, use this
to create the simplest files.
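A short session combining these five commands might look as follows. It is only a sketch: the names `texts` and `note.txt` are invented, and a here-document replaces the text you would normally type at the keyboard and finish with ^d.

```shell
work=$(mktemp -d)      # scratch area, so nothing in your home is touched
cd "$work"
mkdir texts            # make a directory
cd texts               # change into it
pwd                    # print where we are now

# Interactively you would type `cat > note.txt`, then the text, then ^d.
# In a script a here-document stands in for the keyboard input.
cat > note.txt <<'EOF'
This is my first file.
EOF

listing=$(ls)              # list the directory contents
content=$(cat note.txt)    # cat without redirection prints the file
echo "$listing: $content"
```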
An important principle in Unix: EVERYTHING IS A FILE!
Directories are files.
Drives are also files (directories) within the same hierarchy (e.g.
/media/floppy/) (unlike DOS, like Windows).
The "screen" is also a file (e.g. /dev/tty1 or /dev/pts/1).
This principle will be very important when speaking of pipes.
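The principle can be observed with the ordinary file tests of the shell. The sketch below assumes a system that provides `/dev/null`, as practically every Unix-like system does; the variable names are arbitrary.

```shell
# Directories are entries in the tree and can be tested like any file.
[ -d /dev ] && dev_kind="directory"

# /dev/null is a device, yet it has an ordinary path name.
[ -c /dev/null ] && null_kind="character device"

# Output is redirected to a device exactly as to a regular file.
echo "discarded" > /dev/null
rc=$?

echo "$dev_kind, $null_kind, exit status $rc"
```

In an interactive session, the `tty` command prints the file name of the terminal you are typing on, for instance something like /dev/pts/1.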
A few important remarks about file names:
UNIX is case sensitive, therefore 'tamas', 'Tamas', 'TAMAS' or 'tamAs'
would be four different file names!
UNIX itself has no notion of an extension (like .exe, .bat, .doc or .rtf):
the period ('.') is just one more character of the file name,
so 'my.first.file.name' would be a legitimate file name in UNIX.
It is up to you to devise a file naming system that is useful for yourself.
Some newer programs do use extensions (the character string after the last
period), but this is independent of UNIX itself.
UNIX imposes almost no constraints on file names. Still, choose names that
are neither too short nor too long, but informative enough and easy to
handle.
Furthermore, you had better not use the following characters in your
file names: space, comma, /, (, ), ', ", +, *, ?, <, >, $, \, and avoid
using '-' as the first character.
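Why some of these characters are troublesome becomes clear once the shell gets to interpret them. The following sketch, with invented file names, shows the effect of a space and of a leading '-'; it runs in a temporary directory.

```shell
dir=$(mktemp -d)
cd "$dir"

touch my.first.file.name   # periods are ordinary characters: one file
touch "my file"            # quoted: one file whose name contains a space
touch my file              # unquoted: two separate files, 'my' and 'file'

# Count the directory entries by letting the shell expand '*'.
set -- *
count=$#
echo "$count entries"      # prints "4 entries"

# A name starting with '-' looks like an option; './' makes it a plain path.
touch ./-n
rm ./-n
```

The unquoted space silently produced two extra files, which is exactly the kind of surprise the advice above is meant to prevent.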