Tekstmanipulatie (Text Manipulation), week 1
General Introduction
What is a computer, in fact?
- Computing (calculation, almost never used in the humanities) vs. other tasks:
building and manipulating databases, texts, etc. This course will deal
with manipulating texts.
Examples: searches, statistics (number of words, sentences,
N-grams, etc.), concordances, systematic transformation of texts, ... Machine
Translation?
- Is an iron (the household appliance for ironing clothes) a computer? And a
calculator, or an abacus?
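As a foretaste of the text statistics mentioned above, here is a minimal sketch using standard Unix tools (`wc`, `tr`, `sort`, `uniq`). The sample sentence is invented for illustration; in practice you would read a file.

```shell
# Hypothetical sample text; in practice you would read a file instead.
text='the cat saw the dog and the dog saw the cat'

# Number of words: wc -w counts whitespace-separated tokens.
printf '%s\n' "$text" | wc -w          # -> 11

# Word frequency list: one word per line (tr), group identical words
# (sort), count each group (uniq -c), most frequent first (sort -rn).
# 'the' comes first, with 4 occurrences.
printf '%s\n' "$text" | tr -s ' ' '\n' | sort | uniq -c | sort -rn
```

Each tool does one small job, and the pipe (`|`) chains them together.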
Main characteristics of a computer:
- mechanical vs. electronic
- analogue vs. digital
- multi-level organization (hardware vs. software)
John von Neumann and Herman Goldstine (1948): binary system
+ controlled by a program
Several levels between the machine and the user:
- machine code, assembly language, assembler
- operating system: the connection between the different parts of the hardware
and the user(s) / software
- user-friendly applications
Why use PCs at post offices, or for typing simple letters? The advantages
of this kind of modularity are:
- easier programming
- portability between different types of computers
- a higher degree of flexibility (e.g. adding new options)
... and therefore a wider market to sell them in, so they are cheaper.
Unix
Unix is an operating system, but also a culture, a way of thinking.
It started at AT&T (Bell Labs) in 1969; since then there have been standards
and plenty of variants (e.g. Linux, a Unix-like system).
(Miles Osborne's Unix slides.)
The Unix philosophy:
- Multi-user: one host, many terminals (nowadays: the X Window System and virtual terminals)
- Use the disk (drive) rather than the memory
- Shell (e.g. bash)
- Standard functions bound to keys (^h = backspace, ^l = redraw the screen,
^c = interrupt the command, ^d = end of file (EOF), ...)
- Standard tasks are carried out by commands (which you can redefine)
The User
Logging in (login name, password), changing your password ('passwd'), logging
out ('logout', 'exit', ^d).
Root (superuser) privileges vs. regular users.
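One way to see which kind of user you are: the superuser (root) always has numeric user ID 0. A minimal sketch; `privilege_level` is a hypothetical helper name, not a standard command.

```shell
# Show your numeric user ID, group ID and groups.
id

# Root always has user ID 0; everyone else is a regular user.
# privilege_level is a hypothetical helper, defined here for illustration.
privilege_level() {
    if [ "$(id -u)" -eq 0 ]; then
        echo root
    else
        echo regular
    fi
}

privilege_level
```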
Unix shells
A shell is really the envelope around the computer: it receives the commands
and executes them. How does it work?
- Printing the "prompt" and waiting for a command (or a group of commands)
from its input, e.g. from the keyboard (standard input). (In fact, the shell
reads from a file: a shell script, or the standard input, which is also
a file, ending with ^d. "Everything is a file!" Now we understand why
^d = EOF = exit!)
- Preprocessing it (see later; e.g. expanding *, ?, ...)
- Executing it, line by line: an error in a script does not show up at the
start, only when the shell reaches the faulty line, after the earlier lines
have already run!
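These steps can be watched on a small script. A minimal sketch, assuming a POSIX `sh`; the file names `a.txt`, `b.txt`, `script.sh` and the command `no_such_command_xyz` are made up for the demonstration.

```shell
# Throwaway directory created by mktemp, with two empty files in it.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt

# Write a small script (here-document; the quoted 'EOF' keeps * unexpanded).
cat > script.sh <<'EOF'
echo "line 1: the shell expands * before echo runs:" *.txt
echo "line 2: still fine"
no_such_command_xyz
echo "line 4: reached anyway, the error above did not stop the script"
EOF

# The shell reads the file line by line: lines 1 and 2 have already run
# before the mistake on line 3 is noticed (reported on stderr).
sh script.sh
```

Note that the quoted `*` is printed literally, while the unquoted `*.txt` is replaced by `a.txt b.txt` during preprocessing, before `echo` even starts.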
Different types of shells (each with its typical prompt): Korn shell (ksh)
[$], C shell (csh) [%], Bourne shell (sh) [$], Bourne Again shell (bash) [$]
(Additional information may appear before the prompt.)
The Bash shell is practical and saves you a lot of typing:
- cursor up: previous commands
- TAB: completes the file name, if there is only one possibility
- TAB TAB: if there are several possibilities, pressing TAB a second
time lists them
Unix commands
Their general syntax is:
command [-options] [arguments]
Remark: [...] always means that the part inside is optional. What does,
e.g., abc [de [fg]] mean?
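A few instances of the general pattern `command [-options] [arguments]`, using `ls` and `wc`, which exist on any Unix system. The output depends on your machine, so none is shown.

```shell
ls                   # command only: list the working directory
ls -l                # command + option: long listing (details per file)
ls -l -a /           # command + two options + an argument (the root directory)
ls -la /             # single-letter options can usually be combined
wc -l /etc/passwd    # another command: count the lines of a file
```

Options say *how* the command should work; arguments say *what* it should work on.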
Getting help / information about a given command:
man <command's_name>
<command's_name> --help | more
The UNIX file system
What is a file? ...
Finding the one you need among thousands...
Everything is a hierarchical tree: root + branches (=directories) +
leaves (=files).
Path. Absolute and relative paths.
/...    : the root directory
.../... : subdirectories (cf. \ in DOS!)
~       : home directory (crucial border between the system and the user)
.       : the current working directory
..      : the parent directory of .
../..   : the grand-parent, etc.

/abc    : a file in the root directory
~/abc   : a file in my home directory
./abc   : a file in the current working directory
../abc  : a file in the parent directory of the current working directory
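The same file can be reached through several paths. A minimal sketch in a throwaway directory; the names `top`, `sub`, `leaf` and `abc` are made up for illustration.

```shell
# Build a small tree in a throwaway directory: top/abc and top/sub/leaf/
base=$(mktemp -d)
mkdir -p "$base/top/sub/leaf"
echo hello > "$base/top/abc"

cd "$base/top/sub/leaf" || exit 1
pwd                      # absolute path of the working directory (.)
cat ../../abc            # relative path: up to the grand-parent, then abc
cat "$base/top/abc"      # absolute path (starts with /) to the same file
cat ./../../abc          # ./ just means "here"; same file again

cd ..                    # go up to the parent directory
pwd                      # now ends in /top/sub
cd ~ && pwd              # ~ is your home directory
```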
'pwd' : print working directory
'ls' : list (the contents of a directory)
'mkdir' : make directory
'cd' : change (working) directory
'cat' : concatenate (Lat. catena = 'chain'); use it now to create the
simplest files.
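A short session with these commands. Interactively you would type `cat > first.txt`, then the text, then ^d (end of file); in this sketch a here-document stands in for the keyboard so it can run unattended. The directory and file names are made up.

```shell
work=$(mktemp -d)        # throwaway directory for the example
cd "$work" || exit 1

mkdir texts              # make a directory
cd texts                 # change into it
pwd                      # print where we are

# Here-documents stand in for typing text and pressing ^d.
cat > first.txt <<'EOF'
Hello, world.
EOF
cat > second.txt <<'EOF'
Goodbye.
EOF

ls                                    # first.txt  second.txt
cat first.txt second.txt > both.txt   # chain the two files into one
cat both.txt
```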
An important principle in Unix: EVERYTHING IS A FILE!
Directories are files.
Drives are also files (directories) within the same hierarchy (e.g.
/media/floppy/) (unlike DOS, like Windows).
The "screen" is also a file (e.g. /dev/tty1 or /dev/pts/1).
This principle will be very important when we discuss pipes.
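The principle can be seen directly in the listing: the first character of `ls -l` output gives the file type (`-` regular file, `d` directory, `c` character device such as a terminal). A minimal sketch; `/dev/null` is the standard "discard everything" device.

```shell
ls -ld /              # starts with d: the root directory is a file too
ls -l /dev/null       # starts with c: a device, in the same tree

# Since a device is a file, normal output redirection works on it:
echo "this text disappears" > /dev/null

# tty names the terminal file your shell is connected to (e.g. /dev/pts/1);
# it prints "not a tty" and returns a non-zero status if there is none.
tty || true
```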
A few important remarks about file names:
- UNIX is case-sensitive, so 'tamas', 'Tamas', 'TAMAS' and 'tamAs'
are four different file names!
- UNIX has no file name extensions (like .exe, .bat, .doc or .rtf): the
period ('.') is just one more character of the file name, so
'my.first.file.name' is a legitimate file name in UNIX.
It is up to you to work out a file naming system that is useful for you.
Some newer programs do use extensions (the character string after the last
period), but this is independent of UNIX itself.
- UNIX places almost no constraints on file names (only '/' and the null
character are forbidden). Still, avoid names that are too short or too long;
choose names that are informative enough and easy to handle.
- Furthermore, you had better not use the following characters in your
file names: space, comma, /, (, ), ', ", +, *, ?, <, >, $, \; and avoid
using '-' as the first character.
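The remarks above can be tried out in a throwaway directory. A minimal sketch, assuming a case-sensitive file system (typical on Linux; macOS, for instance, uses a case-insensitive one by default). All file names are made up for illustration.

```shell
# Throwaway directory; every file name below is made up for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1

# 1. Case matters: four names that differ only in upper/lower case.
touch tamas Tamas TAMAS tamAs
ls | wc -l          # 4 on a case-sensitive file system

# 2. A period is an ordinary character; an "extension" is only the
#    text after the last period, which the shell can split off.
touch my.first.file.name
name=my.first.file.name
echo "${name##*.}"  # name           (after the last period)
echo "${name%.*}"   # my.first.file  (everything before it)

# 3. A space splits one name into two arguments unless it is quoted.
touch "my file"     # one file called 'my file'
touch my file       # two files, 'my' and 'file'

# 4. A leading '-' makes a name look like an option. '--' ends the
#    option list; a ./ prefix works as well.
touch -- -oops ./-safe
rm -- -oops ./-safe
ls                  # eight names remain
```

Quoting and `--` are workarounds; avoiding such characters in the first place saves trouble.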