This week, we start using SPSS. The assignments are getting less labor-intensive, as you will get the mathematics for free. Still, always keep in mind that software can only help you if you understand what it does and in which cases you can use a given function.

**SPSS** stands for *Statistical Package for the
Social Sciences*, and is the most frequently used
software among psychologists, sociologists and linguists
(and probably in many other fields) to perform statistical
computations.

You can send me the solutions (report) as a pdf or a Word file
(please, not in a Windows Vista format, that is,
with a .docx extension), or print the report and put it in my postbox.
In the latter case, a plastic bag is not necessary; just fold it up.
Do not forget to mention **your name** in the report, as well as in
the file name.

Each assignment is worth 10 points (as the previous ones... sorry for the delay in correcting them). Handing in the report more than a week after the lab results in half of the points, and no assignment is accepted with a delay of two weeks.

Reports must be **as short as possible**, that is,
copy-and-paste only the SPSS results that are necessary.
Explain the results in one sentence, especially if you needed to do more
than just copy-paste (e.g., find the lowest value or calculate
the difference of two values). Do not add any
further information. Reporting irrelevant information can cost you
points, as filtering the relevant information is one of your tasks.

Tasks that you simply have to do (before you get to
the questions) appear below with a **> starting the line**
in **bold** letters. Concerning these, you need not report anything,
simply perform these tasks.
The questions to be answered in the report are given
below with a *** starting the line**, and in **bold** letters.

Answer the questions in a short but exact way, starting with the number of the question. For instance:

…

3. 20 measurements.

4. Word length 3.

…

Try to finish all assignments during the lab. Should this fail, you can go on working on the assignments in your own time.

**A Getting familiar with SPSS**

**B1 Entering data by hand**

**B2 Using “Variable View”**

**B3 Creating a frequency table**

**C Creating a histogram**

**D Creating a boxplot**

**E Calculating mean, mode and median**

**F Calculating measures of spread**

**> Turn on computer and screen.**

**> Enter username and password to log in.**

**> Look up SPSS 14 (or higher) within
the RUG menu (within Mathematics & Statistics) and launch SPSS.**

In case SPSS has not been installed on your machine yet, you get a window saying that you have to restart your computer. Do that, otherwise SPSS may have problems running.

**> Once SPSS is running, you are
offered a menu with choices. Click on “cancel”.**

Now you are in the **Data Editor**, the window of SPSS in which you can
enter data and work on them. It is a spreadsheet of the kind you might be
familiar with from other
applications. At its top you find the name of the data file you are working
with, but at this moment it is still:

In the Data Editor, each (vertical) column of numbers represents a **variable**.
Each variable is given a name, which appears on the top of the column. Use
meaningful names, such as LENGTH, and not something like X24A06.

Each (horizontal) row represents a **case**.
A case is a series of observations belonging together, such as the answers of a
respondent to the questions in a questionnaire, or different values measured on
the same subject of the experiment. For instance, if you have 32 respondents,
then you need 32 rows for the 32 cases. If the questionnaire contained 40
questions, then you most probably need 40 columns, and so you have 40
variables. (Next week, we learn how to calculate new, derived
variables from existing ones.)

The Data Editor is composed of two parts: the **Data View** and
the **Variable View**.

The *Variable View* offers an overview of your
variables, and you can also define some features of these variables. The most
important features are:

1. *Name*: the name of the variable.

2. *Type*: defines the type of the variable. Some of the types offered by SPSS:

a. Numeric: the usual way of rendering numbers (e.g., 12345,67).

b. Comma: comma before each group of three digits, dot before decimal digits (e.g., 12,345.67).

c. Dot: dot before each group of three digits, comma before decimal digits (e.g., 12.345,67).

d. String: any textual information (e.g., answers to an open question).

3. *Width*: the number of positions available in the Data View window.

4. *Decimals*: the number of decimal digits after the comma/dot.

5. *Label*: text providing more information about the variable.

6. *Values*: texts providing information about each value of the variable.

7. *Missing*: the value used to denote missing values (e.g., “no answer”).

8. *Column*: the width of the column in the Data View window.

9. *Measure*: the “measurement scale” of the variable (nominal, ordinal or scale, the last covering all types of numeric scales).
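The difference between the Numeric, Comma and Dot types is only a matter of display. As a rough illustration outside SPSS, the following Python sketch renders the same number in the three styles listed above. The function names are made up for this example, and the decimal comma in the Numeric style simply follows the example in the list (in practice it may depend on the regional settings of your computer).

```python
# Illustrative only: mimics how the three numeric display types render a value.
# The function names are hypothetical and not part of SPSS.

def as_numeric(x: float, decimals: int = 2) -> str:
    """No digit grouping, decimal comma (as in the example 12345,67)."""
    return f"{x:.{decimals}f}".replace(".", ",")

def as_comma(x: float, decimals: int = 2) -> str:
    """Comma before each group of three digits, dot before the decimals."""
    return f"{x:,.{decimals}f}"

def as_dot(x: float, decimals: int = 2) -> str:
    """Dot before each group of three digits, comma before the decimals."""
    # Swap the two separators of the Comma style in one pass.
    return as_comma(x, decimals).translate(str.maketrans(",.", ".,"))

value = 12345.67
print("Numeric:", as_numeric(value))
print("Comma:  ", as_comma(value))
print("Dot:    ", as_dot(value))
```

The stored value is the same in all three cases; only its appearance in the Data View changes.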

On the top of the window you find the menu of SPSS: *FILE,
EDIT, VIEW, DATA*, etc. All statistical calculations are found under
ANALYZE, and all diagrams and charts under GRAPHS. To calculate new variables
based on the existing ones, use the commands under TRANSFORM. The HELP menu
provides further assistance, although you may find its explanations rather
terse in the beginning.

**> Have a look at the different menus to get a general overview of
them.**

The **MLU** (*Mean Length of Utterance*) measures the length
of an utterance (a well-formed sentence or a sentence-like
series of words) by counting the number of words it contains.
It is an important measure of the linguistic capabilities of children
acquiring a language and of patients with impaired language, and it
is also useful in identifying authors of texts.

Here is the length of a test utterance measured on 20 patients:

3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9.

**> Enter these values by hand and give the variable the
name MLU.**

**> In the Variable View, set the number of decimals to 0 (as
utterance length always has an integer value).**

When you work with SPSS (as with any other application), it is good practice to save your data files regularly. Output files are often simple to create again, but data files are certainly not. Moreover, SPSS 14 is not always stable, and the program may terminate unexpectedly. Finally, we may want to use some of the data files during several labs.

**> Therefore, save your data file to your own network drive (X:\) in a separate
folder that you create specifically for this lab.**

A frequency table is a table that shows how often each value of a variable appears among your data.

**> Create a frequency table from this variable.**

Hint: 'Analyze', 'Descriptive Statistics', 'Frequencies'.

During the data entry process, one quite often makes errors. Hence, it is imperative to always check the data you have just entered. Besides rereading the numbers in Data View, you should also look for outliers “created” by erroneous data entry: for instance, typing too many zeros or entering two values in a single cell will create values much greater than the other values. In the present case, check whether the frequency table contains only values you remember having entered (and that make sense). Also compare your frequency table to that of your neighbours in the lab.

**> Check the frequency of each value
in your frequency table together with your neighbour.**
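If you would like a check that does not depend on your own typing in SPSS, a frequency table can also be built in a few lines of code. The following is only a minimal sketch in Python (not part of the lab), assuming the 20 MLU values listed above; the function name `frequency_table` is made up for this example. It is no substitute for the SPSS table you have to report.

```python
from collections import Counter

# The 20 MLU measurements from this handout.
MLU = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

def frequency_table(values):
    """Return rows of (value, frequency, percent, cumulative percent)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value],
                     100 * counts[value] / n, 100 * cumulative / n))
    return rows

table = frequency_table(MLU)
print(f"{'Value':>5} {'Freq':>5} {'Pct':>6} {'CumPct':>7}")
for value, freq, pct, cum in table:
    print(f"{value:>5} {freq:>5} {pct:>6.1f} {cum:>7.1f}")
```

A value that appears here but not in your SPSS table (or the other way round) points to a data entry error.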

*** 1. Copy the table in your report.**

*** 2. How many measurements (data) do you have?**

*** 3. Which MLU is the second most frequent?**

*** 4. How often does the highest value of MLU occur?**

A histogram (or frequency diagram) is a graph displaying how frequently the possible values of a variable occur (or how frequently values falling within a certain range occur) among the data that have been entered.

**> Create a histogram based on the variable MLU.**

Hint: 'Graphs', 'Histogram'.

**> Do it again, but have SPSS also draw a Normal curve.**

Hint: mark the checkbox ‘display normal curve’.
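To get a feeling for what the Normal curve in the histogram represents, here is a rough Python sketch (again not part of the lab). It assumes that the curve is a Normal density with the mean and the standard deviation of the sample, scaled by the number of cases and the bar width, and that there is one bar per integer value; SPSS may choose its bars differently.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

# The 20 MLU measurements from this handout.
MLU = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

counts = Counter(MLU)                        # observed frequency per value
curve = NormalDist(mean(MLU), stdev(MLU))    # Normal curve fitted to the sample
bin_width = 1                                # assumption: one bar per integer

rows = []
for value in range(min(MLU), max(MLU) + 1):
    # Height of the scaled Normal curve at the centre of this bar.
    expected = len(MLU) * bin_width * curve.pdf(value)
    rows.append((value, counts[value], expected))

# Text histogram: one '#' per observation, next to the curve's height.
for value, observed, expected in rows:
    print(f"{value:>3} {'#' * observed:<10} observed={observed} normal={expected:.1f}")
```

Comparing the two columns bar by bar is an informal way of judging how well the curve fits the data.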

*** 5. Copy this second graph to your report.**

*** 6. What does the vertical axis
display: numbers or percentages?**

*** 7. What is the highest value and what
is the lowest value of the variable?**

*** 8. How many peaks are there?**

*** 9. There is a gap in the graph. At
what value can this gap be found? What does this observation mean?
Would you expect to find this gap if you had many more data?**

*** 10. Is this distribution approximately Normal?**

A boxplot can be seen as a simplified histogram turned to its side, but it will also prove useful for other purposes later on.

**> Create a boxplot of your variable.**

Hint: 'Graphs', 'Boxplot'. Choose: “Simple” and “Summaries of separate variables”.
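The ingredients of a boxplot can also be computed by hand. The sketch below, in Python, assumes the common convention that the box runs from the first to the third quartile and that the whiskers extend to the most extreme data points within 1.5 times the box length from the box. The quartile method used here (the default of `statistics.quantiles`) is an assumption; SPSS may use a slightly different rule for the edges of the box, so small differences are possible with other data sets.

```python
from statistics import quantiles

# The 20 MLU measurements from this handout.
MLU = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

# Quartiles: the box runs from q1 to q3, with a line at the median.
q1, median, q3 = quantiles(MLU, n=4)
iqr = q3 - q1

# Fences: points beyond them are drawn separately as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences.
inside = [x for x in MLU if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = sorted(x for x in MLU if x < lower_fence or x > upper_fence)

print("box:", q1, "to", q3, "with median", median)
print("whiskers:", whisker_low, "to", whisker_high)
print("outliers:", outliers)
```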

*** 11. Copy this boxplot to your report.**

*** 12. Which is the lowest and highest value according
to the boxplot?**

*** 13. Which is approximately the median
according to the boxplot?**

*** 14. What percentage of the data
lies outside of the box?**

*** 15. Which data are outside of the “whiskers”
of the boxplot?**

We would often like to summarize a variable as a
single number that tells you roughly where the values of that variable are
located. Generally the **mean** (average) is used for that purpose. Another
option is employing the **mode** (also called the modus), that is, the value
that appears most frequently.
One can also use the **median**, the middle value if the
observations are sorted from lowest to highest.
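The three definitions above translate directly into code. As a minimal sketch outside SPSS, the Python standard library offers all three measures; the example assumes the 20 MLU values from this handout.

```python
from statistics import mean, median, mode

# The 20 MLU measurements from this handout.
MLU = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

summary = {
    "mean": mean(MLU),      # sum of the values divided by their number
    "median": median(MLU),  # middle value of the sorted data (with an even
                            # number of cases: mean of the two middle values)
    "mode": mode(MLU),      # most frequent value
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

You can use the output to double-check the table SPSS gives you, but the report should contain the SPSS table itself.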

When a histogram is created, the mean is automatically calculated. The mode, the median and the mean can also be obtained by choosing “Analyze”, “Descriptive Statistics”, and then “Frequencies” in the menu. If you wish, uncheck the mark next to ‘Display Frequency Table’, and ignore the warning. Then choose the mean, the mode and the median via the Statistics button.

**> Have SPSS calculate the mean, the mode and the median, and report them to you
in a single table.**

*** 16. Copy this table to your report.**

*** 17. Suppose you make an error during
data entry: you type 80 instead of 8. Which of these values will change, and
which will not? (Why? What do M&M call this feature of a statistical
measure?)**

*** 18. The median of MLU is lower
than its mean. This is because the histogram is skewed to the … (left or
right?), and it has a longer tail to the … (left or right?).**

In many cases we are not only interested in roughly where
the values of the variable are located, but also in the “width” of the
frequency distribution. There are different measures describing the “width”
of the histogram. The best known one is the **standard deviation** (SD), but the **range**
and the **interquartile range** (IQR) are also used. The drawback of the range (the
difference between the maximum and minimum values) is that it depends entirely on
the two most extreme values measured.
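For comparison with the SPSS output, the three measures of spread can be sketched in Python as follows. The example assumes the sample standard deviation (dividing by n - 1) and the default quartile method of `statistics.quantiles`; whether SPSS computes its quartiles in exactly the same way is an assumption you should verify against your own output.

```python
from statistics import quantiles, stdev

# The 20 MLU measurements from this handout.
MLU = [3, 5, 4, 4, 10, 4, 11, 4, 4, 6, 3, 4, 4, 8, 8, 8, 5, 8, 4, 9]

# Sample standard deviation: square root of the sum of squared
# deviations from the mean, divided by n - 1.
sd = stdev(MLU)

# Range: distance between the two most extreme values.
value_range = max(MLU) - min(MLU)

# Interquartile range: distance between the first and third quartile.
q1, _, q3 = quantiles(MLU, n=4)
iqr = q3 - q1

print(f"SD    = {sd:.3f}")
print(f"range = {value_range}")
print(f"IQR   = {iqr}")
```

Note that the range uses only two data points, whereas the SD uses all of them and the IQR ignores the lowest and the highest quarter of the data.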

**> Have SPSS calculate for you the SD, the range and the
quartiles.**

Hint: “Analyze”, “Descriptive Statistics”,
“Frequencies”.

*** 19. Report the SD, the range and the IQR.**

*** 20. If the range is seen as the width
of the histogram, then how many SD is the width of this histogram?**
(How many times is the range larger than the SD?)