Some Remarks about the Validity of the Test
The distraction measure, which is used in the final evaluation
of the test, is highly correlated with the standard deviation of
the reaction times corrected for unintended experimental and subject
effects.
History
In the early history of psychology
several authors, among them Binet (1900) and Godefroy (1915),
already stressed the importance of the fluctuation in reaction times,
suggesting the mean deviation as a measure of performance.
Subsequent researchers also became increasingly aware that in
concentration test tasks the relevant information should be
sought in the short-term oscillation of the measure of
performance. Spearman even considered
oscillation to be a separate universal factor, in addition to
what he called the general factor
and perseveration (Spearman, 1927, p. 327).
According to Spearman (1927, p. 320), a typical
manifestation of this oscillation factor
"... is supplied by the
fluctuations which always occur in any person's continuous output
of mental work, even when this is so devised as to remain of
approximately constant difficulty",
and on page 321 he remarks:
"... almost any kind of continuous work can
be arranged so as to manifest the same phenomenon. In all cases
alike, the output will throughout exhibit fluctuations that cannot
be attributed to the nature of the work, but only to the worker
himself."
Jensen (1982), discussing his reaction time
experiments, noted that trial-to-trial variability (the
standard deviation of each subject's reaction times) frequently
surpassed response speed as a predictor of intelligence. According
to Larson and Alderton (1990), numerous studies suggest that
Jensen's observation was correct and that variability has a robust
statistical relationship to intelligence.
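The two variability measures mentioned in this section, the mean deviation and the standard deviation of a subject's reaction times, can be sketched as follows. This is only a minimal illustration: the reaction times are hypothetical values in milliseconds, the function name is an assumption, and the correction for unintended test and subject effects is not modelled.

```python
import statistics

def variability(reaction_times):
    """Return (standard deviation, mean deviation) of one subject's reaction times."""
    mean_rt = statistics.fmean(reaction_times)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(reaction_times)
    # Mean deviation: average absolute distance from the subject's own mean.
    mean_dev = statistics.fmean([abs(rt - mean_rt) for rt in reaction_times])
    return sd, mean_dev

# Hypothetical reaction times (ms) of two subjects with the same mean speed.
steady = [500, 510, 490, 505, 495]
erratic = [400, 600, 450, 550, 500]

sd_steady, md_steady = variability(steady)
sd_erratic, md_erratic = variability(erratic)
print(sd_steady, md_steady)
print(sd_erratic, md_erratic)
```

Both hypothetical subjects have a mean reaction time of 500 ms, so a speed measure cannot separate them, whereas either variability measure does.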
The Star Dance study
In a small validity study at the Sterredans [Star Dance] School in Nijmegen (The Netherlands)
the standard deviation of the reaction times
corrected for unintended test and subject effects was correlated with
the School Achievement and Intelligence tests of the Dutch ISI test
(van Boxtel, Snijders and Welten, 1982). ISI is short for Vocational
Interest, School Achievement and Intelligence. Only one task was used: Colours.
The school achievement tests consisted of Computation,
Arithmetic Reasoning, Dictation I (visual) and Dictation II (aural),
and Comprehensive Reading.
The intelligence tests consisted of three verbal tests: Synonyms,
Opposites, Word Analogies, and of three figural tests: Cut Figures,
Rotation and Figure Analogies.
The sample consisted of 40 children, 28 boys and 20 girls, of the last
two grades of primary school, ranging in age from 9 to 12 years old.
The test measure has significant correlations at the 0.05 level (1-tailed)
with Computation and Arithmetic Reasoning and with the non-verbal
intelligence tests Cut Figures, Rotation and Figure Analogies.
The correlations are of about the same size as the mutual correlations
between the School Achievement and Intelligence tests. Maximum likelihood
factor analysis yielded three factors (p = 0.265). One factor showed high
loadings for the arithmetical tests, the figural tests and the test
measure. The same factor was found by Reuver and Peters
(2003) in a factor analysis of the Dutch version of the Wechsler
Intelligence Scale for Children-III (2002). Again maximum likelihood
factor analysis was used. A solution with four factors (p = 0.099)
yielded the well-known Verbal and Performance factors and two further factors,
one of which could be identified as a memory factor with high loadings
on Information and Symbol Search and the other as an attention
concentration factor with high loadings on the subtests Arithmetic
and Digit Span.
The Olof Palme study
In another small validity study at the Olof Palme School in Drunen (The Netherlands)
two different tasks were used: Colours and Positions.
The standard deviation of the reaction times corrected for unintended test and subject
effects was correlated with some CITO school achievement tests, Raven's Standard
Progressive Matrices tests and a Judgement Intelligence Rate made by the parents as
well as by the teachers.
The CITO School Achievement tests consisted of Spelling, Writing,
Comprehensive Reading, Vocabulary, Computation I (operations), Computation II
(measurement, geometry), Learning Skills, Maps and Statistical Representations (scheme,
tables, graphs). Raven's Standard Progressive Matrices tests consisted of the subsets A
through E.
The sample consisted of 38 children, 16 boys and 22 girls, of the last two grades of
primary school, ranging in age from 10 (only one subject) to 13 years old.
The test measure for Positions has a significant correlation at the 0.05 level (1-tailed)
with the CITO school achievement test Learning Skills (r = -0.406, p = 0.011) and with
Raven's Standard Progressive Matrices tests subset D (r = -.353, p = 0.030) and E
(r = -.362, p = 0.025).
No significant correlations were found for Colours. Colours was probably too easy
for the children to have discriminative power. However, a significant correlation
of Colours with the parents' Judgement Intelligence Rate was found (r = -0.340, p = 0.037).
Significant correlations at the 0.01 level (1-tailed) were found for Positions with
the Judgement Intelligence Rate of the parents (r = -0.457, p = 0.004) as well as with
the Judgement Intelligence Rate of the teachers (r = -0.472, p = 0.003).
In order to evaluate the various
correlations one should keep in mind the correlation between Colours and Positions
(r = 0.259, p = 0.117). This correlation is rather low and not significant,
which is remarkable, because both tasks measure the same underlying variable.
The low correlation is probably a result of a restriction-of-range effect and
a small sample size. In this light, the above-mentioned significant correlations are
all the more remarkable.
The Axis study
In a third validity study, children from regular primary schools
were compared with children from special primary schools with respect to
differences in reading difficulties. A total of eleven schools
participated in the study. Most subjects came from the primary school
De Spil [Axis] in Duiven (The Netherlands), which is why this study is referred to as
the Axis study. Again two different versions of the
ACT were used: Colours with Stimulus number Fixed and Colours with Stimulus
number Random. For each version of the test the standard deviation of
the reaction times corrected for unintended test and subject effects was
correlated with two word reading tests: the Three-Minutes-Reading Test
(Verhoeven, 1993) and the Lexical-Decision Test (van Bon, 2004).
The Three-Minutes-Reading Test, an oral reading task, is often used in Dutch education
to evaluate reading achievement.
The Lexical-Decision Test is a silent reading task. It consists of regular
words and pseudo words. The subject's task is to cross out the pseudo words.
The Three-Minutes-Reading Test actually consists of three different One-Minute-Reading tests,
the first two consisting of five columns of monosyllabic
words and the third one consisting of four columns of disyllabic words.
The first card has monosyllabic CV, VC, and CVC words (C = consonant, V = vowel).
The second card consists of more complex monosyllabic words;
that is, words with at least one consonant cluster (from CCVC words like SPIN [spider]
to CVCCCC words like HERFST [autumn]). The third card has disyllabic words.
The words have to be read aloud by the subject, with a testing time of one minute per card.
For each card, two different scores were used: the number of words that can be read
in one minute and the number of words that can be read correctly in one minute.
Lexical Decision Test 1 consisted of 80 monosyllabic words (regular and pseudo)
arranged in three columns. Lexical Decision Test 2 consisted of 120 disyllabic words
arranged in four columns. The subject has to read the words column by column.
Each test is a one-minute test. Depending on didactic age, most of the younger
subjects were given test 1 (17 in the range 8 to 10 years). The older subjects were given
test 2 (33 in the range 9 to 14 years). However, in the computation of the correlations
below no distinction was made between test 1 and 2.
The sample consisted of 50 children, 33 boys and 17 girls, of grade 4, 6 and 8
of primary school, ranging in age from 8 (only one subject) to 14 years old.
Raw correlations
The correlation of Colours Fixed with the Lexical-Decision Test (r = -0.276, p = 0.026) and the correlations
of Colours Fixed with the three One-Minute Reading Tests (r = -0.223, p = 0.060; r = -0.188, p = 0.096; and
r = -0.270, p = 0.029) are low, but still significant at the 0.10 level.
However, the correlations of Colours Random with the Reading Tests are higher and all significant
at the 0.001 level (Lexical-Decision Test: r = -0.570; One-Minute Reading Tests: r = -0.571, r = -0.558,
and r = -0.578). In absolute value, the correlations of Colours Random with the Reading
Tests are even higher than the correlation of Colours Random with Colours Fixed (r = 0.456).
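The significance of a raw correlation such as those reported above can be checked with the usual t transformation, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The following is a minimal sketch using only the Python standard library; the function names are assumptions, the tail area is obtained by simple numerical integration, and it is assumed that all 50 subjects contributed to the correlation.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

def one_tailed_p(r, n, steps=2000):
    """One-tailed p-value for a Pearson correlation r based on n subjects."""
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule for the area under the t density between 0 and |t|
    # (steps must be even).
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so the upper tail is 0.5 minus that area.
    return 0.5 - area

# Colours Fixed with the Lexical-Decision Test: r = -0.276, n = 50 (assumed).
p_fixed = one_tailed_p(-0.276, 50)
print(round(p_fixed, 3))
```

Under these assumptions the result should come out close to the reported p = 0.026, which suggests that the p-values in this section are one-tailed.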
Partial correlations
One could argue that the observed correlations are due not to
concentration or, in terms of inhibition theory, the rate of
inhibition increase during working intervals,
but to pure mental speed of performance. In that case the
partial correlations between the ACT score (Colours Random)
and the Reading Tests, while controlling for speed of performance
(minimum reaction time),
should be reduced to zero. However, in the case of Colours Random
the partial correlations with the Reading Tests, while controlling
for the natural logarithm of the minimum reaction time in both the Colours Fixed task
and the Colours Random task, remained significant
at the 5% level (Lexical-Decision Test: r = -0.331, p = 0.011;
One-Minute Reading Tests: r = -0.342, p = 0.009, r = -0.317, p = 0.014,
and r = -0.333, p = 0.010), indicating that the attention
concentration effect is the explanatory factor. The natural logarithm was taken because
it was normally distributed across subjects.
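A partial correlation that controls for two variables, as described above, can be obtained from the zero-order correlations by applying the first-order formula recursively. The sketch below is illustrative only: the function name is an assumption, and apart from the zero-order value of -0.570 the entries of the correlation matrix are invented, since the text does not report the correlations with the minimum reaction times.

```python
import math

def partial_corr(r, i, j, controls=()):
    """Partial correlation of variables i and j, given the variables in controls.

    r is a symmetric correlation matrix (list of lists); controls is a tuple
    of variable indices that are partialled out.
    """
    if not controls:
        return r[i][j]
    k, rest = controls[0], controls[1:]
    # Remove variable k from correlations that already control for `rest`.
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

# Index 0: ACT score (Colours Random), 1: a reading test,
# 2: ln(minimum RT, Colours Fixed), 3: ln(minimum RT, Colours Random).
# All values except -0.57 are hypothetical.
r = [
    [1.00, -0.57, 0.40, 0.45],
    [-0.57, 1.00, -0.35, -0.40],
    [0.40, -0.35, 1.00, 0.60],
    [0.45, -0.40, 0.60, 1.00],
]

r_partial = partial_corr(r, 0, 1, controls=(2, 3))
print(round(r_partial, 3))
```

When testing such a coefficient for significance, the degrees of freedom are reduced by one for every controlled variable, that is, n - 2 - 2 for two controls.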
The correlations of Colours Fixed with the Reading Tests are generally lower than
the correlations of Colours Random with the Reading Tests.
Colours Fixed is probably too easy for these children and, therefore, lacks discriminative power
in comparison with Colours Random. For the practical application of the test one must keep in
mind that the test score is partly dependent on the mental
capacity of the subject and partly dependent on the mental
load of the task. If the load is too low, the test may lose
discriminative power.
Factor analysis
A factor analysis was performed to test the assumption that all
common variance of these six variables (the two ACT measures and
the four Reading Tests) could be explained
by a single factor. The goodness-of-fit test for a one-factor solution, using
maximum likelihood factor analysis, was not significant
(chi-square = 14.174, df = 9, p = 0.116). All the systematic
variance of these variables could, therefore, be explained by one
common factor. No systematic specific variance has to be
assumed or, stated differently, all specific variance is
random. This means that the systematic variance of the Reading Tests
can be explained completely by the ACT measures.
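The degrees of freedom and the p-value of the goodness-of-fit test reported above can be reproduced with a short calculation. For p observed variables and k common factors, the test has ((p - k)^2 - (p + k)) / 2 degrees of freedom. The sketch below uses only the Python standard library; the function names are assumptions, and the upper tail of the chi-square distribution is computed with the series expansion of the incomplete gamma function.

```python
import math

def factor_model_df(n_variables, n_factors):
    """Degrees of freedom of the likelihood-ratio test for a factor model."""
    p, k = n_variables, n_factors
    return ((p - k) ** 2 - (p + k)) // 2

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution."""
    a, z = df / 2, x / 2
    if z <= 0:
        return 1.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

df = factor_model_df(6, 1)        # six variables, one common factor
p_value = chi2_sf(14.174, df)     # chi-square value taken from the text
print(df, round(p_value, 3))
```

With six variables and one factor this gives 9 degrees of freedom, in agreement with the reported test.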
It is quite clear that what is needed for the various Reading Tests is Word Reading.
However, that is not necessarily what is measured by the corresponding test score.
The faculty of Word Reading is certainly needed for these tests, but one may not
conclude that, consequently, these tests measure something like Reading Ability.
A similar situation occurs,
for example, in the case of the well-known Snellen test. This test has been the most popular
clinical measurement of visual acuity for over a century. No one would argue that the test measures
letter reading ability merely because the chart consists of letters. The discriminative factor is distance.
Some subjects need a larger distance to be able to read the letters of a certain line, whereas other subjects
need a shorter distance. The ability to read the letters of the alphabet is what is presumed, not what is measured.
That which is needed for a test is not necessarily what is measured by the test.
For example, something like the faculty of sight is also needed for most intelligence tests,
but no one would claim that it is the faculty of sight which is measured. Similarly,
in the case of the Reading Tests the ability to read is presumed, not measured. What is measured
by the test score is the concentration of attention, which explains why, in a factor analysis
that includes the ACT, all common variance is explained by a single factor.
References
Binet, A. (1900). Attention et adaptation [Attention and adaptation].
L'année psychologique, 6, 248-404.
Bon, W.H.J. van, Bouwmans, M., Broeders, I., Hoevenaars, L.T.M. and Jongeneelen, J.E. (2003).
Een klassikale toets voor 'technische leesvaardigheid'; vragen van validiteit en
betrouwbaarheid [A classroom test for 'technical reading skill'; questions of validity and
reliability]. Tijdschrift voor orthopedagogiek, 42(2), 71-86.
Boxtel, H.W. van, Snijders, J.Th. and Welten, V.J. (1982).
ISI-Reeks [ISI-Series]. Groningen (The Netherlands):
Wolters-Noordhoff.
Godefroy, J.C.L. (1915).
Onderzoekingen over de aandachtsbepaling by gezonden en
zielszieken [Studies on the measurement of concentration using
healthy subjects and mentally ill subjects]. Groningen (The
Netherlands): University of Groningen, Dissertation.
Jensen, A.R. (1982).
Reaction time and psychometric "g". In H.J. Eysenck (Ed.),
A model for intelligence. New York: Springer-Verlag.
Larson, G.E. and Alderton, D.L. (1990). Reaction
Time Variability and Intelligence: A "Worst Performance" Analysis
of Individual Differences. Intelligence, 14, 309-325.
Reuver, J. and Peters, W. (2003). Hoogbegaafdheid, schoolproblemen en
intelligentieprofielen [School Problems, Giftedness, Intelligence
Profiles]. Pedagogisch Tijdschrift, 28, 263-280.
Spearman, C. (1927). The Abilities of Man. London: Macmillan.
Verhoeven, L. (1993). Drie-minutentoets [Three-Minutes Test]. Arnhem: Cito.
Wechsler, D. (2002). Wechsler Intelligence Scale for Children
Derde Editie [Third Edition] NL: handleiding [manual]. Amsterdam: NDC.