Some Remarks about the Validity of the Test

The distraction measure, which is used in the final evaluation of the test, is highly correlated with the standard deviation of the reaction times corrected for unintended experimental and subject effects.

History

In the early history of psychology several authors, among them Binet (1900) and Godefroy (1915), already stressed the importance of the fluctuation in reaction times, suggesting the mean deviation as a measure of performance. Subsequent researchers also became increasingly aware that in concentration tasks the relevant information should be sought in the short-term oscillation of the measure of performance. Spearman even considered oscillation to be a separate universal factor, in addition to what he called the general factor and perseveration (Spearman, 1927, p. 327). According to Spearman (1927, p. 320), a typical manifestation of this oscillation factor

"... is supplied by the fluctuations which always occur in any person's continuous output of mental work, even when this is so devised as to remain of approximately constant difficulty."
and on page 321 he remarks:
"... almost any kind of continuous work can be arranged so as to manifest the same phenomenon. In all cases alike, the output will throughout exhibit fluctuations that cannot be attributed to the nature of the work, but only to the worker himself."
Jensen (1982), discussing his reaction time experiments, noted that trial-to-trial variability (the standard deviation of each subject's reaction times) frequently surpassed response speed as a predictor of intelligence. According to Larson and Alderton (1990), numerous studies suggest that Jensen's observation was correct and that variability has a robust statistical relationship to intelligence.
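
The two variability measures mentioned in this section, the mean deviation and the standard deviation of a subject's reaction times, can be illustrated with a minimal Python sketch. The reaction times are hypothetical, and the correction for unintended test and subject effects is not described in this text, so it is left out here.

```python
import statistics

def mean_deviation(reaction_times):
    """Mean absolute deviation of the reaction times from their mean."""
    m = statistics.fmean(reaction_times)
    return statistics.fmean([abs(rt - m) for rt in reaction_times])

def standard_deviation(reaction_times):
    """Sample standard deviation (n - 1 in the denominator)."""
    return statistics.stdev(reaction_times)

# Hypothetical reaction times in milliseconds; both subjects have the same mean (500 ms).
steady = [500, 510, 490, 505, 495]
erratic = [500, 650, 380, 560, 410]

if __name__ == "__main__":
    for name, rts in (("steady", steady), ("erratic", erratic)):
        print(name, mean_deviation(rts), round(standard_deviation(rts), 1))
```

Both subjects are equally fast on average, yet the second one fluctuates far more from trial to trial, which is the kind of difference a variability measure is meant to capture.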

The Star Dance study

In a small validity study at the Sterredans [Star Dance] School in Nijmegen (The Netherlands), the standard deviation of the reaction times corrected for unintended test and subject effects was correlated with the School Achievement and Intelligence tests of the Dutch ISI test (van Boxtel, Snijders and Welten, 1982). ISI is short for Vocational Interest, School Achievement and Intelligence. Only one task was used: Colours.

The school achievement tests consisted of Computation, Arithmetic Reasoning, Dictation I (visual) and Dictation II (aural), and Comprehensive Reading.
The intelligence tests consisted of three verbal tests: Synonyms, Opposites, Word Analogies, and of three figural tests: Cut Figures, Rotation and Figure Analogies.
The sample consisted of 40 children, 28 boys and 20 girls, of the last two grades of primary school, ranging in age from 9 to 12 years old.

The test measure correlates significantly at the 0.05 level (1-tailed) with Computation and Arithmetic Reasoning and with the non-verbal intelligence tests Cut Figures, Rotation and Figure Analogies. The correlations are of about the same size as the mutual correlations between the School Achievement and Intelligence tests. Maximum likelihood factor analysis yielded three factors (p = 0.265). One factor showed high loadings for the arithmetical tests, the figural tests and the test measure. The same factor was found by Reuver and Peters (2003) in a maximum likelihood factor analysis of the Dutch version of the Wechsler Intelligence Scale for Children-III (2002). A solution with four factors (p = 0.099) yielded the well-known Verbal and Performance factors and two further factors: one that could be identified as a memory factor, with high loadings on Information and Symbol Search, and one that could be identified as an attention concentration factor, with high loadings on the subtests Arithmetic and Digit Span.

The Olof Palme study

In another small validity study at the Olof Palme School in Drunen (The Netherlands), two different tasks were used: Colours and Positions. The standard deviation of the reaction times corrected for unintended test and subject effects was correlated with some CITO school achievement tests, Raven's Standard Progressive Matrices and a Judgement Intelligence Rate given by the parents as well as by the teachers.

The CITO School Achievement tests consisted of Spelling, Writing, Comprehensive Reading, Vocabulary, Computation I (operations), Computation II (measurement, geometry), Learning Skills, Maps and Statistical Representations (scheme, tables, graphs). Raven's Standard Progressive Matrices tests consisted of the subsets A through E.

The sample consisted of 38 children, 16 boys and 22 girls, of the last two grades of primary school, ranging in age from 10 (only one subject) to 13 years old.

The test measure for Positions correlates significantly at the 0.05 level (1-tailed) with the CITO school achievement test Learning Skills (r = -0.406, p = 0.011) and with subsets D (r = -0.353, p = 0.030) and E (r = -0.362, p = 0.025) of Raven's Standard Progressive Matrices. No significant correlations with these tests were found for Colours, which was probably too easy for the children to have discriminative power. However, a significant correlation was found between Colours and the parents' Judgement Intelligence Rate (r = -0.340, p = 0.037). Correlations significant at the 0.01 level (1-tailed) were found for Positions with the Judgement Intelligence Rate of the parents (r = -0.457, p = 0.004) as well as with that of the teachers (r = -0.472, p = 0.003).

To evaluate these correlations, one should keep in mind the correlation between Colours and Positions (r = 0.259, p = 0.117). This correlation is rather low and not significant, which is remarkable because both tasks measure the same underlying variable. The low correlation is probably the result of a restriction-of-range effect and the small sample size. In this light, the above-mentioned significant correlations are quite striking.

The Axis study

In a third validity study, children from regular primary schools were compared with children from special primary schools with respect to reading difficulties. A total of eleven schools participated in the study. Most subjects came from the primary school De Spil [Axis] in Duiven (The Netherlands), which is why this study is referred to as the Axis study. Again two different versions of the ACT were used: Colours with Stimulus number Fixed and Colours with Stimulus number Random. For each version of the test, the standard deviation of the reaction times corrected for unintended test and subject effects was correlated with two word reading tests: the Three-Minutes-Reading Test (Verhoeven, 1993) and the Lexical-Decision Test (van Bon, 2004). The Three-Minutes-Reading Test, an oral reading task, is often used in Dutch education to evaluate reading achievement. The Lexical-Decision Test is a silent reading task consisting of regular words and pseudowords; the subject's task is to cross out the pseudowords.

The Three-Minutes-Reading Test actually consists of three different One-Minute-Reading Tests, each on its own card: the first two consist of five columns of monosyllabic words and the third of four columns of disyllabic words. The first card has monosyllabic CV, VC and CVC words (C = consonant, V = vowel). The second card consists of more complex monosyllabic words, that is, words with at least one consonant cluster (from CCVC words like SPIN [spider] to CVCCCC words like HERFST [autumn]). The third card has disyllabic words. The subject has to read the words aloud, with a testing time of one minute per card. For each card, two different scores were used: the number of words read in one minute and the number of words read correctly in one minute.
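
The C/V notation used for the word cards can be made concrete with a small Python sketch. It is a letter-level approximation, not the procedure used to construct the cards: it assumes that a, e, i, o and u are the only vowel letters and collapses a run of vowel letters into a single V, which ignores Dutch spellings such as "ij". The helper names are hypothetical.

```python
VOWEL_LETTERS = set("aeiou")  # assumption: letter-level vowels only

def cv_pattern(word):
    """Return the consonant/vowel pattern of a word, e.g. 'CCVC'."""
    pattern = []
    for ch in word.lower():
        symbol = "V" if ch in VOWEL_LETTERS else "C"
        # Collapse a run of vowel letters into one V (e.g. a double vowel).
        if symbol == "V" and pattern and pattern[-1] == "V":
            continue
        pattern.append(symbol)
    return "".join(pattern)

def has_consonant_cluster(word):
    """True if the pattern contains two or more adjacent consonants."""
    return "CC" in cv_pattern(word)

if __name__ == "__main__":
    for word in ("SPIN", "HERFST"):
        print(word, cv_pattern(word), has_consonant_cluster(word))
```

Under these assumptions SPIN maps to CCVC and HERFST to CVCCCC, so both would belong on the second card.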

Lexical Decision Test 1 consisted of 80 monosyllabic words (regular and pseudo) arranged in three columns. Lexical Decision Test 2 consisted of 120 disyllabic words arranged in four columns. The subject has to read the words column-wise. Each test is a one-minute test. Depending on didactical age, most of the younger subjects were given test 1 (17 in the range 8 to 10 years) and the older subjects were given test 2 (33 in the range 9 to 14 years). However, in the computation of the correlations below no distinction was made between tests 1 and 2.

The sample consisted of 50 children, 33 boys and 17 girls, from grades 4, 6 and 8 of primary school, ranging in age from 8 (only one subject) to 14 years old.

Raw correlations

The correlation of Colours Fixed with the Lexical-Decision Test (r = -0.276, p = 0.026) and the correlations of Colours Fixed with the three One-Minute-Reading Tests (r = -0.223, p = 0.060; r = -0.188, p = 0.096; and r = -0.270, p = 0.029) are low, but still significant at the 0.10 level. However, the correlations of Colours Random with the Reading Tests are higher and all significant at the 0.001 level (Lexical-Decision Test: r = -0.570; One-Minute-Reading Tests: r = -0.571, r = -0.558 and r = -0.578). In absolute value, the correlations of Colours Random with the Reading Tests are even higher than the correlation of Colours Random with Colours Fixed (r = 0.456).
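
How a p-value follows from a correlation and the sample size can be sketched as below. The text does not state which test produced the reported p-values; the sketch assumes the usual t-test for a Pearson correlation with n - 2 degrees of freedom, a one-tailed test in the direction of the observed sign, and n = 50 for every pair (no missing data). The tail area of the t distribution is obtained by simple numerical integration, and the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t statistic for testing a Pearson correlation r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def one_tailed_p(r, n):
    """One-tailed p-value in the direction of the observed correlation."""
    return t_upper_tail(t_statistic(r, n), n - 2)

if __name__ == "__main__":
    # Colours Fixed with the Lexical-Decision Test, as reported in the text.
    print(t_statistic(-0.276, 50), one_tailed_p(-0.276, 50))
```

Under these assumptions, r = -0.276 with n = 50 gives a one-tailed p between 0.025 and 0.03, in line with the reported p = 0.026.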

Partial correlations

One could argue that the observed correlations are due not to concentration or, in terms of inhibition theory, the rate of inhibition increase during working intervals, but to pure mental speed of performance. In that case the partial correlations between the ACT score (Colours Random) and the Reading Tests, controlling for speed of performance (minimum reaction time), should be reduced to zero. However, the partial correlations of Colours Random with the Reading Tests, controlling for the natural logarithm of the minimum reaction time in both the Colours Fixed task and the Colours Random task, remained significant at the 5% level, with p-values at or close to 0.01 (Lexical-Decision Test: r = -0.331, p = 0.011; One-Minute-Reading Tests: r = -0.342, p = 0.009; r = -0.317, p = 0.014; and r = -0.333, p = 0.010), indicating that the attention concentration effect is the explaining factor. The natural logarithm was taken because the logarithm of the minimum reaction time was normally distributed across subjects.
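
The logic of controlling for speed can be sketched with the standard recursion formula for partial correlations. This is an illustration, not the analysis script of the study: the function names are hypothetical and so are the correlations in the example, because the raw correlations with the minimum reaction times are not reported here. x stands for the ACT score, y for a reading test, and z1 and z2 for the two log minimum reaction times.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def second_order_partial(r_xy, r_xz1, r_yz1, r_xz2, r_yz2, r_z1z2):
    """Partial correlation of x and y, controlling for both z1 and z2."""
    # First remove z1 from every pair that is needed in the second step.
    r_xy_z1 = partial_r(r_xy, r_xz1, r_yz1)
    r_xz2_z1 = partial_r(r_xz2, r_xz1, r_z1z2)
    r_yz2_z1 = partial_r(r_yz2, r_yz1, r_z1z2)
    # Then remove z2 from the x-y relation that remains.
    return partial_r(r_xy_z1, r_xz2_z1, r_yz2_z1)

if __name__ == "__main__":
    # Hypothetical values: x and y related only through speed z (0.7 * 0.5 = 0.35).
    print(partial_r(0.35, 0.7, 0.5))
    # Hypothetical values: a relation beyond speed leaves a nonzero partial r.
    print(second_order_partial(-0.57, 0.5, -0.4, 0.5, -0.4, 0.6))
```

If speed alone accounted for the relation, the first call would be the typical outcome: the partial correlation vanishes. A partial correlation that stays clearly away from zero, as in the second call, points to a source of covariation other than speed.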

The correlations of Colours Fixed with the Reading Tests are generally lower than the correlations of Colours Random with the Reading Tests. Colours Fixed is probably too easy for these children and therefore lacks discriminative power in comparison with Colours Random. For the practical application of the test one must keep in mind that the test score depends partly on the mental capacity of the subject and partly on the mental load of the task. If the load is too low, the test may lose discriminative power.

Factor analysis

A factor analysis was performed to test the assumption that all common variance of these six variables (presumably the two ACT measures and the four reading tests) could be explained by a single factor. The chi-square test of a one-factor solution, using maximum likelihood factor analysis, was not significant (Chi-Square = 14.174, df = 9, p = 0.116), so the one-factor model was not rejected. All the common variance of these variables could, therefore, be explained by one common factor. No systematic specific variance has to be assumed or, stated differently, all specific variance is random. This means that the systematic variance of the Reading Tests can be explained completely by the ACT measures.
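
The reported degrees of freedom and p-value can be checked with a short sketch. It uses the standard degrees-of-freedom formula for the likelihood-ratio test in maximum likelihood factor analysis, df = ((p - m)^2 - (p + m)) / 2 for p variables and m factors, and a series expansion of the regularized incomplete gamma function for the chi-square tail area. The function names are hypothetical, and the software used in the study may apply small-sample corrections that are not reproduced here.

```python
import math

def ml_factor_df(n_variables, n_factors):
    """Degrees of freedom of the chi-square test for an m-factor ML solution."""
    p, m = n_variables, n_factors
    return ((p - m) ** 2 - (p + m)) // 2

def regularized_gamma_p(a, x, max_terms=10000):
    """Lower regularized incomplete gamma function P(a, x), series expansion."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if abs(term) < 1e-15 * abs(total):
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_sf(chi2, df):
    """Upper tail probability of a chi-square value with df degrees of freedom."""
    return 1.0 - regularized_gamma_p(df / 2.0, chi2 / 2.0)

if __name__ == "__main__":
    df = ml_factor_df(6, 1)            # six variables, one factor
    print(df, chi2_sf(14.174, df))     # chi-square value reported in the text
```

With six variables and one factor the formula gives df = 9, as reported, and the tail probability of 14.174 should come out close to the reported p = 0.116.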

It is quite clear that what is needed for the various Reading Tests is Word Reading. However, that is not necessarily what is measured by the corresponding test score. The faculty of Word Reading is certainly needed for these tests, but one cannot conclude that these tests consequently measure something like Reading Ability. A similar situation occurs, for example, in the case of the well-known Snellen test. This test has been the most popular clinical measurement of visual acuity for over a century. No one would argue that the test measures letter reading ability only because the chart consists of letters. The discriminative factor is distance. Some subjects need a larger distance to be able to read the letters of a certain line, whereas other subjects need a shorter distance. The ability to read the letters of the alphabet is what is presumed, not what is measured. That which is needed for a test is not necessarily what is measured by the test. For example, something like the faculty of sight is also needed for most intelligence tests, but no one would claim that it is the faculty of sight which is measured. Similarly, in the case of the Reading Tests the ability to read is presumed, not measured. What is measured by the test score is the concentration of attention, which explains why, in a factor analysis that includes the ACT, all common variance is explained by a single factor.

References

Binet, A. (1900). Attention et adaptation [Attention and adaptation]. L'année psychologique, 6, 248-404.

Bon, W.H.J. van, Bouwmans, M., Broeders, I., Hoevenaars, L.T.M. and Jongeneelen, J.E. (2003). Een klassikale toets voor 'technische leesvaardigheid'; vragen van validiteit en betrouwbaarheid [A classroom test for 'technical reading skill'; questions of validity and reliability]. Tijdschrift voor orthopedagogiek, 42(2), 71-86.

Boxtel, H.W. van, Snijders, J.Th. and Welten, V.J. (1982). ISI-Reeks [ISI-Series]. Groningen (The Netherlands): Wolters-Noordhoff.

Godefroy, J.C.L. (1915). Onderzoekingen over de aandachtsbepaling by gezonden en zielszieken [Studies on the measurement of concentration using healthy subjects and mentally ill subjects]. Groningen (The Netherlands): University of Groningen, Dissertation.

Jensen, A.R. (1982). Reaction time and psychometric "g". In H.J. Eysenck (Ed.), A model for intelligence. New York: Springer-Verlag.

Larson, G.E. and Alderton, D.L. (1990). Reaction Time Variability and Intelligence: A "Worst Performance" Analysis of Individual Differences. Intelligence, 14, 309-325.

Reuver, J. and Peters, W. (2003). Hoogbegaafdheid, schoolproblemen en intelligentieprofielen [School Problems, Giftedness, Intelligence Profiles]. Pedagogisch Tijdschrift, 28, 263-280.

Spearman, C. (1927). The Abilities of Man. London: Macmillan.

Verhoeven, L. (1993). Drie-minutentoets [Three-Minutes Test]. Arnhem: Cito.

Wechsler, D. (2002). Wechsler Intelligence Scale for Children Derde Editie [Third Edition] NL: handleiding [manual]. Amsterdam: NDC.