Continuous Performance Tests reviewed by Ron Dumont, Anna Tamborra and Brian Stone

(published here by kind permission of Ron Dumont Ed.D., Director of the School of Psychology Program, Fairleigh Dickinson University)
Link to comparison table

Three computerized continuous performance tests were reviewed by these authors. The goal of these reviews was to compare ease of use, computer requirements, normative data, test results, and interpretability. No attempt was made to distinguish which program might “better identify” a sample of ADHD children from a control group. Materials reviewed generally were those that a practitioner would receive when purchasing the software package. Although extensive research may already have been published on CPTs, it was our goal to review only the materials that a school psychologist would receive when purchasing the tests themselves. Admittedly, no attempt has been made to extensively research the background of the tests and their historical use. These reviews should be considered general overviews of the tests and are not meant to be comprehensive in their nature.

Test of Variables of Attention

The Test of Variables of Attention (T.O.V.A.) is a computerized, 23-minute (11 minutes for 4- to 5-year-olds), non-language-based, fixed-interval, visual performance test for use in the screening, diagnosis, and treatment monitoring of children and adults with attention deficits. It was created by Dr. Lawrence Greenberg and is distributed by Universal Attention Disorders, Inc. as well as American Guidance Service (AGS). Cost for this test is $495. This price includes the T.O.V.A. disk, micro switch (button), the T.O.V.A. box (for keeping track of additional tests), two T.O.V.A. videos, an interpretation manual, and an installation manual. The initial cost also allows 5 interpretations. Each additional interpretation costs between $5 and $6 depending on the number you purchase. (Included in packets sent to us a year before doing this review were a number of interesting materials that probably reflect the difference in the professions of those who might use the T.O.V.A. On one promotional page, under the heading “Benefits of T.O.V.A.”, the following were listed: Enhance revenues; Retain patient within doctor’s practice; Builds practice/builds referral base; and Reimbursable through major medical/psychological benefits. Also included were two sample letters to insurance companies demonstrating how to bill for using the T.O.V.A. for either a C.N.S. diagnosis or an Organic Brain Syndrome Diagnosis.) Could school districts or school psychologists that use the T.O.V.A. request third party reimbursement?

The manual states three clinical uses of the T.O.V.A.: 1. as a screen for students suspected of having ADHD or learning problems; 2. as a diagnostic tool as part of a multi-disciplinary assessment of children and adults who may have attention deficit; and 3. as an aid in helping to determine the dosage level and to monitor the use of medication over time.

The test itself consists of repeated exposures on the computer screen of two different squares. The squares differ in that one has a ‘hole’ near the top (target figure) while the second has a ‘hole’ near the bottom. The subject is to press the button every time the square with the hole near the top is flashed on the screen. The T.O.V.A. variables include errors of omission (inattention) and commission (impulsivity), response time, standard deviations, anticipatory responses, post-commission responses, and multiple responses.
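To make these variables concrete, here is a minimal sketch of how the core scores might be tallied from a log of trials. It assumes a hypothetical trial record (whether the target was shown, and the response time in milliseconds, or None when the button was not pressed). The names Trial and score_cpt are illustrative and are not part of the T.O.V.A. software, whose actual scoring rules (for example, what counts as an anticipatory response) are defined by the publisher.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Trial:
    is_target: bool          # True if the target figure was shown
    rt_ms: Optional[float]   # response time in ms; None means no button press


def score_cpt(trials: List[Trial]) -> dict:
    """Tally basic CPT variables from a list of trials (illustrative only)."""
    targets = [t for t in trials if t.is_target]
    nontargets = [t for t in trials if not t.is_target]

    # Omission: target shown, no response (taken as inattention).
    omissions = sum(1 for t in targets if t.rt_ms is None)
    # Commission: non-target shown, response made (taken as impulsivity).
    commissions = sum(1 for t in nontargets if t.rt_ms is not None)

    # Response time statistics are computed on correct responses to targets.
    hit_rts = [t.rt_ms for t in targets if t.rt_ms is not None]
    return {
        "omission_errors": omissions,
        "commission_errors": commissions,
        "mean_rt_ms": mean(hit_rts) if hit_rts else None,
        "rt_sd_ms": pstdev(hit_rts) if len(hit_rts) > 1 else None,
    }
```

In this sketch, response time variability is the population standard deviation of correct response times; a real scoring program may define it differently.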

The Test of Variables of Attention (T.O.V.A.) computer program (version 1.3.1) was reviewed using a PowerMac 7100/66, with 16MB of memory. The manual that accompanied the software was for version 1.2. Installation of the T.O.V.A. software itself was flawless. Simply dragging the T.O.V.A. icon to the hard drive installs the program. The problem came when trying to connect the T.O.V.A. button to the modem port. It didn’t fit. The configuration of the Mac’s serial port had evidently changed from earlier models to the Power Macs, and the 8-pin plug provided for the T.O.V.A. would not fit into the serial port. Luckily a toll-free number is provided for technical support. After speaking with Andrew Greenberg, we chose to attempt the “low tech” solution of using an X-Acto knife to do away with the plastic surrounding the pins. When this didn’t work, Andrew gladly sent, and we received in 1 day, a micro processing switch that solved the problem. Technical support for computer problems and the availability of people knowledgeable about the T.O.V.A. when questions arose were excellent throughout the reviewing process.

One strange alert box appeared before the correct micro-switch arrived. The alert suggested that we might choose to use the computer mouse button instead of the micro-switch and directed the user to push the “Use mouse” button. There was, however, no button to click on! This is probably a good thing, since the test measures response time in microseconds and any inaccuracies would greatly affect the interpretation of the T.O.V.A.

What appears to be an error in the computer program was discovered when we entered the age of a child as 7 years old and the computer generated the incorrect form (#6 – Age 4-5 (IF)). Once the form had been set by the computer at #6, it could not be brought back to the correct age form (#1) without creating a new test subject set-up. Examiners not aware of this form change requirement could in fact administer the wrong test to the subject. (Andrew Greenberg reported that this error would be fixed immediately.)

Another caution must be noted. It is possible for the results to differ from the child’s actual performance. We found during one administration that when the results were sent to the printer, certain scores (omission errors) were reported when they had not occurred during the test taking. This was possibly caused by a ‘powering-down’ energy saving system in the printer hardware. To avoid this problem, examiners are cautioned to use a printer that is fully on line from start to finish, and to remain with and closely observe the actual performance of each person tested.

The manual provides normative data on 1590 subjects, at 15 different ages separated by sex. Male and female norms are reported separately because, on the average, males have faster reaction times but make significantly more errors of commission (impulsive guessing). The norms clearly show that sustained attention increases with age, levels out at adulthood, and then deteriorates slightly in older adults. The norms are not stratified and little, if any, information is provided about the makeup of these children and adults. No breakdowns by socioeconomic level, geographic region, education level, or race are provided. There is no evidence in the manual that the normative sample includes (or for that matter, excludes) special education students or children on stimulant medication. Above age 20, there are very few males in the norming tables. For ages above 19 the number of subjects in the norming sample age groups drops considerably, from an average of 168 subjects per group (age 4 to 19) to 36 subjects per group (age 20-80+). At some ages male subjects in the norm sample made no errors, hence there was no variability. Thus, actual standard scores at these points are quite artificial. (In separately published and unpublished information not sent with the test materials, the T.O.V.A. normative group appears to be created from “rolling norms”, the continual addition of people to the sample at varying stages and then recalibrating the averages. An early sample included 775 children aged 6 to 16. These children came from grades 1, 3, 5, 7 and 9 in three Minneapolis, Minnesota, suburban public schools. The children were “mainly middle to upper-middle social class and was predominantly Caucasian (99%).” A second sample of 821 children and adults was later added to the original total. These new subjects came from an early education screening project; randomly selected classes in one grade school and one high school in a rural Minnesota community; volunteer undergraduates in three Minnesota liberal arts colleges; and adults living in six adult community settings. Children in special education classes were excluded from each sampling. The total number of subjects in the norming sample is somewhat confusing. The print-outs from the T.O.V.A. state that the norming base is “of 2000 children and adults.” The manual presents data dated 7/94 that include 1590 children and adults. These are entitled “revised norms” yet are the same as those published in a paper dated 9/92. There is no mention of the remaining 400+ subjects.) Still, the norm sample is impressive for a test not published by a major company.
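The remark above about norm groups in which no errors were made can be illustrated with a short sketch. It assumes, hypothetically, that standard scores are scaled to a mean of 100 and a standard deviation of 15 and that more errors means a lower score; the function name and scaling are illustrative and are not taken from the T.O.V.A. manual.

```python
from typing import Optional


def standard_score(raw_errors: float, norm_mean: float, norm_sd: float) -> Optional[float]:
    """Convert a raw error count to a standard score (assumed mean 100, SD 15).

    Returns None when the norm group shows no variability, because the
    z-score (raw - mean) / sd is undefined when sd is zero.
    """
    if norm_sd == 0:
        return None
    z = (raw_errors - norm_mean) / norm_sd
    # More errors than the norm group means worse performance, so the sign flips.
    return 100.0 - 15.0 * z
```

When every subject in a norm group makes zero errors, the group standard deviation is zero and no meaningful standard score exists; any score a program prints for such a group must rest on some substitute value, which is why the reviewers call those scores artificial.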

The primary author of the T.O.V.A., Dr. Lawrence Greenberg, is a psychiatrist, and the concepts of reliability and validity appear to be addressed in a somewhat different fashion than is typical in our field. To its credit, many differential diagnosis studies are cited where the T.O.V.A. is used (alone, and in conjunction with the Conners Parent Teacher Questionnaire (CPTQ)) to discriminate between children with attention deficit disorder and normals (also children with behavior disorders and other diagnoses). The T.O.V.A. appears to have good sensitivity and specificity in this regard, particularly when used, as the authors recommend, in conjunction with other instruments. The T.O.V.A. was best at differentiating between attention deficit and normals. Still, some normals overlap with some attention deficit disorder children. (There are no studies to show the T.O.V.A. is able to differentiate an attentional disorder from a specific learning disability. One statement made in the promotional materials and restated on the video is that because the T.O.V.A. uses a task that is “non-language based” it can differentiate ADD from learning disorders. We are not sure that that statement is sufficient to prove the point. If this were in fact true, why were the special education students excluded from at least the original normative sampling? It might have been helpful to have tested children identified as having a specific learning disability and then compared those results to the normative group.)
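For readers less familiar with the terms, sensitivity and specificity can be computed from a simple two-by-two classification table. The sketch below uses made-up counts purely for illustration; they are not drawn from the T.O.V.A. manual or from any of the studies it cites.

```python
from typing import Tuple


def sensitivity_specificity(true_pos: int, false_neg: int,
                            true_neg: int, false_pos: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from classification counts.

    sensitivity: share of children with the disorder whom the test flags
    specificity: share of children without the disorder whom the test clears
    """
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical example: 50 diagnosed children (40 flagged, 10 missed)
# and 50 controls (45 cleared, 5 flagged in error).
sens, spec = sensitivity_specificity(true_pos=40, false_neg=10,
                                     true_neg=45, false_pos=5)
```

The overlap between groups noted above shows up in these figures as missed cases (lower sensitivity) and falsely flagged controls (lower specificity).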

One concurrent validity study with very few subjects looked at the overlap between the T.O.V.A. and the CPTQ. Unfortunately, the authors employed a canonical correlation with 23 subjects and approximately 10 variables (it was unclear exactly how many variables were utilized). This is far too few subjects for such an analysis, and the results are therefore uninterpretable.

The authors looked at test-retest data for 97 subjects across ages and found no significant differences between testings “except for commission errors which…improved during the first half of the test from first to second test but not for two subsequent tests.” (Manual, p. 2). Interestingly, the authors note that practice effects tend to be the reverse of those on other tests, in that subjects tend to do worse on it as the novelty of the stimulus wears off. Overall, the authors’ concept of reliability in the manual refers to what are basically “lie scales”; however, these scales appear very useful in telling whether the subject is merely responding at random. Psychometric reliability data would be welcome.

More validity studies would be useful, particularly in a divergent/convergent framework (e.g., does reaction time (or any of the measures) as used in the T.O.V.A. correlate with cognitive ability; do they correlate with other observable behaviors, etc.). Correlations between the different T.O.V.A. measures would also be useful. The authors state that they assume that a child 2 standard deviations below the mean on IQ would also be 2 standard deviations below the mean on the T.O.V.A. Actually, the lower the g-loading of a given T.O.V.A. measure, the more it would tend to regress (be closer to) the mean.
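The regression point can be shown with a small worked example. Under the standard linear prediction model, the expected z-score on one measure equals the correlation between the two measures times the z-score on the other. The sketch below treats the correlation between IQ and a T.O.V.A. measure as a rough stand-in for that measure’s g-loading; the correlation values are hypothetical and are not reported in the manual.

```python
def expected_z(z_iq: float, correlation: float) -> float:
    """Expected z-score on a second measure, given the IQ z-score and the
    correlation between the two measures (simple linear regression)."""
    return correlation * z_iq


# A child 2 SDs below the mean on IQ, under three hypothetical correlations.
for r in (1.0, 0.5, 0.3):
    print(f"r = {r:.1f}: expected z = {expected_z(-2.0, r):+.1f}")
```

Only a perfect correlation would put the child 2 standard deviations below the mean on the second measure; with a correlation of 0.3 the expected score is only 0.6 standard deviations below the mean.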

The authors do an excellent job of showing how stimulant therapy affects T.O.V.A. responses. The T.O.V.A. appears particularly useful for establishing a baseline prior to stimulant medication and then monitoring the effects of the medication afterwards. The T.O.V.A. measures appear very sensitive to stimulant therapy. This finding is quite impressive and certainly bolsters the validity of the T.O.V.A.

The authors take great care to point out the T.O.V.A. is not meant to diagnose attention deficit disorders, but is a good screener, and is useful as a part of a larger battery. They advocate behavioral interventions, possibly in conjunction with stimulant medication.

The T.O.V.A. was easy to load and run. The program worked effortlessly with the minor exceptions noted above. It is purposefully boring, and probably more so for the examiner who must sit patiently through the 23-minute test. Examiners may find themselves leaving the client alone while the test continues, but this seems like a bad idea since the client’s behavior during the testing may be important in the interpretation of the results.

The T.O.V.A. looks promising and would make a good tool for further research. The manuals are replete with typos, but perhaps that was a test of our vigilance. The manuals could have benefited from a historical and theoretical perspective, as well. Overall, the test would certainly benefit from more of the typical reliability and validity data, but was impressive in many areas, including differential diagnosis and sensitivity to stimulant medication. It should serve researchers well, and would be fun to use for those considering master’s thesis and doctoral dissertation work in the area.

Conners’ Continuous Performance Test (CPT)

The Conners’ Continuous Performance Test (CPT) is a computerized, 14-minute, visual performance task in which the subject must respond repeatedly to non-target figures and then inhibit responding whenever the infrequently presented target figure appears. The test is a “useful attention and learning disorder measure for children, and is sensitive to drug treatment in hyperactive children.” The manual states that the program is most useful for children between the ages of 6 and 17. Among the many variables are Number of Hits, Omissions, Commissions, and Response Time. It was created by Dr. Keith Conners and is distributed by Multi-Health Systems, Inc. as well as The Psychological Corporation. Cost for this test is $495 (this is for Version 4; current Version 5 is $595 with added features). This price includes the CPT disk, and an interpretation and installation manual. The program offers unlimited administration, scoring and interpretations of the complete “Standard” paradigm. For research purposes, the computer program offers the ability to create customized paradigms with varying letters, presentation time, trials per block, etc. It must be noted that normative data are available only for the standard paradigm. Anyone using a customized paradigm must do so with the understanding that no normative data are available for any such changes.

The “Standard” test itself requires the subject to press the appropriate mouse button or the keyboard’s spacebar for any letter except the letter X. There are 6 blocks, each with 3 sub-blocks of 20 trials (letters presented, whether target or not), for a total of 360 trials. Within each block, the sub-blocks have different stimulus intervals. The order of these intervals varies between blocks.
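The block structure described above can be sketched as a simple trial schedule. This is only an illustration of the 6 x 3 x 20 layout: the interval values (1, 2, and 4 seconds), the 10% rate of X trials, the shuffled interval order, and the function name build_schedule are all assumptions made for the example and are not taken from the Conners’ CPT manual or software.

```python
import random
from typing import Dict, List, Sequence

# Every letter except the target X (the subject responds to all of these).
NON_X_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWYZ"


def build_schedule(intervals_s: Sequence[float] = (1.0, 2.0, 4.0),
                   n_blocks: int = 6,
                   trials_per_sub_block: int = 20,
                   x_probability: float = 0.1,
                   seed: int = 0) -> List[Dict]:
    """Build an illustrative trial list: blocks -> sub-blocks -> trials."""
    rng = random.Random(seed)
    schedule = []
    for block in range(1, n_blocks + 1):
        # One sub-block per interval; the order is reshuffled for each block.
        order = list(intervals_s)
        rng.shuffle(order)
        for sub_block, interval in enumerate(order, start=1):
            for _ in range(trials_per_sub_block):
                is_x = rng.random() < x_probability
                letter = "X" if is_x else rng.choice(NON_X_LETTERS)
                schedule.append({
                    "block": block,
                    "sub_block": sub_block,
                    "interval_s": interval,
                    "letter": letter,
                    # Respond to every letter except X.
                    "should_respond": letter != "X",
                })
    return schedule
```

With the default arguments the schedule contains 6 x 3 x 20 = 360 trials, and every block uses each interval exactly once.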

The Conners’ CPT computer program was reviewed using an IBM computer as well as a Power Mac 7100/66, running Soft Windows with 16MB of memory. Although the program was easily loaded onto the Power Mac, it could not be run under the simulated DOS. A toll free technical support number is available for anyone having difficulty with the program. The first time we called we were put on hold for 30 minutes before the technical support person came on. The next two times we were connected to technical support within a minute. All questions were answered quickly and courteously. Once properly installed on the IBM, the program ran flawlessly.

The manual provides normative data on 1190 subjects, at 8 different age groupings. This sampling is further broken down into two groups: General population (n=520) and Clinical sample (n=670). Careful reading of the manual indicates that this clinical sample was reduced to 484 people after 130 subjects were removed for a cross validation study, 46 were removed for being “outliers”, and 10 more were removed because of being on medication. The 484 was finally reduced to 238 subjects comprising ADD/ADHD and comorbid cases (including ADD/ADHD as one of the diagnoses). Male and female norms are used by the computer program but are not reported separately in the manual. The “general population” and “clinical population” consisted of 51.2% and 75.4% males respectively. No breakdown by age category is offered. (In fact, no normative score data are given in the manual, with the exception of those stated above.) The norms are not stratified and little, if any, information is provided about the makeup of these children and adults. Very little information regarding socioeconomic levels, geographic regions, education levels, or race is provided. It is noted that data for the general population came from 5 states and “Southern Ontario.”

The program provides data as raw scores, T scores, percentiles, and descriptive classifications (e.g., Within Average range, Mildly atypical, etc.). Reports are available on screen, as a printout, and as an ASCII file saved to disk.

The concepts of reliability and validity were not addressed thoroughly in the manual. It appears from reading the extensive annotated bibliography that some studies may have been carried out by independent researchers. However, with only the manual to rely on, we were left with many questions regarding these issues.

The major validity issue addressed in the manual was the ability of the CPT to discriminate between children with attention deficit disorder, “normals” (includes children with behavior disorders and other diagnoses), and a comorbid group (children with dual diagnoses of ADHD and other disorders).

The CPT appeared to discriminate well, typically having the poorest mean score in the pure ADHD group, a somewhat better mean score in the comorbid group, and the “best” mean score in the “other” group, for the majority of variables. Unfortunately, the standard deviations were not listed, so the degree of overlap between groups on these variables is unknown. Another statistical technique, such as discriminant analysis, would have been nice. Also problematic would be the existence of subtypes of ADHD within the ADHD sample. Perhaps the greatest display of validity is the letter of support issued in Russell Barkley’s newsletter that states the CPT is very much in line with current theory compared to many other instruments on the market (1993, June).

The manual seemed more concerned with history and theory than reliability and validity issues. The admissions in the manual were well appreciated, including the variability in sustained attention across times with the same subject, and the idea that, like IQ, there are many reasons for poor scores.

More research is needed on the stability of the many variables this test offers. Also needed is information regarding the independence of these variables (are they highly correlated with each other? What other measures do they correlate with?). Some of the independent research listed addressed these questions, but often the short abstracts of the studies listed were far too scanty to cull such information from.

However, kudos to the publisher for compiling the reference bibliography with abstracts (the little information contained was tantalizing and should send many buyers to their respective research libraries).

To the author’s credit, an excellent job is done at showing how stimulant therapy affects CPT responses. The author also takes great care to point out throughout the manual that the CPT is not meant to diagnose attention deficit disorders by itself, and is useful as a part of a larger battery.

Intermediate Visual and Auditory Continuous Performance Test (IVA)

The Intermediate Visual and Auditory Continuous Performance Test (IVA) is a computerized, 13-minute, visual and auditory performance task in which the subject must click the mouse only when he or she sees or hears the number 1 and not click when he or she hears or sees the number 2. The test is designed to assess two major factors: response control and attention. In addition, the IVA provides “an objective measure of fine motor hyperactivity.” The manual states that the program is useful for persons between the ages of 5 and 90+. Among the many variables are six core quotients and 22 subscales. It was created by Drs. Joseph Sanford and Ann Turner and is distributed by BrainTrain. Cost for this test varies. A limited use kit (25 administrations) costs $598. This price includes the IVA disk (IBM 3.5 or IBM 5.25), and an interpretation and installation manual. Disks with an additional 25 tests may be purchased at a cost of $75. Users also have the option of purchasing an “Unlimited Use Version” for $1495.

The IVA CPT computer program requires an IBM computer with DOS 5.0 or later; 1 MB RAM/2 MB hard drive; a graphic monitor; a serial mouse (Microsoft recommended); a Creative Labs Soundblaster card; and headphones or external speakers. (A toll-free technical support number is available for anyone having questions about the program or the interpretation.) These requirements caused the most difficulty for these reviewers. In order to properly run the program, we had to find a computer that met each of the requirements, the most important being the Soundblaster card, the Microsoft mouse, and the headphones. Those well versed in IBM computers may feel right at home with this product, but these reviewers struggled for over an hour trying to get the mouse driver configured and the sound card and driver running. The installation of the program itself was not difficult. Step-by-step directions are provided in the well-written manual. One hopes that the program can be rewritten for the Macintosh, since those computers come with voice capability and speakers built in.

The program uses normative data from 781 subjects (423 female, 358 male), at 10 different age groupings. No breakdown by age category is offered in the manual. There is no evidence that the norms are stratified and little, if any, demographic information is provided about these subjects. No information regarding socioeconomic levels, geographic regions, education levels, or race is provided. It is noted that the groups were comprised only of persons “who do not report any attention, learning, neurological or psychological problems.” The normative data file, contained on the program disk, was easily read by us using a Macintosh computer. The 10 age groups averaged approximately 42 females (range 15 at age 55+ to 75 at age 7-8) and 36 males (range 17 at age 45 – 54 to 68 at age 7-8). Age groupings span 2 years (ages 5 – 10), 3 years (ages 11 – 13), 4 years (ages 14 – 17), 7 years (ages 18 – 24), and 10 years (ages 25 – 55+).

The program provides data as both Quotient scores (mean 100, SD 15) and percentiles. Graphs also are used to represent the results. The interpretation section of the manual is easy to read yet quite complex. The 6 quotient scores plus 22 subscales offer a large number of decisions and comparisons. The manual presents 17 pages of description and definition for each scale and 34 pages of interpretive suggestions. Included is a 21-step set of “procedural guidelines” for interpreting the IVA. The program offers 3 “Validity” scales used to confirm or refute the IVA results. Reports are available both on screen and as printouts. Data are stored and available on disk for retesting and comparisons.
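The relationship between a Quotient score and a percentile can be sketched as follows, assuming the scores are normally distributed with the stated mean of 100 and SD of 15. That normality assumption is ours; the IVA program may derive its percentiles from empirical tables instead, so the function below is an approximation for orientation only.

```python
import math


def quotient_to_percentile(quotient: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Approximate percentile rank for a quotient score under a normal curve."""
    z = (quotient - mean) / sd
    # Standard normal cumulative distribution, written with the error function.
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


for q in (70, 85, 100, 115):
    print(f"Quotient {q}: about percentile {quotient_to_percentile(q):.1f}")
```

Under this assumption a quotient of 100 sits at the 50th percentile, a quotient of 85 (one SD below the mean) at roughly the 16th, and a quotient of 70 (two SDs below) at roughly the 2nd.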

The packet we reviewed contained 5 unpublished studies (presented at the 1995 APA conference). These studies address normative, reliability, and validity data; differences in auditory and visual processing; and developmental age and sex differences on the IVA.

The extremely well-written manual was readable and informative. It addressed both reliability (stability?) and validity issues. It also reported (admitted) the less than stellar test/retest correlations across some variables (ranging from .37 to .75 for the composite variables). Even though the studies were only APA conference presentations, the authors have attempted to look at the important issues. An important question as yet unanswered by the materials included is the extent to which the auditory and visual variables correlate: how separate are they? Are they as highly correlated as the Wechsler verbal and performance scales? If so, are they subject to the same argument proposed by MacMann and Barnett, who suggest that the Wechsler scales are so highly correlated as to render them similar measures of the same construct (for a review, see Kaufman, 1994)?

Particularly impressive were the three validity scales, which ensure that scores in ADHD ranges come from ADHD behaviors and not motor problems, fatigue, or random answering. The manual also acknowledged the relationship between IQ and sustained attention, and suggested that subjects with IQ scores of 120 and above would be well served by comparison to the next age norm table up. The authors also acknowledged the issue of subtypes of ADHD (inattentive, impulsive, “mixed,” and other). The authors addressed the issues head on, and asked the best questions. While all three tests addressed reliability and validity to some degree, the IVA’s authors did the best job of asking the right questions. While all three tests are really in the beginning stages of compiling research data on reliability and validity, the IVA’s authors are headed in the most compelling direction.

Test takers’ point of view

Each of the CPTs was administered to one or more of these reviewers to assess ease of use from the test taker’s point of view. Since the tests varied in length from 13 to 23 minutes, plus some additional time for practice testing, we found that our attention varied as the tests extended in time. Early in the testing sessions, we found ourselves being very cautious and completely focused on the screen, but as the tests continued, it seemed more and more difficult to maintain our focus on the stimuli from the computer. The speed of some of the tests’ stimuli presentations was so rapid that we found ourselves almost afraid to blink. This led to our eyes becoming tired, and to a heightened sense of anxiety. (Because the tests are standardized, we assume that those included in the norming sample probably felt some similar feelings, and the normative scores adjust for such feelings.) Testing was done in a fairly sterile room, but we did not attempt to ‘sanitize’ the room completely. There were some materials on the walls and on shelves around the computer. We found that even these few things became very tempting distractions during the testing. Not only did we find ourselves easily distracted by these visual materials, we found ourselves being drawn to and distracted by sounds outside the testing room. Each manual provided instructions to the testers about how to create a positive testing environment, and we strongly reinforce the need to follow these instructions. Any extraneous material may have the potential to interfere with performance. Because the examiner needs to be present during each of these CPTs, examiners must ensure that they do not become a distraction themselves by unnecessarily moving about or making any noise. (This may become somewhat difficult, especially after sitting through a number of these admittedly ‘boring’ tests.)
One final caution we learned by taking the tests was the need to give the directions exactly as they are presented in the manual. For example, one of us took a test without having the instructions read verbatim from the manual, and without any emphasis placed on the direction to do the task “As fast as you can.” Interestingly, the resulting printout recommended further assessment because of the suspicion of an attention disorder.

CPT Bottom Line

Choosing between the different tests will depend on many individual factors. The three CPTs reviewed in this issue of the Communiqué each offer something unique to the examiner and examinee. The T.O.V.A. uses a design (square), the Conners’ uses letters, and the IVA uses numbers (1 and 2). The IVA is the only CPT to offer both auditory and visual procedures. If cost is a factor, the Conners’ is the least expensive while the IVA is the most expensive. If the computer system is an issue, the T.O.V.A. is the only CPT that runs on a Macintosh, while all three offer versions for IBM based machines. The T.O.V.A. and the Conners’ require the least amount of “extra” hardware. The normative sample was largest for the T.O.V.A., although none of the tests provided enough demographic information about the subjects to make informed judgments about the suitability of the data. If time is a factor, the Conners’ and the IVA were the shortest tests. Ease of use was comparable for each of the tests. Support for the programs by way of toll-free telephone numbers is provided by each system.

Our experience using these three programs was generally very positive. We stress however that the programs are simply one tool to be used in a multi-dimensional assessment. Each test product included clear warnings about not basing diagnosis on the single instrument or result. We whole-heartedly endorse this caution, especially given the vast differences between computers, computer systems, and the few ‘kinks’ we discovered during our limited reviews.

The editors of the Communiqué would like to thank each of the three companies for providing the programs for review.

Content on these pages is copyrighted  by Dumont/Willis © (2001) unless otherwise noted. The original website can be found at