Evaluating the Quality of Tests


This is a course on tests and measurement. So where does measurement fit, and why have we not discussed it more? The answers to these two questions are “Everywhere” and “We have.” That might sound evasive, but consider this: measurement is the quantification of the constructs of interest. Assigning a number to the results of a psychological test is an example of measurement. It allows us to compare individual and group differences. Psychological measurement requires a basic understanding of the statistics used in the development, selection, use, scoring, and interpretation of tests. These include validity, reliability, errors of measurement, factor analysis, and others. They have many uses, including describing the relationship between standardization samples.


Some categories of tests are more difficult to measure than others because not everything we want to know can be easily identified. For example, it is relatively easy to develop a test to measure what a person learned in a course. A valid test will have qualities similar to the course content; the two can be compared, and they are tangible. In comparison, IQ measures what we define as intelligence. Since we cannot easily identify intelligence by looking at it, the results of any intelligence test can create the following circular reasoning: an IQ is what an IQ test measures; it measures IQ. Of course, we have quantified concepts such as intelligence and personality based on their theoretical foundations. Choosing the most appropriate test starts with a background such as the one you are receiving in this course. It also requires knowing what to look for and where to find it.


What to Look for in a Test


Whether we construct, administer, or otherwise use tests, we have the responsibility to do it ethically. That means having the competence for whatever we do. The same thing is true for psychological tests. They must also pass tests to show their competence. They must undergo rigorous evaluations of their validity and reliability, and they do not even get to study! The quality of a test is based on many factors. For example, we know a test must be valid and reliable. However, we must start by asking two key questions: 1) What information do I need to know? and 2) What is the publisher’s stated purpose for the test? The two should match.


One example is the Myers-Briggs Type Indicator (MBTI). Based on Carl Jung’s theory of psychological types, the basic purpose of the test is to determine where a person is on the introversion-extroversion continuum. Some learners who choose the MBTI for their course project want to show it is a good predictor of job success. When their literature review does not show this, they state the MBTI is not valid or reliable. The publisher even states that the test has not been shown to be a predictor of job success, and using it for that purpose is not recommended. This is an example of finding fault with a test for not doing what it was never intended to do. The importance of matching a test to its stated purpose cannot be overstated.


There are many sources that will give you the information you need to evaluate the quality of a test, but the details and accuracy can vary. It is important to know what data you need for your purposes. For example, the publisher of a test can tell you that a test is valid and reliable but might not show you the data. (Keep in mind that publishers have a vested interest in selling you their test and may not want to provide any information that you might use to eliminate their test as your choice.)


The test manual is a good place to start. It can contain most of the information needed to understand the basic properties mentioned earlier. However, to know the current state of its use or additional applications, more in-depth information is needed. In addition to the current professional literature, reading the most current test review will give you an objective report that covers much of the basic information you need. The Mental Measurements Yearbook is widely used and can be found in the Capella Library. There are many criteria used in the development and administration of a psychological test. Besides the test’s purpose (constructs measured), the minimum information you will want to know is the type of test, the psychometric properties, and the standardization sample.


Most tests covered in this course are standardized, but non-standardized tests have their own purpose. Non-standardized tests can have different questions and time limits, and can even be evaluated differently depending on the person, group, or condition. They are often seen in school settings where instruction is individualized, and their results cannot be compared or generalized. Standardized tests are more reliable and are used for different reasons. Other considerations are whether the test is given individually or to a group, and the category of test (intelligence, achievement, or personality).


Psychometric properties are statistical indicators of the strengths and weaknesses of a test. Two of the most common are validity and reliability. These data can often be found in the test manual. Validity is especially important, because the data must show validity for all of the intended uses.
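To make the idea of a reliability statistic concrete, here is a minimal sketch of one common internal-consistency estimate, Cronbach's alpha, written in Python. It is offered only as an illustration: the function name cronbach_alpha and the response data are hypothetical, and a test manual may report other reliability estimates (such as test-retest or split-half) instead.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Estimate Cronbach's alpha.

    `scores` is a list of respondents, each a list of item scores
    (all respondents must answer the same number of items).
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item (column) across respondents.
    item_variances = [
        variance([row[i] for row in scores]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 suggest the items vary together and are likely measuring the same construct. What counts as acceptable depends on the purpose of the test, so the figure should be read alongside the evidence reported in the test manual.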


Knowing the standardization sample can often help avoid any real or perceived issues of test bias. The person or group being tested should have characteristics similar to those of the standardization sample. Test manuals should have this information.
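One reason the standardization sample matters is that a standardized score is interpreted relative to that sample's mean and standard deviation. The sketch below illustrates this in Python under stated assumptions: the function name standard_score, the raw scores, and the choice of a scale with a mean of 100 and a standard deviation of 15 are all hypothetical, and published tests derive their norms from far larger samples.

```python
from statistics import mean, stdev


def standard_score(raw, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Convert a raw score to a standard score relative to a norm group."""
    # z expresses how many standard deviations the raw score lies
    # above or below the norm group's mean.
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical raw scores from a (very small) standardization sample.
norm_sample = [38, 42, 45, 47, 50, 50, 53, 55, 58, 62]
norm_mean = mean(norm_sample)
norm_sd = stdev(norm_sample)

# The same raw score of 57 is interpreted against this sample.
print(round(standard_score(57, norm_mean, norm_sd), 1))
```

If the person being tested differs in important ways from the people in the norm sample, the resulting standard score compares that person to the wrong reference group, which is one way test bias can arise.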


The Ethical Use of Tests


At the end of your first course in tests and measurement, you are on your way to being able to administer, score, and interpret psychological tests. That means you will have the background to think and read more critically about tests and their use. It means you can better understand journal articles that report test results. Sharing and discussing the meaning of a person’s test results can have a great impact on his or her life. Remember, there are many different types of tests that are used in making decisions and diagnoses.


An understanding of what tests are and what they can do is necessary; that is what you are learning now. Further education in tests and measurement, along with supervised experience administering tests, is required to become competent in their use. Test publishers rate their tests as to the minimal level of education and experience required. Remember, all test administrators began by taking their first course, just like this one.




To successfully complete this learning unit, you will be expected to:


Apply psychometric features related to the construction of psychological testing.

Apply standardization, normative data, and sources of error to psychological tests and measurements.

Apply properties and techniques used in psychological tests and measurements.



Read the following chapters in your Standards for Educational and Psychological Testing text:



Chapter 1, “Validity.”

Chapter 2, “Reliability and Errors of Measurement.”

Chapter 5, “Test Administration, Scoring, and Reporting.”


Chapter 6, “Supporting Documentation for Tests.”

Chapter 11, “The Responsibilities of Test Users.”
