Essential Measurement and Assessment Terms
Measurement and Assessment Glossary
Basic Concepts
Absolute-score interpretation: a form of interpreting raw scores by comparing them to a defined criterion; a type of interpretation of raw scores used with a criterion-referenced test.
Assessment: the systematic process of obtaining information from tests, observations, interviews, and checklists.
Battery: a set of tests, typically administered as a unit, and used to obtain a more complete picture of the individual being tested.
Concept: an intangible idea.
Construct: a concept or characteristic that a test is designed to measure.
Criterion: a standard that allows someone to make a decision or judgment.
Criterion-referenced test: a test in which the raw score of the test taker is interpreted in the context of a pre-established criterion; uses absolute score interpretation of raw scores.
Empirical: based on knowledge obtained through experience or observation.
Empirical referent: the observable behaviors specified to be signs of the construct being measured.
Examinee: the individual who is given the test; also called the test taker or testee.
Inference: reasoning from something known to something unknown.
Inventory: a type of test; a questionnaire or checklist used to sample the test taker’s behavior in a specific domain; used to elicit information on test takers’ opinions, characteristics, attitudes, interests, or responses to situations.
Item: a specific question to be answered, a task to be performed, or an observation that is recorded for interpretation in a test.
Measurement: the act of assigning a number to a variable following a set of rules.
Measure: a device or technique in which a sample of the test taker’s behavior in a specific domain is obtained; also called a test or scale.
Non-standardized test: an informal test that does not have standard procedures for constructing, administering, scoring, and interpreting results.
Norm group: a representative sample of individuals from the population for which the test is intended.
Norm-referenced test: a test that is interpreted by comparing answers from the test taker to those of other people from the norm group; uses relative score interpretation of raw scores.
Operational definition: specifies and describes the empirical referents for a conceptual variable; it identifies the observable behaviors we will measure.
Psychometrician: a person formally trained in psychometrics.
Psychometrics: the science of test construction and evaluation.
Qualitative scale: a scale in which, following an established set of rules, an individual is placed into an appropriate category based on the level of the variable being measured.
Quantitative scale: a scale in which, following an established set of rules, a number is assigned to each level of what is being measured.
Raw score: a score based on the test taker’s response to each item on a test prior to interpretation.
Relative-score interpretation: the raw score is compared to the performance of others from a defined population; a type of interpretation of raw scores used with a norm-referenced test.
Scale: a device or technique in which a sample of the test taker’s behavior in a specific domain is obtained; also called a test or measure.
Sign: a type of item that asks if something visible is present or not.
Skill: complex behavior implemented by an individual.
Standardized test: a type of test that has a specified method of constructing, administering, scoring, and interpreting the results of a test.
State: the current, temporary condition of an individual; unlike a trait, a state can change across situations and over time.
Subject variable: a characteristic or attribute of a person that we want to measure.
Test administrator: a person who administers a test; also called the tester; the test user and test administrator may be the same individual.
Test taker: the individual who is given the test; also called the testee or examinee.
Test user: the person or organization who is responsible for the selection, administration, and interpretation of a test.
Testee: the individual who is given the test; also called the test taker or examinee.
Tester: a person who administers a test; also called the test administrator; the test user and test administrator may be the same individual.
Test: a device or technique in which a sample of the test taker’s behavior in a specific domain is obtained; also called a scale or measure.
Trait: a characteristic that an individual possesses that is fairly stable across many situations and over time.
Variable: any attribute, characteristic, or condition that can take on different values at different times or in different individuals.
Types and Applications of Tests
Historical Context
Chronological age (CA): a person’s age in years.
Craniometry: an attempt to measure intelligence by measuring the size of the skull.
Empirical keying: the meaning of a test taker’s answer to an item is determined by comparing it to the answers of the standardization group.
Group test: a test that may be administered to all the members of a group at one time.
Individual test: a test that can only be administered by one test administrator to a single test taker at a time.
Intelligence quotient (IQ): historically, the mental age (MA) divided by the chronological age (CA) and multiplied by 100. The IQ is no longer calculated this way.
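As a worked illustration of the historical ratio-IQ formula above (the ages used here are invented for the example, and modern tests no longer compute IQ this way):

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: (MA / CA) * 100."""
    return (mental_age / chronological_age) * 100

# A child with a mental age of 10 and a chronological age of 8
# would have had a ratio IQ of (10 / 8) * 100 = 125.
print(ratio_iq(10, 8))   # 125.0

# When MA equals CA, the ratio IQ is exactly 100.
print(ratio_iq(8, 8))    # 100.0
```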
Mental age (MA): the typical age at which people in the standardization group can answer an item.
Mental level: Binet's original term for an item's difficulty, based on the typical age at which people in a standardization group could answer the item. Binet chose the term to emphasize that intelligence is not a fixed entity; it eventually became known as mental age.
Non-structured personality test: a personality test in which the examinee is presented with a test item and allowed to respond in a word, words, or sentence of his or her choice.
Performance test: a test that minimizes the use of language and emphasizes the examinee’s performance or behavior on test items.
Structured personality test: a test that presents the examinee with a set of items, each requiring a defined answer such as “yes” or “no.”
Categorization by Purpose
Ability test: evaluates the current level of an individual in a specified domain. Ability tests can include tests of cognition, motor skills, or physical functioning (AERA).
Achievement test: evaluates knowledge or skill that has been learned by the individual being tested and tends to be specific to the content being tested (AERA).
Adaptive test: a test that can take many forms; what all adaptive tests have in common is that the items an individual sees depend on how the individual performed on the prior item or items.
Aptitude test: a type of ability test that measures a person's natural potential, that is, what an individual is capable of doing.
Diagnostic-and-treatment test: designed to determine if the test taker is exhibiting behavior that would result in being diagnosed with a particular condition. Information from diagnostic-and-treatment tests is also used to prescribe a course of treatment and can be used to evaluate if the treatment plan is effective.
Group test: administered simultaneously to many test takers, who record their own responses.
Individual test: administered to one test taker at a time while the examiner codes the responses.
Intelligence test: a test designed to measure cognitive function.
Interest, value, or attitude test: a test designed to measure enduring beliefs, desires, and affective, behavioral, or cognitive predispositions to respond to situations, people, or objects.
Neurological assessment: the process of using tests and other neurological tools to evaluate the functioning of the central nervous system in order to assess psychological or behavioral function and dysfunction.
Neuropsychological tests: tests designed to measure behavior that is associated with functioning or malfunctioning components of the central nervous system.
Personality test (personality inventory): measures one or more characteristics associated with psychological attributes or traits or interpersonal skills of an individual. Like intelligence tests, personality inventories are typically based on a single theory of what constitutes personality.
Program-evaluation test: created to be part of an assessment plan for the purpose of evaluating some component of a program.
Research test: designed for the purpose of evaluating a variable of interest in the scientific study of psychology, education, political science, business, or other disciplines.
Screening tests: tests used to make broad categorizations of test takers as a first step toward a selection decision or diagnosis. A screening test alone cannot support selection or diagnosis; it merely indicates that an individual qualifies for further evaluation with a more thorough, and often more invasive, instrument. Screening tests therefore tend to be shorter, less expensive, and easier to administer than diagnostic tests.
Selection-and-placement tests: designed to select an individual for a particular task or place an individual into a particular category based on his or her performance on the test; also called classification tests, as they can be used to classify an individual based on his or her performance on a test.
Self-interest test: designed for the purpose of providing self-understanding about interests, preferences, and careers to the test taker.
Test administration: the format of how a test is provided to the test taker; tests can be administered to either a group of people or to a single individual.
Test application: how a test is to be used; can be clinical or nonclinical.
Test domain: the knowledge, skills, processes, attitudes, values, emotions, or behaviors that are being measured in a test. It is common to separate test domains into two categories, human ability and personality, which can be further divided into additional subcategories.
Test purpose: the intent for which a test was created to evaluate individuals; the intent for which test results can be interpreted.
Test use: tests can be used to screen or evaluate an individual.
Vocational testing: testing designed to generate hypotheses and inferences about an individual's work needs, work-related values, interests, career-development goals, and vocational maturity and, when appropriate, to help resolve an individual's indecision about his or her vocation.
Test Interpretation and Scores
Absolute-score interpretation: the comparison of raw scores to a defined criterion.
Achievement levels: descriptions of a test taker’s competency in a particular area of knowledge or skill, usually defined as ordered categories on a continuum. Also called proficiency levels.
Age equivalencies: normative results that provide the performance level of the test taker based on the typical age of the similarly performing individuals from the norm group.
Content domain: the domain of knowledge and skills needed as specified by an established criterion.
Credentialing purposes: granting to a person, by some authority, a credential such as a certificate, license, or diploma, signifying an acceptable level of performance in some domain of knowledge or activity.
Criterion: a standard that allows someone to make a decision or judgment in evaluating a test taker’s responses to the items of a test.
Criterion-referenced test: a test that provides measurements directly related to performance on specific behaviors or task performance.
Cut score: a specified point on a score scale such that scores at or above that point are interpreted or acted upon differently from scores that are below that point.
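A minimal sketch of how a cut score partitions a score scale; the cut score of 70 and the "pass"/"fail" labels are invented for illustration:

```python
# Hypothetical cut score: scores at or above this point are
# interpreted differently from scores below it.
CUT_SCORE = 70

def classify(score: int) -> str:
    # Note the ">=": a score exactly at the cut score falls in
    # the upper category.
    return "pass" if score >= CUT_SCORE else "fail"

print([classify(s) for s in [65, 70, 85]])  # ['fail', 'pass', 'pass']
```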
Domain: the set of behaviors, knowledge, skills, or abilities that we are attempting to measure.
Expectancy table: a table that presents the behaviors, knowledge, or skills expected of a test taker based on the score he or she received on the test; it specifies the set of behaviors or symptoms expected to be present for a specified diagnosis or classification.
Grade equivalencies: normative results that provide the performance level of the test taker based on the typical grade level of the similarly performing individuals from the norm group.
Local norms: norms used to provide information about how a test taker has done in comparison to a reference group composed solely of other test takers from the surrounding regions.
Measurement error: error that occurs when a test measures something beyond the construct it is intended to measure.
Nonrandom error: systematic error that consistently adds to or subtracts from the true score, biasing the observed score in one direction.
Norms: the values used to compare the performance of an individual to the specified reference population.
Norm group: a sample of individuals similar to the individuals who will be taking the test.
Norm-referenced tests: tests that are interpreted by comparing answers to those of other people.
Percentile rank of a score: indicates the percentage of scores in the distribution that are equal to or less than that score.
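The percentile-rank definition above can be computed directly; the score distribution below is invented sample data for illustration:

```python
def percentile_rank(score, distribution):
    # Percentage of scores in the distribution that are
    # equal to or less than the given score.
    at_or_below = sum(1 for s in distribution if s <= score)
    return 100 * at_or_below / len(distribution)

scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]
# 5 of the 10 scores are at or below 70.
print(percentile_rank(70, scores))  # 50.0
```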
Performance standards: specify what is acceptable knowledge or performance of the content domain.
Proficiency levels: descriptions of a test taker’s competency in a particular area of knowledge or skill, usually defined as ordered categories on a continuum. Also called achievement levels.
Raw score: score on a content-referenced test, that is, the score based on the test taker’s response to each item.
Random error: unsystematic error in a measurement; it is as likely to increase the observed score as to decrease it.
Reference population: another name for the norm group. It is a representative sample selected for a specific test for the purpose of providing normative information, that is, information on how an individual test taker compares to the overall reference population.
Relative-score interpretation: interpreting the raw score by comparing it to the scores of the norm group.
Reliable: another name for consistency when speaking about tests or test items. Thus, a reliable test is a test that will consistently provide the same score for the same person over multiple measurements. A reliable test item is one that the test taker will respond to in the same manner over multiple attempts.
Standard score: a transformation of a raw test score that indicates how far the score falls from the mean of the distribution of test scores. Standard scores differ based on the established mean and standard deviation: new standard score = (new standard deviation)(z) + new mean.
Systematic error: consistent error that can mask the actual measurement of the construct of interest.
T score: a type of standard score where the mean is 50 and the standard deviation is 10. T = (10)z + 50.
Table of specifications: specifies the number of items on a test per component as identified by the task analysis.
Task analysis: the process of defining the content domain, breaking it into component parts, then defining each part in a measurable manner.
Unsystematic error: random error that is measured along with the construct of interest in any measurement.
Z-score: a transformation of an observed score that identifies its location on the standard normal distribution; z = (observed score − mean) / standard deviation. A z-score distribution has a mean of 0 and a standard deviation of 1.
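The z-score, T-score, and standard-score definitions above chain together; a minimal sketch, with the raw-score distribution (mean 100, standard deviation 15) invented for illustration:

```python
def z_score(x, mean, sd):
    # z = (observed score - mean) / standard deviation
    return (x - mean) / sd

def standard_score(z, new_mean, new_sd):
    # New standard score = (new standard deviation)(z) + new mean
    return new_sd * z + new_mean

# A raw score of 115 on a test with mean 100 and SD 15 sits
# exactly one standard deviation above the mean.
z = z_score(115, mean=100, sd=15)   # 1.0
# The T scale uses mean 50 and SD 10: T = (10)z + 50.
t = standard_score(z, 50, 10)       # 60.0
print(z, t)
```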