Effective Testing: Validity, Reliability, and Practicality

The Ideal Test: Validity, Reliability, and Practicality

1. Validity

A test is valid when it measures precisely the abilities in which we are interested. This involves choosing the appropriate content and techniques.

2. Reliability

A test is reliable when it measures these abilities consistently. This implies, for example, that the same score will be obtained whether the test is taken on one particular day or on the next. Sources of unreliability may be found in some features of the test itself or in the ways of scoring it.

3. Practicality

A test should be appropriate for the resources that are at hand. This means that it should be easy and cheap to construct, administer, score, and interpret.

Why Test?

Testing is important because:

  • It provides a measure of student performance for the purposes of comparison and selection.
  • It provides a measure of teaching, making it possible to adjust instruction so that particular groups or individual students in the class benefit more.
  • It helps teachers evaluate the effectiveness of the syllabus as well as the methods and materials they are using.

What is the Purpose of Testing?

Tests are used:

  • To measure students’ proficiency.
  • To discover how far students have achieved the objectives of a course of study.
  • To diagnose students’ strengths and weaknesses, to identify what they know and what they do not.
  • To assist placement of students by identifying the stage or part of a teaching program most appropriate to their ability.

Teaching and Testing

The effect of testing on teaching and learning is known as backwash (or washback). It can be:

  • Beneficial: tests focus on purposeful, everyday communication activities and serve as devices to reinforce learning and motivate students.
  • Harmful: test content and testing techniques are at variance with the objectives of the course.

Elements in Test Construction

1. Test Specifications

The test specification is a detailed document, often for internal purposes only and sometimes confidential to the examining body. The syllabus, by contrast, is a public document, often much simplified, which indicates to test users what the test will contain.

2. Item Writing and Moderation

Item writers have to begin their task with the test’s specifications. This may be an obvious point, but it is surprising how many writers begin item writing by looking at past papers rather than at the specifications, probably because many tests lack proper specifications.

Item types also raise the question of the method effect: the testing technique itself (multiple-choice or short-answer questions, cloze or C-tests) influences the scores obtained, independently of the ability being measured.

3. Pretesting and Analysis

Trials: However well designed an examination may be, and however carefully it has been edited, it is not possible to know how it will work until it has been tried out on students.

Test Analysis:

  • Correlations: between different versions of the test and between different skills.
  • Item analysis: facility value and discrimination index (a minimal sketch of both follows this list).
  • Reliability indexes: test–retest, parallel versions, and split-half.
  • Training of examiners: holistic vs. analytic marking; Rasch analysis.
  • Validation: rational validity through a representative sample of content (Thorndike & Hagen 1986); empirical validity, i.e. predictive and concurrent validity (Hughes 1989); and construct validity, i.e. what the test scores actually mean and whether the test tests what its specifications claim.
  • Post-test reports: for examiners, for candidates, and for teachers who prepare students for the test.
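To make the item-analysis terms above concrete, here is a minimal Python sketch of classical item analysis. The response matrix and all names are hypothetical, not from the source: facility value is the proportion of candidates answering an item correctly, and the discrimination index compares the top and bottom scorers.

```python
# Minimal classical item-analysis sketch. The response matrix is
# invented: rows are candidates, columns are items, 1 = correct.

def facility_value(item_scores):
    """Proportion of candidates who answered the item correctly."""
    return sum(item_scores) / len(item_scores)

def discrimination_index(item_scores, totals, fraction=0.27):
    """Facility in the top group minus facility in the bottom group.

    Candidates are ranked by total score; the top and bottom
    `fraction` (commonly 27%) form the two comparison groups.
    """
    n = max(1, round(len(totals) * fraction))
    ranked = sorted(range(len(totals)), key=lambda i: totals[i], reverse=True)
    upper = [item_scores[i] for i in ranked[:n]]
    lower = [item_scores[i] for i in ranked[-n:]]
    return facility_value(upper) - facility_value(lower)

responses = [          # 6 candidates x 4 items, invented data
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
totals = [sum(row) for row in responses]

for j in range(len(responses[0])):
    item = [row[j] for row in responses]
    print(f"item {j + 1}: facility={facility_value(item):.2f}, "
          f"discrimination={discrimination_index(item, totals):.2f}")
```

An item with a facility value near 0 or 1 tells us little about the candidates, and a low or negative discrimination index flags an item that weaker candidates answer as well as, or better than, stronger ones.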

Validity

A test is valid when it measures what it is supposed to measure. For a language test, that means the ability to communicate effectively in the language.

Reliability

Reliability has to do with the idea that if the same exam is repeated several times, it always yields the same scores. An exam has to be both valid and reliable, although a reliable exam is not necessarily a valid one.

Content Validity

Content validity means that the test measures everything it is supposed to measure. Since an exam cannot include everything that occurs in real situations, it should sample the most frequent and relevant content; those frequent, relevant questions are the core of a good exam. An exam should let students demonstrate what they know, not expose what they do not.

Criterion-Related Validity

Criterion-related validity covers concurrent validity and predictive validity. It is an empirical measure: we compare the results of two different exams in order to show that our exam is as good and as valid as an established one.

Concurrent Validity

The criterion exam is taken concurrently (at the same time) with the test being validated. For example, to validate a short exam against a longer, fuller one, a randomly chosen group of candidates takes both, and the two sets of scores are then compared (see the sketch below).
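Here is a minimal sketch of how such a concurrent-validity check could be computed, assuming hypothetical scores for the same candidates on the short form and on the established full-length exam (all names and numbers are invented):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical scores for the same six candidates on both exams.
short_exam = [12, 15, 9, 18, 14, 11]
long_exam = [48, 60, 35, 70, 55, 42]

print(f"validity coefficient: {pearson(short_exam, long_exam):.2f}")
```

The same computation serves predictive validity; the only difference is that the criterion scores (for example, end-of-course results for a placement test) are collected later rather than at the same time.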

Predictive Validity

Predictive validity is the ability of a test to predict students’ future performance. A placement exam, for instance, is validated by checking how well its scores forecast students’ later achievement in the course.

What Is Evaluation?

Assessment: the general, broader term. Testing is one part of assessment; through it we learn what our students have learnt.

Evaluation: an evaluation of the process as a whole, not of the learners alone; it works as a kind of quality control for the course or programme.

Achievement and proficiency tests relate to formative and summative evaluation: formative evaluation tracks progress, while summative evaluation comes at the end. It is important to know what our students know and what their progress is, and to help them learn.

When to Evaluate

  • Initial Evaluation: establishes the starting point for the progress of learning; it helps the teacher know where the students stand and motivates them.
  • Summative Evaluation: the final evaluation, measuring what students really know at the end of the course.