The IGNORANT, who think they know everything, deprive themselves of one of life's greatest pleasures: LEARNING.

How Stable and Consistent Is Your Instrument?

 

 
A Brief Look at Reliability
 
This web page was designed to provide you with basic information on an important characteristic of a good measurement instrument: reliability. Prior to starting any research project, it is important to determine how you are going to measure a particular phenomenon. This process of measurement is important because it allows you to know whether you are on the right track and whether you are measuring what you intend to measure. Both reliability and validity are essential for good measurement, because they are your first line of defense against forming inaccurate conclusions (i.e., incorrectly accepting or rejecting your research hypotheses). This tutorial addresses only general issues of reliability.
 
What is Reliability?
I am sure you are familiar with terms such as consistency, predictability, dependability, stability, and repeatability. Well, these are the terms that come to mind when we talk about reliability. Broadly defined, the reliability of a measurement refers to the consistency or repeatability of the measurement of some phenomenon. If a measurement instrument is reliable, it can measure the same thing more than once, or using more than one method, and yield the same result. When we speak of reliability, we are not speaking of individuals; we are talking about scores.
The observed score is one of the major components of reliability. The observed score is just that: the score you would observe in a research setting. The observed score is composed of a true score and an error score. The true score is a theoretical concept. Why is it theoretical? Because there is no way to really know what the true score is (unless you're God). The true score reflects the true value of a variable. The error score is the reason why the observed score differs from the true score. The error score is further broken down into method (or systematic) error and trait (or random) error. Method error refers to anything that causes a difference between the observed score and the true score due to the testing situation. For example, any type of disruption (loud music, talking, traffic) that occurs while students are taking a test may distract them and affect their scores on the test. Trait error, on the other hand, is caused by any factor related to the characteristics of the person taking the test that may randomly affect measurement. An example of trait error at work is when individuals are tired, hungry, or unmotivated. These characteristics can affect their performance on a test, making the scores worse than they would be if the individuals were alert, well-fed, or motivated.
Reliability can be viewed as the ratio of the true score to the sum of the true score and the error score, or:
$$\text{reliability} = \frac{\text{true score}}{\text{true score} + \text{error score}}$$
 
Okay, now that you know what reliability is and what its components are, you're probably wondering how to achieve reliability. Simply put, the degree of reliability can be increased by decreasing the error score. So, if you want a reliable instrument, you must decrease the error.
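The effect of shrinking the error score can be illustrated with a small simulation. This is only a sketch under stated assumptions: it reads the ratio in terms of score variances (as classical test theory does), it assumes normally distributed true scores and purely random error, and the function name `simulated_reliability` and all of its parameters are made up for illustration.

```python
import random
import statistics

def simulated_reliability(error_sd, n=5000, true_sd=10.0, seed=0):
    """Return var(true) / var(observed) for simulated test scores.

    Assumption: observed score = true score + random error, with both
    drawn from normal distributions (hypothetical values, not real data).
    """
    rng = random.Random(seed)
    # In real research the true scores are unknowable; a simulation can
    # generate them, which is what makes the ratio computable here.
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

low_error = simulated_reliability(error_sd=2.0)    # small error score
high_error = simulated_reliability(error_sd=10.0)  # large error score
print(round(low_error, 2), round(high_error, 2))
```

With the small error the ratio comes out close to 1, and with the large error it drops to roughly one half, which is the sense in which decreasing the error increases reliability.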
As previously stated, you can never know the actual true score of a measurement. Therefore, it is important to note that reliability cannot be calculated directly; it can only be estimated. The usual way to estimate reliability is to measure the degree of correlation between the different forms of a measurement. The higher the correlation, the higher the estimated reliability.
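As a rough sketch of what "measuring the degree of correlation" can look like in practice, the snippet below computes a Pearson correlation coefficient between two sets of scores. The choice of Pearson's r, the helper name `pearson_r`, and the scores themselves are assumptions for illustration; they are not data from any real study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same six people on two forms of a test.
form_a = [12, 15, 11, 18, 14, 16]
form_b = [13, 14, 10, 19, 15, 17]
r = pearson_r(form_a, form_b)
print(round(r, 2))
```

A value of r near 1 would suggest that the two forms rank people in nearly the same way; a value near 0 would suggest that the scores are dominated by error.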
 
3 Aspects of Reliability
Before going on to the types of reliability, I must briefly review 3 major aspects of reliability: equivalence, stability, and homogeneity. Equivalence refers to the degree of agreement between 2 or more measures administered at nearly the same time; employing two raters is one way to assess it. Stability refers to obtaining the same or similar results when a measurement is repeated over time. To assess stability, a distinction must be made between the repeatability of the measurement and that of the phenomenon being measured. Lastly, homogeneity deals with assessing how well the different items in a measure seem to reflect the attribute one is trying to measure. The emphasis here is on internal relationships, or internal consistency.
 
Types of Reliability
Now back to the different types of reliability. The first type of reliability is parallel forms reliability. This is a measure of equivalence, and it involves administering two different forms to the same group of people and obtaining a correlation between the two forms. The higher the correlation between the two forms, the more equivalent the forms.
The second type of reliability, test-retest reliability, is a measure of stability that examines reliability over time. The easiest way to measure stability is to administer the same test at two different points in time (to the same group of people, of course) and obtain a correlation between the two sets of scores. The problem with test-retest reliability is deciding how long to wait between testings. The longer you wait, the lower your estimate of reliability tends to be, because the phenomenon itself may change in the meantime.
Finally, the third type of reliability is inter-rater reliability, a measure of equivalence between raters. With inter-rater reliability, two people rate a behavior, object, or phenomenon, and you determine the amount of agreement between them. To determine inter-rater reliability, you take the number of agreements and divide it by the total number of observations.
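The agreement calculation described above can be sketched in a few lines. The function name `percent_agreement` and the ratings are hypothetical, chosen only to show the arithmetic.

```python
def percent_agreement(rater_a, rater_b):
    """Number of agreements divided by the total number of observations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of observations")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return agreements / len(rater_a)

# Hypothetical ratings of the same five observations by two raters.
rater_1 = ["on-task", "off-task", "on-task", "on-task", "off-task"]
rater_2 = ["on-task", "off-task", "off-task", "on-task", "off-task"]

# The raters agree on 4 of the 5 observations.
agreement = percent_agreement(rater_1, rater_2)
print(agreement)
```

Here the raters disagree only on the third observation, so the result is 4 divided by 5, or 0.8.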
 
The Relationship Between Reliability and Validity
The relationship between reliability and validity is a simple one to understand: a measurement can be reliable but not valid. However, a measurement must first be reliable before it can be valid. Thus reliability is a necessary, but not sufficient, condition of validity. In other words, a measurement may consistently assess a phenomenon (or outcome), but unless that measurement tests what you want it to, it is not valid.
Remember: When designing a research project, it is important that your measurements are both reliable and valid. If they aren't, then your instruments are basically useless and you decrease your chances of accurately measuring what you intended to measure.