It has been suggested that the 'ideal' measure of the reliability of an examination is obtained by test and retest, using the same examination on the same group of students. However, for practical and theoretical reasons, most reported reliabilities for multiple choice examinations in medicine are actually measures of internal consistency. While attempting to minimize the effects of potential interfering factors, we undertook a study of true test-retest reliability of multiple true-false type multiple choice questions in preclinical medical subjects. From three end-of-term examinations, 363 items (106 of 449 from term 1, 150 of 499 from term 2, and 107 of 492 from term 3) were repeated in the final examination (999 items in total). Between test and retest, there was little overall decrease in the percentage of items answered correctly, and a decrease of only 3.4 in the percentage score after correction for guessing. However, there was an inverse relation between test-retest interval and decrease in performance. Between test and retest, performance decreased significantly on 33 items and increased significantly on 11 items. Test-retest correlation coefficients were 0.70 to 0.78 for items from the separate terms and 0.885 for all retested items. Thus, overall, these items had a very high degree of reliability, approximately the 0.9 that has been specified as the requirement for distinguishing between individuals.
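The two quantities reported above, a percentage score corrected for guessing and a test-retest correlation coefficient, can be illustrated with a minimal sketch. The abstract does not state which correction formula or correlation statistic was used; the sketch assumes the common right-minus-wrong correction for true-false items and a Pearson product-moment correlation. The function names and the student scores are hypothetical and are not data from the study.

```python
from math import sqrt


def corrected_percentage(n_right, n_wrong, n_items):
    """Percentage score after a right-minus-wrong correction for guessing.

    Assumption: each true-false item scores +1 if right, -1 if wrong,
    and 0 if omitted. The abstract does not confirm this formula.
    """
    return 100.0 * (n_right - n_wrong) / n_items


def pearson_r(xs, ys):
    """Pearson correlation between paired test and retest scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical corrected percentage scores for five students.
test_scores = [60.0, 72.0, 55.0, 80.0, 68.0]
retest_scores = [58.0, 70.0, 50.0, 79.0, 66.0]

mean_drop = sum(t - r for t, r in zip(test_scores, retest_scores)) / len(test_scores)
print("mean decrease in corrected score:", mean_drop)
print("test-retest correlation:", pearson_r(test_scores, retest_scores))
```

Under these assumptions, a student with 80 items right and 20 wrong out of 100 would receive a corrected score of 60 percent, and the correlation is computed across students on their paired test and retest scores.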