Constancy and Repeatability in Hip Evaluations
A magazine, newsletter, and website article by Fred Lanting
July 2000
Some questions have been raised about how long the PennHIP reading, and the two-year-old OFA (or one-year-old "a" stamp, OVC, or GDC) results, are good for.
The terms “precision” and “accuracy” may be used by your PennHIP vet. What he means by the former is that there is general agreement among the readers (“scrutineers”) as to the diagnosis; what he means by the latter depends on additional factors. Imagine a sharpshooter’s bull’s-eye target, and half a dozen riflemen who shoot at it. All of their bullet holes are clustered somewhere in the upper left quadrant of the target, some distance from the bull’s-eye but all about 2 inches from each other. The people at Penn liken that to the hip-extended evaluators who agree fairly well with each other, but miss many cases of HD with that position. Now imagine equally talented shooters with better rifles (scopes, longer barrels, etc.) who not only cluster their holes in the bull’s-eye (“accurate”), but also within one-sixteenth inch of each other (“precise”). Further, imagine those with the better guns being able to repeat their performance at every match (“reliable”). Accuracy, however, is not well defined in the context of making genetic change toward better hips. For that, you need to add the effect of heritability. The hip phenotype (as most accurately reflected in DI and percentile scores) with the highest heritability is the one that should be considered most accurate. And the distraction method has a much higher heritability than older methods of viewing hips.
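For readers who like to see the target analogy in numbers, here is a minimal sketch in Python. The shot coordinates are invented purely for illustration (they are not data from Penn or OFA), and the helper name `group_stats` is my own. It measures how far the center of each group lies from the bull's-eye (a stand-in for accuracy) and how tightly the shots cluster around their own center (a stand-in for precision).

```python
import math
from statistics import mean

def group_stats(shots):
    """Return (bias, spread) for a list of (x, y) hits; bull's-eye is at (0, 0)."""
    cx = mean(x for x, _ in shots)
    cy = mean(y for _, y in shots)
    # Bias: distance from the center of the group to the bull's-eye.
    bias = math.hypot(cx, cy)
    # Spread: average distance of each shot from the center of its own group.
    spread = mean(math.hypot(x - cx, y - cy) for x, y in shots)
    return bias, spread

# Hypothetical shot positions in inches (assumed values, not measurements).
upper_left_group = [(-5.0, 5.0), (-6.0, 5.0), (-5.0, 6.0),
                    (-6.0, 6.0), (-5.5, 5.5), (-5.5, 5.5)]
bulls_eye_group = [(0.03, 0.0), (-0.03, 0.0), (0.0, 0.03),
                   (0.0, -0.03), (0.0, 0.0), (0.0, 0.0)]

bias_a, spread_a = group_stats(upper_left_group)
bias_b, spread_b = group_stats(bulls_eye_group)

print(f"upper-left group: bias {bias_a:.2f} in, spread {spread_a:.2f} in")
print(f"bull's-eye group: bias {bias_b:.2f} in, spread {spread_b:.2f} in")
```

The first group agrees with itself reasonably well but is far off the mark; the second is both close to the mark and tightly clustered. Note that this sketch says nothing about heritability, which is the extra ingredient the paragraph above asks you to add when judging accuracy for breeding purposes.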
The claim by Penn that OFA is not the best method breeders have for progress in reducing HD in their line or breed involves the accuracy factor. They call our attention to the fact that there are many dogs (usually of certain breeds) that do not develop DJD but are OFA-assessed as dysplastic because of laxity at two years of age. Even more importantly, there is the greater number that were adjudged “normal” at two years but later developed DJD or, if not re-radiographed, produced an unacceptably high percentage of dysplastic descendants. This led to the conclusion that the accuracy of OFA’s method is gravely flawed. Even if reliability (by this is usually meant repeatability) were high from younger ages up to the two-year qualification age for OFA certification, and I do not think it is, the absence of accuracy is worrisome to breeders, and diminishes the importance of the reliability figures published in 1997.
Remember the difference between reliability and accuracy, described above. As an example, Penn cites the 1996 OFA-type JAAHA evaluation of military dogs in a longitudinal study in which all the dogs with “normal” hips at two years had mild degenerative changes by nine years of age. At the same time, 22 of 52 dogs that had been judged “positive” for HD at two had similar changes by nine years! The conclusion is that the OFA-type evaluation at two years does give a relatively high rate of misdiagnosis, and blurs the distinction between true positive-for-HD and true negative (no HD) diagnoses even at the supposedly “safe” age of two years. Admittedly, the OFA hedges its emphasis on laxity a little by using the phrase “normal for age and breed” when grading radiographs; they do allow for some differences between Saint Bernards’ and Borzois’ hips this way, though it is still a totally subjective evaluation. And the point that some people who wrote to me wanted explained was about OFA-Normals (at 2 yrs) going bad later in life. Read that study: Banfield CM, Bartels JE, Hudson J. A retrospective study of canine hip dysplasia in 116 military working dogs; Angle measurements and OFA grading. J Am Anim Hosp Assoc. 1996; 32:413-22. Other reading material for you: Corley EA, Keller GC, et al. Reliability of early radiographic evaluations for canine hip dysplasia obtained from the standard ventrodorsal radiographic projection. JAVMA 1997;211:1142-1146. But along with that you should also read the letter to the editor on pp. 487-488 in JAVMA's Volume 212 of Feb. 15, 1998. That letter corrects some misconceptions or possible misinterpretations.
Comparatively, the actual testing error of PennHIP (as determined by repeated tests of the same dog at one point in time) is extremely small (<0.05 DI units). The biological variability of a given dog’s hip laxity over time, however, could be much larger. There is variation in anything biological, one example being the evaluation of hip laxity, taken and scored such-and-such on a given day, and then on some future day scored so-and-so.
Consider three points. First, biological variability is evident in all measured biological parameters, e.g., serum cholesterol, blood pressure, heart rate, etc., even hip laxity to a smaller degree. Second, if a breeder believes the naysayers and feels distraction radiography should not be used because of the perception of too much variability (error), he should realize that in all studies that compared the DI with OFA score, the OFA diagnostic test was found to have even more error when evaluated longitudinally. The data is clear on this issue. Many people have been lulled into believing that since their dogs receive one OFA score at 2 years of age, the score is absolute and will not change. If they would take the time and spend the money to have repeated OFA testing done, they would find a troubling amount of error, and more error than with PennHIP.
Third, to circumvent the potential uncertainties occasioned by inherent biological variation (using any diagnostic test), it is wise to average many observations rather than rely fully on any single observation. Multiple tests will regress to the mean, giving a truer measure of the phenotype, the same way you will get closer to 50% “heads” the more often you flip a coin, or approach 12.5% “monorchids” the more often you breed two “carriers”. In other words, if a breeder is arguing that PennHIP should not be used because it has too much “error”, the OFA method should also be abandoned because its error has been shown to be even greater. Complaining because a biological situation does not have mathematical precision is simply wrong-headed. By the way, if there were any differences in reading DI at different times in a dog’s life, I would suspect that the “margin of error” probably is greater in the higher end of the scale. It has been my observation that the looser the joint, the greater the variation in readings.
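The value of averaging can be shown with a short simulation in Python. This is only a sketch under stated assumptions: the "true" score of 0.40 and the day-to-day scatter of 0.05 are made-up numbers chosen for illustration, not published PennHIP or OFA figures, and the function names are my own. It imitates taking one, four, or sixteen readings on the same dog and shows how much the averaged result still wanders, alongside the coin-flip comparison used above.

```python
import random
from statistics import mean, pstdev

def spread_of_average(n_readings, true_value=0.40, day_to_day_sd=0.05,
                      trials=2000, seed=0):
    """Scatter (standard deviation) of the average of n_readings noisy readings.

    true_value and day_to_day_sd are hypothetical illustration values.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    averages = [
        mean(rng.gauss(true_value, day_to_day_sd) for _ in range(n_readings))
        for _ in range(trials)
    ]
    return pstdev(averages)

def heads_fraction(n_flips, seed=0):
    """Fraction of heads in n_flips simulated fair coin flips."""
    rng = random.Random(seed)
    return sum(rng.random() < 0.5 for _ in range(n_flips)) / n_flips

single = spread_of_average(1)
four = spread_of_average(4)
sixteen = spread_of_average(16)

print(f"scatter of a single reading:     {single:.3f}")
print(f"scatter of the average of four:  {four:.3f}")
print(f"scatter of the average of 16:    {sixteen:.3f}")
print(f"heads in 10 flips:     {heads_fraction(10):.2f}")
print(f"heads in 10,000 flips: {heads_fraction(10_000):.3f}")
```

Under these assumptions the scatter of the average falls roughly with the square root of the number of readings, so four readings cut it about in half. The simulation treats every reading as an independent draw around one fixed true value; a real dog's laxity may drift with age, which this sketch does not model.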