SECTION III - RECOMMENDATIONS FOR CLINICAL BIOCHEMISTS

A. THYROTROPIN (TSH)

Clinical Utility of the Serum TSH Assay
Measurement of the serum TSH concentration is now the preferred initial test for the assessment of thyroid status in almost all ambulatory patients (see Section I). The underlying assumptions are that the pituitary’s ability to secrete TSH is intact and thyroid status is stable (see Sections II and IV for some exceptions). This TSH-centered strategy, used throughout this monograph, replaces the "thyroid panel" because it provides the needed information and is more cost-effective (30,31).

The TSH-centered strategy reflects insights that arose from the use of more sensitive TSH immunometric assay (IMA) methods. These methods showed that the relationship between the serum TSH and free T₄ concentrations is log/linear. They also made it possible to show, throughout the range of serum TSH concentrations, that a given change in thyroid hormone levels, even within the reference range, produces a proportionately larger change in the serum TSH concentration (see Section II) (32).

As a consequence, we now realize that patients with milder degrees of hypo- or hyperthyroidism ("subclinical") are more common than those with overt disease. Those with milder disease often have a serum T₄ level within the reference range despite a serum TSH concentration that is clearly too high or too low, while those with overt disease usually have a clearly abnormal level of serum T₄. The milder degrees of thyroid dysfunction are important to diagnose because some patients with mild dysfunction will benefit from treatment (33-36).

With this strategy, one needs assurance that the low levels of serum TSH found in hyperthyroidism are reliably measured; assay sensitivity is important. However, recent data show that many current IMA methods for serum TSH are not consistently reliable in the subnormal range (Figure 1) (37,38), even though they are quite sensitive at detecting the high serum TSH concentrations characteristic of primary hypothyroidism. That is, their functional sensitivity is poor at subnormal levels of serum TSH.

Figure 1. TSH measurement of four human serum pools (each with a different symbol) each of which had a TSH concentration between 0.02 and 0.04 mU/L (target range). The results shown are those obtained with 16 different IMA methods for serum TSH performed in at least 10 different clinical laboratories using each method; the samples were analyzed as unknown clinical specimens. For each method, the manufacturer’s euthyroid reference range is shown in dark shading and the range in proven Graves’ hyperthyroidism (37,38) is shown by light shading. Assays are grouped as having "third generation" or "second generation" functional sensitivity based on published or experimental data. The methods used were: 1: Abbott IMX; 2: Sanofi Access; 3: Becton-Dickinson Simultrac; 4: Biorad CoTube; 5: BM TSH; 6: Corning ACS 180; 7: Dako Novoclone; 8: Diagnostic Products Immulite; 9: Wallac Delfia; 10: IDS Washington; 11: Kodak (now J&J) Amerlite; 12: Kodak (now J&J) Coated Tube; 13: Kodak (now J&J) TSH-30; 14: Netria IRMA; 15: Nichols Chemiluminescent; 16: Serono Maiaclone.

Status of Current TSH Methods
The sensitivity of TSH measurement has improved 100-fold over the last twenty years. The detection limit of 1 to 2 mU/L, typical of the TSH radioimmunoassay (RIA) methods developed in the early 1970s (39,40), has fallen to the 0.01 to 0.02 mU/L achieved by some of the current non-isotopic immunometric assay (IMA) methods (31,41).

These new methods, which use a monoclonal TSH antibody on a solid support, have eliminated the problem of lack of specificity due to other glycoprotein hormones. However, heterophilic antibodies (23), and other less well-defined serum constituents, may still cause loss of specificity with some sera.

Historically, the "quality" of a serum TSH assay has been judged by an assay’s ability to discriminate euthyroid concentrations (approximating 0.4-4.0 mU/L) from the profoundly low TSH concentrations typical of overt Graves’ thyrotoxicosis (often <0.02 mU/L) (32). In addition to this clinical benchmark, quality can be judged by two experimentally determined measures of assay sensitivity: analytical sensitivity and functional sensitivity.

Analytical sensitivity is an intra-assay measure based on the imprecision of the zero matrix or the tube with no serum (42); the value is an estimate of the lowest value distinguishable from zero. Functional sensitivity is a measure based on the inter-assay precision of low values determined by a standardized protocol (38); it is generally a higher value than the analytical sensitivity and is more clinically relevant because it reflects the assay sensitivity in actual use over a period of time.

Note clearly that the analytical sensitivity and specificity of the assay method are not directly related to the clinical sensitivity and specificity of the assay in the diagnosis of a particular disease. It is nevertheless true that a more sensitive assay for serum TSH is more specific for the diagnosis of hyperthyroidism.

Performance Goals for TSH Assays

Specificity. Specificity here refers to the assay’s ability to measure all of what it claims to measure and nothing else. The structure of the TSH molecules circulating in the blood is not quite the same as those in the pituitary gland or in the pituitary extracts used for standardization (43). The solid-phase monoclonal "capture" antibodies used in current TSH IMA methods may have different specificities for the epitopes of the serum TSH isoforms in some sera compared to those of the pituitary extracts. However, these differences are apparently clinically insignificant and do not lead to differences in the reference range. Nevertheless, standardization would be improved if the standard were a directly weighed, chemically defined entity instead of a tissue extract defined in arbitrary units. Recombinant human TSH (rhTSH) is such an entity (44).

We recommend as a goal that recombinant human TSH be used in the future for gravimetrically-based standardization of the serum TSH assay.

Loss of specificity unrelated to TSH, perhaps due to heterophilic antibody or other less well defined serum constituents, may still occur in some sera. It is difficult for the laboratory to detect this problem in advance; it is usually suspected by the physician, who alerts the laboratory that there is a discordance between the TSH result and the clinical status of the patient. Such a difference is most likely due to technical or human error, but it might be caused by an unusual serum constituent.

We recommend that, when serum TSH results are discordant, the laboratory be prepared to confirm the specimen’s identity, repeat the test with a new specimen, use a different assay, check for parallelism and/or suggest repetition of the test after suppression with oral T4 or stimulation with TRH.

Sensitivity. Functional sensitivity is defined as the TSH value at which the inter-assay coefficient of variation (CV) equals 20%, using the interassay precision profile (38). This percentage, though somewhat arbitrary, encompasses both analytic and biologic variations over time, thus reflecting the assay in actual use, and the measured value is consistently above the analytic sensitivity, thus ensuring that the measured TSH concentration is clearly different from zero.

We recommend that both laboratories and manufacturers use functional sensitivity to define the lower reporting limit of the serum TSH assay.

This measure is valid only if determined by a protocol (Table 3) that mimics the use of TSH in clinical practice and uses a clinically relevant time-span.

Table 3. Recommended protocol for assessing functional sensitivity.

1. Use human serum rather than modified serum or non-human based protein matrices.
2. Use concentrations that cover the proposed assay range above the expected functional sensitivity limit, including a value of 0.02 mU/L.
3. Establish the interassay precision profile from ten or more analyses of each serum, performed in different runs.
4. Make a random, not ordered, analysis of these sera in order to reflect any carry-over effect on the values obtained with low concentrations.
5. Use more than one batch of reagents, and employ more than one instrument calibration, when assessing interassay precision.
6. Use a clinically relevant time-span for interassay precision assessments; for the TSH assay, this is about 6 to 8 weeks in an outpatient setting.

We recommend that the functional sensitivity be determined with a standard protocol.
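
As an illustration only, the 20% interassay CV criterion and the ten-run requirement of Table 3 might be applied to a precision profile along the following lines. This is a minimal sketch in Python, not a validated procedure: the function names are hypothetical, and the log-linear interpolation between adjacent serum pools is an assumption made here for simplicity rather than part of the recommended protocol.

```python
import math
from statistics import mean, stdev

MIN_RUNS = 10  # Table 3: ten or more analyses of each serum, in different runs


def interassay_cv(results):
    """Percent coefficient of variation of one serum pool across runs."""
    return 100.0 * stdev(results) / mean(results)


def precision_profile(pools):
    """Turn {pool_name: [result per run]} into sorted (mean, CV%) points."""
    points = []
    for name, results in pools.items():
        if len(results) < MIN_RUNS:
            raise ValueError(f"{name}: need at least {MIN_RUNS} runs")
        points.append((mean(results), interassay_cv(results)))
    return sorted(points)


def functional_sensitivity(profile, cv_limit=20.0):
    """Estimate the concentration at which the interassay CV equals cv_limit.

    profile: list of (mean_concentration, cv_percent), one point per pool.
    Assumption (not from the protocol): the CV falls as concentration rises,
    and interpolation on a log-concentration scale between the two pools
    that bracket cv_limit is adequate.
    Returns None when the profile does not bracket cv_limit.
    """
    pts = sorted(profile)
    for (c_lo, cv_lo), (c_hi, cv_hi) in zip(pts, pts[1:]):
        if cv_lo >= cv_limit >= cv_hi and cv_lo != cv_hi:
            frac = (cv_lo - cv_limit) / (cv_lo - cv_hi)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None
```

A result of None means that no pair of pools brackets a CV of 20%, so lower (or higher) concentration pools would be needed before a reporting limit could be stated.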

In practice, a number of factors, such as lot-to-lot variation in reagents, reagent stability, instrument calibration, and technician variation, can cause precision to erode as a result of cumulative variations. Other poorly defined variables, such as temperature and voltage, can also affect functional sensitivity (38).

Stating sensitivity in descriptive terms such as "sensitive" or "ultrasensitive" is uninformative and should no longer be done. "First", "second", and "third" generations, in which each "generation" of the TSH assay has about a ten-fold difference in functional sensitivity, are useful terms. Unfortunately, the value of these "generational" terms has become eroded by commercial marketing practices. In addition, there is wide variation in functional sensitivity among different clinical laboratories using the same method. This suggests that some claims of "third generation" functional sensitivity, the level currently needed to optimally assess a low serum TSH concentration, can be as misleading as descriptive terms like "ultrasensitive".

The term "third generation" should be applied only to a serum TSH assay that has a functional sensitivity <0.02 mU/L. Failure to use a realistic sensitivity level increases the risk of missing a diagnosis of hyperthyroidism (Figure 1).

We recommend that the functional sensitivity be used to describe an assay’s sensitivity, rather than a stated "generation".

Functional sensitivity should be the most important performance criterion to influence selection of a method to measure serum TSH concentrations because current methods are comparable in their ability to detect raised levels of serum TSH. Other factors, such as specificity as well as the practical points of instrumentation, incubation time, cost, and technical support, though important, should be secondary.

Package inserts in commercial assay kits should depict the interassay precision profile, assessed by the standard protocol; state the functional sensitivity; and demonstrate that the functional sensitivity can be met by a range of laboratories in clinical practice. The insert should not be limited to a statement of the analytical sensitivity alone because these data alone might lead laboratories to adopt an overly optimistic sensitivity limit.

Manufacturers should help clinical laboratories establish their own functional sensitivity limits with the standard protocol both when the method is first used and at periodic intervals thereafter. This may require that manufacturers provide human serum pools with suitably low TSH concentrations to their customers.

Laboratories should use calibration intervals that optimize functional sensitivity, even if recalibration needs to be more frequent than recommended by the manufacturer.

We recommend that the frequency of calibration be set to optimize functional sensitivity.

Reference intervals. Adult euthyroid reference intervals have progressively contracted from the early RIA ranges of 2.0 to 15.2 mU/L to current estimates of about 0.4 to 4.0 mU/L (Figure 1) (38). The refinement in reference interval values reflects four factors: (a) the recognition that euthyroid serum TSH concentrations are log-Gaussian or log-normal in distribution; (b) the exclusion of those with any thyroid disease or an abnormal level of thyroid anti-TPO antibody; (c) the exclusion of those with a family history of thyroid disease; and (d) the elimination of cross-reactivity with other pituitary glycoprotein hormones by the use of a monoclonal "capture" antibody specific for the beta-subunit of TSH. With this approach, in some methods the upper limit of the reference range is higher in infants, children and older persons than in younger adults, so the reference range for serum TSH concentration needs to be age-adjusted.

We recommend that each laboratory use euthyroid persons, as defined above, to verify the reference interval for its serum TSH method.

Each laboratory is required by CLIA 1988 to check the reference interval for its assays. The reference interval for serum TSH concentration should preferably be established from specimens drawn across times of day typical of ambulatory visits (0800-1800 hrs), although this is not essential, as discussed above (see Section II). The reference interval does not need to be based on gender or race.
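
Because euthyroid serum TSH concentrations are log-Gaussian in distribution, limits calculated on the raw values would be distorted by the long upper tail. The sketch below shows one way a laboratory might compute an interval after log transformation. It is illustrative only: the function name is hypothetical, and the use of a central 95% interval (z = 1.96) is an assumption made here, since the percentage is not specified in this section.

```python
import math
from statistics import mean, stdev


def log_gaussian_reference_interval(tsh_values, z=1.96):
    """Reference limits computed on log-transformed TSH values.

    tsh_values: serum TSH results (mU/L) from euthyroid persons selected
    with the exclusions described above; all values must be > 0.
    z: normal deviate; 1.96 gives a central 95% interval (an assumption).
    Returns (lower_limit, upper_limit) back-transformed to mU/L.
    """
    logs = [math.log(v) for v in tsh_values]
    centre, spread = mean(logs), stdev(logs)
    return math.exp(centre - z * spread), math.exp(centre + z * spread)
```

The back-transformed limits are symmetric around the geometric mean rather than the arithmetic mean, which is why the upper limit lies much further from the centre than the lower limit does.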

Additional Recommendations

For Manufacturers. Manufacturers should actively share with customers their data on lot-to-lot variation and on the results of studies using their method, although without assuming responsibility for the validity of the studies. These data could be provided directly through bulletins or included in package inserts.

For Physicians. Many of the current TSH IMA methods operate with suboptimal sensitivity in clinical practice (38). This prevents the detection of clearly subnormal serum TSH concentrations and impairs the use of the TSH-centered testing strategy.

An important clinical bench-mark is the observation that a profoundly low serum TSH concentration (<0.02 mU/L) is expected when a patient has clinical symptoms of overt Graves’ disease. If an easily detectable level of serum TSH is reported in such a patient, there is usually a problem with the assay and not with the patient.

We recommend that physicians work with laboratory directors to correlate clinical status with the TSH assay’s functional sensitivity and to resolve perceived problems with the TSH assay.

B. THYROID HORMONES: THYROXINE AND TRIIODOTHYRONINE

Clinical Utility of Measurements of Serum T₄ and T₃

Measurements of free or total T₄ and, occasionally, free or total T₃ may be needed for diagnosis of thyroid dysfunction when TSH measurements alone do not provide an accurate indication of thyroid hormone status. Measurement of T₄ is not needed in the diagnosis of the most common thyroid dysfunction, primary hypothyroidism, but is useful in the diagnosis of hyperthyroidism and some unusual thyroid disorders, and is often useful in monitoring patients with thyroid dysfunction (see Sections I and IV). T₃ measurements are only rarely necessary; they are needed, for example, in the diagnosis and monitoring of patients with T₃-toxicosis.

Status of Current T₄ and T₃ methods

All current thyroid hormone assays are immunoassays employing radioactive iodine, an enzyme, or a fluorescent or chemiluminescent label attached to a known quantity of hormone or antibody to that hormone; in either instance, the assay involves a high-affinity antibody specific for the hormone being assayed. The endogenous serum hormone being measured competes with the fixed amount of added hormone for a fixed number of binding sites on the added antibody. The assay signal varies with the amount of hormone in the original sample; the signal may be directly or inversely related to this amount, depending on the design of the assay.

Assays of the total T₄ or T₃ content of serum attempt to block both endogenous and labeled hormones from binding to endogenous thyronine-binding proteins during analysis; thus, the reaction involves the competitive binding to the added antibody of both the added hormone and the total endogenous hormone, including that originally bound to serum proteins.

Assays for the small fraction of free T₄ or T₃ in serum attempt to maintain the endogenous equilibrium between bound and free hormone during analysis so that only the endogenous free hormone interacts with the added reagents; often in these estimates of free hormone, a known amount of an analogue of the thyroid hormone is added rather than of the thyroid hormone itself. The currently available commercial assays for free T₄ actually give an estimate rather than the concentration itself and so can give anomalous results in certain situations (see Section IV).

Performance Goals for Thyroid Hormone Assays

Analytical bias, imprecision, and recovery. Analytical performance goals based on biologic information are important for the medical decision-making process (45-48). Variability in the measurement of thyroid hormone is a composite of the analytical variation of the method and the biological variation in and among individuals, that is, within-subject (intra-individual) and between-subject (inter-individual) variation. Mean within-subject and between-subject variations for serum thyroid hormones are known (45) (see Section II).

Suggested goals for maximum acceptable bias and imprecision, derived from these biological variations, have also been reported (47,48).

When testing is for the purpose of diagnosis, that is, to rule in or rule out disease, the relevant biologic variation is the composite of within-subject and between-subject variations. When testing is for the purpose of monitoring changes in an individual over time, such as therapeutic monitoring, within-subject variation is relevant (45-48).

The performance goals for bias and precision can then be calculated. Note that, for the most part, clinicians and biochemists have little control over these aspects. Manufacturers of kits should strive to meet these goals and have the data available for users.

The recommendations below are derived from published data (45) and rounded to whole numbers (see Appendix C for the specific calculations and data used).

We recommend that the performance goals for bias and precision in thyroid hormone assays be as follows:

            Diagnosis              Monitoring
            Bias   Imprecision     Bias   Imprecision
Free T₄     <4%    <8%             <2%    <5%
Total T₄    <3%    <6%             <1%    <3%
Free T₃     <6%    <12%            <2%    <4%
Total T₃    <6%    <12%            <5%    <5%

Recovery. Goals for analytical recoveries can reasonably be defined as 100% recovery plus or minus the goal for maximum bias; for example, 96% to 104% when free T₄ testing is used for diagnosis.
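
The recommended goals lend themselves to a simple lookup. The following Python sketch encodes the recommended maximum bias and imprecision and derives recovery limits from them, reading the recovery goal as 100% plus or minus the maximum bias (consistent with the 96% and 104% limits drawn in Figure 3 and the 97% and 103% limits in Figure 4). The dictionary layout and function names are hypothetical and are offered only as one way to apply the goals.

```python
# (analyte, purpose) -> (maximum bias %, maximum imprecision as CV %),
# transcribed from the recommended performance goals above.
GOALS = {
    ("free T4", "diagnosis"): (4.0, 8.0),
    ("free T4", "monitoring"): (2.0, 5.0),
    ("total T4", "diagnosis"): (3.0, 6.0),
    ("total T4", "monitoring"): (1.0, 3.0),
    ("free T3", "diagnosis"): (6.0, 12.0),
    ("free T3", "monitoring"): (2.0, 4.0),
    ("total T3", "diagnosis"): (6.0, 12.0),
    ("total T3", "monitoring"): (5.0, 5.0),
}


def recovery_limits(analyte, purpose):
    """Recovery goal taken as 100% plus or minus the maximum bias."""
    max_bias, _ = GOALS[(analyte, purpose)]
    return 100.0 - max_bias, 100.0 + max_bias


def meets_goals(analyte, purpose, bias_pct, cv_pct):
    """True when observed bias and imprecision are both inside the goals."""
    max_bias, max_cv = GOALS[(analyte, purpose)]
    return abs(bias_pct) < max_bias and cv_pct < max_cv
```

For example, a free T₄ method with a bias of -3% and an interassay CV of 7% would satisfy the diagnostic goals but not the tighter monitoring goals.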

Working Ranges

An ideal working range (the range between the lower and upper limits of quantification) would be a range that encompasses all patient values. At present, this has not been achieved for thyroid hormone assays. Nonetheless, it is desirable that working ranges encompass as many patients with untreated thyroid disease as possible, because thyroid hormone measurements are useful to confirm diagnoses and sometimes to monitor the early therapy of thyroid disease, e.g., hyperthyroidism (see Section I).

Published data (49) on the serum free and total T₄ concentrations in untreated hypothyroidism (n=42) show that the serum free T₄ ranged from <2-7 ng/L (<0.2-0.7 ng/dL) and the serum total T₄ from <5-69 µg/L (<0.5-6.9 µg/dL). Similarly, in untreated hyperthyroidism (n=30) the serum free T₄ ranged from 32-478 ng/L (3.2-47.8 ng/dL) and the total T₄ from 102-324 µg/L (10.2-32.4 µg/dL). The reference ranges were 8-27 ng/L (0.8-2.7 ng/dL) for free T₄ and 53-114 µg/L (5.3-11.4 µg/dL) for total T₄.

If one arbitrarily combines the upper 75% of the values found in untreated hypothyroidism with the lower 75% of those found in untreated hyperthyroidism, the range of these values is 0.1 ng/dL to 15.0 ng/dL for serum free T₄ and 1.0 µg/dL to 24.0 µg/dL for serum total T₄. These values can be used as goals for working ranges. This approach is based on clinical observations and is a reasonable compromise between the ideal and the achievable. The derivation of recommended goals for a working range in terms of the limits of the relevant reference interval is logical because reference intervals vary among methods. These recommendations are presented below.

Because measurements of serum free and total T₃ are not needed for the diagnosis or therapeutic monitoring of hypothyroidism, it should be acceptable to set the goal for the lower end of the working range for T₃ assays at 50% of the lower limit of the reference interval. Because serum T₃ measurements will occasionally be needed for the diagnosis and therapeutic monitoring of T₃-toxicosis, and because the proportional elevation of T₃ in hyperthyroidism is as great as or greater than that for T₄, goals for the upper end of the working ranges for serum free and total T₃ assays should be at least as high as those for free and total T₄ (50).

Sera with measured values above the working ranges of serum total T₄ and total T₃ assays can be diluted to obtain values within the working ranges. Sera with high values of free T₄ and free T₃ cannot be diluted to obtain reportable values because dilution of these sera results in a disproportionate change in the value (51). Therefore, goals for the upper limits of the working ranges will be more demanding for serum assays of free hormones than of total hormones. Direct equilibrium dialysis methods are exceptions, because dialysates can be diluted prior to immunoassay for free hormone concentrations.

We recommend that goals for the working ranges of thyroid hormone assays be, with respect to the reference interval for each:

Free T₄: 15% of the lower limit to 550% of the upper limit
Total T₄: 20% of the lower limit to 200% of the upper limit
Free T₃: 50% of the lower limit to 550% of the upper limit
Total T₃: 50% of the lower limit to 200% of the upper limit
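
Because these goals are expressed relative to each method's own reference interval, they translate directly into a short calculation. The Python sketch below is illustrative; the names are hypothetical, and the only numbers it uses are the percentages recommended above.

```python
# analyte -> (fraction of lower reference limit, fraction of upper reference limit)
WORKING_RANGE_FACTORS = {
    "free T4": (0.15, 5.50),
    "total T4": (0.20, 2.00),
    "free T3": (0.50, 5.50),
    "total T3": (0.50, 2.00),
}


def working_range_goal(analyte, ref_low, ref_high):
    """Working-range goal in the same units as the reference interval."""
    low_factor, high_factor = WORKING_RANGE_FACTORS[analyte]
    return ref_low * low_factor, ref_high * high_factor
```

Applied to the reference ranges quoted earlier, a free T₄ interval of 0.8 to 2.7 ng/dL gives a goal of about 0.12 to 14.85 ng/dL, and a total T₄ interval of 5.3 to 11.4 µg/dL gives about 1.06 to 22.8 µg/dL, both close to the clinically derived ranges of 0.1 to 15.0 ng/dL and 1.0 to 24.0 µg/dL.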

Specificity

Because there is no reason to believe that cross-reactivity in current thyroid hormone assays is a problem, goals for cross-reactivity can be derived from state-of-the-art methods. With the availability of monoclonal and affinity purified polyclonal antibodies, cross-reactivities of less than 0.1% of T₄ and T₃ with all studied iodinated precursors and metabolites of L-thyroxine have been achieved with several methods (52). This provides, therefore, a desirable and achievable goal for maximum cross-reactivity.

Cross-reactivities should be determined at both 50% and 80% of maximum label binding in a thyroid hormone assay because cross-reactivity curves are often not parallel to standard curves. The manufacturer should determine these cross-reactivities but the responsibility for ensuring that these data are available is the laboratory’s.

We recommend that the permissible degree of cross-reactivity of thyroid hormones with other likely iodinated compounds in assays for these hormones be <0.1% at 50% and 80% of maximum label binding.
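
A check against this goal might be sketched as follows. The calculation assumes the conventional definition of percent cross-reactivity, namely the dose of hormone divided by the dose of cross-reactant that produces the same label binding, multiplied by 100, with both doses in the same units; that definition is not spelled out in this section and should be treated as an assumption. The function names are hypothetical.

```python
REQUIRED_BINDING_LEVELS = (0.50, 0.80)  # fractions of maximum label binding


def percent_cross_reactivity(hormone_dose, cross_reactant_dose):
    """Cross-reactivity (%) at one binding level.

    Both doses are the amounts needed to reach the same fraction of
    maximum label binding and must be expressed in the same units.
    """
    return 100.0 * hormone_dose / cross_reactant_dose


def meets_cross_reactivity_goal(doses, limit_pct=0.1):
    """doses: {binding_level: (hormone_dose, cross_reactant_dose)}.

    The goal is met only if the cross-reactivity is below limit_pct at
    every required binding level, since the curves may not be parallel.
    """
    for level in REQUIRED_BINDING_LEVELS:
        if level not in doses:
            raise KeyError(f"no data at {level:.0%} of maximum binding")
        if percent_cross_reactivity(*doses[level]) >= limit_pct:
            return False
    return True
```

Testing both levels matters because a compound can pass at 50% binding and still fail at 80% when its displacement curve is shallower than the standard curve.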

Parallelism

Because the issue of matrix effects on thyroid hormone assays is a critical one (51-53), parallelism needs to be examined. When parallelism between the analyte in unknown samples and the analyte in standard solutions is to be tested, the chemical matrix (most importantly, the protein concentration) of the solution with which hormone-rich samples are diluted must be closely similar to the chemical matrix of the unknown samples. Otherwise, both analyte concentration and matrix composition will be varied at the same time, making it impossible to distinguish nonparallelism of the analyte from progressive matrix effects.

When matrix effects are to be studied, analyte concentration in the diluent must equal that in the hormone-rich sample. Again, it is preferable for the manufacturer to provide data on parallelism and matrix effects but the responsibility for obtaining them lies with the laboratory.

We recommend that measurements of thyroid hormones over the working range of an assay be shown to parallel calculated hormone concentrations in samples diluted with a hormone-free diluent that does not substantially alter the chemical matrix.
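
The comparison of measured with calculated concentrations can be illustrated with a short Python sketch for a total hormone assay. The function names are hypothetical, and the tolerance is deliberately left as a required argument because no numerical limit for parallelism is given in this section.

```python
def dilution_recoveries(neat_value, dilutions):
    """Percent of the calculated value recovered at each dilution.

    neat_value: result for the undiluted hormone-rich sample.
    dilutions: list of (fraction_of_sample, measured_value), where the
    fraction is 0.5 for a 1:2 dilution, 0.25 for a 1:4 dilution, etc.,
    made with a hormone-free diluent of closely similar matrix.
    """
    recoveries = []
    for fraction, measured in dilutions:
        expected = neat_value * fraction  # calculated concentration
        recoveries.append(100.0 * measured / expected)
    return recoveries


def is_parallel(recoveries, tolerance_pct):
    """True when every dilution recovers within tolerance_pct of 100%.

    tolerance_pct is chosen by the laboratory; no value is implied here.
    """
    return all(abs(r - 100.0) <= tolerance_pct for r in recoveries)
```

A recovery that drifts steadily away from 100% as the sample is diluted further suggests nonparallelism or, if the diluent matrix differs from the sample matrix, a progressive matrix effect; the two cannot be distinguished unless the matrix is held constant.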

Interferences

The ideal goals for interferences in assays for thyroid hormone would be zero interference by any compound in any sera at any concentration. Studies available from manufacturers vary widely in the number of compounds studied and in the concentrations used. Data are not readily available with which to derive rational numerical goals for specific compounds at specific concentrations.

Autoantibodies to T₄ or T₃ interfere with all total and most free T₄ or T₃ methods but, fortunately, are uncommon. They do not interfere with the direct equilibrium dialysis methods for free T₄ and T₃ because these methods separate the autoantibodies from the free hormones prior to measurement.

We recommend that manufacturers provide complete listings of known interferences at medically relevant thyroid hormone levels, including the magnitude and direction of the resulting errors.

Calibration

It is clearly desirable to standardize calibration among all thyroid hormone methods. Highly purified preparations of crystalline L-thyroxine and L-triiodothyronine are readily available with which thyroid hormone assay calibrations can be standardized. The United States Pharmacopoeia (16201 Twinbrook Parkway, Rockville, MD 20852) provides such reference preparations (52).

We recommend that all manufacturers standardize T₄ and T₃ assays to a single reference preparation of each.

Comparison of Current Methods to Recommended Goals

When the analytical performance of several current thyroid hormone assays is compared to the goals recommended above, disparities are apparent, especially with regard to maximum bias, maximum imprecision, and analytical recovery (51-53). For example, Figure 2 shows that, at estimated levels of free T₄ within the reference interval using current assays as diagnostic tests, the %CV is often substantially above the recommended 8% (52).

Analytical recovery and bias can vary among methods, even when the actual free T₄ concentration is held constant. For example, Figure 3 shows that the estimated free T₄ was substantially lower than its actual concentration in three of four methods when the percentage of the total T₄ bound to protein was decreased (53) and in two of three methods when the total protein concentration was increased (Figure 4) (52). Comparable systematic studies of bias in free and total T₃ assays are not available, but similar deficiencies are likely (54).

We recommend that manufacturers, when reporting analytical recovery, include measurements of the T₄ binding proteins (thyroxine-binding globulin, transthyretin and albumin), and measurements of thyroid hormone binding to proteins.

Although some assays do not currently meet the performance standards recommended here, some do; the goals are not only desirable but achievable.

Figure 2. Overall imprecision in current serum free T₄ assays. The broken horizontal line represents a CV of 8%, the recommended goal for maximum imprecision when free T₄ is used for diagnosis (52).

Figure 3. Analytical recoveries in four free T₄ methods as a function of serum protein T₄ binding. Bias between methods, and analytical recoveries within a single method, varied with serum protein T₄ binding (determined as the ratio of total T₄ concentration to free T₄ concentration). Free T₄ concentrations in test samples were identical at all levels of serum T₄ binding. The broken horizontal lines represent analytical recoveries of 96% and 104% (the recommended goal when free T₄ testing is used for diagnosis). The value for 100% analytical recovery was determined by immunoassay of T₄ in equilibrium dialysates of the preparations used in the study [calculated from data in (53)].

Figure 4. Analytical recoveries in three total T₄ methods when serum protein concentrations were varied. Recoveries varied between 66% and 116% depending upon serum protein concentrations. At any particular protein concentration there was bias among methods. With two particular methods there was bias within the method when protein concentrations varied. Analytical recoveries of 97% and 103% are indicated by the broken horizontal lines (the recommended goal when total T₄ testing is used for diagnosis). The value for 100% recovery was determined from the mass of L-thyroxine dissolved in T₄ free serum protein solutions [calculated from data in (52)].

C. THYROGLOBULIN

Clinical Utility of Thyroglobulin Measurement

Thyroglobulin (Tg) is the molecular site of normal thyroid hormone synthesis and, in thyroid pathophysiology, it is involved in the pathogenesis of autoimmune thyroid disease and, rarely, in genetic biosynthetic defects that result in inborn errors of thyroid hormone metabolism.

The serum level of Tg reflects three principal factors: (1) the mass of differentiated thyroid tissue, to which the serum Tg level is roughly proportional; (2) inflammation or destruction of thyroid tissue, which can cause the release of Tg; and (3) stimulation of the TSH receptor by either TSH or a stimulating antibody, which can also release Tg. Thus, although circulating Tg arises only from thyroid tissue, a raised serum Tg level is not specific for any one disease, and measurement of the serum Tg concentration has limited clinical utility.

Measurement of Tg is used primarily as a tumor marker after treatment of patients with an established diagnosis of differentiated thyroid carcinoma. At that point, a high level of, or a rise in, the serum level of Tg points to the persistence or recurrence of the disease.

Status of Current Tg Methods

The measurement of Tg in serum is technically challenging. Currently, immunometric assays (IMA) are gaining popularity over radioimmunoassay (RIA) methods, because IMA methodology has the practical advantages of a shorter incubation time than RIA, a wider working range, and a more stable labeled antibody reagent less prone to labeling damage. Most Tg IMAs are isotopic (IRMAs), although non-isotopic immunochemiluminometric assays (ICMAs) with longer reagent shelf life and the potential for automation are becoming available.

The clinical value of serum Tg measurements, already limited to thyroid cancer, is further limited by a number of technical problems. These include a lack of a uniformly accepted standard, suboptimal sensitivity, poor interassay precision, interference by thyroglobulin autoantibodies (TgAb) (when present), and so-called "hook" effects.

Performance Goals for Tg Assays

Standardization. Serum Tg values measured by different RIA or IMA methods vary by as much as 65 percent (55). A recent collaborative effort, sponsored by the Community Bureau of Reference (CBR) of the Commission of the European Communities (CEC), has developed a new Tg reference preparation (which can be obtained from Dr. Christos Profilis, BCR, Rue de la Loi 200, B 1049 Brussels, Belgium) (56). The use of this CBR standard decreases inter-method variability (CV) from 42.9±3.9% (SE) to 28.8±3.4% (57).

The inter-method variability that remains after CBR standardization presumably reflects the different specificities of the Tg antibody(ies) used in the various methods and will remain, because it is impractical to require that all methods use the same antibody reagents.

Universal use of CBR standardization would be an advantage when comparing scientific publications but has the potential to disrupt the value of serial quantitative monitoring of serum Tg in patients with differentiated thyroid carcinoma. It is critical that physicians be informed before a laboratory changes or restandardizes an existing Tg method so that physicians will have the opportunity to establish a new baseline value for each patient.

Sensitivity. Many current Tg methods have suboptimal sensitivity as judged by an inability to identify a lower limit of the normal euthyroid range. All Tg methods should be able to detect Tg in the sera of all normal subjects, even when TSH is suppressed (58). This level of sensitivity will be even more important in the follow-up of patients with thyroid cancer because, after total thyroidectomy, the desired level of Tg is zero. The need for a sensitive and precise assay for the serum level of Tg is apparent.

Moreover, because current methods lack uniform standardization, the numeric value for comparing the sensitivity of different methods cannot be derived. A recent study that used recombinant human TSH (rhTSH) to stimulate a rise in serum Tg suggested that non-isotopic Tg ICMAs, as with the measurement of TSH, may be more sensitive than isotopic (IRMA) methods (44).

Precision at low values determines the functional sensitivity of an assay and is important when the expected value is close to zero (59). As discussed for the measurement of serum TSH, one can define the functional sensitivity of the Tg assay as the value determined when the interassay CV is 20%, provided one uses a clinically relevant protocol (59) (see page 24). This requires that precision be evaluated with TgAb-negative human sera over 6 to 12 months, a typical interval used in monitoring patients with differentiated thyroid carcinoma.
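The 20% interassay CV criterion can be illustrated with a minimal sketch. It assumes that the laboratory has measured several TgAb-negative serum pools repeatedly over the monitoring interval and that the CV falls steadily as the concentration rises; the function names and the log-scale interpolation between adjacent pools are illustrative choices, not part of any published protocol.

```python
import math
from statistics import mean, stdev


def interassay_cv(results):
    """Percent CV of one serum pool measured in several separate runs."""
    return 100.0 * stdev(results) / mean(results)


def functional_sensitivity(profile, cv_limit=20.0):
    """Estimate the concentration at which the interassay CV equals cv_limit.

    profile: list of (mean_concentration, percent_cv) pairs, one per pool.
    Returns None if no pool meets the limit; returns the lowest pool
    concentration if every pool meets it (the limit lies at or below it).
    """
    points = sorted(profile)
    if not points:
        return None
    if all(cv <= cv_limit for _, cv in points):
        return points[0][0]
    # Search from the high end so the most conservative crossing is reported.
    pairs = list(zip(points, points[1:]))
    for (c_lo, cv_lo), (c_hi, cv_hi) in reversed(pairs):
        if cv_lo > cv_limit >= cv_hi:
            # Interpolate on a log concentration scale (an assumption).
            frac = (cv_lo - cv_limit) / (cv_lo - cv_hi)
            log_c = math.log(c_lo) + frac * (math.log(c_hi) - math.log(c_lo))
            return math.exp(log_c)
    return None
```

In use, each pool's repeated results would first be reduced to a (mean, CV) pair with `interassay_cv`, and the resulting precision profile passed to `functional_sensitivity`. The estimate is only as good as the protocol behind it: runs collected over a few days will understate the CV that applies to monitoring over 6 to 12 months.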

Ideally, we should be able to determine the physiologic sensitivity of a Tg assay, that is, determine how well the method discriminates between the functional sensitivity limit and the level in normal persons whose serum TSH concentration has been suppressed by T₄.

Reference interval. Serum Tg values have a log-normal distribution in euthyroid persons (60). Persons chosen to establish a reference range should be carefully selected; one should exclude those with a personal or family history of thyroid disease, abnormal levels of thyroid autoantibodies (anti-TPO detected by IMA), or a history of cigarette smoking as well as those taking oral T₄ for any reason.
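Because the values are log-normally distributed, limits are better calculated on the logarithms of the results than on the raw values. The following minimal sketch assumes a parametric central 95% interval (mean ± 1.96 SD of the log values, back-transformed); the choice of a 95% interval and the function name are assumptions made for illustration, not requirements stated in this monograph.

```python
import math
from statistics import mean, stdev


def lognormal_reference_interval(values, z=1.96):
    """Return (lower, upper) limits from log-transformed results.

    values: serum Tg results from carefully selected euthyroid persons.
    z: number of SDs on the log scale (1.96 gives a central 95% interval).
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transformation requires positive values")
    logs = [math.log(v) for v in values]
    centre, spread = mean(logs), stdev(logs)
    # Back-transform so the limits are in the original concentration units.
    return math.exp(centre - z * spread), math.exp(centre + z * spread)
```

Limits computed directly on untransformed values would be symmetric about the arithmetic mean and, for skewed data of this kind, could place the lower limit below zero.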

Even when a reference interval is established, there is no appropriate reference interval for the level of serum Tg in a patient who has had differentiated thyroid carcinoma because, as noted, there should be none after total thyroidectomy. The actual interpretation of a serum Tg result is, however, colored by clinical factors such as the completeness of the thyroidectomy, radioiodine therapy, and the current level of serum TSH. Note that the pattern of the serum Tg over time, while oral T₄ suppresses the serum TSH level, is more important than a single serum Tg value.

Antibodies to Tg. The TgAb in some sera can interfere with both the RIA and IMA methods for Tg. Recovery studies do not reliably detect such interference and should not be used (57). The presence of TgAb may result in either an over-estimation or an under-estimation of the true value; with the newer IMA methods, it is usually an underestimate.

"Hook effect." The "hook effect" is the term for an inappropriately low result when a serum sample contains a very high level of analyte; thus, as the actual concentration rises to quite high levels, the reported value, rather than being linearly higher, "hooks" downward to a lower value than the true one. A falsely low Tg value that results from an assay with a "hook effect" (57,61) is of particular relevance when it occurs in the serum of a patient with metastatic thyroid carcinoma or in needle washings from aspiration of a lateral neck mass (62).

The risk of an unrecognized "hook effect" can be minimized. In an IMA method, one can either check the concordance of two dilutions, for example, undiluted and 1/10, or use a pool of batched specimens in each assay run (57,63); in an RIA method, one can periodically check the upper assay limit for parallelism with dilutions of sera known to contain high concentrations of Tg.
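The two-dilution concordance check for an IMA method can be sketched as follows. The 30% tolerance is a hypothetical value chosen only for illustration; each laboratory would need to set its own limit from the precision of its assay.

```python
def hook_effect_suspected(undiluted, diluted_measured,
                          dilution_factor=10.0, tolerance=0.30):
    """Flag a possible hook effect from two dilutions of one specimen.

    undiluted: result reported for the neat specimen.
    diluted_measured: result read for the diluted specimen, before
        multiplying by the dilution factor.
    tolerance: allowed fractional excess of the dilution-corrected result
        over the undiluted result (hypothetical default).
    """
    corrected = diluted_measured * dilution_factor
    if undiluted <= 0:
        # Any measurable Tg after dilution contradicts a zero neat result.
        return corrected > 0
    return (corrected - undiluted) / undiluted > tolerance
```

For example, a neat result of 50 with a 1/10 dilution reading of 400 gives a corrected value of 4000 and is flagged, whereas a dilution reading of 5.2 gives a corrected value of 52 and is not.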

Recommendations

For manufacturers. The manufacturer should define realistic performance characteristics of a Tg method and show that the performance claims can be reproduced across a wide range of clinical laboratories.

The manufacturer should also determine whether the method provides appropriate values when sera are TgAb-positive. Because recovery of Tg is not reliable in such sera, correlations between the measured serum Tg and the clinical status of patients with metastatic differentiated thyroid cancer but with normal levels of serum TSH will be useful in deciding whether the method can give reliable values for serum Tg levels in TgAb-positive patients.

We recommend that manufacturers define, and include in Tg assay kits, data on: the sensitivity; the reference interval; the range in persons with treated thyroid cancer, if possible; and the nature of interference from TgAb.

Use of the CBR Tg standard is advised but, if it is not used, a correction factor based on the CBR standard is needed; this will facilitate the comparison of serum Tg results among laboratories.

We recommend use of the CBR standard, or an appropriate correction factor based on this standard, at low, middle, and high values of the assay's standards.

For laboratories. The characteristics of the Tg method have a significant effect on the management of patients with differentiated thyroid carcinomas, both with respect to cost-effectiveness and potentially to morbidity and mortality.

We recommend that a Tg method be changed only after consultation with endocrinologists and if there are complete data on the validation and performance of the assay.

The laboratory should, using data from the manufacturer and the clinician, define the performance characteristics of the Tg assay and validate its clinical utility so that the reported concentration generates an appropriate clinical response, especially when patients are TgAb-positive.

We recommend for the Tg assay that the laboratory establish reference intervals using confidence limits, define the functional sensitivity, and minimize a possible "hook effect."

The laboratory is also responsible for minimizing the effect of interference by TgAb.

We recommend that each sample assayed for Tg also be assayed for TgAb by immunoassay and that the TgAb concentration be reported when positive. The method should state whether TgAb interference causes an over- or under-estimation of the serum Tg level.

For physicians. The reliability of serum Tg measurement affects the clinical management of patients with known differentiated thyroid carcinoma.

We recommend that serial Tg measurements in a patient be performed by the same method for all measurements, preferably in the same laboratory.

A sensitive and reliable method minimizes costly imaging procedures. The reliability is affected by both the method and laboratory chosen because the values obtained with different methods are not interchangeable even when standardized with the same CBR reference preparation.

We recommend that a laboratory be chosen for a Tg assay based on a low functional sensitivity, the ability to meet the above recommendations for laboratories, and the ability to store or return samples for re-assay and comparison with a later specimen.

D. THYROID ANTIBODIES

Clinical Utility of Thyroid Autoantibody Measurement

Primary thyroid disease usually develops as a result of an autoimmune process in which antibodies are produced against one or more of three thyroid-specific antigens: thyroid peroxidase (TPO), thyroglobulin (Tg) and TSH receptors (TR). These antibodies can either cause thyroid dysfunction directly, e.g., Graves’ hyperthyroidism, caused by antibodies to the TSH receptor (TRAb), or be closely associated with the autoimmune destructive process in the hypothyroidism resulting from Hashimoto’s disease or atrophic thyroiditis, i.e., the antibodies to Tg (TgAb) or to TPO (TPOAb). Table 4 shows the thyroid disorders in which these antibodies are commonly present. However, only occasionally is their measurement of clinical utility.

Table 4. Thyroid autoantibodies commonly present with various thyroid diseases.

Thyroid disease	TPOAb	TgAb	TRAb
Hashimoto’s thyroiditis	+	+	-
Atrophic thyroiditis	+	+	-
Postpartum thyroiditis	+	+	+
Graves’ disease	+	+	+
Pregnancy with previous or present Graves’	+	+	+
Thyroid carcinoma	+	+	-

Status of Current Autoantibody Tests

The diagnostic and prognostic use of thyroid autoantibody measurements has been hampered by methodological problems such as suboptimal sensitivity, specificity, and inadequate interlaboratory and international standardization (64). It is thus difficult to compare the analytical sensitivity, specificity and precision profiles of the various methods currently available.

Nomenclature. There has been a proliferation of nomenclature used for the thyroid autoantibodies. The terms used here, TPOAb, TgAb and TRAb, appear to be the most widely used at present and are preferred (57,64,65).

Thyroperoxidase (TPO) Antibodies (TPOAb). TPO antibodies were initially known as anti-microsomal antibodies (AMA). Older techniques for measuring AMA used human thyroid microsomes as a source of antigen to develop immunofluorescence, passive tanned red cell hemagglutination and enzyme-linked immunosorbent methods (64). The microsomal antigen has been identified as TPO (66). Because techniques based on thyroid microsomes are somewhat non-specific and prone to interference by various factors, assays based on TPO itself are preferable although in routine clinical use the general utility of either assay (anti-TPO or anti-microsomal) is about the same.

We recommend that assays specific for TPOAb be used in preference to the less specific anti-microsomal antibody (AMA) assays.

Assays for TPOAb can be based on radioimmunoassay (RIA) or immunometric (IMA) techniques (64). The antigen used for developing the assay can be human, porcine or, more recently, recombinant human TPO. Most current TPOAb immunoassays are quantitated in international units (U/L) using the MRC 66/387 reference preparation. Despite the use of the same standard, the specificity of the various tests appears to differ, and the methodological principles underlying the test, as well as the purity of the TPO antigen, appear to influence assay performance. Inter-method coefficients of variation range from 65% to 87%.

The analytical sensitivity for most methods ranges from 0.3 to 1.0 U/L; however, because functional sensitivities have not been reported, it is difficult to compare interassay precision estimates except for values easily detected in the assays being assessed.

The normal reference interval for TPOAb assays remains controversial. When very sensitive methods are employed, TPOAb and/or other thyroid autoantibodies are detected in many healthy persons with completely normal thyroid function; thus the biological significance of low levels of TPOAb is not clear. These low levels of TPOAb may be normal variants, false positives, or reflect true underlying thyroid autoimmunity. Conversely, however, there is no doubt that the large majority of patients with autoimmune thyroid diseases, such as Hashimoto’s thyroiditis, atrophic thyroiditis, postpartum thyroiditis, or Graves’ disease, have detectable TPOAb. The variable reported values for diagnostic specificity and sensitivity of TPOAb measurements in these conditions largely relate to the inter-assay variables just noted and to the different cut-off levels used to diagnose TPOAb positivity (67,68).

Antithyroglobulin antibodies (TgAb). As with TPOAb, serum TgAb methods have evolved from the relatively insensitive tanned red cell hemagglutination assay to the more sensitive radioimmunological and, more recently, chemiluminescent IMA techniques (Spencer CA, unpublished). Some methods are calibrated against the MRC 65/93 reference preparation whereas other methods have used Tg affinity chromatography or in-house calibrators made from pools of patients’ TgAb-positive sera (64,69). As with TPOAb methods, the use of the same standard has not ensured that the several methods are quantitatively similar. The inter-method variability of serum TgAb values probably reflects qualitative differences in autoantibody affinities in different serum samples from patients with different underlying immunologic defects or matrix differences among dilution media (70). As with TPOAb assays, functional sensitivities have not been assessed and the inter-assay precision of different methods is difficult to compare.

When sensitive techniques are used, a high prevalence of TgAb is again found in adults without apparent thyroid disease (71); the clinical significance of low TgAb levels is not clear. Because TgAb are most often present in association with TPOAb in patients with autoimmune thyroid disease, the measurement of TgAb adds little diagnostic information to the definition of suspected thyroid dysfunction (36). Sensitive serum TgAb methods are mainly needed to identify sera with TgAb that may interfere with serum thyroglobulin measurements (57,72,73).

TSH receptor antibodies (TRAb). Methods for measuring thyrotropin (TSH) receptor antibodies are even more varied than those for TgAb and TPOAb because they include various bioassays and different types of receptor assays. The exact epitope(s) on the TSH receptor reacting with these various assays for TRAb is not known.

TRAb are heterogeneous with respect to their biologic actions which include stimulating or blocking the thyrotropin receptor, stimulating thyroid growth, and inhibiting the binding of TSH. The biological action of an individual patient’s TRAb may even change over the course of time, e.g., from blocking to stimulating the TSH receptor or vice versa.

The usual methods for measuring TRAb and their clinical value have been recently reviewed (74). The only commercially available methods are based on TSH-binding inhibitory immunoglobulin (TBII) activity expressed as a percentage inhibition of the binding of 125-I-bovine TSH to a TSH receptor preparation. No international reference preparation exists and the values obtained thus depend on the individual methods and the reference population used to determine the cut-off limit for positivity.
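As an illustration of how a percentage inhibition might be calculated, the sketch below compares the labeled TSH bound in the presence of the patient's serum with that bound in the presence of a negative reference serum, after subtracting nonspecific binding. This formula is an assumption reflecting common receptor-assay practice; individual commercial methods may define the calculation differently, and the parameter names are hypothetical.

```python
def percent_inhibition(sample_cpm, reference_cpm, nonspecific_cpm=0.0):
    """Percent inhibition of labeled TSH binding caused by a test serum.

    sample_cpm: counts bound in the presence of the patient's serum.
    reference_cpm: counts bound in the presence of a negative reference serum.
    nonspecific_cpm: counts bound without receptor (assumed correction).
    """
    specific_reference = reference_cpm - nonspecific_cpm
    if specific_reference <= 0:
        raise ValueError("reference binding must exceed nonspecific binding")
    specific_sample = sample_cpm - nonspecific_cpm
    return 100.0 * (1.0 - specific_sample / specific_reference)
```

A serum that enhances rather than inhibits binding yields a negative value with this calculation.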

The level for positivity of TRAb depends on the design of the method. Depending on whether the abnormality sought is stimulation or inhibition, it may be the value more than 2SD or less than 2SD from the mean of a group of normal persons. Precision is highly variable and has rarely been related to the measured level of positivity.
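A 2SD cut-off of this kind can be sketched as follows. The sketch assumes that the results of the normal group are approximately normally distributed; the function names are illustrative.

```python
from statistics import mean, stdev


def two_sd_limits(normal_values):
    """Return (lower, upper) limits at mean - 2SD and mean + 2SD."""
    centre, spread = mean(normal_values), stdev(normal_values)
    return centre - 2.0 * spread, centre + 2.0 * spread


def classify(value, limits):
    """Place one result relative to the limits of the normal group."""
    lower, upper = limits
    if value > upper:
        return "above"
    if value < lower:
        return "below"
    return "within"
```

Which direction counts as positive depends on the design of the method and on the abnormality sought. Because the limits are derived from the reference group, a different group of normal persons will give a different cut-off.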

Measurement of TRAb has mainly been used to diagnose or predict the relapse of Graves’ disease; recent studies suggest that TRAb adds little clinical information (74). The marginal clinical value together with the typically high cost of these tests (at least in the United States) suggests that TRAb measurements are not cost-effective in the diagnosis and management of Graves’ hyperthyroidism. However, the measurement of TRAb makes good sense in pregnant women with present or past Graves’ disease; it allows one to assess the risk of fetal or neonatal thyrotoxicosis secondary to transplacental passage of maternal TRAb (75). In this clinical situation it is important to use a method that detects both TSH receptor-stimulating and -blocking types of TRAb.

Performance Goals for Autoantibody Tests

Specificity and standardization. The specificity of methods for measuring TPOAb, TgAb and TRAb varies widely. Compounding this problem is the fact that most sera with thyroid autoantibodies contain a variety of antigen-specific immunoglobulins of different classes and subclasses. In the case of TRAb, these differences may even lead to different biological actions. The inherent heterogeneity of thyroid autoantibodies presents a major problem for standardizing the methods, because different methods appear quantitatively different even when standardized by the use of an international reference preparation (64).

A source of pure antigen is clearly important. In the case of TPOAb, there is an international preparation (MRC 66/387) available for a fee from the National Institute for Biological Standards and Control, Hertfordshire, U.K.

We recommend that the standard used be stated for all TPOAb assays.

Sensitivity and precision. The sensitivity of the thyroid antibody (TAB) assays is less important than it is for the TSH assay but its definition is still of help in setting the lower reporting limit of the assays. For those laboratories that perform large numbers of assays of TAB, it is useful to determine the functional sensitivity as for TSH over six- to eight-week periods.

We recommend that functional sensitivity be determined for each thyroid antibody assay using the same approach as for TSH.

A realistic determination of functional sensitivity is important since serial thyroid autoantibody measurements may be used to track the progression of a thyroid condition or evaluate a response to therapy (74).

Reference intervals. As with the reference intervals for serum Tg and TSH, the reference intervals for TAB should be based on a population of biochemically euthyroid persons without family or personal evidence of thyroid disease. Whether individuals with low levels of TPOAb and/or TgAb should be included remains in question until long-term follow-up studies on such individuals show that they have no increased risk for developing thyroid dysfunction; for the
present.