Oxford Textbook of Public Health
Anthony B. Miller and Vivek Goel
General principles of screening
General principles governing the introduction of screening
The validity of a screening test
The acceptability of the test
The ethics of screening
The population to be included in screening programmes
Diagnosis and treatment of the discovered lesions
Evaluation of screening programmes
Organized screening programmes
Health-related quality of life and screening
Economics of screening
Genetic susceptibility testing
General principles of screening
Ideally, the control of a disease should be achievable, either by preventing the disease from occurring or, if it does occur, by curing those who develop it with appropriate treatment. Completely successful prevention would make treatment obsolete. However, completely successful treatment would not make prevention obsolete, as there are costs and undesirable sequelae from the disease and treatment which patients and society would like to avoid if at all possible, especially from diseases such as cancer, diabetes, and hypertension. At present, neither prevention nor treatment is completely successful for most diseases. They will continue to complement each other while, for a number of conditions, another approach to control may prove to be appropriate and complementary to one or both of the other approaches. Such an approach is screening.
Because of the deep-rooted belief among physicians that ‘early diagnosis’ of disease is beneficial, many regard screening as bound to be effective. However, for a number of reasons this is not necessarily the case, as shown for example by the failure of screening for lung cancer using sputum cytology or chest radiographs to reduce mortality from the disease (Prorok et al. 1984). It is the purpose of this chapter to attempt to define some of the fundamental issues that are relevant to the consideration of screening in public health. In that context, we shall describe approaches to evaluating the efficacy of screening before it can be accepted as an established disease control measure and briefly consider the extent to which programmes proposed or underway for some diseases comply with these criteria.
It is often assumed that screening tests must involve some sort of technological procedure, such as a radiograph or laboratory test. However, screening can involve simple clinical examinations, such as assessment of blood pressure, or a history, such as the Michigan Alcohol Screening Test, which is a set of questions (Selzer et al. 1979). Nevertheless, it is the advent of expensive technologically based screening tests in the last few decades which has focused attention on the need for critical evaluation of screening and the importance of screening programmes. With the introduction of genetic susceptibility testing, as a result of the human genome programme, it becomes even more essential for public health practitioners to have a thorough understanding of the principles of screening.
Screening was defined by the United States Commission on Chronic Illness (1957) as ‘the presumptive identification of unrecognized disease or defect by the application of tests, examinations or other procedures that can be applied rapidly.’ A screening test is not intended to be diagnostic. Rather, a positive finding will have to be confirmed by special diagnostic procedures.
By definition, screening is offered to those who do not suspect that they may have a disease. This is subtly different from being asymptomatic. Symptoms may be revealed by careful questioning related to the organ of interest which may not be regarded by the screenee as being related to a possible disease. Furthermore, in a public health programme open to all comers, it may not be possible to determine that all subjects who enrol were truly asymptomatic. Indeed, many may enrol because they have a suspicion that they have the disease of interest, and they hope that their suspicion will not be confirmed. Thus although we, in common with others, will make the assumption subsequently that participants in screening programmes are asymptomatic, we do not imply that this is a necessary or an absolute prerequisite for participation in public-health-based screening programmes.
General principles governing the introduction of screening
The principles that should govern the introduction of screening programmes were first enunciated by Wilson and Jungner (1968) and have since been refined by a number of authors (Miller 1978; Cuckle and Wald 1984; Miller 1996). These principles (in their refined form) are considered below.
The disease should be an important health problem. In practical terms this means that the disease prevalence should be high and the disease should be the cause of substantial mortality and/or morbidity. However, it is important to recognize that the life expectancy of a screened population may be changed little even if the programme is successful. For example, even if all cancer were to be eradicated in most technically advanced countries, the effect of other competing causes of death is such that life expectancy would be increased only by about 2.5 years. The benefits of cholesterol screening to prevent heart disease are measured, at the population level, in days of life expectancy gain per person screened (Kristiansen et al. 1991). It is not possible to provide a precise estimate for the level of burden necessary to mount a screening programme. The level of morbidity and mortality considered to be important will depend on a combination of factors such as the age distribution of the population affected, or the severity of the illness. There may be certain circumstances when the major benefit from screening may follow, not from reduction in mortality, but from reduction of morbidity consequent upon the diagnosis of the disease in a more treatable phase in its natural history. This could mean that the extent of treatment required and the possibility that treatment may be debilitating or mutilating would be much less. Such advantages may be difficult to quantify; however, as they may be considerable in psychological terms to individuals, and to communities in the lowering in the requirements for extensive rehabilitation services, they should not be overlooked.
The disease should have a detectable preclinical phase. It is important to recognize that this principle is not ‘The natural history of the disease should include a phase with a detectable precursor’. For example, for many cancers, including breast and prostate cancer, the detectable preclinical phase is largely asymptomatic invasive cancer rather than a precursor. Conversely, for cervical cancer the detectable preclinical phase probably includes the whole range from dysplasia through to occult invasive cancer. In screening for cardiovascular disease by measuring cholesterol or blood pressure, physiological markers of risk are being identified rather than detectable precursors. Nevertheless, such a point in time can be considered to be a preclinical phase. The key principle is that there should be a phase that screening detects before clinical diagnosis of the disease is possible.
The natural history of the condition should be known. Ideally, such a requirement implies that it is known at what stage in the disease process progression, disability and/or death can no longer be prevented. If such information is available and the stage that the development of the disease had reached in individuals is determinable, it would be possible to decide precisely when a screening test should be applied in order to achieve maximum benefit and minimal overutilization of resources. Unfortunately it seems unlikely that knowledge will be accumulated to be able to determine the natural history of disease in individuals in such a precise fashion. It is recognized that the rate of progression of clinically detected disease from the point of diagnosis to cure or to death varies substantially in different individuals. The distribution of rates of progression of preclinically detectable disease that might be identified by screening is likely to be equally wide. Thus, although an objective for research on screening has to be to determine the extent of the distribution of the sojourn times of the detectable preclinical phase, in considering the introduction of screening programmes and the scheduling of tests within programmes, it is necessary to balance benefits with costs. This means that a schedule will have to be determined that will enable the detection of the maximal number of still curable cases compatible with the longest interval between tests.
It should also not be assumed that disease processes are inevitably progressive. For cancer of the cervix, for example, it has been determined that in situ carcinoma may undergo regression in a large proportion of cases (Boyes et al. 1982), as do most cases of mild and moderate dysplasia of the cervix (Holowaty et al. 1999). Such conclusions have substantial implications with regard to the optimum frequency of screening examinations. Designing a programme directed to those lesions that, in the absence of screening, will progress and more rapidly escape curability, if they can be identified, will be the appropriate approach. Designing a programme which maximizes the detection of cases with good prognosis, but which in the absence of screening may be unlikely to progress, will waste resources.
The disease should be treatable, and there should be a recognized treatment for lesions identified following screening. This principle has been elaborated as follows:
There should be evidence of the effectiveness of treatment of lesions discovered as a result of screening in reducing mortality and the level of improvement expected should be stated; [and secondly,] there should be reasonable expectation that recommendations for the appropriate management of the lesions discovered from a screening programme will be complied with both by the individual with the lesion and by the physician responsible for his (or her) health care. (Miller 1978)
Screening programmes should only be set up when there are adequate facilities for treating lesions discovered as a result of screening and functioning referral systems for securing such treatment. There is obviously no point in establishing a screening programme and identifying lesions that should be treated if the facilities are not available, or the infrastructure for referral, confirmation of diagnosis, and treatment is not in place. In general this is not a problem for technically advanced countries, but it can be for developing countries. Unfortunately problems allied to these have occurred. Thus on occasions it has not been certain whether or not lesions identified as a result of screening should be regarded as true disease precursors. When lesions are first identified in a screening programme, information may not be available as to their appropriate treatment, and special studies may be required. Otherwise, errors in terms of observation rather than treatment on the one hand, or too extensive treatment on the other, are possible. In prostate cancer screening, for example, if too radical treatment is applied in the elderly to the latent or good prognosis prostate cancers that may be identified in a screening programme, the morbidity in terms of incontinence and impotence, and even the mortality from treatment, could offset any benefit from the earlier detection of lesions with truly malignant potential (Chodak and Schoenberg 1989; Miller 1991; Krahn et al. 1994).
A different sort of difficulty could arise when, as a result of screening, lesions are diagnosed earlier in their natural history, but in spite of this, death is still inevitable. For example, if the available screening methods will not succeed in diagnosing disease before it is outside the range of current therapy, then screening to detect such disease is not worthwhile. The early studies of screening for lung cancer suggested that this was not a condition amenable to screening, probably for this reason (Prorok et al. 1984).
There is also a difficulty with the last two criteria when some different screening paradigms are considered. For example, with prenatal screening of the mother, as in maternal serum screening (or triple marker screening), a marker is being sought for risk of congenital anomaly in the fetus (Wald et al. 1997). A precursor to the disease is not sought, rather disease in the fetus is being identified. In the absence of truly effective fetal surgery for most of these conditions, treatment is not available. However, screening can lead to greater information for the family which can lead to better preparation for delivery or assist with decisions about termination of the pregnancy.
The screening test to be used should be acceptable and safe. In general, this implies a non-invasive test with high validity. Other criteria of a good screening test include ease of use and relatively low cost. These principles and various approaches to assessing validity will now be discussed.
The validity of a screening test
Two measures suffice to describe the validity of screening tests: sensitivity and specificity.
Sensitivity is defined as the ability of a test to detect all those with the disease in the screened population. This is expressed as the proportion of those with the disease in whom a screening test gives a positive result.
Specificity is defined as the ability of a test to identify correctly those free of the disease in the screened population. This is expressed as the proportion of people free of the disease in whom the screening test gives a negative result.
Table 1 illustrates how these measures can be derived. These two terms may be further expressed in terms of test results as follows: sensitivity is calculated as the true positives divided by the sum of the true positives and false negatives and expressed as a percentage; specificity is calculated as the true negatives divided by the sum of the true negatives and the false positives and expressed as a percentage.
Table 1 Derivation of the sensitivity and specificity of a screening test
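The two definitions above can be illustrated with a minimal sketch. The function name and the counts are hypothetical and chosen only for illustration; they are not taken from any screening programme.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) as percentages from 2x2 counts."""
    diseased = true_pos + false_neg        # everyone who has the disease
    disease_free = true_neg + false_pos    # everyone free of the disease
    if diseased == 0 or disease_free == 0:
        raise ValueError("need at least one diseased and one disease-free person")
    sensitivity = 100.0 * true_pos / diseased
    specificity = 100.0 * true_neg / disease_free
    return sensitivity, specificity


# Hypothetical counts for 10,000 people screened (illustration only).
sens, spec = sensitivity_specificity(
    true_pos=80, false_neg=20, false_pos=990, true_neg=8910
)
print(f"sensitivity = {sens:.1f}%, specificity = {spec:.1f}%")
```

Note that the false negatives enter only the sensitivity and the false positives only the specificity, which is why the two measures have to be assessed separately.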
In practice, difficulties with these measures arise over defining a positive result from the test as well as distinguishing the true positives from the false positives among those who test positive, and the true negatives from the false negatives among those who test negative. A relatively imperfect test of a quantitative continuously distributed measurement can be artificially given a very high sensitivity by setting the boundary between negative and positive to incorporate a high proportion of those who are eventually found to have the disease in the positive category, but at a substantial cost in terms of low specificity. Conversely, the same test can be made to appear highly specific, but will then become insensitive, if the boundary between positive and negative is shifted in the opposite direction.
If the test result is expressed in a quantitative form so that the boundaries between what is defined as positive and negative can be varied at will, it is possible to plot a receiver operating characteristic curve (Swets 1979). What is plotted is the sensitivity on the vertical axis and 1-specificity (proportion of false positives) on the horizontal axis. The point on the curve that is chosen as optimal is that furthest from the 45° diagonal. Receiver operating characteristic curves are most easily derived for blood tests, but have also been applied to mammography, by varying the extent to which different mammographic abnormalities were regarded as an indication of suspicion of malignancy (Goin and Haberman 1982). Such curves cannot be applied to a test with a dichotomous outcome. Furthermore, they imply a similar weight to sensitivity and specificity, which as discussed below may not be ideal.
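The construction of such a curve can be sketched as follows. The measurements are invented, and the sketch assumes that higher values indicate disease. The perpendicular distance of a point from the 45° diagonal is proportional to sensitivity minus (1-specificity), so the point furthest from the diagonal is found by maximizing that difference.

```python
def roc_points(values_diseased, values_healthy):
    """Return a list of (cutoff, sensitivity, false_positive_rate).

    A result is called positive when the measurement is >= cutoff
    (assumption: higher values indicate disease).
    """
    cutoffs = sorted(set(values_diseased) | set(values_healthy))
    points = []
    for c in cutoffs:
        sens = sum(v >= c for v in values_diseased) / len(values_diseased)
        fpr = sum(v >= c for v in values_healthy) / len(values_healthy)
        points.append((c, sens, fpr))
    return points


def furthest_from_diagonal(points):
    """Pick the point maximizing sensitivity - (1 - specificity)."""
    return max(points, key=lambda p: p[1] - p[2])


# Hypothetical measurements (illustration only).
diseased = [4.1, 5.0, 5.6, 6.2, 7.0, 7.4]
healthy = [2.0, 2.5, 2.8, 3.1, 3.3, 3.9, 4.4, 5.2]

points = roc_points(diseased, healthy)
best_cutoff, best_sens, best_fpr = furthest_from_diagonal(points)
print(f"cutoff {best_cutoff}: sensitivity {best_sens:.2f}, 1-specificity {best_fpr:.2f}")
```

This selection rule weights sensitivity and specificity equally; a programme that values one more than the other would need a different criterion.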
The position of the boundaries that are set between what is regarded as disease and non-disease (or benign disease) can also considerably influence the numerical values placed on sensitivity and specificity. This arises because of uncertainty as to what truly constitutes an abnormality in the context of a screening programme. In order to come to such a decision it is essential that the conditions identified as a result of screening should have a known natural history. However, as discussed above, such knowledge may not be available at the initiation of a screening programme and may only be obtained as a result of careful study of findings from screening programmes.
Nevertheless, the definition as to what constitutes disease is crucial in order to determine sensitivity and specificity. Most people have a clear idea as to what they regard as disease in terms of that which surfaces in standard medical practice. By definition, screening is conducted on asymptomatic individuals, so that many conditions that are identified through screening are likely to be at an early stage and may not have the generally recognized clinical characteristics of relatively advanced disease. This difficulty should theoretically be overcome by having clearly defined definitions of disease. However, for cancer, diagnosis is usually made on histology, and histology only imperfectly characterizes behaviour, especially for lesions within the detectable preclinical phase. For cancer, one hope for the future is that some of the markers for prognosis currently being evaluated such as markers of oncogene expression or other markers of DNA change may serve to identify those precancerous or in situ components of the detectable preclinical phase that are likely to progress.
A common error in evaluating potential screening tests is to determine the sensitivity by utilizing the experience of the test in relation to people who have clinical disease. A test that may appear to be highly sensitive under these circumstances may later be found to be much less sensitive when its ability to detect the detectable preclinical phase is evaluated. A similar error is substituting an intermediate marker as the gold standard for calculating sensitivity and specificity. For example, the test characteristics for a thyroid assay may be compared with a putatively more accurate assay, rather than with the presence of the condition in the individual tested. Therefore, in the screening context, sensitivity and specificity may vary according to whether they are estimated for early disease or preclinical lesions, and both measures should be determined in active screening programmes. To do so for specificity is very much easier than for sensitivity. This is because the diagnostic process put in train by a positive screening test generally fairly rapidly identifies those who have the disease and thus distinguishes the true from the false positives. As under most circumstances the proportion of those who have the disease in relation to the total population screened is low, a very good approximation to specificity is obtained by dividing the number who tested negative by the sum of the test negatives and false positives. Including the unidentified false negatives in the numerator and denominator of this expression will in practice introduce little error.
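The size of the error in this approximation can be checked with a minimal sketch. The counts are hypothetical: 10,000 people screened, of whom 50 have the disease, 40 are detected and 10 are missed.

```python
def approximate_specificity(test_negatives, false_positives):
    """Approximation available in practice: all test negatives are used,
    including the false negatives that have not yet been identified."""
    return test_negatives / (test_negatives + false_positives)


def exact_specificity(true_negatives, false_positives):
    """Exact specificity, computable only once the false negatives are known."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts (illustration only).
true_negatives = 9450
false_negatives = 10      # missed cases, unknown at the time of screening
false_positives = 500
test_negatives = true_negatives + false_negatives

approx = approximate_specificity(test_negatives, false_positives)
exact = exact_specificity(true_negatives, false_positives)
print(f"approximate = {approx:.4%}, exact = {exact:.4%}")
```

With a disease this uncommon the two values differ by less than one hundredth of a percentage point; the approximation always slightly overstates the exact figure.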
However, sensitivity is a difficult measure to determine initially in a screening programme. The reason is that the false negatives are not immediately apparent, as there is no justification to retest all the test negatives just to identify a few false negatives. Only by following the total population who screened negative is it eventually possible to identify those who had the disease at the time the test was administered but were not so identified at the time of screening. This is facilitated if test materials are retained; for example, cervical smears or mammograms originally classed negative can be reassessed for those who are found to have disease at the next scheduled screen, or who develop disease during the interval between screens. Such reassessments should preferably be made blind to avoid bias. We have used such an approach in the assessment of the ‘reader error’ component of false negatives in cervical cancer screening (Boyes et al. 1982) and in the assessment of the sensitivity of mammography in a trial of breast cancer screening (Baines et al. 1988).
When test materials cannot be retained, such as in the assessment of the sensitivity of physical examination as a screening test for breast cancer, and for what we have called the ‘taker error and the biological component’ of false negatives in cervical cytology, i.e. disease that was indeed present but was not incorporated in the smear or for some reason did not exfoliate (Boyes et al. 1982), a direct identification of false negatives will not be possible. The usual approach, which we used in estimating the sensitivity of physical examination (Baines et al. 1989), is to assume that cases of disease occurring within a certain period are false negatives. However, a possibly more satisfactory approach is to assess the expected detection rate of disease on screening after repeated screens, assuming that most of the false negatives had by then been identified, and to regard the excess disease above this level at the second screen as a measure of the false negatives at the first screen. As a result of such an approach it was determined that the taker and biological component of false negatives was approximately equal to the directly measured reader error, so that the level of sensitivity for cervical cytology approximated to 78 per cent (Miller 1981).
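The second approach can be sketched as follows. This is only one possible reading of the method described above, not the calculation used in the cited studies; the function name and all rates are hypothetical, and the sketch ignores false negatives that surface clinically between screens.

```python
def sensitivity_from_excess(rate_first, rate_second, rate_steady):
    """Estimate first-screen sensitivity from detection rates per 1000 screened.

    rate_first  : detection rate at the first screen
    rate_second : detection rate at the second screen
    rate_steady : detection rate after repeated screens, when most of the
                  earlier false negatives are assumed to have been found
    """
    # Excess at the second screen is taken as the false negatives of the first.
    missed_at_first = max(rate_second - rate_steady, 0.0)
    return rate_first / (rate_first + missed_at_first)


# Hypothetical detection rates per 1000 screened (illustration only).
estimate = sensitivity_from_excess(rate_first=8.0, rate_second=3.0, rate_steady=1.0)
print(f"estimated sensitivity = {estimate:.0%}")
```

The estimate depends heavily on the assumption that the later screens have reached a steady detection rate.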
In a workshop on screening for cancer (Miller 1978), two recommendations were made with regard to the validity of screening tests:
the sensitivity and specificity of the screening test to be used should have been evaluated and their expected values stated
there should be an acceptable programme of quality control to ensure that the stated levels of sensitivity and specificity are attained and maintained.
Quality control involves issues that concern not only the validity of the screening test but also its safety. For example, it is necessary to ensure that radiation exposure does not drift upwards in a mammography screening programme. Quality control encompasses the training of those who will actually administer and read screening tests, their supervision, and the introduction of procedures to check actively on the extent to which those positive or negative are misclassified.
Quality may suffer because of overwork and boredom. One of the reasons why a recommendation was made to change the frequency of examination for most women in cervical cytology screening programmes in Canada was to avoid repetitive rescreening of normal women, and the flooding of laboratories with unnecessary and unrewarding work (Task Force 1976). The Task Force described the mechanisms for ensuring appropriate quality control. It is relevant that these requirements had to be re-emphasized more than a decade later (Miller et al. 1991a).
That such issues are not simple was underlined by consideration of observer variation in mammography reading (Boyd et al. 1982). Relevant to all screening programmes is not only the accuracy with which abnormalities are identified, but also, if they are identified, the extent to which appropriate recommendations are made on their management. Our experience suggested that including a category of ‘probably benign’ in a screening mammography report increases the extent of observer variation. Readers differ substantially in the extent to which they use this category, the extent to which they recommend special observation of individuals placed in this category, and the extent to which they recommend biopsy. Dual reading helps to increase specificity without much, if any, loss of sensitivity. This permits the simplification of recommendations into two groups, ‘suspicious of malignancy’ and ‘satisfactory (normal)’ examination, and results in far greater consistency. Furthermore, it is compatible with the appropriate separation of findings from screening tests into the probably abnormal (test positive) and probably normal (test negative) dichotomy. The probably abnormal group is subjected to diagnostic tests in the normal way. This approach to use of screening mammography was accepted with difficulty in North America, because of an initial tendency for most radiologists to regard mammography as a diagnostic rather than a screening test. This resulted in greater use of biopsy as a diagnostic test in North America than that reported from Europe (McLelland and Pisano 1992), where more use was made of diagnostic mammography subsequent to screening mammography (often called by European radiologists ‘complete’ mammography) with a consequent reduction in biopsies and a much lower benign to malignant ratio.
Most commentators in the past, when considering the relative weight to be placed on sensitivity and specificity, tended to encourage high sensitivity at the cost of relatively low specificity, as it was felt important to attempt to avoid missing individuals who truly had disease. One vigorous exponent of this view for breast cancer screening was Moskowitz, who coined the term ‘aggressive screening’, as he felt that ‘minimal’ breast cancers with an excellent prognosis would only be identified by such an approach (Moskowitz et al. 1976). However, there continues to be little evidence that such cancers are really responsible for the mortality reduction following breast cancer screening (Miller 2000a). Rather, there is much evidence that the early diagnosis of more advanced disease results in a benefit (Miller 1987, 1994). A disadvantage of aggressive screening was a high benign-to-malignant ratio and low specificity of the screen. Although the objective of screening is to identify disease in the detectable preclinical phase before it reaches the stage of escaping from curability, if a test is made so sensitive that it picks up lesions that would never have progressed in that individual’s lifetime, there will be substantial additional costs for diagnosis and treatment (this is one consequence of the ‘overdiagnosis’ bias, which is more fully discussed in relation to survival of cases following screen detection below). There is no point in identifying through screening disease which would never have presented clinically, and little point (other than less radical therapy) in identifying early disease that would have been cured anyway if it had presented clinically. Similarly, identifying disease that results in death, even following screen identification and subsequent treatment, only results in greater observation time and no benefit to the screenee.
It is only disease that results in death in the absence of screening, but which is cured following treatment after screen detection, from which the real benefit of a screening programme derives. Hence, if high ‘sensitivity’ is largely based on finding more good prognosis disease, but results in lowering specificity, the programme will incur much greater costs without corresponding benefit.
The process measure, as distinct from a measure of validity, that most clearly expresses this difficulty is the predictive value of a positive screen. This is defined as the proportion of those who test positive who truly have the disease. This measure is influenced not only by the sensitivity and specificity of the test, especially the latter, but by the prevalence of disease in the population, whereas sensitivity and specificity are invariant with regard to disease prevalence. If tests are administered under circumstances that incur a low predictive value positive, then not only may costs be high in terms of correctly identifying those who are falsely positive, but also the potential hazard may be high, as an individual classified as positive falsely derives no benefit and potentially a substantial risk from the associated diagnostic procedures. A test with a low predictive value positive rapidly enters into disrepute.
To complete discussion of process measures, the predictive value negative should be defined. This is the proportion of those who test negative who are truly free of the disease. This measure, like sensitivity, is dependent on identifying the false negatives; it is therefore rarely determined and is of little operational value. In practice, however, it is usually high.
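The dependence of the two predictive values on prevalence can be shown with a minimal sketch. The sensitivity, specificity and prevalences used are hypothetical; the calculation simply applies the definitions above to the expected proportions in each cell of the two-by-two table.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (predictive value positive, predictive value negative).

    All arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical test: 90 per cent sensitivity, 95 per cent specificity.
results = {}
for prevalence in (0.001, 0.01, 0.1):
    results[prevalence] = predictive_values(0.90, 0.95, prevalence)
    ppv, npv = results[prevalence]
    print(f"prevalence {prevalence:.3f}: PPV = {ppv:.1%}, NPV = {npv:.2%}")
```

With the test characteristics held fixed, the predictive value positive falls steeply as prevalence falls, while the predictive value negative remains close to 100 per cent.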
As Day (1985) has pointed out, because of the difficulty in identifying false negatives, and because of the overdiagnosis bias, the usual approach to defining sensitivity is not ideal, nor particularly biologically meaningful. He suggested an alternative measure of sensitivity which can be derived if the expected incidence of disease in the absence of screening can be determined, ideally from the control group in a randomized trial, but sometimes in population based programmes from historical data or data from comparable unscreened populations. The method basically computes the extent a programme is successful in reducing the expected incidence of disease in the absence of screening. The lower the proportion of expected incidence occurring after screening the greater the sensitivity.
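A minimal sketch of this idea follows. It is one simple reading of the measure described above, and the exact estimator in Day (1985) may differ; the function name and all figures are hypothetical.

```python
def incidence_based_sensitivity(observed_cases, person_years, expected_rate):
    """Estimate sensitivity as 1 - (observed / expected incidence).

    observed_cases : cases arising after a negative screen
    person_years   : follow-up time of those who screened negative
    expected_rate  : incidence per person-year expected without screening,
                     e.g. from the control group of a randomized trial
    """
    expected_cases = expected_rate * person_years
    if expected_cases <= 0:
        raise ValueError("expected number of cases must be positive")
    return 1 - observed_cases / expected_cases


# Hypothetical figures: 30 cases in 50,000 person-years of follow-up,
# against an expected incidence of 2 per 1000 person-years.
estimate = incidence_based_sensitivity(
    observed_cases=30, person_years=50_000, expected_rate=0.002
)
print(f"estimated sensitivity = {estimate:.0%}")
```

The estimate is only as good as the expected incidence, which is why a randomized control group is preferred to historical data.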
The acceptability of the test
One of the desirable attributes of a good screening test is that it should be acceptable to the population to which screening is offered and acceptable to those who will administer the test. In general, cervical cytology screening programmes have found acceptance with women and their physicians, except for women who tend to be at highest risk of the disease. This results in lower effectiveness of programmes than would be the case if all women were to be included. This lack of acceptance is largely related to lower socio-economic status, where presumably other health concerns take precedence over a long-term preventive manoeuvre such as screening.
Breast cancer screening has encountered different problems over acceptability, although this varies substantially in different countries, ranging from the 90 per cent acceptance of screening invitations in Sweden (Tabar et al. 1985) to the difficulties with compliance by both physicians and women in the United States (Howard 1987). In Europe, a median uptake of 74 per cent has been reported (European Society for Mastology 1993).
Screening for colorectal cancer also has its own difficulties, particularly in the inevitable distaste of individuals for a procedure that involves manipulation of faeces. In a number of pilot programmes, therefore, the return rates for haemoccult slides have been low, although they have been better in well-organized studies (Chamberlain and Miller 1988), and achieved approximately 75 per cent in the Minnesota Colon Cancer Screening Trial (Mandel et al. 1999).
Therefore a screening test has to be acceptable to the population in its widest sense. The test should be simple and as far as possible easily administered. It should involve procedures that are not unacceptable, and its use should not have unpleasant or potentially hazardous implications. There are also economic advantages in a test being administered or read by allied health professionals, such as use of technologists in initial screening of cervical cytology slides (Anderson 1985), or the use of nurses to perform breast examinations (Bassett 1985; Miller et al. 1991b).
The ethics of screening
In general medical practice the special nature of the relationship between patient and physician has dictated the need to build up a core of ethical principles that govern this relationship. Furthermore, it is generally accepted that additional issues arise when a patient becomes the subject of a research investigation that is superimposed upon his or her search for and receipt of appropriate medical care. However, it was not initially appreciated that screening opened up a completely new spectrum of issues, possibly requiring more restrictive boundaries of ethical behaviour than those applied in usual medical care. For example, when a patient goes to see a physician for relief of a symptom or treatment of an established condition, the physician is required to exercise his or her skills only to the extent that knowledge is currently available, while doing what is possible with available expertise and appropriate assistance to help the patient. Treatment may be offered without any implied guarantee that it is necessarily efficacious or will do more than just temporarily relieve the symptoms of which the patient complains. Thus the physician promises to do his or her best for the patient; there is no implied promise that the patient will be cured.
In screening, however, those who are approached to participate are not patients and most of them do not become patients. The screener believes that as a result of screening the health of the community will be improved. He or she does not necessarily intend to imply that the condition of every individual will be improved. However, screening is often promoted as if it implies a benefit to everyone who is screened. In fact, in some circumstances individuals included in a screening programme may be placed at a disadvantage, as discussed above. Furthermore, the harm from a screening test is not only related to the risk of being false positive or false negative. Those who are screened may also incur psychological consequences, sometimes merely from being labelled as being at risk of disease (Glanz and Gilboy 1995). Therefore, at the very least, those planning to introduce a screening programme should be in a position to guarantee overall benefit to the community and a minimum of risk that certain individuals may be disadvantaged by the programme. It was the inability to guarantee overall benefit and lack of disadvantage for those screened that led to the proscription of mammography in women under the age of 50 in the Breast Cancer Detection Demonstration Projects in the United States (Beahrs et al. 1979).
A second ethical issue, which is directed more to the obligations for appropriate care in the community than towards individuals, concerns how limited resources are equitably distributed across the whole community to obtain maximum benefit. Under certain circumstances the offer of screening could diminish the total level of health in a community. This may be a particular problem for developing countries by diverting resources intended for routine health care into screening. Thus resources diverted to a screening project, which might be regarded as prestigious, especially if involving high technology, could lower the resources available for other more pressing but also more mundane health problems. Although several screening programmes have been proposed for developing countries, there is a particular need for caution and care in order to ensure that they do not overbalance the health care system in the area in which they are introduced.
A final ethical dilemma for screening programmes is how to implement informed consent. Information about risks and benefits of tests and treatments is expected to be provided in usual clinical practice. For screening, providing information about the test alone is not sufficient. Information about the consequences of the test, the diagnostic assessment process and the diseases to be detected, and their treatments should also be presented, if a truly informed decision is to be made. Presenting such a large amount of information is obviously difficult, particularly in a primary care setting where several screening tests may be done at the same time. Furthermore, presenting such information becomes even more cumbersome when the evidence base for a screening test is controversial, as is the case with prostate screening.
Therefore screening programmes carry an ethical responsibility at least as great as that for medical practice in that approaches to participate are made to ostensibly healthy people. Indeed, the burden of proof for efficacy of the procedures and the necessity to avoid harm are greater than may be required for diagnostic or therapeutic procedures carried out when a patient presents with symptoms to a physician. In screening the physician or public health worker initiates the process and he or she bears the onus of responsibility to be certain that benefit will follow.
The population to be included in screening programmes
For a screening programme to be successful, the population to be included should be one in which it is known that the disease has a high prevalence. This will not only encourage a high predictive value for a positive test, but it will also tend to promote higher quality of performance and assessment of results of screening tests, and will result in lower costs per case detected. Thus in all screening programmes it is desirable to attempt to include only those who are at risk of the disease and to concentrate particularly on those who are at high risk of the disease. This approach was recognized by the Canadian Task Force on Cervical Cytology Screening Programs (Task Force 1976) which defined those whom it believed were at such low risk for the disease that they need not be included in cervical cytology screening programmes, thus defining the remaining ‘at-risk’ population on whom major efforts should be concentrated to bring them into screening. In the case of cardiovascular disease, family history and factors such as smoking have been used by some to define populations to be screened (Toronto Working Group 1990).
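The dependence of predictive value on prevalence can be illustrated with a short calculation. The following Python sketch is an illustration only: the function name and the test characteristics (sensitivity 0.90, specificity 0.95) are hypothetical values, not figures taken from any of the programmes cited.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Proportion of positive tests that are true positives."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test characteristics, held fixed while prevalence varies.
for prevalence in (0.001, 0.01, 0.05):
    ppv = positive_predictive_value(prevalence, sensitivity=0.90, specificity=0.95)
    print(f"prevalence {prevalence:.3f}: predictive value positive {ppv:.3f}")
```

With these assumed values the predictive value of a positive test rises from roughly 2 per cent to roughly 15 and 49 per cent as prevalence increases, although the test itself is unchanged.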
However, the known risk factors, apart from age, for other diseases may not suffice to distinguish adequately between those who should be considered for inclusion in screening programmes compared with those who should not. For breast screening, for example, although some discrimination using risk factors has been achieved (Schechter et al. 1986), this has not been sufficient to justify selection on this basis alone. However, age is an important predictor of risk, and for breast cancer in technically advanced countries, all women in the appropriate age group can be regarded as at high risk. Thus for breast cancer currently, it seems unlikely that any programme could justify routine screening of women under the age of 40, while screening women aged 40 to 49 with mammography is controversial.
One possible approach to concentrating on the relevant segment of the population for screening might be to administer a prescreening test, especially if for a marker for a factor necessary in the causation of the disease. Such a test can be envisaged for human papilloma virus infection as a prescreen for cancer of the cervix (Miller et al. 2000b), although a difficulty here is the high proportion of infections that are self-limiting without the development of high-grade cervical intraepithelial neoplasia. This means that the test is too non-specific if used among women under the age of 35. The development of genetic susceptibility testing opens up the possibility of prescreening for a range of diseases. For example, women found to carry genes showing susceptibility to breast cancer may undergo screening with MRI.
Hakama (1986) has pointed out that in programmes that attempt to select for screening on the basis of risk, there will usually be cases occurring in the unscreened group. Another consequence of such programmes, however, will be reduced numbers of false positives (in absolute terms) which with the increased prevalence of the disease will result in a higher predictive value positive of the screen. Hakama (1984) coined the terms ‘programme sensitivity’ and ‘programme specificity’ which help in understanding the effects of screening concentrating on high risk. The more a programme concentrates on ‘high-risk’ groups, the lower the programme sensitivity will be, as more and more cases will occur in unscreened people. Conversely, however, the programme specificity will increase because of the increase in healthy people unscreened, with a reduction in the costs of screening. Because high-risk groups can be identified only imprecisely, the reduction in programme sensitivity will reduce the overall effectiveness of the programme, and the overall result of such an approach could therefore be unacceptable.
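The trade-off between programme sensitivity and programme specificity can be sketched numerically. The code below is our reading of these terms, not Hakama's own formulation: programme sensitivity is taken as the proportion of all cases in the target population detected by screening, and programme specificity as the proportion of all healthy people who do not receive a false-positive result. All counts and test characteristics are hypothetical.

```python
def programme_measures(cases_screened, cases_unscreened,
                       healthy_screened, healthy_unscreened,
                       test_sensitivity, test_specificity):
    """Return (programme sensitivity, programme specificity) for a target population."""
    detected = cases_screened * test_sensitivity
    programme_sensitivity = detected / (cases_screened + cases_unscreened)
    false_positives = healthy_screened * (1.0 - test_specificity)
    total_healthy = healthy_screened + healthy_unscreened
    programme_specificity = (total_healthy - false_positives) / total_healthy
    return programme_sensitivity, programme_specificity


# Hypothetical population of 10 000 with 100 cases; test sensitivity 0.90, specificity 0.95.
everyone = programme_measures(100, 0, 9900, 0, 0.90, 0.95)
# Screening only a 'high-risk' fifth of the population that contains 60 of the 100 cases.
high_risk_only = programme_measures(60, 40, 1940, 7960, 0.90, 0.95)

print("screen everyone:       sensitivity %.2f, specificity %.3f" % everyone)
print("screen high risk only: sensitivity %.2f, specificity %.3f" % high_risk_only)
```

Under these assumptions, restricting screening to the high-risk group lowers programme sensitivity from 0.90 to 0.54 while raising programme specificity from 0.95 to about 0.99.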
One other approach to using risk factors is to help determine the optimal periodicity of rescreening. Once again, however, much of the necessary research is incomplete, and we do not know how appropriate such an approach may be. It will probably be necessary to calculate the marginal cost-effectiveness of extending screening from high- to low-risk groups (i.e. the additional cost for such an extension of screening related to the increase in effectiveness of the screening) in order to make the necessary policy decisions.
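The marginal cost-effectiveness defined in parentheses above is a simple ratio of differences. A minimal sketch follows, with hypothetical costs and effects (expressed here as years of life gained); the function name is ours.

```python
def marginal_cost_effectiveness(cost_base, effect_base, cost_extended, effect_extended):
    """Additional cost per additional unit of effect when screening is extended."""
    extra_effect = effect_extended - effect_base
    if extra_effect <= 0:
        raise ValueError("extension yields no additional effect")
    return (cost_extended - cost_base) / extra_effect


# Hypothetical figures: screening the high-risk group only, then extending to low risk.
high_risk_cost, high_risk_years = 1_000_000, 50
extended_cost, extended_years = 3_000_000, 60

average = high_risk_cost / high_risk_years
marginal = marginal_cost_effectiveness(high_risk_cost, high_risk_years,
                                       extended_cost, extended_years)
print(f"average cost per year of life, high risk only: {average:,.0f}")
print(f"marginal cost per year of life, extension:     {marginal:,.0f}")
```

In this invented example the extension costs ten times as much per year of life gained as the original high-risk programme, which is the kind of comparison a policy decision would need.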
Diagnosis and treatment of the discovered lesions
As a screening test is not diagnostic, inevitably the success of the programme will ultimately depend upon the extent to which those identified as having an abnormal test result accept the procedures offered to them for further evaluation, and upon the effectiveness of the therapy offered.
A number of difficulties may arise. For example, in the initial phases of many breast cancer screening programmes, it was necessary to demonstrate to the general community of medical practitioners that the abnormalities identified were indeed of importance and that they required care and expertise to biopsy. Indeed, in the absence of skills in diagnosis and management, there can be unnecessary biopsy (potentially reducible by the use of diagnostic mammography and fine-needle aspiration biopsies) as well as failure to excise the lesion when biopsy is performed. This is part of the spectrum of problems that arise over the fact that lesions may be identified in screening programmes whose biological features, natural history, and other characteristics may be in doubt. The screening participants may require special education so that they understand the diagnostic process to reduce as far as possible one of the major adverse consequences of screening, the anxiety accompanying the identification of an abnormality, as well as ensuring that they comply with the recommendations for management. There may even be major disagreements over the histological interpretation of the excised lesions, with uncertainties over the borderline between benign and malignant. Thus the public and the professionals at all levels in a screening programme may require education and/or retraining dependent on their responsibilities. One mechanism of reducing difficulties in the professional area that should be encouraged is the provision of special diagnostic and treatment centres where the necessary expertise in diagnosis and management can be concentrated and where the necessary facilities are available (Miller and Tsechovski 1987). Such centres could be regionally based, serving a number of screening centres.
Evaluation of screening programmes
A number of issues have to be noted when evaluating screening programmes. Almost invariably individuals with disease identified as a result of screening will have a longer survival time than those diagnosed in the normal way. Four biases associated with screening explain this. The first is ‘lead time’, defined as the interval between the time of detection by screening and the time at which the disease would have been diagnosed in the absence of screening. In other words, it is the period by which screening advances the diagnosis of the disease. For example, if as a result of screening, the average point of diagnosis is advanced by 1 year, then inevitably cases diagnosed by screening will survive 1 year longer even if there is no long-term benefit. It is important to recognize that the lead time for different cases will vary, depending in part on the timing of the screening test in relation to the duration of the detectable preclinical phase in that case, as well as the rapidity of progression of the detectable preclinical phase in that individual. Thus there will be a distribution of lead times (Morrison 1985). The lead time for fatal cases will be fairly short, but in one study some fatal cases have been identified as having a lead time of a year or more following mammography screening (Miller et al. 1992).
The determination of lead time is complex, but models have been developed that do so providing there are control data that permit comparison of screen detection with that expected (Walter and Day 1983).
Differential lead time can be an important factor in comparing the outcome among cases detected by different screening modalities, making it almost impossible to make a comparison based on survival, unless it is possible to estimate and correct for differential lead time (Walter and Stitt 1987).
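The arithmetic of lead time can be made explicit with a single hypothetical individual whose age at death is unaffected by screening. The ages used below are invented for illustration.

```python
def survival_years(age_at_diagnosis, age_at_death):
    """Survival measured from diagnosis, as in a conventional survival analysis."""
    return age_at_death - age_at_diagnosis


# Hypothetical case: death at 65 whether or not the person is screened.
age_at_death = 65.0
clinical_diagnosis_age = 60.0   # diagnosis in the absence of screening
lead_time = 1.0                 # screening advances diagnosis by one year
screen_diagnosis_age = clinical_diagnosis_age - lead_time

without_screening = survival_years(clinical_diagnosis_age, age_at_death)
with_screening = survival_years(screen_diagnosis_age, age_at_death)
print(f"survival without screening: {without_screening} years")
print(f"survival with screening:    {with_screening} years")
print(f"apparent gain equals the lead time: {with_screening - without_screening} years")
```

The apparent gain in survival is exactly the lead time, although the date of death has not changed.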
The second bias that accounts for improved survival of screen-detected cases is ‘length-biased sampling’. This relates to the fact that individuals who have rapidly progressive disease will tend to develop symptoms that cause them to consult physicians directly. Thus only less rapidly progressive cases are likely to remain to be detected by screening. Yet the former have a poorer and the latter a better prognosis—hence the improved survival of screen-detected cases, over and above lead time. This bias is most obvious at the initiation of a screening programme, at the first or prevalent screen. However, length bias will also affect the type of cases detected at rescreening, with the more rapidly progressive cancers diagnosed in the intervals between screens. Hence in evaluating the total impact of programmes, the interval cases must be identified and taken into consideration as well as the screen-detected cases.
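Length-biased sampling can be illustrated under a simplifying steady-state assumption that is ours rather than the source's: the number of cases sitting in the detectable preclinical phase at any moment is proportional to the incidence of that type of disease multiplied by the duration of its preclinical phase. The two disease types and their durations are hypothetical.

```python
def prevalence_screen_mix(incidence_by_type, preclinical_years_by_type):
    """Share of each disease type among cases detectable at a first (prevalent) screen."""
    weights = {
        kind: incidence_by_type[kind] * preclinical_years_by_type[kind]
        for kind in incidence_by_type
    }
    total = sum(weights.values())
    return {kind: weight / total for kind, weight in weights.items()}


# Hypothetical: rapidly and slowly progressive disease arise equally often,
# but remain detectable before symptoms for 1 and 4 years respectively.
incidence = {"rapid": 1.0, "slow": 1.0}
preclinical_years = {"rapid": 1.0, "slow": 4.0}

mix = prevalence_screen_mix(incidence, preclinical_years)
for kind, share in mix.items():
    print(f"{kind}: {share:.0%} of screen-detectable cases")
```

Although the two types arise equally often in this example, four-fifths of the cases available for detection at the prevalent screen are of the slowly progressive, better-prognosis type.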
The third bias which can artefactually improve survival is selection bias. Those who enter screening programmes are volunteers, and almost invariably more health conscious than those who decline to enter. This means that, even in the absence of screening, they are likely to have a better outcome from their disease than the overall rates in the general population.
The fourth bias is overdiagnosis bias. This means that some lesions identified and counted as disease would not have presented clinically in those individuals during their lifetimes in the absence of screening. This is, in practice, an extreme example of length bias. It is difficult to obtain absolute confirmation of the existence of this bias, although it seems likely that it is at least in part an explanation for the substantial excess of cancers detected by prostate-specific antigen screening for prostate cancer.
The only design that effectively eliminates the effect of all these biases is the randomized controlled trial (Prorok et al. 1984), but only if mortality from the disease (i.e. deaths related to the person-years of observation) is used as the endpoint, rather than survival. Survival could be used in a randomized controlled screening trial only under special circumstances. These are that there is good evidence because of the equivalence in cumulative numbers of cases during the relevant period of observation that there is no overdiagnosis bias, and provided that the start of the period of observation of the cases is taken as the date of randomization, as that will eliminate differential lead time. This is the approach that will be used in a study of breast self-examination in Russia, where it will not be possible to follow all entrants to determine their alive and dead status at the end of the trial (Semiglazov et al. 1993). Length bias and selection bias are not issues, the latter having been equally distributed by the randomization, and the former by having started at the same point in time and by including all cases that occur during follow-up in the evaluation.
Outside a randomized trial, if the screening test detects a precursor, reduction in incidence of the clinically detected disease can be expected and evaluated. This effect has been well demonstrated in the Nordic countries in relation to screening for cancer of the cervix (Hakama 1982). If the screening test does not detect a precursor, or even if it does but the main yield is invasive cancer, then incidence can be expected to increase initially following the introduction of screening, and remain elevated while screening continues. Under such circumstances, when reduction in incidence cannot be anticipated, and improvement in survival cannot be relied upon because of the biases already discussed, the only valid outcome for assessment of results of a screening programme is mortality from the disease in the total population offered screening in comparison with the mortality that would be expected in the same population if screening had not been offered.
As already emphasized, the design of choice for evaluation of changes in mortality is the randomized controlled trial. This can be either an efficacy trial or an effectiveness trial. Efficacy trials are based on randomization of the screening test, which answers the biologically relevant question as to whether mortality is reduced in those screened. An effectiveness trial is based on the randomization of invitations to attend for screening, and more nearly replicates the circumstances that may eventually pertain in practice in a population. Both those who accept the invitation and those who refuse will have to be included in the assessment of outcome. Thus it tests the impact of introducing screening in a population. Some trials of this type involve randomization by cluster.
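The requirement that acceptors and refusers both be included in the outcome of an effectiveness trial can be sketched as follows. The deaths and person-years are hypothetical, and the function names are ours.

```python
def mortality_rate(deaths, person_years):
    """Deaths from the disease per person-year of observation."""
    return deaths / person_years


def invited_arm_rate(subgroups):
    """Pool all subgroups of the invited arm, given as (deaths, person_years) pairs."""
    total_deaths = sum(deaths for deaths, _ in subgroups)
    total_person_years = sum(person_years for _, person_years in subgroups)
    return mortality_rate(total_deaths, total_person_years)


# Hypothetical trial data: (deaths from the disease, person-years).
acceptors = (30, 70_000)
refusers = (20, 30_000)
control = (65, 100_000)

control_rate = mortality_rate(*control)
pooled_ratio = invited_arm_rate([acceptors, refusers]) / control_rate
acceptors_only_ratio = mortality_rate(*acceptors) / control_rate

print(f"rate ratio, all invited versus control:  {pooled_ratio:.2f}")
print(f"rate ratio, acceptors only (misleading): {acceptors_only_ratio:.2f}")
```

In this invented example the comparison restricted to acceptors suggests a larger reduction than the comparison of the whole invited group, because the refusers carry a higher mortality.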
If for some reason randomization is believed inappropriate, a second-best method is the quasi-experimental study in which screening is offered in some areas, and unscreened areas as comparable as possible are used for comparison purposes. However, this design is not a cheap and easy way out but demands the same methodological accuracy as required for randomized trials. Furthermore, in view of the substantially larger populations that may have to be studied than in randomized trials, it may prove to be more expensive than the preferred design. In addition, difficulties in analysis may ensue if the baseline mortality in the comparison areas differs (United Kingdom Trial of Early Detection of Breast Cancer Group 1988).
Nevertheless, ethical issues may preclude the utilization of randomized trials, particularly for programmes that were introduced before the necessity of utilizing trials as far as possible for evaluation was appreciated. This has been the case for screening for cancer of the cervix, for example. One approach under these circumstances is to compare the mortality in defined populations before and after the introduction of screening programmes, preferably with data available on the trends in acceptance of screening so that changes in mortality can be correlated with the screening trends. Such a correlation study will be strengthened if other data that could be related to changes in the outcome variable are entered into a multivariate analysis (Miller et al. 1976).
A case–control study of screening is another approach that can be used to evaluate programmes that were introduced sufficiently long before the study that an effect can be expected to have occurred. Case–control studies depend on comparing the screen histories of the cases with the histories of comparable controls drawn from the population from which the cases arose. If sampled, individuals with early-stage disease would be eligible as controls, provided that the date of diagnosis was not earlier than that of the case, as diagnosis of disease truncates the screening history. However, a bias would arise if advanced disease is compared only with early-stage disease, as the latter is likely to be screen detected, although this is just a function of the screening process, not its efficacy (Weiss 1983). Cases have to reflect the endpoints used to evaluate screening, that is, those that would be expected to be reduced by screening. Thus cases are often deaths from the disease or advanced disease as a surrogate for deaths, or if a precursor of the disease is detected through screening, incident cases in the population. If incident cases are screen detected, the controls should be drawn from those screened in the same programme; if the cases are not screen detected, the controls should be population based (Sasco et al. 1986).
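The comparison of screening histories between cases and controls is usually summarized as an odds ratio. The sketch below shows the unmatched calculation from a two-by-two table; the counts are hypothetical, and a real analysis would normally use matched or adjusted estimates.

```python
def screening_odds_ratio(cases_screened, cases_unscreened,
                         controls_screened, controls_unscreened):
    """Odds of a history of screening among cases relative to controls."""
    return (cases_screened * controls_unscreened) / (cases_unscreened * controls_screened)


# Hypothetical counts: cases are deaths from the disease,
# controls are drawn from the population in which the cases arose.
odds_ratio = screening_odds_ratio(cases_screened=40, cases_unscreened=60,
                                  controls_screened=60, controls_unscreened=40)
print(f"odds ratio for death from the disease, screened versus unscreened: {odds_ratio:.2f}")
```

An odds ratio below 1 is compatible with a protective effect of screening, but only to the extent that the screened and unscreened are otherwise comparable.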
One difficulty with case–control studies of screening is that they may be affected by selection bias as the health conscious may select themselves for screening. This may be difficult to correct in the analysis, although such a correction should be attempted if the relevant data on risk factors for the disease (confounders) are available. However, such a bias may not be a problem if it can be demonstrated that the incidence of cancer in those who declined the invitation to the screening programme is similar to that expected in an unscreened population.
Even if data are available on risk factors for disease, control for them may not result in avoiding the effect of selection bias. For example, experience in studies of breast cancer in Sweden and the United Kingdom, where case–control studies were performed within trials, shows that although those who refuse invitations for screening show a breast cancer incidence similar to that of controls, their breast cancer mortality experience is worse than that of controls. This means that the estimate of the effect of screening in such case–control studies will show a greater effect than could be expected in the total population (Miller et al. 1990; Moss 1991).
In addition to assessing effectiveness of screening, case–control studies may also be of use to assess other aspects of screening programmes. For example, a method has been proposed for estimation of the natural history of preclinical disease from screening data based on case–control methodology (Brookmeyer et al. 1986).
The cohort study design may also provide an estimate of the effect of screening. In this design the mortality from the cancer of interest in an individually identified and followed screened group (the cohort) is compared with the mortality experience in a control population, often derived from the general population. This approach has been used to evaluate the mortality experience in the United States Breast Cancer Detection Demonstration Project (Morrison et al. 1988) and in a cohort of women in Finland included in a breast self-examination programme (Gastrin et al. 1994). In these studies it has to be recognized that those recruited into a screening programme are initially free of the disease of interest so that it is not appropriate to apply population mortality rates for the disease to the person-years experience of the study cohort. Rather, as is required in estimating the sample size required for a controlled trial of screening, it is first necessary to determine the expected incidence of the cases of interest, then apply to that expectation the expected case–fatality rate from the disease to derive the expectation for the deaths (Moss et al. 1987). In practice, a cohort study of screening suffers from the same problem of selection bias as for case–control studies, so the results have to be interpreted with caution.
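The two-step derivation of expected deaths described above can be sketched as follows. This is a simplified illustration: the age bands, incidence rates, case-fatality proportion, and observed deaths are all hypothetical, and a single case-fatality proportion over the follow-up period is assumed.

```python
def expected_cases(person_years_by_band, incidence_by_band):
    """Expected incident cases: person-years multiplied by incidence, summed over age bands."""
    return sum(person_years_by_band[band] * incidence_by_band[band]
               for band in person_years_by_band)


def expected_deaths(person_years_by_band, incidence_by_band, case_fatality):
    """Apply the expected case-fatality proportion to the expected incident cases."""
    return expected_cases(person_years_by_band, incidence_by_band) * case_fatality


# Hypothetical cohort, initially free of the disease.
person_years = {"50-59": 20_000, "60-69": 10_000}
incidence = {"50-59": 0.002, "60-69": 0.003}    # cases per person-year
case_fatality = 0.20                            # proportion of cases dying during follow-up

expected = expected_deaths(person_years, incidence, case_fatality)
observed = 10
print(f"expected deaths: {expected:.1f}")
print(f"observed/expected ratio: {observed / expected:.2f}")
```

The ratio of observed to expected deaths is then the measure of interest, subject to the selection bias already noted.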
Indirect indicators of effectiveness are often desired in evaluating screening programmes, especially ones that would predict subsequent mortality. Compliance with screening, and rate of screen detection, as well as the ratio of prevalence to incidence can be indicators of potentially effective screens (Day et al. 1989). The cumulative prevalence (not the percentage distribution) of advanced disease is one such measure (Prorok et al. 1984). For example, reduction in advanced disease predicted subsequent breast cancer mortality reduction in a trial of mammography screening versus no screening in Sweden (Tabar et al. 1989). However, case detection frequency, numbers of small tumours, and stage shift in percentages of the total should not be used as indicators of effectiveness as they potentially reflect all four screening biases.
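The distinction between the cumulative prevalence of advanced disease and its percentage distribution can be shown with invented numbers. In the sketch below the screened group has more cases in total (for example through overdiagnosis) but exactly the same number of advanced cases as the comparison group.

```python
def advanced_disease_summary(population, total_cases, advanced_cases):
    """Advanced disease per 1000 population and as a percentage of all cases."""
    return {
        "advanced_per_1000": 1000 * advanced_cases / population,
        "advanced_percent_of_cases": 100 * advanced_cases / total_cases,
    }


# Hypothetical groups of 10 000 people each.
unscreened = advanced_disease_summary(population=10_000, total_cases=100, advanced_cases=40)
screened = advanced_disease_summary(population=10_000, total_cases=150, advanced_cases=40)

for label, summary in (("unscreened", unscreened), ("screened", screened)):
    print(f"{label}: {summary['advanced_per_1000']:.1f} advanced per 1000, "
          f"{summary['advanced_percent_of_cases']:.0f} per cent of cases advanced")
```

The percentage of cases that are advanced falls in the screened group purely because the denominator has grown, while the rate of advanced disease in the population is unchanged.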
Organized screening programmes
There are a number of features of effective screening programmes that are largely related to good organization. Indeed, there is good evidence, at least for cancer of the cervix, that unorganized or opportunistic screening programmes, which depend on the willingness of individuals to volunteer for screening, and the extent to which their physicians offer screening, often to low-risk women, are far less successful (Hakama et al. 1985).
Hakama et al. (1985) defined certain essential elements of organized programmes:
the target population has been identified
the individual women are identifiable
measures are available to guarantee high coverage and attendance such as a personal letter of invitation
there are adequate field facilities for performing the screening tests
there is an organized quality control programme on performing and reading the tests
adequate facilities exist for diagnosis and for appropriate treatment of confirmed abnormalities
there is a carefully designed and agreed referral system, an agreed link between the participant, the screening centre, and the clinical facility for diagnosis of an abnormal screening test, for management of any abnormalities found, and for providing information about normal screening tests
evaluation and monitoring of the total programme is organized in terms of incidence and mortality rates among those attending and among those not attending, at the level of the total target population
quality control of the epidemiological data should be established.
Although these elements are present in many European cancer screening programmes, especially in the Nordic countries, and contribute greatly to their success, several elements are missing from programmes elsewhere, especially those largely based on the private medical care system in North America. In Canada, there are opportunities for introducing some of them, such as the first three, and these were recommended by the two Canadian Task Forces on cervical cancer screening (Task Force 1976, 1982). Unfortunately, only three of the provincial health care authorities (Ontario, Manitoba, and British Columbia—the latter having accepted from the beginning the need for centralized laboratory services) have taken the initiative in establishing such programmes. However, all provinces that introduced breast-screening programmes accepted from the outset the necessity for them to be organized (Workshop Group 1989), thus attempting to replicate the organization of breast cancer screening that is proving successful in some of the Nordic countries, The Netherlands, and the United Kingdom.
Health-related quality of life and screening
An important evaluation measure for screening is the extent to which overall quality of life is improved or impaired by screening compared with usual care. Decision-making for health care policy is only possible if information is available on quality of life as well as health costs of screened and unscreened participants including mortality reduction from screening. For example, it requires an ‘optimistic’ estimate of screening effectiveness to derive an overall benefit from screening for prostate cancer (Krahn et al. 1994). Issues concerning health-related quality of life may well vary with different cultural value systems, and different health care systems.
Because of lead time, health-related quality of life events will tend to occur earlier in life than similar events associated with usual care. Given that the adverse effects on quality of life associated with false-positive screening tests and with treatment will tend to occur relatively early, it could be easy to convince oneself (as it has convinced some commentators for prostate cancer screening already) that the health-related quality of life issues are overwhelming and that screening should not be conducted. It will require prolonged follow-up, probably more than 10 years, for the detriments associated with advanced disease late in life that may be prevented from occurring in the screened group, to appear in the non-screened group.
If the outcome of screening were to be a major benefit in terms of mortality reduction, the issues related to health-related quality of life would be overwhelmed. It is only if the outcome is a moderate to small mortality reduction that these issues become critical, and paradoxically then it would be necessary for them to have been measured with as much precision as was possible during screening, as, particularly for health-related quality of life, the decrements could not be measured retrospectively with precision. For this reason, in screening trials where adverse health-related quality of life can be anticipated, it is important for such events to be identified and quantified.
Health-related quality of life measurement is a new and developing field. There are many instruments available for assessing health-related quality of life. In the screening setting a range of instruments is often required. Disease-specific measures assess the impact of particular diseases; for example, prostate-specific symptoms can be assessed with the UCLA Prostate Cancer Index (Litwin et al. 1998). However, since many screening tests can have a range of effects, generic instruments are also often included in screening studies. The most widely used such instrument is the 36-item Short Form Health Survey (SF-36) (Ware and Sherbourne 1992). Such psychometric instruments are not easily applied in economic analysis, and so utility- or preference-based instruments are often used. The Patient Oriented Prostate Cancer Utility Scale (PORPUS) (Krahn et al. 1996) and Health Utility Index (Torrance et al. 1996) are examples of disease-specific and generic utility measures. The psychological impact of screening tests is often not assessed with these health measures and specific scales, such as the State-Trait Anxiety Inventory, have been used for such effects (Spielberger et al. 1970; Goel et al. 1998).
Economics of screening
Space does not permit a detailed evaluation of the various principles that have to be considered in assessing the economics of screening. In brief, it is necessary to determine the costs of the test and the subsequent diagnostic tests. The costs associated with any hazard of the test, as well as the costs of overtreatment, should also be included. These costs may be balanced by reduced costs of therapy of the primary condition, reduced costs associated with less expenditure on the treatment of advanced disease, and the economic value of the additional years of life gained. This can become quite complex when the value of treatment of disease in years of life gained, transfers such as pensions, and economic productivity are considered. The latter is often disputed, if not regarded with some distaste, so that often what is computed is the cost per year of life saved. The marginal costs of additional tests in relation to the benefit may be critical, especially when considerations of the frequency of rescreening arise.
Part of the difficulty in economic assessment is that costs are often incurred early, while benefits flow later, so that for proper comparisons of such costs they have to be discounted to the present day. Additional complexity ensues if attempts are made to assess quality of life in economic terms, while the calculations rarely attempt an economic assessment of the fact that if a death is prevented by screening, the relevant individual will inevitably die of some other condition, and that death could be more costly.
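Discounting to the present day can be sketched as follows. The 5 per cent annual rate, the yearly costs, and the years of life gained are hypothetical, and the decision to discount the years of life gained at the same rate as costs is an assumption of this example rather than a recommendation.

```python
def present_value(amounts_by_year, annual_rate):
    """Discount a stream of yearly amounts (year 0 first) to the present day."""
    return sum(amount / (1.0 + annual_rate) ** year
               for year, amount in enumerate(amounts_by_year))


# Hypothetical programme: costs fall in the first three years,
# years of life are gained only from year 10 onwards.
costs = [100_000, 100_000, 100_000] + [0] * 12
life_years_gained = [0] * 10 + [5, 5, 5, 5, 5]
rate = 0.05

discounted_cost = present_value(costs, rate)
discounted_life_years = present_value(life_years_gained, rate)

print(f"undiscounted cost per year of life: {sum(costs) / sum(life_years_gained):,.0f}")
print(f"discounted cost per year of life:   {discounted_cost / discounted_life_years:,.0f}")
```

Because the costs come early and the benefits late, discounting raises the cost per year of life saved in this example.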
It is likely that economic assessments will increasingly guide policy decisions in the future, so that those interested in evaluation of screening must collect the necessary data. Although some economic assessments have suggested that cost-effective programmes are achievable (e.g. programmes of breast cancer screening using single-view mammography in Sweden (Jonsson et al. 1988)), others have suggested that programmes may not be cost-effective (e.g. breast cancer screening programmes for younger women in the United States (Eddy et al. 1988)). Economic analysis is particularly important for making decisions within screening programmes, for example around screening intervals or method of follow-up.
In this section, we shall summarize our conclusions on the appropriateness of screening for several cancer sites.
Screening for lung cancer
Both the UICC project (Prorok et al. 1984) and the American Cancer Society (1980) concluded that screening with sputum cytology and/or chest radiographs could not be recommended. The conclusive nature of the negative evidence from the three American controlled trials was such that the National Cancer Institute working guidelines (Early Detection Branch 1987) did not discuss screening for this site. However, there has been concern that screening using annual chest radiography has never been properly evaluated, and therefore this is being re-examined in a large study evaluating screening for a number of cancer sites (Gohagan et al. 1995). Furthermore, there is some indication that spiral CT scanning may diagnose lung cancers at a much earlier stage than conventional chest radiography (Henschke et al. 1999). Therefore it is possible that screening by this modality will be re-evaluated. The increasing proportion of lung cancers being diagnosed in ex-smokers in North America is adding a clinical imperative to the need for reconsideration of screening in those at high risk for lung cancer.
Screening for breast cancer
It has been recognized for some time that mass screening for breast cancer can reduce mortality from the disease (Day et al. 1986; Miller et al. 1990). Both single-view mammography alone and double-view mammography combined with physical examination are effective as screening modalities. Current data are insufficient to determine whether appreciable extra benefit in terms of mortality reduction derives from adding physical examination to mammography, or from double-view as distinct from single-view mammography. However, it now seems that mammography adds little extra benefit to screening by physical examination, a question raised by the working group to review the uncontrolled United States Breast Cancer Detection Demonstration Projects (Beahrs et al. 1979) and investigated in the Canadian National Breast Screening Study in women aged 50 to 59 on entry to the study (Miller et al. 1981, 1992).
The American Cancer Society guidelines for breast cancer detection are that every woman should be urged to practice breast self-examination every month from the age of 20, that women should have a breast physical examination every 3 years from the age of 20 and every year from the age of 40, and that mammography should be given every 1 to 2 years from 40 to 49 and every year from the age of 50 (Mettlin and Smart 1994). However, the United States Preventive Services Task Force (1996c) did not recommend mammography screening for women aged 40 to 49, and the National Cancer Institute, after accepting that the scientific evidence does not confirm efficacy of screening in women aged 40 to 49 (Kaluzny et al. 1994), reversed that position later despite the recommendations by a consensus conference (National Institutes of Health Consensus Development Panel 1997).
Organized breast screening programmes, all involving mammography and women aged 50 to 64 (or 69), have been set up in Canada and several European countries (e.g. Finland, The Netherlands, Sweden, the United Kingdom), but only some counties in Sweden actively invite women aged 40 to 49 for screening. The majority invite women to return every 2 years (every 3 years in the United Kingdom). It is still too early to judge the effectiveness of these programmes, but it is likely that mortality reductions attributable to screening will be seen within a few years in those programmes that have achieved the planned level of compliance (70 per cent or more), although these could be less than has been anticipated (Miller 2000).
The other screening test for which we currently have little evidence of effectiveness is breast self-examination. Only breast self-examination has the potential to improve the outlook for interval cancers, while its teaching probably goes some way to diminish false reassurance, and it has the potential to provide early diagnosis of breast cancer in many parts of the world (Miller et al. 1985). Two case–control studies have shown no overall benefit in the reduction of advanced disease (Muscat and Huncharek 1991; Newcomb et al. 1991), but one suggested benefit in breast self-examination compliers (Newcomb et al. 1991). A cohort study of breast self-examination compliers in Finland suggested a benefit in reducing breast cancer mortality (Gastrin et al. 1994), while a case–control study nested within the Canadian National Breast Screening Study also showed benefit from good breast self-examination practice in reducing breast cancer mortality and the cumulative prevalence of advanced (metastatic) breast cancer (Harvey et al. 1997).
Screening for cancer of the cervix
It is generally agreed that screening for cancer of the cervix is effective in reducing the incidence of and mortality from the disease, but that for maximal effectiveness attention needs to be paid to the organizational aspects of screening (Hakama et al. 1985; Miller et al. 1990). The organized programmes have shown the greatest effect, while using fewer resources than the unorganized programmes.
An IARC study, based on the records of a number of screening programmes in Europe and Canada (IARC Working Group on Cervical Cancer Screening 1986), indicated that starting at age 25 and stopping at age 64 with 3-year intervals gives 90 per cent of the maximal protection and only requires 13 tests a lifetime. Even screening starting at age 20 with annual screens gives only just over 90 per cent protection yet requires 45 tests per woman. The difficulty with annual screening, even in a wealthy country, is that it places emphasis on rescreening women already in programmes, while the emphasis needs to shift to bring women who are poorly screened, or not screened at all, into programmes if failures of screening policies are to be avoided (Task Force 1976; Chamberlain et al. 1986; Miller et al. 1991).
Another aspect of cervical cancer screening requires more attention. There are substantial costs associated with the management of the large numbers of cases of mild and moderate dysplasia found as a result of cervical cytology, the majority of which will regress spontaneously (Holowaty et al. 1999). A marker that may enable the identification of the lesions that would progress if left untreated is needed; tests for oncogenic types of human papilloma virus may provide this, but only in women over the age of 35 (Miller et al. 2000b). However, time is on the physician’s side, and cytological surveillance is appropriate for mild if not moderate dysplasia (cervical intraepithelial neoplasia grade I or II) until it is clearer whether the lesion will regress spontaneously (Miller et al. 1991; Holowaty et al. 1999).
In developing countries there are often insufficient resources to support a cytology-based screening programme for cancer of the cervix. Therefore attention has shifted to attempts to detect early disease by visual inspection of the cervix with the naked eye using a speculum. Unaided visual inspection is too insensitive, and non-specific, but visual inspection after acetic acid application to the cervix seems to have equivalent sensitivity to cervical cytology (Chirenje et al. 1999; Miller et al. 2000b). Although specificity is lower than for cytology, attempts are now being made to improve upon this.
Screening for gastric cancer
Screening programmes for gastric cancer were introduced in Japan over 20 years ago (Chamberlain et al. 1986). The screening test used has gradually been standardized and now comprises a photofluorographic barium meal technique with six standard views. Considerable observational evidence, including time trend analyses and case–control studies, has accumulated in Japan that the widespread application of screening has contributed to a fall in mortality, although its contribution is probably small in relation to that resulting from falling incidence (Chamberlain et al. 1986; Miller et al. 1990). In view of the uncertainty over its effectiveness, screening for gastric cancer in countries other than Japan cannot at present be recommended as public health policy.
Screening for colorectal cancer
In contradistinction to gastric cancer screening, a number of controlled trials of colorectal cancer screening have been conducted. The earliest evaluated rigid sigmoidoscopy as part of a multiphasic health screen. Although a reduction in mortality from colorectal cancer was seen in the study group, this is probably due to chance and not due to the effect of sigmoidoscopy (Selby et al. 1988).
Further evidence to support sigmoidoscopy as possibly appropriate for colorectal cancer screening has come from two case–control studies (Newcomb et al. 1992; Selby et al. 1992). The report of Selby et al. (1992) indicates an apparent benefit from sigmoidoscopy lasting for up to 10 years. However, as such studies cannot eliminate the effect of selection bias, benefit may have been overestimated. This is why there is a trial in the United States evaluating the effect of flexible sigmoidoscopy, initially conducted at 3-year intervals (Gohagan et al. 1995) and now every 5 years, and one in the United Kingdom evaluating one-time-only sigmoidoscopy (Atkin et al. 1993).
All the other trials have evaluated the effect of the faecal occult blood test. Of these, two in the United States and two in Europe have reported mortality results. One of these, in New York, evaluated the effect of the addition of the faecal occult blood test to routine sigmoidoscopic screening (Flehinger et al. 1988; Winawer et al. 1993).
The other trial, in Minnesota, used the faecal occult blood test alone, annually in one group and biennially in another. The initial mortality result indicated that annual but not biennial faecal occult blood tests reduce mortality from colorectal cancer after about a 10-year period (Mandel et al. 1993). This was achieved at a substantial cost in terms of false-positive results. A more recent report, with follow-up to 18 years, confirmed the colorectal cancer mortality reduction from annual screening, but also showed mortality reduction at a lower level from biennial screening (Mandel et al. 1999). The trials in Europe also showed mortality reduction from biennial screening (Hardcastle et al. 1996; Kronborg et al. 1996).
It is clear, especially from the Minnesota trial, that a major difficulty with screening using the faecal occult blood test is lack of specificity, especially if the test slides are rehydrated. Furthermore, there seems to be a lack of sensitivity for adenomas.
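Why a loss of specificity is so costly in a low-prevalence setting can be seen from the arithmetic of the positive predictive value. The following is a minimal sketch only: the prevalence, sensitivity, and specificity values are hypothetical and are not figures from the Minnesota trial or any other study cited here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Proportion of positive screening tests that are true positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical values chosen only to illustrate the arithmetic.
prevalence = 0.002   # assumed: 2 detectable cancers per 1000 people screened
sensitivity = 0.80   # assumed

for specificity in (0.98, 0.90):
    ppv = positive_predictive_value(sensitivity, specificity, prevalence)
    false_positives_per_1000 = 1000 * (1.0 - specificity) * (1.0 - prevalence)
    print(
        f"specificity {specificity:.2f}: "
        f"PPV {ppv:.3f}, "
        f"false positives per 1000 screened {false_positives_per_1000:.0f}"
    )
```

Under these assumptions, a fall in specificity of a few percentage points multiplies the number of people sent for further investigation while the number of cancers found stays the same.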
Taken together, the faecal occult blood test trials suggest that, after an interval of about 10 years, there could be a reduction of up to 20 per cent in colorectal cancer mortality from biennial screening, and higher for annual screening. However, with the likelihood that the relatively high compliance achieved in the Minnesota trial could not be replicated in the population, the benefit that could be obtained would probably be much less.
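The dependence of population benefit on compliance can be sketched with a deliberately crude model. It assumes that the whole of a trial's mortality reduction arises among those who actually attend, and that attenders and non-attenders share the same underlying risk; neither assumption is made in the trials cited, and the compliance figures below are hypothetical. Only the 20 per cent reduction is taken from the text above.

```python
def population_reduction(trial_reduction, trial_compliance, population_compliance):
    """Scale a trial's mortality reduction to a different level of compliance.

    Crude assumption: benefit accrues only to compliers, and compliers and
    non-compliers have the same baseline risk of death from the disease.
    """
    for value in (trial_compliance, population_compliance):
        if not 0.0 < value <= 1.0:
            raise ValueError("compliance must be in the interval (0, 1]")
    reduction_among_compliers = trial_reduction / trial_compliance
    return reduction_among_compliers * population_compliance


trial_reduction = 0.20        # 'up to 20 per cent' from biennial screening
trial_compliance = 0.75       # hypothetical trial compliance
population_compliance = 0.40  # hypothetical compliance in routine practice

expected = population_reduction(trial_reduction, trial_compliance, population_compliance)
print(f"Expected population mortality reduction: {expected:.1%}")
```

In practice compliers often differ from non-compliers in their underlying risk, so a calculation of this kind indicates only the direction and rough size of the dilution.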
Screening for prostate cancer
Screening for prostate cancer using the digital rectal examination is recommended by the American Cancer Society. However, it is not clear that this is a sensitive screening test for early disease. Other screening tests under consideration include the prostate-specific antigen and transrectal ultrasound, although the latter may be of more value as a diagnostic test (Miller et al. 1990). In the United States, the American Cancer Society recommends screening with prostate-specific antigen yearly from the age of 50 (Mettlin et al. 1993).
There are many obstacles in the way of an effective screening programme for a disease that is a relatively unimportant cause of premature mortality. Not only must an acceptable and valid screening test be available, but so must an acceptable and effective treatment for the preclinical lesions found as a result of screening (Miller et al. 1990). This problem is particularly acute for prostate cancer because of the increasing frequency of latent prostate carcinoma with increasing age and the appreciable morbidity and mortality of the radical procedures usually used to treat prostate cancer. There is no question that it is necessary to establish the effectiveness of screening programmes for prostate cancer by well-designed randomized trials before a recommendation on public health policy can be developed (IPSTEG 1999).
Screening for bladder and mouth cancer (Prorok et al. 1984), endometrial and ovarian cancer (Hakama et al. 1985; Miller et al. 1990), oesophagus and liver cancer (Chamberlain et al. 1986), and melanoma, neuroblastoma, and nasopharyngeal carcinoma (Miller et al. 1990) cannot be recommended as public health policy. In the majority of instances this is because of the absence of a valid screening test, but the issue for oral cancer and melanoma is the lack of documented effectiveness of screening, especially, in the case of oral cancer, from developing countries where the disease is sufficiently common to propose programmes based on inspection of the mouth by allied health professionals. The Quebec Neuroblastoma Screening Project indicated no benefit from screening for this disease (Bernstein and Woods 1996). Indeed, good evidence was derived that the results of earlier studies from Japan suggesting good survival from screen-detected cases were in fact due to overdiagnosis of neuroblastoma in 6-month-old children.
Unlike cancer, cardiovascular disease can be controlled by both primary and secondary means. Considerable controversy has existed as to whether a population-based approach, using health education and promotion strategies, or a high-risk approach, based on early detection and treatment, will result in the greater reduction in cardiovascular diseases (Rose 1992). Two main approaches have been proposed for reducing the burden of cardiovascular disease: screening for hypertension and screening for hypercholesterolaemia.
Hypertension screening involves a simple test that can easily be incorporated as part of a routine clinical examination. There is clear evidence that detection of hypertension and its management can effectively reduce the subsequent risk of coronary heart disease, congestive heart failure, stroke, and renal failure (MacMahon et al. 1986; Collins et al. 1990). The test is acceptable to the population although compliance with treatments can be difficult, particularly when side-effects are encountered. However, there are problems with the test, particularly with respect to its reliability and intraobserver variability. There is also controversy over the exact level of blood pressure at which hypertension worthy of treatment should be diagnosed, and what treatments should be used. In particular, there is concern as to what the appropriate level for initiation of pharmacological interventions should be (United States Preventive Services Task Force 1996b). There is also controversy with respect to the age at which to initiate hypertension screening and the appropriate interval. Most guidelines recommend that screening commence in early adulthood (about the age of 30) with an interval of 3 to 5 years. However, blood pressure checks have become a routine part of every clinical examination, and are usually done far more frequently.
Cholesterol screening is similar in that there are also controversies regarding the appropriate level at which to consider a test to be positive and as to the need to initiate treatment with pharmacological agents for mild to moderate elevations (United States Preventive Services Task Force 1996a). Assessment of cardiovascular risk must be multifactorial, including behavioural factors such as exercise and smoking, and clinical measures such as hypertension and family history. Preventive strategies need to be customized for the individual based on the risk profile, and not simply on a single test such as cholesterol. In the United States an aggressive approach to screening for cholesterol was taken (‘know your number’), but other countries, such as the United Kingdom and Canada, have adopted more selective approaches.
Evaluation of screening prenatally requires applications of the general screening principles but does lead to some special considerations. As noted above, the disease condition is usually detected in the fetus, either directly, through a test such as ultrasound, or indirectly through a maternal marker such as maternal serum α-fetoprotein (Wald et al. 1997). Thus one difference from other screening tests is that the individual being tested is not necessarily the one affected with the disease. Secondly, treatment options are not usually available, although this is changing with the advent of fetal surgery. Thus ethical issues arise with the use of therapeutic abortion for the management of abnormalities detected.
Screening for infectious disease is an important public health strategy. Unlike most of the screening tests for chronic disease, screening for infectious disease often affords the opportunity for benefit to the population as a whole, rather than just to an individual, through reduction in the risk of disease transmission. Thus there may be situations when screening for infectious disease is warranted, even when not all the criteria for screening are met. For example, even if there is no treatment for a disease, screening may be warranted if effective infection control mechanisms are available. Such screening is being done in many tertiary care facilities for ‘superbugs’ such as methicillin-resistant Staphylococcus aureus (Cookson 1997).
However, screening for infectious disease often leads to considerable controversy, particularly when quarantine is the suggested control mechanism. Calls for population-wide screening for HIV have often included such proposals. The criteria for screening presented in this chapter can be very useful in assessing such proposals; in this instance quarantine becomes the treatment for the condition that is identified through screening.
Genetic susceptibility testing
The completion of mapping of the human genome holds great promise for disease control (Wadman 1999). At the same time, the availability of a range of markers for disease susceptibility will lead to increasing controversies about the use of screening tests (Goel 2001). While the general principles of screening outlined above will still apply, they will need to be modified and updated. Screening will identify individuals at risk of disease, rather than those with precursors or early-stage disease. Thus, rather than diagnostic assessment and treatment strategies, preventive strategies will be required. Ideally, primary prevention strategies will be available, but for many conditions the preventive strategies will be the application of other screening tests, further complicating the evaluation of these strategies.
There are a number of fundamental issues that have to be resolved when considering disease control by screening. The general principles that govern the introduction of screening programmes include the following:
the disease should be an important health problem
the disease should have a detectable preclinical phase
the natural history of the lesions identified by screening should be known
there should be an effective treatment for such lesions
the screening test should be acceptable and safe.
The other issues range from ethics to economics. Critical issues include the population to be included in screening programmes and whether or not it is possible to introduce an organized screening programme. It cannot necessarily be assumed that a screening programme will benefit the population to which it is applied. Not only do ethics demand that only programmes with proven effectiveness be widely disseminated, it is also necessary to ensure that the programme is continually monitored to confirm that effectiveness is maintained. Furthermore, the benefits derived from the programme must be clearly shown to exceed the costs, both in terms of ill health induced by the test and accompanying procedures, and in economic terms.
Despite these caveats, screening carries the potential for a fairly rapid and important impact on mortality from disease, often exceeding what can currently be anticipated from other approaches to disease control. Hence there is continuing interest in, and expectation from, existing and potential programmes.
American Cancer Society (1980). Guidelines for the cancer-related check-up. Recommendations and rationale. CA: A Cancer Journal for Clinicians, 30, 193–240.
Anderson, G.H. (1985). Cervical cytology. In Screening for cancer (ed. A.B. Miller), pp. 87–103. Academic Press, Orlando, FL.
Atkin, W.S., Cuzick, J., Northover, J.M.A., and Whynes, D.K. (1993). Prevention of colorectal cancer by once-only sigmoidoscopy. Lancet, 341, 736–40.
Baines, C.J., McFarlane, D.V., Miller, A.B., et al. (1988). Sensitivity and specificity for first screen mammography in 15 NBSS centres. Journal of the Canadian Association of Radiologists, 39, 273–6.
Baines, C.J., Miller, A.B., Bassett, A.A., et al. (1989). Physical Examination; evaluation of its role as a single screening modality in the Canadian National Breast Screening Study. Cancer, 63, 160–6.
Bassett, A.A. (1985). Physical examination of the breast and breast self-examination. In Screening for cancer (ed. A.B. Miller), pp. 271–91. Academic Press, Orlando, FL.
Beahrs, O.H., Shapiro, S., Smart, C., et al. (1979). Report of the working group to review the National Cancer Institute, American Cancer Society Breast Cancer Detection Demonstration Projects. Journal of the National Cancer Institute, 62, 640–709.
Bernstein, M.L. and Woods, W.G. (1996). Screening for neuroblastoma. In Advances in screening for cancer (ed. A.B. Miller), pp. 149–63. Kluwer Academic, Boston, MA.
Boyd, N.F., Wolfson, C., Moskowitz, M., et al. (1982). Observer variation in the interpretation of Xeromammograms. Journal of the National Cancer Institute, 68, 357–63.
Boyes, D.A., Morrison, B., Knox, E.G., et al. (1982). A cohort study of cervical cancer in British Columbia. Clinical and Investigative Medicine, 5, 1–29.
Brookmeyer, R., Day, N.E., and Moss, S. (1986). Case–control studies for estimation of the natural history of preclinical disease from screening data. Statistics in Medicine, 5, 127–38.
Chamberlain, J. (1986). Reasons that some screening programmes fail to control cervical cancer. In Screening for cancer of the uterine cervix (ed. M. Hakama, A.B. Miller, N.E. Day), pp. 161–8. IARC Scientific Publication 76. International Agency for Research on Cancer, Lyon.
Chamberlain, J. and Miller, A.B. (ed.) (1988). Screening for gastrointestinal cancer. Hans Huber, Toronto.
Chamberlain, J., Day, N.E., Hakama, M., et al. (1986). UICC workshop of the project on evaluation of screening programmes for gastrointestinal cancer. International Journal of Cancer, 37, 329–34.
Chirenje, Z.M. et al. for the University of Zimbabwe/JHPIEGO Cervical Cancer Project (1999). Visual inspection with acetic acid for cervical-cancer screening, test qualities in a primary-care setting. Lancet, 353, 869–73.
Chodak, G.W. and Schoenberg, H.W. (1989). Progress and problems in screening for carcinoma of the prostate. World Journal of Surgery, 13, 60–4.
Collins, R., Peto, R., MacMahon, S., et al. (1990). Blood pressure, stroke, and coronary heart disease. Part 2, short-term reductions in blood pressure, overview of randomised drug trials in their epidemiological context. Lancet, 335, 827–38.
Commission on Chronic Illness (1957). Chronic illness in the United States: prevention of chronic illness. Harvard University Press, Cambridge, MA.
Cookson, B. (1997). Is it time to stop searching for MRSA? Screening is still important. British Medical Journal, 314, 664–5.
Cuckle, H.S. and Wald, N.J. (1984). Principles of screening. In Antenatal and neonatal screening (ed. N.J. Wald). Oxford University Press.
Day, N.E. (1985). Estimating the sensitivity of a screening test. Journal of Epidemiology and Community Health, 39, 364–6.
Day, N.E., Baines, C.J., Chamberlain, J., et al. (1986). UICC project on screening for cancer, Report of the workshop on screening for breast cancer. International Journal of Cancer, 38, 303–8.
Day, N.E., Williams, D.R.R., and Khaw, K.T. (1989). Breast cancer screening programmes: the development of a monitoring and evaluation system. British Journal of Cancer, 59, 954–8.
Early Detection Branch (1987). Working guidelines for early cancer detection. Division of Cancer Prevention and Control, National Cancer Institute, Bethesda, MD.
Eddy, D.M., Hasselblad, V., McGivney, W., et al. (1988). The value of mammography screening in women under age 50 years. Journal of the American Medical Association, 259, 1512–19.
European Society for Mastology (1993). Report of the European Society for Mastology Breast Cancer Screening Evaluation Committee. Presented at Consensus Conference on Breast Cancer Screening.
Flehinger, B.J., Herbert, E., Winawer, S.J., et al. (1988). Screening for colorectal cancer with fecal occult blood test and sigmoidoscopy, preliminary report of the colon project of Memorial Sloan-Kettering cancer center and PMI-Strang clinic. In Screening for gastrointestinal cancer (ed. J. Chamberlain and A.B. Miller), pp. 9–16. Hans Huber, Toronto.
Gastrin, G., Miller, A.B., To, T., et al. (1994). Incidence and mortality from breast cancer in the Mama program for breast screening in Finland, 1973–1986. Cancer, 73, 2168–74.
Glanz, K. and Gilboy, M.B. (1995). Psychological impact of cholesterol screening and management. In Psychosocial effects of screening for disease prevention and detection (ed. R.T. Croyle), pp. 39–64. Oxford University Press, New York.
Goel, V., for Crossroads 99 Group (2001). Appraising organised screening programmes for testing for genetic susceptibility to cancer. British Medical Journal, 322, 1174–8.
Goel, V., Glazier, R., Summers, A., and Holsapfels, S. (1998). Psychological outcomes following maternal serum screening, a cohort study. Canadian Medical Association Journal, 159, 651–6.
Gohagan, J.K., Prorok, P.C., Kramer, B.S., et al. (1995). The Prostate, Lung, Colorectal, and Ovarian cancer screening trial of the National Cancer Institute. Cancer, 75, 1869–73.
Goin, J.E. and Haberman, J.D. (1982). Comments on the logistic function in ROC analysis, Applications to breast cancer detection. Methods of Information in Medicine, 21, 26–30.
Hakama, M. (1982). Trends in the incidence of cervical cancer in the Nordic countries. In Trends in cancer incidence (ed. K. Magnus), pp. 279–92. Hemisphere, Washington, DC.
Hakama, M. (1984). Selective screening by risk groups. In Screening for cancer (ed. P.C. Prorok and A.B. Miller), pp. 71–9. UICC Technical Report Series, Vol. 78. International Union Against Cancer, Geneva.
Hakama, M. (1986). Cervical cancer, risk groups for screening. In Screening for cancer of the uterine cervix (ed. M. Hakama, A.B. Miller, and N.E. Day), pp. 213–16. IARC Scientific Publication 76. International Agency for Research on Cancer, Lyon.
Hakama, M., Chamberlain, J., Day, N.E., et al. (1985). Evaluation of screening programmes for gynaecological cancer. British Journal of Cancer, 52, 669–73.
Hardcastle, J.D., Chamberlain, J.O., Robinson, M.H., et al. (1996). Randomised controlled trial of faecal-occult-blood screening for colorectal cancer. Lancet, 348, 1472–7.
Harvey, B.J., Miller, A.B., Baines, C.J., and Corey, P.N. (1997). Effect of breast self-examination techniques on the risk of death from breast cancer. Canadian Medical Association Journal, 157, 1205–12.
Henschke, C.I., McCauley, D.I., Yankelevitz, D.F., et al. (1999). Early lung cancer action project, overall design and findings from baseline screening. Lancet, 354, 99–105.
Holowaty, P., Miller, A.B., Rohan, T., and To, T. (1999). The natural history of dysplasia of the uterine cervix. Journal of the National Cancer Institute, 91, 252–8.
Howard, J. (1987). Using mammography for cancer control, an unrealized potential. CA: A Cancer Journal for Clinicians, 37, 33–48.
IARC Working Group on Cervical Cancer Screening (1986). Summary chapter. In Screening for cancer of the uterine cervix (ed. M. Hakama, A.B. Miller, and N.E. Day) pp. 133–42. IARC Scientific Publication 76. International Agency for Research on Cancer, Lyon.
IPSTEG (International Prostate Screening Trial Evaluation Group) (1999). Rationale for randomised trials of prostate cancer screening. European Journal of Cancer, 35, 262–71.
Jonsson, E., Hakansson, S., and Tabar, L. (1988). Cost of mammography screening for breast cancer. Experiences from Sweden. In Screening for breast cancer (ed. N.E. Day and A.B. Miller), pp. 113–15. Hans Huber, Toronto.
Kaluzny, A.D., Rimer, B., and Harris, R. (1994). The National Cancer Institute and guideline development: lessons from the breast cancer screening controversy. Journal of the National Cancer Institute, 86, 901–3.
Krahn, M.D., Mahoney, J.E., Eckman, M.H., et al. (1994). Screening for prostate cancer: a decision analytic view. Journal of the American Medical Association, 272, 781–6.
Krahn, M.D., Naglie, G., Ritvo, P., Irvine, J., and Trachtenberg, J. (1996). Patient and expert quality of life ratings in the construction of an empirically derived domain-linked utility instrument for prostate cancer. Medical Decision Making, 16, 470.
Kristiansen, I.S., Eggen, A.E., and Thelle, D.S. (1991). Cost effectiveness of incremental programmes for lowering serum cholesterol concentration, is individual intervention worth while? British Medical Journal, 302, 1119–22.
Kronborg, O., Fenger, C., Olsen, J., et al. (1996). Randomized study of screening for colorectal cancer with faecal-occult-blood test. Lancet, 348, 1467–71.
Litwin, M.S., Hays, R.D., Fink, A., Ganz, P.A., Leake, B., and Brook, R.H. (1998). The UCLA Prostate Cancer Index: development, reliability, and validity of a health-related quality of life measure. Medical Care, 36, 1002–12.
MacMahon, S.W., Cutler, J.A., Furberg, C.D., et al. (1986). The effects of drug treatment for hypertension on morbidity and mortality from cardiovascular disease, a review of randomized, controlled trials. Progress in Cardiovascular Diseases, 29S, 99–118.
McLelland, R. and Pisano, E.D. (1992). The politics of mammography. Radiologic Clinics of North America, 30, 235–41.
Mandel, J.S., Bond, J.H., Church, T.R., et al. (1993). Reducing mortality from colorectal cancer by screening for fecal occult blood. New England Journal of Medicine, 328, 1365–71.
Mandel, J.S., Church, T.R., Ederer, F., and Bond, J.H. (1999). Colorectal cancer mortality, Effectiveness of biennial screening for fecal occult blood. Journal of the National Cancer Institute, 91, 434–7.
Mettlin, C. and Smart, C.R. (1994). Breast cancer detection guidelines for women aged 40–49 years: rationale for the American Cancer Society reaffirmation of recommendations. CA: A Cancer Journal for Clinicians, 44, 248–55.
Mettlin, C., Jones, G., Averette, H., et al. (1993). Defining and updating the American Cancer Society guidelines for the cancer-related checkup, prostate and endometrial cancers. CA: A Cancer Journal for Clinicians, 43, 42–6.
Miller, A.B. (ed.) (1978). Screening in cancer. A report of the UICC International Workshop in Toronto. UICC Technical Report Series, Vol. 40. International Union Against Cancer, Geneva.
Miller, A.B. (1981). An evaluation of population screening for cervical cancer. In Advances in clinical cytology (ed. L.G. Koss and D.V. Coleman), pp. 64–94. Butterworths, London.
Miller, A.B. (1987). Early detection of breast cancer. In Breast diseases (ed. J.R. Harris, I.C. Henderson, S. Hellman, et al.), pp. 122–34. J.B. Lippincott, Philadelphia, PA.
Miller, A.B. (1991). Issues in screening for prostate cancer. In Cancer screening (ed. A.B. Miller et al.), pp. 289–93. Cambridge University Press.
Miller, A.B. (1994). Screening for cancer, Is it time for a paradigm shift? Annals of the Royal College of Physicians and Surgeons of Canada, 27, 353–5.
Miller, A.B. (1996). Fundamental issues in screening for cancer. In Cancer epidemiology and prevention (2nd edn) (ed. D. Schottenfeld and J.F. Fraumeni Jr), pp. 1433–52. Oxford University Press.
Miller, A.B. (2000). Effect of screening programme on mortality from breast cancer. Benefit of 30 per cent may be substantial overestimate. British Medical Journal, 321, 1527.
Miller, A.B. and Tsechkovski, M. (1987). Imaging technologies in breast cancer control, Summary report of a World Health Organization meeting. American Journal of Roentgenology, 148, 1093–4.
Miller, A.B., Howe, G.R., and Wall, C. (1981). The national study of breast cancer screening. Clinical and Investigative Medicine, 4, 227–58.
Miller, A.B., Chamberlain, J., and Tsechkovski, M. (1985). Self-examination in the early detection of breast cancer. A review of the evidence, with recommendations for further research. Journal of Chronic Diseases, 38, 527–40.
Miller, A.B., Chamberlain, J., Day, N.E., Hakama, M., and Prorok, P.C. (1990). Report on a workshop of the UICC project on evaluation of screening for cancer. International Journal of Cancer, 46, 761–9.
Miller, A.B., Anderson, G., Brisson, J., et al. (1991a). Report of a National Workshop on Screening for Cancer of the Cervix. Canadian Medical Association Journal, 145, 1301–25.
Miller, A.B., Baines, C.J., and Turnbull, C. (1991b). The role of the nurse-examiner in the National Breast Screening Study. Canadian Journal of Public Health, 82, 162–7.
Miller, A.B., Baines, C.J., To, T., et al. (1992). Canadian national breast screening study. 2: Breast cancer detection and death rates among women age 50–59 years. Canadian Medical Association Journal, 147, 1477–88.
Miller, A.B., Lindsay, J., and Hill, G.B. (1976). Mortality from cancer of the uterus in Canada and its relationship to screening for cancer of the cervix. International Journal of Cancer, 17, 602–12.
Miller, A.B., To, T., Baines, C.J., and Wall, C. (2000a). Canadian National Breast Screening Study-2: 13-year results of a randomized trial in women aged 50–59 years. Journal of the National Cancer Institute, 92, 1490–9.
Miller, A.B., Nazeer, S., Fonn, S., et al. (2000b). Report on consensus conference on cervical cancer screening and management. International Journal of Cancer, 86, 440–7.
Morrison, A.S. (1985). Screening in chronic disease, pp. 48–63. Oxford University Press.
Morrison, A.S., Brisson, J., and Khalid, N. (1988). Breast cancer incidence and mortality in the breast cancer detection demonstration project. Journal of the National Cancer Institute, 80, 1540–7.
Moskowitz, M., Pemmaraju, S., Fidler, J.A., et al. (1976). On the diagnosis of minimal breast cancer in a screenee population. Cancer, 37, 2543–52.
Moss, S.M. (1991). Case–control studies of screening. International Journal of Epidemiology, 20, 1–6.
Moss, S., Draper, G.J., Hardcastle, J.D., and Chamberlain, J. (1987). Calculation of sample size in trials of screening for early diagnosis of disease. International Journal of Epidemiology, 16, 104–10.
Muscat, J.E. and Huncharek, M.S. (1991). Breast self-examination and extent of disease: a population-based study. Cancer Detection and Prevention, 15, 155–9.
National Institutes of Health Consensus Development Panel (1997). Consensus statement. Monographs of the National Cancer Institute, 22, vii–xviii.
Newcomb, P.A., Weiss, N.S., Storer, B.E., et al. (1991). Breast self-examination in relation to occurrence of advanced breast cancer. Journal of the National Cancer Institute, 83, 260–5.
Newcomb, P.A., Norfleet, R.G., Storer, B.E., et al. (1992). Screening sigmoidoscopy and colorectal cancer mortality. Journal of the National Cancer Institute, 84, 1572–5.
Prorok, P.C., Chamberlain, J., Day, N.E., et al. (1984). UICC workshop on the evaluation of screening programmes for cancer. International Journal of Cancer, 34, 1–4.
Rose, G. (1992). The strategy of preventive medicine. Oxford University Press.
Sasco, A.J., Day, N.E., and Walter, S.D. (1986). Case–control studies for the evaluation of screening. Journal of Chronic Diseases, 39, 399–405.
Schechter, M.T., Miller, A.B., Baines, C.J., et al. (1986). Selection of women at high risk of breast cancer for initial screening. Journal of Chronic Diseases, 39, 253–60.
Selby, J.V., Friedman, G.D., and Collen, M.F. (1988). Sigmoidoscopy and mortality from colorectal cancer: the Kaiser Permanente multiphasic evaluation study. Journal of Clinical Epidemiology, 41, 427–34.
Selby, J.V., Friedman, G.D., Quesenberry, C.P., Jr, et al. (1992). A case–control study of screening sigmoidoscopy and mortality from colorectal cancer. New England Journal of Medicine, 326, 653–7.
Selzer, M.L., Gomberg, E.S., and Nordhoff, J.A. (1979). Men’s and women’s responses to the Michigan Alcoholism Screening Test. Journal of Studies on Alcohol, 40, 502–4.
Semiglazov, V.F., Sagaidak, V.N., Moiseyenko, V.M., and Mikhailov, E.A. (1993). Study of the role of breast self-examination in the reduction of mortality from breast cancer. European Journal of Cancer, 29A, 2039–46.
Spielberger, C.D., Gorsuch, R.L., and Lushene, R.E. (1970). Manual for the State-Trait Anxiety Inventory. Consulting Psychologists Press, Palo Alto, CA.
Swets, J.A. (1979). ROC analysis applied to the evaluation of medical imaging technologies. Investigative Radiology, 14, 109–21.
Tabar, L., Fagerberg, C.J.G., Gad, A., et al. (1985). Reduction in mortality from breast cancer after mass screening with mammography: randomized trial from the breast cancer screening working group of the Swedish National Board of Health and Welfare. Lancet, i, 829–32.
Tabar, L., Fagerberg, G., Duffy, S.W., and Day, N.E. (1989). The Swedish two county trial of mammographic screening for breast cancer: recent results and calculation of benefit. Journal of Epidemiology and Community Health, 43, 107–14.
Task Force (1976). Cervical cancer screening programs. The Walton Report. Canadian Medical Association Journal, 114, 1003–33.
Task Force (1982). Cervical cancer screening programs. Summary of the 1982 Canadian task force report. Canadian Medical Association Journal, 127, 581–9.
Toronto Working Group on Cholesterol Policy (1990). Asymptomatic hypercholesterolemia: a clinical policy review. Journal of Clinical Epidemiology, 43, 1028–121.
Torrance, G.W., Feeny, D.H., Furlong, W.J., Barr, R.D., Zhang, Y., and Wang, Q. (1996). Multiattribute utility function for a comprehensive health status classification system. Health Utilities Index Mark 2. Medical Care, 34, 702–22.
United Kingdom Trial of Early Detection of Breast Cancer Group (1988). First results on mortality reduction in the United Kingdom Trial of Early Detection of Breast Cancer. Lancet, ii, 411–16.
United States Preventive Services Task Force (1996a). Screening for high blood cholesterol and other lipid abnormalities. In Guide to clinical preventive services 2, pp. 15–38. Williams & Wilkins, Baltimore, MD.
United States Preventive Services Task Force (1996b). Screening for hypertension. In Guide to clinical preventive services 2, pp. 39–51. Williams & Wilkins, Baltimore, MD.
United States Preventive Services Task Force (1996c). Screening for breast cancer. In Guide to clinical preventive services 2, pp. 73–87. Williams & Wilkins, Baltimore, MD.
Wadman, M. (1999). Human Genome Project aims to finish ‘working draft’ next year. Nature, 398, 177.
Wald, N.J., Kennard, A., Hackshaw, A., and McGuire, A. (1997). Antenatal screening for Down’s syndrome. Journal of Medical Screening, 4, 181–246.
Walter, S.D. and Day, N.E. (1983). Estimation of the duration of a pre-clinical disease state using screening data. American Journal of Epidemiology, 118, 865–86.
Walter, S.D. and Stitt, L.W. (1987). Evaluating the survival of cancer cases detected by screening. Statistics in Medicine, 6, 885–900.
Ware, J.E. and Sherbourne, C.D. (1992). The MOS 36-Item Short-Form Health Survey (SF-36). I: Conceptual framework and item selection. Medical Care, 30, 473–83.
Weiss, N.S. (1983). Control definition in case–control studies of the efficacy of screening and diagnostic testing. American Journal of Epidemiology, 118, 457–60.
Wilson, J.M.G. and Jungner, G. (1968). Principles and practice of screening for disease. World Health Organization, Geneva.
Winawer, S.J., Flehinger, B.J., Schottenfeld, D., and Miller, D.G. (1993). Screening for colorectal cancer with fecal occult blood testing and sigmoidoscopy. Journal of the National Cancer Institute, 85, 1311–18.
Workshop Group (1989). Reducing deaths from breast cancer in Canada. Canadian Medical Association Journal, 141, 199–201.