
6.6 Cohort studies

Oxford Textbook of Public Health

Manning Feinleib and Norman E. Breslow

Introduction
Design of cohort studies

Forms of cohort studies
Selection of the study cohorts

Objectives

Example 1—A historical cohort study of the relation between artificial menopause and breast cancer

Example 2—A prospective cohort study of the relation between cigarette smoking and mortality: the British Doctors Study (Breslow and Day 1987, Appendix IA)

Example 3—A prospective cohort study of risk factors for heart disease: the Framingham Heart Study
Gathering of baseline information

Objectives

Sources of baseline information

Example 4—Artificial menopause and breast cancer

Example 5–The British Doctors Study

Example 6—The Framingham Heart Study
Follow-up

Objectives

Example 7—Artificial menopause and breast cancer

Example 8—The British Doctors Study

Example 9—The Framingham Heart Study
Sampling from the cohort
Data analysis

Grouped data and person-years

Example 10—Artificial menopause and breast cancer

Example 11—Coronary heart disease and smoking among British doctors

Example 12—The Framingham Heart Study

The Cox proportional hazards model
Types of bias and their resolution

Selection bias

Follow-up bias

Information (misclassification) bias

Confounding bias

Post hoc bias

Resolution of bias
Summary
Chapter References

Introduction
The cohort study is an observational epidemiological study which, after the manner of an experiment, attempts to study the relationship between a purported cause (exposure) and the subsequent risk of developing disease. As in other observational epidemiological studies, and unlike experimental studies, the suspected causal factor or exposure is not randomly assigned to the study population. However, the cohort study follows the same time direction as an experiment in that the suspected exposure is identified as having or not having occurred in the study population before the occurrence of disease is investigated. Thus, certain biases that may occur in other forms of epidemiological studies can be avoided, specifically those concerned with ascertaining the exposure status of the population. Furthermore, because disease occurrence is identified subsequent to enumeration of exposure groups, this type of study allows direct estimation of the risk of developing disease and how risk varies with time since exposure.
Cohort studies have been given a variety of names including incidence studies, prospective studies, follow-up studies, longitudinal studies, and panel studies, although the latter two terms have more generally been applied to studies involving repeated measurements of the same variables over time. They are similar to the usual scientific experiment in that they proceed from the suspected cause or aetiological agent to the disease outcome with controls or comparison groups selected on the basis of absence of exposure to the putative cause. As a type of observational study, there is no randomization to exposure classes nor is there any attempt to manipulate the exposure. In contrast, case–control studies (also known as case–referent studies and, formerly, as retrospective studies) have no counterpart in experimental science since they work from the outcome event back towards the supposed aetiological factor. Indeed, case–control studies are often viewed conceptually in terms of sampling data from an ongoing (and possibly fictitious) cohort study.
Cohort studies offer the possibility of studying the full range of effects of the suspected aetiological factor. Frequently the suspected aetiological factor is not only related to the occurrence of the disease of primary interest, but may influence the natural history of the disease and may be related to a variety of other health conditions that may not have been suspected at first. A particularly important aspect of cohort studies is that they provide direct estimates of the risk of disease for each exposure group separately. These separate estimates of risk can then be used to estimate a variety of measures of interest to epidemiologists such as the attributable risk, the relative risk, and the aetiological fraction. (These measures of risk are discussed in the section on analysis below and in Table 3.) Although these risks can often be estimated from other types of studies when certain assumptions are made or ancillary information is available, cohort studies permit direct estimates of these measures from the data obtained in the study itself.

Table 3 Measures of association
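As a minimal sketch of how the separate risk estimates lead to these measures, the Python below computes them from the number of cases and persons in each exposure group. Table 3 is not reproduced here, so the definitions are assumptions: attributable risk is taken as the risk difference, and the aetiological fraction as the attributable risk divided by the risk in the exposed group. The function name and the counts are hypothetical.

```python
def association_measures(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Measures of association from cohort counts (definitions assumed, see lead-in)."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    attributable_risk = risk_exposed - risk_unexposed  # risk difference
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "attributable_risk": attributable_risk,
        # Fraction of the risk in the exposed group attributed to the exposure.
        "aetiological_fraction": attributable_risk / risk_exposed,
    }


# Hypothetical cohort: 30 cases among 1000 exposed, 10 cases among 1000 unexposed.
measures = association_measures(30, 1000, 10, 1000)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

Other authors define the attributable and aetiological fractions at the population level, which would also require the prevalence of exposure.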

The disadvantages of cohort studies are primarily logistic and administrative. Often, relatively large populations have to be followed for long periods of time, thus entailing considerable expense in terms of funding and professional resources. If the disease outcome of interest is rare, the sample sizes required for concurrent studies may be prohibitively large. If the follow-up period is long, which is often the case for chronic diseases, the problem of attrition of the study group due to loss to follow-up, migration, competing causes of death, or gradual deterioration of interest in participation may present serious analytical problems that might negate the value of the overall study. Longitudinal follow-up requires careful attention to maintaining standardized diagnostic methods and criteria. Finally, of course, the longer the study is continued, the more difficult it is to maintain a committed investigative team and stable funding for the project.
In the first part of this chapter we discuss the major methodological aspects of cohort studies: forms of cohort studies, selection of study cohorts, gathering of baseline information, follow-up, and analysis. To illustrate these points we use examples from three studies: a historical cohort study of artificial menopause and breast cancer using available hospital and death certificate information (Feinleib 1968), a prospective cohort study, using mail questionnaires, of cigarette smoking and mortality among British doctors (Doll and Peto 1976), and a prospective cohort study of heart disease in Framingham, Massachusetts, using periodic medical examinations (Kannel et al. 1961; Dawber et al. 1963). In the second part of the chapter we present the various types of bias that can confound interpretation of cohort studies and suggest ways of identifying, reducing, and/or resolving these biases. In this section examples are drawn from a wider range of studies.
Design of cohort studies
Forms of cohort studies
Cohort studies may take a variety of forms. The key distinction that has been established in the past is based primarily on the availability of data. In prospective (or concurrent) cohort studies, data on exposure status and disease outcome are not available at the outset of the study; they must be ascertained through the direct efforts of the investigator in the future. In ambispective cohort studies, data on exposure status have been collected in the past and are available from existing records, while disease outcome is unknown or incompletely known; the investigator is obliged to follow the cohort for subsequent occurrence of the disease. In historical (or non-concurrent) cohort studies, data on exposure status and disease outcome have been collected in the past and are available from existing records; the investigator’s efforts are devoted primarily to linking the relevant data files. The basic steps involved in each type of study include selecting the study and comparison groups, obtaining baseline information with regard to exposure and initial health status, and follow-up of the members of the cohort and surveillance for disease outcome.
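The distinction by data availability can be written as a small decision rule. The sketch below is illustrative only: the function name and its two boolean arguments are hypothetical, and the fourth combination (outcome on record but exposure not) is treated as an error because the text does not describe it as a form of cohort study.

```python
def cohort_study_form(exposure_on_record: bool, outcome_on_record: bool) -> str:
    """Classify a cohort study by which data already exist at the outset."""
    if exposure_on_record and outcome_on_record:
        return "historical"      # both available from existing records
    if exposure_on_record:
        return "ambispective"    # exposure on record, outcome still to be followed
    if not outcome_on_record:
        return "prospective"     # both must be ascertained by the investigator
    raise ValueError("outcome on record without exposure is not covered by the text")


print(cohort_study_form(False, False))
print(cohort_study_form(True, False))
print(cohort_study_form(True, True))
```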
Selection of the study cohorts
Objectives
There are two approaches to the selection of representative samples of exposed and non-exposed groups to be followed in a cohort study.

1. The identification of a special exposure group defined because of (i) unusual exposure to a suspected causative (aetiological) factor, or (ii) unusual lifestyle or work experience.

2. The use of a general population sample in which there is heterogeneity of exposure to the suspected aetiological factor.
Where the study group is a special exposure group, it is necessary to find appropriate comparison groups or the means to make comparisons with the general population. When the general population sample is used as a starting point, the various levels of exposure within the study group provide the basis for internal comparisons. Each approach also takes into consideration various logistic constraints, for example accessibility and co-operativeness of the study groups, availability of medical and other records, and anticipated completeness and cost of endpoint surveillance.
Example 1—A historical cohort study of the relation between artificial menopause and breast cancer
Seven case–control studies performed between 1926 and 1962 all reported that artificial menopause (surgical removal of the uterus and/or ovaries) occurred significantly less frequently among breast cancer patients than among a variety of controls. Because the case–control studies did not present information about the extent of surgery (the effect of removal of only the uterus versus removal of the ovaries) or the effects of the age at which the artificial menopause occurred, it was decided to investigate these issues by means of a cohort study. The disadvantage of using a prospective cohort method in elucidating the relation between artificial menopause and breast cancer is that there is a long interval between the gynaecological procedure and the appearance of the disease in appreciable frequency. To reduce this delay it was decided to use the historical cohort approach. The cohorts were selected from the records of two teaching hospitals in the Boston area. The study cohorts included all eligible patients seen at these hospitals from 1920 to 1940. Women aged 55 years or younger were eligible for inclusion in the study if they had undergone any of the following procedures as determined from surgical and pathological records: (i) hysterectomy; (ii) unilateral oophorectomy; (iii) bilateral oophorectomy; (iv) radium or X-ray treatment of the ovaries or uterus; (v) cholecystectomy. The last group served as a control cohort.
Certain patients were excluded from the study: (i) women who had a prior mastectomy or a prior breast malignancy or who had undergone castration as part of the treatment for an existing breast tumour; (ii) women treated for pelvic malignancies; (iii) women who had previous removal of their ovaries or a history of natural menopause before the age of 40 years; (iv) women who did not survive their index admission; (v) all who were not residents of Massachusetts at the time of their index procedure. After the final editing of the study abstract forms and the elimination of duplicate records, there were 8387 patients in the study populations. They were subdivided into four ‘exposure’ categories.

1. Natural menopause—1479 women (including 953 women who underwent cholecystectomy and 526 women who were postmenopausal at the time of the gynaecological procedures for benign conditions).

2. Hysterectomy and bilateral oophorectomy—3241 women (this constitutes the surgically castrated group who were believed to have no residual ovarian activity).

3. Those undergoing hysterectomy and/or unilateral oophorectomy who, as far as could be ascertained from the surgical and pathological records, retained at least one intact ovary—2149 women (referred to as the ‘partial surgery’ group).

4. Radiation-induced artificial menopause—1518 women.
The partial surgery group constituted a second control cohort of ‘sham operations’ with which to contrast the women subjected to hysterectomy and bilateral oophorectomy.
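The category counts reported above can be checked against the stated total with a few lines of arithmetic. The sketch below uses only the figures given in the text; the variable names and the short category labels are illustrative.

```python
# Counts as reported for the four exposure categories.
natural_menopause = 953 + 526  # cholecystectomy + postmenopausal at gynaecological procedure

cohort_sizes = {
    "natural menopause": natural_menopause,
    "hysterectomy and bilateral oophorectomy": 3241,
    "partial surgery": 2149,
    "radiation-induced artificial menopause": 1518,
}

total = sum(cohort_sizes.values())
for name, n in cohort_sizes.items():
    print(f"{name}: {n} ({100 * n / total:.1f} per cent)")
print("total:", total)
```

The total agrees with the 8387 patients reported after editing and removal of duplicate records.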
It should be noted that it is not possible to relate the actual cohorts studied to a clearly definable population. Although in this case adequate records were available for virtually every woman admitted to these hospitals who was eligible for the study, it is not known from what source population these women came. However, it is assumed that the reasons for coming to these particular hospitals were not correlated with both the type of procedure and the subsequent risk of developing breast cancer, i.e. they were not confounding factors (see section on biases below).
Example 2—A prospective cohort study of the relation between cigarette smoking and mortality: the British Doctors Study (Breslow and Day 1987, Appendix IA)
By 1950 several case–control studies had been published and were in agreement in showing that a larger proportion of lung cancer patients had been heavy cigarette smokers and a smaller proportion had been non-smokers than patients with other diseases. Because of the possibility of a variety of biases in these case–control studies, a prospective study was launched in 1951 among the members of the medical profession in the United Kingdom. This group was chosen because it was felt that physicians would respond to mailed questionnaires, would report their smoking histories accurately, and could be followed economically through the death records of the Registrars-General and through the registries of the General Medical Council and the British Medical Association. It was felt that the relation of smoking to health among physicians would be similar to that in the general population. A simple questionnaire was mailed out on 31 October 1951 to 59 600 men and women on the Medical Register.
The replies received from 40 637 doctors (34 445 from men and 6192 from women) were sufficiently complete to be used. From a one-in-ten random sample of the register, it was estimated that this represented answers from 69 per cent of the men and 60 per cent of the women alive at the time of the inquiry. The degree of self-selection in those who replied was assessed in terms of the overall mortality using this one-in-ten sample. The standardized death rate of those who replied was only 63 per cent of the death rate for all doctors in the second year of the inquiry and 85 per cent in the third year. In the fourth to tenth years the proportion varied about an average of 93 per cent and there was no evidence of any regular change with the further passage of years. Evidently the effect of selection did not entirely wear off, but after the third year it had become slight.
Example 3—A prospective cohort study of risk factors for heart disease: the Framingham Heart Study
The Framingham Heart Study is a long-term follow-up study of a sample of adults who lived in the town of Framingham, Massachusetts, in 1950. The first participants were actually examined in 1948 as part of an effort to conduct a demonstration programme in the detection and natural history of cardiovascular diseases. In 1950, however, the study was reconstituted as a long-term epidemiological investigation of coronary heart diseases, and the original voluntary participants were incorporated into a random sample drawn from all adults aged 30 to 60 years living in the town. Of the eligible random sample of 6507 persons, 4469 (68.7 per cent) participated in the examinations. When this number was supplemented with the volunteers, a total cohort of 5209 was obtained. The possible effects of supplementing the cohort to replace the originally selected participants who refused to participate in the reconstituted study are discussed in Example 9 below.
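The participation figures can be reproduced directly from the counts in the text. In the sketch below the number of volunteers is not stated in the passage; it is inferred by subtraction and should be read as an assumption. Variable names are illustrative.

```python
eligible_sample = 6507      # persons in the eligible random sample
sample_participants = 4469  # of these, examined
total_cohort = 5209         # after adding the volunteers

participation_rate = 100 * sample_participants / eligible_sample
volunteers = total_cohort - sample_participants  # inferred, not stated in the text

print(f"participation: {participation_rate:.1f} per cent")
print("volunteers added:", volunteers)
```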
Although it was recognized from the outset that the town of Framingham could be considered neither a random nor a completely representative sample of the United States, the town did have certain characteristics that made it extremely suitable for a long-term epidemiological study. The population of the town was of adequate size (28 000) to provide enough individuals in the desired age range. It was sufficiently compact that the study population could be observed conveniently by means of an examination at a single examining facility, and most of the residents received their hospital care at a single central hospital in the town. Owing in part to a relatively stable economy supported by a diversity of employment opportunities, the population was relatively stable so as to enable adequate follow-up for a long period of time. Both the general community and the medical profession of the town were felt to be co-operative. The town was not believed to be ‘grossly atypical in any respect that appeared relevant’.
Since only 68.7 per cent of the eligible random sample participated in the 1950 examinations, it is possible that they might not be representative of the total population. This is a serious concern in all epidemiological studies where participation is voluntary and may be subject to self-selection. In this study it was felt that reasons for not participating were not appreciably related simultaneously to both the characteristics to be studied in the investigation and the risk of developing heart disease (see the section on biases below).
Gathering of baseline information
Objectives
There are multiple objectives to be achieved in gathering baseline information.

1. Obtain a valid assessment of the exposure status of the members of the cohort groups.

2. Define the individuals ‘at risk’; exclude those individuals with known disease at baseline.

3. Establish a basis for follow-up: obtain identifying data, informed consent, and commitment to co-operating in the follow-up (e.g. permission to contact family members and physicians, and to obtain hospital and employment records).

4. Obtain data on important covariates (i.e. other exposures that may be associated with the risk of acquiring the disease) so that adjustments can be made for their contribution to the incidence of disease in analysis (see section on confounding variables below).
Sources of baseline information
Existing records
Baseline information about the cohorts can be obtained from a variety of sources such as available records from hospitals or employment records, interviews of the cohort members or other informants, direct medical and other special examinations, and indirect measures of exposure estimated from investigations of the environment. The availability of written records such as medical or employment records may provide useful information to select and define the cohort. If high-quality records are available, they may permit the study to begin from the point of the recording of the information, thereby adding a considerable period of follow-up time before the actual initiation of the investigation. Studies based on such records with follow-up of patients from such a prior point in time to the present have been given special names such as retrospective cohort studies, non-concurrent cohort studies, and historical prospective studies. There are several other advantages for using previously recorded information. The data are apt to be free from certain biases since they are recorded before any knowledge of the particular study for which they are used. Written records may provide information that is not fully known to the subject, such as details on medical conditions or actual levels of exposure. However, such records may also have certain drawbacks. Records may not be uniformly available for all cohort members. Even when available, the detail and quality of the data in the records are not controllable by the investigator and it is difficult to verify the accuracy of questionable items.
Interviews
One of the more common methods of obtaining information is to interview the cohort members or other informants. A variety of techniques can be used: direct personal interviews, mailed questionnaires, telephone interviews, having the subject complete a questionnaire administered by computer, and using a tape recorder and headphones for asking intimate questions in a crowded setting. When approaches are made to individual cohort members, there are varying rates of response to requests to participate in the study. A wide variety of cohort studies has reported response rates of approximately 65 to 75 per cent for direct interviews. Mail questionnaires, depending on the length of the questionnaire and motivations of the group, often have appreciably lower rates of response. The advantages of interviewing the cohort members include the ability to obtain information on a wide variety of topics. Interviews can provide data on attitudes and permit quite complex questions to be asked with the possibility of probing to ensure accurate recording of responses (such as eliciting histories about diet, exercise, or measures of stress). However, interview data may not always be reliable because the subject may fail to recall information or may not be aware of his or her own habits or history. There is also the possibility that the information may be biased by the subject’s knowledge of the aims of the investigation.
Examinations
Medical and other special examinations are necessary to obtain information of which the subject cannot be expected to be aware. Direct examination is often necessitated by the nature of the aetiological factor to be investigated and may be the only way to obtain biologically meaningful information. Subjects often appreciate the availability of an examination, and this may enhance the response rate to certain types of investigations. However, special examinations are usually expensive and require attention to standardization of procedures, training of appropriate observers or laboratory personnel, and quality control across observers and over time. It has also been reported that response rates to medical examinations tend to be biased towards subjects who are relatively free from disease. Direct examination can also be used to validate information obtained from interviews. For example, testing for urinary thiocyanate has been a useful adjunct to smoking studies.
Measure of environment
The fourth type of baseline information is that obtained for each of the groups as a whole, particularly when one is dealing with special exposure groups. Thus it might be appropriate to measure air pollution, exposure to radiation or other toxicological substances, or exposures on the job for an entire group of workers and to apply this measure to each of the individuals in the group. Although this type of information is usually quite useful, particularly when individual measures of exposure cannot be obtained directly, one should be aware that it essentially constitutes ‘ecological data’, i.e. the measurement of a mean or modal value for a group, which may conceal individual variability within the group.
Example 4—Artificial menopause and breast cancer
All the baseline information for this investigation was obtained from the available surgical and pathological records already filed in the record rooms of two hospitals between 1920 and 1940. The data were felt to be adequate for providing a valid assessment of the exposure status of the members of the cohorts in terms of whether or not they had received the indicated operation. Furthermore, as indicated above, those individuals with known disease could be identified from the available records. In part, the high quality of the records was due to the fact that the hospitals chosen were teaching hospitals for a major medical school, and the records were generally filled out by medical students and interns who provided careful and detailed histories. However, if there was no mention of existing or pre-existing breast cancer, there was no means of confirming this independently of the available records. Likewise, the existence of breast cancer was based solely on the report of the patient to the interviewing physician. Covariates that were available from the hospital records were the age at the time of the index procedure and the parity of the women. Other covariates of possible interest, since they could have been related to both the risk of cancer and the risk of gynaecological procedures, were not available from the records. These included body weight, history of breast feeding, and exposure to diagnostic X-rays.
Example 5–The British Doctors Study
The initial mail questionnaire was intentionally kept short and simple to encourage a high proportion of replies. The doctors were asked to classify themselves into one of three groups: (i) whether they were, at that time, smoking; (ii) whether they had smoked but had given it up; (iii) whether they had never smoked regularly (i.e. had never smoked as much as one cigarette a day, or its equivalent in pipe tobacco, for as long as 1 year). Present smokers and ex-smokers were asked additional questions. The former were asked the age at which they had started smoking, the amount of tobacco that they were currently smoking, and the method by which it was consumed. The ex-smokers were asked similar questions, but relating to the time just before they had given up smoking.
In a covering letter, the doctors were invited to give any information on their smoking habits or history that might be of interest, but, apart from that, no information was sought on previous changes in habit (other than the amount smoked prior to last giving up, if smoking had been abandoned). The decision to restrict questions on amount smoked to current smoking habits was based mainly on the results of [an] earlier case–control study … [which showed] that the classification of smokers according to the amount that they had most recently smoked gave almost as sharp a differentiation between the groups of patients with and without lung cancer as the use of smoking histories over many years—theoretically more relevant statistics, but clearly based on less accurate data. (Breslow and Day 1987, Appendix IA)
Example 6—The Framingham Heart Study
On the basis of an initial examination and detailed interview, the sample was characterized according to a variety of ‘risk factors’: blood cholesterol, blood pressure, cigarette smoking status, body mass index, and the presence of a variety of other diseases and conditions. Careful attention was given to standardization of the examination procedures and the structure of the interview.
On the basis of a medical history and examination, an electrocardiogram, and other medical tests, it was found that 82 individuals in the base cohort of 5209 had a cardiovascular event before the baseline examination. Thus the cohort of individuals ‘at risk’ for the key cardiovascular endpoint of coronary heart disease numbered 5127.
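Defining the cohort ‘at risk’ amounts to removing anyone with the outcome already present at baseline. The sketch below shows this on a hypothetical five-person cohort (the `Participant` class, its field names, and the records are invented for illustration) and then repeats the subtraction reported in the text.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    participant_id: int
    prevalent_cvd: bool  # cardiovascular event found before the baseline examination


def at_risk(cohort):
    """Keep only participants free of the outcome at baseline."""
    return [p for p in cohort if not p.prevalent_cvd]


# Hypothetical records, not study data.
cohort = [
    Participant(1, False),
    Participant(2, True),
    Participant(3, False),
    Participant(4, False),
    Participant(5, True),
]
print("at risk in toy cohort:", len(at_risk(cohort)))

# Figures reported in the text: 5209 in the base cohort, 82 with a prior event.
framingham_at_risk = 5209 - 82
print("Framingham cohort at risk:", framingham_at_risk)
```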
To establish the basis for follow-up, each of the subjects was advised at the initial interview that it was intended to re-examine him [or her] at two-year intervals, and that he [or she] would be approached directly at the appropriate time. The names of a relative, a friend, and the family physician were all recorded so that the subject could be traced in case he [or she] moved during the interval. An abstract of the initial examination was sent to the family physician and the subject was advised by letter as to whether the physician should be consulted or not. The objective of this procedure was to provide some tangible benefit to the subject other than the knowledge of his [or her] contribution to medical science. At the same time, care was taken not to become involved in the medical management of the subjects and to avoid interfering in any way with the relationship between the subject and his [or her] physician. This helped to maintain rapport, not only with the subjects themselves, but with the medical community as well. (Dawber et al. 1963)
Follow-up
Objectives
There are multiple objectives to be achieved in follow-up:

1. Uniform and complete follow-up of all cohort groups.

2. Complete ascertainment of outcome events.

3. Standardized diagnosis of outcome events.
One of the key criteria by which the quality of a longitudinal incidence study can be judged is the extent to which the investigator achieves complete ascertainment of outcome events in all exposure classes. Although a variety of methods are available for follow-up, it is desirable that the follow-up methods be independent of the method used to classify the exposure category in order to ensure uniform ascertainment across all subgroups. Methods of follow-up include correspondence with the subject and other informants, periodic re-examination of the subjects, and indirect surveillance of hospital records and death certificates. (Some countries such as the United Kingdom, the United States, and some Scandinavian countries maintain central death registers which facilitate efficient and complete mortality follow-up.) The duration of follow-up will be governed primarily by the natural history of the disease and the length of the incubation period between exposure and the onset of illness. It is important that the criteria for diagnosis of endpoints be standardized early in the follow-up period. Although criteria for the endpoints may change in the clinical community during the study, it is important that some criteria remain stable over time so that the incidence of cases occurring early in the period of follow-up can be compared with similar cases occurring later on in the observational period. Attention should be paid to criteria to verify the absence as well as the presence of the study endpoints (i.e. to minimize both false-positive and false-negative diagnoses).
Unequal loss to follow-up across different exposure categories presents serious problems in the analysis, and every method possible should be used to ensure uniform surveillance of each group. Because of the possibility of ascertainment bias resulting from knowledge of the exposure class, it is often desirable to have objective endpoint criteria which can be measured by ‘blinded’ observers. Information used in these criteria should be sought with equal diligence in all exposure classes. This is particularly important when the exposure class is defined by a variable that may lead to different degrees of medical observation, particularly medical examinations that are not under the direct control of the study investigators. For example, if in a study of cardiovascular diseases there is a tendency for participants with high cholesterol levels to receive more frequent electrocardiograms or other examinations by cardiologists, there may be a tendency to diagnose more cardiovascular events, particularly milder events, in this group than in the group with low cholesterol levels. Repeat examination of the subjects, besides providing standardized information on the illnesses under investigation, can often yield additional information about covariates that may be of importance and also allows studies of longitudinal changes in the exposure status.
Example 7—Artificial menopause and breast cancer
All patients in the study were followed from their index admission to 31 December 1961 so that the potential period of observation ranged from 21 to 42 years. The follow-up information was obtained from three sources, of which the first was the hospital records. All information relating to a given patient from any and all admissions to either hospital in the study was located and the data for each patient were then combined into a single record. The second source was the death certificates registered at the Massachusetts Division of Vital Statistics from 1 January 1920 to 31 December 1961. Alphabetical listings of the names of the study patients were compared with those in the index of vital records. Whenever a possible match was obtained, the death certificate was located and the information on the certificate and the identifying data obtained from the hospital chart were compared according to a prescribed set of criteria designed to minimize false matches. Therefore there may have been increased risk of discarding acceptable matches owing to some discrepancies in the available identifying information. All conditions mentioned on the death certificates were coded according to a uniform system. In addition, the underlying cause of death was coded according to the revision of the International Classification of Diseases in use at the State Division of Vital Statistics at the time of the patient’s death. Thus direct comparison could be made with published mortality statistics. The third source of follow-up information was the Massachusetts Tumor Registry, a unit of the Bureau of Chronic Disease Control of the Massachusetts Department of Public Health. Since 1927 this registry had recorded all patients diagnosed with, or treated for, malignancies at State or State-aided cancer clinics. Possible matches were obtained according to rules similar to the criteria for death certificate matching.
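The trade-off in the matching step, fewer false matches at the cost of some discarded true ones, can be illustrated with a deliberately strict rule. The text does not list the study's actual criteria, so everything in this sketch is an assumption: the `Record` fields, the requirement that every item agree, the one-year tolerance on birth year, and the example names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    surname: str
    given_name: str
    birth_year: int
    town: str


def normalise(text: str) -> str:
    """Lower-case and keep letters only, so punctuation and spacing are ignored."""
    return "".join(ch for ch in text.lower() if ch.isalpha())


def is_acceptable_match(hospital: Record, certificate: Record) -> bool:
    """Strict (assumed) rule: all identifying items must agree."""
    return (
        normalise(hospital.surname) == normalise(certificate.surname)
        and normalise(hospital.given_name) == normalise(certificate.given_name)
        and abs(hospital.birth_year - certificate.birth_year) <= 1
        and normalise(hospital.town) == normalise(certificate.town)
    )


# Hypothetical hospital chart and two candidate death certificates.
chart = Record("O'Brien", "Mary", 1890, "Boston")
same_person = Record("OBrien", "Mary", 1891, "Boston")
spelling_variant = Record("O'Brian", "Mary", 1890, "Boston")

print(is_acceptable_match(chart, same_person))
print(is_acceptable_match(chart, spelling_variant))
```

The second candidate is rejected on a one-letter spelling difference, which is the kind of discrepancy that can cause an acceptable match to be discarded under strict criteria.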
With regard to mortality follow-up, the assumption was made that all patients dying during the study period should be registered at the Division of Vital Statistics. If no death certificate was located, one of three situations may have occurred: (i) the patient was still alive; (ii) before death she had emigrated from the State and was not a resident of Massachusetts at the time of death; (iii) she had died, but no record could be located because of reporting or matching errors (mis-spellings, changes of name, failure to file a death certificate, etc.). From the three sources of information the status of 19 per cent of the women was known as of January 1962. It was noted that those receiving pelvic radiation had slightly more complete follow-up to death than those surgically treated (20 versus 18.9 per cent). This difference was statistically significant but there was no significant difference in completeness of follow-up among the surgically treated groups. The relative success of the follow-up procedure was estimated by comparing the percentages of those in the cohorts known to have died before 1962 with those expected on the basis of published mortality rates and estimated migration rates. It was estimated that the observed deaths would comprise 72.8 per cent of the expected number after allowance for migration.
With the advent of automated data files in hospitals and the creation of national automated databases, including central death registries, follow-up of cohorts such as these should become easier and more complete. Although, as in this study, only a small proportion of the original cohort may be known to have died, if one is confident that those known are nearly all of those who had died and there is no bias for better ascertainment of deaths in one group compared with the other, the results should be valid for mortality endpoints. The British Doctors Study (Example 8) is an illustration of the use of multiple questionnaires, linkage to other files (physician registries), and other forms of contact to ensure that complete follow-up has been attained. It should be noted that the artificial menopause study, using historical records, took less than 3 years to complete, whereas the next two examples of prospective studies took several decades to achieve similar follow-up.
Example 8—The British Doctors Study
The following quotations are from Breslow and Day (1987, Appendix IA).
During the study, further questionnaires were sent out on three separate occasions to men and on two occasions to women. The purpose was partly to obtain detailed information on smoking habits, in particular giving up smoking, and also to ask additional questions, the relevance of which had emerged during the period of follow-up. Degree of inhalation was asked in these questionnaires, and the use of filter-tipped or plain cigarettes asked in the last questionnaire.
Information about the death of doctors was obtained at first directly from the Registrars-General of the United Kingdom, who provided particulars of every death as referring to a medical practitioner. Later, lists of deaths were obtained from the General Medical Council, and these were complemented by reference to the records of the British Medical Association and other sources at home and abroad. Some deaths came to light in response to the questionnaires. Others were discovered in the course of following up doctors who had not replied to or who had not been sent subsequent questionnaires. Of the 34 440 men studied, 10 072 were known to have died before 1 November 1971, 24 265 were known to have been alive at that date, and 103 (0.3 per cent) were not yet traced.
Many of the 103 untraced doctors were not British, and 67 (65 per cent) were known to have gone abroad. It was felt unlikely that more than about a dozen deaths relevant to the study could have been missed.
Information on the underlying cause of death in the 10 072 doctors known to have died before 1 November 1971 was obtained for the vast majority from the official death certificates. Except for deaths for which lung cancer was mentioned, the certified cause was accepted and (unless otherwise stated) the deaths classified according to the underlying cause. (In only four cases was no evidence of the cause obtainable.) The underlying causes were classified according to the seventh revision of the International Classification of Disease . . . except that a separate category of ‘pulmonary heart disease’ was created.
Cancer of the lung, including trachea or pleura, was given as the underlying cause of 467 deaths and as a contributory cause in a further 20. For each of the 487 deaths, confirmation of the diagnosis was sought from the doctor who had certified the death and, when necessary, from the consultant to whom the patient had been referred. Information about the nature of the evidence was thus obtained in all but two cases. Doubtful reports were interpreted by an outside consultant, with no knowledge of the patient’s smoking history. As a result, carcinoma of the lung was accepted as the underlying cause of 441 deaths and as a contributory cause of 17.
Example 9—The Framingham Heart Study
The key method of follow-up in the Framingham Heart Study was through repeated medical examinations on a 2-year cycle. The greatest loss due to drop-out occurred between the first and second examinations, and those who came in most reluctantly for the initial examination (i.e. towards the end of the recruitment period) seemed to have the highest drop-out rate during the next 30 years. During the first 14 years of follow-up, more than 85 per cent of the participants who were still alive at any examination cycle came in for their examinations. During the subsequent 12 years the examination rates fell to about 80 per cent of the surviving cohort. The chief reasons for non-examination were believed to be the increasing numbers of people who were physically incapacitated or had migrated from the Framingham area.
Indirect follow-up through secondary sources of information was also pursued. The Framingham Union Hospital, the major source of hospital care for the Framingham community, identified each of the Framingham Heart Study participants and notified the study staff of admissions of participants to the hospital. This is particularly important for allowing standardized examination of stroke cases while symptoms of the disease are still present. Mortality follow-up was maintained through regular perusal of vital records at the Town Registrar and following up of obituary notices in newspapers. Mortality follow-up after 30 years was virtually complete with the vital status of less than 2 per cent of the cohort being unknown.
The criteria for diagnosis of cardiovascular and other endpoints investigated in the Framingham Heart Study have been precisely defined, and the utility of the various sources of information in providing diagnostic information according to the study criteria has been investigated. Throughout the follow-up period the core criteria for the major cardiovascular endpoints have remained fixed and all potential cases are reviewed by a panel of trained medical reviewers.
It should be noted that the rate of disease occurrence in this cohort might have been altered by the subjects’ continued participation in the biennial series of examinations. Although no direct advice or treatment was offered to the participants, they were informed through their physicians of abnormal findings such as high blood pressure. If effective preventive measures were instituted in such subjects, then rates of overt cardiovascular diseases would be lowered and would interfere with estimating the ‘true’ effects of the risk factors. It was felt that during the early period of the study such treatment was not widely offered in this population.
Sampling from the cohort
Substantial reductions in the cost of data collection can often be achieved with little loss of statistical power by limiting the collection of detailed exposure information to judiciously chosen subjects. Such designs are used increasingly in situations, such as studies of HIV, where biological specimens or other information sources have been collected prospectively and preserved for all participants. Conducting biological assays or otherwise processing these data sources for the entire cohort would be prohibitively expensive. Using Prentice’s (1986) case–cohort design, the detailed exposure information is ascertained initially for a randomly sampled subcohort that may comprise only a small fraction of the whole. As disease cases occur in other cohort members, detailed exposure information is collected for them also. The major assumption made by this design is that the biological assays, or other detailed exposure measurements, yield the same results on material stored for the cases as they would have yielded had those cases been selected initially as part of the subcohort. An alternative design, known as the nested case–control study, avoids this assumption by sampling a small number of controls for each case at the time of its occurrence and by processing the detailed exposure information for the case and matched controls in the same batch. It is described further in the sequel (see section on the Cox proportional hazards model below). Further substantial improvements in design efficiency are possible by stratifying the subcohort or control samples on the basis of information available for all cohort members.
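The selection step of the case–cohort design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cohort size, the case identifiers, the 5 per cent sampling fraction, and the function name case_cohort_sample are all hypothetical and are not taken from any study discussed here.

```python
import random

def case_cohort_sample(cohort_ids, case_ids, subcohort_fraction, seed=0):
    """Return (subcohort, assay_set) for a case-cohort design.

    subcohort: simple random sample drawn at baseline, irrespective of
               later disease status.
    assay_set: the subcohort plus every case arising outside it; these are
               the subjects whose stored material would be processed.
    """
    rng = random.Random(seed)
    ids = sorted(cohort_ids)
    n_sub = round(len(ids) * subcohort_fraction)
    subcohort = set(rng.sample(ids, n_sub))
    assay_set = subcohort | set(case_ids)
    return subcohort, assay_set

# Hypothetical cohort of 10 000 subjects with four incident cases.
cohort = range(1, 10001)
cases = {17, 256, 4096, 9001}
subcohort, assay_set = case_cohort_sample(cohort, cases, 0.05)
print(len(subcohort), len(assay_set))
```

Only the subjects in assay_set require detailed exposure measurement, which is the source of the cost saving described above.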
Data analysis
Grouped data and person-years
If a cohort study has been appropriately designed according to the principles given above, the analysis of the results is relatively straightforward. The first step is to estimate the incidence of the disease of interest for the cohort as a whole and, if the study was designed to make internal comparisons, in the ‘exposed’ and ‘non-exposed’ subgroups. If the follow-up period is relatively short and there is little or no loss to follow-up due to death from other conditions, a simple estimate of risk is easily calculated as the number of new (incident) cases diagnosed during the study period divided by the total population at risk at the beginning of the period. Persons who already have the disease at the outset of the study (prevalent cases) are eliminated from the population at risk. For studies of longer duration, however, the risk of disease may change over the course of the study and there may be appreciable losses from the population at risk due to death from other causes, loss from follow-up, or the occurrence of the illness of interest itself. Then it is advantageous to divide the study period into a number of intervals (Fig. 1) and to estimate the incidence rate of disease as outlined in Table 1 and Table 2.

Fig. 1 Division of the study period into J time intervals.

Table 1 Notation for cohort analysis

Table 2 Measures of incidence and risk

Disease risk refers to the probability of developing the disease during the study period (or some subinterval). As a probability it is a dimensionless quantity that must range in value between zero and unity. The incidence rate, however, is a measure of the frequency of the occurrence of disease per unit time relative to the size of the population at risk. Crude incidence is the ratio of the disease risk during a time interval to the length of the interval. Instantaneous incidence, also known as the hazard rate or force of morbidity, measures the rate of diagnosis of new cases per unit time relative to the size of the disease-free population at risk at time t. Incidence rates have units of reciprocal time (t^-1) and have no upper limit. Owing to limitations of the available data, it is not possible to estimate precisely the incidence rate at each time t. Instead, estimates are made of the average rates over the study period or over each subinterval by dividing the number of new cases diagnosed in the interval by the total person-years of observation time accumulated during the interval (Table 2, eqns (1) and (2)). Accurate estimation of the person-years denominators requires, for each individual in the study, knowledge of the exact duration of follow-up from the start of the study until diagnosis of the disease of interest, death from a competing cause, or loss from further observation. The contributions from each individual at risk during the jth interval are summed to yield the totals Tj shown in Table 1. If such data are not available, various methods can be used to approximate the person-years of observation. For example, the estimated size of the population at the midpoint of the interval can be multiplied by the interval length.
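As a minimal sketch of the person-years calculation, the following Python fragment divides the number of new cases by the accumulated observation time. The follow-up records are invented for illustration, and the midpoint approximation assumes (as a simplification not stated in the text) that the mid-interval population is the mean of the numbers at risk at the start and end of the interval.

```python
def incidence_rate(followup):
    """Average incidence rate: new cases divided by person-years at risk.

    followup: list of (years_at_risk, developed_disease) tuples, one per subject.
    """
    cases = sum(1 for _, diseased in followup if diseased)
    total_person_years = sum(years for years, _ in followup)
    return cases / total_person_years

def midpoint_person_years(n_start, n_end, interval_length):
    """Approximate person-years as mid-interval population times interval length.

    Assumption: the mid-interval population is taken as the mean of the
    numbers at risk at the start and end of the interval.
    """
    return 0.5 * (n_start + n_end) * interval_length

# Hypothetical follow-up records for five subjects.
records = [(5.0, False), (2.5, True), (5.0, False), (1.0, False), (4.0, True)]
rate = incidence_rate(records)                        # 2 cases / 17.5 person-years
approx_py = midpoint_person_years(100, 90, 5)         # 475 person-years
print(round(rate, 4), approx_py)
```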
Another useful measure of disease occurrence, known as the cumulative incidence rate, is obtained by summing the products of incidence rate and interval length over a series of intervals (Table 2, eqns (3) and (4)). The cumulative incidence rate over a specified interval, which is a dimensionless quantity with no upper limit, is related via the exponential function to the disease risk over the interval (eqn (5)). If the disease is rare or the study period is short (so that the cumulative risk is no more than 5 per cent), cumulative incidence and risk are nearly equal. Both can be estimated non-parametrically as a function of time t by choosing the intervals to be so fine that the interval endpoints occur exactly at the times of disease diagnosis (eqns (6) and (7)). Plots of cumulative incidence over time provide a powerful graphic tool for examining the evolution of disease risk in the exposed and non-exposed subgroups. As an example, Fig. 2 shows that breast cancer incidence in a cohort of women treated with radiation for postpartum mastitis paralleled the incidence in the control population until some 16 to 20 years after treatment, but then increased to substantially higher levels.
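The relation between cumulative incidence and risk can be illustrated numerically. The sketch below assumes the exponential relation described above, risk = 1 – exp(–cumulative incidence); the interval-specific rates are hypothetical.

```python
import math

def cumulative_incidence(rates, lengths):
    """Sum of (incidence rate x interval length) over consecutive intervals."""
    return sum(rate * length for rate, length in zip(rates, lengths))

def risk_from_cumulative_incidence(cum_incidence):
    """Disease risk implied by a cumulative incidence rate."""
    return 1.0 - math.exp(-cum_incidence)

# Hypothetical rates per person-year in three consecutive 10-year intervals.
rates = [0.002, 0.004, 0.008]
lengths = [10, 10, 10]
ci = cumulative_incidence(rates, lengths)        # 0.14
risk = risk_from_cumulative_incidence(ci)        # a little below 0.14
small_risk = risk_from_cumulative_incidence(0.05)
print(round(ci, 3), round(risk, 4), round(small_risk, 4))
```

When the cumulative incidence is about 0.05 the two quantities differ only in the third decimal place, which is the sense in which they are "nearly equal" for rare diseases or short study periods.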

Fig. 2 Cumulative breast cancer morbidity curves for women treated with X-rays for postpartum mastitis (D) and a control group (m), adjusted to the age distribution of the control group. (Source: Shore et al. 1977.)

Although the preceding definitions used time on study as the basic time-scale for estimation of instantaneous incidence, other choices may be more appropriate in some circumstances. The possibilities include age, calendar time, and, for studies where exposure starts before entry into the study, time since initial exposure. With these other time-scales the population at risk changes due not only to the loss from observation of subjects who die or develop the disease of interest, but also to the entry into the cohort of other subjects depending, for example, on their age or the calendar year at the time that they join the study. All the definitions and formulae continue to apply with these alternative time-scales. More advanced statistical analyses often consider several time-scales simultaneously, using a multidimensional classification of incident cases and person-years denominators according to age, calendar year, time on study, and other fixed and time-varying factors.
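When age is used as the time-scale, each subject's follow-up must be allocated to the age bands through which he or she passes. A minimal sketch of this allocation follows; the band boundaries and the subject's ages are hypothetical, and the function name split_by_age is illustrative only.

```python
def split_by_age(age_at_entry, age_at_exit, band_edges):
    """Allocate one subject's follow-up time to age bands.

    band_edges: increasing list of ages; band i covers
                [band_edges[i], band_edges[i + 1]).
    Returns a dict mapping (lower, upper) to the years contributed.
    """
    contributions = {}
    for lower, upper in zip(band_edges[:-1], band_edges[1:]):
        overlap = min(age_at_exit, upper) - max(age_at_entry, lower)
        if overlap > 0:
            contributions[(lower, upper)] = overlap
    return contributions

# Hypothetical subject entering at age 42.5 and leaving observation at 58.
person_years_by_band = split_by_age(42.5, 58.0, [35, 45, 55, 65])
print(person_years_by_band)
```

Summing such contributions over all subjects gives the age-specific person-years denominators; the same idea extends to calendar year or time since first exposure.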
Cohort studies also facilitate the estimation of various measures of association between the exposure of interest and the occurrence of disease. The standardized morbidity ratio (SMR) is frequently used in occupational cohort studies to estimate the ratio of cohort rates to standard rates obtained from national health statistics registers or other standard sources. As shown in eqn (8) (Table 3), the SMR is simply the ratio of the number of cases of disease observed to the number of cases expected from the standard rates as applied to the age/time-specific person-years of observation. Dose–response trends may be evident from SMRs that are estimated separately for subcohorts defined by levels of cumulative exposure (Table 4). However, doubts about the comparability of the cohort and the standard population, coupled with the fact that the ratio of SMRs for two or more subcohorts may not adequately summarize the ratios of age/time-specific rates, have led many investigators to discard the SMR in favour of measures of association that do not depend on external rates. The Mantel–Haenszel rate ratio (Table 3, eqn (11)) summarizes the ratios of the age/time-specific rates for the exposed versus the non-exposed members of the cohort. It is closely related to the Mantel–Haenszel relative risk measure that is widely used to summarize tables of exposure/disease odds ratios in case–control studies (see Chapter 6.5). This is the preferred measure of association when, as is often the case, the rate ratios are relatively constant over time but the rate differences are not. The cumulative risk difference (Table 3, eqn (12)), which is also known as the attributable risk or excess risk, provides an absolute measure of the effect of exposure, which is useful for public health workers.
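Table 3 is not reproduced here, so the following sketch assumes the standard forms of the two measures: the SMR as observed cases divided by the cases expected from standard rates, and the Mantel–Haenszel rate ratio for person-time data as a weighted ratio over strata. All counts and rates are hypothetical.

```python
def smr(observed_cases, person_years, standard_rates):
    """Observed cases divided by cases expected from stratum-specific standard rates."""
    expected = sum(py * rate for py, rate in zip(person_years, standard_rates))
    return observed_cases / expected

def mantel_haenszel_rate_ratio(strata):
    """Mantel-Haenszel summary rate ratio for person-time data.

    strata: list of (cases_exposed, py_exposed, cases_unexposed, py_unexposed),
            one tuple per age/time stratum.
    """
    numerator = sum(a * t0 / (t1 + t0) for a, t1, b, t0 in strata)
    denominator = sum(b * t1 / (t1 + t0) for a, t1, b, t0 in strata)
    return numerator / denominator

# Hypothetical cohort: 30 observed cases in two age strata.
smr_value = smr(30, person_years=[1000, 2000], standard_rates=[0.005, 0.010])

# Hypothetical strata in which the stratum-specific rate ratio is 2 throughout.
strata = [(10, 1000, 5, 1000), (40, 2000, 10, 1000)]
rr_mh = mantel_haenszel_rate_ratio(strata)
print(round(smr_value, 2), round(rr_mh, 2))
```

Because the rate ratio is the same in both strata, the summary value equals that common ratio, which is the situation in which the Mantel–Haenszel measure is most informative.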

Table 4 Lung cancer mortality by cumulative radiation exposure among Canadian fluorspar miners

Data from cohort studies can also be used to measure the potential impact of the removal of a suspected aetiological factor. This is measured either in terms of the estimated effect of removal on disease incidence or on the cumulative risk over the study period. The most direct measure of potential impact is known as the aetiological fraction, defined here using risk (Table 3, eqn (13)) rather than incidence. It represents the proportion of all new cases of disease that can be considered to be due to the exposure and therefore that are potentially preventable if the exposure were to be completely removed. Equation (14) shows how it may be represented using two parameters: the proportion PE of the total population with the exposure, and the risk ratio RRE. Although useful for studies of short duration involving a single risk factor, serious conceptual difficulties arise when attempts are made to extend the definition of aetiological fraction for use with multiple interacting risk factors or in situations where a long study period is needed in order to ascertain the temporal aspects of the exposure–disease association.
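The two representations of the aetiological fraction can be checked against each other. The sketch assumes that the risk-based definition is (total risk – unexposed risk)/total risk and that the two-parameter form is PE(RRE – 1)/[1 + PE(RRE – 1)]; these are the standard expressions, but Table 3 is not reproduced here, so they should be read as assumptions. The input values are hypothetical.

```python
def aetiological_fraction_from_risks(risk_total, risk_unexposed):
    """Proportion of the total risk in excess of the risk in the unexposed."""
    return (risk_total - risk_unexposed) / risk_total

def aetiological_fraction_from_ratio(p_exposed, risk_ratio):
    """Same quantity expressed through exposure prevalence and risk ratio."""
    excess = p_exposed * (risk_ratio - 1.0)
    return excess / (1.0 + excess)

# Hypothetical population: 40 per cent exposed, risks of 0.30 and 0.10.
p_exposed, risk_exposed, risk_unexposed = 0.4, 0.30, 0.10
risk_total = p_exposed * risk_exposed + (1 - p_exposed) * risk_unexposed

af_risks = aetiological_fraction_from_risks(risk_total, risk_unexposed)
af_ratio = aetiological_fraction_from_ratio(p_exposed, risk_exposed / risk_unexposed)
print(round(af_risks, 4), round(af_ratio, 4))
```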
Example 10—Artificial menopause and breast cancer
Some of the results of this study are shown in Table 5. For the four ‘exposure’ categories of women who were less than 40 years old at the time of admission into the cohort, 37 cases of breast cancer were discovered to have occurred during the follow-up period. Several difficulties in applying the usual estimates of incidence and risk are readily apparent.

Table 5 Breast cancer in patients with and without artificial menopause

1. Because of migration and incompleteness of follow-up, the observed cases are known to be an undercount.
2. The time of onset for each malignancy was not usually known (those ascertained from death certificates did not usually state age at onset).
3. The duration of follow-up for most of the women was not precisely known.
However, by making several assumptions it is possible to obtain some reasonable estimates of the association of breast cancer occurrence with the extent of pelvic surgery. The basic assumption is that whatever inadequacies there were in the follow-up procedures, they occurred uniformly in each of the exposure groups (i.e. the women with natural menopause were no more likely to have migrated than those with surgical menopause), and therefore any cases of breast cancer were equally likely to be ascertained in each group. Another problem is that the frequency with which the various procedures were performed varied considerably during the 21 years of potential admission to the study (for example, pelvic irradiation was more frequent in the 1920s than later). Thus the women in the radiation group tended to have longer potential periods of follow-up than those in the surgical groups. This was handled by examining the specific dates of entry into the study for each woman.
The fourth column of Table 5 gives an estimate of the crude risk of developing breast cancer. For the reasons given above, this is a very poor estimate. The next column gives an estimate of the cumulative incidence rate of breast cancer over the average 30-year follow-up period, which was obtained by estimating the person-years contribution of each woman. Because of the inadequacies in follow-up mentioned above, the estimates shown are undoubtedly lower than the true rates. However, provided that the under-ascertainment was approximately equal in the different exposure groups, there should be less bias in the estimates of relative risk shown in the last column. The cholecystectomy group was considered to be the ‘unexposed’ or control group. Using eqn (13), the relative risk for the women with unilateral oophorectomy is 1.06, which is not statistically different from the standard group. The relative risk for the women with hysterectomy and bilateral oophorectomy is 0.27, which is significantly less than the standard group. The women receiving irradiation also had a low relative risk for developing breast cancer (0.54), but because this group is small the risk is not statistically significant.
Because of the problems in estimation of the cumulative incidence rates in this study, no attempt was made to obtain estimates of the aetiological fraction.
Example 11—Coronary heart disease and smoking among British doctors
Table 6 shows the numbers of deaths from coronary heart disease and corresponding person-years denominators for smokers and non-smokers observed during the first 10 years of follow-up of the British Doctors Study. The coronary heart disease rates increase markedly with age, but less so for smokers than for non-smokers. Since the rate ratios for smokers versus non-smokers decline sharply with age, whereas the rate differences generally increase, this is an example where neither the Mantel–Haenszel rate ratio nor the cumulative rate difference is very useful in summarizing the age-specific quantities. Either the age-specific rates themselves, or variations in the rate ratios or differences with age, are needed to describe the results of the study adequately. Nevertheless, using the age-specific rates, it can readily be calculated that the cumulative mortality rate in the 35 to 74 years age group is 17.0 per cent for non-smokers and 24.9 per cent for smokers. The corresponding cumulative risks are

Table 6 Deaths from coronary heart disease (CHD) among British male doctors

CR0 = 1 – exp(– 0.170) = 15.6 per cent
and
CRE = 1 – exp(– 0.249) = 22.0 per cent
for a risk ratio RRE = 1.41 and an attributable risk RD = 6.4 per cent. Assuming that PE = 83 per cent of British doctors were smokers at the beginning of the study period, the aetiological fraction is
AF = PE(RRE – 1)/[1 + PE(RRE – 1)] = (0.83 × 0.41)/(1 + 0.83 × 0.41) ≈ 25 per cent.
However, this number should be interpreted cautiously for the reasons mentioned above. The aetiological fraction is much smaller when the coronary heart disease deaths occurring at ages 75 to 84 years are also taken into account. The Mantel–Haenszel rate ratio for the entire 50-year age span is RRMH = 1.42, the attributable risk RD is 3.9 per cent, and the aetiological fraction AF is 9 per cent.
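The arithmetic for the 35 to 74 years age group can be retraced from the cumulative mortality rates quoted above (17.0 per cent for non-smokers, 24.9 per cent for smokers) and the assumed 83 per cent prevalence of smoking. The sketch takes non-smokers as the unexposed group and assumes the two-parameter form PE(RRE – 1)/[1 + PE(RRE – 1)] for the aetiological fraction; variable names are illustrative.

```python
import math

cum_rate_nonsmokers = 0.170   # cumulative CHD mortality rate, ages 35-74 (from the text)
cum_rate_smokers = 0.249      # cumulative CHD mortality rate, ages 35-74 (from the text)
p_smokers = 0.83              # assumed proportion of smokers at the start (from the text)

# Cumulative risks via the exponential relation between rate and risk.
risk_nonsmokers = 1 - math.exp(-cum_rate_nonsmokers)
risk_smokers = 1 - math.exp(-cum_rate_smokers)

risk_ratio = risk_smokers / risk_nonsmokers
risk_difference = risk_smokers - risk_nonsmokers

# Assumed two-parameter form of the aetiological fraction.
excess = p_smokers * (risk_ratio - 1)
aetiological_fraction = excess / (1 + excess)

print(round(100 * risk_nonsmokers, 1), round(100 * risk_smokers, 1))
print(round(risk_ratio, 2), round(100 * risk_difference, 1))
print(round(100 * aetiological_fraction))
```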
Example 12—The Framingham Heart Study
During the first three decades of its existence, the Framingham Heart Study generated more than 300 publications. Many involved quite sophisticated methodological applications, which are beyond the scope of this chapter.
An example of the relation between the occurrence of coronary heart disease and serum cholesterol based on 6 years of follow-up is shown in Table 7. Data are shown for men who were aged between 40 and 59 years and were free from coronary heart disease at entry. There were 1333 men with measured cholesterol levels and follow-up was complete for 6 years for nearly all of them. These men were classified into tertiles on the basis of their initial serum cholesterol levels as shown in the first column of the table. Person-years of observation were estimated for each tertile based on the assumption that each of the men who developed coronary heart disease was followed on average for half the study period, whereas the other men were followed for the entire 6 years. (It would be better to count those who developed coronary heart disease plus those who died from other causes as contributing 3 years each.) Thus, for example, the average annual incidence for the entire cohort was estimated from eqn (1) as

The cumulative risks determined from eqn (5) are virtually identical in this instance to the crude risks (number of cases divided by number of persons at risk at the start of the period). The relative risks associated with high cholesterol levels are shown in the next column, where the men with cholesterol levels lower than 210 mg/100 ml are taken as the standard or unexposed group. Men with cholesterol levels between 210 and 244 mg/100 ml have 1.81 times the risk of developing coronary heart disease compared with men with lower cholesterol levels, and men with cholesterol levels above 244 mg/100 ml have risks 3.43 times greater. The attributable risks RD associated with higher cholesterol levels are shown in the last column.
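The person-years approximation used in this example (cases followed for half the period, all other men for the full 6 years) can be sketched as follows. The cohort size of 1333 and the 6-year period are taken from the text; the number of cases is hypothetical, because Table 7 is not reproduced here.

```python
import math

def approx_person_years(n_at_risk, n_cases, period_years):
    """Cases contribute half the period; everyone else the whole period."""
    return (n_at_risk - n_cases) * period_years + n_cases * period_years / 2

n_men = 1333        # from the text
period = 6          # years of follow-up, from the text
n_cases = 90        # HYPOTHETICAL number of CHD cases, for illustration only

person_years = approx_person_years(n_men, n_cases, period)
annual_incidence = n_cases / person_years

# Compare the cumulative risk implied by the rate with the crude risk.
cumulative_risk = 1 - math.exp(-annual_incidence * period)
crude_risk = n_cases / n_men
print(person_years, round(annual_incidence, 5))
print(round(cumulative_risk, 4), round(crude_risk, 4))
```

With these illustrative numbers the cumulative and crude risks agree to three decimal places, in line with the remark that the two are virtually identical over a short follow-up.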

Table 7 Six-year incidence of coronary heart disease according to initial serum cholesterol in men aged 40–59 years

If men could be prevented from having cholesterol levels above 244 mg/100 ml, the potential impact upon the incidence of coronary heart disease can be estimated from the aetiological fraction (eqn (14)). The combined group of men with cholesterol levels below 245 mg/100 ml is considered to be the unexposed group with a (crude) risk of

Then the aetiological fraction is

i.e. the risk of coronary heart disease among men could potentially be lowered by 31 per cent if none of them had cholesterol levels over 245 mg/100 ml. A similar calculation showed that if all the men had cholesterol levels below 210 mg/100 ml, the aetiological fraction would be 51 per cent, i.e. half of the cases of coronary heart disease could potentially be prevented. This illustrates the strong dependence of the aetiological fraction on the rather arbitrary specification of the baseline level for a continuous-valued risk factor. Furthermore, since whatever intervention was undertaken to reduce the serum cholesterol levels might have unpredictable effects on the coronary heart disease rates, it is clear that ‘potential impact’ as used here must be interpreted in terms of statistical association rather than causation.
The Cox proportional hazards model
These examples involve estimation of separate relative risks for each level of a categorical variable relative to a baseline level (RR = 1): cumulative radiation exposure in Example 10, age in Example 11, and serum cholesterol in Example 12. This approach works well when there are sufficient disease cases at each level of exposure to provide stable estimates. For multiple exposures, or for purposes of low-dose extrapolation, it is desirable to model the relative risk as a mathematical function of one or more quantitative exposure variables. In radiation epidemiology, for example, it is common to model the relative risk as a linear function of the radiation dose x:
RR(x) = 1 + ax.
When there are several exposure variables x1, x2,…, choice of the log-linear relative risk function implies that the individual exposures have relative risks that multiply:
log RR (x1, x2, …) = a1x1 + a2x2 +… .
If, for example, exposure to one risk factor doubles the baseline risk while exposure to another triples it, the log-linear model assumes that the joint exposure increases the risk sixfold. Some such modelling assumptions, which must be carefully checked against the data, are needed to make sense of highly multivariate exposure information.
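The doubling and tripling example can be verified directly. In this sketch the coefficients are chosen (as an illustration, not from any fitted model) so that unit exposure to the first factor doubles the risk and unit exposure to the second triples it; the linear function is included for comparison.

```python
import math

def linear_rr(a, x):
    """Linear relative risk function RR(x) = 1 + a*x."""
    return 1.0 + a * x

def log_linear_rr(coefficients, exposures):
    """Log-linear relative risk: log RR = a1*x1 + a2*x2 + ..."""
    return math.exp(sum(a * x for a, x in zip(coefficients, exposures)))

# Illustrative coefficients: log 2 and log 3.
coefficients = [math.log(2), math.log(3)]

rr_first_only = log_linear_rr(coefficients, [1, 0])
rr_second_only = log_linear_rr(coefficients, [0, 1])
rr_joint = log_linear_rr(coefficients, [1, 1])
print(round(rr_first_only, 6), round(rr_second_only, 6), round(rr_joint, 6))
```

The joint relative risk is the product of the separate relative risks, which is the multiplicative assumption that must be checked against the data.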
Cox (1972) provided a mathematical foundation for relative risk function estimation that revolutionized the statistical approaches to the analysis of follow-up data collected in both clinical and epidemiological studies. His proportional hazards model assumes that the disease incidence rate at time t for a subpopulation with exposure(s) x is the product of a baseline (x = 0) incidence rate I0(t) multiplied by a parametric relative risk function such as linear or log-linear. The cumulative baseline incidence rate is estimated non-parametrically by a formula that generalizes eqn (6). The parameters (regression coefficients) a in the relative risk function are estimated by comparing the values of the exposure variables for the disease case that occurs at a given time tj with those of subjects in the risk set of cohort members who are being followed but have not yet developed disease just prior to tj. The model derives its name from the fact that, in its standard formulation, disease incidence rates for subjects with different exposures are assumed to be in constant proportion (ratio) as a function of time. Important generalizations allow both the exposures and the relative risks associated with them to vary with time. The model led to the realization that costs could be greatly reduced by limiting collection of expensive data items to the disease cases and to a small number of disease-free controls sampled randomly from each of the risk sets. This design, known as the nested case–control study, permits estimation of both the relative risk function and the baseline cumulative incidence rate.
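The comparison of each case with its risk set can be made concrete with a small sketch. It assumes a single exposure variable, a log-linear relative risk function, and no tied event times; the five subjects are hypothetical. The second function illustrates risk-set sampling of controls for a nested case–control study. In practice a statistical package would be used rather than code of this kind.

```python
import math
import random

# Hypothetical subjects: (id, follow-up time, disease indicator, exposure x).
data = [(1, 2.0, 1, 1.0), (2, 3.0, 0, 0.0), (3, 4.0, 1, 0.0),
        (4, 5.0, 1, 1.0), (5, 6.0, 0, 0.0)]

def risk_set(subjects, t):
    """Subjects still under observation and disease-free just before time t."""
    return [s for s in subjects if s[1] >= t]

def log_partial_likelihood(a, subjects):
    """Cox log partial likelihood for one covariate with RR(x) = exp(a*x)."""
    total = 0.0
    for _, time, diseased, x in subjects:
        if diseased:
            denominator = sum(math.exp(a * s[3]) for s in risk_set(subjects, time))
            total += a * x - math.log(denominator)
    return total

def sample_controls(subjects, case, m, rng):
    """Draw up to m controls at random from the case's risk set, excluding the case."""
    candidates = [s for s in risk_set(subjects, case[1]) if s[0] != case[0]]
    return rng.sample(candidates, min(m, len(candidates)))

lpl_at_zero = log_partial_likelihood(0.0, data)   # equals -log(5 * 3 * 2)
controls = sample_controls(data, data[2], 2, random.Random(1))
print(round(lpl_at_zero, 4), sorted(c[0] for c in controls))
```

Maximizing log_partial_likelihood over a gives the estimate of the regression coefficient; note that a sampled control may itself become a case at a later time.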
Types of bias and their resolution
In this section the different types of bias that may occur in cohort studies are presented and discussed. Because all the different types of bias are not necessarily present in the same study, we draw examples from additional studies as well as from two of the three studies presented above.
Factors related to the selection of the study population, response rate, collection of information, methodologies used, and analytical strategies employed often introduce biases which, if not anticipated, can lead to incorrect conclusions concerning a possible relationship between an exposure (independent variable) and a disease (outcome variable). Such biases are inherent in all types of epidemiological studies. In this section we shall confine our discussion to the types of biases that affect cohort studies.
There are five broad categories of bias that are operative in cohort studies. These are selection bias, follow-up bias, information bias, confounding bias, and post hoc bias. Each of these is discussed separately below. These biases can cause systematic errors, which affect the internal validity of a study. This is in contrast with random errors, which may not affect the internal validity of the study but will reduce the probability of observing a true relationship. A true bias (i.e. a systematic error that is introduced into one group or subgroup to a greater extent than in other subgroups) often leads to the observation of a relationship that is not a true relationship or, vice versa, leads to the conclusion that there is no relationship when, in fact, there is a true relationship between the independent and outcome variables. Errors that occur with equal frequency in all subgroups usually do not affect the validity of a relationship. However, because a certain proportion of the measurements in all the subgroups will be erroneous, the probability of observing a true relationship is diminished and the true magnitude of the relationship may be underestimated.
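The attenuation produced by errors that occur equally in all subgroups can be illustrated with expected counts. The sketch assumes a binary exposure that is misclassified with the same sensitivity and specificity regardless of later disease status; the cohort sizes, risks, and error rates are hypothetical.

```python
def observed_risk_ratio(n_exp, n_unexp, risk_exp, risk_unexp, sensitivity, specificity):
    """Expected risk ratio when exposure is misclassified non-differentially."""
    # Subjects classified as exposed: correctly labelled exposed plus
    # unexposed subjects wrongly labelled as exposed.
    obs_exp_n = sensitivity * n_exp + (1 - specificity) * n_unexp
    obs_exp_cases = (sensitivity * n_exp * risk_exp
                     + (1 - specificity) * n_unexp * risk_unexp)
    # Subjects classified as unexposed: the complementary groups.
    obs_unexp_n = (1 - sensitivity) * n_exp + specificity * n_unexp
    obs_unexp_cases = ((1 - sensitivity) * n_exp * risk_exp
                       + specificity * n_unexp * risk_unexp)
    return (obs_exp_cases / obs_exp_n) / (obs_unexp_cases / obs_unexp_n)

# Hypothetical cohort: 1000 exposed (risk 0.20) and 1000 unexposed (risk 0.10).
true_rr = observed_risk_ratio(1000, 1000, 0.20, 0.10, 1.0, 1.0)
attenuated_rr = observed_risk_ratio(1000, 1000, 0.20, 0.10, 0.8, 0.9)
print(round(true_rr, 2), round(attenuated_rr, 2))
```

With perfect classification the risk ratio is 2; with the assumed error rates the expected ratio falls to about 1.6, so the relationship is still seen but its magnitude is underestimated.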
While internal validity is paramount, it is often important to have external validity in cohort studies as well. External validity refers to the degree to which an association observed in the study populations also holds true in the general population. In order to ensure external validity, the population studied must be representative of the population to which the results of the study will be generalized. In many cohort studies, it is necessary for various reasons to study some subpopulation of the general population. This subpopulation may represent a non-random sample of the general population, such as an occupational group, a group selected from a particular health plan, and so on. If such subpopulations are used, external validity may be reduced.
Selection bias
Selection bias may occur when the group actually studied does not reflect the same distribution of factors (such as age, smoking, race, etc.) as occurs in the general population. This may be because some of the members of the cohort selected originally refuse to participate or, in a non-concurrent cohort study, because records on some individuals are missing or incomplete. As a result, response rates may differ among the various subgroups invited to participate in the study. In some studies particular subgroups may be used for convenience and may not be representative of the general population for other reasons.
Example 13—Effects of volunteering
An example of selective non-response to recruitment was observed and documented in the Framingham studies cited above. It was found that individuals who agreed to participate in these cohort studies were healthier than individuals who did not agree to participate. While this would not affect the internal validity of the study, since the groups to be followed were characterized on the basis of factors present at baseline, it would be likely to reduce the incidence of the disease of interest, particularly in the first few years of the study. Thus the external validity would be diminished, but the internal validity should not be affected for those independent variables defined at baseline. However, because the incidence of disease might be lower in this healthier group that is being followed up, the probability of finding significant relationships would be somewhat diminished. In occupational cohort studies, this type of selection bias has been termed ‘the healthy worker effect’.
Example 14—Spectrum of independent variables in study groups
A second problem with selective non-response concerns the extent to which the population that agrees to participate in the study actually represents the true spectrum of the independent variable.
Early studies of the relationship of dietary cholesterol and saturated fat intake to coronary heart disease in the United States gave inconclusive results. This may have been due, in part, to the fact that very few Americans have dietary cholesterol and saturated fat intakes in the lower ranges, whereas residents of less affluent countries have a higher proportion of individuals with these low levels of intake. If there exists a threshold level of the independent variable necessary to produce disease and the respondents include only individuals with levels of the independent variable that are above the threshold level, then no relationship will be seen between the variable and the disease under study. Thus some of the comparisons of the incidence of coronary heart disease among Americans with higher levels of cholesterol and fat in the diet may not have shown a relationship because the threshold level of dietary fats was below the levels consumed in the study population. Even in situations where there is no threshold level, inclusion of individuals at only one end of the spectrum of the independent variable will reduce the likelihood that a dose–response relationship will be observed. Thus non-response or non-inclusion of participants in the cohort who represent one or the other extreme of the independent variable may affect the internal validity of the study and lead to a false observation that there is no relationship.
Example 15—Presence of incipient disease
Another problem of selection bias occurs when individuals who have incipient disease are included in the cohort. Individuals with the disease of interest should be excluded from the study population at the time of recruitment. However, with many chronic diseases that have a long induction period, such as cancer and heart disease, it is difficult to identify individuals with incipient disease. Their inclusion in the study population may lead to the observation of associations with factors that are, in fact, a result of the disease process rather than risk factors for the disease.
An association between low cholesterol and risk of cancer has been observed in several cohort studies. The induction period for cancer is probably one or more decades. Individuals who develop cancer in a follow-up period of less than the induction period probably had incipient disease at the time of the formation of the cohort. Thus a low cholesterol level in these individuals may have been a result of the cancer process rather than a risk factor for it.
Example 16—Distribution of covariates
A final example of selection bias occurs when covariates that may be related to disease incidence are not distributed in the study cohorts as they are in the general population.
Smoking is related to a number of diseases. In some, it is the probable major cause whereas in others, such as coronary heart disease, it represents only one of several risk factors that increase the probability of developing disease. Thus, if the non-respondents include a higher proportion of smokers than the respondents, the total incidence of coronary heart disease in the study cohort would be lower than if smokers were appropriately represented in the cohort. However, the effect would be not only to make the observed incidence of coronary heart disease lower in the study population than in the general population, but also to lead to a false estimate of the proportion of coronary heart disease that is associated with smoking (the aetiological fraction). Specifically, the incidence of coronary heart disease among the smokers would be correct, but the proportion of the total number of cases associated with smoking would be smaller than it actually is because the proportion of smokers in the study population would be lower than in the general population. Thus any estimates of the aetiological fraction of coronary heart disease due to smoking would be too low.
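The effect on the aetiological fraction can be made concrete with a small calculation. The sketch below uses the standard population attributable fraction formula, p(RR - 1) / [1 + p(RR - 1)], where p is the proportion exposed and RR is the rate ratio. The rate ratio, the baseline incidence, and the two smoking prevalences are hypothetical values chosen only for illustration; they are not taken from any study discussed in this chapter.

```python
# Illustrative sketch: every number below is a hypothetical assumption.

def attributable_fraction(p_exposed: float, rate_ratio: float) -> float:
    """Population aetiological fraction: p(RR - 1) / (1 + p(RR - 1))."""
    excess = p_exposed * (rate_ratio - 1.0)
    return excess / (1.0 + excess)


def overall_incidence(p_exposed: float, incidence_unexposed: float,
                      rate_ratio: float) -> float:
    """Incidence in a population mixing exposed and unexposed people."""
    return incidence_unexposed * (1.0 + p_exposed * (rate_ratio - 1.0))


RATE_RATIO = 2.0              # assumed CHD rate ratio, smokers vs non-smokers
INCIDENCE_NONSMOKERS = 0.005  # assumed annual CHD incidence in non-smokers
P_SMOKERS_GENERAL = 0.40      # assumed share of smokers in the general population
P_SMOKERS_COHORT = 0.25       # assumed share in the cohort after selective non-response

af_general = attributable_fraction(P_SMOKERS_GENERAL, RATE_RATIO)
af_cohort = attributable_fraction(P_SMOKERS_COHORT, RATE_RATIO)
inc_general = overall_incidence(P_SMOKERS_GENERAL, INCIDENCE_NONSMOKERS, RATE_RATIO)
inc_cohort = overall_incidence(P_SMOKERS_COHORT, INCIDENCE_NONSMOKERS, RATE_RATIO)

print(f"Overall incidence: general {inc_general:.5f}, cohort {inc_cohort:.5f}")
print(f"Aetiological fraction: general {af_general:.3f}, cohort {af_cohort:.3f}")
```

Under these assumptions the rate ratio among those studied is unchanged, but both the overall incidence and the aetiological fraction come out lower in the cohort than in the general population.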
Follow-up bias
One of the major problems in cohort studies is to accomplish the successful follow-up of all members of the cohort. If the loss to follow-up occurs equally in the exposed and unexposed groups, the internal validity should not be affected. Of course, this assumes that the rate of disease occurrence is the same among those lost to follow-up as among those not lost to follow-up within each group. However, if the rate of disease is different among those lost to follow-up, then the internal validity of the study may be affected (i.e. the relationship between exposure and outcome may be changed).
Example 17—Bias resulting from differential incidence in those lost to follow-up
If the rate of lung cancer is higher in those smokers who are lost to follow-up than in those who remain in the study, the observed incidence of lung cancer in those smokers who remain in the study will be lower than the actual incidence of lung cancer in the entire cohort of smokers. The effect will be to observe a lower association between lung cancer and smoking than actually exists (provided that the incidence of lung cancer is the same in non-smokers who were and were not followed). If the lung cancer incidence rate is lower in smokers who are not followed up than in those who are, the reverse effect would occur (i.e. the observed association would be greater than the true association).
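The direction of this bias can be checked with a short numerical sketch. The cohort sizes and lung cancer risks below are hypothetical assumptions: 20 per cent of each group is lost, non-smokers have the same risk whether or not they are followed, and the risk among smokers who are lost is set either higher or lower than among smokers who remain.

```python
# Illustrative sketch: cohort sizes and risks are hypothetical assumptions.
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Group:
    n_retained: int        # members successfully followed up
    risk_retained: float   # disease risk among those followed up
    n_lost: int            # members lost to follow-up
    risk_lost: float       # (normally unknown) risk among those lost

    def true_risk(self) -> float:
        """Risk in the whole group, including those lost to follow-up."""
        cases = self.n_retained * self.risk_retained + self.n_lost * self.risk_lost
        return cases / (self.n_retained + self.n_lost)

    def observed_risk(self) -> float:
        """Risk the investigator can actually measure."""
        return self.risk_retained


def risk_ratios(exposed: Group, unexposed: Group) -> Tuple[float, float]:
    """Return (true risk ratio, observed risk ratio)."""
    true_rr = exposed.true_risk() / unexposed.true_risk()
    observed_rr = exposed.observed_risk() / unexposed.observed_risk()
    return true_rr, observed_rr


non_smokers = Group(8000, 0.001, 2000, 0.001)
smokers_lost_sicker = Group(8000, 0.0075, 2000, 0.020)
smokers_lost_healthier = Group(8000, 0.0115, 2000, 0.004)

true_a, observed_a = risk_ratios(smokers_lost_sicker, non_smokers)
true_b, observed_b = risk_ratios(smokers_lost_healthier, non_smokers)

print(f"Lost smokers at higher risk: true RR {true_a:.1f}, observed RR {observed_a:.1f}")
print(f"Lost smokers at lower risk:  true RR {true_b:.1f}, observed RR {observed_b:.1f}")
```

With these invented figures the true risk ratio is the same in both scenarios, while the observed ratio falls below it when the lost smokers are at higher risk and rises above it when they are at lower risk.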
Usually the incidence of disease is not known among those lost to follow-up, making it difficult to look for this type of bias. If possible, the occurrence and cause of death should be sought in those who are lost to follow-up. This is easier in the United States now that there is a National Death Index. If the death rate is similar between those lost and not lost to follow-up within each group, the occurrence of a different incidence of disease in the two groups is less likely.
Another strategy is to compare the known characteristics at baseline of those lost and not lost to follow-up. The more similar the two groups are, the less likely it is that a different incidence of disease occurred in them.
Neither of these strategies guarantees that the incidence was the same in both those followed and those not followed. Therefore the best strategy is to reduce the number lost to follow-up to the lowest level possible.
Example 18–Bias resulting from loss to follow-up of individuals under observation for the independent variable
Another possible source of bias may be observed in studies in which the independent variable is being documented concurrently with the development of the outcome variable, presenting the opportunity for misclassification resulting from loss to follow-up.
In evaluating the relationship between a decline in lung function test results and concurrent levels of exposure to photochemical oxidants at place of residence, a problem arises in considering how to evaluate individuals who have moved from the study area to other areas (Detels et al. 1991). In some instances they will have moved to areas with lower levels of exposure to photochemical oxidants and in other instances to areas with higher levels. It is not feasible to maintain constant monitoring for levels of photochemical oxidants in all the areas to which these individuals have moved. If there is indeed a relationship between levels of exposure to photochemical oxidants and decreasing lung test performance, the inclusion in analyses of individuals who have moved to a cleaner area, as if they had remained in the area of high exposure, will lead to misclassification bias and thus to an underestimate of the relationship. However, if individuals who moved to a dirtier area are included, this will lead to misclassification bias with the reverse effect.
Another potential bias is introduced if the individuals who have moved are excluded from the analysis to avoid misclassification bias. Individuals who have moved out of the study area may have done so because of the high level of exposure to photochemical oxidants and their awareness of their declining respiratory ability. This would result in an observed relationship in those not moving that is lower than the true relationship.
While this type of bias is almost impossible to prevent, there are several pieces of information that can assist the investigator in evaluating the magnitude of the bias that may be introduced. Firstly, the investigator may compare lung function test results at baseline among those who remained and those who moved away. Any difference between those retested and those not retested would provide information about the direction, and possibly about the magnitude, of the bias that occurred.
Secondly, it is often possible to send a mail questionnaire to individuals who have moved away from the study area, which should include questions regarding reasons for moving. If it is found, for example, that many of the respondents moved because of the development of respiratory symptoms, the probability of potential bias can be recognized. In addition, the ascertainment of diagnosed respiratory impairment among those not retested would also indicate the presence of bias.
Although there is no completely satisfactory solution to this problem resulting from loss to follow-up, awareness of the potential for bias will enable the investigator to explore various methods to evaluate its effect.
Example 19—Unequal observation
Smoking is associated with a wide range of adverse health outcomes. Any one of these is more likely to result in smokers being seen by a physician, thus increasing the likelihood that the disease of interest may also be diagnosed at that time, i.e. there would be an earlier diagnosis of disease in the smoking individual than in a comparable non-smoking individual who would be less likely to come under medical scrutiny. As a result, there would be an overestimate of the association of the disease of interest with the smoking variable. This overestimate would occur when a crude relative risk analysis (eqn (13)) is used since cohort studies usually have a defined follow-up period. It would also occur when a summary rate ratio based on person-years (eqn (11)) is used since the individual would appear as a case after fewer years of follow-up than would normally occur if he or she were not brought to medical attention as a result of smoking.
Information (misclassification) bias
Information bias occurs when there is an error in the classification of individuals with respect to the outcome variable. This may result from measurement errors, imprecise measurement, and misdiagnosis for whatever reason. Information bias is also termed misclassification bias. If the misclassification occurs equally in all the subgroups of the study population, the internal validity of the study will not be affected, but the precision or probability of being able to demonstrate a true relationship is reduced and the true magnitude of the relationship is likely to be underestimated.
Example 20
If the proportion of cases under- or over-reported in a cohort study of the risk of coronary heart disease is equal among smokers and non-smokers, no change in the observed risk ratio for smoking would occur and the internal validity of the study would be unaffected. However, if the misclassification occurs to a greater extent among either smokers or non-smokers, the observed risk will be altered, thereby affecting the relative risk and incidence difference and, as a result, the internal validity of the study.
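A small calculation may help to separate these situations. The sketch below describes outcome ascertainment in each group by an assumed sensitivity (the share of true cases recorded) and specificity (the share of non-cases correctly recorded as such). The true risks and all error rates are hypothetical. The third scenario, in which false positives arise at the same rate among non-cases in both groups, is added here to illustrate the attenuation mentioned under information bias above; it is a different mechanism from proportional miscounting of cases.

```python
# Illustrative sketch: true risks and error rates are hypothetical assumptions.

RISK_SMOKERS = 0.08      # assumed true CHD risk in smokers
RISK_NONSMOKERS = 0.04   # assumed true CHD risk in non-smokers


def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Risk that would be recorded given imperfect outcome classification."""
    true_positives = true_risk * sensitivity
    false_positives = (1.0 - true_risk) * (1.0 - specificity)
    return true_positives + false_positives


def compare(smoker_test, nonsmoker_test):
    """Return (risk ratio, risk difference) from (sensitivity, specificity) pairs."""
    r1 = observed_risk(RISK_SMOKERS, *smoker_test)
    r0 = observed_risk(RISK_NONSMOKERS, *nonsmoker_test)
    return r1 / r0, r1 - r0


scenarios = {
    "no error": ((1.0, 1.0), (1.0, 1.0)),
    "equal under-reporting": ((0.80, 1.0), (0.80, 1.0)),
    "equal false-positive rate": ((1.0, 0.98), (1.0, 0.98)),
    "unequal under-reporting": ((0.95, 1.0), (0.70, 1.0)),
}

results = {name: compare(*tests) for name, tests in scenarios.items()}
for name, (rr, rd) in results.items():
    print(f"{name:26s} risk ratio {rr:.2f}  risk difference {rd:.4f}")
```

Under these assumptions, equal proportional under-reporting leaves the risk ratio at 2 but shrinks the risk difference, an equal false-positive rate pulls the risk ratio towards 1, and unequal under-reporting distorts both measures.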
Confounding bias
Confounding occurs when other factors that are associated with both the outcome and exposure variables do not have the same distribution in the exposed and unexposed groups. Two common confounders in cohort studies are smoking and age. The risk of disease varies with age for almost all diseases. Likewise, smoking increases the risk of acquiring a wide range of diseases.
Example 21
In a cohort study to determine the risk of coronary heart disease among individuals who drink and do not drink, the prevalence of individuals who smoke is likely to be higher among those who drink than those who do not drink. If one does not take into account the prevalence of smoking in the two groups, there will be a higher incidence of coronary heart disease in the drinking group than in the non-drinking group, which is, in fact, ascribable to smoking rather than to drinking. A false association or non-association also might be observed if the age distributions were not the same in the drinking and non-drinking groups since the incidence of coronary heart disease increases with age.
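The following sketch works through this situation with invented counts. Drinking is assumed to have no effect on coronary heart disease within either smoking stratum, but 60 per cent of drinkers and only 20 per cent of non-drinkers smoke. The Mantel-Haenszel summary risk ratio is used here as one common stratified estimator; it is an illustrative choice and not necessarily the procedure described elsewhere in this book.

```python
# Illustrative sketch: all counts are hypothetical assumptions.
# Each stratum: (cases in drinkers, drinkers, cases in non-drinkers, non-drinkers)
STRATA = {
    "smokers": (600, 6000, 200, 2000),       # CHD risk 0.10 regardless of drinking
    "non-smokers": (160, 4000, 320, 8000),   # CHD risk 0.04 regardless of drinking
}


def crude_risk_ratio(strata) -> float:
    """Risk ratio ignoring the stratifying variable."""
    a = sum(s[0] for s in strata.values())
    n1 = sum(s[1] for s in strata.values())
    c = sum(s[2] for s in strata.values())
    n0 = sum(s[3] for s in strata.values())
    return (a / n1) / (c / n0)


def stratum_risk_ratios(strata) -> dict:
    """Risk ratio computed separately within each stratum."""
    return {name: (a / n1) / (c / n0) for name, (a, n1, c, n0) in strata.items()}


def mantel_haenszel_risk_ratio(strata) -> float:
    """Summary risk ratio: sum(a * n0 / N) / sum(c * n1 / N) over strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return num / den


crude = crude_risk_ratio(STRATA)
by_stratum = stratum_risk_ratios(STRATA)
adjusted = mantel_haenszel_risk_ratio(STRATA)

print(f"Crude risk ratio for drinking: {crude:.2f}")
for name, rr in by_stratum.items():
    print(f"  within {name}: {rr:.2f}")
print(f"Smoking-adjusted (Mantel-Haenszel) risk ratio: {adjusted:.2f}")
```

With these counts the crude analysis suggests that drinkers have a higher incidence of coronary heart disease, whereas the stratum-specific and adjusted risk ratios show no association once smoking is taken into account.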
Confounding bias can result in either an overestimate or an underestimate of the relative risk associated with an independent variable. Assessing the effect of confounding variables in a cohort study usually relies primarily on the investigator’s judgement, although the application of specific statistical procedures can help in reducing the effects of recognized confounders (see Chapter 6.10).
Post hoc bias
Another source of potential bias is the use of data from a cohort study to make observations that were not part of the original study intent. Interesting relationships that were not originally anticipated are often observed in cohort studies. These findings should be treated as hypotheses that are an appropriate subject for additional studies. Such fortuitous findings should not be considered to have established the validity of a relationship, and under no circumstances should the same data be used to test hypotheses that arose from them.
Resolution of bias
There are various strategies for reducing the presence of bias in cohort studies. Selection bias can be reduced by careful selection of individuals for inclusion in the study and by making every attempt to characterize differences that may exist between respondents and non-respondents. Although consideration of characteristics that may be more frequent in non-respondents will not eliminate bias, it may permit the investigator to assess the direction and degree of bias that may have resulted from specific selection procedures. Information bias can be reduced by using well-defined precise measurements and classification criteria for which the sensitivity and specificity have been determined. Follow-up bias can be reduced by intensive follow-up of all study participants and by establishing criteria for follow-up that will ensure that all members of the cohort have an equal opportunity of being diagnosed as having the outcome variable. Comparing the baseline characteristics of those lost to follow-up with those of participants successfully followed up may provide a basis for estimating the nature and degree of any bias introduced through loss to follow-up.
Confounding bias can be reduced in the analysis stage by careful stratification and/or adjustment procedures. However, fine stratification for multiple potential confounders may result in a loss of information, which reduces the likelihood of observing a significant difference. Thus careful consideration should be given to whether proposed adjustment factors are clearly related to the disease outcome. If not, it is usually better not to attempt to restrict the selection of participants or to adjust during analysis. The identification and resolution of bias is primarily a matter of epidemiological judgement. Statistical and analytic techniques designed to reduce bias should be applied only to factors that, in the judgement of the investigators, are potential sources of bias.
We have discussed the major sources of bias in a cohort study. However, this list is far from exhaustive, and additional types of bias, to which investigators should be alert, will surely be described in the future. More detailed discussions of the problems of cohort studies are given by Kleinbaum et al. (1982), Rothman (1986), Breslow and Day (1987), Hennekens and Buring (1987), and Samet and Muñoz (1998).
Summary
Cohort studies are usually the best type of studies for demonstrating the association between an exposure and a disease because it is possible to derive relative and attributable risks and often incidence measures from them. However, they are usually expensive to carry out and large cohorts are required for rare diseases. In addition, there are very significant problems associated with the selection of appropriate groups to be studied and with complete ascertainment of disease occurrence in them. Usually it is necessary to compromise the ideal, thus providing the opportunity for various types of bias to occur that can result in incorrect conclusions. The success of a cohort study often depends on the care of the investigator in recognizing and correcting for these biases.
Chapter References
Breslow, N.E. and Day, N.E. (1987). Statistical methods in cancer research. Vol. 2: The design and analysis of cohort studies. International Agency for Research on Cancer, Lyon.
Committee on the Biological Effects of Ionizing Radiations (1988). In Health risks of radon and other internally deposited alpha-emitters, p. 471. National Academy Press, Washington, DC.
Cox, D.R. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.
Dawber, T.R., Kannel, W.B., and Lyell, L.P. (1963). An approach to longitudinal studies in a community: the Framingham Study. Annals of the New York Academy of Sciences, 107, 539–56.
Detels, R., Tashkin, D.P., Sayre, J.W., et al. (1991). The UCLA Population Studies of CORD. X: A cohort study of changes in respiratory function associated with chronic exposure to SOx, NOx, and hydrocarbons. American Journal of Public Health, 81, 350–9.
Doll, R. and Peto, R. (1976). Mortality and relation to smoking: 20 years’ observations on male British doctors. British Medical Journal, ii, 1525–36.
Feinleib, M. (1968). Breast cancer and artificial menopause: a cohort study. Journal of the National Cancer Institute, 41, 315–29.
Hennekens, C.H. and Buring, J.E. (1987). Epidemiology in medicine. Little, Brown, Boston, MA.
Kannel, W.B., Dawber, T.R., Kagan, A., et al. (1961). Factors of risk in the development of coronary heart disease—six year follow-up experience. The Framingham Study. Annals of Internal Medicine, 55, 33–50.
Kleinbaum, D.G., Kupper, L.L., and Morgenstern, H. (1982). Epidemiologic research. Lifetime Learning, Belmont, CA.
Prentice, R.L. (1986). A case–cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika, 73, 1–12.
Rothman, K. (1986). Modern epidemiology. Little, Brown, Boston, MA.
Rothman, K.J. and Greenland, S. (1999). Modern epidemiology (2nd edn). Lippincott–Raven, Philadelphia, PA.
Samet, J.M. and Muñoz, A. (ed.) (1998). Cohort studies. Epidemiologic Reviews, 20, 1–136.
Shore, R.E., Hemplemann, L.H., Kowaluk, E., et al. (1977). Breast neoplasms in women treated with X-rays for acute postpartum mastitis. Journal of the National Cancer Institute, 59, 813–22.
