The Public Health Service and the Centers for Disease Control kept the medical records of the men in the Study. Now housed at the Southeast Regional National Archives in Morrow, Georgia, they were opened in 2004 to the public through a Freedom of Information Act (FOIA) request filed by librarian/historian Tywanna Whorley. Although the articles written by PHS/CDC researchers when the Study was ongoing are also based on these records, the data below is an example of the kinds of information that can be gleaned from coding and data analysis.

The data below is based on a coding project on these records, done at the Southeast Regional National Archives by Susan M. Reverby with the assistance of Rachel Stern, Harvard '07. Joan Huang, Wellesley '07, did further coding into the programs "Epi Info" and "Excel." Donna Stroup, PhD, MSc, of Data for Solutions in Decatur, Georgia, provided the statistical analysis.

The first section of data below provides frequencies. The second section shows some of the statistical analysis.


Chart 1: Bar Graph of Men's Infection Status
Chart 2: Pie chart of Men's Infection Status
Chart 3: Date of First Exam
Chart 4: Men's Age at First Exam
Chart 5: Men's Age at First Exam - Controls
Chart 6: Men's Age at First Exam - Subjects
Chart 7: Number of Years Between First Lesion Date and First Exam
Chart 8: Date of Death
Chart 9: Date of Death - Controls
Chart 10: Date of Death - Subjects

Statistical Analysis

I. Summary Information

The dataset consists of 624 records. Of these, 427 (68.4%) were Subjects, 185 (29.6%) were Controls, and 12 (1.9%) were Control to Subject (CtoS) conversions. Unless otherwise specified, analysis was performed after adding the 12 CtoS conversions to the Subjects, giving 185 (29.6%) controls and 439 (70.4%) subject "cases". Cause of death information was provided for 394 (63.14%) of the 624 records; of those records with cause of death, 285 (72.3%) were subjects and 109 (27.7%) were controls. In all cases, cause of death analysis used the first cause mentioned for the record. For each analysis below, the number of observations is determined by the information available. For example, 538 records (383 subjects and 155 controls) have information for "age at first exam."
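As a quick check on the summary figures, the percentages can be recomputed from the counts quoted above. This is only a minimal sketch: the variable names and the `pct` helper are illustrative and are not part of the original analysis, which was carried out in Epi Info and Excel.

```python
# Counts as reported in the summary paragraph above.
TOTAL = 624
counts = {"Subjects": 427, "Controls": 185, "CtoS": 12}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The three groups should account for every record.
assert sum(counts.values()) == TOTAL

shares = {name: pct(n, TOTAL) for name, n in counts.items()}

# Fold the 12 conversions into the Subjects, as the analysis does.
cases = counts["Subjects"] + counts["CtoS"]
cases_share = pct(cases, TOTAL)

# Records with a cause of death: 285 subjects plus 109 controls.
with_cause = 285 + 109
cause_share = pct(with_cause, TOTAL)

print(shares, cases, cases_share, with_cause, cause_share)
```

The same pattern applies to any of the subsets below, where the denominator changes with the information available for each analysis.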

II. Age when study started

Information on Entry into Study was not available. "Age at first exam" was used as a proxy for age at study start.  Table 1 below summarizes descriptive information on age at first exam by case status.

Table 1: Age at First Exam

The difference (between controls and subjects) in the mean age at first exam is not statistically significant (t Statistic = 0.3143, p-value = 0.7534).

III. Date of lesion

Data are available on date of first lesion for 147 men. Of these, 143 also have information on date of first exam (these include one C to S crossover). The mean length of time from first lesion to first exam for these 143 men was 19.7 years (median 19, standard deviation 12.2, range 1-47 with one outlier at 72) (Table 2).

Table 2: Distribution of years from first lesion to first exam (143 men)

IV. Age/Date at death, by case status

Although the data show no significant difference between the groups in age at first exam (Table 1), case status does have an impact on age at death. Subjects died, on average, almost 5 years earlier than controls (Table 3).

Table 3: Age at death by case status

The difference in the mean age at death is highly statistically significant (t Statistic = 3.18, p-value = 0.0016).

To investigate the effect of the development of new knowledge in medical care, we examined “decade of death” by case status (Table 4).

Table 4: Decade of death by case status

The differences in death decade are statistically significant only at the 0.10 level (Chi-squared statistic = 7.93, p-value = 0.09); men who were subjects tended to die in earlier decades. Specifically, for each decade prior to 1950, proportionately more subjects died than controls (see row %), even though the two groups did not differ significantly in age at first exam.
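The reported p-value can be related to the chi-squared statistic with a short calculation. The sketch below assumes 4 degrees of freedom, that is, five decade categories across two groups. The source does not state the degrees of freedom, so that figure is an assumption, as are the function and variable names. It uses the closed-form upper-tail probability that holds for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Uses the closed form that holds only when df is even:
        exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    terms = sum(half**k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


CHI2_REPORTED = 7.93  # statistic reported in the text
DF_ASSUMED = 4        # ASSUMPTION: (5 decades - 1) * (2 groups - 1)

p_value = chi2_sf_even_df(CHI2_REPORTED, DF_ASSUMED)
print(f"p = {p_value:.3f}")
```

Under that assumption the result is about 0.09, in line with the reported value; a different number of decade categories would give a different p-value.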

Death rates before 1950 are of interest since this date marks the start of availability of penicillin. Table 5 shows death rates by case status and death by 1950.

Table 5: Death by 1950 by Case Status

The risk of death prior to 1950 is 44.6% (144/323) for subjects compared to 34.5% (41/119) for controls; that is, the risk for subjects was about 10 percentage points higher, a relative risk of about 1.29 (p = 0.03).
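The arithmetic behind these figures can be sketched as follows, using only the counts quoted in the text. The source does not say which test produced p = 0.03; the pooled two-proportion z-test used here is an assumption chosen for illustration, and all variable names are hypothetical.

```python
import math

# Counts reported in the text: deaths before 1950 over each group's denominator.
subj_deaths, subj_total = 144, 323
ctrl_deaths, ctrl_total = 41, 119

risk_subj = subj_deaths / subj_total
risk_ctrl = ctrl_deaths / ctrl_total

risk_difference = risk_subj - risk_ctrl  # absolute gap between the two risks
relative_risk = risk_subj / risk_ctrl    # ratio of the two risks

# ASSUMPTION: pooled two-proportion z-test; the source does not name its test.
pooled = (subj_deaths + ctrl_deaths) / (subj_total + ctrl_total)
se = math.sqrt(pooled * (1 - pooled) * (1 / subj_total + 1 / ctrl_total))
z = risk_difference / se

# Normal tail probabilities via the complementary error function.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail: subjects > controls

print(f"risk (subjects) = {risk_subj:.3f}, risk (controls) = {risk_ctrl:.3f}")
print(f"difference = {risk_difference:.3f}, relative risk = {relative_risk:.2f}")
print(f"z = {z:.2f}, two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")
```

Keeping the risk difference and the relative risk as separate quantities avoids conflating a gap in percentage points with a percent increase. Both p-values are printed so they can be compared with the reported figure.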

For Subjects (including C to S conversions), the mean age at death is 65.17 (std dev 14.15), median age at death 66.
