Appendix B

Technical Notes

Content of the Report

This report presents Indiana cancer incidence numbers and age-adjusted Indiana cancer incidence rates for the year 1997. Indiana rates for the most common cancers are compared with national rates. For selected cancers, the African-American and white population rates are compared, as are state and county rates. Rates and numbers are reported for the state as a whole and for each of the 92 counties individually. Rates are also given for both sexes combined and for males and females separately. The rates were calculated from the data available to the Indiana State Cancer Registry as of January 16, 2003. These cases represent 94.3% of the cases estimated to have been diagnosed in Indiana in 1997, as calculated using the method of estimating completeness of case ascertainment specified by the North American Association of Central Cancer Registries (NAACCR).

By convention, cancer incidence rates do not include carcinoma in situ (with the exception of bladder cancer in situ), nor do they include basal and squamous cell carcinomas of the skin. The numbers and rates of reported cancers that appear in Table I, Table II, all Table IIIs, all Table Vs, and all Table VIIs follow this convention.

In contrast, in situ cancers are included in the numbers given in all Table IVs and Table VI since these tables concern cancers diagnosed by stage. Thus, the total numbers in the two types of tables will not "match."

Incidence Rates

The cancer incidence rate is the number of new cancers of a specific site or type occurring in a specified population during a year, expressed as the number of cancers per 100,000 people. The numerator of the rate can include multiple primary cancers occurring in one individual. The rate can be computed for each type of cancer, as well as for all cancers combined. In this report, rates are age-standardized to the U.S. 1970 standard million population to allow comparisons between groups (geographic or demographic) that have different age distributions.

Age-adjusted Rates

When comparing rates over time or across different populations, crude rates (the number of newly diagnosed cancer cases per 100,000 persons) can be misleading because differences in the age distributions of the various populations are not considered. Since cancer is age-dependent, comparisons of crude cancer incidence rates can be especially deceptive.

Age-adjusted rates take into account the diverse age distributions of the populations. Valid comparisons between age-adjusted rates can be made, provided the same standard population and age groups have been used in the calculation of the rates. The direct method of adjustment was used to produce the age-adjusted rates for this report. In this method, the population is first divided into reasonably homogeneous age ranges and the age-specific rate is calculated for each age range; then each age-specific rate is weighted by multiplying it by the proportion of the standard population in the respective age group. The age-adjusted rate is the sum of the weighted age-specific rates.

For example, suppose there are 200,000 people aged 70 to 74 in the state, and this is 3.2% of the total state population (which would be 6,250,000 in this example), but only 2.7% of the standard population. Suppose further there are 64 cases in this age group of some type of cancer for which we want to calculate the rate. This is an age-specific rate of 32 per 100,000 for this age group. If this age group comprised only 2.7% of the state population, the same proportion as in the standard population, there would be only 168,750 people in this age group instead of 200,000. If this were the case and the age-specific rate were still 32 per 100,000, there would be only 54 cases instead of 64. In computing the age-adjusted rate, this age group is counted as if there were only 54 cases, since the additional cases are due to the increased proportion of people in this age group.

Conversely, suppose there are 400,000 people aged 20 to 24 in the state, which is 6.4% of the total state population, and suppose this age group comprises 7.2% of the standard population. Suppose further there are 16 cases in this age group of the type of cancer we're concerned with. This is an age-specific rate of 4 per 100,000 for this age group. If the percentage of people in this age group were the same as in the standard population, it would consist of 450,000 people instead of 400,000. If this were the case and the age-specific rate were still 4 per 100,000, there would be 18 cases instead of 16. In computing the age-adjusted rate, this age group is counted as if there were 18 cases, since the smaller number of cases is due to the decreased proportion of people in this age group.
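The arithmetic in the two examples above can be checked with a short sketch. The function names are illustrative assumptions and are not part of any registry software; the sketch only restates the direct method as described in this section, and a real calculation would pass one entry for each age group of the standard population.

```python
def age_specific_rate(cases, population, per=100_000):
    """Cases per `per` people within one age group."""
    return cases / population * per


def expected_cases(cases, population, total_population, std_proportion):
    """Cases the group would contribute if it made up `std_proportion`
    of the total population while keeping the same age-specific rate."""
    standardized_population = total_population * std_proportion
    return cases / population * standardized_population


def age_adjusted_rate(groups, per=100_000):
    """Direct method: sum of age-specific rates, each weighted by the
    group's share of the standard population.

    `groups` is a sequence of (cases, population, std_population) tuples.
    """
    groups = list(groups)
    std_total = sum(std for _, _, std in groups)
    return sum(
        age_specific_rate(cases, pop, per) * std / std_total
        for cases, pop, std in groups
    )


STATE_POPULATION = 6_250_000  # total population used in the worked example

# Ages 70 to 74: 64 cases among 200,000 people, 2.7% of the standard population.
older = expected_cases(64, 200_000, STATE_POPULATION, 0.027)
# Ages 20 to 24: 16 cases among 400,000 people, 7.2% of the standard population.
younger = expected_cases(16, 400_000, STATE_POPULATION, 0.072)

print(round(older), round(younger))  # 54 18
```

The older group is scaled down and the younger group is scaled up, which is the reweighting toward the standard population that the two examples describe.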

Confidence Intervals

Rates based on small numbers of events over a given period of time or for sparsely populated geographic areas should be viewed with caution. These rates show considerable random variation and are considered "unstable," which limits their usefulness in comparisons and estimation of rare occurrences.

In this report, by convention, whenever the number of cases of any type of cancer is fewer than 5 at the county level, the actual number is not reported, to protect the privacy of these individuals. An asterisk (*) denotes this in all tables. In all Table VIIs, if there are fewer than 5 cases for one sex but not the other, a tilde (~) is entered for the sex with 5 or more cases so that the actual number cannot be subtracted from the total to compute the number that is fewer than 5. If the number of cases of any type of cancer is fewer than 20, the rate generated is considered "unstable" and is marked with a double asterisk (**) when given in the tables.
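The marking rules above can be expressed as a small sketch. The function names, the one-decimal rate format, and the treatment of a zero count like any other count below 5 are assumptions made for illustration; the text does not specify them, nor does it say how a rate is shown when its count is suppressed.

```python
SUPPRESS_BELOW = 5    # county-level counts below this are not reported
UNSTABLE_BELOW = 20   # rates based on fewer cases than this are flagged


def format_count(cases):
    """Return the table entry for a county-level case count."""
    return "*" if cases < SUPPRESS_BELOW else str(cases)


def format_rate(rate, cases):
    """Return the table entry for a rate, flagging unstable rates with '**'.

    The one-decimal format is an assumption, not taken from the report.
    """
    text = f"{rate:.1f}"
    return text + "**" if cases < UNSTABLE_BELOW else text


def format_by_sex(male_cases, female_cases):
    """Return (male, female) entries. When exactly one sex has a suppressed
    count, the other is shown as '~' so the hidden value cannot be recovered
    by subtracting from the total."""
    male_hidden = male_cases < SUPPRESS_BELOW
    female_hidden = female_cases < SUPPRESS_BELOW
    if male_hidden != female_hidden:
        return ("*", "~") if male_hidden else ("~", "*")
    return format_count(male_cases), format_count(female_cases)


print(format_by_sex(3, 12))   # ('*', '~')
print(format_rate(12.34, 19))  # 12.3**
```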

Even when rates are based on large numbers of events, there is still some degree of random variation. Thus the calculated rate may not be the "true" rate. Nonetheless, it is possible to calculate the endpoints of an interval such that the probability that the true rate is outside the interval is less than some given value. For example, if the calculated rate for a particular type of cancer is 100 cases per 100,000 people, it can be calculated that the probability is less than 0.05 that the true rate is less than (say) 97 or greater than 104. Thus we are 95% confident that the true rate is between 97 and 104. The bar charts in Tables I, II and III use a bar to show the calculated rates and a horizontal I-beam to show the confidence interval, as shown here:

Confidence Interval Example

Because the calculated rate is not necessarily the true rate, it is not sufficient to compare the rates of two areas to determine if one area has a higher rate than the other. For example, suppose Area A has a calculated rate of 87 and Area B has a calculated rate of 94. Area B appears to have a higher rate. But suppose the 95% confidence intervals are computed and it turns out that we are 95% confident that Area A's rate is between 84 and 91, and we are 95% confident that Area B's rate is between 88 and 100. Then the confidence intervals overlap, so it's possible A's true rate is 90 and B's is 89, and it may have been a mistake to assume Area B has a higher rate, as shown here:

Overlapping Confidence Intervals

On the other hand, if A's 95% confidence interval turns out to be 85 to 89, and B's 91 to 98, then the confidence intervals do not overlap. Thus B's true rate must be greater than A's, as shown below, unless A's true rate lies outside its confidence interval (and there's only a 5% chance of that), or B's true rate lies outside its confidence interval (and there's only a 5% chance of that).

Non-overlapping Confidence Intervals

The maps accompanying the Table IIIs are shaded to show county rates that are higher than, lower than, or similar to the rate for Indiana as a whole. The rates are considered similar if the 95% confidence intervals overlap. In other words, if it cannot be said with at least 95% confidence that one rate is higher than the other, they are considered similar. Rates based on fewer than 20 cases are excluded from comparison.
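The overlap rule used for the maps can be sketched as follows. The function names and category labels are hypothetical, and treating intervals that merely touch at an endpoint as overlapping is an assumption, since the text does not address that case.

```python
def intervals_overlap(a, b):
    """True when closed intervals a=(low, high) and b=(low, high) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_to_state(county_ci, state_ci, county_cases, min_cases=20):
    """Classify a county rate against the state rate using 95% intervals."""
    if county_cases < min_cases:
        return "excluded"  # unstable rate, not compared
    if intervals_overlap(county_ci, state_ci):
        return "similar"
    return "higher" if county_ci[0] > state_ci[1] else "lower"


# Areas A and B from the first example: 84 to 91 versus 88 to 100.
print(intervals_overlap((84, 91), (88, 100)))  # True
# Areas A and B from the second example: 85 to 89 versus 91 to 98.
print(intervals_overlap((85, 89), (91, 98)))   # False
```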

Formulas

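A crude rate is the number of cases per 100,000 in a given population, as given by the following formula:

$$\text{crude rate} = \frac{\text{count}}{\text{pop}} \times 100{,}000$$

where count is the number of cases and pop is the size of the population in question.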

The following is the formula used to calculate the age-adjusted rate for age groups x through y:

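$$\text{age-adjusted rate}_{x,y} = \sum_{i=x}^{y} \left[ \frac{\text{count}_i}{\text{pop}_i} \times 100{,}000 \times \frac{\text{stdmil}_i}{\sum_{j=x}^{y} \text{stdmil}_j} \right]$$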

where count_i is the number of cases for the i-th age group, pop_i is the relevant population for the same age group, and stdmil_i is the standard population for the same age group. The 1970 standard population is given in Appendix D, which shows the population divided into 18 age cohorts, each with a range of 5 years, except the last, which includes everyone 85 and over. In this report, age-adjusted rates are calculated for all age groups, so in the above formula, x = 1 (the first age group) and y = 18 (the last age group).

The formula for computing the endpoints of a confidence interval for age-adjusted rates is somewhat complex. Suppose that the age-adjusted rate is composed of age groups x through y, and let:

Formula for ith weight value

The endpoints of a (1 - p) × 100% confidence interval are calculated as:

Formula for lower end of confidence interval

Formula for upper end of confidence interval

where ChiInv(p,n) is the inverse of the chi-squared distribution function evaluated at p and with n degrees of freedom, and we define ChiInv(p,0) = 0.

This method for calculating the confidence interval produces similar confidence limits to the standard normal approximation when the counts are large and the population being studied is similar to the standard population. In other cases, the above method is more likely to ensure proper coverage.

Note: The rate used in the above formulas for the confidence interval endpoints is not per 100,000 population.

All of the above formulas are taken from A Guide to Using SEER*Stat, Version 3.0, National Cancer Institute, Cancer Statistics Branch, DCCPS.
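The confidence-interval formulas themselves are not reproduced in this text. As a reconstruction offered under assumption, and not as a quotation from the guide, the gamma-distribution interval of Fay and Feuer, the method generally described for age-adjusted rates in SEER*Stat documentation, can be written as follows. The symbols v, w_max, L, and U are names introduced for this sketch; rate is the age-adjusted rate expressed per person rather than per 100,000, in line with the note above; and ChiInv is as defined above.

```latex
\begin{align*}
w_i &= \frac{\mathit{stdmil}_i}{\mathit{pop}_i \sum_{j=x}^{y} \mathit{stdmil}_j}, \qquad
\mathit{rate} = \sum_{i=x}^{y} w_i \, \mathit{count}_i, \\
v &= \sum_{i=x}^{y} w_i^{2} \, \mathit{count}_i, \qquad
w_{\max} = \max_{x \le i \le y} w_i, \\
L &= \frac{v}{2\,\mathit{rate}} \;
     \mathrm{ChiInv}\!\left(\frac{p}{2},\; \frac{2\,\mathit{rate}^{2}}{v}\right), \\
U &= \frac{v + w_{\max}^{2}}{2\,(\mathit{rate} + w_{\max})} \;
     \mathrm{ChiInv}\!\left(1 - \frac{p}{2},\;
     \frac{2\,(\mathit{rate} + w_{\max})^{2}}{v + w_{\max}^{2}}\right)
\end{align*}
```

Under this reading, L and U are the lower and upper endpoints of the (1 - p) × 100% interval, and multiplying each by 100,000 puts them on the same scale as the reported rates. When there are no cases, the lower endpoint is taken as 0, consistent with the convention ChiInv(p,0) = 0.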
