Appendix B

Technical Notes

Content of the Report

This report presents Indiana cancer incidence and mortality numbers, together with age-adjusted and age-specific incidence and mortality rates, for the year 2001 and for the years 1997 through 2001 combined. Indiana rates for the most common cancers are compared with national rates. For selected cancers, the African-American and white population rates are compared, as are state and county rates. Rates and numbers are reported for the state as a whole and for each of the 92 counties individually. Rates are also given for both sexes combined and for males and females individually. The data used to calculate these rates were those available to the Indiana State Cancer Registry as of February 7, 2005.

The North American Association of Central Cancer Registries (NAACCR) has specified a method of estimating completeness of case ascertainment, that is, the actual number of cases reported as a percentage of the estimated number of cases to be diagnosed in Indiana in a given year. The completeness estimates for each of the five years covered in this report are:

1997	1998	1999	2000	2001
93.7%	98.6%	95.9%	97.6%	98.8%

By convention, cancer incidence rates do not include carcinoma in situ (with the exception of bladder cancer in situ), nor do they include basal and squamous cell carcinomas of the skin. The numbers and rates of reported cancers that appear in Sections 1, 4 and 5 follow this convention, as do Table 2-2 and all the tables in Section 3 other than Tables 3.n-7.

In contrast, in situ and skin cancers are included in the numbers given in Table 2-1 and in Tables 3.n-7, since these tables concern cancers diagnosed by stage. Thus, the total numbers in Table 2-1 do not match the other sections, but do match the numbers in Tables 3.n-7, with the exception of Table 3.2-7, which is limited to female breast cancer, whereas Table 2-1 displays numbers for both sexes. Also note that in situ cancers of the cervix and prostate are not included in any table.

Incidence Rates

The cancer incidence rate is the number of new cancers of a specific site or type occurring in a specified population during a year, expressed as the number of cancers per 100,000 people. The numerator of the rate can include multiple primary cancers occurring in one individual. This rate can be computed for each type of cancer, as well as for all cancers combined. These rates are age-standardized to the U.S. 2000 standard million population to allow for comparisons between groups (geographic or demographic) that have different age distributions.

Age-adjusted Rates

When comparing rates over time or across different populations, crude rates (the number of newly diagnosed cancer cases per 100,000 persons) can be misleading because differences in the age distributions of the various populations are not considered. Since cancer is age-dependent, the comparison of crude cancer incidence rates can be especially deceptive.

Age-adjusted rates take into account the diverse age distributions of the populations. Valid comparisons between age-adjusted rates can be made, provided the same standard population and age groups have been used in the calculation of the rates. The direct method of adjustment was used to produce the age-adjusted rates for this report. In this method, the population is first divided into reasonably homogeneous age ranges and the age-specific rate is calculated for each age range; then each age-specific rate is weighted by multiplying it by the proportion of the standard population in the respective age group. The age-adjusted rate is the sum of the weighted age-specific rates.
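The direct method described above can be sketched in a few lines of Python. This is a minimal illustration, not the software used for this report; the function name `age_adjusted_rate` and the two-group input values are assumptions chosen for the example.

```python
def age_adjusted_rate(counts, pops, std_pops, per=100_000):
    """Directly standardized rate.

    counts, pops and std_pops hold one entry per age group:
    case counts, the study population, and the standard population.
    """
    if not (len(counts) == len(pops) == len(std_pops)):
        raise ValueError("all inputs must have one entry per age group")

    std_total = sum(std_pops)
    rate = 0.0
    for count, pop, std in zip(counts, pops, std_pops):
        # Age-specific rate for this group, per `per` people.
        age_specific_rate = count / pop * per
        # Weight it by the group's share of the standard population.
        rate += age_specific_rate * (std / std_total)
    return rate


# Hypothetical two-group input (illustrative values, not report data).
example = age_adjusted_rate(
    counts=[16, 64],
    pops=[400_000, 200_000],
    std_pops=[72_000, 27_000],
)
print(round(example, 2))
```

When the standard population has the same age distribution as the study population, the result equals the crude rate, which is a useful sanity check.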

For example, suppose there are 200,000 people aged 70 to 74 in the state, and this is 3.2% of the total state population (which would be 6,250,000 in this example), but only 2.7% of the standard population. Suppose further there are 64 cases in this age group of some type of cancer for which we want to calculate the rate. This is a crude rate of 32 per 100,000 for this age group. If this age group comprised only 2.7% of the state population, the same proportion as in the standard population, there would be only 168,750 people in this age group instead of 200,000. If this were the case and the crude rate were still 32 per 100,000, there would be only 54 cases instead of 64. In computing the age-adjusted rate, this age group is counted as if there were only 54 cases, since the additional cases are due to the increased proportion of people in this age group.

Conversely, suppose there are 400,000 people aged 20-24 in the state, which is 6.4% of the total state population, and suppose this age group comprises 7.2% of the standard population. Suppose further there are 16 cases in this age group of the type of cancer we're concerned with. This is a crude rate of 4 per 100,000 for this age group. If the percentage of people in this age group were the same as in the standard population, it would consist of 450,000 people instead of 400,000. If this were the case and the crude rate were still 4 per 100,000, there would be 18 cases instead of 16. In computing the age-adjusted rate, this age group is counted as if there were 18 cases, since the smaller number of cases is due to the decreased proportion of people in this age group.
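The arithmetic in the two examples above can be checked with a short Python sketch. The helper name `standardized_case_count` is an assumption introduced for illustration; the input figures are the ones used in the examples.

```python
def standardized_case_count(cases, group_pop, total_pop, std_share):
    """Return (group size, case count) the age group would have if it
    made up `std_share` of the total population at the same rate."""
    rate = cases / group_pop                 # age-specific rate as a proportion
    std_group_pop = total_pop * std_share    # group size under the standard share
    return std_group_pop, rate * std_group_pop


TOTAL_POP = 6_250_000

# Ages 70 to 74: 64 cases among 200,000 people, 2.7% of the standard population.
older_pop, older_cases = standardized_case_count(64, 200_000, TOTAL_POP, 0.027)

# Ages 20 to 24: 16 cases among 400,000 people, 7.2% of the standard population.
younger_pop, younger_cases = standardized_case_count(16, 400_000, TOTAL_POP, 0.072)

print(round(older_pop), round(older_cases))
print(round(younger_pop), round(younger_cases))
```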

Confidence Intervals

Rates based on small numbers of events over a given period of time or for sparsely populated geographic areas should be viewed with caution. These rates show considerable random variation and are considered "unstable," which limits their usefulness in comparisons and estimation of rare occurrences.

In this report, by convention, whenever the number of cases of any type of cancer is fewer than 5 at the county level, the actual number is not reported, to protect the privacy of these individuals. An asterisk (*) denotes this in Section 5. If the number of cases of any type of cancer is fewer than 20, the rate generated is considered "unstable" and is marked with a double asterisk (**) in the tables.
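The two thresholds can be expressed as a small Python sketch. The function name `county_flags` and the dictionary keys are hypothetical; only the cutoffs of 5 and 20 come from the text above.

```python
SUPPRESSION_THRESHOLD = 5   # counts below this are not reported (shown as *)
STABILITY_THRESHOLD = 20    # rates from counts below this are "unstable" (shown as **)


def county_flags(case_count):
    """Return which of the two reporting conventions apply to a county-level count."""
    return {
        "suppress_count": case_count < SUPPRESSION_THRESHOLD,
        "unstable_rate": case_count < STABILITY_THRESHOLD,
    }


print(county_flags(4))
print(county_flags(12))
print(county_flags(20))
```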

Even when rates are based on large numbers of events, there is still some degree of random variation. Thus the calculated rate may not be the "true" rate. Nonetheless it is possible to calculate the end points of an interval such that the probability that the true rate is outside the interval is less than some given value. For example, if the calculated rate for a particular type of cancer is 100 cases per 100,000 people, it can be calculated that the probability is less than 0.05 that the true rate is less than (say) 97 or greater than 104. Thus we are 95% confident that the true rate is between 97 and 104. The bar charts in Sections 1 and 2 use a bar to show the calculated rates and a horizontal I-beam to show the confidence interval, as shown here:

Confidence Interval Example

Because the calculated rate is not necessarily the true rate, it is not sufficient to compare the rates of two areas to determine if one area has a higher rate than the other. For example, suppose Area A has a calculated rate of 87 and Area B has a calculated rate of 94. Area B appears to have a higher rate. But suppose the 95% confidence intervals are computed and it turns out that we are 95% confident that Area A's rate is between 84 and 91, and we are 95% confident that Area B's rate is between 88 and 100. Then the confidence intervals overlap, so it's possible A's true rate is 90 and B's is 89, and it may have been a mistake to assume Area B has a higher rate, as shown here:

Overlapping Confidence Intervals

On the other hand, if A's 95% confidence interval turns out to be 85 to 89, and B's 91 to 98, then the confidence intervals do not overlap. Thus B's true rate must be greater than A's, as shown below, unless A's true rate is greater than the upper bound of its confidence interval (and there's only a 2.5% chance of that), or B's true rate is less than the lower bound of its confidence interval (and there's only a 2.5% chance of that).

Non-overlapping Confidence Intervals

The maps in Section 3 are shaded to show county rates that are higher than, lower than, or similar to the rate for Indiana as a whole. The rates are considered similar if the 95% confidence intervals overlap. In other words, if it cannot be said with at least 95% confidence that one rate is higher than the other, they are considered similar. Rates based on fewer than 20 cases are excluded from comparison.
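The overlap rule used for the maps can be sketched in Python as follows. The function name `compare_to_state` and the returned labels are assumptions for illustration. Treating intervals that share only an endpoint as overlapping is also an assumption, since the text does not address that case.

```python
def compare_to_state(county_ci, state_ci, county_cases, min_cases=20):
    """Classify a county rate against the state rate using 95% intervals.

    county_ci and state_ci are (lower, upper) tuples.
    """
    if county_cases < min_cases:
        return "excluded"          # too few cases for a stable comparison

    county_low, county_high = county_ci
    state_low, state_high = state_ci

    if county_low > state_high:
        return "higher"            # county interval lies entirely above the state's
    if county_high < state_low:
        return "lower"             # county interval lies entirely below the state's
    return "similar"               # intervals overlap (assumption: touching counts as overlap)


# Intervals from the Area A / Area B examples above, with B in the county role.
print(compare_to_state((88, 100), (84, 91), county_cases=150))
print(compare_to_state((91, 98), (85, 89), county_cases=150))
```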

Formulas

A crude rate is the number of cases per 100,000 in a given population, as given by the following formula:

\[ \text{crude rate} = \frac{\text{count}}{\text{population}} \times 100{,}000 \]

The following is the formula used to calculate the age-adjusted rate for age groups x through y:

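\[ \text{age-adjusted rate}_{x\text{-}y} = \sum_{i=x}^{y} \left[ \frac{\mathrm{count}_i}{\mathrm{pop}_i} \times 100{,}000 \times \frac{\mathrm{stdmil}_i}{\sum_{j=x}^{y} \mathrm{stdmil}_j} \right] \]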

where count_i is the number of cases for the i-th age group, pop_i is the relevant population for the same age group, and stdmil_i is the standard population for the same age group. The 2000 standard population given above shows the population divided into 18 age cohorts, each with a range of 5 years, except the last, which includes everyone 85 and over. In this report, age-adjusted rates are calculated for all age groups, so in the above formula, x = 1 (the first age group) and y = 18 (the last age group).

The formula for computing the end-points of a confidence interval for age-adjusted rates is somewhat complex. Suppose that the age-adjusted rate covers age groups x through y, and let:

Formula for ith weight value

The endpoints of a (1 - p) × 100% confidence interval are calculated as:

Formula for lower end of confidence interval

Formula for upper end of confidence interval

where ChiInv(p,n) is the inverse of the chi-squared distribution function evaluated at p and with n degrees of freedom, and we define ChiInv(p,0) = 0.

This method for calculating the confidence interval produces similar confidence limits to the standard normal approximation when the counts are large and the population being studied is similar to the standard population. In other cases, the above method is more likely to ensure proper coverage.

Note: The rate used in the above formulas for the confidence interval endpoints is not per 100,000 population.
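As a sketch only, the confidence interval quantities can be written out under the assumption that they follow the gamma-distribution interval for directly standardized rates (the Fay and Feuer method), which is the approach described in SEER*Stat documentation. The symbols v and w_M are names introduced here for illustration and may differ from the notation in the cited guide. Here rate is the age-adjusted rate expressed as a proportion (not per 100,000), so that rate equals the sum of w_i times count_i.

```latex
\[
w_i = \frac{\mathrm{stdmil}_i}{\mathrm{pop}_i \sum_{j=x}^{y} \mathrm{stdmil}_j},
\qquad
v = \sum_{i=x}^{y} w_i^{2}\,\mathrm{count}_i,
\qquad
w_M = \max_{x \le i \le y} w_i
\]
\[
\mathrm{LCI} = \frac{v}{2\,\mathrm{rate}}\;
\mathrm{ChiInv}\!\left(\frac{p}{2},\; \frac{2\,\mathrm{rate}^{2}}{v}\right)
\]
\[
\mathrm{UCI} = \frac{v + w_M^{2}}{2\,(\mathrm{rate} + w_M)}\;
\mathrm{ChiInv}\!\left(1 - \frac{p}{2},\; \frac{2\,(\mathrm{rate} + w_M)^{2}}{v + w_M^{2}}\right)
\]
```

For a 95% interval, p = 0.05, so the lower endpoint uses the 0.025 quantile and the upper endpoint uses the 0.975 quantile. Multiplying both endpoints by 100,000 puts them on the same scale as the reported rates.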

All of the above formulas are taken from A Guide to Using SEER*Stat, Version 3.0, National Cancer Institute, Cancer Statistics Branch, DCCPS.
