Notes on County Business Patterns Special Extracts Data

Beginning with 1998 data, the Census Bureau has released County Business Patterns data classified by industry according to the North American Industry Classification System (NAICS). NAICS replaced the Standard Industrial Classification (SIC) system, which had been the basis for the industry-level County Business Patterns Special Extracts (CBPSE) data. To facilitate comparison with earlier years, SOCDS presents estimates of the discontinued Census SIC CBPSE data series from 1998 to the present. The NAICS and SIC data are available for download as part of the complete CBPSE data file. This page documents the estimation procedure.

Use caution when comparing the SIC estimates (1998 to present) with earlier years' data (1991-1997). The NAICS-to-SIC conversion is based on national data aggregated from highly detailed industry categories, so the estimates cannot account for local variation in the detailed industry composition of the broader summary categories in which the original local data are provided.

The estimation procedure involves two primary tasks: creating a NAICS-to-SIC conversion matrix, and estimating the NAICS data items (number of jobs and payroll) that were suppressed for confidentiality purposes.

NAICS-to-SIC Translation Methodology

The Census Bureau’s NAICS-to-SIC Bridge, created from data collected in the 1997 Economic Census, serves as the primary source for HUD’s NAICS-to-SIC conversion matrix. The Bridge disaggregates the jobs, payroll, and establishments in each "6-digit" NAICS category into corresponding "4-digit" SIC jobs, payroll, and establishment components. HUD condensed this highly detailed Bridge into a set of equations that convert "3-digit" NAICS jobs, payroll, and establishment data into "2-digit" SIC jobs, payroll, and establishment estimates, in the following sequence of steps:

  1. Imputed suppressed SIC data in the NAICS-to-SIC Bridge.

    • Jobs--Used the midpoint of the suppression code range as a starting value, then modified the starting values so that the sum of the SIC job components equaled the NAICS value.
    • Payroll--Used the national sector payroll-per-job ratio multiplied by the number of sector jobs as the starting value, then modified the starting values so that the sum of the SIC payroll components equaled the NAICS value.

  2. Aggregated the NAICS-to-SIC data to the "3-digit" NAICS level and the "2-digit" SIC level. Each "3-digit" NAICS category was then linked to its "2-digit" SIC components.


  3. Calculated the percent "contribution" of each "2-digit" SIC category in jobs, payroll, and establishments to the "3-digit" NAICS totals.


  4. Re-sorted and grouped the data, including the percentages created in step 3, by "2-digit" SIC code. At this stage, each "2-digit" SIC category was linked to its contributing set of "3-digit" NAICS categories. Each percentage gives the share of a contributing "3-digit" NAICS category's total that is allocated to the "2-digit" SIC category; these percentages become the coefficients in the NAICS-to-SIC equations.
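The four steps above can be sketched in Python for the jobs item. This is a minimal sketch under stated assumptions, not HUD's actual procedure: the bridge layout (`{naics6: {sic4: jobs}}`), the function names `impute_components` and `build_equations`, and the codes and counts in the example are hypothetical. Because the text does not say how the step 1 starting values were "modified," the sketch assumes the suppressed components are scaled proportionally until the components sum to the NAICS value. Payroll and establishments would be handled the same way.

```python
from collections import defaultdict

# Suppression ranges (low, high); a hypothetical subset, assumed to apply
# to the bridge the same way they apply to CBP data.
SUPPRESSION_RANGES = {"A": (0, 19), "B": (20, 99), "C": (100, 249)}


def impute_components(naics_total, components):
    """Step 1 (jobs): fill suppressed SIC components of one 6-digit NAICS.

    components: list of (sic4, jobs, code); jobs is None when suppressed.
    Suppressed cells start at their range midpoint and are then scaled
    proportionally (an assumption) so all components sum to naics_total.
    """
    known = sum(jobs for _, jobs, _ in components if jobs is not None)
    starts = {sic: sum(SUPPRESSION_RANGES[code]) / 2
              for sic, jobs, code in components if jobs is None}
    start_total = sum(starts.values())
    residual = naics_total - known
    return {
        sic: jobs if jobs is not None else starts[sic] * residual / start_total
        for sic, jobs, _ in components
    }


def build_equations(bridge):
    """Steps 2-4: aggregate to 3-digit NAICS / 2-digit SIC, compute each
    SIC's share of the NAICS total, and regroup the shares by SIC.

    bridge: {naics6: {sic4: jobs}} with no suppressed values left.
    Returns {sic2: {naics3: coefficient}}.
    """
    totals = defaultdict(lambda: defaultdict(float))  # naics3 -> sic2 -> jobs
    for naics6, parts in bridge.items():
        for sic4, jobs in parts.items():
            totals[naics6[:3]][sic4[:2]] += jobs

    equations = defaultdict(dict)
    for naics3, by_sic in totals.items():
        naics_total = sum(by_sic.values())
        for sic2, jobs in by_sic.items():
            equations[sic2][naics3] = jobs / naics_total  # step 3 share
    return {sic2: dict(coefs) for sic2, coefs in equations.items()}


if __name__ == "__main__":
    print(impute_components(100, [("0741", 60, None), ("0742", None, "B")]))
    # Made-up bridge values: one 3-digit NAICS split across two SIC groups.
    bridge = {"811111": {"7538": 600}, "811112": {"7539": 200},
              "811198": {"7549": 100, "4226": 100}}
    print(build_equations(bridge))
```

Grouping the output by SIC code (step 4) means each SIC category's dictionary holds exactly the coefficients of its equation.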

The Census NAICS-to-SIC Bridge does not include a translation of jobs data for the agricultural sector. SOCDS, however, provides this translation through its own agricultural addition to the bridge, constructed from information in the North American Industry Classification System Manual and the 1998 County Business Patterns data. In this addition, the NAICS categories Forestry and logging (1130) and "Fishing, hunting, & trapping" (1140) were mapped directly into the SIC categories Forestry (S0800) and "Fishing, hunting, and trapping" (S0900), respectively. Translating NAICS into the SIC category Agricultural services (S0700) proved more challenging. A review of the North American Industry Classification System Manual revealed that, in the NAICS-to-SIC Bridge, the 3-digit NAICS categories Professional, scientific & technical services (541), Administrative & support services (561), and Personal & laundry services (812) did not include their agricultural services components: Veterinary services (54194), Landscaping services (56173), and Pet care (except veterinary) services (81291), respectively. The NAICS-to-SIC Bridge was therefore adjusted as follows:

  1. Estimated the percentage contribution of these missing agricultural components to their respective 3-digit NAICS sectors by using the more complete 1998 County Business Patterns data set.


  2. Added the missing agricultural S0700 SIC components to the bridge.


  3. Proportionately reduced the preexisting non-agricultural SIC component contributions to these 3-digit NAICS sectors.
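The three-step adjustment can be sketched for one 3-digit NAICS category as below. The function name, the share-based representation, and the example numbers are hypothetical. The sketch assumes the bridge coefficients for the category are expressed as shares summing to 1, and that "proportionately reduced" means multiplying every preexisting share by (1 - p), where p is the agricultural share estimated in step 1, so the shares still sum to 1.

```python
def add_agricultural_component(shares, ag_share, ag_sic="S0700"):
    """Add a missing agricultural SIC component to one 3-digit NAICS category.

    shares: {sic_code: share of the NAICS total}, summing to 1.
    ag_share: estimated share of the missing agricultural component (step 1).
    Returns new shares: existing shares scaled by (1 - ag_share) (step 3),
    plus the agricultural component (step 2).
    """
    if not 0 <= ag_share < 1:
        raise ValueError("ag_share must be in [0, 1)")
    adjusted = {sic: share * (1 - ag_share) for sic, share in shares.items()}
    adjusted[ag_sic] = adjusted.get(ag_sic, 0.0) + ag_share
    return adjusted


# Made-up example: NAICS 541 with a 5% veterinary-services component.
shares_541 = {"S8700": 0.6, "S7300": 0.4}
print(add_agricultural_component(shares_541, 0.05))
```

Scaling every existing share by the same factor preserves their relative sizes, which is one reasonable reading of "proportionately."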

The complete set of NAICS-to-SIC equations is available for download as a Microsoft Excel spreadsheet.
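Reading steps 3 and 4 of the translation methodology together, each equation appears to take the following general form (the notation is ours, not the spreadsheet's). For a 2-digit SIC category s, let B_{n,s} be the bridge value of jobs, payroll, or establishments that 3-digit NAICS category n contributes to s, and let N(s) be the set of contributing NAICS categories:

```latex
\[
\widehat{\mathrm{SIC}}_{s} \;=\; \sum_{n \in N(s)} c_{n,s}\,\mathrm{NAICS}_{n},
\qquad
c_{n,s} \;=\; \frac{B_{n,s}}{\sum_{s'} B_{n,s'}}
\]
```

Since step 3 computes the percentages separately for jobs, payroll, and establishments, each data item has its own set of coefficients.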

Estimation Methodology

The Census Bureau is bound by law not to release any statistical information that could reveal information about individual persons or business establishments. As a result, the original 1998-to-present CBPSE-NAICS data contain values suppressed under confidentiality rules. In place of actual job counts, the Census Bureau supplies codes indicating the range within which the number of jobs in that industry category falls. To produce more accurate SIC-based estimates, HUD had to estimate the missing NAICS jobs and payroll data.

Missing jobs data were estimated first. Because the data are subject to inherent linear constraints, linear programming was used to estimate the suppressed NAICS jobs data. The constraints are:

  1. The two-digit NAICS jobs sum to total jobs within each MSA and each central city.


  2. The three-digit NAICS jobs sum to the corresponding two-digit NAICS jobs within each MSA and each central city.


  3. The central city industry jobs do not exceed the MSA industry jobs.


  4. The estimates fall within their corresponding suppression code ranges:


      • A -- less than 20 employees
      • B -- 20 to 99 employees
      • C -- 100 to 249 employees
      • E -- 250 to 499 employees
      • F -- 500 to 999 employees
      • G -- 1,000 to 2,499 employees
      • H -- 2,500 to 4,999 employees
      • I -- 5,000 to 9,999 employees
      • J -- 10,000 to 24,999 employees
      • K -- 25,000 to 49,999 employees
      • L -- 50,000 to 99,999 employees
      • M -- 100,000 or more employees
      • Q -- Not Available
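A check of whether a set of jobs estimates for one area satisfies these constraints might look like the following sketch. The data layout, the function names `area_violations` and `city_exceeds_msa`, and the example figures are hypothetical. Suppression ranges are treated as inclusive bounds, the open-ended M range as unbounded above, and Q cells as having no range to enforce.

```python
import math

SUPPRESSION_RANGES = {
    "A": (0, 19), "B": (20, 99), "C": (100, 249), "E": (250, 499),
    "F": (500, 999), "G": (1000, 2499), "H": (2500, 4999),
    "I": (5000, 9999), "J": (10000, 24999), "K": (25000, 49999),
    "L": (50000, 99999), "M": (100000, math.inf),
}


def area_violations(area, codes, tol=1e-6):
    """Check constraints 1, 2 and 4 for one MSA or central city.

    area: {"total": jobs, "sectors": {naics2: jobs},
           "subsectors": {naics2: {naics3: jobs}}}
    codes: {industry: suppression code} for cells that were suppressed.
    Returns a list of violation messages (empty if all constraints hold).
    """
    problems = []
    if abs(sum(area["sectors"].values()) - area["total"]) > tol:
        problems.append("1: sectors do not sum to total")
    for naics2, children in area["subsectors"].items():
        if abs(sum(children.values()) - area["sectors"][naics2]) > tol:
            problems.append(f"2: subsectors of {naics2} do not sum to sector")
    values = dict(area["sectors"])
    for children in area["subsectors"].values():
        values.update(children)
    for industry, code in codes.items():
        if code == "Q":  # not available: no range to enforce
            continue
        low, high = SUPPRESSION_RANGES[code]
        if not low <= values[industry] <= high:
            problems.append(f"4: {industry} outside range {code}")
    return problems


def city_exceeds_msa(city, msa):
    """Constraint 3: industries whose central-city jobs exceed MSA jobs."""
    return [ind for ind, jobs in city.items() if jobs > msa.get(ind, 0)]


area = {"total": 1000, "sectors": {"23": 400, "42": 600},
        "subsectors": {"23": {"236": 150, "238": 250},
                       "42": {"423": 350, "424": 250}}}
print(area_violations(area, {"236": "C", "238": "E"}))  # []
print(city_exceeds_msa({"236": 200}, {"236": 150}))  # ['236']
```

Nesting three-digit industries under their sector avoids inferring parents from code prefixes, which would fail for multi-prefix sectors such as manufacturing (31-33).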

In effect, the objective function of the linear programming problem minimizes the absolute difference between the final estimates and their starting values. Each starting value was set to the national sector ratio of jobs per establishment multiplied by the number of local establishments in the suppressed sector or, if this nationally based estimate fell outside the suppression code range, to the midpoint of that range. Linear programming software was used to find estimates for each MSA/central city combination.
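The starting-value rule can be sketched as a small function. The function name and the national figures in the example are hypothetical. Codes M and Q are omitted because M has no finite midpoint and Q has no range, and the text does not say how those cells were handled.

```python
# Inclusive job ranges for the suppression codes (M and Q omitted; see above).
SUPPRESSION_RANGES = {
    "A": (0, 19), "B": (20, 99), "C": (100, 249), "E": (250, 499),
    "F": (500, 999), "G": (1000, 2499), "H": (2500, 4999),
    "I": (5000, 9999), "J": (10000, 24999), "K": (25000, 49999),
    "L": (50000, 99999),
}


def starting_value(national_jobs, national_estabs, local_estabs, code):
    """Starting value for one suppressed NAICS jobs cell.

    National jobs per establishment for the sector times the local
    establishment count; if that falls outside the suppression code range,
    use the range midpoint instead.
    """
    low, high = SUPPRESSION_RANGES[code]
    estimate = national_jobs / national_estabs * local_estabs
    if low <= estimate <= high:
        return estimate
    return (low + high) / 2


# Made-up national figures: 12 jobs per establishment.
print(starting_value(1_200_000, 100_000, 15, "C"))  # 180.0 (within C)
print(starting_value(1_200_000, 100_000, 40, "C"))  # 174.5 (480 is outside C)
```

The absolute-value objective is not linear as written. A standard way to express it in a linear program is to introduce nonnegative variables u_i and v_i with x_i - s_i = u_i - v_i, where x_i is an estimate and s_i its starting value, and to minimize the sum of u_i + v_i. The text does not state whether HUD's software used this exact formulation.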

The Census Bureau did not provide suppression code ranges for missing payroll data, so the technique for imputing suppressed NAICS payroll data is simpler and is based on the jobs estimates. The national sector payroll-per-job ratio multiplied by the local number of jobs provided the starting point for each suppressed local payroll value. These first-cut estimates were then modified to satisfy three payroll constraints:

  1. The two-digit NAICS payroll sums to total payroll within each MSA and each central city.


  2. The three-digit NAICS payroll sums to the corresponding two-digit NAICS payroll within each MSA and each central city.


  3. The central city payroll estimates for each industry do not exceed the MSA payroll estimates for each industry.
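The payroll imputation for one parent total (an area's total payroll, or one two-digit sector) can be sketched as follows. The function name and example numbers are hypothetical. Since the text does not say how the first-cut estimates were modified, the sketch assumes they are scaled proportionally so that the children sum to the parent. Constraint 3 (central city not exceeding MSA) is not enforced here.

```python
def impute_payroll(parent_payroll, disclosed, suppressed_jobs, payroll_per_job):
    """Impute suppressed payroll for the children of one parent cell.

    parent_payroll: published payroll of the parent (area total or sector).
    disclosed: {industry: payroll} for children whose payroll was published.
    suppressed_jobs: {industry: estimated jobs} for children with suppressed payroll.
    payroll_per_job: {industry: national sector payroll per job}.
    """
    # First cut: national payroll per job times the local jobs estimate.
    first_cut = {ind: payroll_per_job[ind] * jobs
                 for ind, jobs in suppressed_jobs.items()}
    residual = parent_payroll - sum(disclosed.values())
    total_first_cut = sum(first_cut.values())
    if total_first_cut == 0:
        return first_cut  # no first-cut payroll to scale
    # Assumed adjustment: rescale so disclosed + imputed equals the parent.
    return {ind: value * residual / total_first_cut
            for ind, value in first_cut.items()}


# Made-up example (payroll in $1,000s).
print(impute_payroll(10_000, {"236": 4_000},
                     {"237": 50, "238": 100}, {"237": 40, "238": 30}))
# {'237': 2400.0, '238': 3600.0}
```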

Final SIC-based estimates for each central city and MSA are created by applying the NAICS-to-SIC equations to the city’s or MSA’s NAICS-based data and estimates. Note that SOCDS presents an SIC estimate only if more than 50% of its contributing NAICS parts come from actual, rather than estimated, data; otherwise, suppression codes are shown.
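Applying one SIC equation to an area's NAICS figures, together with the display rule, might look like this sketch. The names are hypothetical. The text does not define how the 50% share is measured, so the sketch assumes it is the share of the weighted SIC total that comes from actual (non-imputed) NAICS values.

```python
def sic_estimate(equation, naics_values, is_estimated):
    """Apply one NAICS-to-SIC equation and the 50% display rule.

    equation: {naics3: coefficient} for one 2-digit SIC category.
    naics_values: {naics3: value} (actual or imputed) for one area.
    is_estimated: {naics3: True if the value was imputed}.
    Returns the SIC estimate, or None if it would be shown as suppressed.
    """
    parts = {n: coef * naics_values.get(n, 0.0) for n, coef in equation.items()}
    total = sum(parts.values())
    if total == 0:
        return 0.0  # all contributing parts are zero
    actual = sum(v for n, v in parts.items() if not is_estimated.get(n, False))
    return total if actual / total > 0.5 else None


eq = {"811": 0.9, "812": 0.2}
vals = {"811": 1000, "812": 500}
print(sic_estimate(eq, vals, {"812": True}))  # 90% actual: estimate shown
print(sic_estimate(eq, vals, {"811": True}))  # 10% actual: None
```

A count-based reading (the number of contributing NAICS categories with actual data) would be an equally plausible interpretation of the rule and would only change how `actual` is computed.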