| Beginning with 1998 data, the Census Bureau releases County Business Patterns data classified by industry according to the North American Industry Classification System (NAICS), which replaces the Standard Industrial Classification (SIC) system that had been the basis for industry-level County Business Patterns Special Extracts (CBPSE) data. To facilitate comparison with earlier years, SOCDS presents estimates of the discontinued Census SIC CBPSE data series from 1998 to the present. The NAICS and SIC data are available for download as part of the complete CBPSE data file here. This page documents the estimation procedure. Caution should be used in comparing the SIC estimates (1998-present) with previous years' data (1991-1997). The NAICS-to-SIC conversion is based on national data aggregated from highly detailed industry categories, so the estimates cannot account for local variation in the detailed industry composition of the broader industry summary categories in which the original local data are provided. The estimation procedure involves two primary tasks: creating a NAICS-to-SIC conversion matrix, and estimating the NAICS data items (number of jobs and payroll) suppressed for confidentiality purposes. | 
| The Census Bureau’s NAICS-to-SIC Bridge, created
from data collected in the 1997 Economic Census, serves as the primary source for HUD’s
NAICS-to-SIC conversion matrix. The NAICS-to-SIC Bridge disaggregates each "6-digit" 
NAICS category of jobs, payroll and establishments into 
corresponding "4-digit" SIC jobs, payroll and establishment components. HUD consolidated the highly detailed Census
NAICS-to-SIC Bridge into a set of equations for converting "3-digit" NAICS jobs,
payroll, and establishment data to "2-digit" SIC jobs, payroll, and establishment estimates in the following sequence of steps: 
 The Census NAICS-to-SIC Bridge does not include a translation of jobs data for the agricultural sector. SOCDS, however, provides this translation through its own agricultural addition to the NAICS-to-SIC bridge, constructed from information found in the North American Industry Classification System Manual and the 1998 County Business Patterns data. In the SOCDS agricultural addition, the NAICS categories Forestry and logging (1130) and "Fishing, hunting, & trapping" (1140) were mapped directly into the SIC categories Forestry (S0800) and Fishing, hunting, and trapping (S0900) respectively. The translation of NAICS into the SIC category Agricultural services (S0700) proved more challenging. A review of the North American Industry Classification System Manual revealed that the 3-digit NAICS categories Professional, scientific & technical services (541), Administrative & support services (561), and Personal & laundry services (812) in the NAICS-to-SIC bridge did not include the agricultural services components Veterinary services (54194), Landscaping services (56173), and Pet care (except veterinary) services (81291) respectively. Thus the NAICS-to-SIC Bridge was adjusted accordingly: 
 The complete set of NAICS-to-SIC equations is available for download as a Microsoft Excel spreadsheet here. 
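 As a rough illustration of how a set of conversion equations like these could be applied, the sketch below assumes that each equation reduces to fixed shares of a NAICS category's value assigned to one or more SIC categories. The category labels, the share values, and the function name `naics_to_sic` are hypothetical placeholders, not figures from the HUD spreadsheet.

```python
from collections import defaultdict

# Hypothetical shares: the fraction of each NAICS category's value that is
# assigned to each SIC category. Labels and numbers are illustrative only.
BRIDGE_SHARES = {
    "NAICS_A": {"SIC_X": 0.75, "SIC_Y": 0.25},
    "NAICS_B": {"SIC_Y": 1.0},
}


def naics_to_sic(naics_values, shares):
    """Distribute NAICS-based values (jobs, payroll, or establishments)
    across SIC categories using fixed shares, then sum by SIC category."""
    sic_totals = defaultdict(float)
    for naics_code, value in naics_values.items():
        for sic_code, share in shares[naics_code].items():
            sic_totals[sic_code] += share * value
    return dict(sic_totals)


# Local jobs by NAICS category for one hypothetical area.
local_jobs = {"NAICS_A": 400.0, "NAICS_B": 100.0}
sic_jobs = naics_to_sic(local_jobs, BRIDGE_SHARES)
print(sic_jobs)
```

 Because the shares in this sketch come from national data, the same shares are applied to every local area, which is why the resulting SIC figures cannot reflect local differences in detailed industry mix.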
 Estimation Methodology
 The Census Bureau is bound by law not to release any statistical information that may reveal information about individual persons or business establishments. Thus, the original 1998-present CBPSE-NAICS data included missing values suppressed under confidentiality rules. In place of actual data, the Census Bureau supplies codes indicating the range within which the number of jobs in that industry category falls. To provide more accurate SIC-based estimates, HUD had to create estimates of the missing NAICS jobs and payroll data. Missing jobs data were estimated first. Linear programming was used to estimate the suppressed NAICS jobs data because of the linear constraints inherent in the data. The constraints are: 
 The objective of the linear programming problem is, in effect, to minimize the absolute difference between the final estimates and the starting values. Each starting value was set to the national ratio of jobs per establishment for the sector multiplied by the number of local establishments in the suppressed sector or, if that nationally based estimate fell outside the suppression code range, to the midpoint of the suppression code range. Linear programming software was used to find estimates for each MSA/central city combination. The Census Bureau did not provide suppression code ranges for missing payroll data. The technique used to impute suppressed NAICS payroll data is therefore simpler and is based upon the jobs estimates. The national sector ratio of payroll per job multiplied by the local number of jobs provided the starting point for each local payroll imputation. These first-cut estimates were then modified so as to satisfy the three payroll constraints: 
 Final SIC-based estimates for each central city and MSA are created by applying the NAICS-to-SIC equations to the city’s or MSA’s NAICS-based data and estimates. Please note that SOCDS presents SIC estimates only if more than 50% of the contributing NAICS parts come from actual rather than estimated data. Otherwise, suppression codes are shown. |
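 Two of the rules described on this page are simple enough to sketch in code: the choice of a starting value for a suppressed jobs figure, and the 50% threshold for presenting an SIC estimate. The sketch below is an illustration under stated assumptions, not HUD's implementation. The function names, the example range (20, 99), and the reading of the 50% rule as a share of the estimate's value (rather than, say, a count of contributing categories) are all assumptions.

```python
def starting_value(national_jobs_per_estab, local_establishments, code_range):
    """Return the starting value for a suppressed jobs figure.

    Uses the nationally based estimate when it lies inside the suppression
    code range, and the midpoint of the range otherwise.
    """
    low, high = code_range
    estimate = national_jobs_per_estab * local_establishments
    if low <= estimate <= high:
        return estimate
    return (low + high) / 2


def publishable(actual_part, estimated_part):
    """Assumed reading of the 50% rule: present the SIC estimate only if
    more than half of its value comes from actual (unsuppressed) data."""
    total = actual_part + estimated_part
    return total > 0 and actual_part / total > 0.5


# Illustrative inputs only; (20, 99) is a hypothetical suppression range.
inside = starting_value(12.5, 4, (20, 99))    # 12.5 * 4 = 50, inside range
outside = starting_value(12.5, 10, (20, 99))  # 12.5 * 10 = 125, outside range
print(inside, outside)
print(publishable(60.0, 40.0), publishable(50.0, 50.0))
```

 In this sketch the starting value is only the first step: the linear programming stage would then adjust such values so that they satisfy the constraints while staying as close as possible to where they started.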