Imputing scores

7.2.3 Imputing scores: details
Without further treatment, our occupational unit index files will contain a number of units with no CAMSIS score associated with them. These are the units that were not represented by any cases in the data file from which the CAMSIS scale was constructed. By their nature, they are usually uncommon units in the population as a whole, and in many cases they are combinations which are apparently illogical (for instance, a non-managerial (status) office manager (title)).
Our solution is to impute CAMSIS scale scores to these 'empty' occupational units based upon the CAMSIS scale values of their 'surrounding' occupations. With index files created to the point described in the preceding sections, we already have, attached to each unit, the average scores for the occupational subgroup and for the subgroup-by-status combination to which the unit belongs. With this information, imputation is simply a matter of substituting the best available such average score, which we take to be the non-missing average at the most detailed level available. The accompanying file shows SPSS syntax that can achieve this imputation, following the structure of the index file example generated above. The syntax also generates an indicator variable showing whether a unit's score was imputed or whether the base unit was originally represented in the sample.
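The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the SPSS syntax referred to in the text: the field names, the unit codes, the scores and the labels used for the indicator are all hypothetical, and we assume that each index record carries its own score (missing if the unit was not in the sample) together with its subgroup-by-status and subgroup averages.

```python
from typing import NamedTuple, Optional, Tuple


class Unit(NamedTuple):
    """One row of a hypothetical index file (all field names are assumptions)."""
    code: str
    score: Optional[float]                 # directly estimated score; None if unit not in sample
    subgroup_status_mean: Optional[float]  # average for the unit's subgroup-by-status cell
    subgroup_mean: Optional[float]         # average for the unit's broader subgroup


def impute(unit: Unit) -> Tuple[Optional[float], str]:
    """Return (score, source), taking the most detailed non-missing value."""
    if unit.score is not None:
        return unit.score, "observed"
    if unit.subgroup_status_mean is not None:
        return unit.subgroup_status_mean, "subgroup_by_status"
    if unit.subgroup_mean is not None:
        return unit.subgroup_mean, "subgroup"
    # No average is available at any level, so the score stays missing.
    return None, "missing"


# Illustrative records only; the codes and scores are invented.
units = [
    Unit("A-0", 62.5, 60.0, 55.0),
    Unit("A-1", None, 60.0, 55.0),
    Unit("A-2", None, None, 55.0),
    Unit("Z-0", None, None, None),
]

results = {u.code: impute(u) for u in units}
for code, (score, source) in results.items():
    print(code, score, source)
```

The second element of each result plays the role of the indicator variable: any value other than "observed" marks a score that was imputed, and it also records the level of detail from which the imputed value was taken.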
It could be argued that this approach has a few potential weaknesses. First, the subgroup averages on which the imputations are based are calculated from weighted distributions and could be unduly influenced by the largest occupational units. Second, the non-represented occupational units may also be the more unusual occupational units (such as 'new' or 'old' units which have expanded or contracted considerably in recent years), in which case imputation from average scores may be misleading. Lastly, the differing data resources available for alternative CAMSIS versions mean there can be quite substantial disparities in the number of unit scores imputed for different versions. For example, a version constructed using census data on the nationally specific occupational units is likely to include very few categories with imputed values, whereas a version constructed using sample data, and in particular using occupational units from a schema obtained by aggregating an earlier national version, may well have a high proportion of base units assigned imputed scores.
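The first of these concerns can be illustrated with a small arithmetic sketch. The scores and case counts below are invented purely for illustration, and we assume that the weights are the numbers of cases in each observed unit; the point is only that a weighted subgroup average sits close to the score of the largest unit, so an empty unit in that subgroup would inherit a value dominated by it.

```python
def weighted_mean(scores, weights):
    """Mean of scores, each weighted by the number of cases in its unit."""
    total = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical subgroup with one very large unit and two small ones.
scores = [70.0, 40.0, 45.0]
counts = [900, 50, 50]

weighted = weighted_mean(scores, counts)    # dominated by the unit with 900 cases
unweighted = sum(scores) / len(scores)      # treats the three units equally

print(weighted, unweighted)
```

In this invented case the weighted average is 67.25, against an unweighted average of roughly 51.7, so the imputed value would lie much nearer to the score of the largest unit than to those of the two smaller ones.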
Empirically, however, there is little evidence that such imputations to non-represented occupations make any significant difference to the strength of the CAMSIS schema. An opportunity to test this arose in the generation of CAMSIS scales for ISCO units in Switzerland, where the information on ISCO units was obtained only by recoding an alternative occupational unit measure, with the net effect that only approximately half of the potential ISCO units were represented in the sample. The resulting index file therefore held scores for all ISCO units, of which about half were imputed. When these values were compared on a dataset covering the full range of ISCO units, there were no significant differences in how the CAMSIS scale scores related to other variables between the subgroups comprising the imputed, non-imputed, and combined unit scores.

Last modified 14 February 2002
This document is maintained by Paul Lambert (paul.lambert@stirling.ac.uk)