
5.2.1 Handling pseudo-diagonals in LEM: details

Pseudo-diagonal combinations can be identified from the successive results of LEM RC models in much the same way as from SPSS CA models: by scanning the scale rankings given to occupational units in interim models and looking for extreme scores which can plausibly be linked with pseudo-diagonality, and by defining some husband-wife occupational combinations as pseudo-diagonals on a priori grounds. In addition, a much more thorough evaluation of each husband-wife combination can be achieved by examining the residuals between the observed and estimated frequencies of each specific combination.

Starting with the simplest of these three methods, we can name prespecified occupational unit combinations as pseudo-diagonals by using the same data construction techniques as suggested above for the SPSS CA treatment. First, during the data construction stages, we identify in the original data categories the relevant occupational unit combinations to be treated as pseudo-diagonals, as with the examples in this section of SPSS syntax. Next, for consistency with the data revisions used in the relevant LEM models, we recode the groups of a priori pseudo-diagonals into (largely arbitrary) husband-wife combinations that we know will be represented in the revised data models, as in this example. Lastly, after checking how those combinations were subsequently represented in the revised occupational unit categorisations, we add their specification as individual combinations, in the same way as for any other specific combination, using the LEM design matrix. The specific procedures for this application are described in the third of the step-by-step instructions below.

The other two methods of pseudo-diagonal identification both involve exporting interim LEM model results and assessing them for pseudo-diagonal trends. In the first, we look at the scale scores given to the husband and wife unit groups, and check whether any values have extreme rankings in a way which may be consistent with a pseudo-diagonal combination. (This is the same method as used in the SPSS CA models, when we looked for extreme occupational unit scores on the SPSS output graphs.) Unfortunately, we have found that row and column score estimates from LEM models cannot be reviewed and assessed as quickly as SPSS model output. It is generally necessary to export the text results into other packages, where they can be matched with the occupational title unit names and then sorted by their values - instructions on how to achieve this, using first pfe and then Excel, are given in this section of these webpages (relating to the section on exporting the scores). When specific combinations are thus identified, we then name them as cases to be treated as pseudo-diagonals, as described in the instructions below.
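The matching and sorting of exported scores can also be scripted. The following Python fragment is only a minimal sketch of that idea, not the pfe and Excel procedure referred to above: the function name, the unit codes, the titles and the score values are all hypothetical, and it assumes that the scores have already been read from the LEM output into a dictionary keyed by unit code.

```python
def rank_scores(scores, titles):
    """Match scale scores to unit titles and sort them by score, lowest first.

    scores: dict of unit code -> estimated scale score (hypothetical input)
    titles: dict of unit code -> occupational title (hypothetical input)
    """
    rows = [(titles.get(code, "unknown"), code, score)
            for code, score in scores.items()]
    rows.sort(key=lambda row: row[2])
    return rows


# Invented values, for illustration only.
scores = {1: -0.12, 2: 0.05, 3: 2.41, 4: -0.03, 5: 0.20}
titles = {1: "unit A", 2: "unit B", 3: "unit C", 4: "unit D", 5: "unit E"}

ranked = rank_scores(scores, titles)
for title, code, score in ranked:
    print(f"{code:>4}  {score:>7.2f}  {title}")
```

The units at either end of the sorted listing are the ones to inspect; whether an extreme score really reflects a pseudo-diagonal combination remains a matter of judgement.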

Finally, our third method of identifying husband-wife pseudo-diagonal occupational combinations also involves a degree of effort in dealing with the output from interim LEM models. The method which we find screens combinations for pseudo-diagonality most thoroughly is to review the distribution of residuals between the observed husband-wife combination frequencies and their expected values under the interim model, and to check for extreme values. High positive standardised residuals are taken to indicate our conventional interpretation of a pseudo-diagonal: a combination that has occurred much more often than the dimension structure of the model would predict. Indeed, in all CAMSIS examples thus far, the large majority of extreme standardised residuals have been high positive values which can plausibly be explained as pseudo-diagonals (although some combinations, typified by high positive residuals involving only a single couple, are simply outliers: a marital combination which is far removed from the normal pattern for no explicable reason). In addition, high negative standardised residuals, which in practice occur occasionally, represent combinations where there are far fewer observed cases than expected given the dimensional structure of the model. These could also be interpreted as pseudo-diagonals if some plausible 'barrier' to the combination can be imagined.

Row-to-column combinations with extreme residual values can be identified from the LEM output simply by sorting by the magnitude of the (0,1) standardised residual. For LEM models involving relatively few categories in the cross-classification (fewer than approximately 300), tables of every row-to-column combination's residuals are produced in the primary LEM output file, and these may be immediately copied and pasted into a package such as Excel which allows sorting by the standardised residual value. For models involving more categories, however, we specify within the LEM model that the output residuals should be saved to a named output file. It is then necessary to read this file into another package, merge it with information on the husband-wife categories, and examine the resulting distributions. Information on how this can be achieved using SPSS is again found here, referring to the section below on assessing scale values. Again, when specific combinations are thus identified, we then name them as cases to be treated as pseudo-diagonals, as described in the instructions below.
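As an illustration of the sorting step, the following Python fragment is a minimal sketch under stated assumptions. It assumes that observed and expected frequencies for each combination have already been read from the LEM output, and it computes a Pearson-type standardised residual, (observed - expected) / sqrt(expected); the residual reported by LEM itself may be defined differently, in which case the reported value should be sorted directly. The function name and the frequencies are hypothetical.

```python
import math


def rank_by_residual(cells):
    """Sort combinations by the absolute size of their standardised residual.

    cells: iterable of (husband_unit, wife_unit, observed, expected).
    Assumes a Pearson-type residual; this is an illustrative choice.
    """
    ranked = []
    for husband, wife, observed, expected in cells:
        if expected <= 0:
            continue  # residual is undefined when nothing is expected
        residual = (observed - expected) / math.sqrt(expected)
        ranked.append((husband, wife, observed, expected, residual))
    ranked.sort(key=lambda row: abs(row[4]), reverse=True)
    return ranked


# Invented frequencies, for illustration only.
example = [
    (1, 1, 40, 10.0),  # far more couples than expected
    (1, 2, 5, 6.0),
    (2, 1, 1, 9.0),    # far fewer couples than expected
    (2, 2, 12, 12.5),
]

top = rank_by_residual(example)
for husband, wife, observed, expected, residual in top:
    print(husband, wife, observed, expected, round(residual, 2))
```

Sorting on the absolute value keeps high positive and high negative residuals together at the top of the listing, so that both kinds of candidate can be reviewed in one pass.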

In the following subsections, we describe the mechanics of estimating LEM models which account for pseudo-diagonal combinations. The LEM models themselves are run on a revised dataset which features the 'table' data file for the latest version of occupational units (ie the revised and autorecoded variables, such as {h/w}bst4), plus a frequency weighting factor for each row. The generation of such a plain text file was shown earlier, in the first section of the generic LEM example. Details on the data construction processes necessary to reach this stage of analysis are found in the preceding sections of these pages.

The identification of pseudo-diagonals is achieved through their specification within a design matrix for the specific LEM model. In the first instance this comprises a two-dimensional matrix, whose rows and columns represent the values of the (revised then autorecoded) occupational units, and whose entries indicate whether or not that combination is to be treated as a pseudo-diagonal. The cells contain the value 0 by default (ie not a pseudo-diagonal combination). Each new pseudo-diagonal combination is then marked by placing a numeric value at its row-column location, with the value incremented by one for each new combination added. (It is probably fair to say that this feature of LEM was not designed with the specification of models with very high numbers of rows and columns in mind! With small numbers of categories, such design matrices are readily produced by hand, as in the many examples of the LEM manual.)
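The structure of such a matrix can be sketched in a few lines of Python. This is an illustration of the layout just described rather than the SPSS procedure used in the CAMSIS constructions: the function names and the example combinations are hypothetical, rows are taken to be husband units and columns wife units, and unit codes are assumed to run from 1 to n.

```python
def build_design_matrix(n, pseudo_diagonals):
    """Return an n x n matrix of 0's with consecutive integers marking
    each pseudo-diagonal combination.

    pseudo_diagonals: list of (husband_code, wife_code), codes from 1 to n.
    """
    matrix = [[0] * n for _ in range(n)]
    next_id = 1
    for husband, wife in pseudo_diagonals:
        if not (1 <= husband <= n and 1 <= wife <= n):
            raise ValueError(f"combination out of range: {(husband, wife)}")
        if matrix[husband - 1][wife - 1] == 0:  # ignore repeated entries
            matrix[husband - 1][wife - 1] = next_id
            next_id += 1
    return matrix


def to_text(matrix):
    """Format the matrix as space-separated plain text, one row per line."""
    return "\n".join(" ".join(str(value) for value in row) for row in matrix)


# Four units and three hypothetical pseudo-diagonal combinations.
design = build_design_matrix(4, [(1, 1), (2, 3), (4, 2)])
print(to_text(design))
# 1 0 0 0
# 0 0 2 0
# 0 0 0 0
# 0 3 0 0
```

In this sketch every named combination receives its own identifier. Combinations that should share one identifier, such as a group of a priori pseudo-diagonals treated together, would need separate handling.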

The following procedures are recommended for constructing LEM design matrices for a large number of occupational base units. (Those who wish to see in advance the structure of an RC model design matrix can look at this example of a matrix previously used in the CAMSIS constructions.) First, create by hand a plain text file containing n x n 0's separated by spaces, where n is the number of occupational unit categories in the (revised and autorecoded) version of interest. This is readily achieved. One technique, for instance, is to create a column of 0's of the appropriate length in Microsoft Excel by highlighting then dragging to distribute an initial cell value of 0, paste the column into a plain text file in the pfe package, then repeatedly use the 'replace' option in pfe to replace the single 0 in each row with multiple 0's until the appropriate width has been reached. Second, read this file into SPSS as a data file, treating the cases (rows) as the husband occupational units and the variables (columns) as the wife occupational units. Third, deal with any a priori pseudo-diagonals (or those identified in the preliminary CA stage above), which are named through their original occupational titles, by identifying them with specific dummy variables in SPSS (see the SPSS example for clarification). Fourth, incorporate the definitions of the (often very many) specific husband-wife pseudo-diagonal combinations identified through the interim results of earlier RC models (either extreme residuals or extreme scores). We do this by building up a macro-led SPSS file which replaces the relevant values of the design matrix data file with consecutive integers. The files necessary can be complex to handle, but are illustrated in this example SPSS syntax. Finally, export the relevant variables of the SPSS data file as a plain text file and treat that as the design matrix (example). (In extensions, such as the treatment of subsidiary dimensions discussed in section 5.2.2, further modifications to this existing design matrix can be added.)

Generate information on the husband-wife pseudo-diagonal specifications identified.

As mentioned elsewhere, the number of pseudo-diagonal combinations, and the number of cases excluded as part of pseudo-diagonal combinations, can be quite substantial after a lengthy review of model results. This raises the danger that some occupational base units, which after the initial data revision were represented by an 'adequate' number of cases, are no longer represented by enough non-excluded cases. If this occurs to a significant extent, it can cause quite a substantial logistical problem, as it may be necessary to return to the data revision stage of the scale construction, adjust the merged cases and start again.

The chances of this problem occurring are lessened by setting the 'threshold' minimum number of cases to thirty and conducting the data revision on a sub-sample after already excluding the most obvious pseudo-diagonals (as mentioned in section 4 on data revisions). To check whether this situation has nonetheless arisen after generating a number of pseudo-diagonal indicators, we can run a modification of the SPSS syntax used to specify the design matrix on the original dataset, and produce tables which show the number of cases left over in the two models. This is shown as the sixth syntax example in the supplementary file. The output tables showing cell frequencies for each title unit can be pasted into, for instance, Microsoft Excel and sorted by the values of the frequency columns, revealing whether any units are covered by fewer than 20 cases.
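The same check can be sketched in Python. This is an illustration under stated assumptions, not the SPSS syntax of the supplementary file: it assumes the 'table' data are available as (husband unit, wife unit, frequency) rows, tallies husbands and wives separately, and uses hypothetical function names and invented frequencies.

```python
from collections import Counter


def remaining_cases(table, pseudo_diagonals):
    """Count the cases left for each husband unit and each wife unit once
    the pseudo-diagonal combinations have been excluded.

    table: iterable of (husband_unit, wife_unit, frequency).
    pseudo_diagonals: iterable of (husband_unit, wife_unit) to exclude.
    """
    excluded = set(pseudo_diagonals)
    husbands, wives = Counter(), Counter()
    for husband, wife, frequency in table:
        # Register both units so that fully excluded units appear with 0.
        husbands[husband] += 0
        wives[wife] += 0
        if (husband, wife) in excluded:
            continue
        husbands[husband] += frequency
        wives[wife] += frequency
    return husbands, wives


def under_threshold(counts, minimum=20):
    """List the units covered by fewer than `minimum` remaining cases."""
    return sorted(unit for unit, n in counts.items() if n < minimum)


# Invented frequencies, for illustration only.
table = [(1, 1, 50), (1, 2, 25), (2, 1, 15), (2, 2, 30), (3, 3, 40), (3, 1, 5)]
husbands, wives = remaining_cases(table, [(1, 1), (3, 3)])
print(under_threshold(husbands))  # [3]
print(under_threshold(wives))     # [3]
```

Any unit listed here would be a candidate for further merging at the data revision stage.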



Last modified 14 February 2002
This document is maintained by Paul Lambert (