Preparing input data

Name	Description
*hocc*	Husband's occupational title unit
*wocc*	Wife's occupational title unit

*{h/w}empst*	Husband's / Wife's occupational employment status unit


{h/w}bst	Title-by-status base unit (Always equals {h/w}occ * 10 + {h/w}empst)

{h/w}1gp	Occupational title "major group" (Usually a single digit categorisation of a few major occupational sectors)
{h/w}2gp	Occupational title "submajor group" (Usually a 2-digit categorisation of a modest number of occupational groupings)
{h/w}3gp	Occupational title "minor group" (Usually a 3-digit categorisation of a large number of small groups of related occupational titles)
{h/w}{n}gp	Continuation : any other occupational title subgroups

{h/w}1gpst	Occupational title "major group-by-status" (Always = {h/w}1gp * 10 + {h/w}empst)
{h/w}2gpst	Occupational title "submajor group-by-status" (Always = {h/w}2gp * 10 + {h/w}empst)
{h/w}3gpst	Occupational title "minor group-by-status" (Always = {h/w}3gp * 10 + {h/w}empst)
{h/w}{n}gpst	Continuation : any other occupational title subgroups by status
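
As a minimal sketch of the arithmetic behind the 'by-status' units in the table above (Python is used purely for illustration; the function name is hypothetical and is not part of the accompanying SPSS syntax), assuming the status code is a single digit:

```python
def by_status_unit(title_code: int, status_code: int) -> int:
    """Combine an occupational code with a single-digit status code.

    Assumption: status codes lie in 0..9, so multiplying the title
    code by 10 leaves exactly one trailing digit for the status.
    """
    if not 0 <= status_code <= 9:
        raise ValueError("status code must be a single digit")
    return title_code * 10 + status_code


# ISCO-88(COM) title 1221 combined with status 2 gives base unit 12212.
hbst = by_status_unit(1221, 2)
print(hbst)
```

The same function would apply to the group-by-status units, passing a major, submajor or minor group code in place of the title code.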

Return to Construction guide

2.2 Translating from a 'case' to a 'table' file: details

The accompanying SPSS syntax can be used to transform an SPSS 'case' data file (one record per couple) into a 'table' data file in which one record represents each combination of the variables of interest. In this example we have two variables of interest for both men and women, namely employment title {h/w}occ and employment status {h/w}empst, so the key unit for each gender is the cross-classification of the two: the title-by-status units {h/w}bst. It is also assumed that the employment status variables have already been recoded into a relatively succinct categorisation, in keeping with the discussion in the overview section. Note that if one or more other factors were also of interest (e.g. an indicator of ethnicity or region), it would be necessary to use the relevant three- or four-way cross-classification as the key unit. Note also that the 'autorecoding' segment will not be necessary in cases where there are relatively few possible base unit values.
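
To illustrate the idea of the transformation (this is not the accompanying SPSS syntax, only a rough Python sketch with invented records and a hypothetical function name), a 'case' file can be collapsed into a 'table' file by counting the couples found in each husband-by-wife combination of base units:

```python
from collections import Counter

# Hypothetical case records: one (hocc, hempst, wocc, wempst) tuple per couple.
cases = [
    (1221, 2, 1222, 3),
    (1221, 2, 1222, 3),
    (110, 3, 1222, 3),
]


def to_table(records):
    """Return one (hbst, wbst, frequency) row per observed combination."""
    counts = Counter()
    for hocc, hempst, wocc, wempst in records:
        hbst = hocc * 10 + hempst  # husband's title-by-status base unit
        wbst = wocc * 10 + wempst  # wife's title-by-status base unit
        counts[(hbst, wbst)] += 1
    return sorted((h, w, n) for (h, w), n in counts.items())


table = to_table(cases)
for row in table:
    print(row)
```

Only combinations that actually occur in the data produce a row here; whether empty cells also need to be represented depends on the later modelling step.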

2.3 Occupational unit 'value labels': details

There are five steps involved in this process and these are outlined below. They involve SPSS data function commands and additional text editing, which can be easily achieved by the combination of Microsoft Excel and pfe facilities (see note on software). The starting point is dealing with the occupational title unit labels, as these are consistently the most complex labels regularly used.

Work-in-progress note (Paul Lambert Jan 02): I have tried to make the file handling descriptions below understandable to other users, but the test will be if and when anybody has a go. Feedback on the ease or otherwise of following these notes is therefore very welcome!

2.3.1 Obtain plain text file of 'tabbed' title labels. We first seek to produce a file in which each line holds one record: the numeric occupational value, followed by that title's value label. For reasons which will become clearer later, it is most convenient if the value label itself also contains the occupation's numeric index. The desired text file will appear as in the table below, which shows four of the earliest occupational titles of the ISCO-88(COM) schema:

110	110	Armed Forces
1210	1210	Directors and chief executives
1221	1221	Production and operations department managers in agriculture, hunting, forestry and fishing
1222	1222	Production and operations department managers in manufacturing
Etc	Etc	Etc

Each column is separated by 'tab' spacings. Again for reasons that will become clear later, it is actually most convenient if there are two tab spacings between the first and second columns (the two numeric values), and only one tab spacing between the second and third columns.
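
For readers who prefer to script this step rather than use Excel, the layout just described (two tabs after the first numeric column, one tab before the label) could be produced along the following lines. This is only a sketch in Python; the dictionary of titles and the function name are illustrative assumptions, not part of the original procedure:

```python
# Illustrative subset of ISCO-88(COM) titles, keyed by numeric code.
titles = {
    110: "Armed Forces",
    1210: "Directors and chief executives",
}


def tabbed_line(code: int, label: str) -> str:
    """Format one record: code, two tabs, code again, one tab, label."""
    return f"{code}\t\t{code}\t{label}"


lines = [tabbed_line(code, label) for code, label in sorted(titles.items())]
print("\n".join(lines))
```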

In the majority of cases such a table is very quickly obtained. If the schema of occupational value labels in the relevant documentation is in any form such as a word processor document, an html document, a statistics software 'output' table or a plain text file, it is possible to select the whole text then paste it into a worksheet in Excel. Usually, this will place the numerical value and title labels into separate columns, though if this is not done initially it can normally be achieved in Excel by selecting the single column into which all results have been pasted, then choosing the "Data -> Text to columns" windows option. The 'duplicate' column of numeric title units can be obtained by copying and pasting columns in Excel.

Next, once an Excel section of the desired form has been created, the easiest way to create the appropriate plain text file is to highlight the relevant area of the Excel sheet then copy and paste it into an empty plain text file held in a plain text editor such as pfe.

2.3.2 Create further text files for status, title-by-status, major group etc units. This process is repeated for all of the other relevant units found in the version being worked with. For the other 'original' units, such as the value labels for employment status categories, major and minor groups, and any other related variables such as indicators of ethnicity or region, the creation of the text file follows the same requirements, and can usually be achieved by pasting into Excel.

For the units involving the cross-classification of original units, such as the title-by-status base unit, the relevant files can most quickly be created by text editing within Excel. In these cases we create text files with a slightly different form, as shown in the example table below for title-by-status units where there are four employment status categories and three ISCO title units. (Note also how the example rows reveal the occasional redundancy of some of the possible cross-classification categories, such as the non-managerial categories of status combined with the 'managerial' occupational title units.)

1101	1101	Self-Emp.	Armed Forces
12211	12211	Self-Emp.	Production and operations department managers in agriculture, hunting, forestry and fishing
12221	12221	Self-Emp.	Production and operations department managers in manufacturing
Etc1	Etc1	Self-Emp.	Etc
1102	1102	Managr	Armed Forces
12212	12212	Managr	Production and operations department managers in agriculture, hunting, forestry and fishing
12222	12222	Managr	Production and operations department managers in manufacturing
Etc2	Etc2	Managr	Etc
1103	1103	Empyee	Armed Forces
12213	12213	Empyee	Production and operations department managers in agriculture, hunting, forestry and fishing
12223	12223	Empyee	Production and operations department managers in manufacturing
Etc3	Etc3	Empyee	Etc
1109	1109	Unkwn St	Armed Forces
12219	12219	Unkwn St	Production and operations department managers in agriculture, hunting, forestry and fishing
12229	12229	Unkwn St	Production and operations department managers in manufacturing
Etc9	Etc9	Unkwn St	Etc

(Again it is desirable to have two tab spaces separating the first and second columns and one tab space separating the other columns. Also, it will later become apparent why it is desirable to abbreviate the employment status value labels).

These (often very long) tables are readily created in Excel, starting from the first title-only table mentioned above. In brief: paste the title-only table into a new sheet; delete the middle column of numeric values; then insert several more columns between the remaining column of numeric values and the value labels. Copy this entire table and paste it again immediately below its own end, aligned to the same columns, repeating until there is one copy of the title range for each employment status category used. Next, for each status value in turn, go to the top row of that status's title range and write the status value label in the column immediately to the left of the title value labels. In the two cells between that column and the numeric title value, define the same function twice: the value in the title value column times 10, plus the relevant status value. Highlight the new cells, then fill them down the whole title range using the drag function in Excel (click on the cross symbol at the bottom right of the highlighted cells, then drag down to the end of that range of titles). Repeat this for each status group. We are then left with an Excel table, which again can be pasted into a plain text file.
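
The Excel procedure above amounts to stacking one copy of the title list per status category and computing title code times 10 plus status code for each row. A rough scripted equivalent is sketched below in Python; the input lists and the function name are illustrative assumptions only:

```python
# Illustrative inputs: (code, label) pairs for titles and abbreviated statuses.
titles = [
    (110, "Armed Forces"),
    (1222, "Production and operations department managers in manufacturing"),
]
statuses = [(1, "Self-Emp."), (2, "Managr"), (3, "Empyee"), (9, "Unkwn St")]


def title_by_status_lines(titles, statuses):
    """Build tabbed lines: unit, two tabs, unit, tab, status, tab, title."""
    lines = []
    for status_code, status_label in statuses:  # one stacked copy per status
        for title_code, title_label in titles:
            unit = title_code * 10 + status_code
            lines.append(f"{unit}\t\t{unit}\t{status_label}\t{title_label}")
    return lines


lines = title_by_status_lines(titles, statuses)
print("\n".join(lines))
```

As in the example table, the rows are grouped by status first and by title within status.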

2.3.3 Translate plain text files into SPSS 'add value label' commands. If using an editor such as pfe, the created text files can be readily converted into appropriate SPSS commands for adding value labels. The easiest way to manage the appropriate files is to define them as macros within an SPSS syntax include file.

To achieve this, first use the 'replace' commands of pfe to transform tab and return characters (indicated in pfe by \t and \n) into text compatible with SPSS. This takes three sequential replace commands in pfe, applied in the order given; below, the text [SP] indicates a single space. First, replace all characters \n with "\n[SP] ; second, replace all characters \t\t [i.e. the double tabs between the first and second columns] with [SP]" ; third, replace all remaining characters \t with [SP] . The order matters, since running the third replacement before the second would also consume the double tabs.

Next, edit the start and end of the relevant commands, being sure to add a single . [full stop] character to the last value label only.
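
The three replacements and the start and end edits can be mimicked in a few lines of Python, which may help in checking what the finished command should look like. This is a sketch only: the function name is hypothetical, the variable name hocc is taken from the table of names above, and it assumes that no label contains a double quote character:

```python
def to_add_value_labels(tabbed_text: str, variable: str) -> str:
    """Turn tabbed 'code<TAB><TAB>code<TAB>label' lines into SPSS syntax."""
    body = tabbed_text.replace("\n", '"\n ')  # step 1: close quote, new line, space
    body = body.replace("\t\t", ' "')         # step 2: double tab opens the quote
    body = body.replace("\t", " ")            # step 3: remaining tabs become spaces
    body = body.rstrip()                      # drop the newline and space left after the last label
    # Edit the start and end: command name first, one full stop at the very end.
    return f"add value labels {variable}\n {body} ."


text = "110\t\t110\tArmed Forces\n1210\t\t1210\tDirectors and chief executives\n"
command = to_add_value_labels(text, "hocc")
print(command)
```

For the two sample titles this should yield a command whose label lines read 110 "110 Armed Forces" and 1210 "1210 Directors and chief executives", with the full stop after the second label only.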

Finally, stack together all the created value label commands into one text file, which is named as an SPSS 'include file', for instance "versionlabels.sps" (see the sample SPSS include file).

2.3.4 Call the relevant macros. The include file is utilised in an early stage of the data construction process - as in the sample SPSS syntax in the accompanying file - thus ensuring that the original files are subsequently saved with the relevant data labels attached. It may also be necessary to repeat the value label additions if and when 'revised' versions of the occupational base unit schema are created, as will be mentioned below.

The advantage of this schema is that any descriptive and analytic methods using SPSS will immediately have occupational title unit value labels attached, whilst those units' titles are themselves readily exported from SPSS to other applications via pasted SPSS output tables. Note, however, two qualifications. First, the 'add value labels' command in SPSS automatically truncates long value labels to a maximum of 60 characters, but many title or title-by-status units are longer than that. Second, many SPSS procedures generate output which further automatically truncates value labels, typically to a maximum of 20 characters. To cater for the latter circumstance, it is desirable to have the fullest information near the start of the value label; hence the attraction of using the numeric codes and using abbreviated employment status labels. To cater for the former problem, it is best to retain the original plain text and Excel files containing the full title unit labels, and utilise them directly for the formal presentation of results in other formats (for instance, consider storing them in a form compatible with database presentation software).
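
The effect of the two truncations can be seen with a simple illustration. The Python sketch below uses plain string slicing to stand in for SPSS's behaviour, taking the limits of 60 and 20 characters from the text above; the helper name is hypothetical:

```python
SPSS_LABEL_LIMIT = 60    # limit quoted above for the 'add value labels' command
OUTPUT_LABEL_LIMIT = 20  # typical limit quoted above for output tables


def make_label(code: int, status_abbrev: str, title: str) -> str:
    """Put the numeric code and abbreviated status first, then the title."""
    return f"{code} {status_abbrev} {title}"


full = make_label(
    12212,
    "Managr",
    "Production and operations department managers in agriculture, "
    "hunting, forestry and fishing",
)
stored = full[:SPSS_LABEL_LIMIT]    # what would survive in the data file
shown = stored[:OUTPUT_LABEL_LIMIT]  # what would survive in truncated output
print(shown)
```

Even at 20 characters the code and status remain readable, which is the reason for placing them at the start of the label.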

2.3.5 Value labels for 'autorecoded' data. A helpful feature of the SPSS facilities for handling value labels is that when an 'autorecode' function is run ('autorecoding' is discussed in section 2.4 below), the value labels of the original categories are transferred to those of their autorecoded units. For instance, if the numeric occupation 110 had the value label "110 Armed Forces", after autorecoding it may have been assigned the numeric value 1, but that value 1 on the new variable will still have the value label "110 Armed Forces".

Most of the results from CAMSIS scale construction models relate to such autorecoded data. Therefore, using SPSS means we can rapidly create tables and output for the autorecoded data, which can subsequently be used to index the model outputs in the examples of RC model results from LEM estimations. We can do this by pasting tables of the autorecoded values from SPSS to Excel, then pasting, next to them, the numeric values from LEM onto the same Excel worksheet.

(The other way of matching autorecoded values with original values and value labels is of course to maintain a data (or database) file which has the original and autorecoded variables mapped to each other).
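
The behaviour described in this section can be sketched as follows. This Python fragment is only an approximation of SPSS's 'autorecode' (consecutive integers assigned in ascending order of the original values, with labels carried across, and the original value used as the label where none exists), and the function name is hypothetical:

```python
def autorecode(values, labels):
    """Map original codes to 1..n and carry the value labels across."""
    codes = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    # Unlabelled values fall back to the original code written as text.
    new_labels = {codes[v]: labels.get(v, str(v)) for v in codes}
    return codes, new_labels


original_labels = {
    110: "110 Armed Forces",
    1210: "1210 Directors and chief executives",
}
codes, new_labels = autorecode([1210, 110, 1210], original_labels)

# Keeping the inverse mapping allows translation back to the original codes.
back = {new: old for old, new in codes.items()}
print(codes, new_labels, back)
```

Because each label begins with the original numeric code, the original value remains visible after recoding even without the inverse mapping.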

2.4 'Square autorecoded' values: details

Although not technically essential, it is a great advantage if the autorecoded values of male and female occupational base units are the same. This will not typically happen automatically, because some occupations may have only male incumbents and others only female incumbents. In this work, the imposition of such a constraint on the autorecoding of occupations is termed 'square autorecoding'. The accompanying file includes sample SPSS syntax illustrating how such an autorecoding can be achieved. In fact, as the example shows, when we are routinely working with a number of alternative occupational units (title-only, title-by-status, major group, and so on), it is equally desirable to autorecode each of them together.
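
One way to picture 'square autorecoding' is to recode the husbands' and wives' values against a single shared list of codes, built from the union of both. The Python sketch below illustrates that idea only; the function name and data are invented, and the accompanying SPSS syntax may well achieve the same result by a different route:

```python
def square_autorecode(husband_values, wife_values):
    """Recode both variables with one shared mapping of codes to 1..n."""
    shared = sorted(set(husband_values) | set(wife_values))
    codes = {v: i for i, v in enumerate(shared, start=1)}
    husbands = [codes[v] for v in husband_values]
    wives = [codes[v] for v in wife_values]
    return husbands, wives, codes


# 12212 occurs only among husbands and 12223 only among wives.
hbst = [1103, 12212]
wbst = [12223, 1103]
hbst2, wbst2, codes = square_autorecode(hbst, wbst)
print(hbst2, wbst2, codes)
```

Recoding each variable separately would instead give 12212 and 12223 the same new value 2, which is exactly the mismatch that the shared mapping avoids.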

As we have said, an advantage of using SPSS is that value labels are transferred from the original to the autorecoded variable, which makes handling subsequent analyses considerably easier. However, there is also a significant complication in maintaining the ability to translate back from the autorecoded values to the original numeric values. To achieve this, we suggest keeping all autorecoded variable values within data files which also feature the same variables before they were autorecoded. In our examples, a consistent terminology of variable names is maintained by suffixing a number to the end of the original variable names. Thus for example, the 'raw' husband and wife title-by-status base units are termed {h/w}bst, and their autorecoded values termed {h/w}bst2; additionally, if they are later revised by merging certain sparsely represented categories (see section 4), the new versions of the data in their original value codings are named {h/w}bst3, then {h/w}bst5, and so on, whilst their autorecoded equivalent variables are named {h/w}bst4, then {h/w}bst6, etc.

Last modified 14 February 2002
This document is maintained by Paul Lambert (paul.lambert@stirling.ac.uk)