***Exporting occupational codes to Pajek

**Dave Griffiths, University of Stirling
**17 November 2010


**This enables occupational pairings to be converted into a Pajek file.
**This file predicts the expected numbers of pairings given the number of people, by gender, in each unit group.
**This expectation is then compared to the actual number observed to identify the relative levels of over/under-representation.

**The resulting data can be exported as a text file and converted by txt2pajek into a social network matrix.
*txt2pajek can be downloaded from: http://vlado.fmf.uni-lj.si/pub/networks/pajek/howto/text2pajek.htm 

**hocc and wocc can be exported to populate a social network.
**pro_obs and value can be used as thresholds to dichotomise the network, as the values of lines, or both.

**Examples of research using these methods can be found at: http://www.camsis.stir.ac.uk/sonocs/papers


************************
**Requirements

* A horizontal dataset with:
* i) an occupation variable for the ego (called 'hocc')
* ii) an occupation variable for their alter (called 'wocc')
* iii) all cases without occupational information for respondent and/or spouses removed.

**Please note, this follows the CAMSIS convention of holding the husband's occupation (hocc) and wife's (wocc)
* This method can be used for other pairings of data, including, for instance, fathers and sons, or housemates
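
**A minimal sketch of requirement iii), assuming the pairings data are already in memory.
**It is not part of the original procedure and does nothing if the data already meet the requirements.
confirm variable hocc wocc
drop if missing(hocc) | missing(wocc)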

******Exporting only those linkages which are above the expected values

**create frequency dataset
capture drop freq
gen freq = 1
collapse (count) freq, by(hocc wocc)
list in 1/20

*****Find the total number of pairings in the sample
capture drop tot
egen tot=total(freq)

summarize tot

*******Find the number of men and of women in each occupation
capture drop nhocc
capture drop nwocc
egen nhocc=total(freq), by(hocc)
egen nwocc=total(freq), by(wocc)

list hocc wocc freq nhocc nwocc in 1/20

****Find the proportion of men and of women in each occupation
capture drop phocc
capture drop pwocc

gen phocc=nhocc/tot

gen pwocc=nwocc/tot

summarize

list hocc wocc freq phocc pwocc in 1/20


*******Calculate the expected number of pairings for each combination of occupations, assuming independence
capture drop ewocc
gen ewocc=pwocc*nhocc

summarize

list hocc wocc ewocc freq nhocc nwocc in 1/20


**************create ratio of observed to expected pairings (values above 1 indicate over-representation)
capture drop value
gen value=freq/ewocc
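
**Worked example with illustrative (made-up) figures:
**with tot=1000 pairings, nhocc=100 men in the husband's occupation and nwocc=50 women in the wife's occupation,
**pwocc=50/1000=0.05 and ewocc=0.05*100=5 expected pairings.
**If freq=10 such pairings are observed, value=10/5=2, i.e. twice as many as expected.
**The optional check below confirms that ewocc equals nhocc*nwocc/tot, allowing for rounding in float storage.
assert reldif(ewocc, nhocc*nwocc/tot) < 1e-5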

************Calculate the standard error of the observed proportion
capture drop prop
gen prop = freq/tot


capture drop staner 
gen staner = sqrt((prop)*(1 - prop) / tot)
list freq tot phocc pwocc ewocc value prop staner in 1/20
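
**Worked example with illustrative (made-up) figures:
**a pairing observed freq=10 times among tot=1000 has prop=0.01,
**so staner=sqrt(0.01*0.99/1000), which is approximately 0.0031.
display sqrt(0.01*(1-0.01)/1000)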


**staner is the standard error of the observed proportion

**therefore, we need to compare the observed proportion, plus or minus staner, to the expected proportion

capture drop pro_obs
gen pro_obs = freq/tot

capture drop pro_exp
gen pro_exp = ewocc/tot

capture drop pro_min
gen pro_min = pro_obs - staner

capture drop pro_max
gen pro_max = pro_obs + staner

capture drop value
gen value = pro_obs / pro_exp

capture drop val_min
gen val_min = pro_min / pro_exp

capture drop val_max
gen val_max = pro_max / pro_exp


***********************label variables
label variable tot "total number in sample"
label variable nhocc "total number of males in occupation"
label variable nwocc "total number of females in occupation"
label variable phocc "proportion of men in occupation"
label variable pwocc "proportion of women in occupation"
label variable ewocc "expected number of partnerships"
label variable staner "Standard error for tie"
label variable pro_obs "Observed proportion of all ties"
label variable pro_exp "Expected proportion of all ties"
label variable pro_min "Lower bound of observed proportion (observed - 1 SE)"
label variable pro_max "Upper bound of observed proportion (observed + 1 SE)"
label variable value "Ratio of observed to expected ties (representation)"
label variable val_min "Representation ratio at lower bound (observed - 1 SE)"
label variable val_max "Representation ratio at upper bound (observed + 1 SE)"
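
**A sketch of one possible export step, not part of the original procedure.
**Assumptions: the file name occ_pairs.txt is hypothetical, and the rule val_min > 1
**(observed proportion at least one standard error above expected) is only one possible threshold.
**Check the txt2pajek documentation for the input layout it requires.
preserve
keep if val_min > 1 & !missing(val_min)
outsheet hocc wocc value using "occ_pairs.txt", nonames noquote replace
restore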

**This do file was created as part of the Economic and Social Research Council funded project:
**Social Networks and Occupational Structure (ESRC grant no: RES-062-23-2497)
**Paul Lambert and Dave Griffiths, University of Stirling
*For more information on the project, see http://www.camsis.stir.ac.uk/sonocs/