# Wolfram Data Repository

Immediate Computable Access to Curated Contributed Data

Data examining the efficacy of job training programs in increasing earnings

These are actual and augmented subsets of the well-studied dataset developed by Professor Robert LaLonde for his paper "Evaluating the Econometric Evaluations of Training Programs," American Economic Review, Vol. 76, pp. 604-620. The paper examined the effects of certain job training programs operated between 1975 and 1977 on the earnings of their participants. The subsets were compiled by Professor Gary King as part of his "cem" package for the R language, which deals with "coarsened exact matching," a technique he developed for use in causal inference.

The order of the variables in the default Dataset, from left to right, is:

- treatment indicator: 1 if treated, 0 if not treated
- age
- education
- Black: 1 if Black, 0 otherwise
- married: 1 if married, 0 otherwise
- nodegree: 1 if no degree, 0 otherwise
- re74: earnings in 1974
- re75: earnings in 1975
- re78: earnings in 1978
- Hispanic: 1 if Hispanic, 0 otherwise
- u74: 1 if the person was unemployed in 1974, 0 otherwise
- u75: 1 if the person was unemployed in 1975, 0 otherwise

The default Dataset contains 722 rows.

The "Lelonde" variant contains the same variables as the default but makes 10% of the data missing and adds a variable q1, a fictitious answer to a questionnaire item on "Agreement with this job training program." Presumably this fictitious question can be used to study the effects of "collider bias," a problem that can arise in causal inference when variables affected by both the "treatment" and the "outcome" are controlled for.

The "MatchIt Lalonde" variant is taken from the MatchIt package available in the R language. It excludes the unemployment variables and contains only 614 rows. (The criteria by which certain rows were excluded from the original data are not clear.) It also uses a single variable "race" to encode whether the person is Black, Hispanic or other (which is assumed to be White).

Also available is the "DW" (Dehejia-Wahba) 445-row subset of the data discussed in this paper: Rajeev Dehejia and Sadek Wahba, "Causal Effects in Non-Experimental Studies: Reevaluating the Evaluation of Training Programs," Journal of the American Statistical Association, Vol. 94, No. 448 (December 1999), pp. 1053-1062. This paper claimed that "propensity score" methods succeed in estimating the treatment impact of the job training program studied originally by Professor LaLonde.

(12 columns, 722 rows)

Retrieve the data:

In[1]:= |

Out[1]= |
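Since the notebook input cells are shown only as placeholders here, the retrieval step can be sketched in the Wolfram Language roughly as follows (the resource name is taken from the attribution at the bottom of this page):

```wolfram
(* fetch the default Dataset from the Wolfram Data Repository *)
data = ResourceData[ResourceObject["Job Training Efficacy"]];

(* confirm the dimensions reported above: 722 rows, 12 columns *)
Dimensions[data]
```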

Retrieve the "Lelonde" variant of the data:

In[2]:= |

Out[2]= |

Retrieve the "DW" variant of the data:

In[3]:= |

Out[3]= |

Retrieve the "MatchIt Lalonde" variant:

In[4]:= |

Out[4]= |
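The variants are presumably exposed as content elements of the same resource. A sketch, with the caveat that the exact element names are assumptions and should be checked against the resource's "ContentElements" property:

```wolfram
ro = ResourceObject["Job Training Efficacy"];
ro["ContentElements"]                      (* list the actual variant names *)

lelonde = ResourceData[ro, "Lelonde"];         (* 10% missing + q1 *)
dw      = ResourceData[ro, "DW"];              (* 445-row Dehejia-Wahba subset *)
matchit = ResourceData[ro, "MatchItLalonde"];  (* 614 rows, single "race" column *)
```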

Show the distribution of wages based on race:

In[5]:= |

Out[5]= |

Show the distribution of wages based on treatment (without regard to covariates):

In[6]:= |

Out[6]= |
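One way to draw both distribution plots, assuming the "MatchIt Lalonde" variant with columns named "race", "treat" and "re78" (the element and column names are guesses):

```wolfram
matchit = ResourceData["Job Training Efficacy", "MatchItLalonde"];

(* 1978 earnings grouped by race *)
byRace = KeySort@GroupBy[Normal[matchit], #race &, Map[#re78 &]];
DistributionChart[Values[byRace], ChartLabels -> Keys[byRace]]

(* 1978 earnings grouped by treatment status, ignoring covariates *)
byTreat = KeySort@GroupBy[Normal[matchit], #treat &, Map[#re78 &]];
DistributionChart[Values[byTreat], ChartLabels -> Keys[byTreat]]
```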

Perform a linear regression on the data to see what effect job training may have had:

In[7]:= |

Out[7]= |
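A sketch of such a regression, using a subset of covariates for brevity; the column names are assumed to match the variable list above:

```wolfram
data = ResourceData["Job Training Efficacy"];
cols = {"treatment", "age", "education", "re74", "re75", "re78"};
mat  = Map[Values, Normal[data[All, cols]]];   (* last column is the response *)

lm = LinearModelFit[mat, {t, age, edu, e74, e75}, {t, age, edu, e74, e75}];
lm["ParameterTable"]      (* coefficients with standard errors and p-values *)
lm["AdjustedRSquared"]
```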

Examine the parameters and the adjusted R^{2}. It appears that this form of the model does not have much predictive value. Not receiving job training lowered wages by $824, but one cannot be certain that the result is statistically significant:

In[8]:= |

Out[8]= |

Perform a logistic regression to determine what factors affected whether one received "treatment" (i.e. was enrolled in the job training program):

In[9]:= |

Out[9]= |
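LogitModelFit takes the same matrix form as LinearModelFit, with the 0/1 treatment indicator as the last column. A sketch under the same column-name assumptions as before:

```wolfram
data = ResourceData["Job Training Efficacy"];
cols = {"age", "education", "nodegree", "re74", "re75", "treatment"};
mat  = Map[Values, Normal[data[All, cols]]];   (* last column: 0/1 treatment *)

logit = LogitModelFit[mat, {age, edu, nd, e74, e75}, {age, edu, nd, e74, e75}];
logit["ParameterTable"]
```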

The only statistically significant factor in determining treatment is the absence of a high school degree. The very low Cragg-Uhler pseudo-R^{2} suggests that the model does not have much explanatory power:

In[10]:= |

Out[10]= |

See if a probit model performs any better; it does not:

In[11]:= |

Out[11]= |
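The probit version is the same call with ProbitModelFit substituted for LogitModelFit:

```wolfram
data = ResourceData["Job Training Efficacy"];
mat  = Map[Values,
   Normal[data[All, {"age", "education", "nodegree", "treatment"}]]];

probit = ProbitModelFit[mat, {age, edu, nd}, {age, edu, nd}];
probit["ParameterTable"]
```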

Split the data into training and test sets:

In[12]:= |

Run a classifier on the training set:

In[13]:= |

Out[13]= |
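The split and the classifier might be built as follows; the 80/20 proportion and the example format (covariate association mapped to treatment class) are assumptions:

```wolfram
data = ResourceData["Job Training Efficacy"];

(* each example: covariates -> treatment class *)
examples = (KeyDrop[#, "treatment"] -> #treatment) & /@ Normal[data];
{train, test} =
  TakeDrop[RandomSample[examples], Floor[0.8 Length[examples]]];

cl = Classify[train]
```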

Create a classifier measurements object using the classifier just built and the test data:

In[14]:= |

Out[14]= |

Assess classifier performance. The machine learning classifier does not perform particularly well:

In[15]:= |

Out[15]= |

A look at the probabilities of being treated shows the classifier is extremely uncertain in its results; the range of predicted probabilities is very small:

In[16]:= |

Out[16]= |
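With cl the classifier just built and test the held-out examples (the variable names are assumptions), the measurement and probability checks might read:

```wolfram
cm = ClassifierMeasurements[cl, test];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]

(* probability of treatment (class 1) for each test input;
   a narrow range means the classifier is barely discriminating *)
MinMax[cl[#, "Probabilities"][1] & /@ Keys[test]]
```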

Use the "MatchIt Lalonde" data and compare the mean values of various covariates among the treated and untreated groups:

In[17]:= |

In[18]:= |

Out[18]= |
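A grouped-mean comparison of the covariates can be written as a Dataset query; the element and column names are guesses:

```wolfram
matchit = ResourceData["Job Training Efficacy", "MatchItLalonde"];

(* mean of each covariate within the untreated (0) and treated (1) groups *)
matchit[GroupBy["treat"], Mean, {"age", "educ", "re74", "re75"}]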

Generate an anomaly detector function from the covariates of the treated population:

In[19]:= |

Out[19]= |

Use the anomaly detector function on the untreated (control) population, but set the AcceptanceThreshold extremely high so that most members are flagged as anomalous. This brings the number of persons in the remaining untreated population down to about the number of persons in the treated population:

In[20]:= |

Out[20]= |
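The detector-based matching might be sketched as follows. The threshold value, covariate list and names are illustrative assumptions; the documented functions are AnomalyDetection, FindAnomalies and the AcceptanceThreshold option:

```wolfram
matchit = ResourceData["Job Training Efficacy", "MatchItLalonde"];
covs  = {"age", "educ", "re74", "re75"};
toVec = Values[KeyTake[#, covs]] &;
tRows = Normal[matchit[Select[#treat == 1 &]]];
cRows = Normal[matchit[Select[#treat == 0 &]]];

detector = AnomalyDetection[toVec /@ tRows];  (* "normal" = resembles a treated person *)

(* a high AcceptanceThreshold flags most control rows as anomalous *)
anomalousQ = FindAnomalies[detector, toVec /@ cRows,
   "AnomalyBooleanList", AcceptanceThreshold -> 0.5];
matchedControl = Pick[cRows, anomalousQ, False];
Length /@ {tRows, matchedControl}
```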

Join the treated persons with the non-anomalous members of the control group:

In[21]:= |

Out[21]= |

And now compare the mean values of their covariates, which are now considerably more similar to each other:

In[22]:= |

Out[22]= |

Compare the mean earnings in 1978 among the treated and untreated in the matched groups; the treated group has income about $800 higher even though their demographics are, on average, about the same:

In[23]:= |

Out[23]= |

The increase in median income is much smaller, suggesting that the increased income among the treated may come from a few high earners:

In[24]:= |

Out[24]= |
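With matched denoting the Dataset joined in In[21] (the name is an assumption), the two comparisons are one-line grouped queries:

```wolfram
matched[GroupBy["treat"], Mean, "re78"]    (* treated mean roughly $800 higher *)
matched[GroupBy["treat"], Median, "re78"]  (* gap much smaller at the median *)
```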

Compare the mean and median income in 1978 by race among the matched individuals:

In[25]:= |

Out[25]= |

The preceding example established what was "normal" by looking at the treatment data and then eliminated rows of the control data that looked anomalous by that standard. This use of the treatment data as the baseline is standard in the literature, but perhaps somewhat arbitrary. An alternative, symmetric approach would be to keep only treatment data that looks "normal" by the standard of the control data and only control data that looks "normal" by the standard of the treatment data, using the same "acceptance threshold" in both directions. Compute an anomaly detection function on the control data:

In[26]:= |

Out[26]= |

In[27]:= |

Out[27]= |

In[28]:= |

Out[28]= |

In[29]:= |

Out[29]= |
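The symmetric method described above can be sketched as follows; the threshold value, covariate list and names are assumptions:

```wolfram
matchit = ResourceData["Job Training Efficacy", "MatchItLalonde"];
covs  = {"age", "educ", "re74", "re75"};
toVec = Values[KeyTake[#, covs]] &;
tRows = Normal[matchit[Select[#treat == 1 &]]];
cRows = Normal[matchit[Select[#treat == 0 &]]];

dT = AnomalyDetection[toVec /@ tRows];
dC = AnomalyDetection[toVec /@ cRows];
th = 0.15;  (* one symmetric threshold; the value is illustrative *)

(* each group keeps only rows "normal" by the other group's standard *)
keepT = Pick[tRows, FindAnomalies[dC, toVec /@ tRows,
    "AnomalyBooleanList", AcceptanceThreshold -> th], False];
keepC = Pick[cRows, FindAnomalies[dT, toVec /@ cRows,
    "AnomalyBooleanList", AcceptanceThreshold -> th], False];
matched2 = Dataset[Join[keepT, keepC]]
```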

Use of this alternative matching method results in the average treatment effect being smaller than in the prior example:

In[30]:= |

Out[30]= |

The increase in median income caused by the treatment appears more robust against changes in the matching methodology:

In[31]:= |

Out[31]= |

Make a distribution chart showing the difference in the distribution of incomes between those not receiving job training and those doing so. The chart suggests that the gains in earnings come from a few people in the treated group earning comparatively high amounts:

In[32]:= |

Out[32]= |
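Again taking matched to be the joined matched Dataset (an assumed name), the chart might be produced like this:

```wolfram
groups = KeySort@GroupBy[Normal[matched], #treat &, Map[#re78 &]];
DistributionChart[Values[groups],
 ChartLabels -> {"no training", "training"},
 PlotLabel -> "1978 earnings in the matched sample"]
```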

The "fundamental problem of causal inference" (https://en.wikipedia.org/wiki/Rubin_causal_model) is said to be that we can observe either the treated outcome or the untreated outcome for an individual, but not both. The problem is thus essentially one of missing data. But the Wolfram Language can impute missing values, which, as described in these lecture notes and this journal article, suggests a direct approach to causal inference. First, create a new dataset with missing values:

In[33]:= |

Out[33]= |

Then use SynthesizeMissingValues to "solve" the fundamental problem of causal inference:

In[34]:= |

Out[34]= |
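One way to set up and impute the counterfactual dataset; SynthesizeMissingValues is the documented function, while the surrounding names are assumptions:

```wolfram
rows = Normal[matched];  (* matched Dataset from the matching steps above *)

(* give every observed row a counterfactual twin:
   treatment flipped, outcome marked Missing *)
twins = Join[#, <|"treat" -> 1 - #treat, "re78" -> Missing[]|>] & /@ rows;
withMissing = Join[rows, twins];

(* impute the missing counterfactual outcomes *)
completed = SynthesizeMissingValues[withMissing];
```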

Now find the mean earnings when the population is "treated" with job training and when it is not. This method suggests that job training causes a loss of earnings, contrary to the generally accepted finding that it results in a gain. This suggests that direct use of missing-value imputation must be explored further before it is accepted as an algorithm for making causal inferences:

In[35]:= |

Out[35]= |

We can see whether using RandomSampling as the EvaluationStrategy, instead of the default ModeFinding, helps:

In[36]:= |

Out[36]= |
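Per the text, the strategy can be switched via the "EvaluationStrategy" option; with withMissing as the counterfactual dataset built in In[33] (an assumed name):

```wolfram
completed2 = SynthesizeMissingValues[withMissing,
   "EvaluationStrategy" -> "RandomSampling"];  (* default: "ModeFinding" *)

(* recompute mean 1978 earnings by imputed treatment status *)
Dataset[completed2][GroupBy["treat"], Mean, "re78"]
```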

We find that it does not solve the problem. So missing-value imputation, although theoretically promising as a vehicle for causal inference, has tricky and presently unresolved implementation issues:

In[37]:= |

Out[37]= |

Seth J. Chandler, "Job Training Efficacy" from the Wolfram Data Repository (2021)

Used with consent of Professor Gary King