Analysis (5) 
Perform a linear regression on the data to see what effect job training may have had:
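A minimal sketch of such a fit, assuming the data is loaded as a Dataset named `lalonde`; the column names (`"age"`, `"educ"`, `"re74"`, `"re75"`, `"treat"`, and 1978 earnings `"re78"`) are assumptions based on the usual Lalonde layout:

```wolfram
(* rows of {age, educ, re74, re75, treat, re78}; column names are assumed *)
rows = Values /@
   Normal[lalonde[All, {"age", "educ", "re74", "re75", "treat", "re78"}]];
lm = LinearModelFit[rows,
   {age, educ, re74, re75, treat},
   {age, educ, re74, re75, treat}];
lm["ParameterTable"]
lm["AdjustedRSquared"]
```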
Examine the parameters and the adjusted R². It appears that this form of the model does not have much predictive value. Not receiving job training lowered wages by an estimated $824, but one cannot be certain that the result is statistically significant:
Perform a logistic regression to determine what factors affected whether one received "treatment" (i.e. was enrolled in the job training program):
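A sketch of the logistic fit, again assuming `lalonde` and its column names; the response (`treat`) must be the last column of each row:

```wolfram
(* 1 = received job training; column names are assumed *)
rows = Values /@
   Normal[lalonde[All, {"age", "educ", "nodegree", "re74", "re75", "treat"}]];
logit = LogitModelFit[rows,
   {age, educ, nodegree, re74, re75},
   {age, educ, nodegree, re74, re75}];
logit["ParameterTable"]
logit["CraggUhlerPseudoRSquared"]
```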
The only statistically significant factor in determining treatment is the absence of a high school degree. The very low Cragg–Uhler pseudo-R² suggests that the model does not have much explanatory power:
See if a probit model performs any better; it does not:
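ProbitModelFit takes the same arguments as LogitModelFit; a sketch assuming the same `rows` matrix as in the logistic step:

```wolfram
probit = ProbitModelFit[rows,
   {age, educ, nodegree, re74, re75},
   {age, educ, nodegree, re74, re75}];
probit["ParameterTable"]
probit["CraggUhlerPseudoRSquared"]
```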
Split the data into training and test sets:
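A sketch of the split, assuming the `rows` matrix from the regression steps; the 80/20 proportion and the random seed are assumptions:

```wolfram
(* examples as covariates -> treatment label *)
examples = Map[Most[#] -> Last[#] &, rows];
SeedRandom[1234];  (* assumed seed, for reproducibility *)
{train, test} = TakeDrop[RandomSample[examples], Round[0.8 Length[examples]]];
```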
Run a classifier on the training set:
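Assuming the `train` examples from the split above, this could look like:

```wolfram
cls = Classify[train, PerformanceGoal -> "Quality"]
```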
Create a classifier measurements object using the classifier just built and the test data:
Assess classifier performance. The machine learning classifier does not perform particularly well:
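A sketch of the assessment, assuming the classifier `cls` and the held-out `test` examples from the previous steps:

```wolfram
cm = ClassifierMeasurements[cls, test];
cm["Accuracy"]
cm["ConfusionMatrixPlot"]
```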
A look at the probabilities of being treated shows the classifier is extremely uncertain in its results; the range of probabilities is extremely small:
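One way to inspect this, assuming `cls` and `test` as above and that the class labels are the integers 0 and 1:

```wolfram
probs = cls[Keys[test], "Probabilities"];
MinMax[Lookup[probs, 1]]  (* spread of P(treated) across the test set *)
```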
Use the "MatchIt Lalonde" data and compare the mean values of various covariates among the treated and untreated groups:
Generate an anomaly detector function of the covariates of the treated population:
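A sketch using AnomalyDetection, with an assumed list of covariate columns:

```wolfram
covariateNames = {"age", "educ", "married", "nodegree", "re74", "re75"};
treatedCovariates =
  Values /@ Normal[lalonde[Select[#treat == 1 &], covariateNames]];
detector = AnomalyDetection[treatedCovariates]
```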
Use the anomaly detector function on the untreated (control) population, but set the AcceptanceThreshold extremely high so that most members are treated as anomalous; this brings the number of persons in the remaining untreated population down to roughly the number of persons in the treated population:
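A sketch assuming the `detector` and `covariateNames` from the previous step; the 0.9 threshold is illustrative and would be tuned until the kept control group is about the size of the treated group:

```wolfram
controlCovariates =
  Values /@ Normal[lalonde[Select[#treat == 0 &], covariateNames]];
(* raising AcceptanceThreshold marks more examples as anomalous *)
keptControl =
  DeleteAnomalies[detector, controlCovariates, AcceptanceThreshold -> 0.9];
Length[keptControl]
```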
Join the treated persons with the non-anomalous members of the control group:
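Assuming `treatedRows` and `keptControlRows` hold the full rows (lists of associations) for the treated group and the surviving controls, the pooling is simply:

```wolfram
matched = Join[treatedRows, keptControlRows]
```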
And now compare the mean values of their covariates, which are now considerably more similar to each other:
Compare the mean earnings in 1978 among the treated and untreated in the matched groups; the treated group has income about $800 higher even though their demographics are, on average, about the same:
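A sketch of the comparison, assuming `matchedTreated` and `matchedControl` are the matched groups as lists of associations with a `"re78"` earnings column:

```wolfram
(* difference in mean 1978 earnings between matched treated and control members *)
Mean[Lookup[matchedTreated, "re78"]] - Mean[Lookup[matchedControl, "re78"]]
```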
The increase in median income is much smaller, suggesting that the increased income among the treated may come from a few high earners:
Compare the mean and median income in 1978 by race among the matched individuals:
The preceding example established what was "normal" by looking at the treatment data and then eliminated rows of the control data that looked anomalous by that standard. This use of the treatment data as the baseline is standard in the literature, but perhaps somewhat arbitrary. An alternative, symmetric approach would be to keep only treatment data that looks "normal" by the standard of the control data and only control data that looks "normal" by the standard of the treatment data, using the same "acceptance threshold" in both directions. Compute an anomaly detection function on the control data:
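A sketch of the symmetric scheme, assuming `treatedCovariates` and `controlCovariates` as in the earlier anomaly-detection steps; the 0.5 threshold is illustrative:

```wolfram
(* each group is filtered by a detector trained on the other group *)
detectorControl = AnomalyDetection[controlCovariates];
detectorTreated = AnomalyDetection[treatedCovariates];
threshold = 0.5;  (* assumed symmetric acceptance threshold *)
keptTreated = DeleteAnomalies[detectorControl, treatedCovariates,
   AcceptanceThreshold -> threshold];
keptControl = DeleteAnomalies[detectorTreated, controlCovariates,
   AcceptanceThreshold -> threshold];
```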
Use of this alternative matching method results in the average treatment effect being smaller than in the prior example:
The increase in median income caused by the treatment appears more robust against changes in the matching methodology:
Make a distribution chart showing the difference in the distribution of incomes between those not receiving job training and those doing so. The chart suggests that the gains in earnings come from a few people in the treated group earning unusually high amounts:
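A sketch of the chart, assuming the matched groups `matchedControl` and `matchedTreated` as lists of associations:

```wolfram
DistributionChart[
 {Lookup[matchedControl, "re78"], Lookup[matchedTreated, "re78"]},
 ChartLabels -> {"no job training", "job training"}]
```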
The "fundamental problem of causal inference" (https://en.wikipedia.org/wiki/Rubin_causal_model) is said to be that we can observe only the treated outcome or the untreated outcome on the individual, but not both. The problem is essentially one of missing data. But Wolfram Language can impute missing values, which, as described in these lecture notes and this journal article, suggests a direct approach to causal inference. First, create a new dataset with missings:
Then use SynthesizeMissingValues to "solve" the fundamental problem of causal inference:
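Assuming the `withMissings` list built above, the imputation itself is a single call:

```wolfram
completed = SynthesizeMissingValues[withMissings]
```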
Now find the mean earnings when the population is "treated" with job training and when it is not. This method suggests that job training causes a loss of earnings rather than the generally accepted notion that it results in a gain. This finding suggests that direct use of missing value imputation must be explored further before it is used as an accepted algorithm for making causal inferences:
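A sketch of the comparison, assuming `completed` is the imputed list of associations with the potential-outcome columns named as in the construction step:

```wolfram
Mean[Lookup[completed, "re78Treated"]]
Mean[Lookup[completed, "re78Untreated"]]
```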
We can see if using RandomSampling as the EvaluationStrategy instead of the default ModeFinding helps:
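A sketch assuming `withMissings` as above; the placement of the "EvaluationStrategy" setting as a suboption of a "Multinormal" method is an assumption and may vary by Wolfram Language version:

```wolfram
(* "EvaluationStrategy" placement is assumed; check the current documentation *)
completed2 = SynthesizeMissingValues[withMissings,
   Method -> {"Multinormal", "EvaluationStrategy" -> "RandomSampling"}];
Mean[Lookup[completed2, "re78Treated"]] -
 Mean[Lookup[completed2, "re78Untreated"]]
```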
We find that it does not solve the problem. So missing-value imputation, although theoretically promising as a vehicle for causal inference, has tricky and presently unresolved implementation issues: