This appendix documents the construction of the 2024 Ohio Opportunity Index (OOI). The code in this document assumes that the database of 35 constituent measures, each assigned to a domain, has already been prepared. A separate document describes each variable in each domain with a conceptual English definition, univariate descriptive results, correlation analyses, and a choropleth map.
The steps are as follows:
The domain scores are a function of the constituent measures within each domain. We first standardized each constituent measure by transforming it to a z-score (subtracting its mean and dividing by its standard deviation). We then summed the z-scores within each domain and saved these untransformed domain sums for plotting. Finally, we ranked each domain sum across tracts, scaled the ranks to [0, 1], and transformed them to an exponential distribution. This transformation builds certain desirable cancellation properties, discussed in the next paragraph, into the final OOI.
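As a compact summary, the per-domain calculation carried out by the R code in this appendix can be written as follows. The symbols are introduced here for illustration only: $x_{ij}$ is measure $j$ for tract $i$, $\bar{x}_j$ and $s_j$ are that measure's mean and standard deviation across populated tracts, $d$ indexes a domain, and the rank is taken across tracts.

```latex
\begin{aligned}
z_{ij}  &= \frac{x_{ij} - \bar{x}_j}{s_j} \\
S_{id}  &= \sum_{j \in d} z_{ij} \\
R_{id}  &= \frac{\operatorname{rank}(S_{id}) - 1}{\max_k \left[\operatorname{rank}(S_{kd}) - 1\right]} \\
E_{id}  &= -23 \,\ln\!\left(1 - R_{id}\left(1 - e^{-100/23}\right)\right)
\end{aligned}
```

Because $R_{id} = 0$ gives $E_{id} = 0$ and $R_{id} = 1$ gives $E_{id} = -23 \ln\left(e^{-100/23}\right) = 100$, each transformed domain score lies between 0 and 100.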
The following simplified example illustrates the benefit of the transformation. If we used untransformed z-scores or domain ranks, one unit of opportunity contributed by one domain could completely cancel out one unit of deprivation contributed by another domain (i.e., the contributions would be zero-sum). The exponential transform changes these cancellation properties by stretching the deprived end of each domain's distribution, so that more than one unit of opportunity is required to cancel out one unit of deprivation.
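A minimal R sketch of this cancellation property, using the same constants (23 and 100) as the transform in this appendix's code. The function name exp_transform and the two tracts are hypothetical. Both tracts have the same average scaled rank (0.5), yet after the transform the tract with one highly ranked (deprived) domain has a higher mean score, so its low rank in the other domain does not fully offset the high one.

```r
# Exponential transform of a domain rank scaled to [0, 1]
# (hypothetical helper; same formula and constants as the OOI code)
exp_transform <- function(r) {
  -23 * log(1 - r * (1 - exp(-100 / 23)))
}

# Endpoints: scaled rank 0 maps to 0 and scaled rank 1 maps to 100
endpoints <- exp_transform(c(0, 1))

# Hypothetical tract A: at the median (0.5) of two domains
tract_a <- mean(exp_transform(c(0.5, 0.5)))

# Hypothetical tract B: near the top (0.9) of one domain, near the bottom (0.1) of the other
tract_b <- mean(exp_transform(c(0.9, 0.1)))

round(c(A = tract_a, B = tract_b), 1)
# A = 15.6, B = 26.4
```

With untransformed ranks, A and B would tie at 0.5; after the transform, B is scored as more deprived.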
This choice follows principles from research on the construction of deprivation indices in the UK (Noble, Wright, Smith & Dibben, 2006).
Below is the R code used for the imputation, standardization, and transformation. We then plot univariate and bivariate summaries of the resulting untransformed and transformed domain scores.
```r
library(dplyr)
library(sf)

# load the constituent measure data; `data` (one row per tract with GEOID, Pop,
# geometry, and measure columns) and `D` (a named list mapping each domain to
# its measure column names) are assumed to be available after this step
load(paste0(output_loc, "ConstituentMeasures.RData"))
measures <- unname(unlist(D))

# impute missing values with the mean of neighboring tracts
index_GEOID_crosswalk <- data %>% as.data.frame() %>% select(GEOID) %>% tibble::rownames_to_column()
neighbors <- sf::st_touches(data$geometry)                   # direct neighbors of each tract
neighbors2 <- as.matrix(neighbors) %*% as.matrix(neighbors)  # tracts two steps away (once-removed neighbors, plus the tract itself)
# convert neighbor indices to GEOIDs
neighbors <- lapply(seq_along(data$GEOID), function(tract_id) {
  index_GEOID_crosswalk$GEOID[neighbors[[tract_id]]]
})
neighbors2 <- lapply(seq_along(data$GEOID), function(tract_id) {
  index_GEOID_crosswalk$GEOID[which(neighbors2[, tract_id] != 0)]
})
names(neighbors) <- data$GEOID
names(neighbors2) <- data$GEOID

# find the mean of each measure among neighboring tracts
near_data_means <- lapply(data$GEOID, function(x) {
  one_neigh_means <- data %>%
    as.data.frame() %>%
    filter(GEOID %in% neighbors[[x]]) %>%
    select(all_of(measures)) %>%
    colMeans(na.rm = TRUE)
  still_miss <- is.na(one_neigh_means)
  if (any(still_miss)) {
    # if no direct neighbor has a known value, use tracts two steps away
    two_neigh_means <- data %>%
      as.data.frame() %>%
      filter(GEOID %in% neighbors2[[x]]) %>%
      select(all_of(measures)) %>%
      colMeans(na.rm = TRUE)
    c(two_neigh_means[still_miss], one_neigh_means[!still_miss])
  } else {
    one_neigh_means
  }
})
names(near_data_means) <- data$GEOID

# for measures with missing values, input the mean of the neighbors
for (metric in measures) {
  missing_vals <- data %>% filter(is.na(.data[[metric]]))
  for (tract in missing_vals$GEOID) {
    data[data$GEOID == tract, metric] <- near_data_means[[tract]][[metric]]
  }
}

# if no people live in a tract, set all of its measures to NA so that tracts
# with no population do not influence the standardization and ranking of all tracts
data[which(data$Pop == 0), measures] <- NA

# subset to the measures that make up the OOI
data_sub <- data %>% as.data.frame() %>% select(all_of(measures))
# standardize measures (z-scores); NA tracts are ignored in the mean and SD
data_sub <- data_sub %>% mutate(across(everything(), ~ as.numeric(scale(.x))))
# join with tract IDs and geometry
data_scaled <- cbind(GEOID = data$GEOID, data_sub, geometry = data$geometry)

# data frames for the transformed domain scores, the untransformed domain sums
# (for visualization), and the scaled domain ranks
data_rankexp <- data_scaled %>% select(all_of(c("GEOID", "geometry")))
data_sum <- data_scaled %>% select(all_of(c("GEOID", "geometry")))
data_rank <- data_scaled %>% select(all_of(c("GEOID", "geometry")))

# sum the z-scores within each domain, rank, and transform
for (d in names(D)) {
  # sum of z-scores (ranks are identical to those of the domain mean)
  data_sum[, d] <- rowSums(data_sub[, D[[d]], drop = FALSE])
  # rank (0 = lowest); tracts with NA stay NA
  data_rank[, d] <- rank(data_sum[, d], na.last = "keep") - 1
  # scale ranks to [0, 1]
  data_rank[, d] <- data_rank[, d] / max(data_rank[, d], na.rm = TRUE)
  # exponential transform to [0, 100]
  data_rankexp[, d] <- -23 * log(1 - data_rank[, d] * (1 - exp(-100 / 23)))
}

# find GEOIDs of tracts with missing domain scores (tracts with 0 population)
no_pop <- data_rankexp$GEOID[rowSums(is.na(data_rankexp[, names(D), drop = FALSE])) > 0]
# find the domain-score means of the direct neighbors of those tracts
near_data_for0pop <- lapply(no_pop, function(x) {
  data_rankexp %>%
    filter(GEOID %in% neighbors[[x]]) %>%
    select(all_of(names(D))) %>%
    colMeans(na.rm = TRUE)
})
names(near_data_for0pop) <- no_pop
# for tracts with 0 population, input the mean of the neighbors' domain scores
for (domain in names(D)) {
  missing_vals <- data_rankexp %>% filter(is.na(.data[[domain]]))
  for (tract in missing_vals$GEOID) {
    data_rankexp[data_rankexp$GEOID == tract, domain] <- near_data_for0pop[[tract]][[domain]]
  }
}
# name the rows by tract for easier merging during later mapping
rownames(data_rankexp) <- data_rankexp$GEOID
```
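The neighbor-based imputation in the code above can be illustrated without any spatial libraries. This minimal sketch assumes a toy five-tract adjacency matrix standing in for as.matrix(sf::st_touches(...)); the helper impute_neighbor_mean is hypothetical and handles a single measure. A missing value is replaced by the mean of its direct neighbors' original values and, when none of them is known, by the mean over tracts two steps away (the nonzero entries of the squared adjacency matrix).

```r
# Hypothetical adjacency for five tracts in a row (1 = shares a border),
# standing in for as.matrix(sf::st_touches(geometry))
adj <- matrix(c(
  0, 1, 0, 0, 0,
  1, 0, 1, 0, 0,
  0, 1, 0, 1, 0,
  0, 0, 1, 0, 1,
  0, 0, 0, 1, 0
), nrow = 5, byrow = TRUE)

# Impute missing values of one measure from neighboring tracts
impute_neighbor_mean <- function(x, adj) {
  adj2 <- adj %*% adj   # nonzero = reachable by a walk of exactly two steps
  filled <- x
  for (i in which(is.na(x))) {
    # mean over direct neighbors, using original (unfilled) values
    m <- mean(x[adj[i, ] != 0], na.rm = TRUE)
    if (is.na(m)) {
      # no direct neighbor has a value: fall back to two-step neighbors
      # (this set includes tract i itself, whose NA is dropped by na.rm)
      m <- mean(x[adj2[i, ] != 0], na.rm = TRUE)
    }
    filled[i] <- m
  }
  filled
}

x <- c(10, NA, 30, NA, NA)
filled <- impute_neighbor_mean(x, adj)
filled
# [1] 10 20 30 30 30
```

Tract 5's only direct neighbor (tract 4) is also missing, so its value comes from tract 3, two steps away.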
We plotted histograms of the untransformed domain sums.
We also plotted histograms of the transformed domain scores.
We calculated the deprivation index (DI) for each tract as the mean of its transformed domain scores, and then reversed the DI to obtain the OOI, so that higher values reflect more overall opportunity.
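A minimal sketch of the DI and OOI calculation, assuming a matrix of hypothetical transformed domain scores (rows are tracts, columns are domains). The exact reversal is not stated here; subtracting the DI from 100, the maximum possible transformed score, is an assumption for illustration. Any reversal of the form constant minus DI makes the OOI histogram a mirror image of the DI histogram.

```r
# Hypothetical transformed domain scores (0-100): 3 tracts x 3 domains
scores <- matrix(c(
   10, 20, 30,
    0,  0,  0,
  100, 50,  0
), nrow = 3, byrow = TRUE,
dimnames = list(paste0("tract", 1:3), c("domain1", "domain2", "domain3")))

# Deprivation index: mean of each tract's transformed domain scores
di <- rowMeans(scores)

# Reverse the DI (assumption: 100 - DI) so higher values mean more opportunity
ooi <- 100 - di

cbind(DI = di, OOI = ooi)
```

Here the DI values are 20, 0, and 50, and the corresponding OOI values are 80, 100, and 50, so the tract ordering is exactly reversed.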
Below are histograms of the DI and the OOI side by side. As they should be, they are mirror images of each other.
Domain | totVar | uniqVar |
---|---|---|
TR | 0.509 | 0.055 |
EN | 0.429 | 0.060 |
CR | 0.414 | 0.054 |
EM | 0.389 | 0.063 |
HS | 0.324 | 0.069 |
HL | 0.105 | 0.077 |
ED | 0.031 | 0.060 |
Outcome | Correlation with OOI (p-value) | Multiple Regression R-squared |
---|---|---|
Infant Mortality | -0.06 (0.00) | 0.01 |
Outcome 1 | Outcome 2 | Correlation (p-value) |
---|---|---|
OOI | OCOI v2 (2023) | 0.66 (0.00) |
OOI | OOI v2 (2018) | 0.64 (0.00) |
OOI | CDC Social Vulnerability Index | -0.49 (0.00) |
This section contains choropleth maps of each transformed domain score, the overall deprivation index (DI), and the reversed deprivation index (i.e., the Ohio Opportunity Index, or OOI). These maps provide a check on the face validity of each domain score and of the overall OOI. For the OOI, higher values (brighter areas) correspond to higher levels of opportunity.