Using Multiple Hot Deck Data Sets for Inference

This document walks through two ways to generate pooled model results that account for both sampling variability and across-imputation variability. The hot.deck package does not include functions for inference, so we show how to use the data it generates with MIcombine from the mitools package, applied to a list of model objects, and with lm.mids (and similarly glm.mids) from the mice package.

Generating Imputations

The data come from Poe, Tate, and Keith's (1999) study of democracy and state repression. First, we call the hot.deck routine on the dataset.

library(hot.deck)
data(isq99)
out <- hot.deck(isq99, sdCutoff=3, IDvars = c("IDORIGIN", "YEAR"))
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 52 observations with no observed data.  These observations were removed
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 45 of 4661 imputations with # donors < 5, consider increasing sdCutoff or using method='p.draw'

The second warning tells us that 45 of the 4661 imputations still have fewer than 5 donors. Using a different method (method='p.draw') or further increasing the sdCutoff parameter may alleviate the problem. To see the frequency distribution of the number of donors, you could look at:

numdonors <- sapply(out$donors, length)
numdonors <- ifelse(numdonors > 5, 6, numdonors)
numdonors <- factor(numdonors, levels=1:6, labels=c(1:5, ">5"))
table(numdonors)
#> numdonors
#>    1    2    3    4    5   >5 
#>   18   10   11    6   20 4596
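
The warning above suggests method='p.draw' as one alternative. A minimal sketch of that call follows, assuming the hot.deck package is installed; the object name out_pdraw is introduced here for illustration only, and m = 5 is spelled out rather than left at its default.

```r
library(hot.deck)
data(isq99)

# Re-run the imputation with probabilistic donor draws instead of the
# default best-cell method. out_pdraw is an illustrative name.
out_pdraw <- hot.deck(isq99, m = 5, method = "p.draw", sdCutoff = 3,
                      IDvars = c("IDORIGIN", "YEAR"))

# The imputed datasets are stored in the data element, as before.
length(out_pdraw$data)
```

The original text does not report results for this setting, so check the warnings and the imputed values yourself before relying on them.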

Before running a model, we need to create three variables from existing ones. In general, if variables are deterministic functions of other variables (e.g., transformations or lags), it is advisable to impute the constituent variables and carry out the calculations afterward. Here, we need to lag the AI variable and create percentage change variables for both population and per-capita GNP. First, we create lags of AI, PCGNP and LPOP using a small helper function.

# Lag variable x within units: for each row, find the row with the same id
# and the previous time period, and return its value of x (NA if none exists).
tscslag <- function(dat, x, id, time){
  # key each observation by "id.time"
  obs <- paste(dat[[id]], dat[[time]], sep=".")
  # key of the observation one period earlier
  lagobs <- paste(dat[[id]], dat[[time]] - 1, sep=".")
  dat[match(lagobs, obs), x]
}
# add the lagged variables to each imputed dataset
for(i in seq_along(out$data)){
  out$data[[i]]$lagAI <- tscslag(out$data[[i]], "AI", "IDORIGIN", "YEAR")
  out$data[[i]]$lagPCGNP <- tscslag(out$data[[i]], "PCGNP", "IDORIGIN", "YEAR")
  out$data[[i]]$lagLPOP <- tscslag(out$data[[i]], "LPOP", "IDORIGIN", "YEAR")
}
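
To see what the key-matching approach returns, here is a small self-contained check on a made-up panel. The data frame toy and the helper name lag_by_match are hypothetical and exist only for this illustration; the helper uses the same idea as tscslag, which is to key each row by unit and period and then look up the key for the previous period.

```r
# Hypothetical toy panel: two units observed over three years.
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  year = c(2000, 2001, 2002, 2000, 2001, 2002),
  x    = c(10, 11, 12, 20, 21, 22)
)

# Key every row by "id.year", then look up the row keyed "id.(year - 1)".
lag_by_match <- function(dat, x, id, time) {
  obs    <- paste(dat[[id]], dat[[time]], sep = ".")
  lagobs <- paste(dat[[id]], dat[[time]] - 1, sep = ".")
  dat[match(lagobs, obs), x]
}

toy$lagx <- lag_by_match(toy, "x", "id", "year")

# The lookup works on keys, not row positions, so row order does not matter.
shuffled <- toy[c(6, 1, 5, 2, 4, 3), c("id", "year", "x")]
shuffled$lagx <- lag_by_match(shuffled, "x", "id", "year")

print(toy)
print(shuffled)
```

The first observation of each unit has no earlier period to match, so its lag is NA. Those rows are then dropped by lm when the lagged variable enters a model.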

Now we can use the lagged values of PCGNP and LPOP to create the percentage change variables. Note that the code stores these as proportions (the change divided by the lagged value) rather than multiplying by 100:

for(i in seq_along(out$data)){
  out$data[[i]]$pctchgPCGNP <- with(out$data[[i]], (PCGNP-lagPCGNP)/lagPCGNP)
  out$data[[i]]$pctchgLPOP <- with(out$data[[i]], (LPOP-lagLPOP)/lagLPOP)
}

Using MIcombine

You can use the MIcombine function from the mitools package to generate inferences. To do so, you fit the model to each imputed dataset, collect the fitted models in a list, and let the function combine the results across imputations.

# convert the hot.deck output to Amelia format, which stores the datasets in out$imputations
out <- hd2amelia(out)
# initialize list to hold one fitted model per imputed dataset
results <- list()
# loop over imputed datasets
for(i in seq_along(out$imputations)){
  results[[i]] <- lm(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
    BRIT + POLRT + CWARCOW + IWARCOW2, data=out$imputations[[i]])
}
summary(mitools::MIcombine(results))
#> Multiple imputation results:
#>       MIcombine.default(results)
#>                   results           se        (lower        upper) missInfo
#> (Intercept)  5.210914e-01 1.315694e-01  0.2628351910  7.793476e-01      7 %
#> lagAI        4.615741e-01 3.348799e-02  0.3814342622  5.417140e-01     82 %
#> pctchgPCGNP  5.409855e-03 3.961998e-03 -0.0027079073  1.352762e-02     42 %
#> PCGNP       -2.143957e-05 3.775037e-06 -0.0000293615 -1.351764e-05     52 %
#> pctchgLPOP  -7.594087e-01 8.597509e-01 -2.5209219788  1.002105e+00     42 %
#> LPOP         7.417993e-02 1.002792e-02  0.0537453200  9.461454e-02     39 %
#> MIL2         1.160802e-01 4.587761e-02  0.0203072668  2.118531e-01     50 %
#> LEFT        -1.472793e-01 4.402444e-02 -0.2340071121 -6.055140e-02     14 %
#> BRIT        -1.324853e-01 4.721912e-02 -0.2359887279 -2.898194e-02     65 %
#> POLRT       -6.760555e-02 1.179953e-02 -0.0929377267 -4.227338e-02     59 %
#> CWARCOW      6.295949e-01 7.565076e-02  0.4658645456  7.933253e-01     62 %
#> IWARCOW2     2.064528e-01 5.751963e-02  0.0921742028  3.207314e-01     23 %
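
For intuition about what the pooled standard errors above contain, the sketch below applies Rubin's rules by hand in base R: the pooled estimate is the mean of the per-imputation coefficients, and the total variance is T = W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance. Everything here is a toy under stated assumptions: the data are simulated, and the random-draw fill-in is a crude stand-in for an imputation routine, not the hot.deck algorithm and not the internals of MIcombine.

```r
set.seed(42)

# Hypothetical toy data: y depends on x, and 30% of x is set to missing.
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
x_mis <- x
x_mis[sample(n, 60)] <- NA

# Crude stand-in for an imputation routine: fill each gap with a randomly
# drawn observed value of x. This is NOT the hot.deck algorithm.
m <- 5
fits <- lapply(seq_len(m), function(i) {
  x_imp <- x_mis
  gaps <- is.na(x_imp)
  x_imp[gaps] <- sample(x_mis[!gaps], sum(gaps), replace = TRUE)
  lm(y ~ x_imp)
})

# Rubin's rules, one coefficient per row.
est    <- sapply(fits, coef)                          # coefficients, k x m
within <- sapply(fits, function(f) diag(vcov(f)))     # squared SEs, k x m
qbar   <- rowMeans(est)                               # pooled point estimates
W      <- rowMeans(within)                            # within-imputation variance
B      <- apply(est, 1, var)                          # between-imputation variance
total  <- W + (1 + 1 / m) * B                         # total variance

pooled <- cbind(estimate = qbar, se = sqrt(total))
print(pooled)
```

The between-imputation term B is what separates these standard errors from those of a single completed dataset: the more the coefficients move across imputations, the larger the pooled standard error becomes.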

Using mids

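The second method for combining results is to convert the list of imputed datasets (here, out$imputations from the previous section) to an object of class mids. This can be done with the datalist2mids function from the miceadds package. As the warning in the output below indicates, mice now recommends with(imp, lm(yourmodel)) in place of lm.mids.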

out.mids <- miceadds::datalist2mids(out$imputations)
#> Warning: Number of logged events: 1
s <- summary(mice::pool(mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
BRIT + POLRT + CWARCOW + IWARCOW2, data=out.mids)))
#> Warning in mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + : Use with(imp,
#> lm(yourmodel)).
print(s, digits=4)
#>           term   estimate std.error statistic      df   p.value
#> 1  (Intercept)  5.055e-01 1.446e-01   3.49564  64.859 8.584e-04
#> 2        lagAI  4.811e-01 1.963e-02  24.51022  31.149 6.356e-22
#> 3  pctchgPCGNP  1.028e-03 4.553e-03   0.22570   7.824 8.272e-01
#> 4        PCGNP -2.097e-05 3.609e-06  -5.81007  19.776 1.151e-05
#> 5   pctchgLPOP  9.813e-02 1.135e+00   0.08646   8.480 9.331e-01
#> 6         LPOP  7.011e-02 9.799e-03   7.15549  34.419 2.650e-08
#> 7         MIL2  1.235e-01 4.491e-02   2.75014  20.801 1.206e-02
#> 8         LEFT -1.445e-01 4.935e-02  -2.92779  38.532 5.703e-03
#> 9         BRIT -1.274e-01 4.582e-02  -2.78066  11.995 1.664e-02
#> 10       POLRT -6.084e-02 9.210e-03  -6.60552  63.748 9.251e-09
#> 11     CWARCOW  6.174e-01 6.989e-02   8.83451  16.315 1.281e-07
#> 12    IWARCOW2  2.014e-01 5.271e-02   3.82115 540.096 1.483e-04
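
The warning above recommends with(imp, lm(yourmodel)) over lm.mids. A minimal sketch of that pattern follows, assuming the mice package is installed. To keep the example self-contained it uses the nhanes data that ships with mice and imputes it with mice itself, so the object imp and the model formula are illustrative only and are not the hot.deck results from this document.

```r
library(mice)

# Illustrative mids object built from mice's bundled nhanes data.
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)

# Fit the model within each imputed dataset, then pool across imputations.
fit <- with(imp, lm(bmi ~ age + chl))
pooled <- pool(fit)
print(summary(pooled), digits = 4)
```

With the objects from this document, the same pattern would be to pass out.mids as the first argument to with() and use the AI model formula inside lm().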

References

Poe, Steven, C. Neal Tate, and Linda Camp Keith. 1999. “Repression of the Human Right to Personal Integrity Revisited: A Global, Cross-National Study Covering the Years 1976-1993.” International Studies Quarterly 43: 291–313.
