This document will walk you
through some of the methods you could use to generate pooled model
results that account for both sampling variability and across imputation
variability. The package hot.deck
does not come with a set
of functions to do inference, so we will show you how you could use the
data generated by hot.deck
in combination with
glm.mids
(and similarly lm.mids
) from the
mice
package, zelig
from the
Zelig
package and by using MIcombine
from the
mitools
package on a list of model objects.
The data we will use come from Poe, Tate, and
Keith (1999) dealing with democracy and state repression. First
we need to call the hot.deck
routine on the dataset.
library(hot.deck)
data(isq99)
out <- hot.deck(isq99, sdCutoff=3, IDvars = c("IDORIGIN", "YEAR"))
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 52 observations with no observed data. These observations were removed
#> Warning in hot.deck(isq99, sdCutoff = 3, IDvars = c("IDORIGIN", "YEAR")): 45 of 4661 imputations with # donors < 5, consider increasing sdCutoff or using method='p.draw'
This shows us that there are still 45 observations with fewer than 5
donors. Using a different method or further widening the
sdCutoff
parameter may alleviate the problem. If you want
to see the frequency distribution of the number of donors, you could
look at:
numdonors <- sapply(out$donors, length)
numdonors <- sapply(out$donors, length)
numdonors <- ifelse(numdonors > 5, 6, numdonors)
numdonors <- factor(numdonors, levels=1:6, labels=c(1:5, ">5"))
table(numdonors)
#> numdonors
#> 1 2 3 4 5 >5
#> 18 10 11 6 20 4596
Before running a model, three variables have to be created from those
existing. Generally, if variables are deterministic functions of other
variables (e.g., transformations, lags, etc…) it is advisable to impute
the constituent variables of the calculations and then do the
calculations after the fact. Here, we need to lag the AI
variable and create percentage change variables for both population and
per-capita GNP. First, to create the lag of AI
,
PCGNP
and LPOP
. To do this, we will make a
little function.
tscslag <- function(dat, x, id, time){
obs <- apply(dat[, c(id, time)], 1, paste, collapse=".")
tm1 <- dat[[time]] - 1
lagobs <- apply(cbind(dat[[id]], tm1), 1, paste, collapse=".")
lagx <- dat[match(lagobs, obs), x]
}
for(i in 1:length(out$data)){
out$data[[i]]$lagAI <- tscslag(out$data[[i]], "AI", "IDORIGIN", "YEAR")
out$data[[i]]$lagPCGNP <- tscslag(out$data[[i]], "PCGNP", "IDORIGIN", "YEAR")
out$data[[i]]$lagLPOP <- tscslag(out$data[[i]], "LPOP", "IDORIGIN", "YEAR")
}
Now, we can use the lagged values of PCGNP
and
LPOP
, to create percentage change variables:
for(i in 1:length(out$data)){
out$data[[i]]$pctchgPCGNP <- with(out$data[[i]], c(PCGNP-lagPCGNP)/lagPCGNP)
out$data[[i]]$pctchgLPOP <- with(out$data[[i]], c(LPOP-lagLPOP)/lagLPOP)
}
You can use the MIcombine
command from the
mitools
package to generate inferences, too. Here, you have
to produce a list of model estimates and the function will combine
across the different results.
# initialize list
out <- hd2amelia(out)
results <- list()
# loop over imputed datasets
for(i in 1:length(out$imputations)){
results[[i]] <- lm(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
BRIT + POLRT + CWARCOW + IWARCOW2, data=out$imputations[[i]])
}
summary(mitools::MIcombine(results))
#> Multiple imputation results:
#> MIcombine.default(results)
#> results se (lower upper) missInfo
#> (Intercept) 5.626680e-01 0.1462983352 2.709142e-01 8.544217e-01 26 %
#> lagAI 4.498087e-01 0.0199046606 4.092328e-01 4.903846e-01 39 %
#> pctchgPCGNP 4.734201e-03 0.0044497381 -5.370512e-03 1.483891e-02 73 %
#> PCGNP -2.150899e-05 0.0000029234 -2.725833e-05 -1.575965e-05 11 %
#> pctchgLPOP -6.431388e-01 0.9793532821 -2.695206e+00 1.408928e+00 51 %
#> LPOP 7.399631e-02 0.0092305867 5.560912e-02 9.238350e-02 25 %
#> MIL2 1.225367e-01 0.0388621773 4.514354e-02 1.999299e-01 25 %
#> LEFT -1.556586e-01 0.0429827574 -2.400559e-01 -7.126137e-02 8 %
#> BRIT -1.439648e-01 0.0315462265 -2.059059e-01 -8.202363e-02 8 %
#> POLRT -6.880233e-02 0.0093569454 -8.748514e-02 -5.011952e-02 27 %
#> CWARCOW 6.588045e-01 0.0631002226 5.300219e-01 7.875870e-01 40 %
#> IWARCOW2 2.101938e-01 0.0575523062 9.603315e-02 3.243545e-01 21 %
The final method for combining results is to convert the data object
returned by the hot.deck
function to an object of class
mids
. This can be done with the datalist2mids
function from the miceadds
package.
out.mids <- miceadds::datalist2mids(out$imputations)
#> Warning: Number of logged events: 1
s <- summary(mice::pool(mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + LPOP + MIL2 + LEFT +
BRIT + POLRT + CWARCOW + IWARCOW2, data=out.mids)))
#> Warning in mice::lm.mids(AI ~ lagAI + pctchgPCGNP + PCGNP + pctchgLPOP + : Use with(imp,
#> lm(yourmodel).
print(s, digits=4)
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 5.459e-01 1.460e-01 3.74036 69.650 3.737e-04
#> 2 lagAI 4.502e-01 1.959e-02 22.97789 32.289 1.405e-21
#> 3 pctchgPCGNP 3.236e-03 3.032e-03 1.06723 8.797 3.143e-01
#> 4 PCGNP -2.078e-05 2.755e-06 -7.54261 1982.117 6.973e-14
#> 5 pctchgLPOP 1.334e-01 1.734e+00 0.07697 8.377 9.405e-01
#> 6 LPOP 7.474e-02 9.543e-03 7.83253 47.270 4.366e-10
#> 7 MIL2 1.322e-01 3.824e-02 3.45613 92.456 8.296e-04
#> 8 LEFT -1.532e-01 4.329e-02 -3.53803 412.852 4.489e-04
#> 9 BRIT -1.507e-01 3.357e-02 -4.48917 107.537 1.803e-05
#> 10 POLRT -6.835e-02 9.369e-03 -7.29581 62.711 6.231e-10
#> 11 CWARCOW 6.631e-01 5.695e-02 11.64377 80.771 5.815e-19
#> 12 IWARCOW2 2.100e-01 6.214e-02 3.37926 39.811 1.637e-03