Discrete outcome regression
resid_quasi()
generates QQ-plots for regression models
with discrete outcomes, employing quasi-empirical residual distribution
functions. It’s tailored to assess model assumptions in GLMs featuring
binary, ordinal, Poisson, negative binomial, zero-inflated Poisson, and
zero-inflated negative binomial outcomes. Unlike typical functions in
assessor
package, resid_quasi()
exclusively
focuses on plotting QQ-plots and does not compute DPIT residuals.
- Negative binomial,
MASS::glm.nb()
- Poisson,
glm(formula, family=poisson(link="log"))
- Binary,
glm(formula, family=binomial(link="logit"))
- Ordinal,
MASS::polr()
- Zero-Inflated Poisson,
pscl::zeroinfl(dist = "poisson")
- Zero-Inflated negative binomial,
pscl::zeroinfl(dist = "negbin")
The tabs below explain how to interpret the QQ-plots generated by resid_quasi() in Poisson and Zero-inflated Poisson examples, respectively.
We simulate a Poisson random variable using covariates \(X_1\) and \(X_2\). The true mean of \(Y\) is intricately connected to both \(X_1\) and \(X_2\), as expressed in the ensuing relationship: \[ Y \sim \text{Poisson}(\lambda = \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2)), \] where \(\beta_0=-2,~\beta_1=2,~\beta_2=1\).
library(assessor)
## Poisson example
n <- 500
set.seed(1234)
# Covariates
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.7)
# Coefficients
beta0 <- -2
beta1 <- 2
beta2 <- 1
lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
y <- rpois(n, lambda1)
# True model
poismodel1 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
resid_quasi(poismodel1)
#> Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 /Multistart 1 of 1 |Multistart 1 of 1 |
The figure presented above represents the results of
poismodel1
, which is a GLM fitted with a Poisson
distribution. In this context, the variable y
follows a
Poisson distribution as defined in the model. Our expectation was that
the QQ-plot would exhibit alignment with diagonal lines, as this
alignment indicates conformity with the assumptions of the Poisson
distribution for discrete outcome regression.
As anticipated, the result indeed demonstrates a well-aligned QQ plot, closely following the diagonal line. This alignment is indicative of the correctness of our model assumption regarding the distribution of the outcome variable. In simpler terms, it suggests that our model is appropriately capturing the characteristics of the data, specifically the discrete nature of the outcomes, as dictated by the Poisson distribution.
We generate simulated data using a zero-inflated Poisson model. The probability of excess zeros is modeled using \(\mathrm{logit}(p_0) = \beta_{00} + \beta_{10}X_1\), while the Poisson component has a mean of \(\lambda = \exp(\beta_0 +\beta_1X_1 +\beta_2X_2)\). Here, \(X_1\) follows a normal distribution with mean 0 and standard deviation 1, and \(X_2\) is a binary variable with a probability of 1 set to 0.7. The parameter values are set to \(( \beta_{00} ,\beta_{10}, \beta_0, \beta_1, \beta_2) = (-2, 2, -2, 2, 1)\).
## Zero-Inflated Poisson
library(assessor)
library(pscl)
n <- 500
set.seed(1234)
# Covariates
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.7)
# Coefficients
beta0 <- -2
beta1 <- 2
beta2 <- 1
beta00 <- -2
beta10 <- 2
# Mean of Poisson part
lambda1 <- exp(beta0 + beta1 * x1 + beta2 * x2)
# Excess zero probability
p0 <- 1 / (1 + exp(-(beta00 + beta10 * x1)))
## simulate outcomes
y0 <- rbinom(n, size = 1, prob = 1 - p0)
y1 <- rpois(n, lambda1)
y <- ifelse(y0 == 0, 0, y1)
par(mfrow=c(1,2))
## True model
modelzero1 <- zeroinfl(y ~ x1 + x2 | x1, dist = "poisson", link = "logit")
resid_quasi(modelzero1)
#> Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 /Multistart 1 of 1 |Multistart 1 of 1 |
## Zero inflation
modelzero2 <- glm(y ~ x1 + x2, family = poisson(link = "log"))
resid_quasi(modelzero2)
#> Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 |Multistart 1 of 1 /Multistart 1 of 1 |Multistart 1 of 1 |
The QQ plots shown above correspond to modelzero1
and
modelzero2
. Given that the true distribution of
y
follows a zero-inflated Poisson distribution, we expect
to see deviations from the diagonal line in the QQ plot of
modelzero2
. As anticipated, the QQ plot on the left closely
follows the diagonal line. However, in the right panel, both the left
and right tails of the QQ plot for modelzero2
deviate from
the diagonal line, indicating that the assumption of a Poisson
distribution may not be accurate.
The observed differences in the QQ plot of modelzero2
suggest that the assumption of a Poisson distribution is not
well-supported by the data. This underscores the importance of
considering alternative distributional assumptions, such as the
zero-inflated Poisson distribution, which may better capture the
characteristics of the simulated data.