Skip to contents

Semicontinuous outcome regression models

resid_semiconti() is used for calculating the DPIT residuals for regression models with semicontinuous outcomes and constructing corresponding QQ-plots. Specifically, a Tobit regression and a Tweedie regression model are suitable models for resid_semiconti(). The suitable model objects are as follows:

We simulate y1 to follow Tweedie distribution depending on covariates x11 and x12.

## Tweedie model
library(assessor)
library(tweedie)
library(statmod)
n <- 500
x11 <- rnorm(n)
x12 <- rnorm(n)
beta0 <- 5
beta1 <- 1
beta2 <- 1
lambda1 <- exp(beta0 + beta1 * x11 + beta2 * x12)
y1 <- rtweedie(n, mu = lambda1, xi = 1.6, phi = 10)

In constructing model2, the intentional omission of the covariate x12 was aimed at facilitating a direct comparison with the true model, model1. As expected, the QQ plot in the right panel corresponding to model2 exhibits a substantial deviation from the diagonal line, attributing this deviation to the deliberate omission. Conversely, the left panel shows a closer alignment along the diagonal, implying a better fit when including both covariates, x11 and x12. This outcome strongly suggests that incorporating all covariates results in a more appropriate and improved model.

# True model
model1 <-
  glm(y1 ~ x11 + x12,
    family = tweedie(var.power = 1.6, link.power = 0)
  )

# missing covariate
model2 <-  glm(y1 ~ x11 ,
    family = tweedie(var.power = 1.6, link.power = 0)
  )

par(mfrow=c(1,2))
resid1 <- resid_semiconti(model1)
resid2 <- resid_semiconti(model2)

resid_semiconti() function supports calculating DPIT residuals for a Tobit regression from both VGAM::vglm and AER::tobit packages.

In this example, we assume that the latent variable \(Y^*\) follows a normal distribution with a mean given by \[\mu=\beta_0+\beta_1X_1+\beta_2X_2, \] where \(X_1, X_2 \sim Unif (0, 1)\) independently, and \(\beta_0 = 1, \beta_1 = -3, \beta_2 = 3\). We observe \(Y=0\) if \(Y^*<0\). These variables will be employed as inputs for Tobit regression analyses provided by the VGAM and AER packages.

## Tobit regression model
library(VGAM)
beta13 <- 1
beta14 <- -3
beta15 <- 3

set.seed(123)
x11 <- runif(n)
x12 <- runif(n)
lambda1 <- beta13 + beta14 * x11 + beta15 * x12
sd0 <- 0.3
yun <- rnorm(n, mean = lambda1, sd = sd0)
y <- ifelse(yun >= 0, yun, 0)

The model fit1miss, corresponding to the right QQ plot, intentionally omits the covariate x12. This omission leads to a deviation from the diagonal line in the QQ plot, observed in the right panel. Thus, it can be interpreted as the model misspecification. In contrast, the left panel, corresponding to the model including all covariates, aligns closely along the diagonal line.

# Using VGAM package
# True model
fit1 <- vglm(formula = y ~ x11 + x12, tobit(Upper = Inf, Lower = 0, lmu = "identitylink"))
# Missing covariate
fit1miss <- vglm(formula = y ~ x11, tobit(Upper = Inf, Lower = 0, lmu = "identitylink"))

par(mfrow=c(1,2))
resid1 <- resid_semiconti(fit1, plot = TRUE)
resid2 <- resid_semiconti(fit1miss, plot = TRUE)

The interpretation remains the same as the VGAM example. Note that the results from the AER are exactly the same as those from the VGAM example.

# Using AER package
library(AER)
# True model
fit2 <- tobit(y ~ x11 + x12, left = 0, right = Inf, dist = "gaussian")
# Missing covariate
par(mfrow=c(1,2))
fit2miss <- tobit(y ~ x11, left = 0, right = Inf, dist = "gaussian")
reisd1 <- resid_semiconti(fit2, plot = TRUE)
resid2 <- resid_semiconti(fit2miss, plot = TRUE)