Assessing the mean structure
ord_curve()
creates a plot to assess the mean structure
of regression models. The plot compares the cumulative sum of the
response variable and its hypothesized value. Deviation from the
diagonal suggests the possibility that the mean structure of the model
is incorrect.
Ordered curve method is not restricted to a discrete outcome
regression model. This function also supports a continuous outcome
regression such as lm
object as well as glm
,
glm.nb
, or polr
.
In the example below, the underlying model is a logistic regression with the probability of 1 as \(\mathrm{logit}^{-1}(\beta_0+\beta_1 X_1+\beta_2 X_2+\beta_3X_1 X_2)\), where \((\beta_0,\beta_1,\beta_2,\beta_3)=(-5,2,1,3)\), \(X_1\sim N(1,1)\), and \(X_2\) is a dummy variable with a probability of one equal to 0.7. For the misspecified model, the binary covariate and the interaction term are omitted. #### Example
library(assessor)
## Binary example of ordered curve
n <- 500
set.seed(1234)
x1 <-rnorm(n,1,1); x2 <- rbinom(n,1,0.7)
beta0 <- -5; beta1 <- 2; beta2<- 1; beta3 <- 3
q1 <-1/(1+exp(beta0+beta1*x1+beta2*x2+beta3*x1*x2))
y1 <- rbinom(n,size=1,prob = 1-q1)
par(mfrow=c(1,2))
model0 <- glm(y1~x1*x2,family =binomial(link = "logit") )
ord_curve(model0,thr=model0$fitted.values)
model1 <- glm(y1~x1,family =binomial(link = "logit") )
ord_curve(model1,thr=x2)
The figures above illustrate ordered curves plots corresponding to
model0
and model1
. In the left panel, the
curve closely aligns with the diagonal line, indicating that the mean
structure of model0
is correctly specified. On the
contrary, model1
exhibits a deviation from the diagonal
line due to the omission of the variable \(x_2\). This misspecification, coupled with
the choice of the threshold value as \(x_2\), results in the observed discrepancy
in the ordered curve in the right panel.
The substantial disparity between the observed curve in
model1
and the diagonal line strongly suggests the
necessity of including the variable \(x_2\) in the model. This inclusion is
crucial for accurately capturing the underlying mean structure.