\(Var(\epsilon|X)=\sigma^2 I\) homoskedasticita
\(Var(\epsilon|X)=\sigma^2 \Omega\) heteroskedasticita
\(Var(\epsilon|X)=\sigma^2 h(X)\)
Pro jednodusí predstavu uvazujme prímkovou regresi
\(Var(\epsilon|x)=\sigma^2 h(x)\) kde \(h(x)\) predstavuje funkci “typ” heteroskedasticity. Rozptyl musí být vzdy nezáporný!
Otázkou je jak vypadá funkce \(h(x)\)
\(wage=\beta_0+\beta_1 IQ+\beta_2 expert+\beta_3 educ+\epsilon\)
set.seed(100)
N=10000
IQ=rnorm(N,100,5)
expert=as.numeric(runif(N, min=0, max=30))
educ = as.numeric(runif(N, min=9, max=20))
sigma=2*(exp(educ))^0.5 # tvar podmíneného rozptylu, rozptyl závisí na jednom z regresoru
e4= rnorm(N, 0, sigma)
wage = -50 +10*IQ+ 50*educ+3*expert + e4
populace = data.frame(IQ,expert, educ, wage)
Jen pro zajímavost si odhadneme model.
regrese_hetero=lm(wage~IQ+educ+expert,data = populace) # provedeme odhad regresní rovnice, nyní nedeláme výber
summary(regrese_hetero)
##
## Call:
## lm(formula = wage ~ IQ + educ + expert, data = populace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -126807 -1462 -181 1790 97841
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2571.13 2680.08 -0.959 0.3374
## IQ 56.01 26.07 2.149 0.0317 *
## educ -82.85 40.88 -2.027 0.0427 *
## expert -23.64 14.96 -1.580 0.1142
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12960 on 9996 degrees of freedom
## Multiple R-squared: 0.001115, Adjusted R-squared: 0.0008156
## F-statistic: 3.721 on 3 and 9996 DF, p-value: 0.01091
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
bptest(regrese_hetero)
##
## studentized Breusch-Pagan test
##
## data: regrese_hetero
## BP = 1405.5, df = 3, p-value < 2.2e-16
Zásadním predpokladem GSL odhadu je \(\bf{znalostost}\) tvaru heteroskedasticity!
\(Var(\epsilon|X)=4\times e^{educ}\)
\(h(X)=e^{educ}\)
\(\dfrac{wage}{\sqrt{e^{educ}}}=\dfrac{\beta_0}{\sqrt{e^{educ}}}+\beta_1 \dfrac{IQ}{\sqrt{e^{educ}}}+\beta_2 \dfrac{expert}{\sqrt{e^{educ}}}+\beta_3 \dfrac{educ}{\sqrt{e^{educ}}}+\dfrac{\epsilon}{\sqrt{e^{educ}}}\)
\(Var(\dfrac{\epsilon}{\sqrt{e^{educ}}})=\dfrac{1}{e^{educ}} \times Var(\epsilon) =\dfrac{1}{e^{educ}} \times 4\times e^{educ}=4\)
attach(populace)
## The following objects are masked _by_ .GlobalEnv:
##
## educ, expert, IQ, wage
wage_W=wage/sqrt(exp(educ))
konstanta=cbind(rep(1,length(educ)))/sqrt(exp(educ))
IQ_W=IQ/sqrt(exp(educ))
expert_W=expert/sqrt(exp(educ))
educ_W=educ/sqrt(exp(educ))
Vsimnete si, co nám zpusobila silná heterokedasticita.
regrese_hetero2=lm(wage_W~0+konstanta+IQ_W+educ_W+expert_W,data = populace)
summary(regrese_hetero2)
##
## Call:
## lm(formula = wage_W ~ 0 + konstanta + IQ_W + educ_W + expert_W,
## data = populace)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.0748 -1.3715 -0.0315 1.3475 7.2975
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## konstanta 80.5596 136.4402 0.590 0.55491
## IQ_W 8.5704 1.2012 7.135 1.04e-12 ***
## educ_W 51.7456 6.0206 8.595 < 2e-16 ***
## expert_W 2.6090 0.6935 3.762 0.00017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.016 on 9996 degrees of freedom
## Multiple R-squared: 0.8604, Adjusted R-squared: 0.8604
## F-statistic: 1.54e+04 on 4 and 9996 DF, p-value: < 2.2e-16
bptest(regrese_hetero2)
##
## studentized Breusch-Pagan test
##
## data: regrese_hetero2
## BP = 1.3732, df = 3, p-value = 0.7118
vaha=1/exp(educ) # Pozor na zadání váh!!
regrese_gls=lm(wage~IQ+educ+expert,weights = vaha)
summary(regrese_gls)
##
## Call:
## lm(formula = wage ~ IQ + educ + expert, weights = vaha)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -8.0748 -1.3715 -0.0315 1.3475 7.2975
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 80.5596 136.4402 0.590 0.55491
## IQ 8.5704 1.2012 7.135 1.04e-12 ***
## educ 51.7456 6.0206 8.595 < 2e-16 ***
## expert 2.6090 0.6935 3.762 0.00017 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.016 on 9996 degrees of freedom
## Multiple R-squared: 0.01332, Adjusted R-squared: 0.01302
## F-statistic: 44.98 on 3 and 9996 DF, p-value: < 2.2e-16
Co delat, kdyz nevíme jak vypadá h(X)?
\(\textbf{Castecným}\) resením je FGLS. V rámci FGLS se snazíme odhadnout tvar heteroskedasticity, tedy tvar funkce \(h(X)\). Mozností je nekonecno, proto muzeme pouzít napríklad predpoklad podle White:
\(Var(\epsilon|X)=\sigma^2 exp(\gamma_0+\gamma_1 IQ+\gamma_2 expert+\gamma_3 educ)\)
\(\epsilon^2=\sigma^2 exp(\gamma_0+\gamma_1 IQ+\gamma_2 expert+\gamma_3 educ)u\)
\(\log{(\epsilon^2)}=\alpha_0+\gamma_0+\gamma_1 IQ+\gamma_2 expert+\gamma_3 educ+u\)
regrese1=lm(wage~IQ+educ+expert)
residua=regrese1$resid
le2=log(residua^2)
\(\log{(\epsilon^2)}=\alpha_0+\gamma_0+\gamma_1 IQ+\gamma_2 expert+\gamma_3 educ+u\)
regrese2=lm(le2~IQ+educ+expert)
nafit_hodnoty=regrese2$fitted.values
h=exp(nafit_hodnoty)
vahy2=1/h
regrese3=lm(wage~IQ+educ+expert,weights = vahy2)
summary(regrese3)
##
## Call:
## lm(formula = wage ~ IQ + educ + expert, weights = vahy2)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -9.9032 -1.0808 -0.0222 1.0471 7.0005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.8147 189.7879 0.331 0.7407
## IQ 8.8137 1.7241 5.112 3.25e-07 ***
## educ 51.4228 7.4101 6.940 4.18e-12 ***
## expert 2.4024 0.9957 2.413 0.0159 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.678 on 9996 degrees of freedom
## Multiple R-squared: 0.007752, Adjusted R-squared: 0.007455
## F-statistic: 26.03 on 3 and 9996 DF, p-value: < 2.2e-16
bptest(regrese3)
##
## studentized Breusch-Pagan test
##
## data: regrese3
## BP = 1405.5, df = 3, p-value < 2.2e-16
Metoda FGLS vám neodstraní heteroskedasticitu! Neznáte její tvar, pouze vám pomuze získat asymptoticky vydatnejsí odhady parametru. Musíte proto vzdy pouzít i robustní odhad chyb!
library(sandwich)
## Warning: package 'sandwich' was built under R version 3.2.5
coeftest(regrese3,vcovHC)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.81472 135.38014 0.4640 0.6426667
## IQ 8.81368 1.18655 7.4280 1.193e-13 ***
## educ 51.42277 6.41880 8.0113 1.262e-15 ***
## expert 2.40235 0.71471 3.3613 0.0007786 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Nejsem uchalák, toto je príklad jak z clánku, tak z knihy od Greena! V clanku autor zkoumal rozhodování o práci a volném casu. viz. Fairs (1977) Extramarital Affairs
affairs=read.table("affairs.txt",header = TRUE)
regrese_aff=lm(Y~Z1+Z2+Z3+Z4+Z5+Z6+Z7+Z8, data = affairs)
summary(regrese_aff)
##
## Call:
## lm(formula = Y ~ Z1 + Z2 + Z3 + Z4 + Z5 + Z6 + Z7 + Z8, data = affairs)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0503 -1.7226 -0.7947 0.2101 12.7036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.87201 1.13750 5.162 3.34e-07 ***
## Z1 0.05409 0.30049 0.180 0.8572
## Z2 -0.05098 0.02262 -2.254 0.0246 *
## Z3 0.16947 0.04122 4.111 4.50e-05 ***
## Z4 -0.14262 0.35020 -0.407 0.6840
## Z5 -0.47761 0.11173 -4.275 2.23e-05 ***
## Z6 -0.01375 0.06414 -0.214 0.8303
## Z7 0.10492 0.08888 1.180 0.2383
## Z8 -0.71188 0.12001 -5.932 5.09e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.095 on 592 degrees of freedom
## Multiple R-squared: 0.1317, Adjusted R-squared: 0.12
## F-statistic: 11.23 on 8 and 592 DF, p-value: 7.472e-15
bptest(regrese_aff)
##
## studentized Breusch-Pagan test
##
## data: regrese_aff
## BP = 56.188, df = 8, p-value = 2.593e-09
residua=regrese_aff$resid
le2=log(residua^2)
regrese_aff2=lm(le2~Z1+Z2+Z3+Z4+Z5+Z6+Z7+Z8, data = affairs)
nafit_hodnoty=regrese_aff2$fitted.values
h=exp(nafit_hodnoty)
vahy2=1/h
regrese_aff3=lm(Y~Z1+Z2+Z3+Z4+Z5+Z6+Z7+Z8,data = affairs,weights = vahy2)
Nesmíme vsak zapomenout na robustní odhad chyb. FGLS neodstraní heteroskedasticitu, jen zmensí její dopad.
coeftest(regrese_aff3, vcov = vcovHC(regrese_aff3, type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.906047 1.204729 1.5821 0.114152
## Z1 0.085088 0.230422 0.3693 0.712057
## Z2 0.045144 0.049417 0.9135 0.361334
## Z3 0.028811 0.037210 0.7743 0.439064
## Z4 -0.355230 0.421631 -0.8425 0.399841
## Z5 -0.289497 0.082068 -3.5275 0.000452 ***
## Z6 -0.025546 0.059936 -0.4262 0.670100
## Z7 -0.010622 0.057912 -0.1834 0.854540
## Z8 -0.240043 0.145419 -1.6507 0.099329 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A porovnáme s modelem bez odhadu FGLS.
coeftest(regrese_aff, vcov = vcovHC(regrese3, type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.872 135.134 0.0435 0.9654