install.packages(“mlogit”)
library(mlogit)
\(p_{ij}=P(y_i=j)=\dfrac{\exp{w_i^T \gamma_j}}{\sum_{k=1}^{m}\exp{w_i^T \gamma_k}}\)
m - pocet mozností j=1,..,m
Jeden koeficient \(\gamma=0\), normalizujeme. Ziskame base-group, referencni kategorii, ke ktere budeme intepretovat odhady zbylych koeficientu.
Budeme odhadovat jak prijem ovlivni to, jak bude clovek rybarit
ryby=read.csv("C:/Users/Lukas/Desktop/multinomial_fishing1.csv")
attach(ryby)
Income v tis. USD se nemeni se zmenou typu rybareni - pouzijeme Multinomial logit model
Nejprve si upravie data. Nyni totiz vidime pouze to, jak chyta dany rybar.
Data si upravime do tzv. dlouheho formatu, abychom u kazdeho jedince meli i zaznam k ostatnim typum zavisle promenne. Jsou to samozrejme nuly.
ryby$mode=as.factor(ryby$mode) # typ rybareni musime zadefinovat jako faktor a nebo 0,1
mldata=mlogit.data(ryby, varying=4:15, choice="mode", shape="wide")
mldata[1:12,]
## mode price catchrate income mode1 alt d catch chid
## 1.beach FALSE 157.930 0.5391 7.083332 4 beach 0 0.0678 1
## 1.charter TRUE 182.930 0.5391 7.083332 4 charter 1 0.5391 1
## 1.pier FALSE 157.930 0.5391 7.083332 4 pier 0 0.0503 1
## 1.private FALSE 157.930 0.5391 7.083332 4 private 0 0.2601 1
## 2.beach FALSE 15.114 0.4671 1.250000 4 beach 0 0.1049 2
## 2.charter TRUE 34.534 0.4671 1.250000 4 charter 1 0.4671 2
## 2.pier FALSE 15.114 0.4671 1.250000 4 pier 0 0.0451 2
## 2.private FALSE 10.534 0.4671 1.250000 4 private 0 0.1574 2
## 3.beach FALSE 161.874 0.2413 3.750000 3 beach 0 0.5333 3
## 3.charter FALSE 59.334 0.2413 3.750000 3 charter 0 1.0266 3
## 3.pier FALSE 161.874 0.2413 3.750000 3 pier 0 0.4522 3
## 3.private TRUE 24.334 0.2413 3.750000 3 private 1 0.2413 3
porovnejte s puvodnim formatem ryby
Nejprve normalizujeme parametr pro “charter”"
mlogit.model1 <- mlogit(mode ~ 1 | income, data=mldata, reflevel="charter") # 1 je zde protoze tu nemame zadne promenne co se meni s variantou, treba cena rybareni
summary(mlogit.model1)
##
## Call:
## mlogit(formula = mode ~ 1 | income, data = mldata, reflevel = "charter",
## method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## charter beach pier private
## 0.38240 0.11337 0.15059 0.35364
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 8.32E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## beach:(intercept) -1.341291 0.194517 -6.8955 5.367e-12 ***
## pier:(intercept) -0.527141 0.177784 -2.9651 0.003026 **
## private:(intercept) -0.602371 0.136096 -4.4261 9.597e-06 ***
## beach:income 0.031640 0.041846 0.7561 0.449591
## pier:income -0.111763 0.043979 -2.5413 0.011046 *
## private:income 0.123546 0.027911 4.4265 9.577e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1477.2
## McFadden R^2: 0.013736
## Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)
S vyssim prijmem:
poroste pravdepodobnost, ze dany jedinec bude rybarit ze soukrome lode, nez z pujcene lode.
klesa pravdepodobnost, ze bude rybarit z mola, nez z pujcene lode.
Zmenime referencni promennou na “pier”
mlogit.model2 <- mlogit(mode ~ 1 | income, data = mldata, reflevel="pier")
summary(mlogit.model2)
##
## Call:
## mlogit(formula = mode ~ 1 | income, data = mldata, reflevel = "pier",
## method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## pier beach charter private
## 0.15059 0.11337 0.38240 0.35364
##
## nr method
## 4 iterations, 0h:0m:0s
## g'(-H)^-1g = 8.32E-07
## gradient close to zero
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## beach:(intercept) -0.814150 0.228632 -3.5610 0.0003695 ***
## charter:(intercept) 0.527141 0.177784 2.9651 0.0030262 **
## private:(intercept) -0.075229 0.183240 -0.4106 0.6814007
## beach:income 0.143403 0.053288 2.6911 0.0071223 **
## charter:income 0.111763 0.043979 2.5413 0.0110455 *
## private:income 0.235309 0.043668 5.3886 7.101e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1477.2
## McFadden R^2: 0.013736
## Likelihood ratio test : chisq = 41.145 (p.value = 6.0931e-09)
S vyssim prijmem poroste pravdepodobnost, ze dany jedinec vybere variantu rybareni beach, obou lodi nez z mola. Jinak receno s roustoucim prijem se stavaji uvedene 3 varianty pravdepodobnejsi, nez ze ho najdeme na mole.
POZOR znaménko koeficientu nutne nemusí korespondovat s marginalnim efektem!
Opet musime zvolit, zdali budeme pouzivat - marginal effects at mean (MEM) a average marginal effects (AME).
The marginal effect - zvyseni regresoru, zmeni pravdepodobnost vyberu alternativy j o:
\(\dfrac{\partial p_{ij}}{\partial w_i} = p_{ij}(\gamma_i-\bar{\gamma}_i)\)
m = mlogit(mode ~ price+catch |income, data = mldata, reflevel="charter")
z = with(mldata, data.frame(price = tapply(price, index(m)$alt, mean),catch = tapply(catch, index(m)$alt, mean), income=mean(income)))
Marginalni efekt
effects(mlogit.model1, covariate = "income", data = z)
## charter beach pier private
## -1.201367e-02 7.495845e-05 -2.065980e-02 3.259851e-02
effects(mlogit.model2, covariate = "income", data = z)
## pier beach charter private
## -2.065980e-02 7.495844e-05 -1.201367e-02 3.259851e-02
Interpretace v procentech, ale pozor na jednotky regresoru.
S rustem Income o jednotku (1000 USD) dojde k:
prob. vyberu charter klesne o 1.2%
prob. vyberu pier klesne o 2%
prob. vyberu private vzroste o 3.2%
Nyni si do regrese zahrneme i promenne, ktere se budou menit s variantou. Budou to cena rybareni a “catch rate”
\(p_{ij}=P(y_i=j)=\dfrac{\exp{(x_{it}^T \beta + w_i^T \gamma_j)}}{\sum_{k=1}^{m}\exp{(x_{it}^T \beta+w_i^T \gamma_k)}}\)
VSIMNETE SI, ZE \(\beta\) NEMA ZADNY INDEX. NEMENI SE S VARIANTOU
Conditional logit coefficients
Opet zacneme s interpretaci vzhledem k referencni kategorii “charter”.
clogit.model1 <- mlogit(mode ~ price+catch |income, data = mldata, reflevel="charter")
summary(clogit.model1)
##
## Call:
## mlogit(formula = mode ~ price + catch | income, data = mldata,
## reflevel = "charter", method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## charter beach pier private
## 0.38240 0.11337 0.15059 0.35364
##
## nr method
## 7 iterations, 0h:0m:0s
## g'(-H)^-1g = 1.37E-05
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## beach:(intercept) -1.6943657 0.2240506 -7.5624 3.952e-14 ***
## pier:(intercept) -0.9164063 0.2072648 -4.4214 9.805e-06 ***
## private:(intercept) -1.1670869 0.1590475 -7.3380 2.169e-13 ***
## price -0.0251166 0.0017317 -14.5042 < 2.2e-16 ***
## catch 0.3577820 0.1097733 3.2593 0.001117 **
## beach:income 0.0332917 0.0503409 0.6613 0.508403
## pier:income -0.0942854 0.0500600 -1.8834 0.059640 .
## private:income 0.1227315 0.0286306 4.2867 1.813e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1215.1
## McFadden R^2: 0.18868
## Likelihood ratio test : chisq = 565.17 (p.value = < 2.22e-16)
NEMENNE PROMENNE
Ve srovnani s charter fishing, tak s rostoucim prijmem:
roste prob. privatni lode
klesa pravdepodobnost na mole
VARIABILNI PROMENNE
s rostouci cenou alternativy, klesa pravdepodobnost, ze bude vybrana tato varianta a roste pravdepodobnost, ze bude vybrana jina varianta.
S rostouci “catch rate” alternativy, je naopak pravdepodobnejsi, ze bude zvolena alternativa.
Nyni bude referenci kategorie “prier”.
clogit.model2 <- mlogit(mode ~ price+catch | income, data = mldata, reflevel="pier")
summary(clogit.model2)
##
## Call:
## mlogit(formula = mode ~ price + catch | income, data = mldata,
## reflevel = "pier", method = "nr", print.level = 0)
##
## Frequencies of alternatives:
## pier beach charter private
## 0.15059 0.11337 0.38240 0.35364
##
## nr method
## 7 iterations, 0h:0m:0s
## g'(-H)^-1g = 1.37E-05
## successive function values within tolerance limits
##
## Coefficients :
## Estimate Std. Error t-value Pr(>|t|)
## beach:(intercept) -0.7779594 0.2204939 -3.5283 0.0004183 ***
## charter:(intercept) 0.9164063 0.2072648 4.4214 9.805e-06 ***
## private:(intercept) -0.2506806 0.2039395 -1.2292 0.2190004
## price -0.0251166 0.0017317 -14.5042 < 2.2e-16 ***
## catch 0.3577820 0.1097733 3.2593 0.0011170 **
## beach:income 0.1275771 0.0506395 2.5193 0.0117582 *
## charter:income 0.0942854 0.0500600 1.8834 0.0596396 .
## private:income 0.2170169 0.0500582 4.3353 1.456e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log-Likelihood: -1215.1
## McFadden R^2: 0.18868
## Likelihood ratio test : chisq = 565.17 (p.value = < 2.22e-16)
NEMENNE PROMENNE
Ve srovnani s olem, tak s rostoucim prijmem:
VARIABILNI PROMENNE
Nemeni se, hodnoty zustavaji stejne.
\(\dfrac{\partial p_{ij}}{\partial x_{ik}} = p_{ij}(\delta_{ijk}-p_{ik})\beta\)
Budeme zkoumat, marginalni efekt jednotlivých regresoru na závisle promennou.
effects(clogit.model1, covariate = "income", data = z)
## charter beach pier private
## -0.0217339246 -0.0007214181 -0.0093059734 0.0317613161
Kdyz vzroste income o (1000 USD), tak prob. výberu private vzroste o 3.2%.
effects(clogit.model1, covariate = "price", data = z)
## charter beach pier private
## charter -0.0062430047 6.091542e-04 7.642235e-04 0.0048696270
## beach 0.0006091541 -1.249124e-03 8.681094e-05 0.0005531588
## pier 0.0007642234 8.681094e-05 -1.545008e-03 0.0006939736
## private 0.0048696270 5.531588e-04 6.939737e-04 -0.0061167595
Kdyz vzroste cena charter o 1 USD, tak klesne prob. vyberu charter o 6.24%. Kdyz vzroste cena beach o 1 USD, tak vzroste prob. vyberu charter o 0.6%. Kdyz vzroste cena pier o 1 USD, tak vzroste prob. vyberu charter o 0.07%. Kdyz vzroste cena private o 1 USD, tak vzroste prob. vyberu charter o 0.04%.
effects(clogit.model1, covariate = "catch", data = z)
## charter beach pier private
## charter 0.088930726 -0.008677316 -0.010886256 -0.069367154
## beach -0.008677329 0.017793621 -0.001236612 -0.007879681
## pier -0.010886272 -0.001236612 0.022008455 -0.009885571
## private -0.069367164 -0.007879671 -0.009885559 0.087132394
install.packages(“rms”)
library(rms)
Vliv promennych na zdravotni stav - (fair=1, good=2, excellent=3).
zdravi=read.csv("C:/Users/Lukas/Desktop/ordered_health.csv")
ologit<- lrm(healthstatus1 ~ age+logincome+numberdiseases, data=zdravi)
print(ologit)
## Logistic Regression Model
##
## lrm(formula = healthstatus1 ~ age + logincome + numberdiseases,
## data = zdravi)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 5574 LR chi2 740.39 R2 0.148 C 0.675
## 1 523 d.f. 3 g 0.799 Dxy 0.350
## 2 2034 Pr(> chi2) <0.0001 gr 2.224 gamma 0.357
## 3 3017 gp 0.075 tau-a 0.198
## max |deriv| 2e-11 Brier 0.077
##
## Coef S.E. Wald Z Pr(>|Z|)
## y>=2 1.3960 0.2061 6.77 <0.0001
## y>=3 -0.9513 0.2054 -4.63 <0.0001
## age -0.0293 0.0017 -17.43 <0.0001
## logincome 0.2837 0.0231 12.27 <0.0001
## numberdiseases -0.0550 0.0041 -13.51 <0.0001
##
\(\ln{(\dfrac{\pi_1} {\pi_3})}=\)
\(\ln{(\dfrac{\pi_2} {\pi_3})}=\)
\(\pi_1=\dfrac{\exp{L_1}}{1+\exp{L_1}+\exp{L_2}}\)
\(\pi_2=\dfrac{\exp{L_2}}{1+\exp{L_1}+\exp{L_2}}\)
\(\pi_k=\dfrac{1}{1+\exp{L_1}+\exp{L_2}}\)