Statický vs. dynamický model

Základní stavební kameny

install.packages("tseries")
install.packages("stats")
library(tseries)
library(stats)
library(fUnitRoots)
## Loading required package: urca
## Loading required package: timeDate
## Loading required package: timeSeries
## Loading required package: fBasics
## 
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org
## 
## Attaching package: 'fUnitRoots'
## The following objects are masked from 'package:urca':
## 
##     punitroot, qunitroot, unitrootTable

Bílý sum

\(e_t \sim i.i.d(0,\sigma^2)\)

Podmnozinou bílého sumu, je Gaussovský bílý sum

\(e_t\sim N(0,\sigma^2)\)

n=1000
WN=rnorm(n,0,7)
plot.ts(WN)

AR proces

Naprídklad AR(k) proces

\(y_t=\alpha_0+\alpha_1y_{t-1}+...+\alpha_{t-k}y_{t-k}+e_t\)

\(y_t=0.5y_{t-1}+e_t\)

ar_proces=arima.sim(list(order = c(1,0,0), ar = 0.5), n = 1000)
plot.ts(ar_proces)

Náhodná procházka

\(p_t=p_{t-1}+\epsilon_t\)

N=1000
w=rnorm(N)
RW=cumsum(w)
plot.ts(RW)

Vyhodnocování modelu

Autokorelacní funkce ACF

\(\rho_k=\dfrac{Cov(y_t,y_{t-k})}{Var(y_t)}\)

Parcilní autokorelacní funkce PACF.

Princip je, ze se postupne navysuje pocet zpozdení a a testuje se kazdý parametr pro nove pridanou promennenou.

\(y_t=\alpha_0+\alpha_1y_{t-1}+e_t\)

\(y_t=\alpha_0+\alpha_1y_{t-1}+\alpha_2y_{t-2}+e_t\)

\(y_t=\alpha_0+\alpha_1y_{t-1}+\alpha_2y_{t-2}+\alpha_3y_{t-3}+e_t\)

\(y_t=\alpha_0+\alpha_1y_{t-1}+\alpha_2y_{t-2}+\alpha_3y_{t-3}...+\alpha_{t-k}y_{t-k}+e_t\)

Bilý sum

par(mfrow=c(1,2))
acf(WN)
pacf(WN)

AR(1) proces

par(mfrow=c(1,2))
acf(ar_proces)
pacf(ar_proces)

RW proces

par(mfrow=c(1,2))
acf(RW)
pacf(RW)

data=pacf(RW)

data=data$lag

Vyhodnocení pomocí Ljung-Box testu.

Box.test(ar_proces, lag = 4, type ="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  ar_proces
## X-squared = 354.32, df = 4, p-value < 2.2e-16

Stacionarira a unit root test

Deterministický trend

n=1000
e=rnorm(n,0,70)
t=1:1000
x=rnorm(n,10,20)
y=20+1.2*t+0.5*x+e
plot.ts(y)

par(mfrow=c(1,2))
acf(y)
pacf(y)

Pro testování stacionarity pouzíváme napríklad ADF test. Tento package vybírá nejvhodnejsí zpozdení a tvar procesu. To nemusí být uplne vhodné

adf.test(y)
## Warning in adf.test(y): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  y
## Dickey-Fuller = -10.158, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary

Balícek fUnitRoots obsahuje velmi hezky zpracované testy na unit root

urdfTest(y, lags = 1, type = "ct", doplot = TRUE)

## 
## Title:
##  Augmented Dickey-Fuller Unit Root Test
## 
## Test Results:
##   
##   Test regression trend 
##   
##   Call:
##   lm(formula = z.diff ~ z.lag.1 + 1 + tt + z.diff.lag)
##   
##   Residuals:
##        Min       1Q   Median       3Q      Max 
##   -250.597  -50.366   -2.078   51.950  249.045 
##   
##   Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
##   (Intercept) 27.405656   4.657215   5.885 5.45e-09 ***
##   z.lag.1     -1.002456   0.044853 -22.350  < 2e-16 ***
##   tt           1.201379   0.054318  22.117  < 2e-16 ***
##   z.diff.lag   0.002718   0.031729   0.086    0.932    
##   ---
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   
##   Residual standard error: 71.14 on 994 degrees of freedom
##   Multiple R-squared:  0.4998,   Adjusted R-squared:  0.4983 
##   F-statistic: 331.1 on 3 and 994 DF,  p-value: < 2.2e-16
##   
##   
##   Value of test-statistic is: -22.3497 166.7136 249.7543 
##   
##   Critical values for test statistics: 
##         1pct  5pct 10pct
##   tau3 -3.96 -3.41 -3.12
##   phi2  6.09  4.68  4.03
##   phi3  8.27  6.25  5.34
## 
## Description:
##  Sun Apr 29 21:42:46 2018 by user: Lukas
urersTest(y, type ="DF-GLS", model = "trend",lag.max = 4, doplot = TRUE)

## 
## Title:
##  Elliott-Rothenberg-Stock Unit Root Test
## 
## Test Results:
##   
##   Test of type DF-GLS 
##   detrending of series with intercept and trend 
##   
##   Residuals:
##        Min       1Q   Median       3Q      Max 
##   -245.928  -40.003    9.056   58.660  257.229 
##   
##   Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
##   yd.lag       -0.80037    0.06548 -12.223  < 2e-16 ***
##   yd.diff.lag1 -0.16986    0.05988  -2.837  0.00465 ** 
##   yd.diff.lag2 -0.14084    0.05293  -2.661  0.00792 ** 
##   yd.diff.lag3 -0.09989    0.04401  -2.270  0.02344 *  
##   yd.diff.lag4 -0.05668    0.03174  -1.786  0.07445 .  
##   ---
##   Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##   
##   Residual standard error: 72.18 on 990 degrees of freedom
##   Multiple R-squared:  0.4858,   Adjusted R-squared:  0.4832 
##   F-statistic:   187 on 5 and 990 DF,  p-value: < 2.2e-16
##   
##   
##   Value of test-statistic is: -12.2225 
##   
##   Critical values of DF-GLS are:
##                    1pct  5pct 10pct
##   critical values -3.48 -2.89 -2.57
## 
## Description:
##  Sun Apr 29 21:42:46 2018 by user: Lukas

Stochastický trend

Nejznámejsím procesem obsahující stochastický trend je proces náhodné procházky (RW)

\(p_t=p_{t-1}+\epsilon_t\)

Integrovaný proces rádu 1. Proces obsahuje tzv. jeden jednotkový koren. Prakticky to znamená, ze pro získání stacionární casové rady je potreba provést jednu diferenci. Pro proces I(2) by bylo nutné provést 2 diference. V ekonomii se takový proces vyskytuje vyjímecne.

adf.test(RW)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  RW
## Dickey-Fuller = -0.60984, Lag order = 9, p-value = 0.9768
## alternative hypothesis: stationary

Zdánlivá regrese

Jedním z problému nestacionárních casových rad, je moznost vzniku zdánlivé regrese

set.seed(100)
N=1000
w=rnorm(N)
RW=cumsum(w)
x2=rnorm(N)
RW2=cumsum(x2)
regrese=lm(RW~RW2)
summary(regrese)
## 
## Call:
## lm(formula = RW ~ RW2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.572  -6.849  -1.709   2.845  28.521 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.04025    0.35948  -0.112   0.9109  
## RW2          0.11360    0.06540   1.737   0.0827 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.95 on 998 degrees of freedom
## Multiple R-squared:  0.003014,   Adjusted R-squared:  0.002015 
## F-statistic: 3.017 on 1 and 998 DF,  p-value: 0.08268

V ideálním prípade by se mela residua chovat jako bílý sum.

residua=regrese$residuals
adf.test(residua)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  residua
## Dickey-Fuller = -1.4519, Lag order = 9, p-value = 0.8103
## alternative hypothesis: stationary

Zlom a sezonost

set.seed(5)
N=32
x=rnorm(N,0,1)
sezon=rep(c(1,0,0,0),8)

y=x+2*sezon
plot.ts(y)

adf.test(y)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  y
## Dickey-Fuller = -1.6539, Lag order = 3, p-value = 0.7064
## alternative hypothesis: stationary

A nyní odfiltrujeme sezoní slozku pomocí dummy promenných

y_sez=lm(y~sezon)
fit=y_sez$fitted.values
adf.test(fit)                  # casova rada ocistena od sezonosti je jiz stacionarni
## Warning in summary.lm(res): essentially perfect fit: summary may be
## unreliable
## Warning in adf.test(fit): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  fit
## Dickey-Fuller = -1.0024e+16, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

Proces se zlomem

N1=100
N2=100
ar_proces1=arima.sim(list(order = c(1,0,0), ar = 0.5), n = N1)
ar_proces2=arima.sim(list(order = c(1,0,0), ar = 0.5), n = N2)+10
proces=c(ar_proces1,ar_proces2)
plot.ts(proces)

adf.test(proces)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  proces
## Dickey-Fuller = -2.1855, Lag order = 5, p-value = 0.4984
## alternative hypothesis: stationary

Jak se vyporádat s nestacionární casovou radou?

  • difference
  • zahrnutí deterministického trendu (POZOR)
  • ocistit od sezonosti
  • zahrnout zlom
set.seed(250)
n=1000
e=rnorm(n,0,70)
t=1:1000
x=rnorm(n,10,20)+0.02*t
y=20+1.2*t+e
regrese=lm(y~x)
summary(regrese)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -737.47 -275.22  -15.78  280.48  791.18 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 528.4333    15.1854  34.799   <2e-16 ***
## x             4.4555     0.5141   8.666   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 343.3 on 998 degrees of freedom
## Multiple R-squared:  0.06999,    Adjusted R-squared:  0.06906 
## F-statistic: 75.11 on 1 and 998 DF,  p-value: < 2.2e-16
regrese2=lm(y~x+t)
summary(regrese2)
## 
## Call:
## lm(formula = y ~ x + t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -213.287  -50.409    0.618   48.040  193.155 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.553340   4.672845   3.328 0.000905 ***
## x            0.075235   0.110779   0.679 0.497205    
## t            1.205489   0.008103 148.768  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 71.31 on 997 degrees of freedom
## Multiple R-squared:  0.9599, Adjusted R-squared:  0.9598 
## F-statistic: 1.194e+04 on 2 and 997 DF,  p-value: < 2.2e-16

Stochastický trend a pridání deterministického trednu do regrese

set.seed(4)
N=1000
w=rnorm(N)
RW=cumsum(w)
x2=rnorm(N)
RW2=cumsum(x2)
t=seq(1:N)
regrese=lm(RW~RW2)
summary(regrese)
## 
## Call:
## lm(formula = RW ~ RW2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -21.403  -9.550  -2.244   9.306  26.564 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -7.06929    0.39927  -17.70   <2e-16 ***
## RW2         -0.70471    0.03525  -19.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.53 on 998 degrees of freedom
## Multiple R-squared:  0.286,  Adjusted R-squared:  0.2853 
## F-statistic: 399.7 on 1 and 998 DF,  p-value: < 2.2e-16
regrese=lm(RW~RW2+t)
summary(regrese)
## 
## Call:
## lm(formula = RW ~ RW2 + t)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.7533  -2.0703   0.0622   2.5524   9.2601 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.2573031  0.2547982  52.031   <2e-16 ***
## RW2          0.1431759  0.0146198   9.793   <2e-16 ***
## t           -0.0484489  0.0005237 -92.515   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.725 on 997 degrees of freedom
## Multiple R-squared:  0.9255, Adjusted R-squared:  0.9254 
## F-statistic:  6193 on 2 and 997 DF,  p-value: < 2.2e-16
regrese=lm(diff(RW)~diff(RW2))
summary(regrese)
## 
## Call:
## lm(formula = diff(RW) ~ diff(RW2))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8393 -0.6441  0.0128  0.6556  3.2252 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.03530    0.03070  -1.150    0.250
## diff(RW2)    0.02559    0.03076   0.832    0.406
## 
## Residual standard error: 0.9699 on 997 degrees of freedom
## Multiple R-squared:  0.0006939,  Adjusted R-squared:  -0.0003085 
## F-statistic: 0.6923 on 1 and 997 DF,  p-value: 0.4056

ARIMA Box-Jenkins

Nasimuloval jsem 6 procesu, kdy se pokusíme si jej odhadnout

Akcie

library(quantmod)
getSymbols("AAPL")
## 'getSymbols' currently uses auto.assign=TRUE by default, but will
## use auto.assign=FALSE in 0.5-0. You will still be able to use
## 'loadSymbols' to automatically load data. getOption("getSymbols.env")
## and getOption("getSymbols.auto.assign") will still be checked for
## alternate defaults.
## 
## This message is shown once per session and may be disabled by setting 
## options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.
## 
## WARNING: There have been significant changes to Yahoo Finance data.
## Please see the Warning section of '?getSymbols.yahoo' for details.
## 
## This message is shown once per session and may be disabled by setting
## options("getSymbols.yahoo.warning"=FALSE).
## [1] "AAPL"
price_A=AAPL$AAPL.Adjusted
chartSeries(price_A)

getSymbols("WMT")
## [1] "WMT"
price_W=WMT$WMT.Adjusted
chartSeries(price_W)

adf.test(price_A)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  price_A
## Dickey-Fuller = -2.0386, Lag order = 14, p-value = 0.562
## alternative hypothesis: stationary
adf.test(price_W)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  price_W
## Dickey-Fuller = -2.6818, Lag order = 14, p-value = 0.2897
## alternative hypothesis: stationary
zdanliva_regrese=lm(price_A~I(lag(price_W)))
summary(zdanliva_regrese)
## 
## Call:
## lm(formula = price_A ~ I(lag(price_W)))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.487 -11.694   0.296  10.500  62.546 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -95.13140    1.43956  -66.08   <2e-16 ***
## I(lag(price_W))   2.78089    0.02437  114.09   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 20.37 on 2847 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.8205, Adjusted R-squared:  0.8205 
## F-statistic: 1.302e+04 on 1 and 2847 DF,  p-value: < 2.2e-16
vynos_A=diff(log(price_A), lag=1)    
vynos_W=diff(log(price_W), lag=1)  
vynos_A=na.trim(vynos_A)
vynos_W=na.trim(vynos_W)
adf.test(vynos_A)
## Warning in adf.test(vynos_A): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  vynos_A
## Dickey-Fuller = -13.228, Lag order = 14, p-value = 0.01
## alternative hypothesis: stationary
adf.test(vynos_W)
## Warning in adf.test(vynos_W): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  vynos_W
## Dickey-Fuller = -14.198, Lag order = 14, p-value = 0.01
## alternative hypothesis: stationary
regrese=lm(vynos_A~I(lag(vynos_W)))
summary(regrese)
## 
## Call:
## lm(formula = vynos_A ~ I(lag(vynos_W)))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.198137 -0.009035  0.000000  0.010180  0.128794 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)   
## (Intercept)      0.0010571  0.0003763   2.809  0.00501 **
## I(lag(vynos_W)) -0.0399141  0.0301065  -1.326  0.18502   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.02008 on 2846 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.0006172,  Adjusted R-squared:  0.0002661 
## F-statistic: 1.758 on 1 and 2846 DF,  p-value: 0.185