Lecture 10

Julia Schedler

Announcements

  • Assignment 1 due next Monday night at Midnight

  • Posted some time today or tomorrow morning

  • Midterm grades by next Monday

  • A note on AI use

  • Grading update

Recall the “Time series data analysis process”

  1. Observation
  2. Question
  3. Hypothesis
  4. Experiment
  5. Data Collection and Analysis
  6. Conclusion
  7. Report Results
  8. Replication

“Observations” of the day

“Observations” of the day

Activity 1: Define a question

Research Questions of the Day

  • What is the estimated value of the Quarterly Adjusted GNP for 2002, Q4?

  • What is the estimated percent of Egyptian GDP due to exports in 2018?

Note: These are both forecasting questions.

Methodological Questions of the Day

  • When do we want to fit an ARMA model?
  • When do we want to fit an ARIMA model?
  • How do we identify the order(s) of the ARIMA(p,d,q) model?
  • How can we check assumptions of a model?
  • How do we compare models?
  • How can we forecast from a model?

Definition of ARMA model

A time series \(x_t\) is ARMA(p,q) if

\[ x_t = \alpha + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q} \] Where \(\phi_p \ne 0, \theta_q \ne 0, \sigma^2_w > 0\) (and the model is causal and invertible)

Causal and invertible? 😱

  • For “causal”, if the time series is stationary it has a causal representation (depends on past, not future)
  • For “Intertible”, it’s so the moving average part gives us a unique model

A note non-uniqueness of Moving Average

Consider the MA model and its theoretical autocorrelation function, where \(\sigma^2_w\) is the white noise variance

\[ \begin{aligned} x_t &= w_t + \theta w_{t-1}\\ \gamma(h) &= \begin{cases} (1+\theta^2)\sigma^2_w & \Vert h \Vert = 0 \\ \theta \sigma^2_w & \Vert h \Vert = 1 \\ 0 & \Vert h \Vert > 1 \end{cases} \end{aligned} \] Consider two possible values for \(\theta\) and \(\sigma^2_w\):

\(\theta = 5, \sigma^2_w = 1\)

\[ \gamma(h) = \begin{cases} (1+5^2)\cdot1 & \Vert h \Vert = 0 \\ 5\cdot 1 & \Vert h \Vert = 1 \\ 0 & \Vert h \Vert > 1 \end{cases} \]

\(\theta = 1/5, \sigma^2_w = 25\)

\[ \gamma(h) = \begin{cases} (1+(1/5)^2)\cdot 25 & \Vert h \Vert = 0 \\ 1/5\cdot 25 & \Vert h \Vert = 1 \\ 0 & \Vert h \Vert > 1 \end{cases} \]

Both simplify to \[ \gamma(h) = \begin{cases} 26 & \Vert h \Vert = 0 \\ 5 & \Vert h \Vert = 1 \\ 0 & \Vert h \Vert > 1 \end{cases} \]

In practice we cannot distinguish the difference between the two models (they are stochastically the same). So, we will choose the invertible model, which is the one with \(\sigma^2_w =25\), \(\theta = 1/5\) (see Shumway and Stoffer page 72-73, example 4.6 for details.)

Interpretation of ARMA model

If we let \(\epsilon_t = w_t + \theta_1 w_{t-1} + \theta_2 w_{t-2} + \dots + \theta_q w_{t-q}\), then if \(x_t\) is ARMA(p,q), \[ x_t = \alpha + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \dots + \phi_p x_{t-p} + \epsilon_t \]

  • Recognize the linear regression structure?
  • Difference: constraints on \(\phi, \theta\).

Activity 2: Fit an ARMA model?

Is there any seasonality (fixed-length cycles)? If so, probably want to use a seasonal model (not ARMA)

Activity 2 Solution: Fit an ARMA model?

No, the cycles are not of fixed length

[1] 15 12 17

Choosing the order of ARMA models

Activity 3: Recall MA ACF

Propose a rule for choosing the order of the MA model.

What about AR?

Error in arima.sim(list(ar = c(0.75, 0.5)), n = 1000): 'ar' part of model is not stationary

Activity 4: Fix the AR model so it is stationary

Code
set.seed(807)
par(mfrow = c(1,2))
acf(arima.sim(list(ar = c(0.25, 0.5)), n = 1000), main = "AR(2), theta = (.25, .5)")
acf(arima.sim(list(ar = c(FIXME, FIXME)), n = 1000),
    main ="AR(2), theta = (FIX, FIX)") #hint (lecture 1)

Activity 4 Solution: Fix the AR model so it is stationary

Making sure AR is stationary

  • Is complicated
  • Is taken care of in estimation by software

AR or MA or both?

Consider two of the models just simulated:

  • Shape is similar, but not identical
  • Is there a better way to tell?

Partial Autocorrelation plot

The PACF is the correlation between \(x_t\) and \(x_{t-k}\) that is not explained by lags \(1, 2, \ldots, k-1\).

“Removes” the autocorrelation of the previous lags.

Useful in determining the order of an Autoregressive process, and in combination with ACF, choosing between an AR or an MA model.

PACF of two series we just saw

ACF and PACF

AR or MA or both

Behavior of ACF/PACF for ARMA Models
AR(p) MA(q) ARMA(p,q)
ACF Tails off Cuts off after lag q Tails off
PACF Cuts off after lag p Tails off Tails off

Activity 5: AR or MA or both?

ACF: Tails off; PACF: Cuts off after lag 4. Maybe an AR(4), or some sort of ARMA?

Fitting an ARMA model

Use the sarima function, specify p, d, q (d = 0 for ARMA).

<><><><><><><><><><><><><><>
 
Coefficients: 
      Estimate     SE t.value p.value
ar1     0.9861 0.1247  7.9103  0.0000
ar2    -0.1715 0.1865 -0.9199  0.3618
ar3     0.1807 0.1865  0.9688  0.3370
ar4    -0.3283 0.1273 -2.5779  0.0128
xmean  20.0986 1.0699 18.7858  0.0000

sigma^2 estimated as 7.204995 on 53 degrees of freedom 
 
AIC = 5.052611  AICc = 5.072505  BIC = 5.265761 
 

FPP way

Code
fit_auto <- global_economy |>
  filter(Code == "EGY") |>
  model(ARIMA(Exports)) ## automatic ARIMA
report(fit_auto)
Series: Exports 
Model: ARIMA(2,0,1) w/ mean 

Coefficients:
         ar1      ar2      ma1  constant
      1.6764  -0.8034  -0.6896    2.5623
s.e.  0.1111   0.0928   0.1492    0.1161

sigma^2 estimated as 8.046:  log likelihood=-141.57
AIC=293.13   AICc=294.29   BIC=303.43

Which model should we choose?

  • One that fits well

  • We can use AICc to choose between models that seem to have good fit

AR(4) vs. ARMA(2,1)

Series: Exports 
Model: ARIMA(2,0,1) w/ mean 

Coefficients:
         ar1      ar2      ma1  constant
      1.6764  -0.8034  -0.6896    2.5623
s.e.  0.1111   0.0928   0.1492    0.1161

sigma^2 estimated as 8.046:  log likelihood=-141.57
AIC=293.13   AICc=294.29   BIC=303.43
Series: Exports 
Model: ARIMA(4,0,0) w/ mean 

Coefficients:
         ar1      ar2     ar3      ar4  constant
      0.9861  -0.1715  0.1807  -0.3283    6.6922
s.e.  0.1247   0.1865  0.1865   0.1273    0.3562

sigma^2 estimated as 7.885:  log likelihood=-140.53
AIC=293.05   AICc=294.7   BIC=305.41

Very similar! Let’s check the residuals

Check model fit

Code
gg_tsdisplay(residuals(fit_auto), plot_type = "partial")

Code
gg_tsdisplay(residuals(fit_manual), plot_type = "partial")

Make a forecast

Code
fit_auto %>% 
   forecast(h=5) |>
  autoplot(global_economy)

Code
fit_manual %>% 
   forecast(h=5) |>
  autoplot(global_economy)

Code
egy_forecast <- ## save auto forecast 
fit_auto %>% 
   forecast(h=5) 

How did the forecasts do?

Plot analysis

  • the historic data and the updated data mostly agree, except for the last shared point in 2017
  • The forecast for 2018 is spot-on
  • The later forecasts are not at all correct

Modeling workflow

GNP Series Example

GNP series

Does the series appear stationary?

Acf and pacf of GNP

Differenced GNP series

Does the series appear stationary?

ACF and PACF of differenced GNP series

ARIMA models

A time series is said to be ARIMA(p,d,q) if

\[ \nabla^dx_t = (1-B)^dx_t \]

Is ARMA(p,q). In other words, if after an order \(d\) difference we get an ARMA model. To bring in the parameters, we write

\[ \phi(B)(1-B)^dx_t = \alpha + \theta(B)w_t \]

Here, \(\phi(B) = 1 - \phi_1B - \phi_2B^2 - \dots - \phi_p B^p\) and \(\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q\)

Example: ARIMA(1,1,0):

Since \(q = 0\), \(\theta(B) = 1\). Since \(q = 1\), \(\phi(B) = 1-\phi_1B\). Finally, \(d = 1\) so

\[ \begin{aligned} \phi(B) (1-B)^dx_t &= \alpha + \theta(B) w_t \\ (1 - \phi_1B)(1-B)^1x_t &= \alpha + 1\cdot w_t\\ (1- \phi_1B)(x_t -x_{t-1}) &= \alpha + w_t \\ x_t - x_{t-1} - (\phi_1Bx_t + \phi_1Bx_{t-1}) &= \alpha + w_t \\ x_t - x_{t-1} - (\phi_1x_{t-1} + \phi_1x_{t-2}) &= \alpha + w_t \\ x_t - x_{t-1} &= \alpha + \phi_1(x_{t-1} - x_{t-2}) + w_t \end{aligned} \]

Next time

  • More on ARMA vs. ARIMA

  • Box-cox transformations

  • Parameter redundancy

  • Portmanteau tests

  • Unit root tests

  • (maybe) Seasonal ARIMA?