Generalized Additive Mixed Effects Models have several components: parametric (fixed-effect) terms, smooth terms, and random effects. We can add one more component to handle autocorrelation: a model for the residuals.
What does this mean? In practical terms, autocorrelation is meant to capture either temporal or spatial dependence in a model. You are taking measurements at set points in time, for example, and want to model the relationship between \(y_t\) and \(y_{t \pm n}\) for some \(n\). You can capture spatial autocorrelation using the same principles.
This is a greatly condensed overview of autocorrelation. A number of online texts available through the UIUC library website can give you a much more in-depth treatment.
Generally, time series data is modelled in the following way:
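\(y_t = \mu_t + s_t + \epsilon_t\), where \(\mu_t\) is the trend, \(s_t\) an (optional) seasonal component, and \(\epsilon_t\) a noise term that may itself be autocorrelated. (This is only a sketch of the standard textbook decomposition, written in my own notation; texts differ in how they write it.)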
As we will see, it is not straightforward to account for autocorrelation if time also enters the model as a fixed and random smooth.
An autoregressive (AR) process is one in which the previous \(n\) points influence the current observation. An AR(1) process is one in which \(y_t = \psi_1 y_{t-1} + \epsilon_t\); an AR(2) process is \(y_t = \psi_1 y_{t-1} + \psi_2 y_{t-2} + \epsilon_t\), etc.
A moving average (MA) process is one in which the current value is a weighted sum of the current and preceding white-noise terms: an MA(1) process is \(y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1}\); an MA(2) process is \(y_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}\).
Another important concept is differencing: the first difference is \(\Delta y_{t} = y_t - y_{t-1}\), and the second-order difference is \(\Delta^2 y_{t} = \Delta y_{t} - \Delta y_{t-1} = (y_t - y_{t-1}) - (y_{t-1} - y_{t-2})\).
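As a quick illustration of these definitions (a sketch using base R's arima.sim() and diff(); the coefficients 0.7 and 0.5 are arbitrary choices for illustration), we can simulate an AR(1) and an MA(1) series and difference a random walk:
set.seed(1)
# AR(1): each value depends on the previous value plus white noise
ar1 = arima.sim(model = list(ar = 0.7), n = 500)
# MA(1): each value is white noise plus a weighted copy of the previous shock
ma1 = arima.sim(model = list(ma = 0.5), n = 500)
# A random walk is non-stationary; first differencing recovers white noise
rw = cumsum(rnorm(500))
d1 = diff(rw)        # first difference: y_t - y_{t-1}
acf(ar1); pacf(ar1)  # ACF decays slowly, PACF cuts off after lag 1
acf(ma1); pacf(ma1)  # ACF cuts off after lag 1, PACF decays slowly
acf(d1)              # the differenced random walk looks like white noise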
Any introduction to time series will cover these concepts in much greater detail. In time-series parlance, linguists are mostly interested in the trend (the fixed-effects part of the model); any autocorrelation is ancillary to the main model.
# Convert the time variable from character to numeric
df$time = as.numeric(df$time)
# Whichever electrodes we are interested in: convert from character to numeric
df$CP = as.numeric(df$CP)
df$Pz = as.numeric(df$Pz)
df$FP2 = as.numeric(df$FP2)
df$F7 = as.numeric(df$F7)
# Grammaticality condition and subject are factors
df$gramCat = as.factor(df$gramCat)
df$subject = as.factor(df$subject)
# Check the range of the time variable
range(as.numeric(df$time))
## [1] -200 995
ggplot(df, aes(x=time, y=CP,color=gramCat)) + stat_smooth()
gamint3 = gam(CP ~ s(time) + gramCat + s(time, by=gramCat) + s(subject, time, bs="fs", m=1) ,data=df)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
summary(gamint3)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## CP ~ s(time) + gramCat + s(time, by = gramCat) + s(subject, time,
## bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.838e+03 1.034e+03 -1.778 0.0754 .
## gramCatUngrammatical 1.160e+00 3.868e-02 29.983 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 7.878 7.941 103.256 < 2e-16 ***
## s(time):gramCatGrammatical 7.825 8.058 5.340 8.57e-07 ***
## s(time):gramCatUngrammatical 1.235 1.254 0.706 0.424
## s(subject,time) 166.903 93.000 128.521 0.784
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Rank: 199/200
## R-sq.(adj) = 0.699 Deviance explained = 70.5%
## GCV = 3.4815 Scale est. = 3.4106 n = 9120
We have repeated measures over time, but we’ve included time in the mixed-effects component; even so, the residuals may still be autocorrelated.
There are two visualizations of the residuals that can help you model the autocorrelation: the ACF graph and the PACF graph. In general, the ACF lets you assess the moving-average component of the model and the PACF lets you identify the autoregressive component; the p and q parameters can be estimated from the sharp cut-off in the (P)ACF graphs. The full model is ARIMA(p,d,q), where p is the autoregression order, d is the differencing order, and q is the moving-average order.
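If you want a quick numeric suggestion for (p,d,q) to compare against your reading of the plots, one option (a sketch; it assumes the forecast package is installed, and it treats the pooled residuals from all subjects as a single series, which is only a rough heuristic here) is auto.arima():
library(forecast)
# let auto.arima() propose an ARIMA(p,d,q) order for the pooled residuals
auto.arima(resid(gamint3))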
acf(resid(gamint3), lag.max = 36, main = "ACF")
pacf(resid(gamint3), lag.max = 36, main = "pACF")
You may notice, however, that there is no such cutoff in the ACF above. This is an indication that we need to difference the series.
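A more formal check (again a sketch; it assumes the tseries package and, as before, treats the pooled residuals as a single series) is an augmented Dickey-Fuller test for a unit root:
library(tseries)
# null hypothesis: the series has a unit root (i.e. needs differencing)
adf.test(resid(gamint3))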
df$CP_diff1 = c(NA,diff(df$CP))
Is this right? No! We don’t have just one time series. Use tidy tools to tidy the data and then difference the way we want: within subject and grammatical category. (You lose one observation per subject and gramCat combination.)
df = df %>%
arrange(gramCat, subject, time)%>%
group_by(subject, gramCat)%>%
mutate(diff1=c(NA,diff(CP)))
We can refit our original model with the differenced response.
gamD = gam(diff1 ~ s(time) + gramCat + s(time, by=gramCat) + s(subject, time, bs="fs", m=1) ,data=df)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
acf(resid(gamD), lag.max = 36, main = "ACF")
pacf(resid(gamD), lag.max = 36, main = "pACF")
We can see there is still non-stationarity present in the residuals. I’m going to take the second difference. This is a bit iterative, but let’s see what this gives us.
df = df %>%
arrange(gramCat, subject, time)%>%
group_by(subject, gramCat)%>%
mutate(diff1=c(NA,NA,diff(CP, differences=2)),
repGr = factor(paste0(as.character(subject), as.character(gramCat))))
df$diff2 = df$diff1
gamD2 = gam(diff1 ~ s(time) + gramCat + s(time, by=gramCat) + s(subject, time, bs="fs", m=1) ,data=df)
## Warning in gam.side(sm, X, tol = .Machine$double.eps^0.5): model has
## repeated 1-d smooths of same variable.
acf(resid(gamD2), lag.max = 36, main = "ACF")
pacf(resid(gamD2), lag.max = 36, main = "pACF")
summary(gamD2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## diff1 ~ s(time) + gramCat + s(time, by = gramCat) + s(subject,
## time, bs = "fs", m = 1)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.002597 0.005598 0.464 0.643
## gramCatUngrammatical -0.001501 0.007917 -0.190 0.850
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(time) 6.667e-01 0.6667 0.000 0.991
## s(time):gramCatGrammatical 6.667e-01 0.6667 0.061 0.840
## s(time):gramCatUngrammatical 6.667e-01 0.6667 0.057 0.846
## s(subject,time) 3.647e-08 170.0000 0.000 1.000
##
## Rank: 199/200
## R-sq.(adj) = -0.000323 Deviance explained = 0.00088%
## GCV = 0.14177 Scale est. = 0.14171 n = 9044
Ideally we would want the residuals to be fully stationary, but I seem to have moved the problem from one part of the model to another. For the sake of time, and because I can’t fit three million different models, let’s move on to the ARMA process. Eye-balling the (P)ACF graphs, let’s start with a low-order ARMA process (ARMA(1,1) in the model below) and see how well that works.
In order to fit the model, I need to use the gamm() function [I’m still specifying the mixed model via the s() function, but I need to use the correlation structure argument of gamm()]
(valRho <- acf(resid(gamD2), plot=FALSE)$acf[2])
gamD3 = gamm(diff1 ~ s(time) + gramCat + s(time, by=gramCat) + s(subject, time, bs="fs", m=1), correlation=corARMA(form = ~ 1|repGr, p=1, q=1), data=df)
acf(resid(gamD3$lme), lag.max = 36, main = "ACF")
pacf(resid(gamD3$lme), lag.max = 36, main = "pACF")
summary(gamD3)
We have to decide what to do when this won’t run: I suspect it is because time is both a covariate and a random smooth; adding it to the correlation structure as well causes problems, because lme can’t handle it (gam() can). I have to find a way to reduce the structure to something of full rank.
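One route worth trying (a sketch, not something I have verified on this dataset) is bam() instead of gamm(): bam() keeps the random smooth inside the mgcv machinery and models AR(1) residuals directly through its rho and AR.start arguments, where AR.start flags the first observation of each independent series. The start.event column and the dfc data frame below are helpers I am introducing for this purpose; valRho is the lag-1 autocorrelation computed above, and the rows lost to differencing are dropped first.
# sketch: AR(1) residuals within each subject-by-gramCat series via bam()
dfc = df %>%
filter(!is.na(diff1)) %>%
arrange(gramCat, subject, time) %>%
group_by(subject, gramCat) %>%
mutate(start.event = row_number() == 1) %>%   # TRUE at the start of each series
ungroup()
gamBam = bam(diff1 ~ s(time) + gramCat + s(time, by=gramCat) + s(subject, time, bs="fs", m=1),
rho = valRho, AR.start = dfc$start.event, data = dfc)
acf(resid(gamBam), lag.max = 36, main = "ACF")   # note: raw residuals, not adjusted for the AR(1) term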
Let’s move on to something more straightforward than real data: fake data! We can simulate different designs and fit autocorrelation structures; the examples below are adapted from the mgcv package (the gamm() help page).
## now an example with autocorrelated errors....
dat <- gamSim(6,n=200,scale=.2,dist="poisson")
## 4 term additive + random effect
## Gu & Wahba 4 term additive model
fac <- dat$fac
op <- par(mfrow=c(2,2))
n <- 200;sig <- 2
x <- 0:(n-1)/(n-1)
f <- 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1] + e[i]
y <- f + e
The response y is the fixed-effect smooth f plus an error term e, generated separately as an AR(1) process.
## Fit model with AR1 residuals
b <- gamm(y~s(x,k=20),correlation=corAR1())
visreg(b$gam,"x")
## Raw residuals still show correlation, of course...
acf(residuals(b$gam),main="raw residual ACF")
## But standardized are now fine...
acf(residuals(b$lme,type="normalized"),main="standardized residual ACF")
## compare with model without AR component...
b <- gam(y~s(x,k=20))
visreg(b,"x")
acf(residuals(b),main="raw residual ACF")
## more complicated autocorrelation example - AR errors
## only within groups defined by `fac'
e <- rnorm(n,0,sig)
for (i in 2:n) e[i] <- 0.6*e[i-1]*(fac[i-1]==fac[i]) + e[i]
y <- f + e
b <- gamm(y~s(x,k=20),correlation=corAR1(form=~1|fac))
plot(b$gam);lines(x,f-mean(f),col=2)
par(op)
## more complex situation with nested random effects and within
## group correlation
set.seed(0)
n.g <- 10
n<-n.g*10*4
## simulate smooth part...
dat <- gamSim(1,n=n,scale=2)
## Gu & Wahba 4 term additive model
f <- dat$f
## simulate nested random effects....
fa <- as.factor(rep(1:10,rep(4*n.g,10)))
ra <- rep(rnorm(10),rep(4*n.g,10))
fb <- as.factor(rep(rep(1:4,rep(n.g,4)),10))
rb <- rep(rnorm(4),rep(n.g,4))
for (i in 1:9) rb <- c(rb,rep(rnorm(4),rep(n.g,4)))
## simulate auto-correlated errors within groups
e<-array(0,0)
for (i in 1:40) {
eg <- rnorm(n.g, 0, sig)
for (j in 2:n.g) eg[j] <- eg[j-1]*0.6+ eg[j]
e<-c(e,eg)
}
dat$y <- f + ra + rb + e
dat$fa <- fa;dat$fb <- fb
## fit model ....
b <- gamm(y~s(x0,bs="cr")+s(x1,bs="cr")+s(x2,bs="cr")+
s(x3,bs="cr"),data=dat,random=list(fa=~1,fb=~1),
correlation=corAR1())
plot(b$gam,pages=1)
summary(b$gam)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr") +
## s(x3, bs = "cr")
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.7756 0.4413 17.62 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(x0) 3.393 3.393 11.033 2.55e-07 ***
## s(x1) 2.757 2.757 127.438 < 2e-16 ***
## s(x2) 8.006 8.006 111.492 < 2e-16 ***
## s(x3) 1.000 1.000 0.203 0.652
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.567
## Scale est. = 4.9578 n = 400
vis.gam(b$gam)
## Prediction from gam object, optionally adding
## in random effects.
## Extract random effects and make names more convenient...
refa <- ranef(b$lme,level=5)
rownames(refa) <- substr(rownames(refa),start=9,stop=20)
refb <- ranef(b$lme,level=6)
rownames(refb) <- substr(rownames(refb),start=9,stop=20)
## make a prediction, with random effects zero...
p0 <- predict(b$gam,data.frame(x0=.3,x1=.6,x2=.98,x3=.77))
## add in effect for fa = "2" and fb="2/4"...
p <- p0 + refa["2",1] + refb["2/4",1]
## and a "spatial" example...
library(nlme);
set.seed(1);
n <- 100
dat <- gamSim(2,n=n,scale=0) ## standard example
## Bivariate smoothing example
attach(dat)
old.par<-par(mfrow=c(2,2))
contour(truth$x,truth$z,truth$f) ## true function
f <- data$f ## true expected response
## Now simulate correlated errors...
cstr <- corGaus(.1,form = ~x+z)
cstr <- Initialize(cstr,data.frame(x=data$x,z=data$z))
V <- corMatrix(cstr) ## correlation matrix for data
Cv <- chol(V)
e <- t(Cv) %*% rnorm(n)*0.05 # correlated errors
## next add correlated simulated errors to expected values
data$y <- f + e ## ... to produce response
b<- gamm(y~s(x,z,k=50),correlation=corGaus(.1,form=~x+z),
data=data)
plot(b$gam) # gamm fit accounting for correlation
# overfits when correlation ignored.....
b1 <- gamm(y~s(x,z,k=50),data=data);
plot(b1$gam)
b2 <- gam(y~s(x,z,k=50),data=data);plot(b2)