Stock Regression Forecast

Stock Regression Forecast – Additional ETS and ARIMA Code

We have been asked for some additional ETS and ARIMA code for the original data set. I have attached the original Time Series with the ARIMA code, the ETS Code and the data set.

Download the daily adjusted closing prices from Yahoo Finance or another source for the last two years.  Using this data (be sure to sort in ascending order).

Additional questions:

  1. Retain 6 months of data for a test set.  Rerun the ETS forecasts and the auto.ARIMA. Which performs better on the hold-out set?
  2. Build your own ARIMA

“`{r setup, include=FALSE}

knitr::opts_chunk$set(echo = TRUE)


## Load library

“`{r }









## Load and preprocessing dataset for Task1

“`{r }

wab = fread(“WAB.csv”)

wab$Date<- as.Date(wab$Date, format = “%m/%d/%Y”)

wab = as.xts(wab)


wab.months<- to.monthly(wab)

wab.months<- wab.months[,4]

colnames(wab.months) <- “Close”

wab.ts<- ts(wab.months, start = c(2015,9))

wab.return=  periodReturn(wab[, “Adj Close”], period = “daily”)


autoplot.zoo(wab[,’Adj Close’])


##Describe statistic

From the descriptive statistics it can be seen that there is some kind of seasonality on the shares. But there is very little data to qualitatively identify.

“`{r }

summary(wab[,’Adj Close’])


autoplot(wab.return) + ggtitle(“Daily price return”) + xlab(“Year”)

ggseasonplot(as.ts(wab.months), year.labels=TRUE, year.labels.left=TRUE) +

ylab(“$ Price”) + ggtitle(“Seasonal plot: Price stock”)

ggseasonplot(as.ts(wab.months), polar=TRUE) +

ylab(“$ Price”) + ggtitle(“Stock Price”)





“`{r }

plot(as.ts(wab.months), type = “l”, main = “Linear regression for Price”)

abline(reg=lm(as.ts(wab.months)~ time(as.ts(wab.months))))

boxplot(as.ts(wab.months)~cycle(as.ts(wab.months)), main = “Boxplot for Price”)


“`{r }

adf.test(wab[,5], alternative=”stationary”, k=0)

acf(wab[,5],main=”Sample ACF of Price.”)

pacf(wab[,5],main=”Sample PACF of Price.”)

model_null = auto.arima(wab[,5])



I conducted a test for stationarity of the time series. The test showed that we can not reject the alternative hypothesis of stationarity of the time series. In addition, I constructed a graph showing that autocorrelation decreases smoothly and quickly, which is typical for a stationary series. I also built an automatic model of arima. We have obtained the Arima model (3,1,2). Nevertheless, we see that the first lag of the pacf went beyond the confidence interval. This suggests that the daily change in the stock prices is essentially a random amount uncorrelated with previous days.

## Decompose

When we decompose a time series into components, we usually combine the trend and cycle into a single “trend-cycle” component (sometimes called the “trend” for simplicity).

To give an answer to what is best additive or decomposition. Let’s make some calculations. We calculate Mean absolute scaled error (MASE) for each method. The smallest error in the model will show which method is better.

# Calculate decomposition for each method

“`{r }

dec1 <- decompose(wab.months, type = “additive”) #build an additive decomposition model

dec2 <- decompose(wab.months, type = “multiplicative”) #build a multiplicative decomposition model


Now we can subtract the seasonality from the trend for additive method. After that we will build the model and look at the value MASE (Mean absolute scaled error).

# Additive method

“`{r }




dec1.seasonal <-$seasonal)

dec1.trend <-$trend)

df.additive<- cbind(dec1.seasonal, dec1.trend)

colnames(df.additive) <- c(“seasonal”, “trend”)

df.additive<- na.omit(df.additive)

df.additive$decomp = df.additive$trend – df.additive$seasonal

model.additive<- auto.arima(df.additive$decomp)



We have MASE 0.75

Let’s do the same for the method multiplicative

“`{r }




dec2.seasonal <-$seasonal)

dec2.trend <-$trend)

df.multiplicative<- cbind(dec2.seasonal, dec2.trend)

colnames(df.multiplicative) <- c(“seasonal”, “trend”)

df.multiplicative<- na.omit(df.multiplicative)

df.multiplicative$decomp = df.multiplicative$trend – df.multiplicative$seasonal

model.multiplicative<- auto.arima(df.multiplicative$decomp)



We have MASE 0.55

# Conclusion

We calculated MASE for each method. The least error in the method multiplicative. This means that this method is better than additive.

However, it should be said that both methods are not effective. Because for their calculation it was required to subtract almost half of the data. I see that it makes sense to use these old methods only on large data sets. I believe that stock prices are seasonal, but it is not stable. If we looked at a longer period, we would see it. This happens because, as the stock prices are influenced by many other factors, such as economic conditions, a co-proprietary strategy, as well as market expectations. The studies that I did at the beginning show that we can not see clear seasonal patterns.

## Forecast

# Model ets “AAA” MASE = 0.6202738

“`{r }

model.eta.AAA = ets(wab.months, model =”AAA”, ic = “aic”, n = 6)



plot(forecast(model.eta.AAA), ylab=”Forecast Price Activity”, type=”l”, xlab = “Time”, main = “Forecast ets ‘AAA'”)


We have MASE 0.6202738.

This model has a good coefficient MASE. The remainders are normally distributed, and also shows a fairly logical forecast.

# Model Exponential smoothing forecasts  MASE = 0.5689797

“`{r }

model.hw = hw(wab.months,seasonal=”multiplicative”)



plot(forecast(model.hw), ylab=”Forecast Price”, type=”l”, xlab = “Time”, main = “Forecast from model ‘Exponential smoothing'”)


We have MASE 0.5689797

# Model ets “MNM”  MASE = 0.3357446

“`{r }

model.ets.MNM = ets(wab.months, model =”MNM”, ic = “aic”, n = 24)



plot(forecast(model.ets.MNM), ylab=”Forecast Solar Activity”, type=”l”, xlab = “Time”, main = “Forecast ets ‘MNM'”)


## Conclusion

We built three models:

* Model ets ‘MNM’, where MASE = 0.5668805

* Exponential smoothing, where MASE = 0.5689797

* Model ets “AAA”, where MASE = 0.6202738

The least error is model Modelets ‘MNM’. For interest, I also built a arima model, where MASE =  0.9949591. After we made an adjustment for seasonality, the model’s accuracy increased. This means that when there is some seasonality in the data, it must be taken into account when predicting data. For example, even when I use the model Exponential smoothing, I specified in the parameters “multiplicative”. As a result, we got quite a low error MASE. It should also be noted two important things. The first is that the accuracy of the model can change significantly if we analyze a longer period of time (for example, prices for 10 years). The second important thing is that there is no ideal model. Seasonality on the shares may not be large in any particular year and these models will not work. 


#your downloaded data file is used .

x<-read.csv(“WEB.csv”,header=T) #whatever your file name is , change directory as per the location of file in your PC

w=x[,6] #adjusted historical stock data




w_test=w[379:505] #last 6 months , March 15 to Sep 14 , whichever dates availble , actually 127 only available , you may change these in case you need


z <-forecast(fit,length(w_test)) #predicted values for last 6 months by ARIMA

plot(z) #plots the predictions from auto.ARIMA

fit1 <-ets(w_train)

u <- forecast(fit1,length(w_test))

plot(u) #plots the predictions from ETS

accuracy<- accuracy(z,w_test)

accuracy1 <- accuracy(u,w_test)

#though there’s no unique measure to compare , we compare the MASE values and ETS has smaller MAS values (very slight difference though). So , you may call ETS better in that case.

#building my own ARIMA ,i.e. to choose p , d , q

y=diff(log(w_train),lag=1) #its the first difference of log returns

azfinal.aic<- Inf #initiation with large value

azfinal.order<- c(0,0,0)

for (p in 1:4) for (d in 0:1) for (q in 1:4) {

azcurrent.aic<- AIC(arima(y, order=c(p, d, q)))

if (azcurrent.aic<azfinal.aic) {

azfinal.aic<- azcurrent.aic

azfinal.order<- c(p, d, q)

azfinal.arima<- arima(y, order=azfinal.order)


} #we are taking  d=0,1, p=1,2,3,4 , q=1,2,3,4 and choosing which combination of p,d,qminimises the AIC

#the iteration gives (p,d,q)=(2,0,2)

Box.test(resid(azfinal.arima), lag=20, type=”Ljung-Box”)

#the p-value turns out to be 0.98 , i.e. bigger than 0.05 . So it’s a good fit


plot(v) #this gives an estimate of log returns using our own ARIMA(2,0,2) for test set

# there might be bit of convergence issues since the max iteration limit may exceed . But still it doesn’t matter much since the fit is still good , as confirmed by Ljung-Box test . Some NaN’s may be produced since R computes negative loglikelihood and might show NaN if MLE is around the boundary , but these don’t matter much here as long as the model is concerned.