# R and RStudio Forecasting

**Preparation**

1. Using *R*: install *R* and RStudio and do the tutorials

2. Resources

• Time series course: Introduction to forecasting

• Economic Forecasting course, slides on

  - 1. Using *R*
  - 2. Getting started
  - 3. The forecaster's toolbox

3. Simple regression

Resources

• Economic Forecasting course, slides on Simple regression

• Predictive Analytics course: Simple regression

4. Multiple regression

Resources

• Economic Forecasting course: Multiple regression

• Predictive Analytics course: Multiple regression

**Questions**

Problem 1

Skin cancer rates have been steadily increasing over recent years. It is thought that this may be due to ozone depletion. The following data are ozone depletion rates in various locations and the rates of melanoma (a form of skin cancer) in these locations.

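| Ozone dep (%) | 5 | 7 | 13 | 14 | 17 | 20 | 26 | 30 | 34 | 39 | 44 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Melanoma (%) | 1 | 1 | 3 | 4 | 6 | 5 | 6 | 8 | 7 | 10 | 9 |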

a. Plot melanoma against ozone depletion and fit a straight line regression model to the data.

b. Plot the residuals from your regression against ozone depletion. What does this say about the fitted model?

c. What percentage of the variation in rates of melanoma is explained by the regression relationship?

d. Scientists discovered that 40% of ozone was depleted in a certain region. What would you expect to be the rate of melanoma in this area? Give a prediction interval.

e. Explain the assumptions and limitations in your prediction. What other factors may play a role?
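
A minimal sketch of how parts (a) to (d) might be approached in base R, using only the data given in the problem. The variable names (`ozone`, `melanoma`, `fit`, `pred`) are illustrative choices, and the prediction interval relies on the usual assumption of independent, normally distributed errors with constant variance.

```r
# Data from the problem statement (percentages).
ozone    <- c(5, 7, 13, 14, 17, 20, 26, 30, 34, 39, 44)
melanoma <- c(1, 1, 3, 4, 6, 5, 6, 8, 7, 10, 9)

# (a) Scatterplot with the fitted least-squares line.
fit <- lm(melanoma ~ ozone)
plot(ozone, melanoma, xlab = "Ozone depletion (%)", ylab = "Melanoma (%)")
abline(fit)

# (b) Residuals against the predictor; a patternless band around zero
#     would support the straight-line model.
plot(ozone, residuals(fit), xlab = "Ozone depletion (%)", ylab = "Residual")
abline(h = 0, lty = 2)

# (c) Proportion of variation explained by the regression.
r_squared <- summary(fit)$r.squared

# (d) Point forecast and 95% prediction interval at 40% depletion.
pred <- predict(fit, newdata = data.frame(ozone = 40),
                interval = "prediction", level = 0.95)

print(coef(fit))
print(r_squared)
print(pred)
```

Part (e) is a matter of interpretation: the interval is only as trustworthy as the linearity and error assumptions, and 40% lies near the upper end of the observed range.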

**ASSIGNMENT 02**

**Preparation**

1. Time series decomposition

Resources

• Time series course: White noise and time series decomposition

• Economic Forecasting course: Time series decomposition

2. Exponential smoothing

Resources

• Time series course: Exponential smoothing methods

• Economic Forecasting course: Exponential smoothing

**ASSIGNMENT 03**

**Preparation**

1. ARIMA models

Resources

• Time series course, slides on

  - 3. Autocorrelation and seasonality
  - 8. Stationarity and differencing
  - 9. Non-seasonal ARIMA models
  - 10. Seasonal ARIMA models

• Economic Forecasting course: ARIMA models

**ASSIGNMENT 04**

Write a 15 to 20 page report on the application of all the forecasting methods covered in this module to a data set. Provide references to all sources that you use.

The methods are covered in

• Topic 1: Regression

• Topic 2: Time series decomposition and Exponential Smoothing

• Topic 3: ARIMA models

The data set must be a single-variable time series with at least 100 observations. It can be primary or secondary data. Cite the source from which it is obtained. In the report, include the values of the variable and describe the variable used.

Use a computer package, such as *R*, with which you should be familiar by now, to apply the forecasting methods to the data. Include the output as well as graphs and tables in the report. Interpret the output of each forecasting method and describe your conclusions clearly.

Compare the suitability of the forecasting methods for the chosen data set and justify your choice.

The report should consist of an introduction, a description of the chosen data, the application of each of the forecasting methods to the data set, a comparison of the results, and your conclusions. References should be shown in a proper reference list/bibliography.

**Solution**

**1 Introduction**

Time series data often arise when monitoring industrial processes or tracking corporate business metrics. The essential difference between modeling data via time series methods and using the process monitoring methods discussed earlier in this chapter is the following: time series analysis accounts for the fact that data points taken over time may have an internal structure (such as autocorrelation, trend or seasonal variation) that should be accounted for. This section will give a brief overview of some of the more widely used techniques in the rich and rapidly growing field of time series modeling and analysis.

**2 DATA REPRESENTATION**

**2.2 My data**

The data I have used form a univariate, sequential time series of 255 values with no missing values. First of all, we plot the data to examine its graphical characteristics.

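R code

```r
library(fpp)  # assumed source of ausbeer; also loads the forecast package

# Quarterly beer production from 1992 up to the end of 2005
beer2 <- window(ausbeer, start = 1992, end = 2006 - .1)

# Three benchmark forecasts, 11 quarters ahead
beerfit1 <- meanf(beer2, h = 11)
beerfit2 <- naive(beer2, h = 11)
beerfit3 <- snaive(beer2, h = 11)

# PI = FALSE hides prediction intervals (older forecast versions used plot.conf)
plot(beerfit1, PI = FALSE,
     main = "Forecasts for quarterly beer production")
lines(beerfit2$mean, col = 2)
lines(beerfit3$mean, col = 3)
legend("topright", lty = 1, col = c(4, 2, 3),
       legend = c("Mean method", "Naive method", "Seasonal naive method"))
```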

This is a typical example of R code for plotting time series data. The plot is attached.

Similarly, other plots can be drawn to analyse the data; the R code needed to draw them is given below. The two most important plots I have used are the time plot and the seasonal plot, which are attached.

```r
# Time plots; the title and column name are placeholders to be filled in
plot(a10, ylab = "million", xlab = "Year", main = "name of the dataset")
plot(melsyd[, "name of data"], main = "Dataset name", xlab = "Year", ylab = "Thousands")
```

The following observations can be made from the data sets.

In describing these time series, we have used words such as “trend” and “seasonal” which need to be more carefully defined.

• A trend exists when there is a long-term increase or decrease in the data. There is a trend in the observations shown above.

• A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The monthly sales observations above show seasonality, partly induced by the change in the cost of the drugs at the end of the calendar year.

• A cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the “business cycle”. The economy class passenger data above showed some indications of cyclic effects.

It is important to distinguish cyclic patterns from seasonal patterns. Seasonal patterns have a fixed and known length, while cyclic patterns have variable and unknown length. The average length of a cycle is usually longer than that of seasonality, and the magnitude of cyclic variation is usually more variable than that of seasonal variation.

Many time series include trend, cycles and seasonality. When choosing a forecasting method, we will first need to identify the time series patterns in the data, and then choose a method that is able to capture the patterns properly.
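
As a rough numerical complement to the plots, the sketch below summarises trend and seasonality for a stand-in series. It uses R's built-in `AirPassengers` data purely for illustration, since the report's own data set is not reproduced here; the object names are assumptions.

```r
y <- AirPassengers  # stand-in monthly series, not the report's data

# Trend: yearly averages smooth out the seasonal swings.
annual_mean <- aggregate(y, nfrequency = 1, FUN = mean)

# Seasonality: average of each calendar month across all years.
monthly_mean <- tapply(as.numeric(y), as.integer(cycle(y)), mean)

print(annual_mean)
print(round(monthly_mean, 1))

# Autocorrelation: slow decay suggests a trend, and peaks at multiples
# of the seasonal period (12 months here) suggest seasonality.
acf(y, lag.max = 36)
```

Steadily changing yearly averages point to a trend, while large differences between the monthly averages point to a seasonal pattern of fixed length.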

**2.3 Regression**

The forecast variable $y$ and the predictor variable $x$ are assumed to be related by the simple linear model:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

Here $\varepsilon$ denotes the errors, which are assumed to be independently and identically distributed with mean 0 and some unknown variance, which we estimate from the data using the usual unbiased estimator of the variance. The parameters $\beta_0$ and $\beta_1$ determine the intercept and the slope of the line respectively. The intercept $\beta_0$ represents the predicted value of $y$ when $x=0$. The slope $\beta_1$ represents the predicted increase in $y$ resulting from a one-unit increase in $x$. In this case the values of the parameters are given as follows.

Hence the conclusions drawn from these are as follows:

• The observations follow a positive trend over time, as can be seen from the regression plot and the calculation.

Notice that the observations do not lie on the straight line but are scattered around it. We can think of each observation $y_i$ as consisting of the systematic or explained part of the model, $\beta_0+\beta_1 x_i$, and the random “error”, $\varepsilon_i$. The “error” term does not imply a mistake, but a deviation from the underlying straight-line model. It captures anything that may affect $y_i$ other than $x_i$. We assume that these errors:

• have mean zero; otherwise the forecasts will be systematically biased.

• are not autocorrelated; otherwise the forecasts will be inefficient as there is more information to be exploited in the data.

• are unrelated to the predictor variable; otherwise there would be more information that should be included in the systematic part of the model.

It is also useful to have the errors normally distributed with constant variance in order to produce prediction intervals and to perform statistical inference. While these additional conditions make the calculations simpler, they are not necessary for forecasting.

Another important assumption in the simple linear model is that $x$ is not a random variable. If we were performing a controlled experiment in a laboratory, we could control the values of $x$ (so they would not be random) and observe the resulting values of $y$. With observational data (including most data in business and economics) it is not possible to control the value of $x$, and hence we make this an assumption.

In practice, of course, we have a collection of observations but we do not know the values of $\beta_0$ and $\beta_1$. These need to be estimated from the data. We call this “fitting a line through the data”.

As a result of the properties of least squares estimation, it is clear that the average of the residuals is zero, and that the correlation between the residuals and the observations for the predictor variable is also zero.

Forecasts from the regression model described above are obtained by substituting a value of $x$ into the fitted line, $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$. Applying the fitted line to the observed values of $x$ gives the fitted values, while applying it to new values of $x$ gives the forecasts.
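
The sketch below illustrates fitting a linear trend, checking that the residuals average to zero and are uncorrelated with the predictor, and forecasting from the fitted line. It uses simulated data as a stand-in, because the report's 255 observations are not reproduced here; the series, the seed and all object names are assumptions made for illustration.

```r
set.seed(1)                      # simulated stand-in data, not the report's series
n <- 255
time_index <- seq_len(n)
y <- 10 + 0.05 * time_index + rnorm(n)

# Fit y = beta0 + beta1 * time by least squares.
fit <- lm(y ~ time_index)
res <- residuals(fit)

# Least-squares residuals average to zero and are uncorrelated
# with the predictor (up to floating-point error).
mean_res  <- mean(res)
cor_res_x <- cor(res, time_index)

# Forecast the next 10 time points with 95% prediction intervals.
new_time <- data.frame(time_index = n + 1:10)
fc <- predict(fit, newdata = new_time, interval = "prediction", level = 0.95)

print(coef(fit))
print(c(mean_residual = mean_res, cor_with_x = cor_res_x))
print(head(fc, 3))
```

The prediction intervals here depend on the normality and constant-variance assumptions; the point forecasts do not.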

**3 Decomposition**

While decomposition is primarily useful for studying time series data, and exploring the historical changes over time, it can also be used in forecasting.

Assuming an additive decomposition, the decomposed time series can be written as:

$$y_t = \hat{S}_t + \hat{A}_t$$

where $\hat{S}_t$ is the seasonal component and $\hat{A}_t$ is the seasonally adjusted component.

To forecast a decomposed time series, we separately forecast the seasonal component, $\hat{S}_t$, and the seasonally adjusted component, $\hat{A}_t$. It is usually assumed that the seasonal component is unchanging, or changing extremely slowly, and so it is forecast by simply taking the last year of the estimated component. In other words, a seasonal naïve method is used for the seasonal component.

To forecast the seasonally adjusted component, any non-seasonal forecasting method may be used. For example, a random walk with drift model, Holt’s method, or a non-seasonal ARIMA model may be used.

For the data used here, we propose to use the additive model, as suggested by the plots.
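
A minimal sketch of forecasting from an additive decomposition in base R. It uses the built-in `co2` series as a stand-in for the report's data, forecasts the seasonal component with the seasonal naïve method and the seasonally adjusted component with a random walk with drift; the horizon `h` and all object names are assumptions.

```r
y <- co2                          # stand-in monthly series, not the report's data
m <- frequency(y)                 # seasonal period (12 for monthly data)
h <- 24                           # forecast horizon (assumed)

# Additive decomposition: y = trend + seasonal + remainder.
dec <- decompose(y, type = "additive")

# Seasonally adjusted series A_t = y_t - S_t.
adj <- y - dec$seasonal

# Random walk with drift for the seasonally adjusted component:
# last value plus h times the average historical change.
drift  <- mean(diff(adj))
adj_fc <- as.numeric(tail(adj, 1)) + drift * seq_len(h)

# Seasonal naive for the seasonal component: repeat the last year.
last_season <- as.numeric(tail(dec$seasonal, m))
seas_fc     <- rep(last_season, length.out = h)

# Recombine the two component forecasts.
fc <- ts(adj_fc + seas_fc, start = tsp(y)[2] + 1 / m, frequency = m)

plot(y, xlim = c(start(y)[1], tsp(fc)[2]), ylab = "y")
lines(fc, col = 2)
```

Any other non-seasonal method could be substituted for the drift forecast of `adj` without changing the rest of the sketch.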

**4 Exponential smoothing**

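A variation from Holt’s linear trend method is achieved by allowing the level and the slope to be multiplied rather than added:

$$\hat{y}_{t+h|t} = \ell_t b_t^h$$

$$\ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1} b_{t-1}$$

$$b_t = \beta^* \frac{\ell_t}{\ell_{t-1}} + (1-\beta^*)\, b_{t-1}$$

where $b_t$ now represents an estimated growth rate (in relative terms rather than absolute) which is multiplied rather than added to the estimated level. The trend in the forecast function is now exponential rather than linear, so that the forecasts project a constant growth rate rather than a constant slope. With the one-step error $e_t = y_t - \ell_{t-1} b_{t-1}$, the error correction form is $\ell_t = \ell_{t-1} b_{t-1} + \alpha e_t$ and $b_t = b_{t-1} + \alpha\beta^* e_t / \ell_{t-1}$. I have used this form of exponential smoothing for my data set.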
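
The multiplicative level and growth-rate updates described above can be sketched directly in base R. The function name `holt_exponential`, the default smoothing parameters, and the simple initialisation (first observation as the level, ratio of the first two observations as the growth rate) are assumptions made for illustration; in practice the parameters would be estimated from the data.

```r
# Exponential-trend variant of Holt's method (hypothetical helper).
# y: positive numeric series; alpha, beta: smoothing parameters in (0, 1);
# h: number of steps ahead to forecast.
holt_exponential <- function(y, alpha = 0.8, beta = 0.2, h = 5) {
  y <- as.numeric(y)
  stopifnot(length(y) >= 2, all(y > 0))

  level  <- y[1]            # assumed initial level
  growth <- y[2] / y[1]     # assumed initial growth rate

  for (t in 2:length(y)) {
    prev_level <- level
    # Level: weighted average of the observation and the one-step forecast.
    level  <- alpha * y[t] + (1 - alpha) * prev_level * growth
    # Growth rate: weighted average of the latest ratio and the old rate.
    growth <- beta * level / prev_level + (1 - beta) * growth
  }

  # Forecast function: level times growth rate to the power h.
  level * growth^seq_len(h)
}

# Usage on a series that doubles every period.
doubling <- 2^(1:10)
fc <- holt_exponential(doubling, h = 3)
print(fc)
```

For a series that grows by an exact constant factor, the one-step errors are zero, so the forecasts simply continue the same growth factor.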

**5 ARIMA MODELLING**

Although we have calculated forecasts from the ARIMA models in our examples, we have not yet explained how they are obtained. Point forecasts can be calculated using the following three steps.

• Expand the ARIMA equation so that $y_t$ is on the left-hand side and all other terms are on the right.

• Rewrite the equation by replacing $t$ by $T+h$.

• On the right-hand side of the equation, replace future observations by their forecasts, future errors by zero, and past errors by the corresponding residuals.

The calculation of ARIMA forecast intervals is much more difficult, and the details are largely beyond the scope of this report. We will just give some simple examples.

The first forecast interval is easily calculated. If $\hat{\sigma}$ is the standard deviation of the residuals, then a 95% forecast interval is given by $\hat{y}_{T+1|T} \pm 1.96\hat{\sigma}$. This result is true for all ARIMA models regardless of their parameters and orders.
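
The three steps and the first forecast interval can be illustrated in base R for an ARIMA(1,1,0) model, whose expanded equation is $y_t = (1+\phi_1) y_{t-1} - \phi_1 y_{t-2} + e_t$. The built-in `WWWusage` series and the model order are assumptions chosen only to keep the example small; a model with MA terms would additionally need the last residuals in place of past errors.

```r
y <- as.numeric(WWWusage)         # stand-in series, not the report's data
n <- length(y)

# Fit (1 - phi1 * B)(1 - B) y_t = e_t.
fit <- arima(y, order = c(1, 1, 0))
phi <- unname(coef(fit)["ar1"])

# Steps 1 and 2: y_{T+h} = (1 + phi) y_{T+h-1} - phi y_{T+h-2} + e_{T+h}.
# Step 3: future errors become zero, future observations become forecasts.
f1 <- (1 + phi) * y[n] - phi * y[n - 1]   # h = 1
f2 <- (1 + phi) * f1   - phi * y[n]       # h = 2 uses the h = 1 forecast

# First 95% forecast interval: point forecast +/- 1.96 * sigma-hat.
sigma <- sqrt(fit$sigma2)
interval_1 <- c(lower = f1 - 1.96 * sigma, upper = f1 + 1.96 * sigma)

# Compare with R's own forecasts.
builtin <- predict(fit, n.ahead = 2)
print(rbind(manual = c(f1, f2), predict = as.numeric(builtin$pred)))
print(interval_1)
```

Intervals for longer horizons widen in a way that depends on the model, which is why only the first one has this simple form.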

**6 Conclusion**

From all the attached plots, and by fitting the model with the regression technique and then applying exponential smoothing with the help of appropriate R code, we can forecast future observations that agree closely with the observed values.