R and RStudio Forecasting
Using R: install R and RStudio and do the tutorials
• Time series course. Introduction to forecasting
• Economic Forecasting course, Slides on
  - 1. Using R
  - 2. Getting started
  - 3. The forecaster’s toolbox
3. Simple regression
• Economic Forecasting course, Slides on Simple regression
• Predictive Analytics course. Simple Regression
• Economic Forecasting course. Multiple regression
• Predictive Analytics course. Multiple Regression
2. Skin cancer rates have been steadily increasing over recent years. It is thought that this
may be due to ozone depletion. The following data are ozone depletion rates in various
locations and the rates of melanoma (a form of skin cancer) in these locations.
Ozone dep (%)   5   7  13  14  17  20  26  30  34  39  44
Melanoma (%)    1   1   3   4   6   5   6   8   7  10   9
a. Plot melanoma against ozone depletion and fit a straight line regression model to the data.
b. Plot the residuals from your regression against ozone depletion. What does this say
about the fitted model?
c. What percentage of the variation in rates of melanoma is explained by the regression?
d. Scientists discovered that 40% of ozone was depleted in a certain region. What
would you expect to be the rate of melanoma in this area? Give a prediction interval.
e. Explain the assumptions and limitations in your prediction. What other factors may
play a role?
1. Time series decomposition
• Time series course. White noise and time series decomposition
• Economic Forecasting course. Time series decomposition
• Time series course. Exponential smoothing methods
• Economic Forecasting course. Exponential smoothing
1. ARIMA models
• Time series course, Slides on
  - 3. Autocorrelation and seasonality
  - 8. Stationarity and differencing
  - 9. Non-seasonal ARIMA models
  - 10. Seasonal ARIMA Models
• Economic Forecasting course. ARIMA models
Write a 15-20 page report on the application of all the forecasting methods covered in this
module on a data set. Provide references to all sources that you use.
The methods are covered in
• Topic 1: Regression
• Topic 2: Time series decomposition and Exponential Smoothing
• Topic 3: ARIMA models
The data set must be a single variable time series with at least 100 observations. It can be
primary or secondary data. Refer to the source from which it is obtained. In the report, include
the values of the variable and describe the variable used.
Use a computer package, such as R, with which you should be familiar by now, to apply the
forecasting methods on the data. Include the output as well as graphs and tables in the report.
Interpret the output of each forecasting method on the data and describe your conclusions.
Compare the suitability of the forecasting methods for the chosen data set and justify your
The report should consist of an introduction, a description of the chosen data, the application of
each of the forecasting methods on the data set, and the comparison of the results and your
conclusions. References should be shown in a proper reference list/bibliography.
Time series data often arise when monitoring industrial processes or tracking
corporate business metrics. The essential difference between modeling data via
time series methods or using the process monitoring methods discussed earlier in
this chapter is the following: Time series analysis accounts for the fact that data
points taken over time may have an internal structure (such as autocorrelation,
trend or seasonal variation) that should be accounted for. This section will give
a brief overview of some of the more widely used techniques in the rich and
rapidly growing field of time series modeling and analysis.
2 DATA REPRESENTATION
2.2 My data
The data that I have used are univariate, with no missing values. The series contains 255 values and is a sequential time series. First of all, we plot the data to examine its graphical characteristics.
Subsection R code
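library(fpp)  # assumed source of ausbeer and the forecast functions
beer2 <- window(ausbeer, start=1992, end=2006-.1)
beerfit1 <- meanf(beer2, h=11)   # mean method
beerfit2 <- naive(beer2, h=11)   # naive method
beerfit3 <- snaive(beer2, h=11)  # seasonal naive method
plot(beerfit1, PI=FALSE,
  main="Forecasts for quarterly beer production")
lines(beerfit2$mean, col=2)
lines(beerfit3$mean, col=3)
legend("topright", lty=1, col=c(4,2,3),
  legend=c("Mean method","Naive method","Seasonal naive method"))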
This is a typical example of R code to plot time series data.
The plot is attached.
Similarly, other plots can be drawn to analyse the data; the R code needed to draw them is given below. The two most important plots that I have used are the time plot and the seasonal plot, which are attached.
plot(a10, ylab="million", xlab="Year", main="name of the data set")
plot(melsyd[,"name of data"], main="Dataset name", xlab="Year", ylab="Thousands")
The following observations can be made from the data sets.
In describing these time series, we have used words such as “trend” and “seasonal” which need to be more carefully defined.
• A trend exists when there is a long-term increase or decrease in the data.
There is a trend in the data shown above.
• A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The monthly sales observations above show seasonality, partly induced by the change in the cost of the drugs at the end of the calendar year.
• A cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the “business cycle”. The economy class passenger data above showed some indications of cyclic effects.
It is important to distinguish cyclic patterns and seasonal patterns. Seasonal patterns have a fixed and known length, while cyclic patterns have variable and unknown length. The average length of a cycle is usually longer than that of seasonality, and the magnitude of cyclic variation is usually more variable than that of seasonal variation.
Many time series include trend, cycles and seasonality. When choosing a forecasting method, we will first need to identify the time series patterns in the data, and then choose a method that is able to capture the patterns properly.
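As a rough illustration of how trend and seasonality might be checked numerically before choosing a method, the sketch below uses only base R. The built-in `AirPassengers` series is an assumed stand-in, not the data set analysed in this report.

```r
# Built-in monthly airline passenger series (an assumed stand-in data set).
y <- AirPassengers

# Trend: yearly averages smooth out the within-year pattern.
annual_mean <- aggregate(y, nfrequency = 1, FUN = mean)

# Seasonality: average of each calendar month across all years.
monthly_mean <- tapply(y, cycle(y), mean)

print(round(annual_mean, 1))
print(round(monthly_mean, 1))
```

Steadily rising yearly averages suggest a trend, and monthly averages that differ markedly from one another suggest a seasonal pattern of fixed length.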
The forecast variable $y$ and the predictor variable $x$ are assumed to be related by the simple linear model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
Here $\varepsilon$ denotes the errors, introduced for example by limited human precision. They are assumed
to be independently and identically distributed with mean 0 and some unknown
variance, which we estimate from the data using the usual unbiased
estimator of the variance. The parameters $\beta_0$ and $\beta_1$ determine the intercept and the
slope of the line respectively. The intercept $\beta_0$ represents the predicted value of
$y$ when $x=0$. The slope $\beta_1$ represents the predicted increase in $y$ resulting
from a one unit increase in $x$. In this case the values of the parameters are
given as follows
Hence the conclusions drawn from these are as follows
• the observations follow a positive trend over time; from the regression
plot and calculation we can easily find out that
Notice that the observations do not lie on the straight line but are scattered
around it. We can think of each observation $y_i$ as consisting of the systematic
or explained part of the model, $\beta_0 + \beta_1 x_i$, and the random “error”, $\varepsilon_i$. The
“error” term does not imply a mistake, but a deviation from the underlying
straight line model. It captures anything that may affect $y_i$ other than $x_i$. We
assume that these errors:
• have mean zero; otherwise the forecasts will be systematically biased.
• are not autocorrelated; otherwise the forecasts will be inefficient as there is more
information to be exploited in the data.
• are unrelated to the predictor variable; otherwise there would be more information that should be included in
the systematic part of the model.
It is also useful to have the errors normally
distributed with constant variance in order to produce prediction intervals and
to perform statistical inference. While these additional conditions make the
calculations simpler, they are not necessary for forecasting.
Another important assumption in the simple linear model is that x is not a
random variable. If we were performing a controlled experiment in a laboratory,
we could control the values of x (so they would not be random) and observe the
resulting values of y. With observational data (including most data in business
and economics) it is not possible to control the value of x, and hence we make
this an assumption.
In practice, of course, we have a collection of observations but we do not know
the values of $\beta_0$ and $\beta_1$. These need to be estimated from the data. We call this
“fitting a line through the data”.
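A minimal sketch of fitting a line through data, using the ozone and melanoma figures from the regression exercise earlier in this document. The variable names are illustrative. The closed-form least squares formulas are computed by hand and compared with R's `lm()`, and `predict()` then gives a point forecast with a 95% prediction interval at 40% ozone depletion.

```r
# Ozone depletion (%) and melanoma rate (%) from the exercise table.
ozone    <- c(5, 7, 13, 14, 17, 20, 26, 30, 34, 39, 44)
melanoma <- c(1, 1, 3, 4, 6, 5, 6, 8, 7, 10, 9)

# Least squares estimates from the closed-form formulas.
b1 <- sum((ozone - mean(ozone)) * (melanoma - mean(melanoma))) /
      sum((ozone - mean(ozone))^2)
b0 <- mean(melanoma) - b1 * mean(ozone)

# The same fit using R's built-in linear model function.
fit <- lm(melanoma ~ ozone)
print(coef(fit))
print(c(b0 = b0, b1 = b1))

# Point forecast and 95% prediction interval at 40% depletion.
pred <- predict(fit, newdata = data.frame(ozone = 40),
                interval = "prediction", level = 0.95)
print(pred)
```

The prediction interval relies on the additional assumption of normally distributed errors with constant variance.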
As a result of these properties, it is clear that the average of the residuals
is zero, and that the correlation between the residuals and the observations for
the predictor variable is also zero.
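These two residual properties can be checked numerically. The sketch below refits the line to the ozone and melanoma figures from the regression exercise (the variable names are illustrative) and prints both quantities. They should be zero up to floating-point rounding.

```r
# Ozone depletion (%) and melanoma rate (%) from the exercise table.
ozone    <- c(5, 7, 13, 14, 17, 20, 26, 30, 34, 39, 44)
melanoma <- c(1, 1, 3, 4, 6, 5, 6, 8, 7, 10, 9)

fit <- lm(melanoma ~ ozone)
e <- residuals(fit)

# Average of the residuals.
print(mean(e))

# Correlation between the residuals and the predictor.
print(cor(e, ozone))
```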
Forecasts from the regression model described above use the simple linear model:
given a value of x, the fitted line predicts y. Applying the regression model
to the observed x values gives the fitted values, which are then used to forecast.
While decomposition is primarily useful for studying time series data, and exploring the historical changes over time, it can also be used in forecasting.
Assuming an additive decomposition, the decomposed time series can be written as
$$y_t = \hat{S}_t + \hat{A}_t$$
where $\hat{A}_t = \hat{T}_t + \hat{E}_t$ is the seasonally adjusted component.
To forecast a decomposed time series, we separately forecast the seasonal component, $\hat{S}_t$, and the seasonally adjusted component, $\hat{A}_t$. It is usually assumed that the seasonal component is unchanging, or changing extremely slowly, and so it is forecast by simply taking the last year of the estimated component. In other words, a seasonal naïve method is used for the seasonal component.
To forecast the seasonally adjusted component, any non-seasonal forecasting
method may be used. For example, a random walk with drift model, or Holt’s
method, or a non-seasonal ARIMA model, may be used.
For the data used here, we propose to use the additive model, as suggested
by the plots we have drawn.
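The decomposition forecast described above can be sketched in base R. This is an illustration under stated assumptions: the built-in `co2` series stands in for the report's data, `stl()` with a periodic seasonal window provides the additive decomposition, and the seasonally adjusted component is forecast with a random walk with drift. The horizon `h` and all variable names are hypothetical.

```r
y <- co2                      # built-in monthly CO2 series (assumed example data)
h <- 24                       # forecast horizon in months

# Additive decomposition; a fixed seasonal pattern is assumed.
dec <- stl(y, s.window = "periodic")
seasonal <- dec$time.series[, "seasonal"]
adjusted <- y - seasonal      # seasonally adjusted component

# Seasonal component: repeat the last year of estimates (seasonal naive).
m <- frequency(y)
last_year <- tail(as.numeric(seasonal), m)
seasonal_fc <- rep(last_year, length.out = h)

# Seasonally adjusted component: random walk with drift.
a <- as.numeric(adjusted)
n <- length(a)
drift <- (a[n] - a[1]) / (n - 1)
adjusted_fc <- a[n] + drift * seq_len(h)

# Recombine and attach the time index that follows the sample.
fc <- ts(seasonal_fc + adjusted_fc,
         start = tsp(y)[2] + 1 / m, frequency = m)
print(fc)
```

Holt's method or a non-seasonal ARIMA model could replace the drift step without changing the rest of the sketch.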
4 Exponential smoothing
A variation from Holt’s linear trend method is achieved by allowing the level
and the slope to be multiplied rather than added:
$$\hat{y}_{t+h|t} = \ell_t b_t^h$$
$$\ell_t = \alpha y_t + (1-\alpha)(\ell_{t-1} b_{t-1})$$
$$b_t = \beta^* \frac{\ell_t}{\ell_{t-1}} + (1-\beta^*) b_{t-1}$$
where $b_t$ now represents an estimated growth rate (in relative terms rather than
absolute) which is multiplied rather than added to the estimated level. The
trend in the forecast function is now exponential rather than linear, so that the
forecasts project a constant growth rate rather than a constant slope.
We have used the error correction form of exponential smoothing for our data set.
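A minimal base R sketch of the exponential trend method follows. It assumes the usual recursions, in which the level is updated from $y_t$ and $\ell_{t-1} b_{t-1}$ and the growth rate from the ratio $\ell_t / \ell_{t-1}$. The function name `holt_exp`, the simple initialisation and the smoothing parameter values are illustrative assumptions, not the settings used for the report's data.

```r
holt_exp <- function(y, alpha, beta, h) {
  n <- length(y)
  level  <- numeric(n)
  growth <- numeric(n)
  # Simple initialisation (an assumption; other choices are possible).
  level[1]  <- y[1]
  growth[1] <- y[2] / y[1]
  for (t in 2:n) {
    level[t]  <- alpha * y[t] + (1 - alpha) * level[t - 1] * growth[t - 1]
    growth[t] <- beta * level[t] / level[t - 1] + (1 - beta) * growth[t - 1]
  }
  # Forecasts compound the last growth rate h steps ahead.
  level[n] * growth[n]^seq_len(h)
}

# Check on a series growing at exactly 5% per period.
y <- 100 * 1.05^(0:19)
fc <- holt_exp(y, alpha = 0.8, beta = 0.2, h = 3)
print(fc)
```

On a series with an exactly constant growth rate the forecasts continue the same geometric path, which is the sense in which the method projects a constant growth rate rather than a constant slope.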
5 ARIMA MODELLING
Although we have calculated forecasts from the ARIMA models in our examples, we have not yet explained how they are obtained. Point forecasts can be
calculated using the following three steps.
• Expand the ARIMA equation so that $y_t$ is on the left hand side and all
other terms are on the right.
• Rewrite the equation by replacing $t$ by $T+h$.
• On the right hand side of the equation, replace future observations by their
forecasts, future errors by zero, and past errors by the corresponding residuals.
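The three steps can be sketched for the simplest case, an AR(1) model with a mean, using base R's `arima()`. The built-in `lh` series is an assumed example, not the report's data. An AR(1) model has no past error terms on the right hand side, so only the replacement of future errors and future observations is exercised here.

```r
fit <- arima(lh, order = c(1, 0, 0))   # lh: built-in hormone series (assumed example)
phi <- coef(fit)[["ar1"]]
mu  <- coef(fit)[["intercept"]]        # R reports the series mean under this name

# Step 1: y_t = mu + phi * (y_{t-1} - mu) + e_t
# Step 2: replace t by T + h
# Step 3: future errors become 0, future observations become forecasts
h <- 3
fc <- numeric(h)
prev <- as.numeric(tail(lh, 1))        # last observed value y_T
for (i in seq_len(h)) {
  fc[i] <- mu + phi * (prev - mu)
  prev  <- fc[i]
}

print(fc)
print(predict(fit, n.ahead = h)$pred)  # built-in forecasts for comparison
```

The hand-computed values should agree with those returned by `predict()`.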
The calculation of ARIMA forecast intervals is much more difficult, and the
details are largely beyond the scope of this book. We will just give some simple
examples.
The first forecast interval is easily calculated. If $\hat{\sigma}$ is the standard deviation of
the residuals, then a 95 percent forecast interval is given by
$$\hat{y}_{T+1|T} \pm 1.96\hat{\sigma}$$
This result is true for all ARIMA models regardless of their parameters and orders.
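As a small illustration of the first forecast interval, the sketch below fits an AR(1) model to the built-in `lh` series (an assumed example, not the report's data). It takes the square root of the estimated innovation variance `fit$sigma2` as the residual standard deviation and forms the one-step 95 percent interval.

```r
fit <- arima(lh, order = c(1, 0, 0))   # assumed example model and data

# One-step-ahead point forecast.
fc1 <- as.numeric(predict(fit, n.ahead = 1)$pred)

# Estimated standard deviation of the residuals (innovations).
sigma_hat <- sqrt(fit$sigma2)

# 95 percent interval: point forecast plus or minus 1.96 standard deviations.
interval <- c(lower = fc1 - 1.96 * sigma_hat,
              upper = fc1 + 1.96 * sigma_hat)
print(interval)
```

For longer horizons the interval widens and depends on the model's parameters, so this shortcut applies to the first forecast only.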
6 Conclusion
From all the plots attached, and by fitting the model with the help of the
regression technique and then applying exponential smoothing with the help
of suitable R code, we can forecast future observations that agree closely
with the observed values.