Understanding linear regression

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable. The most common method for fitting a regression line is the method of least-squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line (if a point lies on the fitted line exactly, then its vertical deviation is 0).

What is Linear Regression?

Linear regression is a statistical method that models the relationship between two variables, one explanatory and one dependent, by fitting a linear equation to observed data. Before using linear regression, it is prudent to determine whether a relationship exists between the explanatory and dependent variables. This does not mean that one variable must cause the other. Rather, you should check for a significant association between the variables of interest.

A scatterplot can come in handy if you want to judge the strength of the relationship between two variables. If the scatterplot does not reveal any increasing or decreasing trend, fitting a linear regression model to that data is unlikely to be useful. The correlation coefficient is a useful numerical measure of the linear association between two variables. It takes a value between -1 and 1: values near -1 or 1 indicate a strong linear relationship in the observed data, and values near 0 indicate a weak one.
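As a rough illustration of the correlation coefficient, the following Python sketch computes it from its usual definition. The function name pearson_r and the data (hours studied and exam scores) are hypothetical and are not taken from any real study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

r = pearson_r(hours, scores)
print(round(r, 3))
```

For this made-up data the coefficient comes out at roughly 0.77, which suggests a fairly strong positive linear association.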

The Linear Regression Equation

If you are new to regression analysis, the first type of regression you will learn in your stats class is simple linear regression. It is one of the most widely used statistical techniques because it is simple and makes predictions easy to compute. Today, linear regression can be calculated with software packages and calculators such as Excel and the TI-83. You can also calculate it by hand.

The linear regression line is always represented by the following equation:

Y = a + bX where:

• X is the explanatory variable
• Y is the dependent variable
• b is the slope of the line
• a is the value of Y when X = 0. It is also known as the intercept.
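
To make the roles of the intercept and the slope concrete, here is a minimal Python sketch. The function name predict and the values a = 2.0 and b = 0.5 are hypothetical, chosen only for illustration.

```python
def predict(a, b, x):
    """Return the value on the line with intercept a and slope b at x."""
    return a + b * x

# Hypothetical intercept and slope, for illustration only.
a, b = 2.0, 0.5

# At x = 0 the line returns the intercept.
at_zero = predict(a, b, 0)

# Increasing x by one unit changes the prediction by the slope.
step = predict(a, b, 4) - predict(a, b, 3)

print(at_zero, step)
```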

To find a linear regression equation, you should first determine whether there is an association between the two variables. You will also need to list your data in x-y format, i.e. one column for the explanatory (independent) variable and another column for the dependent variable.

Linear Regression concepts that you must know

Below we discuss some of the concepts that you should understand before you start using linear regression. These concepts are the building blocks of this type of regression analysis.

Least Squares

Least squares is the most common technique used to fit a regression line. It finds the best-fitting line by minimizing the sum of the squares of the vertical deviations from each data point to the line. Because the deviations are squared before they are summed, positive and negative deviations cannot cancel each other out.
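
For a single explanatory variable, the least-squares slope and intercept have a well-known closed form: the slope is the sum of the products of the x and y deviations from their means divided by the sum of the squared x deviations, and the intercept makes the line pass through the point of means. The Python sketch below applies those formulas. The function name fit_line and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

a, b = fit_line(hours, scores)
print(round(a, 3), round(b, 3))
```

For this data the sketch gives a slope of 0.6 and an intercept of 2.2 (up to floating-point rounding), i.e. the line y = 2.2 + 0.6x.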

Outliers and influential observations

After you have computed a regression line for your data, you may notice points that lie far from the line. These points are known as outliers and may represent erroneous data. Outliers may also indicate a poorly fitting regression line. An influential observation is a point that lies far from the other data in the horizontal direction; such a point can have a large effect on the slope of the fitted line.
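
The effect of a point that lies far from the rest in the horizontal direction can be sketched by fitting the line with and without it. The data below are hypothetical, and the extra point (20, 5) was invented purely to show the effect.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical base data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# The same data plus one point far away in the x direction.
xs_extra = xs + [20]
ys_extra = ys + [5]

_, slope_without = fit_line(xs, ys)
_, slope_with = fit_line(xs_extra, ys_extra)

print(round(slope_without, 3), round(slope_with, 3))
```

In this example a single added point pulls the slope from 0.6 down to roughly 0.08, which is why such observations deserve a closer look before the fitted line is trusted.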

Residuals

A residual is the vertical deviation of an observed value from the fitted line, i.e. the observed value of the dependent variable minus the value the line predicts. Examining the residuals allows the analyst to check the assumption that a linear relationship exists between the two variables.
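
As a small sketch of how residuals can be computed, the Python code below fits a least-squares line to hypothetical data and subtracts each predicted value from the observed one. The names fit_line and residuals are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

a, b = fit_line(hours, scores)

# Residual = observed value minus the value predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

for x, e in zip(hours, residuals):
    print(x, round(e, 2))
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding, so the thing to look for is a pattern. Residuals that curve or fan out when plotted against x suggest that a straight line may not describe the data well.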

Lurking Variables

A lurking variable is a third variable that has not been included in the model but significantly affects the relationship between the two variables. When the data were collected over time, a time series plot can help reveal the presence of lurking variables.

Extrapolation

Whenever you fit a linear regression model to a set of data, pay attention to the range of that data. Using the model to predict values outside the range of the observed explanatory variable is known as extrapolation. It is unreliable and may lead to absurd answers.
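
One simple way to guard against accidental extrapolation is to refuse predictions outside the observed range of x. The sketch below is one possible approach, not a standard routine. The function name predict_within_range, the data and the coefficients are all hypothetical.

```python
def predict_within_range(a, b, x, xs_observed):
    """Predict a + b*x, refusing to extrapolate beyond the observed x values."""
    lo, hi = min(xs_observed), max(xs_observed)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the observed range [{lo}, {hi}]")
    return a + b * x

# Hypothetical observed x values and a hypothetical fitted line y = 2.2 + 0.6x.
hours = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6

# Inside the observed range: the prediction is returned.
inside = predict_within_range(a, b, 3, hours)
print(round(inside, 2))

# Outside the observed range: the function raises instead of extrapolating.
try:
    predict_within_range(a, b, 40, hours)
    refused = False
except ValueError as err:
    refused = True
    print(err)
```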

What is the "best-fitting line"?

A linear equation summarizes the trend between two variables. At school, you were probably given a scatter plot and asked to draw the line that passes most appropriately through the data points. That line is the best-fitting line. The line that fits your data best is the one with the smallest prediction errors overall. One way of finding it is the least-squares method discussed above, which minimizes the sum of the squared prediction errors.
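
To see what "smallest prediction errors" means in practice, the sketch below compares the sum of squared prediction errors for two candidate lines on the same hypothetical data. The line y = 2.2 + 0.6x is the least-squares line for this particular data set. The second line is an arbitrary hand-drawn guess invented for the comparison.

```python
def sum_squared_errors(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

# Least-squares line for this data, and a hypothetical hand-drawn guess.
sse_least_squares = sum_squared_errors(hours, scores, a=2.2, b=0.6)
sse_guess = sum_squared_errors(hours, scores, a=2.0, b=0.7)

print(round(sse_least_squares, 2), round(sse_guess, 2))
```

The guess looks reasonable on a plot, yet its squared error (about 2.55) is larger than that of the least-squares line (about 2.4). No other straight line gives a smaller value for this data.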

