Simple linear regression
Simple linear regression is used to study and summarize the relationship between two quantitative (continuous) variables. One variable, usually denoted x, is the independent, explanatory, or predictor variable. The other variable, usually denoted y, is the dependent, response, or outcome variable. Simple linear regression is called “simple” because it examines only one independent variable; multiple regression, by contrast, analyzes two or more independent variables.
Simple linear regression relates the two variables, y and x, through the following model:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
The part β0 + β1x describes a straight line (the regression line), and ε captures how far each observed y falls from that line. In this equation:
- $\beta_0$ is the y-intercept
- $\beta_1$ is the slope (the change in y for a one-unit increase in x), and
- $\varepsilon$ is the error term
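Here is a minimal sketch in Python with NumPy that estimates the intercept and slope by ordinary least squares, the standard way of fitting this equation. It uses a handful of made-up numbers. The slope is the sum of the cross-deviations of x and y divided by the sum of the squared deviations of x. The intercept then makes the line pass through the point of means. The function name `fit_simple_ols` is our own, not a library API.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Estimate b0 (intercept) and b1 (slope) of y = b0 + b1*x + e by ordinary least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # The least-squares line always passes through (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [52, 55, 61, 64, 68]
b0, b1 = fit_simple_ols(x, y)

# Residuals are the estimated error terms: observed y minus the line's prediction
residuals = np.asarray(y, dtype=float) - (b0 + b1 * np.asarray(x, dtype=float))
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

With an intercept in the model, the least-squares residuals always sum to zero. This is a quick sanity check on any fit.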
Depending on the sign of the slope $\beta_1$, the regression line shows one of three patterns:
- A positive linear relationship
- A negative linear relationship, or
- No relationship
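Positive relationship: A regression line shows a positive relationship when it slopes upward from left to right ($\beta_1 > 0$). As the value of one variable increases, the value of the other tends to increase as well.
Negative relationship: A regression line shows a negative relationship when it slopes downward from left to right ($\beta_1 < 0$). It starts high near the y-axis and falls as x increases. The relationship is called negative because as the value of one variable increases, the value of the other tends to decrease.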
No relationship: A regression line indicates no relationship when it is flat ($\beta_1 = 0$). A zero slope means there is no linear association between the two variables: knowing x does not help predict y with a straight line, although the variables could still be related in a nonlinear way. If you would like an expert to explain this area, get in touch with our simple and multiple linear regression online tutors.
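The three cases above can be sketched in Python by checking the sign of the fitted slope. This is an illustrative helper built on our own assumptions. The names `slope` and `describe_relationship` and the `tol` cutoff are hypothetical. In a real analysis you would normally test whether the slope differs significantly from zero (for example, with a t-test on β1) rather than use a fixed numeric threshold.

```python
import numpy as np

def slope(x, y):
    """Least-squares slope of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)

def describe_relationship(x, y, tol=1e-8):
    """Label the fitted line by the sign of its slope; tol is an arbitrary cutoff for 'flat'."""
    b1 = slope(x, y)
    if b1 > tol:
        return "positive"
    if b1 < -tol:
        return "negative"
    return "no relationship"

x = [1, 2, 3, 4, 5]
print(describe_relationship(x, [2, 4, 6, 8, 10]))  # rising line
print(describe_relationship(x, [10, 8, 6, 4, 2]))  # falling line
print(describe_relationship(x, [5, 3, 7, 3, 5]))   # symmetric pattern, flat fitted line
```

The third dataset shows why "no relationship" here means no *linear* relationship. The y values clearly vary with x, but the best-fitting straight line is flat.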
Multiple linear regression
Multiple linear regression is an extension of simple linear regression, used to model the relationship between one dependent variable and two or more independent variables. The dependent variable (also known as the target, criterion, or outcome variable) is the one we want to predict or explain. The independent variables (also called predictors) are the ones we use to predict its value.
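In the standard textbook notation, which extends the simple model, a multiple linear regression with p independent variables can be written as:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon$$

Each coefficient $\beta_j$ is the expected change in y for a one-unit increase in $x_j$ while the other independent variables are held constant. With $p = 1$ this reduces to the simple linear regression equation. In matrix form the model is $\mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, where X has a leading column of ones. The ordinary least squares estimate is $\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}$, provided $X^\top X$ is invertible.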
For instance, we could use multiple regression to find out whether students' performance in a given exam can be predicted from test anxiety, revision time, gender, and lecture attendance. We could also use it to determine whether an individual's daily alcohol consumption can be predicted from how long they have been drinking, their income, the age at which they started drinking, and their gender.
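The sketch below fits such a model with NumPy's least-squares solver, `numpy.linalg.lstsq`. The data are entirely synthetic. Two hypothetical continuous predictors are generated, loosely standing in for revision time and test anxiety. The outcome is built from known coefficients plus random noise, so the fit should recover values close to those coefficients. Categorical predictors such as gender would first need to be encoded as 0/1 dummy columns, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 200

# Synthetic predictors (hypothetical stand-ins, not real study data)
revision_hours = rng.uniform(0, 20, size=n)
anxiety_score = rng.uniform(0, 10, size=n)

# Known coefficients used to generate the outcome: intercept, revision, anxiety
true_beta = np.array([40.0, 2.0, -1.5])

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones(n), revision_hours, anxiety_score])
y = X @ true_beta + rng.normal(0, 2.0, size=n)

# Ordinary least squares: lstsq finds b minimizing ||X b - y||^2
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```

Using `lstsq` rather than explicitly inverting XᵀX is generally the more numerically stable choice, although both give the same estimate when X has full column rank.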
Multiple linear regression assumptions
Before you choose multiple linear regression for your analysis, you need to check that your data are suitable for it. Multiple regression produces valid results only when the data meet the assumptions the technique relies on. Your data must meet the following requirements before you analyze them with multiple linear regression:
- Your dependent variable must be measured on a continuous scale, for instance in hours, IQ score, or kilograms.
- You must have two or more independent variables. These can be either categorical (nominal or ordinal variables) or continuous (interval or ratio variables).
- Your observations must be independent of one another
- There must be a linear relationship between:
  - The dependent variable and each of your independent variables, and
  - The dependent variable and your independent variables collectively
- Your data must display homoscedasticity: the variance of the residuals should be roughly constant across all fitted values
- There must be no significant outliers, highly influential points, or high leverage points
- Your data must not display multicollinearity, meaning that two or more of your independent variables should not be highly correlated with each other
- Residuals (errors) must be approximately normally distributed
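Some of these assumptions can be screened numerically. The sketch below uses Python with NumPy, and the function names `vif` and `leverage` are our own. It computes variance inflation factors (VIF) to flag multicollinearity and hat-matrix leverage values to flag high leverage points. The cutoffs in the comments are common rules of thumb rather than fixed requirements: VIF above 10 (some analysts use 5) and leverage above 2p/n. The data are synthetic. Linearity, homoscedasticity, and normality of residuals are usually checked with residual plots, which this sketch does not cover.

```python
import numpy as np

def vif(predictors):
    """Variance inflation factor for each column of an (n, p) predictor matrix.

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    Z = np.asarray(predictors, dtype=float)
    n, p = Z.shape
    out = np.empty(p)
    for j in range(p):
        target = Z[:, j]
        others = np.column_stack([np.ones(n), np.delete(Z, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def leverage(predictors):
    """Diagonal of the hat matrix H = X (X^T X)^-1 X^T, with an intercept column added."""
    Z = np.asarray(predictors, dtype=float)
    X = np.column_stack([np.ones(len(Z)), Z])
    # pinv equals the ordinary inverse when X has full column rank
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    return np.diag(H)

rng = np.random.default_rng(seed=1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)  # nearly a copy of x1: strong collinearity
x3 = rng.normal(size=n)                  # unrelated predictor
Z = np.column_stack([x1, x2, x3])

vif_values = vif(Z)
print(np.round(vif_values, 1))  # rule of thumb: VIF > 10 suggests multicollinearity

h = leverage(Z)
n_coef = Z.shape[1] + 1  # number of coefficients, including the intercept
print(np.flatnonzero(h > 2 * n_coef / n))  # rows exceeding the 2p/n rule of thumb
```

The leverage values always sum to the number of coefficients (here 4), so 2p/n is simply twice the average leverage.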