## Simple linear regression

Simple linear regression is used to study and summarize the relationship between two quantitative (continuous) variables. One variable (usually denoted *x*) is the independent, explanatory, or predictor variable; the other (usually denoted *y*) is the dependent, response, or outcome variable. The method is called "simple" because it examines only one independent variable, whereas multiple regression analyzes two or more independent variables.

Simple linear regression involves two variables, *y* and *x*, and is represented by the following formula:

*y* = *β*₀ + *β*₁*x* + *ε*

This equation is usually graphed or drawn as a straight line, where:

- *β*₀ is the *y*-intercept
- *β*₁ is the slope, and
- *ε* is the error term

Depending on the sign of the slope, the line can show:

- A positive linear relationship
- A negative linear relationship, or
- No relationship
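As a minimal sketch of how the line is fitted (using made-up numbers for illustration), the slope and intercept come from the closed-form least-squares formulas *β*₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and *β*₀ = ȳ − *β*₁x̄:

```python
import numpy as np

# Hypothetical data: hours studied (x) vs. exam score (y) -- values made up for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([52.0, 55.0, 61.0, 64.0, 68.0])

# Closed-form least-squares estimates:
#   beta1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   beta0 = y_mean - beta1 * x_mean
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

print(beta0, beta1)  # intercept and slope of the fitted line
```

With these numbers the fitted line is *y* = 47.7 + 4.1*x*; any straight line can be described this way by its intercept and slope.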

**Positive relationship**: A regression line is positive when it slopes upward, with its lower end at the *y*-axis (intercept) and its upper end extending up and away from the *x*-axis. The relationship is said to be positive because as the value of one variable increases, the value of the other increases as well.

**Negative relationship**: A regression line is negative when it slopes downward, with its upper end at the *y*-axis (intercept) and its lower end extending down toward the *x*-axis. The relationship is said to be negative because as the value of one variable increases, the value of the other decreases.

**No relationship**: A regression line shows no relationship when it is flat. The lack of slope means there is no linear connection between the two variables being observed.
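The three cases above can be told apart by the sign of the fitted slope. A small sketch (with made-up data and a hypothetical `relationship` helper) might look like:

```python
import numpy as np

def relationship(x, y, tol=1e-9):
    """Label the direction of a fitted regression line by the sign of its slope."""
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    if slope > tol:
        return "positive"
    if slope < -tol:
        return "negative"
    return "none"  # flat line: no linear relationship

x = np.array([1.0, 2.0, 3.0, 4.0])
print(relationship(x, np.array([2.0, 4.0, 6.0, 8.0])))  # upward-sloping line -> "positive"
print(relationship(x, np.array([8.0, 6.0, 4.0, 2.0])))  # downward-sloping line -> "negative"
print(relationship(x, np.array([5.0, 5.0, 5.0, 5.0])))  # flat line -> "none"
```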

## Multiple linear regression

Multiple linear regression is an extension of simple linear regression, used to determine the relationship between one dependent variable and two or more independent variables. The dependent variable, also known as the target, criterion, or outcome variable, is what we want to predict; the independent variables are what we use to predict its value.

For instance, we could use multiple regression to determine whether students' performance on an exam can be predicted from test anxiety, revision time, gender, and lecture attendance. We could likewise ask whether an individual's daily alcohol consumption can be predicted from drinking duration, income, age when the person started drinking, and gender.
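As a sketch of the exam-performance example, the model *y* = *β*₀ + *β*₁x₁ + *β*₂x₂ + ... + *ε* can be fitted by least squares. The data below are made up and constructed to follow score = 30 + 4·revision + 10·attendance exactly, so the fit is easy to verify:

```python
import numpy as np

# Hypothetical illustration: predict exam score from revision hours and
# lecture attendance (as a fraction). All numbers are made up.
revision = np.array([5.0, 8.0, 3.0, 10.0, 6.0, 9.0])
attendance = np.array([0.6, 0.9, 0.4, 1.0, 0.7, 0.8])
score = np.array([56.0, 71.0, 46.0, 80.0, 61.0, 74.0])

# Design matrix with an intercept column: score = b0 + b1*revision + b2*attendance
X = np.column_stack([np.ones_like(revision), revision, attendance])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
b0, b1, b2 = coef
print(b0, b1, b2)  # recovers the coefficients 30, 4, 10 used to build the data
```

Each coefficient is interpreted as the expected change in the outcome for a one-unit change in that predictor, holding the other predictors fixed.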

## Multiple linear regression assumptions

Before you choose multiple linear regression for data analysis, you have to make sure your data can actually be analyzed with it. This is important because multiple regression produces valid results only if the data being studied do not violate the assumptions the technique requires. Here are the assumptions a data set must pass for you to analyze it using multiple linear regression:

- Your dependent variable must be measured on a continuous scale. For instance, it could be measured in hours, IQ score, kg, etc.
- You must have two or more independent variables. These can be either categorical (nominal or ordinal variables) or continuous (ratio or interval variables)
- Your observations must be independent of one another
- There must be a linear relationship between:
  - The dependent variable and all of your independent variables collectively, **and**
  - The dependent variable and each one of your independent variables

- Your data must display homoscedasticity
- There must be no significant outliers, highly influential points, or high leverage points
- Your data must not display multicollinearity, meaning no two of your independent variables should be highly correlated with each other
- Residuals (errors) must be approximately normally distributed
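Two of these checks can be sketched quickly with made-up data: a pairwise-correlation screen for multicollinearity, and a variance inflation factor (VIF), computed here from first principles as 1 / (1 − R²) of regressing one predictor on the others:

```python
import numpy as np

# Made-up predictor data for illustration; x2 is nearly 2 * x1, so it
# should be flagged as collinear with x1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
x3 = np.array([5.0, 1.0, 4.0, 2.0, 6.0, 3.0])

# Multicollinearity screen 1: pairwise correlation between predictors.
r12 = np.corrcoef(x1, x2)[0, 1]   # |r| near 1 flags collinear predictors
r13 = np.corrcoef(x1, x3)[0, 1]
print(round(r12, 3), round(r13, 3))

# Multicollinearity screen 2: VIF for x2, via regressing x2 on x1 and x3.
X = np.column_stack([np.ones_like(x1), x1, x3])
coef, *_ = np.linalg.lstsq(X, x2, rcond=None)
resid = x2 - X @ coef
r2 = 1.0 - np.sum(resid ** 2) / np.sum((x2 - x2.mean()) ** 2)
vif = 1.0 / (1.0 - r2)
print(vif)  # values above roughly 5-10 are a common warning sign
```

The remaining assumptions (homoscedasticity, outliers, normality of residuals) are usually checked graphically, e.g. with residual-versus-fitted plots and Q-Q plots.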