# Gauss-Markov Theorem

The Gauss-Markov theorem states that if a linear regression model meets a given set of assumptions, then the ordinary least squares (OLS) estimator is the Best Linear Unbiased Estimator (BLUE). In other words, if your model satisfies the classical linear regression conditions, you can rest easy knowing that no other linear unbiased estimator gives more precise coefficient estimates. More specifically, when a regression model meets the given assumptions, the OLS coefficient estimates have a tighter sampling distribution (a smaller variance) than those of any other linear unbiased estimation method.
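To make the "tightest sampling distribution" claim concrete, here is a minimal simulation sketch. It assumes a simple regression y = alpha + beta * x + error with fixed regressors and independent, constant-variance errors. The competing "endpoint" estimator, the parameter values, and all function names are hypothetical choices made for illustration only.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for the simple regression y = alpha + beta * x + error."""
    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def endpoint_slope(x, y):
    """A competing linear unbiased estimator: slope through the first and last points."""
    return (y[-1] - y[0]) / (x[-1] - x[0])

def simulate(n_reps=5000, alpha=1.0, beta=2.0, sigma=1.0, seed=0):
    """Draw repeated samples with fixed x and collect both slope estimates."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]  # fixed regressors (illustrative)
    ols, endpoint = [], []
    for _ in range(n_reps):
        # Errors are independent with constant variance (homoscedastic).
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        ols.append(ols_slope(x, y))
        endpoint.append(endpoint_slope(x, y))
    return ols, endpoint

ols, endpoint = simulate()
print("OLS:      mean", statistics.fmean(ols), "variance", statistics.variance(ols))
print("Endpoint: mean", statistics.fmean(endpoint), "variance", statistics.variance(endpoint))
```

Both estimators should average close to the true slope, but the OLS estimates should be noticeably less spread out, which is what "best" means in BLUE.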

## Gauss-Markov assumptions

A regression model must meet five main Gauss-Markov assumptions:

### Linearity:

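The model must be linear in the parameters being estimated by ordinary least squares. The regressors themselves may be transformed (for example, squared or logged), as long as the coefficients enter the model linearly.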

### Random:

The data being used must be sampled randomly from the population.

### Exogeneity:

The regressors must not be correlated with the error term.

### Non-collinearity:

The regressors should not be perfectly correlated with each other; that is, no regressor may be an exact linear combination of the others.

### Homoscedasticity:

The variance of the error term remains constant no matter what values the regressors take.

The Gauss-Markov conditions ensure that ordinary least squares remains a valid way to estimate regression coefficients. When estimating regression coefficients, one must check how well the data correspond with these conditions. Knowing how and where the assumptions are violated lets the analyst change the setup of the experiment, or the model, so that it fits the Gauss-Markov setting more closely. In practice, the assumptions are rarely all met, but it is still important to satisfy as many of them as possible: used as a benchmark, they tell a data analyst what the ideal conditions would be, and they identify problem areas that may lead to inaccurate or even unusable coefficient estimates.
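As one example of checking data against the conditions, the sketch below gives a rough look at homoscedasticity. It fits a simple regression, splits the residuals at the median of x, and compares the residual variance in the two halves. This is an informal diagnostic assumed here for illustration, not a formal test; the simulated data, parameter values, and function names are hypothetical.

```python
import random
import statistics

def fit_ols(x, y):
    """Return (alpha, beta) for the simple regression y = alpha + beta * x + error."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def residual_variance_ratio(x, y):
    """Residual variance for the upper half of x divided by that for the lower half."""
    alpha, beta = fit_ols(x, y)
    pairs = sorted(zip(x, y))  # order observations by x
    resid = [yi - (alpha + beta * xi) for xi, yi in pairs]
    half = len(resid) // 2
    return statistics.variance(resid[half:]) / statistics.variance(resid[:half])

rng = random.Random(1)
x = [rng.uniform(1.0, 10.0) for _ in range(400)]

# Constant error variance: the assumption holds.
y_const = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
# Error standard deviation grows with x: the assumption is violated.
y_grow = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.5 * xi) for xi in x]

ratio_const = residual_variance_ratio(x, y_const)
ratio_grow = residual_variance_ratio(x, y_grow)
print("ratio with constant variance:", ratio_const)
print("ratio with growing variance: ", ratio_grow)
```

A ratio near 1 is consistent with constant error variance, while a ratio far from 1 suggests that the spread of the errors depends on the regressor.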

## Gauss-Markov theorem and inference

If all the Gauss-Markov conditions are met, then the ordinary least squares estimators of alpha and beta are the best linear unbiased estimators. The term ‘best’ here means that the variances of the ordinary least squares estimators are no larger than the variances of any other linear unbiased estimators. The term ‘linear’ means that the estimators are linear functions of the observed values of y; the model itself must also be linear in its parameters, so the ordinary least squares method is not appropriate when the relationship cannot be written in that form. The term ‘unbiased’ means that the expected values of the estimated alpha and beta equal the true values describing the relationship between x and y.

Significance tests build on these properties. If a coefficient is found to be significant, then one can draw conclusions not just for the sample data being observed but also for the larger population. But this can only be done if the properties of the sample data match those of the population, which is usually the case if the data being studied meet all the assumptions of the Gauss-Markov theorem. If the data violate these assumptions, the coefficient estimates or their standard errors may be biased, which means the significance test results could be wrong or could lead to false conclusions.

One of the most common significance tests is the t-test. This test determines whether the coefficient beta differs significantly from zero. To perform a t-test, we first need to estimate the precision of the estimate of beta, that is, its standard error.
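The t-test for the slope can be sketched in a few lines. The example assumes a simple regression with one regressor, uses the usual residual-variance estimate with n - 2 degrees of freedom, and runs on toy numbers made up purely for illustration; the function name is hypothetical.

```python
import math
import statistics

def slope_t_statistic(x, y):
    """Return (beta_hat, standard_error, t_statistic) for H0: beta = 0."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residual variance estimate; two parameters were estimated, hence n - 2.
    ssr = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    s_squared = ssr / (n - 2)
    std_err = math.sqrt(s_squared / sxx)
    return beta_hat, std_err, beta_hat / std_err

# Toy data, invented for this illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta_hat, std_err, t_stat = slope_t_statistic(x, y)
print("beta_hat:", beta_hat, "standard error:", std_err, "t:", t_stat)
```

The resulting t statistic would then be compared with a critical value from the t distribution with n - 2 degrees of freedom. That comparison is only trustworthy when the standard error itself is reliable, which is where the Gauss-Markov assumptions come in.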

## Unbiasedness

An estimator is said to be unbiased if it is, on average, equal to the true value of the coefficient being estimated. This means that across repeated samples of a defined size n, the average of the estimates equals the true value of the parameter. Equivalently, the probability distribution of an unbiased estimator has an expected value equal to the parameter being estimated. Unbiasedness does not mean that the estimate obtained from a particular sample equals the true parameter; it means that the average of the estimates over infinitely many random samples equals the true parameter. Unbiased estimators are not all consistent, but those whose variance shrinks to zero as the sample size increases are consistent. Consistency means that the estimator converges to the true parameter as the sample size grows without bound: the distribution of the estimator concentrates ever more tightly around the true parameter value as n approaches infinity. The requirement that estimators be unbiased cannot be dropped from the theorem, since biased estimators can have a lower variance than ordinary least squares; the claim that ordinary least squares has the smallest variance holds only among linear unbiased estimators.
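The following simulation sketch illustrates these ideas under assumed conditions: a simple regression with fixed, evenly spaced regressors and independent, constant-variance errors. The sample sizes, parameter values, the shrinkage factor of 0.8, and all function names are hypothetical choices for illustration.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope for the simple regression y = alpha + beta * x + error."""
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx

def slope_estimates(n, n_reps=2000, alpha=1.0, beta=2.0, sigma=1.0, seed=0):
    """OLS slope estimates from n_reps simulated samples of size n."""
    rng = random.Random(seed)
    x = [10.0 * i / (n - 1) for i in range(n)]  # evenly spaced on [0, 10]
    estimates = []
    for _ in range(n_reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        estimates.append(ols_slope(x, y))
    return estimates

# Unbiasedness: the mean stays near the true slope (2.0) for every n.
# Consistency: the variance shrinks as n grows.
results = {}
for n in (10, 40, 160):
    est = slope_estimates(n)
    results[n] = (statistics.fmean(est), statistics.variance(est))
    print("n =", n, "mean:", results[n][0], "variance:", results[n][1])

# A deliberately biased estimator: shrink every OLS slope toward zero.
shrunk = [0.8 * b for b in slope_estimates(10)]
shrunk_mean = statistics.fmean(shrunk)
shrunk_var = statistics.variance(shrunk)
print("shrunk estimator, n = 10, mean:", shrunk_mean, "variance:", shrunk_var)
```

The shrunk estimator has a smaller variance than OLS at the same sample size, but its average is no longer the true slope. This is why the theorem compares OLS only with other unbiased estimators.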