Generalized Linear Modeling
Generalized linear modeling generalizes ordinary linear regression by relating a linear model to the response variable through a link function. The technique was formulated by John Nelder and Robert Wedderburn to unify several statistical models, including linear regression, logistic regression, and Poisson regression. Generalized linear modeling can handle data that linear regression cannot, for instance:
- If the relationship between X and Y is not linear, for example when it is exponential
- If the variance of Y is not constant; for example, the variance of Y increases as X increases
- If Y is a discrete variable, such as a count
To analyze such data, most data scientists use Poisson regression, one of the most common types of generalized linear models.
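The kind of data described above, a discrete count response whose spread grows with its mean, can be simulated with a short sketch. The coefficients and the sampling routine here are illustrative assumptions, not part of any particular dataset:

```python
import math
import random

random.seed(0)

def poisson_sample(mu):
    # Knuth's algorithm for drawing one Poisson(mu) variate.
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

# Exponential (not linear) relationship between x and the mean of y.
xs = [x / 10 for x in range(1, 51)]
ys = [poisson_sample(math.exp(0.3 + 0.5 * x)) for x in xs]

# For a Poisson response the variance equals the mean, so both the
# level and the spread of y increase with x.
low  = [y for x, y in zip(xs, ys) if x < 2.5]
high = [y for x, y in zip(xs, ys) if x >= 2.5]
print(sum(low) / len(low), sum(high) / len(high))
```

Fitting a straight line to such data would violate both the linearity and the constant-variance assumptions of ordinary regression, which is exactly the situation a Poisson GLM is designed for.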
The components of generalized linear models
Generalized linear models consist of three major components:
- Linear predictor: This is a linear combination of the explanatory variables (x) and the parameters (β)
- Link function: The link function connects, or rather links, the parameter of a probability distribution to the linear predictor. In Poisson regression, the link function is typically the log function.
- Probability distribution: This component generates the observed variable (y).
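The three components can be traced through a minimal sketch of a Poisson regression with a log link. The coefficient values below are assumed for illustration, not fitted values:

```python
import math

b0, b1 = 0.3, 0.5   # parameters (assumed for illustration)
x = 2.0             # one explanatory variable

# 1. Linear predictor: a linear combination of x and the parameters.
eta = b0 + b1 * x

# 2. Link function: the log link relates the mean mu to eta, so mu = exp(eta).
mu = math.exp(eta)

# 3. Probability distribution: Poisson(mu) generates the observed counts y.
#    P(y = k) = mu^k * exp(-mu) / k!
def poisson_pmf(k, mu):
    return mu ** k * math.exp(-mu) / math.factorial(k)

print(eta, mu, poisson_pmf(3, mu))
```

Note that because the link is the log, the model is linear on the scale of log(mu) even though the mean itself depends exponentially on x.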
Advantages of generalized linear modeling
Generalized linear modeling has a number of advantages over the traditional ordinary least squares regression. Here are some of them:
- You do not have to transform the response (y) to make it normally distributed
- There is more flexibility in modeling because the choice of link is separate from the choice of random component
- If the link produces additive effects, constant variance is not required
- The models are fitted by maximum likelihood estimation, which gives the estimators desirable properties
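Maximum likelihood fitting can be sketched in the simplest possible case: an intercept-only Poisson GLM with a log link, fitted by Fisher scoring (Newton's method). The data values are made up for the example:

```python
import math

y = [2, 3, 1, 4, 2, 5, 3, 2]   # made-up count data
n = len(y)
total = sum(y)

b0 = 0.0  # starting value for the intercept
for _ in range(25):
    mu = math.exp(b0)         # mean implied by the current intercept
    score = total - n * mu    # derivative of the Poisson log-likelihood
    info = n * mu             # Fisher information (negative Hessian)
    b0 += score / info        # Fisher scoring update

# The MLE reproduces the sample mean: exp(b0) equals mean(y).
print(b0, math.exp(b0))
```

For this intercept-only model the iterations converge to b0 = log(mean(y)); with covariates, the same idea generalizes to iteratively reweighted least squares.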
Disadvantages of generalized linear modeling
- Features cannot be selected automatically; stepwise selection is required
- There are strict assumptions about the randomness of the error terms and the shape of their distribution
- The predictor variables must be uncorrelated
- Generalized linear models cannot detect non-linearity directly, although this can be addressed manually through feature engineering
- Generalized linear modeling is sensitive to outliers
- They can have low predictive power
Assumptions of generalized linear modeling
GLMs make some strict assumptions about the structure of the data:
- Independence of each data point
- Correct distribution of the residuals
- Correct specification of the variance structure
- A linear relationship between the link-transformed response and the predictors
For a generalized linear model to be effective, the residuals must follow the assumed distribution and the specified variance structure must hold across all fitted values of the model. Also, the response must have a linear relationship with the predictors on the link scale. You should always check your model to make sure it does not violate these assumptions; generalized linear models that do not adhere to them can produce misleading results.
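One common check of the variance assumption is the Pearson dispersion statistic. The observed counts and fitted means below are assumed for illustration; in practice they would come from a fitted model:

```python
import math

# Hypothetical observed counts and fitted means from a Poisson GLM.
y_obs  = [1, 3, 2, 5, 4, 7, 6, 9]
mu_hat = [1.5, 2.0, 2.7, 3.6, 4.9, 6.6, 8.9, 12.0]

# Pearson residuals: (y - mu) / sqrt(V(mu)); for Poisson, V(mu) = mu.
pearson = [(y - m) / math.sqrt(m) for y, m in zip(y_obs, mu_hat)]

# Pearson chi-square divided by residual degrees of freedom
# (here n minus 2 fitted parameters). Values near 1 support the
# Poisson variance assumption; values well above 1 suggest
# overdispersion.
dispersion = sum(r * r for r in pearson) / (len(y_obs) - 2)
print(dispersion)
```

When the dispersion is far from 1, remedies include switching to a quasi-Poisson or negative binomial model rather than trusting the Poisson standard errors.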
Generalized linear model extensions
Standard generalized linear modeling assumes that the observations are uncorrelated. Over the years, extensions have been developed that allow for correlation between observations:
- Generalized estimating equations: This extension allows observations to be correlated without an explicit probability model. They are commonly used when random effects and variances are not of direct interest, because they allow correlation without defining its origin. The main purpose of a generalized estimating equation is to estimate the average response of a population rather than effects specific to an individual. In practice, generalized estimating equations are often used together with Huber-White (robust) standard errors.
- Generalized linear mixed models: This extension incorporates random effects in the linear predictor, producing an explicit probability model that defines the origin of the correlation. Generalized linear mixed models are also referred to as mixed models or multilevel models. They are more computationally intensive than generalized estimating equations.
- Generalized additive models: In this extension of generalized linear modeling, the linear predictor is not restricted to linear combinations of the covariates; it is a sum of smooth functions estimated from the data being analyzed.