# Logistic Regression

Logistic regression dates back to the early twentieth century when it was used in biological sciences. This type of regression is used when the target (dependent) variable is categorical. It is a predictive type of analysis like all regression analyses. We can also say that logistic regression describes data and tries to explain the association between one dependent binary variable and one or more independent variables that are ordinal, nominal, or ratio-leveled. This type of regression can sometimes be extremely difficult to interpret. As a result, the Intellectus Statistics tool has been developed to help you conduct the analysis. This tool also allows you to interpret the output in plain English. Logistic regression can predict whether an email is a spam, or if a tumor is malignant or not.

• Assumptions in a Logistic Regression
• Types of Logistic Regression
• What is the decision boundary?

## Assumptions in a Logistic Regression

Before using logistic regression, you should be familiar with these assumptions. First, the dependent variable must be dichotomous. For example, it should be present or absent. The second assumption is this type of regression does not support data with outliers. You can assess this by changing the continuous predictors to standardized scores. Moreover, the predictors shouldn't have high correlations (multicollinearity). You can check this using a correlation matrix among the predictors. According to Tabachnick and Fidell (2013), this assumption is met as long as the correlation coefficients among the independent variables are less than 0.90.

It is important to consider the model fit when selecting the model you are going to use for the logistic regression analysis. The amount of variance will increase if you add independent variables to a logistic regression model. However, you should remember that if you add more variables to the model it will result in overfitting. This greatly reduces the generalizability of the model beyond the data the model is fit.

Binary logistic regression can be used with several pseudo – R2 values. These values have many computational issues and should be interpreted with caution. A better approach is to use the Hosmer-Lemeshow to measure the goodness of fit based on the Chi-square test.

## Types of Logistic Regression

There are 3 main types of logistic regression. We have discussed them below:

### Binary logistic regression

It is used to determine the effect of more than two independent variables presented simultaneously to predict one or other of the two dependent variable types. It is impossible to use logistic regression to predict a numerical value for the dependent variable because it is dichotomous. To do this, we can use a binomial probability theory where there are two values to predict. In a binary logistic regression, there are only two possible outcomes from the categorical response. For example, when checking a mail, it can only be spam or not.

### Multinomial logistic regression

Multinomial logistic regression analyzes nominal outcome variables. The log odds of the outcomes are modeled as a linear combination of predictor variables. Since it allows more than two independent variable categories, multinomial logistic regression is often considered an extension of binomial logistic regression.

### Ordinal Logistic regression

Ordinal logistic regression uses one or more independent variables to predict an ordinal dependent variable. Experts consider this type of regression a generalization of either binomial logistic regression or multinomial logistic regression. Ordinal logistic regression like other types of regressions can predict the dependent variables by using the interaction between independent variables.

## What is the decision boundary?

Determining the decision boundary for a binary classification problem is one of the vital applications of logistic regression. So what is the decision boundary and how do you plot it? You must set a threshold when predicting which category a value belongs to. This threshold will be the basis for the classification of the obtained estimated probability. If you want to get a more complex decision boundary, the polynomial order should be increased. Also, decision boundaries can either be linear or non-linear.