Tips for writing a master's statistics final paper with example

A master's statistics final paper can be a hard nut to crack. It requires extensive research and often more than 10,000 words of writing, so it is normal for students to find the task daunting. Common challenges include knitting ideas together and putting findings into words. If you are in the same boat, these tips will put you on the right track and make writing your master's statistics final paper easier. Here are the top tips to help you throughout the writing process:

1) Familiarize yourself with your topic and understand it

2) Have a precise structure for your paper

3) Write an in-depth literature review

4) Go into detail about your methodology

5) Limit accidental plagiarism

6) Have a powerful conclusion

If you cannot write a research paper alone, consider getting assistance from experts. We have a team of professional writers who can draft an excellent master’s statistics final paper. To clear your doubts and prove our writing prowess, here is an example of a master’s statistics final paper prepared by us. The paper is on discrete choice methods for planning and public policy.

1. Introduction

Housing is one of the basic foundations on which living and modern structures thrive. It is impossible for people to survive, alone or in groups, without a functioning, protective and equitable housing stock (Krieger and Higgins, 2002). In the US, wealth is built through homeownership, and those who own their homes are richer than those who rent. Moreover, the wealth from homeownership is intergenerational, i.e. people whose parents own their home are more likely to become homeowners than people whose parents rent (Martinez and Kirchner, 2021).

Historically, homeownership has been denied to people of color. For example, the federal agency “Home Owners' Loan Corporation” redlined Black and immigrant neighborhoods, deeming them hazardous credit risks, which led lending institutions to deny mortgages to prospective homebuyers in these neighborhoods (Martinez and Kirchner, 2021). After thirty years of redlining, the federal government introduced the Fair Housing Act of 1968, which made it illegal to deny housing loans because of race.

Although erasing racial discrimination in housing is the main objective of the Fair Housing Act of 1968, housing inequality and segregation are still evident in the 21st century (Williams, 2020). A Census Bureau report showed that in 2020, 46.4% of Black households owned their home compared to 75.8% of White households (US Census Bureau, 2021). Moreover, homes are valued at $48,000 in non-White neighborhoods, which is 23% lower than homes in White neighborhoods (Ray et al., 2021). This discrepancy has led to a racial wealth gap, with White families possessing $171,000 in median family wealth while non-White families have a median family wealth of $17,600. Moreover, because wealth is a predictor of education, employment and other metrics that determine quality of life, non-Whites have fallen short of Whites in these aspects (Perry, 2021).

Recently, racial discrimination has become algorithm-driven: the financial characteristics of applicants are collected and fed into an algorithm that automatically sorts applications into good and bad loans. Unfortunately, the factors the algorithm considers, such as wealth, income and debt-to-income ratio, are disproportionately unfavorable to non-Whites (Brancaccio and Conlon, 2021). The housing gap between Black and White households has reached its largest level in 50 years following the housing bubble and the Great Recession. The American Community Survey showed that the housing gap rose from 28.1% to 30.2% between 2010 and 2017.

The persistence of the homeownership gap has attracted the interest of researchers (for example, Acolin et al., 2019; Brown and Dey, 2019) as well as policy makers over the years. Some researchers have found that non-White borrowers are charged more for mortgage loans than White borrowers with similar characteristics (Bayer et al., 2015; Been et al., 2009; Rugh et al., 2015). Bayer et al. (2015) found that non-White borrowers are significantly more likely to be charged more than White borrowers after controlling for credit scores, loan-to-value ratios, and housing and debt expenses.

The continuing racial discrepancy in homeownership shows that the policies enacted over the years to deal with this matter have failed, and that policy revision and urgent action are needed to reverse the trend (McCargo et al., 2019). Therefore, this study investigates the racial gap in the ability to access mortgage loans and determines the factors driving the gap in New York State. This study intends to answer the following research questions:

• Is there a racial gap in the ability to access mortgage loans?

• What factors are responsible for this gap?

2. Theoretical Framework

Taste-based discrimination is a hypothesis that is widely applied to explain labour market discrimination. It holds that employers who have a taste for discrimination will choose not to hire a person who belongs to a minority group, not because of that person's ability or qualifications but because of their membership of a particular ethnic group or race (Baert and Pauw, 2014). Such employers do so to avoid interacting with members of the group, and they are willing to pay a financial penalty to avoid dealing with them (Krueger, 2002).

This hypothesis can be adapted to this study to explain why, of two applicants with the same characteristics on the variables used to separate good loans from bad loans, one application is accepted and the other rejected simply because the former belongs to the majority race/ethnic group and the latter to a minority group. The possible explanation is that lenders who have a taste for discrimination will prefer not to lend to a minority (e.g. non-White) applicant even if the applicant meets all the requirements, and will opt for a majority applicant even if that applicant's characteristics show that lending to them is riskier. From this, we derive two hypotheses that guide this study:

H01: Loan originators consider the ethnicity of applicants when deciding whether to approve or originate a loan.

H11: Loan originators consider the race of applicants when deciding whether to approve or originate a loan.

3. Research Design

The first step in research design is for the researcher to determine the philosophy behind the research. Research philosophy entails all the assumptions that the researcher makes, implicitly or explicitly, which may be ontological (the nature of reality), epistemological (what constitutes knowledge) or axiological (how values and beliefs affect research). Four types of philosophy are found in the literature: positivism, interpretivism, pragmatism and critical realism. This study is based on the positivist philosophy, in which the relationship between phenomena can be expressed as a causal relationship. The next step is to determine the research approach. There are three research approaches: deductive, inductive and abductive. This study is based on deductive reasoning, an approach that tries to verify or falsify a theory or hypothesis. The implication is that the study tests the hypotheses that have been outlined and makes generalizations based on the results. Based on this approach, the methodological choice is a mono-method quantitative design, which involves collecting numerical data and applying quantitative analysis to extract meaningful results from it. The research strategy is a case study design, and the case is New York State. In terms of time horizon, the study is cross-sectional, as the data are for the year 2015.

4. Data

The data used is the Home Mortgage Disclosure Data for the state of New York. The data is for the year 2015 and contains 439,654 observations and 78 columns of variables. However, almost half of the columns are labels for the preceding columns. Not all the variables are used; the variables used for this study are:

• Action taken: whether the loan is originated or rejected and other categories

• Applicant ethnicity: whether the applicant is Hispanic or not

• Applicant Race: whether the applicant is White, Asian, Black American and others

• Applicant’s income: the applicant's income

• Applicant’s sex: whether the applicant is male or female

• Loan amount: the loan amount the applicant requested

• Denial reason: the reason the loan was denied, if it was denied

• Loan Purpose: the purpose for which the loan is sought (home purchase, home improvement or home refinancing)

In addition to these variables, new variables were generated and they are:

• Co-Applicant: A dummy variable which equals 1 if the applicant has a co-applicant and 0 otherwise.

• Coapplicant_white: A dummy variable which equals 1 if the co-applicant is White and 0 otherwise.

• White: A dummy variable which equals 1 if the applicant is White and 0 otherwise.

 Data Cleaning

The variable “action taken” has seven categories, which include loans that were originated, applications that were approved but not accepted, applications denied by the financial institution, applications withdrawn by the applicant, files closed for incompleteness, and others. Since this study is interested in loans that are either granted or denied, and the binary logit and probit methods require a dependent variable with two categories, the study restricted the data to loans that were originated or applications that were denied by financial institutions. In addition, most of the variables have entries for “not applicable” and “information not provided”; we removed these categories from the dataset. After cleaning, 261,684 observations remain.
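The filtering described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the field names (`action_taken`, `applicant_race`, `co_applicant_race`), the exact category labels and the three sample records are all assumptions made for the example.

```python
# Assumed category labels; the labels in the actual data extract may differ.
KEEP_ACTIONS = {
    "Loan originated",
    "Application denied by financial institution",
}
MISSING = {"Not applicable", "Information not provided"}


def clean(records):
    """Keep originated/denied applications that have no missing entries,
    and add the dummy variables used in the analysis."""
    cleaned = []
    for row in records:
        if row["action_taken"] not in KEEP_ACTIONS:
            continue  # e.g. withdrawn or incomplete applications
        if any(value in MISSING for value in row.values()):
            continue  # drop rows with missing information
        out = dict(row)
        out["originated"] = int(row["action_taken"] == "Loan originated")
        out["white"] = int(row["applicant_race"] == "White")
        out["co_applicant"] = int(row["co_applicant_race"] != "No co-applicant")
        cleaned.append(out)
    return cleaned


# Three made-up records, for illustration only.
sample = [
    {"action_taken": "Loan originated", "applicant_race": "White",
     "co_applicant_race": "White"},
    {"action_taken": "Application withdrawn by applicant",
     "applicant_race": "Asian", "co_applicant_race": "No co-applicant"},
    {"action_taken": "Application denied by financial institution",
     "applicant_race": "Information not provided",
     "co_applicant_race": "No co-applicant"},
]
cleaned_sample = clean(sample)
```

Only the first sample record survives: the second is not an originated or denied application, and the third has a missing race entry.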

5. Analytic Strategy

The dependent variable for this study is a binary variable, i.e. it takes only two possible values. The methods of analysis applicable to such a variable are the linear probability model (LPM), the logit model and the probit model. The linear probability model is simply a linear regression model that has the binary variable as the dependent variable. The LPM is given as

$$y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i \qquad (1)$$

where y is the dummy variable. After estimating the model, the fitted value gives the predicted probability that y = 1 for given values of the X's, i.e.

$$\hat{y}_i = \hat{P}(y_i = 1 \mid X_{1i}, \dots, X_{ki}) = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}$$

There are two problems with linear probability models. The first, and less serious, is heteroscedasticity (Perraillon, 2019), which can be handled by using robust standard errors. The second, which cannot be patched up, is that predicted probabilities can be greater than 1 or less than 0, which does not make sense.
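The second problem is easy to see on a toy example. The sketch below fits a one-regressor linear probability model by ordinary least squares on eight made-up observations (the data and the helper name `fit_lpm` are assumptions for illustration, not taken from the paper) and collects the fitted values that fall outside the unit interval.

```python
def fit_lpm(x, y):
    """Ordinary least squares with one regressor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data: a binary outcome that becomes more likely as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

intercept, slope = fit_lpm(x, y)
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" below 0 or above 1.
out_of_range = [p for p in fitted if p < 0 or p > 1]
print(out_of_range)
```

Here the fitted values at the smallest and largest x fall below 0 and above 1 respectively, which is the behaviour the logit and probit models are designed to avoid.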

To get around this problem, the logit and probit models were developed. These models are specifically for binary dependent variables and always produce predicted values between 0 and 1. The logistic model assumes that there is a latent variable which, when it crosses a threshold, produces the 0/1 variable we observe (Perraillon, 2019). The modelling starts by writing the joint probability of n iid Bernoulli random variables:

$$L(p) = \prod_{i=1}^{n} p^{y_i} (1 - p)^{1 - y_i} \qquad (2)$$

Taking the logarithm of equation (2), we have

$$\ln L(p) = \sum_{i=1}^{n} y_i \ln(p) + \sum_{i=1}^{n} (1 - y_i) \ln(1 - p) \qquad (3)$$

which simplifies to

$$\ln L(p) = n\bar{y} \ln(p) + (n - n\bar{y}) \ln(1 - p) \qquad (4)$$

To make p a function of covariates, it needs to be transformed, since it has to be bounded between 0 and 1. The logit model makes use of the logistic distribution function by expressing the model as the log of odds (Perraillon, 2019), i.e.

$$\ln\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} \qquad (5)$$

Transforming equation (5), we have

$$p_i = \frac{e^{\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}}}{1 + e^{\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}}} \qquad (6)$$

The major problem with the logistic model is interpreting the coefficients: the estimated coefficients in equation (5) are changes in log odds, and taking the exponential gives the multiplicative change in odds, but the practical meaning of odds is still a subject of debate (Hanck et al., 2021).
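One way to read a logit coefficient, sketched here under the simplifying assumption of a single binary regressor X with coefficient β1 (and writing p1 and p0 for the probabilities at X = 1 and X = 0), is to compare the log odds in the two groups:

```latex
\begin{aligned}
\ln\frac{p_1}{1-p_1} &= \beta_0 + \beta_1, \qquad
\ln\frac{p_0}{1-p_0} = \beta_0 \\[4pt]
\frac{p_1/(1-p_1)}{p_0/(1-p_0)} &= e^{\beta_1}
\end{aligned}
```

So the exponentiated coefficient is the odds ratio between the two groups. A value of 1.5, for example, means the odds are 50% higher when X = 1, although the implied change in probability still depends on the baseline p0.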

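Instead of using the logistic distribution to transform the variable, the probit regression method uses the standard normal cumulative distribution function to model the dependent variable (Hanck et al., 2021):

$$P(y_i = 1 \mid X_{1i}, \dots, X_{ki}) = \Phi(\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}) \qquad (7)$$

Here $\beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$ plays the role of a quantile z, and

$$\Phi(z) = P(Z \le z), \quad Z \sim N(0, 1) \qquad (8)$$

Each coefficient estimate is thus the change in z associated with a one-unit change in X, which means that, as with logit, there is no simple interpretation because the link between z and the probability is non-linear (Hanck et al., 2021).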
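The two transformations can be compared directly. The following sketch, which uses only the Python standard library, maps a few values of the linear index z to probabilities with the logistic function and with the standard normal CDF; the function names and the grid of z values are illustrative choices.

```python
import math
from statistics import NormalDist


def logistic(z):
    """Inverse logit: maps any real index z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # this branch avoids overflow for large negative z
    return ez / (1.0 + ez)


def normal_cdf(z):
    """Standard normal CDF, the transformation used by the probit model."""
    return NormalDist().cdf(z)


indices = [-3.0, -1.0, 0.0, 1.0, 3.0]
for z in indices:
    print(f"z={z:+.1f}  logit p={logistic(z):.3f}  probit p={normal_cdf(z):.3f}")
```

Both functions stay strictly between 0 and 1 and increase with z, but the same z corresponds to different probabilities under each, which is why logit and probit coefficients are not directly comparable in size.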

 Model Specification

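For the purpose of this study, we specify six models, each of which is estimated by LPM, logit and probit, resulting in 18 estimated models. The first model is a simple model comparing whether the loan was originated or not by ethnicity (being Hispanic or not). The model is given as

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{Hispanic}_i + \varepsilon_i \qquad (9)$$

where action taken represents whether the loan is originated or denied and Hispanic is a dummy variable representing whether the applicant is Hispanic by ethnicity or not.

The second model adds controls to equation (9) and is specified as

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{Hispanic}_i + \beta_2 \text{HHI}_i + \beta_3 \text{AI}_i + \beta_4 \text{LP}_i + \varepsilon_i \qquad (10)$$

where HHI represents household income, AI represents the applicant’s income and LP represents the purpose of the loan.

The third model is a simple model comparing whether the loan was originated or not by race (White vs non-White). The model is given as

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{White}_i + \varepsilon_i \qquad (11)$$

The fourth model extends the third by adding controls:

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{White}_i + \beta_2 \text{HHI}_i + \beta_3 \text{AI}_i + \beta_4 \text{LP}_i + \varepsilon_i \qquad (12)$$

The fifth model disaggregates race into five categories, with White as the base category:

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{AIN}_i + \beta_2 \text{AS}_i + \beta_3 \text{BAA}_i + \beta_4 \text{NH}_i + \varepsilon_i \qquad (13)$$

where AIN is American Indian, AS is Asian, BAA is Black/African American and NH is Native Hawaiian.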
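With five race categories, the fifth model needs four 0/1 indicators, and the omitted category serves as the reference group. A minimal sketch of that coding, assuming White is the omitted base (as in the results section) and that the race labels are spelled as below, could look like this; the function name and label spellings are hypothetical.

```python
# Abbreviations follow the text; the full label spellings are assumed.
RACE_DUMMIES = {
    "AIN": "American Indian or Alaska Native",
    "AS": "Asian",
    "BAA": "Black or African American",
    "NH": "Native Hawaiian or Other Pacific Islander",
}


def race_dummies(race):
    """Return the four 0/1 indicators; White is the omitted base category,
    so a White applicant gets zeros on all four."""
    return {code: int(race == label) for code, label in RACE_DUMMIES.items()}


print(race_dummies("Asian"))
print(race_dummies("White"))
```

Each coefficient on these indicators is then read as a difference relative to White applicants.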

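The sixth model adds controls to the fifth model:

$$\text{action\_taken}_i = \beta_0 + \beta_1 \text{AIN}_i + \beta_2 \text{AS}_i + \beta_3 \text{BAA}_i + \beta_4 \text{NH}_i + \beta_5 \text{HHI}_i + \beta_6 \text{AI}_i + \beta_7 \text{LP}_i + \varepsilon_i \qquad (14)$$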

6. Results

 Descriptive Statistics and Frequency Distribution

Table 1: Frequency Distribution

action_taken: Loan originated; Application denied by financial institution
applicant_ethnicity: Hispanic or Latino; Not Hispanic or Latino
applicant_race_1: American Indian or Alaska Native; Asian; Black or African American; Native Hawaiian or Other Pacific Islander; White
coapplicant is White
coapplicant is Hispanic
Reason for Denial: Collateral; Credit application incomplete; Credit history; Debt-to-income ratio; Employment history; Insufficient cash (downpayment, closing costs); Mortgage insurance denied; Other; Unverifiable information
Loan Purpose: Home improvement; Home purchase; Refinancing

(The percentage for each category is reported in the text below.)


Table 1 presents the frequency distribution of the variables. The result shows that 75.4% of loan applications were originated while 24.6% were denied. 6.36% of the applicants were Hispanic by ethnicity. 83.39% of applicants were White, 7.78% were Black and 7.98% were Asian. A small proportion of the applicants identified as American Indian or Alaska Native (0.48%) or Native Hawaiian (0.36%). 66.51% of applicants were male and 33.49% female. 43.68% of applicants had a co-applicant and 56.32% had none. Among those with a co-applicant, 86.28% had a White co-applicant and 13.72% a non-White co-applicant; 6.59% had a Hispanic co-applicant and 93.41% a non-Hispanic co-applicant. Debt-to-income ratio is the leading reason for denial, accounting for 28.69% of denials. Credit history is the second, accounting for 24.49%. Other important reasons are collateral (19.25%), incomplete application (11.41%), other (8.63%), unverifiable information (3.59%) and insufficient cash (2.41%). There are three purposes for applying for a loan: 49.7% of loans are for home purchase, 37.03% for refinancing and 13.27% for home improvement.

Table 2: Summary Statistics

Variable                  Obs       Mean       Std. Dev.   Min     Max
applicant income          248,057   130.92     223.88      1       9999
household median income   261,076   77710.28   16224.66    57200   109000
loan amount               261,684   266.91     360.95      1


The summary statistics presented in Table 2 show that the average applicant income is 130.92 and the average loan amount is 266.91, both recorded in thousands of dollars (about $130,920 and $266,910 respectively), while the average household median income is $77,710.28.

 Bi-Variate Descriptive Statistics and Frequency Distribution

Table 3 presents the frequency distribution of action taken by ethnicity and race. The result showed that the proportion of Hispanic applicants denied a loan (31.76%) is greater than that of non-Hispanic applicants (24%) by about 7.6 percentage points. In the same vein, the proportion of non-White applicants denied a loan (33.31%) is greater than the proportion of White applicants denied a loan (22.87%) by almost 11 percentage points.

Table 3: Action Taken by Ethnicity and Race

action_taken         Hispanic   Not Hispanic   Total   Non-White   White
Loan originated      11,360
Application denied   5,286
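As a quick arithmetic check, the two counts that survive in Table 3 appear to belong to the Hispanic column, since they reproduce the 31.76% Hispanic denial rate quoted in the text. The variable names below are illustrative.

```python
# Counts taken from the surviving column of Table 3.
hispanic_originated = 11_360
hispanic_denied = 5_286

hispanic_total = hispanic_originated + hispanic_denied
hispanic_denial_rate = hispanic_denied / hispanic_total

print(f"{hispanic_denial_rate:.2%}")  # prints 31.76%
```

The same calculation applied to the other columns would give the remaining denial rates discussed in this section.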

The analysis was taken further by disaggregating race. The result showed that American Indian or Alaska Native applicants are the worst hit in terms of denial, as about half (49.76%) of their applications were denied, followed by Black applicants with 40% of applications denied and Native Hawaiian applicants with 38.74%. By contrast, Whites had 22.87% of applications denied, and Asians are the only non-White group with a relatively low denial rate, at 25%.

Table 5 presents the cross-tabulation of reasons for denial by ethnicity. The result showed that debt-to-income ratio is the major reason for denial for both Hispanic (29.44%) and non-Hispanic (28.63%) applicants. Credit history is also a major reason for denial for both Hispanic (27.78%) and non-Hispanic (24.19%) applicants. In terms of race, debt-to-income ratio is again the major reason for denial: 28.1% of denied Whites were denied because of debt-to-income ratio compared to 28.87% of denied non-Whites. Similarly, 27.61% of Whites and 23.58% of non-Whites were denied because of credit history.

The racial disparities have been entrenched over time, as depicted by the cross-tabulation of loan purpose by ethnicity and race. Applying for home improvement or refinancing indicates that the applicant already has a home. The result showed that the proportion of Whites in these two categories is greater than the proportion of non-Whites, by approximately 4 percentage points for refinancing and almost 3 percentage points for home improvement, which suggests that Whites have historically owned more homes than non-Whites.

 Estimation Result

Table 6 presents the regression of action taken (approved vs denied) on ethnicity and race using the linear probability model (LPM), probit and logit models. The result showed that there is a significant difference in approval rates across ethnicity and race. The LPM showed that the probability of a non-Hispanic applicant getting a loan is 0.076 greater than that of a Hispanic applicant, while the probit coefficient puts non-Hispanic applicants 0.23 higher on the z-score scale. The logit model showed that non-Hispanic applicants have 46% higher odds of getting a loan than Hispanic applicants. Comparing Whites and non-Whites, Whites have a higher probability of having their loans originated than non-Whites by 0.1 in the LPM, and the corresponding probit coefficient is 0.31. The logit result showed that the odds of securing a loan for Whites are 1.68 times those of non-Whites. Looking at the disaggregated race result, with White as the base, the negative and significant coefficients showed that every other race has a lower probability of securing a loan than Whites. For American Indian applicants, the probability of accessing a loan is 0.269 lower in the LPM, the probit coefficient is 0.737 lower, and the odds of securing a loan are 70% lower than for Whites. For Asian applicants, the probability is 0.02 lower in the LPM, the probit coefficient is 0.07 lower, and the odds are 12% lower than for Whites. For Black applicants, the probability is 0.18 lower in the LPM, the probit coefficient is 0.5 lower, and the odds are 56% lower than for Whites. Finally, for Native Hawaiian applicants, the probability is 0.16 lower in the LPM, the probit coefficient is 0.46 lower, and the odds are 53% lower than for Whites.
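When a model has only an intercept and one dummy regressor, all three estimators reproduce the two group approval rates, so their coefficients can be computed by hand from those rates. This is a general property of such models rather than something stated in the paper, and the sketch below applies it to the White and non-White denial rates quoted for Table 3; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Denial rates quoted in the text for Table 3.
denial_white = 0.2287
denial_non_white = 0.3331

p_white = 1 - denial_white          # approval rate, White
p_non_white = 1 - denial_non_white  # approval rate, non-White


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1 - p)


z = NormalDist().inv_cdf  # standard normal quantile function

lpm_coef = p_white - p_non_white                      # difference in probabilities
logit_coef = math.log(odds(p_white)) - math.log(odds(p_non_white))
odds_ratio = math.exp(logit_coef)                     # White vs non-White
probit_coef = z(p_white) - z(p_non_white)             # difference in z-scores

print(f"LPM {lpm_coef:.3f}, probit {probit_coef:.3f}, odds ratio {odds_ratio:.3f}")
```

The three values come out close to the 0.1, 0.31 and 1.68 reported for the White versus non-White comparison, which also shows that the probit figure is a difference in z-scores rather than in probabilities.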

The results discussed above are without controls, which means that omitted factors may bias them. Therefore, in the next set of analyses, we control for household income, applicant income, loan amount, loan purpose and applicant gender. The result presented in Table 7 showed that even after controlling for these variables, the significant racial difference persists. For example, in the LPM, the probability of the loan being originated for Whites is 0.12 greater than for non-Whites in general; when race is disaggregated, it is 0.22 greater than for American Indians, 0.065 greater than for Asians, 0.163 greater than for Blacks and 0.13 greater than for Native Hawaiians. Similarly, in the probit model, the coefficient for Whites is 0.37 higher than for non-Whites in general on the z-score scale; when race is disaggregated, it is 0.62 higher than for American Indians, 0.23 higher than for Asians, 0.49 higher than for Blacks and 0.41 higher than for Native Hawaiians. All differences are statistically significant. In the logit model, the odds of a loan being originated for Whites are 1.88 times those of non-Whites in general; when race is disaggregated, American Indian applicants have 64.3% lower odds of having their loan originated than Whites, Asian applicants 32.5% lower odds, Black applicants 55.4% lower odds and Native Hawaiian applicants 49.6% lower odds. Conversely, in the case of ethnicity, the sign reversed when we controlled for these variables: non-Hispanic applicants now have a lower probability and lower odds of having their loan originated than Hispanic applicants. This means that the ethnic segregation observed earlier may not be real, but the racial segregation is.
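The logit results above are quoted in two ways, as "times the odds" and as "percent lower odds". A small helper, sketched here with hypothetical function names, converts between the two so the figures can be compared on one scale.

```python
def pct_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative means lower)."""
    return (odds_ratio - 1.0) * 100.0


def odds_ratio_from_pct(pct_change):
    """Odds ratio implied by a percent change in odds."""
    return 1.0 + pct_change / 100.0


# White vs non-White odds ratio quoted in the text.
white_vs_non_white = 1.88
# Flipping the comparison inverts the ratio.
non_white_vs_white = 1.0 / white_vs_non_white

print(f"{pct_change_in_odds(non_white_vs_white):.1f}% change in odds")
print(f"odds ratio {odds_ratio_from_pct(-64.3):.3f}")
```

Read this way, an odds ratio of 1.88 for Whites is the same statement as roughly 47% lower odds for non-Whites, and "64.3% lower odds" corresponds to an odds ratio of about 0.357 relative to Whites.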

7. Discussion

The study aims to investigate racial discrimination in housing. The result showed that debt-to-income ratio, credit history and collateral are the major reasons why loans were denied for all groups. More Whites had their loans originated than non-Whites, and fewer Hispanics had their loans originated than non-Hispanics. Moreover, disaggregating race, American Indian applicants had almost half of their applications rejected compared to 22.87% for Whites. Blacks have a denial rate almost 18 percentage points higher than Whites, and Native Hawaiians a denial rate about 16 percentage points higher. The racial segregation seems to have been entrenched over time, as the proportion of non-Whites seeking loans for refinancing or home improvement is lower than that of Whites, which suggests that more Whites than non-Whites owned homes in the past.

The regression models showed that for the LPM, probit and logit models, without controlling for other factors, there is a significant difference across ethnicity and race. Hispanics have a lower chance of getting loans than non-Hispanics, and non-Whites in general have a lower chance of getting loans than Whites. When race is disaggregated, American Indians have the lowest chance of securing mortgage loans, followed by Black/African American applicants, then Native Hawaiian applicants and finally Asian applicants. However, when variables such as loan amount, applicant's income, gender, household income and loan purpose were controlled for, the result reversed for ethnicity: after controlling for these variables, Hispanics have a higher probability of securing a loan than non-Hispanics. This means that there is no ethnic segregation in loans and that the difference found in the simple model is due to factors not controlled for. By contrast, even after adding the control variables, the results for race and disaggregated race remain the same, which means there is sufficient evidence of racial segregation in access to mortgage loans in New York. The result supports the hypothesis that loan originators consider race when deciding to originate or deny loans, while there is no support for the hypothesis that loan originators consider ethnicity (being Hispanic or not) when making that decision. The result supports the findings of Krieger and Higgins (2002), Martinez and Kirchner (2021), Williams (2020) and Ray et al. (2021), who also found significant racial segregation in mortgage loans.

8. Conclusion

This study investigates the presence of racial and ethnic segregation in mortgage loans in New York using the Home Mortgage Disclosure Data for the state of New York for the year 2015. The hypothesis was that loan originators consider ethnicity and race when deciding to originate or deny loans. Descriptive statistics and binary models, namely the linear probability model, the logit model and the probit model, were used to compare findings. While the study provides evidence for the presence of racial segregation, no evidence was found for ethnic segregation. Therefore, despite the introduction of the Fair Housing Act of 1968 to erase racial discrimination in housing, the act has not had the desired effect, as racial segregation still persists, and stakeholders need to take urgent steps to tackle this menace so that no American lives as a second-class citizen in the land of their fathers.


References

Baert, S. and Pauw, A. (2014). Is Ethnic Discrimination Due to Distaste or Statistics? Economics Letters, 125(2): 270–273.

Bayer, P., Ferreira, F. and Ross, S.L. (2014). Race, Ethnicity and High-Cost Mortgage Lending. Working Paper, National Bureau of Economic Research.

Been V, Ellen IG, Madar J. (2009) The High Cost of Segregation: Exploring Racial Disparities in High-Cost Lending. Fordham Urban Law

Brancaccio, D and Conlon, R. (2021). How Mortgage Algorithms Perpetuate Racial Disparity in Home Lending. Available at https://www.marketplace.org/2021/08/25/housing-mortgage-algorithms-racial-disparities-bias-home-lending/ [retrieved 12 Jan 2022]

Hanck, C., Arnold, M., Gerber, A., and Schmelzer, M. (2021). Introduction to Econometrics with R. Open Review

Krieger, J. and Higgins, D.L. (2002). Housing and Health: Time Again for Public Health Action. Am J Public Health, 92(5): 758-68.

Krueger, A.B. (2002). Economic Scene; Sticks and Stones Can Break Bones, But The Wrong Name Can Make A Job Hard To Find. The New York Times, 12 December 2002.

Martinez, E and Kirchner, L. (2021). How We Investigated Racial Disparities in Federal Mortgage Data. Available at https://themarkup.org/show-your-work/2021/08/25/how-we-investigated-racial-disparities-in-federal-mortgage-data [retrieved 12 Jan 2022]

Perraillon, M.C. (2019). Week 12: Linear Probability Models, Logistic and Probit. University of Colorado Anschutz Medical Campus.

Perry, A.M. (2021). How Racial Disparities in Home Prices Reveal Widespread Discrimination. Available at https://www.brookings.edu/testimonies/how-racial-disparities-in-home-prices-reveal-widespread-discrimination/ [retrieved 12 Jan 2022]

Ray, R., Perry, A.M., Harshbarger, D., Elizondo, S. and Gibbons, A. (2021). Homeownership, Racial Segregation, and Policy Solutions To Racial Wealth Equity. Available at https://www.brookings.edu/essay/homeownership-racial-segregation-and-policies-for-racial-wealth-equity/ [retrieved 12 Jan 2022]
