# Understanding Linear Regression in R

Linear regression predicts the value of an outcome variable from one or more predictor variables. The aim is to establish a linear relationship between the predictor variables and the response variable. The fitted formula is then used to estimate the value of the response when the predictors are known.
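
As a minimal sketch of this idea, the following fits a one-predictor model with `lm()` and then estimates the response for known predictor values with `predict()`. The data are simulated; the names `x`, `y`, and `dat` and the true coefficients are assumptions made for illustration and are not part of the analysis in this article.

```r
# Simulated data: names and true coefficients are illustrative assumptions.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, y = y)

# Fit y = b0 + b1 * x by ordinary least squares.
fit <- lm(y ~ x, data = dat)
print(coef(fit))

# Estimate the response for new observations whose predictor is known.
new_obs <- data.frame(x = c(-1, 0, 1))
pred <- predict(fit, newdata = new_obs)
print(pred)
```

The estimated slope should land close to the value used to simulate the data, and `summary(fit)` reports the standard errors and significance levels of the kind shown in the tables later on.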

## Introduction

Today many public goods on the Internet are provided to users for free, and many of them rely entirely on voluntary user contributions. Wikipedia is one example: as a free online encyclopedia, it relies on volunteer contributors around the world to create and edit content. An interesting question, then, is whether the size of a platform's user base changes the incentive for users to contribute. The authors use the block of Chinese Wikipedia in mainland China in October 2005 as a natural experiment to test whether content contribution decreased as a result of the block. During the block, users in mainland China could not use or contribute to Chinese Wikipedia, while contributors outside mainland China could still do both; this caused a dramatic drop in the number of users of the platform. The authors then test whether the contribution levels of the nonblocked contributors also decreased within several weeks of the event. The variables and their definitions are listed below:

| Variable Name | Variable Definition |
|:---|:---|
| date | Calendar date |
| id | Registered contributor ID |
| Addition | Total number of characters added |
| Deletion | Total number of characters deleted |
| Total | Total number of characters added and deleted |
| join date | Date the contributor joined Wiki |
| last date | Date of the contributor's last edit |
| nonblocked | Dummy for a nonblocked contributor |
| overseas | Dummy for an overseas IP |
| week | Week before/since the block event |
| id_week | Week before/since the block event (text) |
| weekly_Addition | Weekly total number of characters added |
| weekly_Deletion | Weekly total number of characters deleted |
| age | Age |
| agesqr | Age squared |
| log addition | Log of (weekly total number of characters added + 1) |
| log deletions | Log of (weekly total number of characters deleted + 1) |
| log Total | Log of (weekly total number of characters added and deleted + 1) |
| after | AfterBlock |
| social_participation | Log of (weekly average of total addition and deletion in user pages or user-talk pages before the block + 1) |
| if Total | Dummy indicating whether the weekly total number of characters added and deleted is larger than zero |

### Impacts of social effects

In this problem, we replicate the interaction analysis for the after-block dummy and average social participation. We mean-adjusted age, agesqr, and after, along with the dependent variables, within each contributor ID. Social participation was not mean-adjusted because it is defined as a pre-block average and therefore does not vary within a contributor. The hypothesis that group size matters to users' free contributions still holds in this analysis at the 5% level of significance, as the variable after is statistically significant at that level. However, the hypothesis does not hold at the 1% level of significance.

Table 1: Significance levels are common to all models

Fixed effects of the week

| Dependent Variable | Total | SD | Addition | SD | Deletion | SD |
|:---|---:|---:|---:|---:|---:|---:|
| after*** | 0.360 | 0.053 | 0.342 | 0.050 | 0.282 | 0.039 |
| social_participation*** | 0.540 | 0.013 | 0.510 | 0.013 | 0.372 | 0.010 |
| age*** | -0.023 | 0.002 | -0.022 | 0.002 | -0.012 | 0.002 |
| agesqr** | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| after:social_participation*** | -0.196 | 0.019 | -0.186 | 0.018 | -0.154 | 0.014 |

In all three models, the hypothesis that group size matters to users' free contributions holds at the 1% level of significance, as the variable after is statistically significant at that level. The result supports the claimed hypothesis even more strongly than the previous analysis.
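
The mean adjustment by contributor ID described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Table 1: the panel is simulated, the helper `demean()` is hypothetical, `log_Total` stands in for the "log Total" variable, age is measured in weeks, and the week fixed effects are left out to keep the example short.

```r
# Simulated contributor-by-week panel. All values are made up for illustration;
# only the column names follow the variable list in this article.
set.seed(42)
n_id  <- 200
weeks <- c(-4:-1, 1:4)                      # weeks before/since the block
panel <- expand.grid(id = seq_len(n_id), week = weeks)

panel$after <- as.numeric(panel$week > 0)   # after-block dummy
sp       <- runif(n_id, 0, 3)               # one pre-block value per contributor
join_age <- sample(10:100, n_id, replace = TRUE)
alpha    <- rnorm(n_id)                     # unobserved contributor effect

panel$social_participation <- sp[panel$id]
panel$age    <- join_age[panel$id] + panel$week
panel$agesqr <- panel$age^2
panel$log_Total <- alpha[panel$id] + 0.4 * panel$after -
  0.2 * panel$after * panel$social_participation +
  rnorm(nrow(panel), sd = 0.5)

# Mean adjustment: subtract each contributor's own mean from the variable.
demean <- function(x, g) x - ave(x, g)
for (v in c("log_Total", "after", "age", "agesqr")) {
  panel[[paste0(v, "_dm")]] <- demean(panel[[v]], panel$id)
}

# social_participation is constant within a contributor, so demeaning it
# would leave (numerically) nothing but zeros.
print(range(demean(panel$social_participation, panel$id)))

# Interaction of the demeaned after dummy with social participation.
fit_fe <- lm(log_Total_dm ~ after_dm * social_participation + age_dm + agesqr_dm,
             data = panel)
print(round(coef(fit_fe), 3))
```

Note that `lm()` on demeaned data does not adjust its degrees of freedom for the contributor means that were removed, so the standard errors it reports should be treated as approximate.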

Table 2: Significance levels are common to all models

Fixed effects of the week

| Dependent Variable | Total | SD | Addition | SD | Deletion | SD |
|:---|---:|---:|---:|---:|---:|---:|
| after* | -0.23 | 0.08 | -0.18 | 0.08 | -0.14 | 0.06 |
| social_participation | -0.00 | 0.01 | -0.00 | 0.01 | 0.00 | 0.00 |
| overseas | -0.00 | 0.06 | -0.00 | 0.06 | 0.00 | 0.05 |
| age* | -0.06 | 0.01 | -0.06 | 0.01 | -0.03 | 0.01 |
| agesqr*** | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| after:social_participation*** | -0.19 | 0.01 | -0.18 | 0.01 | -0.15 | 0.01 |
| after:overseas | 0.11 | 0.12 | 0.11 | 0.12 | 0.03 | 0.09 |
| social_participation:overseas | -0.00 | 0.04 | -0.00 | 0.03 | -0.00 | 0.03 |
| after:social_participation:overseas** | 0.16 | 0.07 | 0.14 | 0.07 | 0.15 | 0.05 |

After controlling for the overseas interaction effects, the hypothesis that group size matters to users' free contributions still holds at the 5% level of significance, as the variable after is statistically significant at that level. However, the hypothesis does not hold at the 1% level of significance.
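
To see where the interaction rows of Table 2 come from, it can help to look at how R expands a three-way interaction. The formula below is an assumption about how such a model might be written (with `log_Total` standing in for the dependent variable), not the original specification; it is never fitted here, only expanded.

```r
# Hypothetical formula: a full three-way interaction plus the age controls.
f <- log_Total ~ after * social_participation * overseas + age + agesqr

# terms() expands the `*` shorthand into main effects, two-way interactions,
# and the three-way interaction.
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
```

The nine terms listed correspond to the nine rows of Table 2: five main effects, three two-way interactions, and one three-way interaction.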

Table 3: Significance is the same across all models

Fixed effects of the week (Total)

| Dependent Variable | Linear | SD | Logit | SD | Probit | SD |
|:---|---:|---:|---:|---:|---:|---:|
| after*** | -0.06 | 0.02 | -0.60 | 0.14 | -0.29 | 0.08 |
| social_participation*** | 0.06 | 0.00 | 0.37 | 0.01 | 0.21 | 0.01 |
| age** | -0.01 | 0.00 | -0.07 | 0.02 | -0.04 | 0.01 |
| agesqr*** | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| after:social_participation | -0.02 | 0.00 | -0.01 | 0.02 | -0.02 | 0.01 |

With if_Total as the response variable, we fit three models: linear, logistic, and probit. The results are similar where variable significance is concerned.
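
A minimal sketch of fitting the same binary response with all three model types is shown below. The data are simulated and the coefficients of the data-generating process are assumptions; only the names `if_Total`, `after`, and `social_participation` are borrowed from this article, and the age controls and fixed effects are omitted for brevity.

```r
# Simulated data: the data-generating process is an assumption for illustration.
set.seed(7)
n <- 5000
dat <- data.frame(
  after = rbinom(n, 1, 0.5),
  social_participation = runif(n, 0, 3)
)
eta <- -0.5 - 0.6 * dat$after + 0.4 * dat$social_participation
dat$if_Total <- rbinom(n, 1, plogis(eta))   # 1 if the contributor edited at all

f <- if_Total ~ after * social_participation

# Linear probability model, logistic regression, and probit regression.
fit_lpm    <- lm(f, data = dat)
fit_logit  <- glm(f, data = dat, family = binomial(link = "logit"))
fit_probit <- glm(f, data = dat, family = binomial(link = "probit"))

# Coefficients side by side. The three columns are on different scales.
coef_table <- cbind(linear = coef(fit_lpm),
                    logit  = coef(fit_logit),
                    probit = coef(fit_probit))
print(round(coef_table, 3))
```

The linear coefficients are changes in probability, the logit coefficients are changes in log-odds, and the probit coefficients are changes on the standard normal scale, so their magnitudes are not directly comparable. Comparing signs and significance across the three models, as Table 3 does, is the more meaningful check.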