
Understanding Linear Regression in R

Linear regression predicts the value of an outcome (response) variable from one or more predictor variables. The aim is to establish a linear relationship between the predictors and the response. The fitted equation is then used to estimate the value of the response when the predictors are known.
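As a minimal sketch of this idea, the following fits a linear model with base R's lm() and then predicts the response for new predictor values. The data are simulated, and the names toy, x1, x2, and y and the coefficients used to generate them are assumptions made up for illustration; they are not taken from the Wikipedia data discussed below.

```r
# Simulated data: names and true coefficients are illustrative assumptions
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)
toy <- data.frame(y, x1, x2)

# Fit y = b0 + b1 * x1 + b2 * x2 + error by ordinary least squares
fit <- lm(y ~ x1 + x2, data = toy)
coef(fit)

# Estimate the response for known predictor values
new_obs <- data.frame(x1 = c(0, 1), x2 = c(0, -1))
pred <- predict(fit, newdata = new_obs)
pred
```

Calling summary(fit) would additionally report standard errors and significance stars of the kind shown in the tables later in this write-up.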

Introduction

Today many public goods on the Internet are provided to users for free, and many of them rely entirely on voluntary user contributions. Wikipedia, for example, is a free online encyclopedia that relies on volunteer contributors around the world to create and edit content. An interesting question, then, is whether the size of a platform's user base changes the incentive for users to contribute. The authors use the blocking of Chinese Wikipedia in mainland China in October 2005 as a natural experiment to test whether content contributions decreased as a result of the block. During the block, users in mainland China could not use or contribute to Chinese Wikipedia, while contributors outside mainland China could still do both; this caused a dramatic drop in the number of users of the platform. The authors then test whether the contribution levels of the nonblocked contributors also decreased within several weeks of the event. The variables and their definitions are listed below:
Variable Name          Variable Definition
date                   Calendar date
id                     Registered contributor ID
Addition               Total number of characters added
Deletion               Total number of characters deleted
Total                  Total number of characters added and deleted
join date              Date the contributor joined Wikipedia
last date              Date of the contributor's last edit
nonblocked             Dummy for a nonblocked contributor
overseas               Dummy for an overseas IP
week                   The week before/since the block event
id_week                The week before/since the block event (text)
weekly_Addition        Weekly total number of characters added
weekly_Deletion        Weekly total number of characters deleted
age                    Age
agesqr                 Age squared
log addition           Log of (weekly total number of characters added + 1)
log deletions          Log of (weekly total number of characters deleted + 1)
log Total              Log of (weekly total number of characters added and deleted + 1)
after                  After-block dummy
social_participation   Log of (weekly average of total addition and deletion in user pages or user-talk pages before the block + 1)
if Total               Dummy indicating whether the weekly total number of characters added and deleted is larger than zero

Impacts of social effects

In this problem, we replicate the interaction analysis of the after-block dummy and average social participation. We mean-adjusted age, agesqr, and after, along with the dependent variables, within each contributor ID. Social participation was not mean-adjusted, presumably because it is a pre-block average that is constant within each contributor, so demeaning it would leave only zeros. The hypothesis that group size matters for users' free contributions holds in this analysis at the 5% level of significance, since the variable after is statistically significant at that level. However, the hypothesis does not hold at the 1% level.
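The mean adjustment by contributor ID can be sketched in base R with ave(), which returns group means aligned to the original rows. This is only an illustration under stated assumptions: the panel below is a made-up toy data set, and the names panel, social, log_total, and demean are hypothetical rather than taken from the original analysis.

```r
# Toy panel: three hypothetical contributors observed over four weeks.
# All values and the name log_total are made up for illustration.
panel <- data.frame(
  id        = rep(c("a", "b", "c"), each = 4),
  after     = rep(c(0, 0, 1, 1), times = 3),
  age       = c(10, 11, 12, 13, 30, 31, 32, 33, 5, 6, 7, 8),
  social    = rep(c(0.5, 2.0, 1.25), each = 4),
  log_total = c(5.1, 5.3, 4.2, 4.0, 7.0, 6.8, 6.1, 6.3, 2.0, 2.2, 1.1, 0.9)
)

# Subtract each contributor's own mean (the within transformation)
demean <- function(x, g) x - ave(x, g)

for (v in c("log_total", "after", "age")) {
  panel[[paste0(v, "_dm")]] <- demean(panel[[v]], panel$id)
}

# A variable that is constant within a contributor demeans to all zeros
demean(panel$social, panel$id)

# Regression on demeaned data versus explicit contributor dummies:
# the slope on `after` should agree up to floating-point error
fit_dm <- lm(log_total_dm ~ 0 + after_dm + age_dm, data = panel)
fit_fe <- lm(log_total ~ after + age + factor(id), data = panel)
c(demeaned = coef(fit_dm)[["after_dm"]], dummies = coef(fit_fe)[["after"]])
```

Demeaning avoids estimating one dummy per contributor, which matters when the number of contributors is large. Note that standard errors from the demeaned lm() fit are not adjusted for the degrees of freedom used up by the contributor means.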
Table 1: Significance levels are the same for all models
Fixed effects of the week

Dependent Variable:                Total       SD Addition       SD Deletion       SD
after***                           0.360    0.053    0.342    0.050    0.282    0.039
social_participation***            0.540    0.013    0.510    0.013    0.372    0.010
age***                            -0.023    0.002   -0.022    0.002   -0.012    0.002
agesqr**                           0.000    0.000    0.000    0.000    0.000    0.000
after:social_participation***     -0.196    0.019   -0.186    0.018   -0.154    0.014

In all three models, the hypothesis that group size matters for users' free contributions holds at the 1% level of significance, since the variable after is statistically significant at that level. This result supports the hypothesis even more strongly than the previous analysis.
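Because the model includes the after:social_participation interaction, the coefficient on after alone describes only contributors whose social participation is zero. A rough sketch of the implied effect at other levels, using the point estimates in the Total column of Table 1, is given below. It assumes the dependent variable is the log of the weekly total, ignores estimation uncertainty, and uses an arbitrary grid of social participation values (s_grid) that is not taken from the data.

```r
# Point estimates from Table 1 (Total column)
b_after       <- 0.360
b_interaction <- -0.196

# Implied after-block change in the dependent variable for a contributor
# with pre-block social participation s
effect_after <- function(s) b_after + b_interaction * s

s_grid <- c(0, 1, 2, 3)  # illustrative values, not taken from the data
data.frame(social_participation = s_grid, effect = effect_after(s_grid))

# Level of social participation at which the implied effect changes sign
s_zero <- -b_after / b_interaction
s_zero
```

The negative interaction coefficient means the implied after-block effect falls as social participation rises.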

Table 2: Significance levels are the same for all models

Fixed effects of the week

Dependent Variable:                        Total       SD Addition       SD Deletion       SD
after*                                     -0.23     0.08    -0.18     0.08    -0.14     0.06
social_participation                       -0.00     0.01    -0.00     0.01     0.00     0.00
overseas                                   -0.00     0.06    -0.00     0.06     0.00     0.05
age*                                       -0.06     0.01    -0.06     0.01    -0.03     0.01
agesqr***                                   0.00     0.00     0.00     0.00     0.00     0.00
after:social_participation***              -0.19     0.01    -0.18     0.01    -0.15     0.01
after:overseas                              0.11     0.12     0.11     0.12     0.03     0.09
social_participation:overseas              -0.00     0.04    -0.00     0.03    -0.00     0.03
after:social_participation:overseas**       0.16     0.07     0.14     0.07     0.15     0.05

After controlling for the overseas interaction effects, the hypothesis that group size matters for users' free contributions holds at the 5% level of significance, since the variable after is statistically significant at that level. However, the hypothesis does not hold at the 1% level.
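The set of terms in Table 2 is what R's formula operator * produces for a three-way interaction: all main effects, all pairwise interactions, and the triple interaction. The sketch below only inspects the formula and fits nothing; the response name log_total is an assumed spelling of the "log Total" variable, and the exact formula used in the original analysis is not shown in the source.

```r
# `a * b * c` expands to a + b + c + a:b + a:c + b:c + a:b:c
f <- log_total ~ after * social_participation * overseas + age + agesqr

# List the terms the model would contain, without fitting anything
labels <- attr(terms(f), "term.labels")
labels
```

Writing the interaction with : instead of * would include only the product term and omit the lower-order effects, which usually makes the coefficients harder to interpret.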

Table 3: Significance is the same across all models.

Fixed effects of the week (Total)

Model:                          Linear       SD    Logit       SD   Probit       SD
after***                         -0.06     0.02    -0.60     0.14    -0.29     0.08
social_participation***           0.06     0.00     0.37     0.01     0.21     0.01
age**                            -0.01     0.00    -0.07     0.02    -0.04     0.01
agesqr***                         0.00     0.00     0.00     0.00     0.00     0.00
after:social_participation       -0.02     0.00    -0.01     0.02    -0.02     0.01

With if_Total as the response variable, we fit three models: linear, logistic, and probit. The three models give similar results as far as variable significance is concerned.
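A hedged sketch of how three such models can be fitted to a 0/1 response in base R follows. The data are simulated, the data-generating coefficients are invented, and the name if_total is a hypothetical stand-in for the if_Total dummy; week fixed effects and the age terms are left out to keep the example short.

```r
# Simulated data: the generating process below is an assumption
set.seed(42)
n <- 500
sim <- data.frame(
  after                = rbinom(n, 1, 0.5),
  social_participation = rexp(n)
)
p <- plogis(-0.5 - 0.6 * sim$after + 0.4 * sim$social_participation)
sim$if_total <- rbinom(n, 1, p)

# Same right-hand side, three different models for a binary response
lpm    <- lm(if_total ~ after * social_participation, data = sim)
logit  <- glm(if_total ~ after * social_participation, data = sim,
              family = binomial(link = "logit"))
probit <- glm(if_total ~ after * social_participation, data = sim,
              family = binomial(link = "probit"))

# Coefficients are on different scales (probability, log-odds, z-score),
# so compare signs and significance rather than magnitudes
round(cbind(linear = coef(lpm), logit = coef(logit), probit = coef(probit)), 3)
```

The difference in scale is why the magnitudes in the Linear, Logit, and Probit columns of Table 3 are not directly comparable, even when the significance pattern is the same.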