# Monte Carlo Estimate

Suppose that a casino manager wants to introduce a new gambling game.  The  game is played as follows.  A player tosses a fair three sided die repeatedly until a face with one or two dots appears or the die is tossed 100 times. The initial stake starts at 3 dollars and is tripled every time a die toss results in a side with three dots.  The first time a face without three dotsappears, the game ends and the player wins whatever is in the pot. Thus the player wins 0 dollars if a 1 or 2 appears on the first toss, 3 dollars if a 3 appears on the first toss and 1 or 2 appears  on the second, 9 dollars if a 3 appears on the first and second toss and a 1 or 2 appears on the third, and so on.  Write a function whose input is  nReps=100,000 and whose output is based on the following two parts (use the function framework that is provided after part (b)).

1. Suppose that the casino manager desires that the house wins 1 dollar on average. What must the initial pay in fee be to achieve this? Do not forget to assess Monte Carlo error.
1. With the entry fee from part a, what is the probability of a player making a profit? (i.e. winning more money than the cost of the entry fee.)

Do not forget to assess Monte Carlo error.  Finally, report a Monte Carlo estimate for the maximum amount of winnings (for the maximum you do not need to assess Monte Carlo error).

set.seed(100)

sim.study(nReps=100000){

}

sim.study()

Problem 2

Since many statistical procedures depend on assuming that observations are normallydistributed, there have been a number of tests (more than 30 in fact) constructed to formally test this condition.  The context associated with these tests is the following.  Consider independent observations x1, x2,…,xn from random variable X and then test the following competing hypotheses H0: X has a normal distribution vs H1: X does not have a normal distribution.

In this part of the exam, you will explore statistical properties associated with two of the more commonly used normality testing procedures. More specifically, you will consider the following tests (R-functions that carry out the tests are provided in parenthesis).

1. Shapiro-Wilks test (shapiro.test)
2. Kolmogorov-Smirnoff test (ks.test. For the second argument of this function use y=”pnorm”)

You will explore how sample size influences the size and power of the two testing procedures. To do this, consider sample sizes of 10, 30, 100. In addition you will generate data from each of the following distributions

1.N(0,1) – standard normal distribution (mean 0, standard deviation 1)

1. t(4) – t distribution with 4 degrees of freedom
2. t(30) – t distribution with 30 degrees of freedom
3. Chisq(4) – Chi-squared with 4 degrees of freedom
4. Chisq(100) – Chi-squared with 100 degrees of freedom

Find Monte Carlo estimates of the size and power for each procedure using an alpha level of 0.05 (don’t forget to assess Monte Carlo error).  Please organize results in six 5 x 3 matrices (one for each sample size/testing procedure combo). The rows of the matrices should correspond to the five data generating mechanisms. The first column should contain a Monte Carlo estimate of rejecting the null hypothesis while the other two columns should provide assessment of Monte Carlo error.

Solution

n=10 #you can change it to 30 , 100

powerKS<- c(rep(0,100))

sizeKS<-powerKS #just initiate

powerSH<-powerKS #just initiate

sizeSH<-powerKS #just initiate

for( i in 1:100) { #generating 100 samples from each distrn.

x1<-rnorm(n)

x2<-rt(n,df=4)

x3<-rt(n,df=30)

x4<- rchisq(n,df=4)

x5<-rchisq(n,df=100)

#these are the samples generated from 5 distrn.

c <- c(rep(0,5))

#KS test

y1=ks.test(x1,pnorm)\$p.value

y2=ks.test(x2,pnorm)\$p.value

y3=ks.test(x3,pnorm)\$p.value

y4=ks.test(x4,pnorm)\$p.value

y5=ks.test(x5,pnorm)\$p.value

if( 0.025<y1 && y1<0.975) { c=1} #we mark c[j] as 1 if jth distribution is detected as normal by KS test

if(0.025<y2 && y2<0.975) { c=1}

if(0.025<y3 && y3<0.975) { c=1}

if(0.025<y4 && y4<0.975) { c=1}

if(0.025<y5 && y5<0.975) { c=1}

z1=shapiro.test(x1)\$p.value

z2=shapiro.test(x2)\$p.value

z3=shapiro.test(x3)\$p.value

z4=shapiro.test(x4)\$p.value

z5=shapiro.test(x5)\$p.value

d <- c(rep(0,5))

if(0.025 <z1 && z1<0.975) { d=1}

if(0.025 <z2 && z2<0.975) { d=1}

if(0.025 <z3 && z3<0.975) { d=1}

if(0.025 <z4 && z4<0.975) { d=1}

if(0.025 <z5 && z5<0.975) { d=1}

powerKS[i]=1-(c+c+c+c)/4

sizeKS[i]= 1-c

powerSH[i]=1-(d+d+d+d)/4

sizeSH[i]= 1-d

}

power_estKS = mean(powerKS)

size_estKS=mean(sizeKS) #montecarloestimates,KS=ks.test,Sh=shapiro.test

power_estSH = mean(powerSH)

size_estSH=mean(sizeSH)

MCE_powerKS=sd(powerKS)

MCE_sizeKS=sd(sizeKS)

MCE_powerSH=sd(powerSH)

MCE_sizeSH=sd(sizeSH) #MCE estimates

…..

#problem1

#let number of players be p=500. you may change if you need .

set.seed(100)

sim.study<- function(nReps=10^5) {

X <- c(rep(0,nReps)) #initiation

for(k in 1:nReps) {

p=1000

gain<- c(rep(0,p)) #initiation

for (j in 1:p) {

x <-sample(3,100,rep=T)

if(x != 3){

g=0

}

if(x==3) {

for(i in 2:100){

if(x[i] != 3) {

g=3^(i-1)

break   }

}

}

gain[j]=g

}

X[k]=mean(gain) }

#X is the array of estimates of avg.gain of players

payinfees=mean(X)+1 #this answer gives the payinfees needed . Since mean(X) is approximately the average gain. So , to make profit , it has to be mean(X)+1. MCE=sd(X) as usual

MCE=sd(X)

prob<- c(rep(0,nReps))#initiate

max<- c(rep(0,nReps))

for(k in 1:nReps) {

p=500

count=0

max<- c(rep(0,nReps))

#gain <- c(rep(0,p)) #initiation

for (j in 1:p) {

x <-sample(3,100,rep=T)

if(x != 3){

g=0

}

if(x==3) {

for(i in 2:100){

if(x[i] != 3) {

g=3^(i-1)

break   }

} #break is used to stop once you get 1 or 2

}

if(g>max[k]) {

#this loop computes max in each of nReps cases

max[k]=g }

if(g >payinfees) {

count=count+1 #count checks how many are scoring above payinfees

}

}

prob[k]=count/p

}

prob_est=mean(prob)

MCE_prob=sd(prob)

max_est=mean(max)