**Multiple-choice questions**

**1. When all independent variables in a dataset are numeric, we should use a regression tree to build a classification model.**

True

False

**2. The sum of sensitivity and specificity for a classification tree model where the response has only two classes is always 100%.**

True

False

**3. Which of the following is/are true about the Random Forest and Extreme Gradient Boosting ensemble methods?**

Random Forest is used for classification whereas Extreme Gradient Boosting is used for regression problems

Both methods can be used for classification problems

Both methods can be used for regression problems

Random Forest is used for regression problems whereas Extreme Gradient Boosting is used for classification problems

**4. For three different classification models based on the same data, the following results were obtained:**

**Model-1**

Accuracy based on training data = 72.2%

Accuracy based on testing data = 66.9%

**Model-2**

Accuracy based on training data = 73.5%

Accuracy based on testing data = 39.2%

**Model-3**

Accuracy based on training data = 69.4%

Accuracy based on testing data = 68.3%

**Which model exhibits the over-fitting problem?**

None of the models exhibit an over-fitting problem

Model-3

Model-2

Model-1
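As an illustration, the gap between training and testing accuracy is the usual symptom to check for over-fitting. A minimal pure-Python sketch, using the (hypothetical) figures from the question above:

```python
# Figures from the question above (hypothetical models).
models = {
    "Model-1": (72.2, 66.9),
    "Model-2": (73.5, 39.2),
    "Model-3": (69.4, 68.3),
}

# Over-fitting shows up as a large drop from training to testing accuracy.
gaps = {name: train - test for name, (train, test) in models.items()}
for name, gap in gaps.items():
    print(f"{name}: train-test gap = {gap:.1f} points")
```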

**5. A reduction in the value of the complexity parameter always results in a lower cross-validation relative error (xerror).**

True

False

**6. Results from a random forest classification model show less variability in performance compared to a single tree.**

True

False

**7. Performance of a random forest regression model can be assessed using a ROC curve.**

True

False

**8. A small value of the complexity parameter results in a large tree.**

True

False

**9. For a classification tree model to predict whether or not a potential customer will accept a credit card offer, if sensitivity is 80% then specificity must be 20%.**

True

False
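For questions 2 and 9, it helps to recall that sensitivity and specificity are computed from disjoint groups of observations. A minimal sketch with hypothetical confusion-matrix counts:

```python
# Hypothetical confusion-matrix counts for a two-class model
# (e.g. accept vs. decline a credit card offer).
tp, fn = 80, 20   # actual acceptors: 80 predicted correctly, 20 missed
tn, fp = 70, 30   # actual decliners: 70 predicted correctly, 30 flagged

sensitivity = tp / (tp + fn)   # true positive rate, from actual positives only
specificity = tn / (tn + fp)   # true negative rate, from actual negatives only

# The two rates come from disjoint rows of the confusion matrix,
# so nothing forces them to sum to 100%.
print(f"sensitivity = {sensitivity:.0%}, specificity = {specificity:.0%}")
```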

**10. In a classification tree, the most important independent variable is not part of the tree model.**

True

False

**11. Which of the following statements are true about bagging trees? Please check all that apply.**

Each dataset is a bootstrap sample from the training data obtained as a random sample with replacement.

There is a chance that in a bootstrap sample some observations occur more than once.

None of the options are correct.

Each bootstrap sample depends on the previous bootstrap sample.
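The bootstrap sampling described in the options above can be sketched in a few lines of Python (with a hypothetical dataset of ten observation indices):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical training data: ten observation indices.
data = list(range(10))

# A bootstrap sample is the same size as the original data and is drawn
# with replacement, so an observation can appear more than once while
# others may not appear at all.
bootstrap = [random.choice(data) for _ in range(len(data))]
print(sorted(bootstrap))
```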

**12. A random forest regression model can help overcome high variability in performance observed in regression trees.**

True

False

**13. A classification tree with the highest cross-validation relative error (xerror) has the best performance.**

True

False

**14. Various trees used in a boosting method are always independent of each other.**

True

False

**15. For detecting email spam (factor variable), which of the following methods are appropriate?**

Logistic regression

Simple linear regression

Multiple linear regression

Regression tree

Classification tree

**16. A random forest model uses a random sample of the independent variables at each split.**

True

False
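The per-split variable sampling in question 16 can be illustrated with a short Python sketch (the predictor names and the mtry value are hypothetical):

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

predictors = ["age", "income", "balance", "tenure", "region"]
mtry = 2  # hypothetical number of candidate variables per split

# At each split, a random forest considers only a fresh random subset of
# the predictors (drawn without replacement) and chooses the best split
# among those candidates.
for split in range(3):
    candidates = random.sample(predictors, mtry)
    print(f"split {split + 1}: candidates = {candidates}")
```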

**17. Which of the following algorithms is not an example of an ensemble learning algorithm?**

Bagging

Decision Trees

Random Forest

Extreme Gradient Boosting

**18. For three different classification models based on the same data, the following results were obtained:**

**Model-1**

Accuracy based on training data = 72.2%

Accuracy based on testing data = 66.9%

**Model-2**

Accuracy based on training data = 73.5%

Accuracy based on testing data = 39.2%

**Model-3**

Accuracy based on training data = 69.4%

Accuracy based on testing data = 68.3%

**Which model has the best performance?**

Model-3

Model-2

Model-1

**19. In a classification tree model based on bootstrap aggregating or bagging, the final prediction is based on majority voting from all trees.**

True

False
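The majority-voting aggregation in question 19 can be sketched directly (the seven tree predictions below are hypothetical):

```python
from collections import Counter

# Hypothetical class predictions from seven bagged trees for one observation.
votes = ["accept", "decline", "accept", "accept", "decline", "accept", "accept"]

# Bagging aggregates classification trees by majority vote:
# the class predicted by the most trees wins.
prediction, count = Counter(votes).most_common(1)[0]
print(f"final prediction: {prediction} ({count} of {len(votes)} votes)")
```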

**20. A disadvantage of a regression tree model is that it can have high variability in performance.**

True

False