Multiple-choice questions
1. When all independent variables in a dataset are numeric, we should use a regression tree to build a classification model.
True
False
2. The sum of sensitivity and specificity for a classification tree model where the response has only two classes is always 100%.
True
False
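As a reminder of the definitions behind this question, here is a minimal Python sketch using a hypothetical 2x2 confusion matrix (the counts are invented for illustration); it shows that sensitivity and specificity are computed on different denominators, so nothing forces them to sum to 100%.

```python
# Hypothetical confusion matrix counts: rows = actual class.
tp, fn = 80, 20   # actual positives: 80 predicted positive, 20 missed
tn, fp = 90, 10   # actual negatives: 90 predicted negative, 10 false alarms

sensitivity = tp / (tp + fn)  # true positive rate = 0.80
specificity = tn / (tn + fp)  # true negative rate = 0.90

# The two rates sum to 1.7 here, not 1.0 -- they are not complementary.
print(round(sensitivity + specificity, 2))
```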
3. Which of the following is/are true about Random Forest and Extreme Gradient Boosting ensemble methods?
Random Forest is used for classification problems, whereas Extreme Gradient Boosting is used for regression problems
Both methods can be used for classification problems
Both methods can be used for regression problems
Random Forest is used for regression problems, whereas Extreme Gradient Boosting is used for classification problems
4. For three different classification models based on the same data, the following results were obtained:
Model-1
Accuracy based on training data = 72.2%
Accuracy based on testing data = 66.9%
Model-2
Accuracy based on training data = 73.5%
Accuracy based on testing data = 39.2%
Model-3
Accuracy based on training data = 69.4%
Accuracy based on testing data = 68.3%
Which model exhibits the over-fitting problem?
None of the models exhibit an over-fitting problem
Model-3
Model-2
Model-1
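The usual symptom of over-fitting is a large drop from training accuracy to testing accuracy. A minimal Python sketch of that comparison, using the figures quoted in the question:

```python
# (training accuracy, testing accuracy) for each model, as fractions.
results = {
    "Model-1": (0.722, 0.669),
    "Model-2": (0.735, 0.392),
    "Model-3": (0.694, 0.683),
}

# A model that fits the training data much better than the testing
# data is memorizing noise rather than generalizing.
gaps = {name: train - test for name, (train, test) in results.items()}
overfit = max(gaps, key=gaps.get)
print(overfit)  # Model-2: its train/test gap is 34.3 percentage points
```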
5. Reduction in the value of the complexity parameter always results in a lower cross-validation relative error (xerror).
True
False
6. Results from a random forest classification model show less variability in performance compared to a single tree.
True
False
7. Performance of a random forest regression model can be assessed using an ROC curve.
True
False
8. A small value of complexity parameter results in a large tree.
True
False
9. For a classification tree model to predict whether or not a potential customer will accept a credit card offer, if sensitivity is 80% then specificity must be 20%.
True
False
10. In a classification tree, the most important independent variable is not part of the tree model.
True
False
11. Which of the following statements are true about bagging trees? Please check any that apply.
Each dataset is a bootstrap sample from the training data obtained as a random sample with replacement.
There is a chance that in a bootstrap sample some observations occur more than once.
None of the options are correct.
Each bootstrap sample depends on the previous bootstrap sample.
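A bootstrap sample is a with-replacement draw of the same size as the training data, so some observations repeat while others are left out. A short sketch (toy data, seed chosen only for reproducibility) that checks the fraction of distinct observations in one sample:

```python
import random

random.seed(42)
n = 1000
data = list(range(n))  # toy training set of n observations

# One bootstrap sample: n draws from the data, with replacement.
boot = random.choices(data, k=n)

# Duplicates push the fraction of distinct observations below 1;
# for large n it tends toward 1 - 1/e, roughly 0.63.
unique_frac = len(set(boot)) / n
print(round(unique_frac, 2))
```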
12. A random forest regression model can help overcome high variability in performance observed in regression trees.
True
False
13. A classification tree with the highest cross-validation relative error (xerror) has the best performance.
True
False
14. Various trees used in a boosting method are always independent of each other.
True
False
15. For detecting email spam (factor variable), which of the following methods are appropriate?
Logistic regression
Simple linear regression
Multiple linear regression
Regression tree
Classification tree
16. A random forest model uses a random subset of the independent variables at each split.
True
False
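The feature subsetting idea behind this question can be sketched in a few lines of Python. The feature names are hypothetical, and the square-root rule is only a common default for classification forests, not a requirement:

```python
import math
import random

random.seed(1)
features = ["age", "income", "balance", "tenure", "region", "score"]

# At each split, a random forest evaluates only a random subset of the
# p predictors; sqrt(p) candidates per split is a common default.
mtry = int(math.sqrt(len(features)))        # sqrt(6) -> 2 candidates
candidates = random.sample(features, mtry)  # distinct features, no repeats
print(candidates)
```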
17. Which of the following algorithms is not an example of an ensemble learning algorithm?
Bagging
Decision Trees
Random Forest
Extreme Gradient Boosting
18. For three different classification models based on the same data, the following results were obtained:
Model-1
Accuracy based on training data = 72.2%
Accuracy based on testing data = 66.9%
Model-2
Accuracy based on training data = 73.5%
Accuracy based on testing data = 39.2%
Model-3
Accuracy based on training data = 69.4%
Accuracy based on testing data = 68.3%
Which model has the best performance?
Model-3
Model-2
Model-1
19. In a classification tree model based on bootstrap aggregating or bagging, the final prediction is based on majority voting from all trees.
True
False
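Majority voting in a bagged classifier can be sketched in a couple of lines; the tree predictions below are hypothetical:

```python
from collections import Counter

# Hypothetical predictions from five bagged trees for one observation.
tree_votes = ["yes", "no", "yes", "yes", "no"]

# Bagging classifies by majority vote across all trees.
final = Counter(tree_votes).most_common(1)[0][0]
print(final)  # "yes" wins the vote 3-2
```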
20. A disadvantage of a regression tree model is that it can have high variability in performance.
True
False