Classification trees, also known as decision trees, use classification algorithms to group respondents based on their response to, or membership in, a certain categorical variable. For example, we may want to understand the demographic and other characteristics of physicians who are highly willing to prescribe a particular drug vis-à-vis those who are less willing to prescribe it. The initial categorization may be done on the basis of a rating scale. Once the categories are defined, we may want to understand membership in each category using other variables such as specialty, practice setting, regional or urban-rural location, and number of years in practice.
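The scale-based categorization described above can be sketched as follows. The seven-point scale, the top-box cut-off, and the survey scores are all hypothetical choices for illustration:

```python
# Hypothetical example: deriving prescriber categories from a rating scale.
# The scale (1-7) and the cut-off (top-2 box = "highly willing") are
# illustrative assumptions, not fixed rules.

def categorize(score, top_box_min=6):
    """Map a 1-7 willingness-to-prescribe rating to a category label."""
    return "highly willing" if score >= top_box_min else "less willing"

# Ratings from a hypothetical physician survey (1 = not at all, 7 = definitely).
scores = [7, 3, 6, 2, 5, 7, 4]
labels = [categorize(s) for s in scores]
# labels -> ['highly willing', 'less willing', 'highly willing',
#            'less willing', 'less willing', 'highly willing', 'less willing']
```

These category labels then become the predicted variable for the tree, with specialty, practice setting, and similar variables as predictors.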
In the algorithm most commonly used for classification trees, a chi-square test is applied to find the variable whose categories are most closely associated with the predicted variable (willingness to prescribe, in this case). Suppose this is specialty. We then split further within each specialty to find the next most strongly associated variable: this could be practice setting for nephrologists and years of practice for neurologists.
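A minimal sketch of this chi-square-based variable selection, using made-up survey records (the specialties, settings, and willingness labels are illustrative assumptions, and a real implementation would also check statistical significance, not just the raw statistic):

```python
from collections import Counter

def chi_square(pairs):
    """Chi-square statistic for a list of (category, outcome) pairs."""
    n = len(pairs)
    row = Counter(c for c, _ in pairs)   # category totals
    col = Counter(o for _, o in pairs)   # outcome totals
    obs = Counter(pairs)                 # observed cell counts
    stat = 0.0
    for c in row:
        for o in col:
            expected = row[c] * col[o] / n
            stat += (obs[(c, o)] - expected) ** 2 / expected
    return stat

# Hypothetical records: (specialty, practice setting, willingness category).
records = [
    ("nephrology", "hospital", "high"),
    ("nephrology", "office",   "high"),
    ("nephrology", "hospital", "high"),
    ("nephrology", "office",   "low"),
    ("neurology",  "hospital", "low"),
    ("neurology",  "office",   "low"),
    ("neurology",  "hospital", "low"),
    ("neurology",  "office",   "high"),
]

outcome = [r[2] for r in records]
candidates = {
    "specialty": [r[0] for r in records],
    "setting":   [r[1] for r in records],
}

# Pick the predictor most strongly associated with willingness.
best = max(candidates, key=lambda v: chi_square(list(zip(candidates[v], outcome))))
# best -> "specialty": specialty separates high/low willingness, setting does not.
```

The tree would split on `best` first, then repeat the same search separately within each of its categories.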
A decision on the depth of the tree is taken in advance (depending on the minimum sample size required at the terminal nodes). The tree is then developed on a training sample and confirmed on a test sample, with performance judged by the predictive validity achieved in the test sample. This predictive validity must be higher than what would have been achieved without any analysis, through the mere assignment of all objects to the class with the highest frequency.
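The train/test workflow and the majority-class baseline can be sketched with scikit-learn. Note that `DecisionTreeClassifier` uses Gini or entropy splits rather than the chi-square criterion described above, and the data here is synthetic, with one informative and one noise feature, purely for illustration:

```python
# A minimal sketch of the train/test workflow, assuming scikit-learn and NumPy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic data: feature 0 drives the outcome; feature 1 is noise.
X = rng.integers(0, 2, size=(200, 2))
y = X[:, 0]

# Develop the tree on a training sample; hold out a test sample.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Depth and minimum leaf size are fixed in advance, as described above.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=10, random_state=0)
tree.fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, tree.predict(X_test))

# Baseline: assign every object to the most frequent class in the training sample.
majority = np.bincount(y_train).argmax()
baseline_accuracy = np.mean(y_test == majority)
```

The tree is worth keeping only if `test_accuracy` clearly exceeds `baseline_accuracy`; otherwise the model adds nothing over the naive majority-class rule.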