Stacked Rankings Using Maximum Difference (MaxDiff) Analysis
Very often in MR studies, we want to rank a set of attributes in terms of their importance to customers in making a choice, their relevance, their usefulness, or some other psychological parameter. The problem is relatively straightforward when the number of attributes is small: relative rankings can be obtained either through direct questions on response scales or by asking respondents to rank the attributes.
Responses obtained on importance, relevance, usefulness, etc. using response scales can then be compared on their averages to obtain a ranking. Directly asking respondents to rank attributes is perhaps more reliable when the number of items to be ranked is small. This is because rating scale questions are often subject to response bias, where, for example, respondents may be prone to using one end of the scale more than the other. Response scales also lead to low differentiation between items. Although differences in average scores can be tested for statistical significance, such results are themselves sensitive to sample size. Generally, we are more likely to find a statistically significant difference between two average scores with a larger sample than with a smaller one.
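To illustrate this sample-size sensitivity, the following is a small, hypothetical sketch in Python. It simulates 1-5 importance ratings for two attributes with a fixed small gap between their means and applies SciPy's paired t-test; the attribute names, rating parameters, and sample sizes are illustrative assumptions, not data from any real study.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def compare_means(n_respondents):
    # Simulated 1-5 ratings for two attributes from the same respondents;
    # attribute A is rated only slightly higher than attribute B on average
    a = np.clip(np.round(rng.normal(3.8, 1.0, n_respondents)), 1, 5)
    b = np.clip(np.round(rng.normal(3.6, 1.0, n_respondents)), 1, 5)
    t_stat, p_value = stats.ttest_rel(a, b)  # paired t-test on the two rating columns
    print(f"n={n_respondents:4d}  mean A={a.mean():.2f}  mean B={b.mean():.2f}  p={p_value:.3f}")

compare_means(50)    # with a small sample, the same gap often fails to reach significance
compare_means(1000)  # with a large sample, the same small gap is usually 'significant'
```

The point is not that significance testing is wrong, but that the verdict depends as much on how many respondents were interviewed as on how different the attributes really are.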
However, once the number of items is larger than 7, both rating scales and rankings present problems in terms of the demands they place on respondents' cognitive processing and attention spans. With a large number of items presented together, the reliability of responses falls as respondents become more prone to marking responses unthinkingly just to finish the exercise quickly. While some of this monotony effect can be overcome by breaking a battery of rated items into separate parts with intervening questions of another type, we are still left battling low discrimination between items when using rating scales and their averages. Rankings, on the other hand, are even more problematic. Research has shown that the limit of our cognitive processing is reached when trying to process the information required to rank about 7 items (give or take 2). Therefore, rankings obtained from long lists of items are likely to be unreliable.
MaxDiff analysis allows us to solve the above problem by breaking it up into smaller cognitive tasks for respondents. Instead of a large number of items, say 20, being presented together, respondents are shown a list of 3, 4, or 5 items at a time (one task) and asked to select the Best and the Worst item from that set. A number of such tasks (typically 12-16, but possibly more depending on the total number of items and the number of items included in each task) is presented to each respondent. In order to ensure connectivity, items are repeated across the different tasks in combination with other items. Given the risk of respondent fatigue if too many tasks are presented, respondents are not shown all possible combinations of the items. This entails some loss of information, which is partially made up for by using different versions of the MaxDiff exercise across different sets of respondents. A sketch of how one such version might be assembled is given below; following that, this is what part of a typical MaxDiff question would look like for a single respondent.
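As an illustration only, the sketch below shows one simple, greedy way to assemble a single version of a MaxDiff design in Python: items that have appeared least often so far are picked first for each task, which keeps appearance counts roughly even and lets each item co-occur with many others. The item labels, number of tasks, and items per task are hypothetical; specialist survey software uses more sophisticated balanced designs than this.

```python
import random
from collections import Counter

def build_maxdiff_version(items, n_tasks=15, items_per_task=4, seed=None):
    """Greedy, frequency-balanced MaxDiff design for one questionnaire version.

    Each task shows `items_per_task` distinct items; items seen least often so
    far are chosen first, so appearance counts stay even across tasks and each
    item gets compared against many different items (connectivity).
    """
    rng = random.Random(seed)
    counts = Counter({item: 0 for item in items})
    tasks = []
    for _ in range(n_tasks):
        # Order candidates by how often they have appeared so far; random tie-break
        pool = sorted(items, key=lambda it: (counts[it], rng.random()))
        task = pool[:items_per_task]
        rng.shuffle(task)  # randomize display order within the task
        for it in task:
            counts[it] += 1
        tasks.append(task)
    return tasks

# Hypothetical 20-item attribute list shown as 15 tasks of 4 items each
attributes = [f"Attribute {i}" for i in range(1, 21)]
version = build_maxdiff_version(attributes, n_tasks=15, items_per_task=4, seed=42)
for t, task in enumerate(version, start=1):
    print(f"Task {t:2d}: {', '.join(task)}")
```

Generating several such versions with different seeds and assigning them to different sets of respondents is one simple way to recover some of the information lost by not showing every possible combination to every respondent.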