Very often in MR studies, we want to rank a set of attributes in terms of their importance to the customer in making a choice, their relevance, usefulness, or some other psychological parameter. The problem is relatively straightforward when the number of attributes is small: relative rankings can be obtained either through direct questions on response scales or by asking respondents to rank the attributes.
Responses obtained on importance, relevance, usefulness, etc. using response scales can then be compared on their averages to obtain a ranking. Directly asking respondents to rank attributes is perhaps more reliable when the number of items to be ranked is small. This is because rating-scale questions are often subject to response bias, where, for example, respondents may be prone to using one end of the scale more than the other. Response scales also lead to low differentiation between items. Although differences in average scores can be tested for statistical significance, those results are themselves sensitive to sample size: we are more likely to get a statistically significant difference between two average scores with a large sample than with a small one.
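A minimal sketch of that sample-size sensitivity, assuming SciPy is available and using simulated 5-point importance ratings (the means, spread, and sample sizes below are hypothetical, chosen only to illustrate the point):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def compare_attributes(n):
    # Hypothetical importance ratings for two attributes whose true means
    # differ by only 0.1 scale points.
    attr_a = rng.normal(loc=3.6, scale=1.0, size=n)
    attr_b = rng.normal(loc=3.5, scale=1.0, size=n)
    t, p = stats.ttest_ind(attr_a, attr_b)
    print(f"n = {n:5d}: mean diff = {attr_a.mean() - attr_b.mean():+.2f}, p = {p:.3f}")

compare_attributes(100)    # small sample: the tiny difference is usually not significant
compare_attributes(5000)   # large sample: the same tiny difference usually is
```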
However, once the number of items grows beyond about 7, both rating scales and rankings place heavy demands on respondents' cognitive processing and attention spans. With a large number of items presented together, the reliability of responses falls, as respondents become more prone to marking responses unthinkingly just to get through the exercise. Some of this monotony effect can be overcome by breaking up the battery of items into parts separated by questions of another type, but we are still left battling low discrimination between items when using rating scales and their averages. Rankings are even more problematic: research suggests that the limits of our cognitive processing are reached when trying to rank about 7 items (give or take 2), so rankings of long lists of items are likely to be unreliable.
MaxDiff analysis solves this problem by breaking it into smaller cognitive tasks for respondents. Instead of presenting a large number of items, say 20, all at once, respondents are shown a list of 3, 4, or 5 items at a time (one task) and asked to select the Best and Worst items in that set. A number of such tasks (typically 12-16, but possibly more depending on the total number of items and the number of items per task) are presented to each respondent. To ensure connectivity, items are repeated across tasks in combination with other items. Because presenting all possible combinations of items would cause respondent fatigue, respondents see only a subset of combinations. The resulting loss of information is partially made up by fielding different versions of the MaxDiff exercise across different sets of respondents. This is what part of a typical MaxDiff question would look like for a single respondent:
Section: In the following questions, you will be presented with combinations of different attributes that have been found to be important in influencing parents' choice of a first school for their wards. For each question, please select the attribute that is least important to you and the one that is most important.
Q1. Considering the following attributes only, which would be the least important to you and which the most important in selecting a first school for your child?
| Least Important | Attribute | Most Important |
There are a few other design principles that are used in selecting the items and their combinations across tasks. We would generally want:
- each item to appear roughly the same number of times across all tasks (frequency balance);
- each pair of items to appear together in a task roughly the same number of times (pairwise balance);
- each item to appear in each position within a task roughly equally often (positional balance).
Often all of these cannot be met exactly, and small deviations are permitted.
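A minimal sketch of generating one version of a MaxDiff design, handling frequency balance only: each task picks the items shown least often so far, with random tie-breaking. Commercial design tools also balance pairwise appearances and positions; the item names and parameters below are placeholders.

```python
import random
from collections import Counter

def maxdiff_design(items, n_tasks=15, items_per_task=4, seed=1):
    rng = random.Random(seed)
    shown = Counter({item: 0 for item in items})
    tasks = []
    for _ in range(n_tasks):
        # Prefer items that have appeared least often so far (frequency balance).
        candidates = sorted(items, key=lambda it: (shown[it], rng.random()))
        task = candidates[:items_per_task]
        rng.shuffle(task)            # randomize on-screen order within the task
        for it in task:
            shown[it] += 1
        tasks.append(task)
    return tasks

attributes = [f"Attribute {i + 1}" for i in range(20)]   # placeholder item names
for i, task in enumerate(maxdiff_design(attributes), start=1):
    print(f"Task {i:2d}: {task}")
```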
The data from the MaxDiff survey, stacked across tasks and respondents, sets up a multinomial logistic regression exercise, and the result is a reliable derived ranking of the attributes/items on the variable of interest (importance, value, liking, preference, etc.). The final results would look something like the chart below, where we show the relative importance of a set of 23 items (relative importance adds up to 100% across all items). Note, however, that even though some of the percentage scores are close to each other, we can have confidence in the rankings because they were derived indirectly from an exercise that was easy for respondents and not cognitively demanding. The scores could also be rescaled so that the highest-ranked item takes some arbitrary value and every other attribute is measured against it.
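Before fitting the multinomial logit model, analysts often run a quick "counting" check: for each item, score = (times chosen Best − times chosen Worst) / times shown. A minimal sketch is below; the response layout (one record per task, listing the items shown and the Best/Worst picks) and the attribute names are hypothetical, not a standard file format.

```python
from collections import defaultdict

def best_worst_scores(responses):
    best, worst, shown = defaultdict(int), defaultdict(int), defaultdict(int)
    for task in responses:
        for item in task["items_shown"]:
            shown[item] += 1
        best[task["best"]] += 1
        worst[task["worst"]] += 1
    # Net preference per item, normalized by how often it was shown.
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

responses = [
    {"items_shown": ["Fees", "Distance", "Teacher quality", "Facilities"],
     "best": "Teacher quality", "worst": "Facilities"},
    {"items_shown": ["Fees", "Curriculum", "Teacher quality", "Distance"],
     "best": "Teacher quality", "worst": "Distance"},
]
for item, score in sorted(best_worst_scores(responses).items(), key=lambda kv: -kv[1]):
    print(f"{item:16s} {score:+.2f}")
```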
In some studies, we are interested in arriving at an optimal bundle of products/services that the client wants to offer. If the number of component items is large, the procedure of choice would again be a MaxDiff analysis, followed by a TURF analysis. An estimation method called Hierarchical Bayes allows us to arrive at MaxDiff rankings for each individual respondent (it assumes the same form of choice model applies to all respondents, with individual-level preferences treated as draws from a common population distribution). Once individual-level rankings are obtained, a traditional TURF analysis can be conducted to optimize reach and design the optimal bundle of products/services that should be offered, or the optimal product/service configuration.
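A minimal sketch of the TURF step, assuming Hierarchical Bayes has already produced a ranking of items for each respondent: a respondent counts as "reached" if the bundle contains at least one of their top-k items, and we exhaustively search for the bundle of a given size with maximum reach. The rankings, bundle size, and top-k cutoff below are hypothetical.

```python
from itertools import combinations

def turf(individual_rankings, bundle_size=3, top_k=2):
    # Each respondent's "acceptable set" = their top_k items from the HB ranking.
    acceptable = [set(ranking[:top_k]) for ranking in individual_rankings]
    items = {item for ranking in individual_rankings for item in ranking}
    best_bundle, best_reach = None, -1
    for bundle in combinations(sorted(items), bundle_size):
        reach = sum(1 for acc in acceptable if acc & set(bundle))
        if reach > best_reach:
            best_bundle, best_reach = bundle, reach
    return best_bundle, best_reach / len(individual_rankings)

rankings = [
    ["A", "C", "B", "D", "E"],
    ["B", "A", "E", "C", "D"],
    ["D", "E", "A", "B", "C"],
    ["C", "D", "B", "A", "E"],
]
bundle, reach = turf(rankings, bundle_size=2, top_k=2)
print(f"Best bundle: {bundle}, reach = {reach:.0%}")
```

Exhaustive search is fine for a handful of items; with larger item sets, greedy or heuristic search is typically used instead.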