Our analysis team has experience in analyzing your data using a variety of statistical techniques to derive meaningful insights beyond what would be apparent from a review of the data tabulation output of frequencies by banner-tabs. At HBG, analytics is simply the act of deriving meaningful insights from data through a robust and validated study of relationships between variables. We never force-fit statistical models: we fit a model only when the data allow it, and we ensure that the analysis yields results that are robust and actionable. Our competencies extend from basic univariate analysis and tests of significance to more complex multivariate techniques.
Some of the techniques that we regularly use are:
Exploratory Data Analysis is the first look at our data. It includes Missing Value Analysis to see whether missing data show systematic patterns that need explanation or whether they arise from Missing At Random (MAR) or Missing Completely At Random (MCAR) processes. For example, do we have systematic bias in missing responses from one particular demographic?
This is also our first look at the variables of our study, often summarizing important information around means, variations and other distributional characteristics. It is also when we start looking at outliers in our data.
It also alerts us to potential problems in the data. For example, is the variation across scale questions too low for some respondents? Is there a potential scale-use bias among different demographics?
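As an illustration of the two checks described above, here is a minimal sketch in Python. The survey rows, the group labels and the `min_sd` cut-off are all hypothetical and chosen only to make the idea concrete; a real study would use its own data and thresholds.

```python
from statistics import pstdev

# Hypothetical survey rows: a demographic label and five 1-10 scale answers.
# None marks an unanswered item.
responses = [
    {"group": "18-34", "scores": [7, 8, None, 6, 9]},
    {"group": "18-34", "scores": [5, 5, 5, 5, 5]},
    {"group": "35-54", "scores": [None, None, 4, 6, None]},
    {"group": "35-54", "scores": [3, None, 8, 2, 7]},
]

def missing_rate_by_group(rows):
    """Share of unanswered scale items within each demographic group."""
    totals, missing = {}, {}
    for row in rows:
        g = row["group"]
        totals[g] = totals.get(g, 0) + len(row["scores"])
        missing[g] = missing.get(g, 0) + sum(s is None for s in row["scores"])
    return {g: missing[g] / totals[g] for g in totals}

def low_variation_respondents(rows, min_sd=0.5):
    """Indices of respondents whose answered items barely vary."""
    flagged = []
    for i, row in enumerate(rows):
        answered = [s for s in row["scores"] if s is not None]
        if len(answered) >= 2 and pstdev(answered) < min_sd:
            flagged.append(i)
    return flagged

rates = missing_rate_by_group(responses)
flagged = low_variation_respondents(responses)
print(rates)    # missing-item share per group
print(flagged)  # respondents who gave near-identical answers throughout
```

A large gap in missing rates between groups is a prompt to investigate, not proof of bias; the same holds for respondents flagged for low variation.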
Generally, in market research surveys, marketing managers have some hypotheses that they would like to test. In a survey for a cigarette brand, for example, one hypothesis may be: female smokers generally prefer lower nicotine concentrations in cigarettes. These hypotheses are generally around demographic differences in behavior, attitudes and lifestyle choices. Or they could be generated by folk knowledge about the market, as in the example above. They could also be differences around the strength of brands or products along different parameters.
Very often hypotheses are not framed explicitly, for example when comparing two potential advertisements on credibility post exposure. In these cases, either advertisement may fare better than the other and we may not have prior knowledge about this difference. Statistical procedures for comparison remain the same.
In all such analyses, significance testing of differences in proportions or means is used to compare groups. More than two groups may be compared over multiple variables. For testing differences in the mean effect of some treatment (say exposure to an advertisement) across different groups (say different SEC groups), a procedure called ANOVA is used, where the overall difference is first gauged before undertaking group-wise comparisons.
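The two kinds of comparison mentioned above can be sketched as follows. This is a minimal illustration with made-up counts and ratings: a pooled two-sample z-test for proportions and the overall F statistic of a one-way ANOVA. The function names are ours, and the F statistic would still need to be referred to an F distribution (not done here) to obtain a p-value.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-sample z-test for a difference in proportions (two-sided)."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

def one_way_anova_f(groups):
    """Overall F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical: 120 of 200 respondents found ad A credible, 90 of 200 ad B.
z, p_value = two_proportion_z(120, 200, 90, 200)

# Hypothetical credibility ratings after exposure, for three SEC groups.
f_stat = one_way_anova_f([[6, 7, 8], [5, 6, 7], [8, 9, 10]])
print(round(z, 2), round(p_value, 4), round(f_stat, 2))
```

Only if the overall F is significant would group-wise comparisons follow, usually with a correction for multiple testing.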
Correlation Analysis is generally used to look at associations between different variables. Usually, we look for linear associations using Pearson correlations. In some cases, we may look for correlations between the ranks of different variables, in which case the non-parametric Spearman Rank Correlation may be used. Bivariate correlation analysis is essentially exploratory in nature, though some market researchers have used it for comparison of key drivers. In a key driver analysis with multiple impact variables, bivariate correlations make for unreliable judgments on the importance of the variables, since in each bivariate correlation the effect of the other impact variables is ignored.
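The difference between the two coefficients can be seen in a small sketch. The data below are invented: `y` rises with `x` in every step but not along a straight line, so the rank correlation is perfect while the linear correlation is not.

```python
import math

def pearson(x, y):
    """Pearson correlation: strength of the linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotone in x, but far from linear
r_linear = pearson(x, y)
r_rank = spearman(x, y)
print(round(r_linear, 3), round(r_rank, 3))
```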
Regression analysis with multiple impact variables is a more reliable method of assessing and quantifying the impact of a set of independent variables on a focus dependent variable. Quantified estimates of impact in absolute or percentage terms are provided by the beta-coefficients under a ceteris-paribus assumption.
Proper understanding of MR and analytics helps us avoid either over-specification or under-specification errors in regression modeling. After all, the impact of a variable in a regression equation actually depends on the other variables present in the model. Including too many or too few variables is itself a problem, and proper variable selection must precede actual estimation.
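To make the under-specification point concrete, here is a minimal sketch with invented data in which `y` depends on two correlated drivers, `x1` and `x2`. The small `ols` helper is our own illustration (normal equations solved by Gauss-Jordan elimination), not a production routine. Leaving `x2` out of the model inflates the coefficient on `x1`, because `x1` then also picks up the effect of the omitted variable.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Build X'X and X'y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[c])]
                b[r] -= f * b[c]
    return [b[i] / a[i][i] for i in range(k)]

# Hypothetical data: y = 2*x1 + 3*x2 exactly, with x1 and x2 correlated.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

full_model = ols([x1, x2], y)   # both drivers included
short_model = ols([x1], y)      # x2 omitted (under-specified)
print([round(v, 3) for v in full_model])
print([round(v, 3) for v in short_model])
```

The full model recovers the coefficients used to build the data, while the short model reports a much larger effect for `x1`.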
Product and Marketing teams are often faced with product/service design problems where decisions need to be taken on different product attribute levels or service levels and the prices at which these are to be offered. A tour operator, for example, may need to decide on a) places and locations, b) total tour time (number of days and nights), c) level of hotel accommodation (2-star, 3-star, 5-star etc.), d) type of vehicle to be provided, e) nature and number of meals to be included on tour, etc. Customers also make trade-offs while purchasing products and services, placing more weight on more important attributes and trading off less important ones. The other important trade-off is between price and product attribute/service level. A very common such choice is between flying Economy or Executive class in an airline and also the choice of airline.
Conjoint analysis assists us in understanding the trade-offs customers make and in quantifying the importance that they place on different attributes. It also allows us to compare combinations of attribute levels and price levels in terms of customer preference share.
In choice-based conjoint surveys, respondents face different choice tasks where they select from different product/service profiles. Every respondent gets to see only a subset of all the possible product profiles. Sometimes constraints are placed to rule out technologically infeasible product profiles. Different sets of respondents may face different sets of choice tasks based on an experimental design that tries to maximize efficiency. Respondents make choices from sets of 3, 4 or more product profiles in a single choice task and evaluate multiple choice tasks. Sometimes, to reduce the size of the design, they may only be shown partial product profiles (a subset of attributes in a single task) or, in online surveys, the choice tasks they face may be adapted to earlier evaluations of the attributes and levels (adaptive choice-based conjoint).
Compilation and analysis of the choices made by all respondents gives us a quantified relative importance for each attribute and its levels, and the ability to simulate scenarios and predict preference shares for a set of product/service profiles in line with those considered for the survey (having the same attributes and levels or, for price, levels within the price range considered). The insights are important for product design.
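A scenario simulation of this kind can be sketched as below. The part-worth utilities are invented for the tour example rather than estimated from any survey, and the two rules used are assumptions on our part: preference shares follow the logit rule (exponentiated utility, normalised across profiles) and attribute importance is the range of an attribute's part-worths as a share of all ranges.

```python
import math

# Hypothetical part-worth utilities (assumed values, not survey estimates).
part_worths = {
    "hotel": {"2-star": -0.6, "3-star": 0.1, "5-star": 0.5},
    "days": {"3": -0.2, "5": 0.2},
    "price": {"low": 0.7, "mid": 0.0, "high": -0.7},
}

def utility(profile):
    """Total utility of a profile: sum of the part-worths of its levels."""
    return sum(part_worths[attr][level] for attr, level in profile.items())

def preference_shares(profiles):
    """Logit shares: exp(utility) normalised across the profiles compared."""
    exp_u = {name: math.exp(utility(p)) for name, p in profiles.items()}
    total = sum(exp_u.values())
    return {name: v / total for name, v in exp_u.items()}

def attribute_importance():
    """Range of each attribute's part-worths as a share of all ranges."""
    ranges = {a: max(l.values()) - min(l.values()) for a, l in part_worths.items()}
    total = sum(ranges.values())
    return {a: r / total for a, r in ranges.items()}

scenario = {
    "budget": {"hotel": "2-star", "days": "3", "price": "low"},
    "value": {"hotel": "3-star", "days": "5", "price": "mid"},
    "premium": {"hotel": "5-star", "days": "5", "price": "high"},
}
shares = preference_shares(scenario)
importance = attribute_importance()
print({k: round(v, 3) for k, v in shares.items()})
print({k: round(v, 3) for k, v in importance.items()})
```

As the text notes, such simulations are only meaningful for profiles built from the attributes, levels and price range covered by the survey.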
Maximum Difference Scaling is used when we need to rank a large number of items (attributes of a product, attributes of advertising messages etc.) reliably and with sufficient discrimination and confidence. While respondents may be asked to rate individual items on a typical MR scale, scale responses over a large number of items usually show the problem of low discrimination. For example, it is difficult to confidently rank two choice attributes that score averages of 7.44 and 7.52 on a 10-point scale. The other approach can be to ask respondents to directly assign ranks to the items. Research has shown that ranking more than seven items through direct assignment of ranks is a cognitively demanding task for most individuals. MaxDiff provides us with a reliable method for ranking a large number of items while avoiding both of the problems above.
In a MaxDiff survey, each respondent sees sets of different items from which they are asked to choose their maximum and minimum on the construct of focus (say value, importance, preference etc.). Each respondent sees multiple such tasks, while different sets of respondents may see different sets of tasks. The design of the experiment tries to minimize bias through considerations like a) one-way frequency balance: showing each item an equal number of times, b) two-way frequency balance: each item occurring with each other item an equal number of times, etc. An attempt is made to minimize deviations from such balance.
The data is collated and analyzed using Logit analysis (a form of regression analysis where the dependent and independent variables are all categorical in nature). The final results are a scaled ranking of the items for the aggregate of respondents. Analysis may also be performed separately for different demographic and other segments.
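Before fitting a logit model, a quick first view of MaxDiff data can be obtained from simple counts. The sketch below is that simpler counting approach, not the logit analysis itself: each item is scored as (times chosen best minus times chosen worst) divided by times shown. The tasks and item names are hypothetical.

```python
from collections import Counter

# Hypothetical tasks: the items shown and the best and worst picks.
tasks = [
    {"shown": ["price", "taste", "brand", "pack"], "best": "taste", "worst": "pack"},
    {"shown": ["price", "taste", "health", "pack"], "best": "price", "worst": "pack"},
    {"shown": ["brand", "taste", "health", "price"], "best": "taste", "worst": "brand"},
    {"shown": ["brand", "health", "pack", "price"], "best": "health", "worst": "pack"},
]

def best_worst_scores(task_list):
    """Counting score per item: (best count - worst count) / times shown."""
    shown, best, worst = Counter(), Counter(), Counter()
    for t in task_list:
        shown.update(t["shown"])
        best[t["best"]] += 1
        worst[t["worst"]] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

scores = best_worst_scores(tasks)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Scores run from -1 (always picked as worst when shown) to +1 (always picked as best), which gives a scaled ranking that can be compared with the logit results.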
Classification Trees, also known as decision trees, use classification algorithms to classify respondents based on their response to or membership in a certain categorical variable. For example, we may want to understand the demographic and other characteristics of physicians who are highly willing to prescribe a particular drug vis-à-vis those who are less willing to prescribe the same. The initial categorization may be done on the basis of a scale. Once the categories are defined, we may want to understand membership of the category using other variables like specialty, practice setting, regional or urban-rural location, number of years of practice etc.
In the most common algorithm used for classification or decision trees, a chi-square based categorization is done to find the variable whose categories are most closely associated with the predicted variable (willingness to prescribe in this case). Suppose this is specialty. We will then dig further within each specialty to find the next most associated variable; this could be practice setting for nephrologists and years of practice for neurologists.
A prior decision is taken on the depth of the tree (depending on the minimum sample required at the last node). The decision tree is developed on a training sample and then confirmed using a test sample. Performance is judged based on the predictive validity achieved in the test sample. Predictive validity must be higher than what would have been achieved without analysis through the mere assignment of all objects to the class with the highest frequency.
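The first split and the benchmark described above can be sketched as follows. This is a simplified illustration on eight invented physician records: it picks the predictor with the largest Pearson chi-square against the target, which is comparable here only because both predictors have the same number of categories. Full chi-square based tree algorithms also merge categories and compare adjusted p-values, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical physician records.
rows = [
    {"specialty": "nephrology", "setting": "hospital", "willing": "high"},
    {"specialty": "nephrology", "setting": "clinic", "willing": "high"},
    {"specialty": "nephrology", "setting": "hospital", "willing": "high"},
    {"specialty": "nephrology", "setting": "clinic", "willing": "low"},
    {"specialty": "neurology", "setting": "hospital", "willing": "low"},
    {"specialty": "neurology", "setting": "clinic", "willing": "low"},
    {"specialty": "neurology", "setting": "hospital", "willing": "low"},
    {"specialty": "neurology", "setting": "clinic", "willing": "high"},
]

def chi_square(data, predictor, target):
    """Pearson chi-square statistic for the predictor x target cross-tab."""
    n = len(data)
    p_counts = Counter(r[predictor] for r in data)
    t_counts = Counter(r[target] for r in data)
    cells = Counter((r[predictor], r[target]) for r in data)
    stat = 0.0
    for p, p_total in p_counts.items():
        for t, t_total in t_counts.items():
            expected = p_total * t_total / n
            stat += (cells[(p, t)] - expected) ** 2 / expected
    return stat

def best_split(data, predictors, target):
    """Predictor most strongly associated with the target."""
    return max(predictors, key=lambda p: chi_square(data, p, target))

def majority_baseline(data, target):
    """Accuracy from assigning every record to the most frequent class."""
    counts = Counter(r[target] for r in data)
    return counts.most_common(1)[0][1] / len(data)

first_split = best_split(rows, ["specialty", "setting"], "willing")
baseline = majority_baseline(rows, "willing")
print(first_split, baseline)
```

A tree is only worth reporting if its accuracy on the test sample beats this majority-class baseline.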
The impact of a variable and its contribution in explaining the focus variable of interest depends on the other variables included in the model. If we include variables stepwise, the contribution to explanation depends on the stage at which the variable is included in the model. Advanced Key Driver Analysis takes into account this difference in contribution based on the stage of introduction and the other variables in the model.
In advanced key driver analysis using regressions, we consider all possible permutations and combinations of an explanatory variable with the other explanatory variables - looking at models with different numbers of variables and different steps at which the explanatory variable of interest is introduced. Over all these different models, often numbering in the thousands, we average out the contribution of each explanatory variable. This provides a more robust, stable estimate of the relative importance of different variables.
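One way to read the averaging described above is as follows (an assumption on our part; the approach is often called Shapley value regression): for every order in which the drivers could enter the model, record the gain in R-squared when each driver is added, then average those gains per driver. The sketch below does this for three hypothetical drivers of satisfaction, using a small least-squares helper of our own.

```python
from itertools import permutations

def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        b[c], b[p] = b[p], b[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [ar - f * ac for ar, ac in zip(a[r], a[c])]
                b[r] -= f * b[c]
    beta = [b[i] / a[i][i] for i in range(k)]
    fitted = [sum(bi * xi for bi, xi in zip(beta, r)) for r in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def average_contributions(drivers, y):
    """Average each driver's R-squared gain over every order of entry."""
    names = list(drivers)
    totals = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        included, r2_so_far = [], 0.0
        for name in order:
            included.append(name)
            r2 = r_squared([drivers[v] for v in included], y)
            totals[name] += r2 - r2_so_far
            r2_so_far = r2
    return {name: totals[name] / len(orders) for name in names}

# Hypothetical ratings from eight respondents.
drivers = {
    "quality": [6, 7, 5, 8, 9, 4, 7, 6],
    "service": [5, 7, 6, 8, 8, 3, 6, 7],
    "price": [7, 4, 6, 5, 3, 8, 6, 5],
}
satisfaction = [6, 7, 5, 9, 9, 3, 7, 6]
contributions = average_contributions(drivers, satisfaction)
full_r2 = r_squared(list(drivers.values()), satisfaction)
print({k: round(v, 3) for k, v in contributions.items()}, round(full_r2, 3))
```

A useful property of this averaging is that the contributions add up to the R-squared of the full model, so they can be reported as shares of explained variance. The number of orderings grows factorially with the number of drivers, which is why real analyses quickly run into thousands of models.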
Cluster Analysis is the analytical equivalent of market segmentation analysis. The major part of a cluster analysis is to segment respondents based on their responses to a cluster variate (a group of variables on which the clusters need to be formed). In most cluster analyses, the variate on which clustering is performed is a set of questions designed to understand customer attitudes, opinions and lifestyles. The objective of the cluster analysis is to form groups of respondents which are homogeneous within the group with regard to the cluster variate and heterogeneous across groups.
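The idea of groups that are tight within and distinct across can be sketched with k-means, one common clustering algorithm (assumed here for illustration; the choice of algorithm is a judgment call in practice). The two-item cluster variate, the six respondents and the starting centroids are all hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, centroids, max_iterations=50):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        updated = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids

def within_cluster_ss(points, labels, centroids):
    """Total squared distance of points to their own centroid (lower = tighter)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[label]))
        for p, label in zip(points, labels)
    )

# Hypothetical 1-10 ratings on (health consciousness, trendy self-image).
respondents = [(9, 8), (8, 9), (9, 9), (2, 3), (3, 2), (2, 2)]
labels, centroids = kmeans(respondents, [respondents[0], respondents[3]])
tightness = within_cluster_ss(respondents, labels, centroids)
print(labels, [tuple(round(v, 2) for v in c) for c in centroids], round(tightness, 2))
```

In practice the solution should be re-run with different starting points and numbers of clusters to check that the segments are stable.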
The resulting segments are profiled on the individual variables to describe their characteristics in terms of attitudes, ideas, opinions, lifestyles or behavioral characteristics. They are then explored for systematic demographic differences. Demographic description is important if there is a need to reach these segments physically through marketing material, promotions etc. However, the mere extraction of segments, if stable, is of immense value in designing marketing communications messages. For example, a health food manufacturer may reach out to a health-conscious segment that also places high value on a trendy, modern self-image, using advertising messages that appeal to these values.
Market Segmentation analysis is as much an art as a science. Judgment exercised in determining the cluster variate, the number of clusters to be extracted, whether small outlying clusters are outliers rather than actually occurring small segments in the population etc. plays an important part in the final interpretation and results. Where demographic distinction is present, external population data may be brought in to estimate the overall size of a segment in the population, growth rates in segment size and buying power. The segmentation thus provides a map for future marketing strategy aimed at the target segment.
Very often, marketing managers are interested in key drivers of overall evaluatory constructs like Satisfaction, Engagement, Trust etc. These constructs are themselves broad in scope and meaning and we need to rely on multiple parameters to measure them. On the explanatory side as well, we may be interested in broad evaluations like product quality, service quality etc., which again can only be measured by multiple variables. In a Corporate Reputation model, for example, we might be interested in understanding the relative impact of Product Quality, CSR, Ethical Business Practices etc. on overall Corporate Reputation. These are all broad constructs that cannot be measured directly. We call them Latent Variables and measure each of them using a set of items actually administered in the survey. The end-objective of such models is to a) understand granular levers for impacting the final construct of interest and b) understand the relative impact of these broad dimensions, while ensuring that we have sufficiently reliable measurement of the various influences on the final construct. At HBG we use Structural Equation Modeling (SEM) primarily for key driver analysis.
SEM models provide neat, elegant background structures where survey items are grouped into different broad dimensions for studying a construct of interest. Metrics around these broad dimensions can also be developed and tracked over time.
We use Logistic Regression where the variable we are trying to explain or predict is categorical in nature while the explanatory variables are a mixture of categorical and scale variables. Logistic Regression is useful in classification problems and scoring models. One example could be classifying potential defaulters on bank loans based on credit history. It can be used to predict likely responders to direct mailers, purchasers of cars after trial, and in all such cases where we want to classify customers into distinct categories.
Using logistic/multiple logistic regression, we can build our model based on an existing dataset, test its predictive validity and then deploy it on new data to predictively classify customers.
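The build, test and deploy cycle can be sketched as below for the loan-default example. Everything here is hypothetical: a single predictor (number of missed payments), tiny hand-made training and test samples, and a plain gradient-descent fit written for illustration rather than a statistical package's estimator.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(w*x + b) by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b

def classify(w, b, x, threshold=0.5):
    """Predicted class for one customer."""
    return 1 if sigmoid(w * x + b) >= threshold else 0

def accuracy(w, b, xs, ys):
    """Share of cases classified correctly."""
    return sum(classify(w, b, x) == y for x, y in zip(xs, ys)) / len(ys)

# Hypothetical training sample: missed payments and default flag (1 = defaulted).
train_x = [0, 0, 1, 1, 2, 3, 3, 4, 4, 5]
train_y = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
# Hypothetical hold-out sample used to test predictive validity.
test_x = [0, 1, 5, 6, 1]
test_y = [0, 0, 1, 1, 0]

w, b = fit_logistic(train_x, train_y)
model_accuracy = accuracy(w, b, test_x, test_y)

# Benchmark: assign everyone to the most frequent class in the training data.
majority = max(set(train_y), key=train_y.count)
baseline_accuracy = sum(y == majority for y in test_y) / len(test_y)
print(model_accuracy, baseline_accuracy)

# Deployment: score a new customer with 4 missed payments.
print(classify(w, b, 4))
```

The model is worth deploying only if its hold-out accuracy clearly beats the majority-class benchmark.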
Perceptual Mapping techniques allow easy visualization and interpretation of differences and similarities between brands, products and segments of respondents. Using different exploratory techniques, respondent associations between different brands and a set of imagery statements can be represented in a set of one or more two-dimensional graphs. Each axis of the graph represents a dimension of brand imagery, and the plotting of brands and attributes in that two-dimensional space gives us a descriptive idea of how brands are associated with different image statements.
Perceptual Mapping techniques may either start with a full set of attributes on which brands/products etc. are to be compared, or we may only have overall evaluations of the brands/products. In the latter case, the dimensions along which the brands/products are plotted are interpreted by the researcher and the marketing team. They may correspond to objective attributes on which the products/brands differ or to more affective, perceptual dimensions.
When starting with a full set of attributes, interpretation becomes easier: attributes and brands/products are both mapped in the same perceptual space, and their proximity provides a descriptive understanding of brand/product positioning. Close competitors for a planned product may also be identified.