English Pages 296 [295] Year 2008
Data Mining and Market Intelligence for Optimal Marketing Returns
Susan Chiu Domingo Tavella
AMSTERDAM • BOSTON • HEIDELBERG • LONDON • NEW YORK • OXFORD PARIS • SAN DIEGO • SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Butterworth-Heinemann is an imprint of Elsevier
Linacre House, Jordan Hill, Oxford OX2 8DP, UK
30 Corporate Drive, Suite 400, Burlington, MA 01803, USA
First edition 2008
Copyright © 2008 Susan Chiu and Domingo Tavella. Published by Elsevier Inc. All rights reserved
The right of Susan Chiu and Domingo Tavella to be identified as the authors of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise without the prior written permission of the publisher. Permissions may be sought directly from Elsevier’s Science & Technology Rights Department in Oxford, UK: phone (+44) (0) 1865 843830; fax (+44) (0) 1865 853333; email: [email protected]. Alternatively you can submit your request online via the Elsevier homepage (http://elsevier.com), by selecting “Support & Contact” then “Copyright and Permission” and then “Obtaining Permission”.
Notice
No responsibility is assumed by the publisher for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions or ideas contained in the material herein
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress
ISBN: 978-0-7506-8234-3
ISBN: 978-0-7506-7980-0
For information on all Butterworth-Heinemann publications visit our web site at http://books.elsevier.com
Typeset by Charon Tec Ltd., A Macmillan Company. (www.macmillansolutions.com) Printed and bound in United States of America 08 09 10 11
Contents

Preface
Biographies

1 Introduction
  Strategic importance of metrics, marketing research and data mining in today’s marketing world
  The role of metrics
  The role of research
  The role of data mining
  An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution
  Step 1: identifying key stakeholders and their business objectives
  Step 2: selecting appropriate metrics to measure marketing success
  Step 3: assessing the market opportunity
  Step 4: conducting competitive analysis
  Step 5: deriving optimal marketing spending and media mix
  Step 6: leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders
  Step 7: tracking and comparison of metric goals and results
  Step 8: incorporating the learning into the next round of marketing planning
  Integration of market intelligence and databases
  Cultivating adoption of metrics, research and data mining in the corporate structure
  Identification of key required skills
  Creating an effective engagement process
  Promoting research and analytics

2 Marketing Spending Models and Optimization
  Marketing spending model
  Static models
  Dynamic models
  Marketing spending models and corporate finance
  A framework for corporate performance marketing effort integration

3 Metrics Overview
  Common metrics for measuring returns and investments
  Measuring returns with return metrics
  Measuring investment with investment metrics
  Developing a formula for return on investment
  Common ROI tracking challenges
  Process for identifying appropriate metrics
  Identification of the overall business objective
  Understanding the impact of a marketing effort on target audience migration
  Selection of appropriate marketing communication channels
  Identification of appropriate return metrics by stage in the sales cycle
  Differentiating return metrics from operational metrics

4 Multi-channel Campaign Performance Reporting and Optimization
  Multi-channel campaign performance reporting
  Multi-channel campaign performance optimization
  Uncovering revenue-driving factors

5 Understanding the Market through Marketing Research
  Market opportunities
  Market size
  Factors that impact market-opportunity dynamics
  Market growth trends
  Market share
  Basis for market segmentation
  Market segmentation by market size, market growth, and market share: case study one
  Using market research and data mining for building a marketing plan
  Marketing planning based on market segmentation and overall company goal: case study two
  Target-audience segmentation
  Target-audience attributes
  Types of target-audience segmentation
  Understanding route to market and competitive landscape by market segment
  Routes to market
  Competitive landscape
  Competitive analysis methods
  Overview of marketing research
  Syndicated research versus customized research
  Primary data versus secondary data
  Surveys
  Panel studies
  Focus groups
  Sampling methods
  Sample size
  Research report and results presentation
  Structure of a research report

6 Data and Statistics Overview
  Data types
  Overview of statistical concepts
  Population, sample, and the central limit theorem
  Random variables
  Probability, probability mass, probability density, probability distribution, and expectation
  Mean, median, mode, and range
  Variance and standard deviation
  Percentile, skewness, and kurtosis
  Probability density functions
  Independent and dependent variables
  Covariance and correlation coefficient
  Tests of significance
  Experimental design

7 Introduction to Data Mining
  Data mining overview
  An effective step by step data mining thought process
  Step one: identification of business objectives and goals
  Step two: determination of the key focus business areas and metrics
  Step three: translation of business issues into technical problems
  Step four: selection of appropriate data mining techniques and software tools
  Step five: identification of data sources
  Step six: conduction of analysis
  Step seven: translation of analytical results into actionable business recommendations
  Overview of data mining techniques
  Basic data exploration
  Linear regression analysis
  Cluster analysis
  Principal component analysis
  Factor analysis
  Discriminant analysis
  Correspondence analysis
  Analysis of variance
  Canonical correlation analysis
  Multi-dimensional scaling analysis
  Time series analysis
  Conjoint analysis
  Logistic regression
  Association analysis
  Collaborative filtering

8 Audience Segmentation
  Case study one: behavior and demographics segmentation
  Model building
  Model validation
  Case study two: value segmentation
  Model building
  Model validation
  Case study three: response behavior segmentation
  Model building
  Validation
  Case study four: customer satisfaction segmentation
  Model building
  Validation

9 Data Mining for Customer Acquisition, Retention, and Growth
  Case study one: direct mail targeting for new customer acquisition
  Purchase model on prospects having received a catalog
  Purchase model based on prospects not having received a catalog
  Prospect scoring
  Modeling financial impact
  Case study two: attrition modeling for customer retention
  Case study three: customer growth model

10 Data Mining for Cross-Selling and Bundled Marketing
  Association engine
  Case study one: e-commerce cross-sell
  Model building
  Model validation
  Case study two: online advertising promotions
  Model building
  Model validation

11 Web Analytics
  Web analytics overview
  Web analytic reporting overview
  Brand or product awareness generation
  Web site content management
  Lead generation
  E-commerce direct sales
  Customer support and service
  Web syndicated research

12 Search Marketing Analytics
  Search engine optimization overview
  Site analysis
  SEO metrics
  Search engine marketing overview
  SEM resources
  SEM metrics
  Onsite search overview
  Visitor segmentation and visit scenario analysis

Index
Preface
Over the last several decades, Marketing Research has been benefiting from the ever-increasing wave of quantitative innovation in fields that have been traditionally regarded as the purview of softer disciplines. The rising level of quantitative education in the marketing research community, the extraordinary wealth of information accessible on the Internet, along with fierce competition for customers conspire to create a growing need for sophisticated applications of data-mining, statistical, and empirical methodologies to the formulation and implementation of marketing plans. As business experience is increasingly informed by the results of rigorous analysis, it becomes ever more clear that the application of quantitative modeling techniques in marketing has a direct effect on the bottom line. In the extremely competitive environment of the global economy, the potential high price of a misdirected marketing effort is made unacceptable by the abundance of information that, if properly extracted and interpreted, can guide the effort to success. This book’s primary audience is the quantitative middle of the marketing professional spectrum. The primary objectives of the book are to distill and present a portfolio of techniques and methods of demonstrable efficacy in the design, implementation, and continued assessment of a marketing effort. The selection of techniques and the extent and depth of coverage of the quantitative background needed for their practical use have benefited from our experience in practical marketing research and quantitative modeling. The resolution of business issues and the practicality of implementation have been our most important guiding principles in covering the material. The materials we discuss are essential components in today’s sophisticated quantitative marketing professional’s toolbox. 
The mathematical and statistical issues that must be understood to ensure the correct interpretation of the various methodologies and their outputs are introduced with minimal complexity. The emphasis is on practical applications, exemplified with case studies implemented in standard computational analysis environments, such as SAS and SPSS. There are three main components in the coverage of the book. The first component refers to the importance and integration of marketing research, metrics, and data mining into the marketing investment process. The second is a detailed discussion of marketing research and data mining methods with a view to meeting the practical needs of designing and implementing a marketing effort. The third is the application of the methodology to illustrative case studies, representative of the common practical challenges marketing professionals confront.
San Francisco, September 2007
Susan Chiu
Domingo Tavella
Biographies
Susan Chiu
Susan Chiu is currently Director of Business Intelligence at Ingram Micro, Inc., where she is responsible for advanced analytics and marketing research consulting. Susan Chiu has over 15 years of quantitative marketing research experience and has held positions in analytics, data mining, and business intelligence with Cisco Systems, Wells Fargo, Providian Bancorp, and Safeway Corporation. Susan Chiu has a Master’s degree in Statistics from Stanford University.
Domingo Tavella
Domingo Tavella is Principal of Octanti Associates, a consulting firm focused on advanced quantitative modeling in finance and marketing. Dr. Tavella has over 25 years of mathematical and computational modeling experience in fields ranging from aerodynamic design, biomedical simulation, and computational finance to marketing modeling. He holds a Ph.D. in Aeronautical Engineering from Stanford University and an MBA in Finance from UC Berkeley.
CHAPTER 1
Introduction
■ Strategic importance of metrics, marketing research and data mining in today’s marketing world
Today’s marketing executives are under significant pressure to be accountable for their companies’ returns on investment both in the boardroom and in front of their shareholders. The following excerpt from Business Week by Brady, Kiley, and Bureau Reports (Farris, Bendle, Pfeifer and Reibstein 2006) vividly encapsulates this shift in what is expected of marketing executives.
‘For years, corporate marketers have walked into budget meetings like neighborhood junkies. They couldn’t always justify how well they spent past handouts or what difference it all made. They just wanted more money – for flashy TV ads, for big-ticket events, for you know, getting out the message and building up the brand. But those heady days of blind budget increases are fast being replaced with a new mantra: measurement and accountability.’
As pressure for accountability cascades through an organization, every functional group is under scrutiny, and those who cannot quantify their impact on generating satisfactory returns on investment are placed in a vulnerable position. At downsizing or budget reduction time, marketing executives are in the front line. Marketing, as it turns out, is among those corporate functions that are under the closest scrutiny. In recent years, there has been increased awareness and a stronger motivation among marketing professionals to quantify returns on investment. However, there is also a challenge in selecting the proper tools for measuring market returns from the large number of strategic and analytic tools that have emerged in the past decade.
Planning, research, execution, and optimization are the four key stages in marketing efforts. The objective of the planning stage is to define the appropriate metrics for measuring marketing returns. The number of metrics needs to be kept under control to ensure that the measuring task is achievable. In the research stage, marketing research is conducted to gain a better understanding of the overall market opportunities and the competitive landscape. In the execution stage, effective implementation is an essential requirement for the success of the marketing effort. In the optimization stage, marketing strategies and tactics are optimized and fine-tuned on an ongoing basis.
■ The role of metrics
In the previous section, we alluded to the need for defining marketing metrics at the planning stage. A metric is a variable that can be measured and used to quantify the performance of a marketing effort. Metrics fall into the following categories: return metrics, investment cost metrics, operational metrics, and business impact metrics. It is important to understand the roles that different types of metrics play. Return metrics are often referred to as key performance indicators (KPI) or success metrics. The costs of marketing programs, goods sold, and capital are investment cost metrics that must be optimally related to metrics measuring investment returns. Operational metrics influence the performance of return metrics (most of the metrics we consider fall under this category), and a thorough understanding of their impact on return metrics is essential in order to track those with the highest potential. One common mistake in marketing is to invest significant resources to track hundreds of operational metrics without precisely quantifying whether they significantly influence success. Finally, it is essential to understand how marketing investment impacts a company’s financial performance. Ideas such as cash flow analysis or economic value added (EVA) have been utilized to link marketing investment and company financial performance (Doyle 2000).
■ The role of research
In essence, marketing research consists of the discovery and analysis of information for understanding the market that a particular firm faces. The American Marketing Association (AMA) offers a comprehensive definition of marketing research (Bennett 1988).
‘Marketing research links the consumer, customer, and public to the marketer through information – information used to identify and define marketing opportunities and problems; generate, refine, and evaluate marketing actions; monitor marketing performance; and improve understanding of marketing as a process. Marketing research specifies the information required to address these issues; designs the method for collecting information; manages and implements the data collection process; analyzes the results; and communicates the findings and their implications.’
Since customers are key components of a market, customer research should also be considered as part of marketing research. Marketing research has been present in the corporate world for decades. Its applications mainly focus on market sizing, market share analysis, product concept testing, pricing strategies, focus groups, brand perception studies, and customer attitude or perception research. The following examples are typical applications of marketing research to address business problems. Although these examples remain fairly common marketing research applications, they are somewhat limited in the whole scheme of marketing investment.
● Running a focus group to evaluate customer experience in certain retail bank branches
● Determining the feasibility of a full product rollout by first conducting a test in a small and easy-to-control market
● Conducting a recall test to determine a TV advertisement’s impact on product awareness
● Compiling market share information for a briefing to a group of industry analysts
● Conducting a focus group to evaluate new product features.
Marketing research groups are often spread across various corporate functions such as corporate communications, public relations, corporate marketing, segment marketing, vertical marketing, business units, and sales. Under such an organizational setup, the various marketing research efforts in a particular firm serve specific purposes and are sometimes disconnected from each other. In recent years, there has been recognition that optimal synergy among research teams requires centralization of the marketing research teams. The recent economic climate has fostered a broader application of research to marketing investment. To secure resources and funding, marketing investment plans need to be justified by a reasonable level of returns, and this justification needs to be backed up by facts, forecasts, data, and analysis of opportunities. Marketing research generates market opportunity information ideal for supporting such marketing investment plans. For instance, one important question to address is the geographical allocation of marketing investment. Marketing research can be used to determine market opportunities by geography and to drive optimal investment decisions. With increasing frequency, marketing executives at major corporations are asked to submit their annual budget plans with forecasts of the corresponding returns on investment. The best practices report Maximizing Marketing ROI by the American Productivity and Quality Center (APQC) in conjunction with the Advertising Research Foundation (ARF) reported the following findings (Lenskold 2003).
● The pressure is on marketing to demonstrate a quantifiable return and on CEOs to deliver value to their stockholders and business alliance partners
● ROI-based marketing is sought by more marketers
● ROI-based models encourage decision makers to challenge and revise the budgeting process.
■ The role of data mining
Berry and Linoff give the following definition of data mining (Berry and Linoff 1997).
‘Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.’
In essence, data mining is the application of statistical methodologies to analyzing data for addressing business problems. While marketing research allows for opportunities to be identified at a macro level, data mining enables us to discover granular opportunities that are not immediately obvious and can only be detected through statistical techniques and advanced analytics. High-level insights provide directional guidance while granular detail facilitates optimization, execution, and tactics. Insight garnered through marketing research can help drive data mining analysis by providing the initial direction. Conversely, results from data mining analysis can be used to refine high-level strategies. Marketing research and data mining are complementary disciplines, and there is growing awareness of the added value that the two disciplines combined can provide.
■ An effective eight-step process for incorporating metrics, research and data mining into marketing planning and execution
The following flowchart (Figure 1-1) summarizes a step-by-step approach for incorporating metrics, marketing research, and data mining into marketing planning and execution.
Step 1 Identifying key stakeholders and their business objectives
Step 2 Selecting appropriate metrics to measure marketing success
Step 3 Assessing the market opportunity
Step 4 Conducting competitive analysis
Step 5 Determining optimal marketing spending and media mix
Step 6 Leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders
Step 7 Tracking and comparing of metric goals and results
Step 8 Incorporating learning into next round of marketing planning
Figure 1-1 An effective eight-step process for incorporating metrics, marketing research, and data mining into marketing planning and execution.
Step 1: identifying key stakeholders and their business objectives
It is crucial to identify the key stakeholders that will support a marketing effort and those that will implement the recommendations from research and analysis. Buy-in from key stakeholders throughout the process is essential for getting analytic results accepted and implemented. Key stakeholders need to quantify their business objectives and define such objectives as goals to be achieved. An objective might be to increase sales revenue. An increase in revenue by a specific percentage over the course of a year, therefore, is a quantified objective to be reached. An objective that is not quantified is hard to measure against and should not be used to derive investment strategy.
Step 2: selecting appropriate metrics to measure marketing success
A marketing plan requires that metrics be clearly defined from the outset, since the selection of appropriate metrics can direct resources to optimal use. Multiple metrics may need to be examined simultaneously to glean the insights we seek. Multiple metrics can be used to validate one another and maximize the accuracy of the information gathered. A single metric such as revenue growth alone might not shed as much light on the true opportunity as revenue growth and market share information combined. If the business objective is to increase sales revenue by a given amount, then naturally sales revenue is the appropriate return metric to track. In the case of an advertising program, brand equity (the monetary worth of a brand) and brand awareness are the appropriate return metrics to track. Brand equity is defined as the net present value of the future cash flow attributable to the brand name (Doyle 2000) while brand awareness is the level of exposure and perception that customers have about a particular brand.
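The net-present-value definition of brand equity can be illustrated with a short calculation. The sketch below is only an illustration: the function name, the three hypothetical yearly cash flows, the 10% discount rate, and the convention that the first cash flow arrives one year from now are all assumptions, not figures or methods taken from Doyle (2000).

```python
def brand_equity_npv(cash_flows, discount_rate):
    """Net present value of future cash flows attributed to a brand.

    Assumed convention: cash_flows[0] arrives one year from now,
    cash_flows[1] two years from now, and so on.
    """
    return sum(
        cash_flow / (1.0 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


# Hypothetical brand-attributable cash flows for the next three years.
example_cash_flows = [100.0, 110.0, 121.0]
example_equity = brand_equity_npv(example_cash_flows, discount_rate=0.10)
print(round(example_equity, 2))
```

In practice the difficult part is estimating which share of cash flow is attributable to the brand name; the discounting step shown here is the straightforward part.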
Step 3: assessing the market opportunity
Market opportunity assessment consists of addressing four fundamental questions.
1. Where are the market opportunities?
2. What are the market segments?
3. What is the size of each segment?
4. How fast does each segment grow?
Market opportunity information can be acquired through multiple approaches. One approach is through exploration of publicly available news and existing company internal data. Another approach is leveraging third party marketing research sources, which offer a wide range of forecasts on market opportunities by segment. These forecasts, which consist of both opportunity size and growth information, tend to be driven by different assumptions. In situations where market opportunity information is not readily available, customized research is required to gather the information.
Step 4: conducting competitive analysis
In the absence of competition, a company can take full advantage of market opportunities. With competition, however, companies can only realize market opportunities by understanding and outperforming their competitors. As Aaker points out, one important reason why the Japanese automobile firms were able to penetrate the US market successfully, especially during the 1970s, is that they were much better than US firms at doing competitive analysis. David Halberstam described the Japanese effort at competitor analysis in the 1960s: ‘They came in groups… They measured, they photographed, they sketched, and they tape-recorded everything they could. Their questions were precise. They were surprised how open the Americans were’ (Aaker 2005). Competitive intelligence is an extremely important discipline in the world of marketing research and data mining. A combination of survey data and real life transaction data can be used to analyze and track competitive information. Part of a competitive intelligence analysis is to objectively assess product features, pricing, and brand value of the key players in a market. Product features that meet customer needs represent competitive advantages, and pricing is often used as a tool for gaining market share at the expense of profitability. Since brand perception often affects purchasing decisions, it is important to incorporate brand strength and weakness analysis into competitive intelligence.
Step 5: deriving optimal marketing spending and media mix
After the fundamental information on market opportunities and competitor landscape has been collected, we proceed to determine the optimal marketing spending given a business objective. As we will elaborate in Chapter 2, there are numerous analytical approaches for modeling optimal marketing spending. Optimization involves maximization or minimization of a particular metric, such as maximization of profit or minimization of risk. Maximizing profit is the most common objective in optimization of marketing spending. Some companies may choose to maximize revenue regardless of profitability, but doing so imperils the firm’s long-term value.
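The difference between maximizing profit and maximizing revenue can be made concrete with a toy calculation. The following is a minimal sketch under strong assumptions: revenue is taken to follow a hypothetical square-root response to spending (diminishing returns), and the scale, margin, and search grid are invented for illustration. It is not one of the spending models developed in Chapter 2.

```python
import math


def revenue(spend, scale=2000.0):
    """Assumed revenue response: diminishing returns to marketing spend."""
    return scale * math.sqrt(spend)


def profit(spend, margin=0.5):
    """Gross margin earned on revenue, minus the marketing spend itself."""
    return margin * revenue(spend) - spend


# Candidate spending levels from 0 to 1,000,000 in steps of 1,000.
candidate_spends = range(0, 1_000_001, 1_000)

profit_max_spend = max(candidate_spends, key=profit)
revenue_max_spend = max(candidate_spends, key=revenue)

print(profit_max_spend, profit(profit_max_spend))
print(revenue_max_spend, profit(revenue_max_spend))
```

Under these assumed numbers, the profit-maximizing spend sits well below the largest budget on the grid, whereas the revenue-maximizing choice simply exhausts the budget and, in this example, leaves no profit at all.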
Step 6: leveraging data mining for optimization and getting early buy-in and feedback from key stakeholders
The high-level and directional insights into market opportunities provided by marketing research serve as the foundation for building a high-level marketing strategy. However, implementation of a high-level strategy through tactics requires significant analytical work. This is where data mining adds value by delineating a ‘how to’ road map to realize the opportunities uncovered by research. For example, marketing research could identify a geographic area as the best opportunity. Since it is very costly to target every prospect in this geography, it is necessary to select a target list for a marketing campaign, which requires building a response model to predict the likelihood of a prospect’s response. Response modeling requires statistical data mining techniques such as decision trees and logistic regression. Soliciting key stakeholders’ feedback and input in the data collection, research and data mining processes can help fine-tune the accuracy and objectivity of the data mining effort by removing potential roadblocks and barriers in the processes.
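To give a feel for what a response model does, the sketch below fits a one-variable logistic regression by plain gradient descent and uses it to pick a target list. Everything in it is an assumption made for illustration: the single predictor (number of prior purchases), the tiny invented data set, the learning rate, and the prospect names. A real project would use many predictors and a statistical package rather than this hand-written fit.

```python
import math


def sigmoid(z):
    """Logistic function mapping a score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit P(response) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical campaign history: prior purchases and whether each customer responded.
past_purchases = [0, 1, 1, 2, 3, 4, 5, 6]
responded = [0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic(past_purchases, responded)

# Score new prospects and keep the two with the highest predicted response.
prospects = {"A": 0, "B": 2, "C": 5, "D": 3}
scores = {name: sigmoid(w * x + b) for name, x in prospects.items()}
target_list = sorted(scores, key=scores.get, reverse=True)[:2]
print(target_list)
```

The cut-off of two prospects is arbitrary here; in practice the size of the target list would follow from the campaign budget and the expected return per contact.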
Step 7: tracking and comparison of metric goals and results
In the final presentation on the performance of a marketing campaign, it is essential to compare results derived from the application of the selected key metrics against the initial business goal. In a successful marketing campaign where goals are achieved, effective strategy and tactics can be applied to future campaigns. In a failed marketing campaign where the result trails the goal, areas of improvement for strategy and tactics can be identified for improving the performance of future campaigns. The final presentation of any research or data mining project is a decisive factor for the success of the project. Good research or data mining work poorly presented will fail to gain adoption and implementation. We have been the victims of speakers who did not know how to ‘work an audience’, to bring them to the point where they are quite ready to accept what is being recommended (Blankenship and Breen 1995).
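The goal-versus-result comparison described in this step can be kept in a very simple tabular form. The sketch below is one possible layout, not a prescribed report format; the metric names and figures are hypothetical, and it assumes that for every metric a higher value is better.

```python
def compare_to_goals(goals, results):
    """For each metric, report the goal, the result, and goal attainment."""
    report = {}
    for metric, goal in goals.items():
        actual = results[metric]
        report[metric] = {
            "goal": goal,
            "result": actual,
            "attainment_pct": 100.0 * actual / goal,
            "goal_met": actual >= goal,  # assumes higher is better
        }
    return report


# Hypothetical campaign goals set in Step 1 and the results observed afterwards.
campaign_goals = {"sales_revenue": 500_000.0, "leads": 1_200}
campaign_results = {"sales_revenue": 540_000.0, "leads": 900}

campaign_report = compare_to_goals(campaign_goals, campaign_results)
for metric, row in campaign_report.items():
    print(metric, row)
```

Metrics that miss their goal (the lead count in this invented example) point to the areas of strategy and tactics to revisit in the next campaign.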
Step 8: incorporating the learning into the next round of marketing planning
Learning from past programs needs to be incorporated into the next round of marketing planning as an ongoing optimization process, a practice that ultimately leads to a competitive advantage. Learning over time translates into internal and proprietary market and customer intelligence to which competitors have no access.
■ Integration of market intelligence and databases
Market intelligence refers to insights generated from marketing research or data mining. Market intelligence provides the maximum value and insight when its components are woven together to depict an overall picture of the market opportunities and challenges. Information on revenue growth, competitors, or market share in isolation does not provide significant value, since a company may be growing its revenue but at the same time losing market share if its competitors grow faster. Information on past customer purchase data can often be misleading if the future needs of the same customers differ drastically from their past needs. To facilitate building market and customer intelligence, it is necessary to have integrated database systems that link together data from sales, marketing, customer, research, operations, and finance. Although not a requirement, ideally all the data would be maintained on the same hardware system. If there is more than one database, then marketing, sales, customer, research, and finance databases need to be related through a common identifier such as customer ID, campaign or program code, date of purchase, or transaction ID.
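Linking separate databases through a shared identifier can be sketched with a few in-memory records. The example below is illustrative only: the field names (`customer_id`, `campaign_code`, `transaction_id`), the sample rows, and the simple rule of crediting all of a customer's purchases to every campaign that touched the customer are assumptions, not a description of any particular system.

```python
# Hypothetical extracts from a campaign system and a sales system,
# related only through the shared customer_id key.
campaign_touches = [
    {"customer_id": 101, "campaign_code": "FALL-DM"},
    {"customer_id": 102, "campaign_code": "FALL-DM"},
]
transactions = [
    {"transaction_id": 1, "customer_id": 101, "amount": 250.0},
    {"transaction_id": 2, "customer_id": 101, "amount": 100.0},
]


def revenue_by_campaign(touches, sales):
    """Join campaign touches to sales on customer_id and total the revenue."""
    spend_by_customer = {}
    for sale in sales:
        cid = sale["customer_id"]
        spend_by_customer[cid] = spend_by_customer.get(cid, 0.0) + sale["amount"]

    totals = {}
    for touch in touches:
        code = touch["campaign_code"]
        customer_revenue = spend_by_customer.get(touch["customer_id"], 0.0)
        totals[code] = totals.get(code, 0.0) + customer_revenue
    return totals


campaign_revenue = revenue_by_campaign(campaign_touches, transactions)
print(campaign_revenue)
```

Without the common key, the two extracts could not be related at all, which is the practical reason for insisting on shared identifiers across systems.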
Marketers often encounter data quality challenges. The following is a list of common data quality issues (Groth 2000).
● Redundant data
● Incorrect or inconsistent data
● Typos
● Stale data
● Variance in defining terms
● Missing data.
The best strategy to deal with data quality is to make sure that key stakeholders are fully aware of any imperfections in the data. Very often these same key stakeholders can help drive efforts for cleaning and standardizing the data. Poor data quality arises from many factors, not the least of which is erroneous data from original data source systems. These source systems may include systems of Enterprise Resource Planning (ERP), Point of Sale (POS), Financial, Customer Relationship Management (CRM), Supply Chain Management (SCM), marketing research, campaign management, advertising servers, e-mail delivery, web analytic tools, and call centers. Firms should consider establishing an automatic process that checks and corrects data input into the source systems. Market opportunity forecasts created by internal departments may vary from those provided by external research firms. The former are often used for setting sales goals and as a result tend to be more conservative while the latter tend to be more aggressive to accommodate a broad set of objectives and assumptions of research subscribers. This difference may lead to inconsistencies that make it difficult to assess the accuracy of the data.
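An automatic check of incoming records might look something like the sketch below. It only flags three of the issues listed above (redundant, missing, and stale data) and does not attempt to correct anything. The record layout, the one-year staleness threshold, and the fixed reference date are assumptions chosen for illustration.

```python
from datetime import date


def audit_records(records, required_fields, today, max_age_days=365):
    """Flag redundant, incomplete, and stale records (illustrative rules only)."""
    seen_ids = set()
    issues = {"redundant": [], "missing": [], "stale": []}
    for row in records:
        record_id = row["record_id"]
        if record_id in seen_ids:
            issues["redundant"].append(record_id)  # same ID seen earlier
        seen_ids.add(record_id)
        if any(row.get(field) in (None, "") for field in required_fields):
            issues["missing"].append(record_id)  # a required field is empty
        if (today - row["last_updated"]).days > max_age_days:
            issues["stale"].append(record_id)  # not updated within the threshold
    return issues


# Hypothetical customer records arriving from a source system.
sample_records = [
    {"record_id": 1, "email": "a@example.com", "last_updated": date(2007, 6, 1)},
    {"record_id": 1, "email": "a@example.com", "last_updated": date(2007, 6, 1)},
    {"record_id": 2, "email": "", "last_updated": date(2007, 5, 1)},
    {"record_id": 3, "email": "c@example.com", "last_updated": date(2005, 1, 1)},
]
audit = audit_records(sample_records, ["email"], today=date(2007, 9, 1))
print(audit)
```

Issues such as typos or inconsistent term definitions are harder to detect mechanically and usually need input from the stakeholders who own the data.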
■ Cultivating adoption of metrics, research and data mining in the corporate structure Given the importance of metrics, research, and data mining, having a team specialized in these areas working closely with all key business functional groups can be a competitive advantage. In high-tech industries where sales and marketing groups are often run as separate groups, it is imperative that a dedicated analytic team interface with both marketing and sales groups to ensure proper planning and execution. When sales and marketing agree upon common metrics for setting their benchmarks, the two groups can work effectively together. If sales and marketing have different assessments of market potential, the two groups will likely create unsynchronized or
Data Mining and Market Intelligence
even conflicting goals in their marketing and sales programs, which may result in suboptimal execution of the overall marketing effort. The following are additional tips for successfully incorporating research and analytics into the corporate structure.
Identification of key required skills Skills in three key disciplines, metrics measurement, marketing research, and data mining, are required for assembling a successful research and analytic team effort. Besides discipline-specific capabilities, there are additional skills that are common requirements across the three disciplines.
Common required skills Clear communication enables a research and analytic team to effectively acquire feedback and articulate findings, thereby facilitating buy-in from key stakeholders. Many analytic professionals are used to communicating in technical terms and have difficulty translating technical terms into plain everyday language. This imposes an extra burden on analytic professionals when explaining analytic concepts to their nonanalytic peers. Two of the most common communication issues are a lack of a clear understanding of the questions asked, and the tendency to give unnecessary information when delivering an answer. An executive who asks a question ‘What is the expected return of the program?’ expects to get a response clearly stating the expected return. Rather than giving a direct answer, many analytical professionals tend to give a vague response and then quickly go on and elaborate on the data mining techniques applied even when the executive does not specifically ask about the data mining techniques being used. The first step toward resolving this communication issue between analytic and nonanalytic professionals is to cultivate ‘active listening’ skills. Active listening requires understanding of what others ask before giving replies. Another required common skill is the ability to focus on truly important tasks and to be able to prioritize tasks based on predetermined criteria, a significant challenge when confronting multiple projects. One way to facilitate focus and prioritization is to establish and formalize a standard engagement process where given criteria are used to determine the priority level of a project. Such criteria may include expected return on investment, turnaround time, resource requirement, revenue potential, and risk level. Another required skill across metrics, research, and data mining is experience and training in marketing and knowledge of the company line of business. 
The type of marketing experience and training required depends on the overall company marketing culture and use of communication media. Some corporations rely on traditional marketing communication
channels such as print and catalog while others focus on new media such as e-mail, search, blog, iPod, and web marketing. Familiarity with the specific types of marketing communication channels that a firm uses allows for derivation of deeper insights from analysis and more substantial business recommendations.
Metrics-specific required skills
Metrics-specific skills are also called measurement skills, which in the marketing consulting world refer to the identification and tracking of marketing campaign performance. Metrics skills include hands-on experience in tracking and measuring the performance of a wide array of marketing communication channels. These communication channels include, but are not limited to, TV, radio, direct mail, e-mail, telemarketing, web marketing, online or print advertising, search marketing, social marketing (blog, community marketing), and podcast. Usage of metrics does not require advanced data mining or statistical skills; rather it requires hands-on experience in marketing campaign planning, management, execution, and performance tracking and analysis. Metrics experts are expected not only to have an extensive understanding of marketing channels and programs, but also to have clear insights into what is important to the overall marketing business. Before selecting any metrics, metrics experts conduct discovery meetings with the key stakeholders to fully understand their goals and propose metrics that are aligned with these goals. Metrics identification and measurement benefit greatly from strong reporting skills, such as the ability to create reports using standard tools such as Excel and Access, and OLAP tools such as Business Objects, Brio, Crystal Reports, and Cognos. Finally, metrics expertise also includes an understanding of both the potential and the limitations of data for constructing or deriving metrics. Practitioners should seek alternative data sources or metrics if the existing sources have poor data quality.
Research-specific required skills There are two basic types of marketing research: syndicated and customized research. Marketing research skills are often acquired through training in the social science disciplines. Syndicated research expertise includes experience in effective acquisition of data from syndicated research vendors, and management of vendor relationships and research subscriptions. Customized research expertise consists of skills for designing and managing projects, survey research, focus groups, vendor selection, requests for proposal (RFP) process, and presentations of findings and results. Industry and product knowledge is also an important
required attribute for customized research in that it allows for better decisions about vendor selection and better extraction of insight from studies. Knowledge and experience in economics, which entail skills in collecting, analyzing, and interpreting market trends, economic climate data, and economic impact on market opportunities, are valuable attributes for both syndicated research and customized research.
Data mining-specific required skills
Practice of the data mining discipline is driven by two main skill sets: statistics and information technology. The required statistical skills include the abilities to conduct exploratory analysis and to apply a broad range of data mining techniques to solving marketing problems. The information technology skills include expertise in database structure, data processing and quality control, and data extraction skills such as Structured Query Language (SQL).
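As a rough illustration of the data extraction skill mentioned above, the following sketch uses Python's built-in sqlite3 module to join two small in-memory tables through a shared customer ID and total revenue by segment. The table names, column names, and rows are hypothetical and chosen only for illustration; they are not taken from any system described in this book.

```python
import sqlite3

# Hypothetical in-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    CREATE TABLE transactions (
        transaction_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        campaign_code TEXT,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'enterprise'), (2, 'small business');
    INSERT INTO transactions VALUES
        (10, 1, 'SPRING', 500.0),
        (11, 1, 'SPRING', 250.0),
        (12, 2, 'FALL', 100.0);
""")

# Link the two tables through the shared customer ID and sum revenue by segment.
rows = conn.execute("""
    SELECT c.segment, SUM(t.amount) AS revenue
    FROM customers AS c
    JOIN transactions AS t ON t.customer_id = c.customer_id
    GROUP BY c.segment
    ORDER BY c.segment
""").fetchall()
conn.close()

for segment, revenue in rows:
    print(segment, revenue)
```

The same join pattern applies when the linking key is a campaign code or a transaction ID rather than a customer ID.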
Creating an effective engagement process
An engagement process needs to be in place to effectively manage and enhance the research and analytic efforts. Without an engagement process, a research and analytic team passively takes on ad-hoc requests where prioritization may be based solely on the particulars of the corporate pecking order without regard for the most relevant business objectives. In such situations efforts may be driven and prioritized by rationales other than project returns on investment. The following list describes a step-by-step engagement process.
1. Choosing a point of contact person within a research and analytic team: This individual should have an overall understanding of the capabilities of the team. Ideally, the point person should be a recognized project manager who can effectively manage timelines, collect requirements, transmit the requirements back to the team, and build relationships.
2. Determining the communication channels through which a research and analytic team can be engaged: There are numerous ways in which an engagement can take place, such as phone, e-mail, and the web. Online request forms can be used to gather business requirements to be followed up with face-to-face needs assessment meetings if needed. The research and analytic management team can review incoming requests on a regular basis. Written requests allow for systematic documentation of requirements and customer needs.
3. Selecting the criteria, process, and frequency for project prioritization: Project prioritization involves ranking projects on the basis of predetermined criteria and using the ranking to determine the order for executing the projects. These criteria include project return on
investment, incremental revenue, and incremental number of leads generated. Key team members and stakeholders should be involved in the project prioritization process by holding periodic discussion and prioritization meetings. The prioritization frequency refers to the frequency of holding discussion and prioritization meetings and depends on the anticipated duration of projects. It is common to adopt a monthly frequency of prioritization since overly frequent prioritization is not necessary and can disrupt work schedules.
4. Clearly communicating project delivery timelines and deliverables: After the priority of a project is determined, the group point person needs to communicate the project timeline and deliverables to those who request the group’s service, and effectively manage their expectations throughout the project duration.
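The prioritization step in the list above can be sketched in a few lines of Python. This is only a minimal illustration that ranks requests by a single criterion, return on investment; the class name, field names, and figures are hypothetical, and a real process would likely weigh several of the criteria listed above.

```python
from dataclasses import dataclass


@dataclass
class ProjectRequest:
    """A hypothetical analytics request with an estimated payoff and cost."""
    name: str
    expected_return: float  # estimated incremental return of the project
    cost: float             # estimated resources required

    @property
    def roi(self) -> float:
        # Return on investment: net return per dollar spent.
        return (self.expected_return - self.cost) / self.cost


def prioritize(requests):
    """Rank requests from highest to lowest ROI."""
    return sorted(requests, key=lambda r: r.roi, reverse=True)


# Illustrative request queue (all figures are made up).
queue = [
    ProjectRequest("Churn model", expected_return=120_000, cost=40_000),
    ProjectRequest("Campaign dashboard", expected_return=30_000, cost=20_000),
    ProjectRequest("Segmentation study", expected_return=90_000, cost=20_000),
]
ranked = [r.name for r in prioritize(queue)]
print(ranked)
```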
Promoting research and analytics
A research and analytical service can be promoted to potential internal and external customers in a number of ways. One approach is the distribution of a periodical e-newsletter to communicate the offerings, the accomplishments, and the future project pipeline of such a service to key stakeholders. Another approach is the creation of a service web site with the following key sections.
● Home page
● Who we are
● Engagement process
● Services
● Case studies and success stories
● Events
● Contact us.
■ Book outline The remaining chapters of the book are organized as follows.
Chapter 2: marketing spending models and optimization Chapter 2 introduces the concept of marketing spending modeling for deriving an optimal overall marketing spending budget and effectively allocating this budget across different product categories or marketing
communication channels. The chapter then gives a conceptual overview on how to associate marketing returns with the financial performance of a firm based on the modern portfolio theory.
Chapter 3: metrics overview This chapter proposes a step-by-step procedure to guide metric selection by first introducing the concept of a sales funnel and its five stages. It then discusses the various types of metrics commonly used in marketing such as return metrics, investment metrics, and operational metrics in the context of a sales funnel. This chapter also gives an overview on the various marketing communication channels and how they are usually used across the five key stages in a sales funnel.
Chapter 4: multi-channel campaign performance reporting and optimization This chapter discusses how to report and optimize the overall performance of marketing campaigns that utilize multiple communication channels. The performance reporting section examines the identification and aggregation of common return metrics across multiple communication channels. The performance optimization section of the chapter discusses data mining on operational metrics to uncover the operational metrics with the highest influence on return metrics.
Chapter 5: understanding the market through marketing research This chapter discusses creating a deeper understanding of the market through marketing research. Understanding of the market includes knowledge and insights on the market opportunity and segmentation, routes to market, and competitive landscape. This chapter also reviews marketing research fundamental topics such as syndicated research, customized research, primary data, secondary data, survey design and sampling, focus group, and panel group.
Chapter 6: data and statistics overview This chapter discusses data and statistical concepts that drive selection of data mining techniques for solving marketing problems. Topics such as data types, data distributions, and sampling methodologies are reviewed in detail.
Chapter 7: introduction to data mining This chapter examines an array of widely utilized data mining techniques applied to marketing by providing a theoretical overview of each technique and discussing specific examples for some of the techniques. Standard data mining procedures such as data exploration, modeling, validation, and testing are introduced. The following data mining techniques are covered.
● Association analysis
● Analysis of variance (ANOVA) and multivariate analysis of variance (MANOVA)
● Canonical correlation analysis
● Cluster analysis
● Collaborative filtering
● Conjoint analysis
● Correspondence analysis
● Decision tree
● Discriminant analysis
● Factor analysis
● Logistic regression
● Multi-dimensional scaling (MDS)
● Principal component analysis (PCA)
● Time series.
Chapter 8: audience segmentation Chapter 8 presents four case studies on audience segmentation to illustrate the application of four data mining techniques: cluster analysis, CART, CHAID, and discriminant analysis. The four case studies are on behavior and demographics segmentation, value segmentation, response behavior segmentation, and customer satisfaction segmentation. Each case study gives the background of a business problem, the data mining technique applied to address the problem, the data mining model building and validation processes, and the marketing recommendations resulting from the data mining analysis.
Chapter 9: data mining for customer acquisition, retention, and growth This chapter discusses three case studies on targeting, growth and retention models to demonstrate the application of the logistic regression technique. Each case study examines the background of a business problem, the data mining technique used to solve the problem, the
data mining model building and model validation processes, and the recommendations.
Chapter 10: data mining for cross-selling and bundled marketing This chapter discusses two case studies on e-commerce and targeted online advertising promotions. In both case studies, the fundamental data mining techniques for cross-sell and up-sell are applied to real marketing scenarios.
Chapter 11: web analytics The chapter introduces the fundamentals of web analytics and its key metrics by business objectives such as lead generation and online e-commerce. It also introduces syndicated research tailored for understanding web marketing trends and online customer behavior.
Chapter 12: search marketing analytics This chapter discusses the principles of three search marketing disciplines: search engine optimization (SEO), search engine marketing (SEM), and onsite search. The chapter also provides links to web resources on subjects such as key words, domain, meta tags, and pay per click.
■ References
Aaker, D. Strategic Market Management, 7th ed. John Wiley & Sons, New York, 2005.
Bennett, P.D. Dictionary of Marketing Terms. American Marketing Association, Chicago, Illinois, 1988.
Berry, M.J.A., and G.S. Linoff. Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, New York, 1997.
Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. Chapter 12, page 277. NTC Business Books, Lincolnwood, Illinois, 1995.
Doyle, P. Value-Based Marketing – Marketing Strategies for Corporate Growth and Shareholder Value. John Wiley & Sons, New York, 2000.
Farris, P.T., N.T. Bendle, P.E. Pfeifer and D.J. Reibstein. Marketing Metrics: 50+ Metrics Every Executive Should Master. Wharton School Publishing, Upper Saddle River, New Jersey, 2006.
Groth, R. Data Mining – Building Competitive Advantage. Prentice Hall PTR, Upper Saddle River, New Jersey, 2000.
Lenskold, J.D. Marketing ROI – The Path to Campaign, Customer, and Corporate Profitability. McGraw Hill, New York, 2003.
CHAPTER 2
Marketing Spending Models and Optimization
Two of the most important questions in marketing are how much should be spent and how the budget should be allocated. These questions can be answered in more than one way, depending on the particulars of the firm’s circumstances and the availability of data. In this chapter we address these important issues in an econometric framework.
■ Marketing spending model
The primary objective of a marketing spending model is to establish a relationship between marketing investments and marketing returns. Marketing returns are the benefits that a firm receives when it invests in marketing, such as sales value or number of product units sold. A properly devised marketing spending model also helps us understand the way these variables interact, allowing us to gain deeper insights into what is most effective at influencing marketing return. Most of us are aware of the potential diminishing returns of marketing spending. That is to say, as marketing spending increases, its incremental or marginal impact eventually starts to decrease. This is just one of the many features of the complex relationship between marketing spending and revenue. The exact relationship between marketing return and marketing spending can take on many different mathematical forms depending on factors such as the type and frequency of data and the industry segment where the model is applied. Typically, various functions need to be explored to derive the best model and the best corresponding functional form. In this book, we will use the term ‘marketing spending model’ to name models that describe the relationship between marketing spending and marketing return. In addition to marketing spending as an independent variable impacting marketing returns, there are other potential independent variables such as seasonality, product price, and the state of the economy. The functional form that describes the relationship between marketing spending and return, which may be quite general and complex in nature, must above all lend itself to calibration in a stable and predictable way. A comprehensive and detailed description of marketing spending model mathematics is beyond the scope of this book.
Here we give a brief summary of the issues involved to guide the reader toward a more extensive treatment of the subject, such as the excellent book by Hanssens, Parsons, and Schultz, where an extensive list of valuable references can be found. In some sources, the term ‘market response model’ is used in place of the term ‘marketing spending model’ used in this book. There are two reasons for settling on the term ‘marketing spending model’. One reason is that it clearly conveys the message that such a model is related to marketing spending. Another reason is that it avoids confusion with the terminology
used in targeting analysis, where ‘market response model’ is often used to refer to ‘response targeting models’. For any marketing spending model to be effective, it must be data based. This reliance on data is what characterizes an empirical marketing spending model (Hanssens, Parsons and Schultz 2001). Data enters the formulation and calibration of a marketing spending model in two primary forms: sequential and cross-sectional. Sequential data comes in the form of time series information, consisting of values at discrete points in time. Cross-sectional data describes values that occur at the same point in time, where these values can belong to time series. In general, when generating cross-sectional data, we deal with multidimensional time series – that is, discrete and simultaneous information on several variables. Although the primary framework in setting up and calibrating marketing spending models is data based, at the inception of a new marketing plan the relevant data may not be available. In this case, the data-based model is preceded by an initial growth stage, where parameters are set based on the subjective judgment of experienced managers. Our assumptions about markets as well as the level of detail that we want to capture influence the modeling task. We may, for example, assume that the market parameters are stationary, such as constant demographics and employment level, or that market parameters are evolving very slowly in comparison with our planning horizon. In such cases, our models will be designed to respond appropriately in stationary markets. A different and more complex situation arises if markets are evolving rapidly when compared with our planning horizon. In such a case, a different class of models would be required to capture the intrinsic dynamics of the market. If we consider the market to be stationary, there are two types of models we can postulate, each corresponding to a different level of analysis.
At a simpler level of analysis, we may assume that the sales and drivers adjust instantaneously as their level changes. This means that whatever functional relationships we formulate between our variables only involve time to the extent that the variables change in time, not to the rate of change of the variables in time. A situation like this reflects equilibrium among variables and the types of models appropriate for this case are referred to as static models. From the perspective of time series data, static response models involve the marketing investment variables evaluated at a single point in time. Simple regression models fall in this category. Within the assumption of stationary markets, at a more complex level of analysis we may consider the time of adjustment among variables as their levels change. This means that our model will capture not only the time changes in the levels of the variables, but also the rates at which the variables change in time. A model capable of capturing the noninstantaneous adjustment among variables is referred to as a dynamic model.
Dynamic models involve the marketing investment variables evaluated at multiple points in time. From a time series point of view, this implies that the model will involve lags and leads. Simple auto-regressive models are examples of this category. In addition to the two time effects we have described so far – the response to the level of variables as opposed to both the level and the level changes of variables – there is yet another time effect imposed by market fluctuations. To properly capture market fluctuations we must formulate models in nonstationary or evolving markets. Models of this type must be able to capture the nonstationarity of the statistical properties of time series. Auto-regressive moving average (ARMA) models are examples of this type of model. Here we will limit the discussion to stationary markets. This is the situation we face when our marketing planning horizon, usually on a quarterly basis, is relatively short compared with the time evolution scales of economic or demographic effects.
Static models
A model can be very complex if the full interaction among variables is taken into consideration. A simple static model is of the form
\[ Q = c_0 + c_1 f(X) \tag{2.1} \]
where \(Q\) is the dependent variable of interest, such as sales volume, \(X\) is the independent variable, in this case the marketing spending in the marketing plan, and \(c_0\) and \(c_1\) are coefficients to be estimated. Independent variables are also called explanatory or predictive variables. If the function \(f(X)\) is linear, the model is referred to as a linear model. Notice that this notion of linearity is not the same as the concept of linearity we encounter in estimation problems. In estimation, linearity refers to the way in which the coefficients \(c_0\) and \(c_1\) enter the functional form of the model. In estimation problems, the formulation is called linear if the estimation coefficients enter linearly, even if the variables appear in a nonlinear form. The important distinction in this regard is that as long as the estimation coefficients appear linearly in the model they can be estimated by linear regression.
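The distinction between linearity in the variables and linearity in the coefficients can be illustrated with a small sketch. It assumes, purely for illustration, that f(X) is the square root of X and that the data are noise-free; the helper `ols_line` and all numerical values are hypothetical.

```python
import math


def ols_line(zs, ys):
    """Ordinary least squares for y = c0 + c1 * z with a single regressor."""
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    c1 = sxy / sxx
    c0 = y_mean - c1 * z_mean
    return c0, c1


# Synthetic, noise-free data generated from Q = 10 + 3 * sqrt(X) (made-up values).
spend = [100.0, 400.0, 900.0, 1600.0, 2500.0]
sales = [10.0 + 3.0 * math.sqrt(x) for x in spend]

# f(X) = sqrt(X) is nonlinear in X, but the model is linear in c0 and c1,
# so regressing Q on the transformed variable recovers the coefficients.
transformed = [math.sqrt(x) for x in spend]
c0_hat, c1_hat = ols_line(transformed, sales)
print(round(c0_hat, 6), round(c1_hat, 6))
```

With real data the fit would not be exact, and the choice of f(X) would itself need to be explored.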
Elasticity An important concept to characterize a model is the notion of elasticity (Hanssens, Parsons and Schultz 2001). Elasticity is the ratio of the relative changes of the dependent and independent variables.
\[ e_X = \frac{\Delta Q / Q}{\Delta X / X} = \frac{\Delta Q}{\Delta X}\,\frac{X}{Q} \tag{2.2} \]
If there are several explanatory variables, the elasticity with respect to one variable is computed keeping the other variables constant. Mathematically, for infinitesimal changes in \(X\) and \(Q\) the elasticity can be written in terms of the partial derivative
\[ e_X = \frac{\partial Q}{\partial X}\,\frac{X}{Q} \tag{2.3} \]
Simple linear model
In the simplest case of the linear model
\[ Q = c_0 + c_1 X \tag{2.4} \]
the elasticity is
\[ e_X = \frac{c_1 X}{c_0 + c_1 X} \tag{2.5} \]
A model of this form reflects the assumption that additional marketing spending results in the same increment in the dependent variable, regardless of level. This situation is referred to as constant return to scale. It is more realistic to assume a situation of diminishing returns, where additional marketing spending brings about increasingly smaller responses. The constant elasticity model we analyze in the next section accomplishes this objective.
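As a quick numerical check of the elasticity definition and of the closed form for the linear model, the sketch below approximates the derivative with a central finite difference. The function `elasticity`, the step size, and the coefficient values are assumptions made for illustration only.

```python
def elasticity(model, x, rel_step=1e-6):
    """Approximate e_X = (dQ/dX) * (X/Q) with a central finite difference."""
    h = x * rel_step
    dq_dx = (model(x + h) - model(x - h)) / (2.0 * h)
    return dq_dx * x / model(x)


# Illustrative coefficients for the linear model Q = c0 + c1 * X.
c0, c1 = 200.0, 0.5


def linear_model(x):
    return c0 + c1 * x


for x in (100.0, 1000.0, 10000.0):
    numeric = elasticity(linear_model, x)
    closed_form = c1 * x / (c0 + c1 * x)  # closed form for the linear model
    print(x, round(numeric, 4), round(closed_form, 4))
```

With a positive intercept, the elasticity of the linear model is not constant: it rises toward 1 as spending grows, even though each additional dollar adds the same increment to Q.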
Power models
An interesting model that accomplishes diminishing returns is known as the constant elasticity model
\[ Q = aX^b \tag{2.6} \]
where \(0 < b < 1\). This is a particular case of a power model. The elasticity of this model is constant and equal to \(b\), which gives intuitive meaning to this coefficient. However, the price we pay for having constant elasticity is that the rate of change of \(Q\) for vanishing values of \(X\) is infinitely large, as shown in Figure 2-1.
Figure 2-1 Sales volume as a function of marketing effort in the constant elasticity model (axes: Q versus X).
Another attractive property of this model is that we can estimate the parameters with linear regression by working with the logarithms of both sides of Eq. (2.6)
\[ \log Q = \log a + b \log X \tag{2.7} \]
The linear regression gives us \(\log a\) as the intercept and \(b\) as the slope; \(a\) is recovered by exponentiating the intercept. The functional form in Eq. (2.6) can be generalized to the case of multiple independent variables in several ways. In the case of two variables, we have
\[ Q = a_1 X_1^{b_1} + a_2 X_2^{b_2} + a_{12} X_1^{b_{12}} X_2^{b_{21}} \tag{2.8} \]
This model captures the interaction of the independent variables in the last term, but is no longer a constant elasticity model. We can maintain the desirable property of constant elasticity by defining our model as follows
\[ Q = a_{12} X_1^{b_{12}} X_2^{b_{21}} \tag{2.9} \]
Notice that the unrealistic behavior of infinitely rapid change of Q for vanishing values of the independent variables persists in this formulation.
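The log-log estimation of the one-variable constant elasticity model can be sketched as follows. The data are synthetic and noise-free, and the helper `ols_line` and the values of a and b are hypothetical; the point is only that the regression slope is b and the exponentiated intercept is a.

```python
import math


def ols_line(zs, ys):
    """Ordinary least squares for y = c0 + c1 * z with a single regressor."""
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    c1 = sxy / sxx
    c0 = y_mean - c1 * z_mean
    return c0, c1


# Synthetic, noise-free data from Q = 50 * X**0.4 (made-up values).
a_true, b_true = 50.0, 0.4
spend = [10.0, 50.0, 200.0, 1000.0, 5000.0]
sales = [a_true * x ** b_true for x in spend]

# Regress log Q on log X.
log_x = [math.log(x) for x in spend]
log_q = [math.log(q) for q in sales]
intercept, slope = ols_line(log_x, log_q)

b_hat = slope                # the slope is the elasticity b
a_hat = math.exp(intercept)  # the intercept is log a
print(round(a_hat, 6), round(b_hat, 6))
```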
S-shaped models
If we can argue that the nature of the return changes from an increasing one to a decreasing one as a function of the independent variable, we can consider an S-shaped curve. In an S-shaped model, there is a transition from a convex to a concave return represented by an inflexion point. A simple function that represents such a shape is the following exponential model,
\[ Q = \exp\left(a - \frac{b}{X}\right) \tag{2.10} \]
where both \(a\) and \(b\) are positive. It is easy to see that the elasticity of this model decreases with \(X\)
\[ e_X = \frac{b}{X} \tag{2.11} \]
Figure 2-2 shows the overall shape of this functional form. The inflexion point is located at \(X = b/2\). The fact that this function starts out with a zero slope means that there is no response to small marketing efforts.
Figure 2-2 Sales volume in the exponential model (axes: Q versus X, with the inflexion point marked).
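The elasticity and the location of the inflexion point can be checked by differentiating the exponential model \(Q = \exp(a - b/X)\) twice. The short derivation below is a sketch that uses only the definitions already given.

```latex
\begin{aligned}
\frac{\partial Q}{\partial X} &= \frac{b}{X^{2}}\,Q
  \quad\Longrightarrow\quad
  e_X = \frac{\partial Q}{\partial X}\,\frac{X}{Q} = \frac{b}{X}, \\
\frac{\partial^{2} Q}{\partial X^{2}}
  &= \left(\frac{b^{2}}{X^{4}} - \frac{2b}{X^{3}}\right) Q
   = \frac{b\,(b - 2X)}{X^{4}}\,Q .
\end{aligned}
```

Since \(Q > 0\) and \(b > 0\), the second derivative is positive for \(X < b/2\) (convex return), zero at \(X = b/2\), and negative for \(X > b/2\) (concave return).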
Modifications to the S-shaped model
Other possible modifications to the S-shaped model include imposing a saturation level that reflects the fact that sales may not increase beyond a certain level of effort, or a sales floor to indicate that sales may still take place in the total absence of any marketing effort. Functions of this type
capable of describing general S-shapes are called sigmoid functions, of which the well-known logistic function is a particular case. An example of a nonsymmetrical one-dimensional logistic model that incorporates both a sales floor \(Q_L\) and a saturation level \(Q_U\) is the following
\[ Q = \frac{Q_L + Q_U\, a X^{b}}{1 + a X^{b}} \tag{2.12} \]
A plot of this function is shown in Figure 2-3. The function starts at \(Q_L\) and asymptotes to \(Q_U\) as \(X\) grows. If \(Q_L\) and \(Q_U\) are postulated, the parameters in Eq. (2.12) can be estimated using the logarithmic form of the function. For example, for the case of two variables we have
\[ \log \frac{Q - Q_L}{Q_U - Q} = \log a + b_1 \log X_1 + b_2 \log X_2 \tag{2.13} \]
Figure 2-3 One-dimensional logistic model with sales floor and saturation level (axes: Q versus X, with the sales floor and saturation level marked).
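The logarithmic estimation of the logistic model can be sketched for the one-variable case. Rearranging the logistic form gives (Q − QL)/(QU − Q) = aX^b, so the log of that ratio is linear in log X. The floor, saturation level, and parameter values below are hypothetical, the data are noise-free, and `ols_line` is an illustrative helper.

```python
import math


def ols_line(zs, ys):
    """Ordinary least squares for y = c0 + c1 * z with a single regressor."""
    n = len(zs)
    z_mean = sum(zs) / n
    y_mean = sum(ys) / n
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(zs, ys))
    sxx = sum((z - z_mean) ** 2 for z in zs)
    c1 = sxy / sxx
    c0 = y_mean - c1 * z_mean
    return c0, c1


q_floor, q_sat = 100.0, 1000.0  # postulated sales floor QL and saturation level QU
a_true, b_true = 0.01, 1.5      # made-up parameters used to generate the data


def logistic_sales(x):
    """Q = (QL + QU * a * X**b) / (1 + a * X**b)."""
    g = a_true * x ** b_true
    return (q_floor + q_sat * g) / (1.0 + g)


spend = [5.0, 10.0, 20.0, 40.0, 80.0]
sales = [logistic_sales(x) for x in spend]

# Transformed regression: log((Q - QL) / (QU - Q)) = log a + b * log X.
log_x = [math.log(x) for x in spend]
log_ratio = [math.log((q - q_floor) / (q_sat - q)) for q in sales]
intercept, slope = ols_line(log_x, log_ratio)
a_hat, b_hat = math.exp(intercept), slope
print(round(a_hat, 6), round(b_hat, 6))
```

The transformation requires every observed Q to lie strictly between the postulated floor and saturation level; otherwise the logarithm is undefined.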
Semilogarithmic model
A semilogarithmic function captures diminishing returns and as a result is a widely used functional form (Leeflang, Wittink, Wedel and Naert 2000). In a semilogarithmic model, the number of product units sold \(Q\) and marketing spending \(X\) follow the relationship
\[ Q = \beta_0 + \beta_1 \ln X + \varepsilon \tag{2.14} \]
A regression estimate of \(Q\) is:
\[ \hat{Q} = \hat{\beta}_0 + \hat{\beta}_1 \ln X \tag{2.15} \]
We now apply this functional form in an optimization framework where we maximize profits with respect to marketing spending. For this exercise, we define profit as gross profit adjusted by marketing spending. Gross profit is the difference between revenue and the cost of producing a product or providing a service without adjustment for items affected by marketing expense, such as overhead or payroll. Consistent with this definition, profit is given by
\[ P = (p - c)Q - X \tag{2.16} \]
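where \(p\) and \(c\) are quantities independent of \(X\) representing the unit price and the unit variable cost of goods sold, respectively. To maximize profit, the following condition has to be met.
\[ \frac{\partial P}{\partial X} = \frac{\partial\big((p - c)Q - X\big)}{\partial X} = (p - c)\frac{\partial Q}{\partial X} - \frac{\partial X}{\partial X} = (p - c)\frac{\partial Q}{\partial X} - 1 = 0 \tag{2.17} \]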
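Based on Eq. (2.17) and replacing \(Q\) with its estimator, we get
\[ \frac{\partial \hat{Q}}{\partial X} = \hat{\beta}_1 \left(\frac{\partial \ln X}{\partial X}\right) = \frac{\hat{\beta}_1}{X} = \frac{1}{p - c} \tag{2.18} \]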
Therefore, the marketing spending that will optimize the profit of the marketing effort is:
\[ X = (p - c)\hat{\beta}_1 \tag{2.19} \]
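The closed-form optimum can be sanity-checked against a brute-force search over spending levels. The sketch below assumes illustrative values for the price, unit cost, and regression coefficients (they are not estimates from real data) and evaluates profit on a grid of whole-dollar spending levels.

```python
import math

# Illustrative values (assumptions, not estimates from real data).
price, unit_cost = 80.0, 30.0
beta0_hat, beta1_hat = 10.0, 40.0


def profit(x):
    """Profit (p - c) * Q - X with Q replaced by its semilogarithmic estimate."""
    units = beta0_hat + beta1_hat * math.log(x)
    return (price - unit_cost) * units - x


# Closed-form optimum: X = (p - c) * beta1_hat.
closed_form = (price - unit_cost) * beta1_hat

# Brute-force check: evaluate profit at every whole-dollar spend from 1 to 10000.
grid = range(1, 10001)
best_on_grid = max(grid, key=profit)
print(closed_form, best_on_grid)
```

Because the profit function is concave in X, the grid search and the first-order condition point to the same spending level.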
Marketing spending model case studies
Next we discuss two case studies illustrating the use of static models. The first case study applies the formulation discussed in the previous section, while the second expands the analysis to include the residual effect over time of marketing expense.
Case study one
Assume a company spent $5100 on a suboptimal marketing effort, sold 800 units of product, and realized a profit of $34,900. We will apply a semilogarithmic model to the historical data of the firm to determine the optimal marketing spending that would have maximized the profit of the marketing effort, and then compare the optimal marketing spending with the amount that the firm actually spent.
To derive the optimal marketing spending which maximizes the profit of the marketing effort, we use the following parameters in Eqs. (2.15) and (2.19) based on historical data: \(p = \$100\), \(c = \$50\), \(\hat{\beta}_0 = -20\), \(\hat{\beta}_1 = 120\). Given these parameters, the optimal marketing spending is:
\[ X = (100 - 50) \times 120 = 6000 \tag{2.20} \]
The estimated number of product units sold given the optimal marketing spending is:
\[ \hat{Q} = \hat{\beta}_0 + \hat{\beta}_1 \ln X = -20 + 120 \ln(6000) = 1024 \tag{2.21} \]
The maximum profit of the marketing effort is then:
\[ P = (p - c) \times \hat{Q} - X = (100 - 50) \times 1024 - 6000 = 45{,}200 \tag{2.22} \]
Based on this analysis, we conclude that the optimal marketing spending is $6000, which is $900, or 18%, above the amount that the company actually spent. If the company had spent $6000 on marketing, it would have produced a profit of $45,200, 30% higher than the profit it actually realized.
Case study two
A dollar spent on marketing activities today drives not only sales today, but also sales in the future. The impact of advertising on generating marketing returns has a residual effect over time. In this case study, we incorporate such residual effects in a time series (Leeflang, Wittink, Wedel, and Naert 2000). We assume that the residual effect is \(\lambda^t\) at time \(t\) and that the simply compounded discount rate per unit time period is \(i\). The estimated profit given the optimal marketing spending is:
\[ P = (p - c)(\hat{\beta}_0 + \hat{\beta}_1 \ln X) \times \sum_{t=0}^{n} \frac{\lambda^{t}}{(1 + i)^{t}} - X \tag{2.23} \]
Using the sum of a geometric series, this expression can be simplified as the following when n is large. ⎞⎟ ⎛ ⎜⎜ ⎟⎟ 1 ⎜ ⎟⎟ ⫺ X P ⫽ ( p ⫺ c)(ˆ0 ⫹ ˆ1 ln X ) ⎜⎜ ⎟⎟⎟ ⎜⎜ ⎟ ⎜⎝ 1 ⫺ 1 ⫹ i ⎟⎠
(2.24)
To maximize the profit of a marketing effort with respect to marketing spending, the following first-order condition has to be satisfied:

$$\frac{\partial P}{\partial X} = (p - c)\frac{\hat{\beta}_1}{X}\left(\frac{1}{1 - \dfrac{\lambda}{1 + i}}\right) - 1 = 0 \tag{2.25}$$

The optimal marketing spending is given by

$$X = (p - c)\hat{\beta}_1\left(\frac{1}{1 - \dfrac{\lambda}{1 + i}}\right) \tag{2.26}$$
Assume a company spent $8000 on a suboptimal marketing effort, sold 1000 units of products, and realized a profit of $42,000. We will apply a semilogarithmic model to the historical data of the firm to determine the optimal marketing spending that would have maximized the profit of the marketing effort, and then compare the optimal marketing spending with the amount that the firm actually spent. To derive the optimal marketing spending that maximizes the profit of the marketing effort, assume now the following parameters: $p = \$100$, $c = \$50$, $\hat{\beta}_0 = -20$, $\hat{\beta}_1 = 150$, $\lambda = 20\%$, and $i = 3\%$. Based on Eq. (2.26), the optimal marketing spending is

$$X = (100 - 50) \times 150 \times \left(\frac{1}{1 - \dfrac{0.2}{1 + 0.03}}\right) = 9307 \tag{2.27}$$

Therefore, the maximum profit of the marketing effort that can be achieved is

$$P = (100 - 50)(-20 + 150 \times \ln(9307))\left(\frac{1}{1 - \dfrac{0.2}{1 + 0.03}}\right) - 9307 = 74{,}509 \tag{2.28}$$

Based on this analysis, we conclude that the optimal marketing spending is $9307. The company actually spent $8000; this means it underspent by $1307, or 16%. If the company had spent $9307 on marketing, it would have produced a profit of $74,509, 77% higher than the actual profit realized.
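The figures in case study two can be checked the same way. The sketch below assumes the large-n limit of the geometric sum, so the carryover multiplier is 1 / (1 - retention / (1 + i)); the function and variable names are illustrative, not taken from the book.

```python
import math


def carryover_multiplier(retention, discount_rate):
    """Limit of sum over t >= 0 of (retention / (1 + i))**t."""
    ratio = retention / (1.0 + discount_rate)
    if not 0.0 <= ratio < 1.0:
        raise ValueError("the geometric series converges only if 0 <= ratio < 1")
    return 1.0 / (1.0 - ratio)


def optimal_spend(margin, beta1, multiplier):
    """Optimal spending with residual effects, as in Eq. (2.26)."""
    return margin * beta1 * multiplier


def profit(margin, beta0, beta1, spend, multiplier):
    """Discounted profit with residual effects, as in Eq. (2.24)."""
    return margin * (beta0 + beta1 * math.log(spend)) * multiplier - spend


# Parameter values quoted in case study two.
margin = 100.0 - 50.0
m = carryover_multiplier(retention=0.20, discount_rate=0.03)
x_opt = optimal_spend(margin, 150.0, m)
p_opt = profit(margin, -20.0, 150.0, x_opt, m)

print(round(x_opt), round(p_opt))
```

The computed profit differs from the printed 74,509 by a few dollars, which is consistent with rounding of intermediate values in the text.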
Optimal multi-channel marketing spending allocation

In this section we discuss a model that can be used in either multi-channel or multi-product situations, where we evaluate the distribution of the total marketing spending across multiple marketing communication channels or products. This model is a modified version of a multi-product model originally developed by Doyle and Saunders (Leeflang, Wittink, Wedel, and Naert 2000). Assume that there are $n$ different marketing communication channels. Let $Q_j$ denote the contribution of marketing communication channel $j$ to the number of units of product sold, and let $X_j$ denote the company’s marketing spending on marketing communication channel $j$.

$$Q_j = \beta_{0j} + \beta_{1j} \ln X_j + \varepsilon_j \tag{2.29}$$

The regression estimate of the total number of units sold $Q$ is given by

$$\hat{Q} = \sum_{j=1}^{n} (\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j) \tag{2.30}$$
The profit, which is gross profit less marketing spending, is

$$P = \sum_{j=1}^{n} cm_j (\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j) - \sum_{j=1}^{n} X_j \tag{2.31}$$

where $cm_j = p - c_j$, and $c_j$ is the unit variable cost of goods sold through marketing communication channel $j$. The optimal marketing spending of marketing communication channel $j$ results from

$$\frac{\partial P}{\partial X_j} = \frac{\partial \sum_{j=1}^{n} \left( cm_j (\hat{\beta}_{0j} + \hat{\beta}_{1j} \ln X_j) - X_j \right)}{\partial X_j} = 0 \tag{2.32}$$
The solution for $X_j$, the optimal marketing spending of marketing channel $j$, is

$$X_j = cm_j \hat{\beta}_{1j} \tag{2.33}$$

The total marketing spending is

$$X = \sum_{j=1}^{n} cm_j \hat{\beta}_{1j} \tag{2.34}$$

Therefore, the fractional budget allocation to marketing communication channel $j$ is as follows.

$$A_j = \frac{X_j}{X} = \frac{cm_j \hat{\beta}_{1j}}{\sum_{j=1}^{n} cm_j \hat{\beta}_{1j}} \tag{2.35}$$
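The allocation rule in Eqs. (2.33) to (2.35) reduces to a few lines of arithmetic. The sketch below is illustrative only: the three channels, their contribution margins, and their slope estimates are hypothetical values chosen for the example, not data from the book.

```python
def allocate_budget(contribution_margins, slope_estimates):
    """Return per-channel optimal spending, total spending, and budget shares.

    contribution_margins: cm_j = p - c_j for each channel (hypothetical inputs)
    slope_estimates: estimated coefficient on ln(X_j) for each channel
    """
    if len(contribution_margins) != len(slope_estimates):
        raise ValueError("one margin and one slope are required per channel")
    optimal = [cm * b1 for cm, b1 in zip(contribution_margins, slope_estimates)]
    total = sum(optimal)
    shares = [x / total for x in optimal]
    return optimal, total, shares


# Hypothetical example with three channels.
margins = [50.0, 40.0, 60.0]
slopes = [120.0, 80.0, 30.0]
optimal, total, shares = allocate_budget(margins, slopes)

for j, (x, a) in enumerate(zip(optimal, shares), start=1):
    print(f"channel {j}: spend {x:.0f}, share {a:.1%}")
```

Because the intercepts drop out of the first-order condition, only the margins and the slope estimates influence the allocation. The same function applies to the multi-product case if the margins and slopes are indexed by product instead of channel.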
Optimal marketing spending allocation by product

Assume there are $m$ different products being marketed. Let $Q_k$ denote the contribution of the company’s marketing spending on product $k$ to the number of units sold of product $k$, and let $X_k$ denote the company’s marketing spending on product $k$.

$$Q_k = \beta_{0k} + \beta_{1k} \ln(X_k) + \varepsilon_k \tag{2.36}$$

The estimate of $Q_k$ is given by:

$$\hat{Q}_k = \hat{\beta}_{0k} + \hat{\beta}_{1k} \ln(X_k) \tag{2.37}$$
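The profit of the marketing effort, that is, the estimated gross profit less marketing spending, is:

$$P = \sum_{k=1}^{m} \left( cm_k (\hat{\beta}_{0k} + \hat{\beta}_{1k} \ln X_k) - X_k \right) \tag{2.38}$$

where $cm_k = p_k - c_k$, $p_k$ is the unit price for product $k$, and $c_k$ is its unit variable cost. The optimal budget for marketing product $k$ must satisfy the following condition

$$\frac{\partial P}{\partial X_k} = \frac{\partial \sum_{k=1}^{m} \left( cm_k (\hat{\beta}_{0k} + \hat{\beta}_{1k} \ln X_k) - X_k \right)}{\partial X_k} = 0 \tag{2.39}$$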
Therefore, the optimal marketing spending for product $k$ is

$$X_k = cm_k \hat{\beta}_{1k} \tag{2.40}$$

The total marketing spending is

$$X = \sum_{k=1}^{m} cm_k \hat{\beta}_{1k} \tag{2.41}$$

The fractional budget allocation to product $k$ is given by

$$A_k = \frac{X_k}{X} = \frac{cm_k \hat{\beta}_{1k}}{\sum_{k=1}^{m} cm_k \hat{\beta}_{1k}} \tag{2.42}$$
Environmental changes and seasonality

By environmental changes we mean situations where a driver of marketing return changes suddenly (or, more precisely, fast when compared with the marketing horizon) from one level to another. For example, a news report that suddenly exposes a product in a markedly more favorable or negative light will suddenly change the environment where the marketing effort is being conducted. The occurrence of such sudden change is a categorical rather than a numerical event. In the context of linear regressions, it is straightforward to incorporate these sudden changes through the introduction of a dummy numerical variable, $Z$, which takes on the value zero before the change happens, and takes on the value one after the change happens. Equation (2.4) is modified as follows

$$Q_t = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t \tag{2.43}$$
where the subscript $t$ makes explicit the fact that observations at period $t$ may correspond to different values of the dummy variable. The coefficients are determined through regression. This formulation allows us to use the tools of linear regression to interpret the parameters in the model and to assess their confidence intervals. This idea can be easily extended to handle multiple environmental changes.

A particular case of environmental change is seasonality, where each season represents a distinct and sudden change in market conditions. Since there are four seasons, if we take one of them as a reference we need only three dummy variables to represent the changes due to the remaining seasons. Assuming the reference season is indicated by the index 1, we can extend Eq. (2.43) to handle seasonality as follows.

$$Q_t = c_0 + d_{02} Z_2 + d_{03} Z_3 + d_{04} Z_4 + c_1 X_t + d_{12} Z_2 X_t + d_{13} Z_3 X_t + d_{14} Z_4 X_t \tag{2.44}$$

The dummy variable $Z_i$ is one within season $i$ and zero elsewhere. The dummy variable $Z_1$ does not appear in this expression because index 1 identifies the reference season. It is possible to extend the same idea to the case of multiple independent variables and to nonlinear functional forms.

Before leaving this section on static models, we must emphasize that although in principle we can accommodate a large number of variables
and changes in environmental conditions, the number of variables we can handle is determined by the statistical properties of the estimated parameters. This is typically a data issue. Unless sufficient and reliable data is available, estimation will lead to parameters affected by significant error. In such cases, the functional complexity of a model may be overwhelmed by the estimation error.
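To make the dummy-variable regression of Eq. (2.43) concrete, the following sketch estimates its four coefficients by ordinary least squares. Everything in it is an assumption made for illustration: the data are synthetic and noise-free, the helper names are invented, and the normal equations are solved with a small hand-written elimination routine so that the example needs no external library.

```python
def solve_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def fit_dummy_model(q, x, z):
    """Estimate c0, d0, c1, d1 in Q_t = c0 + d0*Z + c1*X_t + d1*Z*X_t."""
    rows = [[1.0, zt, xt, zt * xt] for xt, zt in zip(x, z)]
    c0, d0, c1, d1 = solve_least_squares(rows, q)
    return {"c0": c0, "d0": d0, "c1": c1, "d1": d1}


# Hypothetical data: 10 periods before the change (Z = 0) and 10 after (Z = 1).
x = [1.0 + (t % 10) for t in range(20)]
z = [0.0 if t < 10 else 1.0 for t in range(20)]
q = [5.0 + 2.0 * zt + 1.5 * xt + 0.5 * zt * xt for xt, zt in zip(x, z)]

params = fit_dummy_model(q, x, z)
print(params)
```

With real data the series would contain noise, and, as noted above, the reliability of the estimates would depend on how many observations fall on each side of the change. The seasonal model of Eq. (2.44) follows the same pattern with three dummy columns and three interaction columns.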
Dynamic models

The objective of a dynamic model is to capture the adjustment time between dependent and independent variables. Notice that this adjustment time is a separate concept from the fact that the market parameters themselves may be changing in time. Capturing the time of adjustment between dependent and independent variables imposes relationships between the variable levels and their rates of change. This is a problem that can be formulated in terms of differential equations, or in terms of discrete values in time series. We focus on the latter, since this is established practice in marketing analytics. The dynamic adjustment between dependent and independent variables may respond to the dissemination of marketing information, to the anticipation of such information, or to a combination of both. Framing the problem of dynamic response in terms of dissemination of marketing information leads to the consideration of lags in a time series, whereas taking anticipation into account would result in the inclusion of leads in the time series analysis. Since the most common situation in practice is the delayed impact of a marketing effort, we focus on the former.
Simple lag model

To formulate the simplest lag model, we reconsider Eq. (2.4) to represent the situation where the effect of a marketing effort is felt $k$ periods of time after the marketing effort is implemented. The following modification of Eq. (2.4) reflects this fact

$$Q_t = c_0 + c_k X_{t-k} \tag{2.45}$$

This means that the return $Q_t$, which we assume to be a constant value during period $t$, is the result of the effort $X$ implemented $k$ periods earlier. This formula can be generalized to $K$ lag periods

$$Q_t = c_0 + \sum_{k=1}^{K} c_k X_{t-k} \tag{2.46}$$
As stated, this representation may present us with challenging calibration problems. We can get around the calibration issue by imposing additional
structure to the right-hand side of Eq. (2.46), by making assumptions about the coefficients, ck.
Geometrically distributed lag model

In this particularly popular model, we assume that the impact of marketing spending on marketing performance decreases geometrically as the number of periods increases. We can formulate this model in terms of a parameter $\lambda$, which we can interpret as the fraction of the effect of a current marketing effort on marketing return that is retained from one period to the next. This parameter is called the retention rate and its value is typically around 0.5. The geometrically distributed lag model is as follows

$$Q_t = c_0 + c(1 - \lambda) \sum_{k=0}^{\infty} \lambda^k X_{t-k} \tag{2.47}$$

Here, $c_0$, $c$, and $\lambda$ are parameters to be estimated. Simple manipulations of Eq. (2.47) give the following expression (Clarke 1976).

$$Q_t = (1 - \lambda) c_0 + c_1 X_t + \lambda Q_{t-1} + (u_t - \lambda u_{t-1}) \tag{2.48}$$

where $c_1 = (1 - \lambda)c$ and $u_t$ is an error term added to Eq. (2.47). We get a form suitable for regression estimation by setting $w_t = u_t - \lambda u_{t-1}$ in Eq. (2.48). In the geometrically distributed model, the short-term effect of the marketing effort is given by $c_1$, and a fraction $\theta$ of the long-term impact of the marketing effort takes effect over $\log(1 - \theta)/\log \lambda$ periods, because the cumulative share of the effect realized after $n$ periods is $1 - \lambda^n$.

The estimated values of parameters in discrete time will depend on the frequency of the time series. When repeated estimations are performed for comparative purposes, it is important to keep the data frequency constant between estimations. This issue is referred to in the literature under the concept of data interval bias (Leone 1995 and Clarke 1976). A number of other formulations along these lines are possible. For more details, the reader is referred to the references (Hanssens, Parsons and Schultz 2001 and Koyck 1954).
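The relationship between the distributed-lag form and its recursive form can be verified numerically. The sketch below is a minimal illustration under stated assumptions: spending is taken to be zero before the first observed period, the error term is omitted, and the spending series and parameter values are hypothetical.

```python
import math


def duration_periods(fraction, retention):
    """Periods needed for `fraction` of the long-run effect to be realized."""
    return math.log(1.0 - fraction) / math.log(retention)


def distributed_lag(x, c0, c, retention):
    """Geometric lag sum, truncated at t = 0 (no spending before the sample)."""
    return [
        c0 + c * (1.0 - retention)
        * sum(retention ** k * x[t - k] for k in range(t + 1))
        for t in range(len(x))
    ]


def recursive_form(x, c0, c, retention):
    """Recursion on Q_{t-1}, started from Q_{-1} = c0 (zero spending history)."""
    c1 = (1.0 - retention) * c
    q = []
    for xt in x:
        previous = q[-1] if q else c0
        q.append((1.0 - retention) * c0 + c1 * xt + retention * previous)
    return q


# Hypothetical spending series and parameters.
spend = [100.0, 0.0, 50.0, 0.0, 0.0, 80.0]
direct = distributed_lag(spend, c0=10.0, c=2.0, retention=0.5)
recursive = recursive_form(spend, c0=10.0, c=2.0, retention=0.5)

print(direct)
print(recursive)
print(duration_periods(0.9, 0.5))  # periods for 90% of the long-run effect
```

The two series agree period by period, which is the content of the manipulation leading from the lag sum to the recursive expression. In an estimation setting the recursive form is the one regressed on data, since it involves only current spending and the lagged return.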
■ Marketing spending models and corporate finance Here we discuss some ideas of how marketing spending models could be developed in the context of corporate financial objectives. It is commonly
argued that the primary goal of a firm is to maximize shareholder value. This would suggest that the ultimate objective of a marketing plan is to maximize the equity value of the firm that undertakes the marketing plan. Before discussing the possible interactions between marketing effort and shareholder value, we must be precise about the meaning of maximizing shareholder value. We will assume for the moment that the assets of the firm can be neatly divided between debt and equity, where equity holders absorb the vast majority of the firm’s risk and are therefore expected to be rewarded with the highest returns. In reality, the capital structure of firms is much more complex than a clear partition of equity and debt, with components that share both equity and debt-like features (such as convertible bonds and preferred stock). Investors elect to invest in the equity of a particular firm because the equity of that particular firm exhibits a profile of risk versus return that investors like. Modern portfolio theory tells us that in the long term, a change in expected equity returns will go hand in hand with changes in the risk profile of the equity. The risk profile of the equity of a firm results from the combination of the market fluctuations of the shares of the firm, and the correlation of those fluctuations with the rest of the market. The task of senior managers is to position the firm in such a way that its long-term equity growth is as high and stable as possible consistent with the risk profile of its industry. This suggests that shareholders will benefit from the marketing effort indirectly, to the extent that the objectives of senior managers, which are aided by the marketing effort, are consistent with the interest of the shareholders. Next we examine a proposed framework for integrating a marketing effort with shareholder objectives.
A framework for corporate performance marketing effort integration

Earlier in this chapter, we discussed a way to capture environmental changes in the evolution of a time series, where the time series represents a measure of marketing performance, such as sales volume. We can extend this idea to capture the effect of the implementation of a marketing plan on the statistical properties of equity returns. In our case, the time series we observe are equity returns, and the sudden environmental change we wish to capture and quantify is the onset of the marketing plan. What precisely is the statistical property of equity returns that we aim to enhance through the marketing investment? We address this question by invoking modern portfolio theory (MPT). The theory tells us that the fair or risk-adjusted returns on an investment in a particular asset, $R_{\mathrm{asset}}$, the return on a short-term risk-free security, $R_f$, and the returns on the overall market, $R_{\mathrm{market}}$, are related by the following expression (Luenberger 1996).

$$E[R_{\mathrm{asset}}] - R_f = \beta_{\mathrm{asset}} (E[R_{\mathrm{market}}] - R_f) \tag{2.49}$$

where $\beta_{\mathrm{asset}}$, known as the ‘beta’ of that particular asset, is defined as follows

$$\beta_{\mathrm{asset}} = \frac{\operatorname{cov}(R_{\mathrm{asset}}, R_{\mathrm{market}})}{\operatorname{var}(R_{\mathrm{market}})} \tag{2.50}$$
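As a small illustration of Eqs. (2.49) and (2.50), the sketch below computes a sample beta from two return series and the corresponding fair return. The return series, the risk-free rate, and the expected market return are hypothetical numbers chosen so that the result is easy to check by hand; the function names are assumptions.

```python
def sample_beta(asset_returns, market_returns):
    """Sample covariance of asset and market returns over market variance."""
    n = len(asset_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_a = sum(asset_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum(
        (a - mean_a) * (m - mean_m)
        for a, m in zip(asset_returns, market_returns)
    ) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


def fair_return(beta_asset, risk_free, expected_market):
    """Risk-adjusted expected return implied by the relation in Eq. (2.49)."""
    return risk_free + beta_asset * (expected_market - risk_free)


# Hypothetical period returns: the asset moves twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00, 0.02]
asset = [2.0 * m + 0.001 for m in market]

beta_asset = sample_beta(asset, market)
print(beta_asset)
print(fair_return(beta_asset, risk_free=0.03, expected_market=0.08))
```

The constant offset of 0.001 in the asset series does not affect the estimate, since covariance and variance are computed around the means.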
In the light of MPT, we can posit that the purpose of the marketing effort is to produce asset returns above the risk-free rate that exceed, or at least maintain, the fair returns predicted by Eq. (2.49). We start out by assuming that the current long-term returns of the firm at least adjust according to Eq. (2.49). To see whether the marketing spending does indeed have the desired effect, we can conduct the analysis we described in Section 2.1, where $Q_t$ is now interpreted as the change in realized return of the firm’s equity over period $t$.

$$R_t - R_{t-1} = c_0 + d_0 Z + c_1 X_t + d_1 Z X_t \tag{2.51}$$

A linear regression analysis of this representation tells us whether the marketing effort is driving returns above the risk-adjusted levels of Eq. (2.49), or toward this level in case the equity is under-performing.
■ References

Clarke, D.G. Econometric measurements of the duration of advertising effect on sales. Journal of Marketing Research, Chicago, Illinois, 13: 345–357, 1976.
Hanssens, D.M., L.J. Parsons, and R.L. Schultz. Market Response Models, Econometric Time Series Analysis, 2nd ed. Kluwer, Massachusetts, 2001.
Koyck, L.M. Distributed Lags and Investment Analysis. Amsterdam, North-Holland, 1954.
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for Marketing Decisions. Kluwer, Massachusetts, 2000.
Leone, R.P. Generalizing what is known about temporal aggregation and advertising carryover. Marketing Science, Hanover, Maryland, 14(3), 1995.
Luenberger, D.G. Investment Science. Oxford University Press, New York, 1996.
CHAPTER 3
Metrics Overview
In this chapter, we discuss the key metrics used for measuring and optimizing return on marketing investment. As we alluded to earlier, using the wrong metrics may have serious consequences on marketing returns, and may in fact drive a company out of business. Consider the following scenario.

John, marketing director at Sigma Corporation, tracks 75 metrics to measure the impact of his company website in generating online sales. He has two dedicated full-time staff members compiling reports on these metrics on a daily basis. In addition, he has a full-time web developer focusing on changing website features to increase web traffic. Tom is the marketing director of Huber Sigma Corporation, the main competitor of Sigma. Tom tracks one return metric (online sales revenue). In addition, he tracks 10 operational metrics about his company website. Over the past year, for the same product category, Sigma’s online sales dropped 40%, while Huber Sigma’s doubled.

What could have possibly gone wrong with Sigma? The answer is that Tom focused on a few relevant key return and operational metrics, while John pursued every metric available without making any differentiation among them. In addition, it was suboptimal for John to focus so much energy on web traffic alone in light of his objective of generating online sales, since web traffic is really an operational metric, not a return success metric. This story also shows that focus on the wrong metrics not only costs business, but also drains resources unnecessarily.

The following is a list of key questions and issues addressed in this chapter.

1. What are the common metrics used to measure returns and investments?
2. What is the most appropriate formula for return on investment?
3. What are the common challenges of tracking returns on investments?
4. What is the process of identifying appropriate metrics?
5. What are the stages in a sales cycle (a.k.a. sales funnel)?
6. What are the common marketing communication channels?
7. What are the key metrics in each stage of the sales cycle?
8. What is the difference between return metrics and operational metrics? How do we use both types of metrics to drive future campaign optimization?
9. What are the tips on addressing common ROI tracking challenges?
■ Common metrics for measuring returns and investments In order to properly measure marketing returns on investment, we need to identify appropriate return metrics and investment metrics. We start out with a discussion on return metrics.
Measuring returns with return metrics Essentially, a return metric is a variable that can be measured and used to quantify the desired end result of a marketing effort aimed at migrating a target audience from stage to stage in the sales cycle. If the purpose of a marketing effort is to move a target audience from being leads to becoming buyers, then return metrics measure and quantify buyers and their purchases. In this case, the number of buyers, the number of units sold, the total purchase amount, and the average purchase amount are examples of return metrics. Return metrics can be expressed in either financial or nonfinancial terms. Financial return metrics are metrics measured in dollars. Realized revenue is an example of financial return metrics. Nonfinancial return metrics are metrics that are not measured in dollars, but which nevertheless are important indicators of short-term marketing success. Examples of nonfinancial return metrics are incremental increase in awareness, number of responders to a marketing promotion, number of leads (potential buyers), and increase in customer satisfaction. In business-to-business situations, where the sales cycle is usually long, nonfinancial return metrics can be particularly important if the amount of investment in driving this type of short-term return is significant. Take as an example the advertising market. The value of the overall advertising market in 2006 in the US was $292 billion (Mediaweek 2006). Most of the advertising dollars are spent in driving the awareness or perception of a brand, product, and service rather than to generate revenue in the short run. If we only account for financial returns for advertising in the short run, then the short-term returns of advertising may be negligible. In reality, advertising can drive financial returns in the long run. 
Another reason for measuring short-term returns of marketing efforts is that marketing planning and execution sometimes have a shorter time frame than the sales cycle and are run and optimized on an ongoing basis. As a result, the effectiveness of a campaign needs to be determined before potential customers make any purchases.
Measuring investment with investment metrics Investment metrics refer to investment costs, such as marketing spending. These costs can be either fixed or variable. Fixed costs occur regardless of the implementation of a particular marketing campaign. Examples of such costs include the costs of existing marketing staff members and the upkeep of office facilities that are spread across marketing and other corporate functions, such as sales, operations, and information technology. Fixed costs are not allocated to specific marketing programs. One marketing program manager might be responsible for an e-mail program for product A and a direct mail program for product B simultaneously, and it
can be a challenge to estimate how much time this individual spends on a particular marketing program. Other examples of fixed costs are building lease cost and information technology support costs. Variable costs are costs that can be accurately attributed to specific marketing programs. Such costs include media costs, agency costs, production costs, and postage for a specific marketing program. One frequently asked question is which cost category should be included when accounting for marketing investment costs. If fixed costs can be attributed to specific marketing efforts without making assumptions that will lead to significant errors in cost allocation, then the correct thing to do is to consider both fixed and variable costs in estimating marketing investment. However, in situations where fixed costs cannot be accurately attributed to specific marketing efforts, the correct thing to do is to steer away from fixed costs and consider variable costs only. If fixed and variable costs are both imputed, then they must be applied to all marketing programs. If only variable cost is used, then variable cost must be applied to all marketing efforts that the firm undertakes. Consistency is an important consideration when deciding on the application of fixed and variable costs.
■ Developing a formula for return on investment

Return on investment (ROI) is defined as the following ratio.

$$\text{ROI} = \frac{\text{Return} - \text{Investment}}{\text{Investment}} \tag{3.1}$$

The numerator is the difference between return and investment, and the denominator is investment. Investment is given by

$$\text{Investment} = \text{Costs of Goods Sold} + \text{Costs of Capital} + \text{Marketing Investment} + \text{Additional Investment} \tag{3.2}$$

In practice, in addition to immediately realized revenue amounts, usually there are potential future revenue streams. The best way to accurately quantify this type of financial return is to compute the sum of the net present values of these streams of revenue. This is called the lifetime value (LTV) of a campaign and should be factored in as the total return of a marketing campaign whenever possible.
$$\text{LTV} = \text{NPV(sum of future revenue streams)} = \sum_{i=0}^{n} \frac{R_i}{(1 + d_i)^i} \tag{3.3}$$
where $R_i$ is the revenue at the end of time period $i$, $d_i$ is the discount rate at time period $i$, and $n$ is the total number of time periods in a lifetime. Caution needs to be exercised when accounting for LTV for multiple marketing campaigns. If several marketing campaigns target the same audience during the same time period and they all account for the LTV of the same audience as returns, then we have a double counting issue. In these circumstances, it is best to report the ROI at an aggregate level across the multiple campaigns by aggregating both the incremental return and the incremental marketing investment. The following formula incorporates the LTV concept into the formulation of a marketing effort profit (Leeflang, Wittink, Wedel, and Naert 2000). Note that we also need to compute the NPV of the costs of goods sold (COG), cost of capital, marketing expenses, and additional expenses.

$$\text{Profit} = \sum_{i=0}^{n} \frac{R_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{C_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{D_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{M_i}{(1 + d_i)^i} - \sum_{i=0}^{n} \frac{A_i}{(1 + d_i)^i} \tag{3.4}$$

where $C_i$ is the cost of goods sold at time period $i$, $D_i$ is the cost of capital at time period $i$, $M_i$ is the marketing expenses at time period $i$, $A_i$ is the additional expenses at time period $i$, and $d_i$ is the discount rate at time period $i$.
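The discounting in Eqs. (3.3) and (3.4) and the ROI ratio of Eq. (3.1) can be combined in a short calculation. The sketch below is illustrative only: all cash flows and the 10% discount rate are hypothetical, chosen so that each revenue term discounts to a round number, and the function names are assumptions.

```python
def present_value(cash_flows, discount_rates):
    """Sum of cash_flows[i] / (1 + discount_rates[i])**i for i = 0, 1, ..., n."""
    if len(cash_flows) != len(discount_rates):
        raise ValueError("one discount rate is required per period")
    return sum(
        cf / (1.0 + d) ** i
        for i, (cf, d) in enumerate(zip(cash_flows, discount_rates))
    )


def roi(total_return, investment):
    """(Return - Investment) / Investment."""
    return (total_return - investment) / investment


# Hypothetical three-period campaign with a flat 10% discount rate.
rates = [0.10, 0.10, 0.10]
revenue = [1000.0, 1100.0, 1210.0]
cost_of_goods = [400.0, 440.0, 484.0]
cost_of_capital = [0.0, 0.0, 0.0]
marketing = [600.0, 0.0, 0.0]
additional = [0.0, 110.0, 0.0]

ltv = present_value(revenue, rates)
investment = sum(
    present_value(stream, rates)
    for stream in (cost_of_goods, cost_of_capital, marketing, additional)
)
profit = ltv - investment

print(round(ltv, 2), round(investment, 2), round(profit, 2))
print(round(roi(ltv, investment), 4))
```

Using the discounted lifetime value as the return, rather than first-period revenue alone, is what keeps the ROI of a campaign with delayed revenue from being understated. When several campaigns share an audience, the same functions can be applied to the aggregated streams to avoid the double counting described above.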
■ Common ROI tracking challenges

It is a well-known fact in the marketing community that ROI is crucial but often difficult to track. The following are some common reasons behind the challenge of tracking ROI.

● No clearly defined business objectives and corresponding metrics
● Confusion over what a ‘true’ return is
● Confusion over true return metrics versus other variables such as operational metrics
● No access to prospect, customer, or sales data
● No system or process for integrating multiple data sources
● Information overflow: too much data and too little insight or intelligence
● Data quality issues: data is not clean and reliable
● Unable to quantify marketing contribution to a sale transaction
● Unable to attribute sales to a particular channel such as offline sales due to online marketing
● Long sales cycle hindering proper control of environmental factors and effective tracking
● No cost efficiency threshold to determine when to continue or stop spending
● No prioritization of metrics in terms of their importance
● Inability to quantify marketing impact on the company bottom line: marketing often perceived as a cost center.
The list of challenges can easily be extended. On a positive note, there is growing interest and determination in the marketing community to overcome these challenges wherever they occur. Throughout this book, we will review means and tools that can help address some of the challenges listed above.

■ Process for identifying appropriate metrics

Figure 3-1 shows a step-by-step approach for identifying ROI metrics. This approach ensures that the selected metrics are well aligned with the business objectives and are able to track returns on investments effectively.
[Figure 3-1 Step-by-step process for metrics identification. Step 1: Identification of the overall business objective. Step 2: Understanding the impact of a marketing effort on target audience migration. Step 3: Selection of appropriate marketing communication channels. Step 4: Identification of appropriate return metrics by stage in the sales cycle. Step 5: Construction of ROI metrics with return metrics and investment cost.]
Identification of the overall business objective A business objective is a desired outcome of a marketing effort. The following is a list of common business objectives.
● Increasing brand or product awareness in the target audience
● Educating the target audience
● Generating interest in particular products or services
● Generating leads
● Acquiring new customers
● Minimizing customer attrition or increasing customer loyalty
● Increasing revenue from existing customers by selling them additional products (cross-sell or up-sell)
● Increasing profitability
● Increasing customer satisfaction, renewals, or referrals
● Increasing market share and penetration.
Correct identification of the business objectives is crucial to the selection of appropriate metrics to effectively measure the success of marketing investment. A very common mistake in marketing is the misalignment between business objectives and metrics tracked. For instance, it often happens that the number of leads is tracked in the context of a brand awareness program. Brand awareness programs are meant to increase the awareness level among the audience, not to generate leads. Leads may be generated as a by-product of a brand program, but should not be used as the sole metric to judge the success of the program.
Understanding the impact of a marketing effort on target audience migration After the business objectives have been determined, we must identify the target audience and where it is in the sales cycle. In general, there are five stages in a sales cycle (or sales funnel). They are awareness, interest and relevance, consideration, purchase, and loyalty and referral (Figure 3-2).
[Figure 3-2 A five-stage sales cycle. Stages: awareness, interest and relevance, consideration, purchase, loyalty and referral. Audience labels: prospects, website visitors, inquirers, responders, leads, customers, high-value customers, satisfied customers, advocates.]

The awareness stage

At this stage, prospects are exposed to information about companies, products, or services. This information could be a review of what they already know or completely new information. At this stage, we don’t expect prospects to immediately make purchases. However, their understanding of a company, a product, or service deepens at this stage and their likelihood of making a purchase later increases. There are different types of awareness, such as awareness of a brand, product, or service. We will use the Prozac.com website as an example, illustrated in Figure 3-3.

[Figure 3-3 Depression and Prozac awareness (Source: Prozac.com Website 2006). Screenshot of the Prozac.com home page with the navigation items Home, How Can Prozac Help, Disease Information, and Next Steps, and the welcome text: “Welcome to Prozac.com. Prozac is the most widely prescribed antidepressant medication in history. Since its introduction in 1986, Prozac has helped over 54 million patients worldwide, including those suffering from depression, obsessive compulsive disorder, bulimia nervosa, and panic disorder.”]

The website provides information about depression as a disease and Prozac as a medicine. The ‘Disease Information’ section serves as a high-level education source for depression as a disease. The ‘How PROZAC Can Help’ section gives an overview of Prozac and how it can help depression patients. Both sections help raise awareness about depression and about Prozac as one of the drugs for mitigating depression. These two sections by themselves may not get readers to purchase Prozac right away, however. What is expected is that once depression patients build enough
awareness and knowledge about the disease and the drug, they will be interested in making an inquiry at their doctor’s office or through other channels. Our primary objective at the awareness stage is to accomplish an incremental degree of brand perception by the target audience. Increase in awareness, usually measured through survey studies, is a common return metric at the awareness stage.
The interest and relevance stage

At this stage, prospects may exhibit interest after their awareness (of a brand, product, or service) has been brought to a certain level. They may feel that the product or service is relevant to their needs and preferences and they may respond by requesting more information or filling out a survey. We review the Prozac.com example again. Some web visitors will proceed to the ‘Next Steps’ section once they build enough awareness and interest in Prozac. There are five suggested actions under ‘Next Steps’. They are ‘Asking your Doctor’, ‘Balanced Living’, ‘Support Organizations’, ‘Caring for Others’, and ‘Request More Information’. While ‘Balanced Living’, ‘Support Organizations’, and ‘Caring for Others’ are sections intended to further educate visitors, the other two sections, ‘Asking your Doctor’ and ‘Request More Information’, require some sort of ‘action’ on a visitor’s part. When a visitor takes an action, it means that he has passed the awareness stage and has moved on to the ‘Interest and Relevance’ stage. Any metric that quantifies interest, such as number of responders, is a suitable return metric for this stage.
The consideration stage

At this stage, customers or prospects exhibit sufficient interest to consider a purchase. They are willing to engage with sales or customer service teams in a dialogue about their needs and potential purchases. In the consideration stage, the audience is willing to invest more time in interactions with marketers than in the interest and relevance stage. In the Prozac.com example, sales leads can be generated through different scenarios. The most common scenario is through a doctor’s prescription since usually a doctor will prescribe a drug that suits a patient’s physical and mental conditions. A patient that has gone through the awareness stage and the interest and relevance stage will likely discuss the potential use of Prozac with his doctor. Any metric that quantifies consideration, such as the number of qualified leads, is an appropriate return metric for this stage.
The purchase stage
By the time a prospect reaches this stage, he has a clear need and has gathered sufficient information about a certain product or service. Such prospects are likely to have gone through a trial of the product and are getting close to making an outright purchase. They are convinced that the product or service can address their needs and that they can afford it. In the Prozac example, after a patient has gone through the awareness stage, the interest and relevance stage, and the consideration stage, he is ready to make a purchase. The action of a purchase characterizes the purchase stage. At the purchase stage, we need to quantify buyers and purchases among the target audience. Common return metrics at this stage include number of buyers, number of transactions, total sales, and average sales per transaction.
The loyalty and referral stage
In this stage, customers are very satisfied with a product or service and can be viewed as loyal customers. They begin to spread positive ‘word of mouth’ (WOM) and actively refer others to the product or service. The importance of WOM cannot be overemphasized. WOM is particularly effective when large transactions and large investments on the part of the purchaser are involved. Customers at this stage are the most loyal ones and should be treated with extreme care. In fact, there is a new discipline of marketing called social marketing that capitalizes on customer referrals. Examples of return metrics at this stage are customer satisfaction level, customer tenure, total historical purchase amount, the number of repeat purchases, the number of referrals, and revenue generated as a result of referrals.
So far, we have introduced the process of metrics identification and described the typical sales cycle and the target audience engaged at each stage of this cycle. Now we focus on the marketing communication channels that are best suited for each stage of the sales cycle. To some extent, the selection of marketing communication channels drives metric selection.
Selection of appropriate marketing communication channels
After identifying where to migrate the audience within the sales cycle, we need to leverage the most effective marketing communication channels to accomplish the migration. This section provides an overview of how various marketing communication channels are utilized.
Marketing communication channels are classified as online or offline channels. Table 3-1 classifies the commonly used marketing communication channels by their online or offline nature.
Table 3-1 Key marketing communication channels (offline vs. online)
Online (Internet)                            Offline
Banner                                       TV
Search                                       Print (e.g., newspaper/FSI, magazine)
Online community                             Radio
Website                                      Billboard
E-mail (including electronic newsletter)     Physical store
Webinar                                      Direct mail (e.g., catalog, postcard, letter, newsletter)
                                             Telemarketing
                                             Trade show
                                             Seminar
Another way to classify marketing communication channels is by whether they are mainly used for broad reach or targeting. Table 3-2 shows a classification of key marketing communication channels in this manner.
Table 3-2 Key marketing communication channels (broad reach vs. targeting)
Broad reach                                  Targeting
Banner                                       Direct mail (e.g., catalog, postcard, letter, newsletter)
Online community                             Telemarketing
Website                                      Trade show
TV                                           Seminar
Print (e.g., newspaper/FSI, magazine)        E-mail (including electronic newsletter)
Radio                                        Webinar
Billboard                                    Search
Physical store
As a general rule of thumb, we use broad reach channels to communicate to a broad audience in the initial stages of the sales cycle, such as awareness, and apply targeting channels to reach specific individuals in later stages. Table 3-3 shows how marketing channels are often used across the five stages in the sales cycle. Please note that some marketing communication channels may be used for both broad reach and targeting.
Table 3-3 Marketing communication channels by stage in sales cycle
Awareness: TV, Print, Radio, Billboard, Direct mail, E-mail, Online banner, Search, Website
Interest and relevance: TV, Print, Radio, Billboard, Direct mail, E-mail, Online banner, Search, Website
Consideration: TV, Print, Radio, Billboard, Direct mail, E-mail, Online banner, Search, Website, Telemarketing, Physical store
Purchase: Direct mail, E-mail, Search, Website, Telemarketing, Physical store
Loyalty and referral: Direct mail, E-mail, Search, Website, Telemarketing, Physical store, Community
In what follows, we give a summary overview of marketing communication channels.
Broadcast channels
TV, radio, billboards, newspapers, and magazines are often used to reach a broad audience to generate awareness. However, there are ways to target a more specific type of audience. In the case of newspapers and magazines, those who read the Wall Street Journal have a different demographic profile than those who read other papers. The Wall Street Journal caters to affluent professionals. Another example is the ‘Parent’ magazine, which appeals to an audience with children.
Online advertising
This group of media caters to either a broad or a more targeted audience. For example, the home page of Yahoo.com attracts a broad audience while the various sections of the site attract more targeted audiences. Visitors to the Yahoo Finance site tend to be more interested in finance and investment, while those visiting Yahoo Travel are interested in travel. Next we provide an overview of online advertising (advertorial.org 2006 and iab.com 2008).
The content of online advertising can be text, standard graphics (GIF, Flash), or rich media. A rich media ad is a web ad that uses advanced technologies such as streaming video and applets. Online ads also have many different formats in terms of style and size. Sponsored text links are one of the latest trends in online advertising. Although less flashy than rich media, text links are often perceived as content rather than advertising. The word ‘advertorial’ is a combination of two words, advertising and editorial. It refers to an advertisement written in the form of an editorial to give an appearance of objectivity.
A full banner (468 × 60) is the classic format (468 pixels in width and 60 pixels in height) and usually resides at the top of a web page. Even though newer, smaller formats are being utilized, this banner format still delivers some of the best results. The sheer size of this format gives it the ability to attract more attention. In 2001, a group of new size formats was introduced to allow for a more flexible integration of online ads into web content. There are four common rectangular formats and one common square format.
● Rectangle: 180 × 150 pixels
● Medium Rectangle: 300 × 250 pixels
● Large Rectangle: 336 × 280 pixels
● Vertical Rectangle: 240 × 400 pixels
● Square: 250 × 250 pixels
A leaderboard is a popular format that was originally used in sports. A leaderboard usually sits between the title area of a web page and its content. Its standard size is 728 × 90 pixels, and it can consist of text and animation. Some newer formats have been developed to utilize the extra space of a web page or to make web pages more exciting. These newer formats include skyscraper, interstitial, superstitial, floating ad, pop-up, pop-under, and rollover.
A skyscraper is an economical way of using web space. Contrary to the traditional banners that use horizontal space, a skyscraper uses vertical space. The standard formats of a skyscraper are 120 × 600 pixels and 160 × 600 pixels; the latter is called a wide skyscraper.
An interstitial ad is an ad that appears on a website when a visitor clicks a link to a content page on a site. The visitor will first see the interstitial ad before seeing the requested page. Interstitial ads need to be used very carefully as visitors may find this type of ad intrusive.
A superstitial is an interactive (and sometimes entertaining) online ad with a flexible size. The first superstitial was designed for the Super Bowl event in 2000. Superstitials can have animation, sound, and even video elements. They have the look and feel of a television ad. Like interstitial ads, superstitial ads are activated when a visitor goes from one page to another.
A floating ad is an online advertising format that is superimposed on the web page that a visitor requests. It is usually triggered either when the visitor rolls over an ad or when the content page loads. It usually disappears automatically after 5–30 seconds.
A pop-up is a small window that is initiated by a single or a double mouse click. The small window usually sits on one area of the web page that a visitor is viewing. A visitor can get rid of a pop-up by closing the small window. A pop-up is often considered intrusive and should be used with caution. There are many pop-up blockers on the market now that block activation of pop-up ads.
The pop-under ad is one of the latest innovations in online advertising. Unlike pop-ups that block part of the content on a web page, a pop-under is a small window that appears under the main window of a site. In general, pop-under ads are considered less intrusive than pop-up ads.
A rollover ad is an interesting format that allows marketers to maximize use of the web real estate. Graphics or messages are displayed on the same banner whenever visitors rest the mouse for a moment over the banner, or when they click on the banner.
Search engine marketing
Search engine marketing (SEM) as a marketing communication channel has gained significant traction lately. SEM has shown measurable impact on marketing in generating responders, leads, and buyers. Those who search for certain keywords are actively seeking information on a particular subject matter and therefore are already in the interest and relevance stage of the sales cycle. There are two types of listings: natural listings (also called organic or editorial listings) and paid listings. Natural listings are free. Search engines such as Yahoo and Google send out ‘crawlers’ or ‘robots’ to comb websites and pages across the Internet and record relevant web pages in an index. If the content of a web page is relevant to a particular topic, then the web page will be indexed under that topic. When someone searches for that topic, this web page will be displayed, along with other web pages, in the search results. In contrast, paid listings require advertisers to pay a fee to search engine companies. Search engine marketing allows an advertiser to promote his product or service by displaying the product or service description (usually called an ad copy) and a link as part of the search result listings. Advertisers bid on keywords for their products or services to appear in prominent positions in search results. Their ad copies are listed according to their ranking in the bid.
The higher an advertiser is willing to bid on a keyword, the better his ranking is likely to be. A top ranking ensures that an ad copy will appear at the top of a paid listing section. Advertisers are charged only when someone clicks on the paid listing, and they pay by the number of clicks. Advertisers do not pay if there is no click on their listings. In the example illustrated in Figure 3-4, where a search for ‘digital camera’ is submitted, the listings appearing under ‘Sponsored Links’ at the top and on the right-hand side are paid listings. In this example, Dell holds the top position for the keyword ‘digital camera’ in the sponsored links section at the top of the web page. Listings that are not under ‘Sponsored Links’ are natural listings.
Figure 3-4 Search results for ‘digital camera’.
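The bid-and-click mechanics of paid listings described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of how any search engine actually ranks or bills: the advertiser names, bids, and click counts are hypothetical, ranking is assumed to depend on the bid alone, and the charge per click is assumed to equal the bid.

```python
from dataclasses import dataclass


@dataclass
class PaidListing:
    advertiser: str        # hypothetical advertiser name
    bid_per_click: float   # hypothetical keyword bid, in dollars
    clicks: int = 0        # clicks received on the listing


def rank_listings(listings):
    """Order paid listings from highest to lowest bid (bid-only assumption)."""
    return sorted(listings, key=lambda item: item.bid_per_click, reverse=True)


def click_charges(listings):
    """Charge each advertiser bid x clicks; no clicks means no charge."""
    return {item.advertiser: item.bid_per_click * item.clicks for item in listings}


listings = [
    PaidListing("Advertiser A", 1.50, clicks=200),
    PaidListing("Advertiser B", 2.00, clicks=0),
    PaidListing("Advertiser C", 0.75, clicks=40),
]

ranked = rank_listings(listings)
charges = click_charges(listings)

print([item.advertiser for item in ranked])
print(charges)
```

In this sketch the highest bidder takes the top position but pays nothing because its listing received no clicks, which mirrors the pay-per-click rule in the text.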
Corporate websites
Corporate websites have become an important marketing communication channel. A well-designed and implemented company website can serve multiple purposes across the sales cycle. In the awareness stage, a company’s website can be used to build brand awareness among its visitors and educate them on the company’s products and services. In the interest and relevance stage, the site can be used as a venue for visitors to register for more information and opt in for newsletters. Onsite search functions provide visitors with additional convenience in locating the products that they are interested in and therefore move them further into the sales cycle. In the consideration stage, web registrants and opt-in members can be further screened for qualified leads and potential buyers. In the purchase stage, the site can serve as an e-commerce marketplace where buyers purchase directly from the marketer. Websites with a look-up feature for account history make it more convenient for buyers to order the same products repeatedly or to review their past purchases, and such a feature can help convert buyers into repeat buyers. In the referral stage, a website can offer blogs, online communities, forums, or chat rooms to solicit feedback and stimulate WOM marketing. There is an emerging trend of turning a company website into an activity ‘hub’ that serves customers wherever they are in the sales cycle. According to a 2006 report (Webtrends 2006), 56% of the CMOs surveyed were using or planning to use, within one year, their company websites as a hub of marketing strategy for building relationships.
Direct mail
Direct mail is one of the first media designed for one-to-one direct marketing. Direct mail addresses particular individuals or organizations. Direct mail formats range from postcards and letters to catalogs. Usually there is a clear call to action for direct mail recipients. For example, a business reply card (BRC) may be enclosed for a recipient to fill out and return via a business reply envelope (BRE). Sometimes, a web URL or a toll-free number is given in direct mail for a recipient to visit a website or make an inquiry call. Those who fill out a BRC, visit a website, or call a toll-free number are called responders.
E-mail
Like direct mail, e-mail is an excellent medium for one-to-one marketing, soliciting responses, and generating leads. Compared with direct mail, e-mail is less formal but can be delivered faster and more cheaply.
Newsletters
Newsletters can be used in multiple stages in the sales cycle. They can be used to educate prospects and raise their awareness of a product or to generate responses, leads, or sales. Furthermore, newsletters can also be used to cross-sell or up-sell additional products to existing customers. In terms of format, newsletters can be in either print or electronic form. Print newsletters addressed to specific individuals or organizations are a form of direct mail. Electronic newsletters addressed to specific individuals or organizations are usually in e-mail format.
Telemarketing
Telemarketing is another medium for one-to-one marketing beyond the awareness stage. This medium is usually more expensive than direct mail or e-mail. However, telemarketing allows human interaction and intervention in the process, which e-mail and direct mail do not. As a result, telemarketing can be more effective and is widely used as a lead qualification and follow-up tool.
Physical stores
Physical stores are currently where most sales transactions take place, and this is particularly true of high-value and high-consideration products. Stores serve audiences across the various stages in a sales funnel and are places where prospects, potential leads, and existing customers congregate.
Tradeshows and seminars
Tradeshows and seminars are two marketing media mainly used for lead generation purposes. Occasionally, marketers use tradeshows and seminars to educate prospects on complicated products or services as an awareness generation mechanism. However, the use of tradeshows and seminars for awareness generation is usually expensive.
Webinars
Webinars are electronic seminars that have recently gained in popularity. Webinars are cost-effective and can serve multiple purposes in the sales cycle. They can be used to educate prospects, solicit responses, or generate leads. Webinars also allow for real-time interaction and questions and answers (Q&A) and, as a result, make the engagement process more interesting and effective.
Identification of appropriate return metrics by stage in the sales cycle
There are five groups of return metrics, in alignment with the five stages of the sales cycle.
Return metrics at the awareness stage
This group of return metrics measures awareness of brands, products, or services. Some of these metrics, such as number of recalls, are direct measures of awareness. Indirect (proxy) measures of awareness, such as number of impressions or reaches, are used under the assumption that they will ultimately lead to awareness buildup. In some situations, it is difficult or expensive to directly measure the level of awareness, so indirect measures are used as an alternative to direct measures. For corporations with good name recognition, such as Cisco, Google, Microsoft, and Coca-Cola, brand equity value can be one additional measure of brand awareness. Brand equity value is the monetary value of a brand. Table 3-4 shows a list of the top twenty corporate brands. Table 3-5 shows common return metrics at the awareness stage.
Table 3-4 Top twenty corporate brands in 2005 (source: Businessweek 2005)
Ranking   Brand               Equity value (in millions)
1         Coca-Cola           $67,525
2         Microsoft           $59,941
3         IBM                 $53,376
4         GE                  $46,996
5         Intel               $35,588
6         Nokia               $26,452
7         Disney              $26,441
8         McDonald’s          $26,014
9         Toyota              $24,837
10        Marlboro            $21,189
11        Mercedes            $20,006
12        Citi                $19,967
13        Hewlett-Packard     $18,866
14        American Express    $18,559
15        Gillette            $17,534
16        BMW                 $17,126
17        Cisco               $16,592
18        Louis Vuitton       $16,077
19        Honda               $15,788
20        Samsung             $14,956
Table 3-5 Common return metrics at the awareness stage
Direct return metrics at awareness stage     Proxy return metrics at awareness stage
Number of recalls                            Number of impressions
Number of media mentions                     Number of reaches
Awareness increase measured by survey        Brand value/equity
Return metrics at the interest and relevance stage
Genuine interest from prospects emerges when their awareness level reaches a critical point. The target audience at this stage is engaged and responsive to marketing solicitations, searching for a specific topic, requesting information, or responding to marketing campaigns. Return metrics at the interest and relevance stage measure the extent to which the audience is engaged or responsive to marketing stimulation. Here is a list of common return metrics at the interest and relevance stage.
● Number of website visitors
● Number of unique website visitors
● Number of new website visitors
● Number of repeat website visitors
● Number of website page views
● Number of clicks on a particular link
● Number of responses to website offers.
In defining these metrics, care must be taken that the quantities used properly reflect the dimensions involved in the marketing program. For instance, the number of website visitors may refer to a particular location or a specific period of time. The next example shows the use of website visitor data to measure the return of a website at the interest and relevance stage. Company A launched a website featuring its new product at the end of September 2005. The main purpose of the site was to increase interest in the new product. The company tracked the number of unique visitors as the return metric with a web analytics tool. Table 3-6 shows a steady growth in the number of unique visitors to the site.
Return metrics at the consideration stage
This group of metrics measures the effectiveness of marketing programs in generating leads. Leads are defined as those who have sufficient awareness of and interest in a particular product or service to contemplate making a purchase. The number of leads is a common return metric at this stage.
Table 3-6 Unique visitors to company A’s website
Month            Number of unique visitors
October 2005     85,006
November 2005    110,193
December 2005    134,500
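To make the "steady growth" in Table 3-6 concrete, the month-over-month change in the return metric can be computed directly. The following is a small illustrative sketch; the function name is our own, and only the three visitor counts come from the table.

```python
# Unique visitors to company A's website, taken from Table 3-6.
unique_visitors = {
    "October 2005": 85_006,
    "November 2005": 110_193,
    "December 2005": 134_500,
}


def month_over_month_growth(series):
    """Return the percentage growth between consecutive values."""
    values = list(series.values())
    return [
        (current - previous) / previous * 100
        for previous, current in zip(values, values[1:])
    ]


growth = month_over_month_growth(unique_visitors)
for month, pct in zip(list(unique_visitors)[1:], growth):
    print(f"{month}: {pct:.1f}% more unique visitors than the prior month")
```

Both monthly changes are positive, although the percentage gain from November to December is smaller than the gain from October to November.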
Return metrics at the purchase stage
This group of metrics includes metrics such as the number of transactions, sales revenue, and average purchase amount per transaction (a.k.a. AOV, average order value). Here are some common return metrics at the purchase stage.
● Number of transactions
● Number of buyers
● Total revenue
● Average value per transaction.
The next example illustrates the return metrics at the purchase stage. Company A, a cell phone company, launched an e-mail program targeting 50,000 existing customers to persuade them to renew their phone service plans at an annual subscription fee of $600. The e-mails had a link to a website where customers could renew their subscriptions online. Five hundred of those targeted by the e-mail renewed their phone plans online, resulting in direct sales revenue of $300,000. The investment cost of the program was $40,000, so the net profit was $260,000. Company A achieved a $300,000 direct sales return with an investment of $40,000. We summarize these statistics in Table 3-7.
Table 3-7 Return metrics of company A’s e-mail program
Return metrics at purchase stage    Value
Number of transactions              500
Number of buyers                    500
Total revenue                       $300,000
Average value per transaction       $600
Company A was not able to track offline sales resulting from the e-mail program. Offline sales occurred when customers received the e-mail but decided to call to renew instead of doing so online.
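The arithmetic behind the e-mail renewal example can be reproduced with a short sketch. The input figures come from the example; the conversion rate and the return-on-investment ratio (net profit divided by investment) are assumed definitions added for illustration and may differ from the ROI formula adopted elsewhere in the book.

```python
# Figures from the company A e-mail renewal example.
customers_targeted = 50_000
buyers = 500
price_per_renewal = 600      # annual subscription fee, in dollars
program_cost = 40_000        # investment, in dollars

# Return metrics at the purchase stage.
transactions = buyers                        # one renewal per buyer
total_revenue = transactions * price_per_renewal
average_value = total_revenue / transactions
net_profit = total_revenue - program_cost

# Assumed definitions, not stated in the example itself.
conversion_rate = buyers / customers_targeted   # an operational metric
roi = net_profit / program_cost                 # net profit per dollar invested

print(f"Total revenue: ${total_revenue:,}")
print(f"Average value per transaction: ${average_value:,.0f}")
print(f"Net profit: ${net_profit:,}")
print(f"Conversion rate: {conversion_rate:.1%}")
print(f"Net profit per dollar invested: {roi:.2f}")
```

Because offline renewals were not tracked, these figures cover only the online portion of the program's return.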
Return metrics at the loyalty and referral stage
This group of metrics measures the depth of the relationship between a marketer and its customers. Some examples of this type of metric are customer retention rate, length of customer tenure, the number of purchases per year, and customer lifetime value. Loyal customers can refer people to the brands that they are familiar with and, as a result, generate potential future business for those brands. Given the popularity of blogs and other types of online communities, loyalty and referral metrics are expected to exert increasing influence on purchase behavior. Here is a list of common return metrics at the loyalty and referral stage.
● Lifetime value (LTV)
● Purchase frequency
● Tenure (length of time since becoming a customer)
● Number of referrals
● Revenue due to referrals
● Customer testimonials
● Customer satisfaction.
There are other metrics, called proxy metrics, that measure loyalty or referral indirectly. Customer satisfaction is an example of a proxy metric used under the assumption that satisfied customers are more loyal. Customer satisfaction is a very important metric in major corporations, and is often used as one of the criteria for measuring marketing executives’ performance and compensation. The most common method for acquiring customer satisfaction data is survey analysis. Next we discuss an example illustrating customer satisfaction survey analysis.
Company A, a consumer electronics manufacturer, recently revamped its website to enhance visitor experience. New features of the site include a store locator page with information on the nearest store locations and store phone numbers, a shopping cart for online direct purchases, and pages of promotional offers. The company ran an online survey before and after the new site launch to gauge how customer satisfaction may have changed due to the new site features. Satisfaction was measured on a scale from one (not satisfied at all) to nine (extremely satisfied). The survey result shows that the average visitor satisfaction increased from 5.2 to 6.4, a 23% improvement with an 80% statistical confidence level.
One frequently asked question is how one can tie customer satisfaction improvement to incremental revenue. Surveys are usually conducted in an anonymous fashion and their results cannot be easily mapped to the revenue-generating customer base. We need to measure customer satisfaction and revenue consistently in the same audience and apply data mining analysis to determine the level of correlation between customer satisfaction and revenue.
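As a rough illustration of the two calculations in this example, the sketch below reproduces the 23% satisfaction improvement and computes a Pearson correlation between satisfaction and revenue. The satisfaction averages (5.2 and 6.4) come from the text; the per-customer satisfaction scores and revenue figures are hypothetical, and Pearson correlation is only one of several measures an analyst might choose.

```python
import math


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Average satisfaction before and after the site relaunch (from the text).
improvement = percent_change(5.2, 6.4)
print(f"Satisfaction improvement: {improvement:.0f}%")

# Hypothetical data: satisfaction score and annual revenue for the SAME customers.
satisfaction = [3, 5, 6, 8, 9]
revenue = [120, 200, 260, 310, 400]
r = pearson(satisfaction, revenue)
print(f"Correlation between satisfaction and revenue: {r:.2f}")
```

The correlation is only meaningful when both series describe the same customers, which is why anonymous surveys are hard to tie to revenue.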
■ Differentiating return metrics from operational metrics
So far, we have identified an appropriate ROI formula and the key return metrics across the five stages in the sales cycle. However, it is still a common challenge to distill the enormous amount of available marketing data. This challenge arises primarily from the difficulty in differentiating return metrics from operational metrics. The key difference between these two types of metrics is that the former indicates an end result while the latter focuses on a process. Operational metrics track the footprints of an audience as they migrate from one stage to the next or within the same stage in a sales cycle. The majority of metrics are in fact operational metrics. To better illustrate the difference between return and operational metrics, we will go through some exercises to identify appropriate return and operational metrics in the sales cycle.
If the desired end result is to move the audience from the awareness stage to the interest and relevance stage and the marketing communication channel is an online banner for generating clicks on ads, then the return metric should be the number of clicks. The operational metrics are those that measure how effectively impressions turn into clicks; the click-through rate is an example. It is a common mistake to treat the click-through rate as the return metric.
If the desired end result is to move the audience from the interest and relevance stage to the consideration stage, and the marketing communication channel is direct mail for generating leads, then the appropriate return metric is the number of leads. The operational metrics are those that measure how effectively responses turn into leads, such as the response-to-lead conversion rate.
If the desired end result is to move the audience from the consideration stage to the purchase stage and the marketing communication channel is outbound phone follow-up for generating sales, then the appropriate return metrics are the number of buyers, revenue amount, and profit. The operational metrics are those that measure how effectively leads turn into buyers, such as the lead-to-buyer conversion rate.
A best practice in campaign reporting is to clearly show the distinction between return metrics and operational metrics. In a campaign performance report, return metrics need to be placed in more prominent positions than operational metrics.
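The distinction drawn in the three exercises above can be sketched with a single set of funnel counts. All counts below are hypothetical and chosen only for illustration; the point is that return metrics are the end-result counts, while operational metrics are the ratios that describe how one stage converts into the next.

```python
# Hypothetical funnel counts for one campaign (illustrative only).
funnel = {
    "impressions": 1_000_000,
    "clicks": 5_000,
    "responses": 500,
    "leads": 100,
    "buyers": 20,
}


def rate(numerator, denominator):
    """Ratio of one funnel count to another (an operational metric)."""
    return funnel[numerator] / funnel[denominator]


# Return metrics: the end results a stage is meant to deliver.
return_metrics = {name: funnel[name] for name in ("clicks", "leads", "buyers")}

# Operational metrics: how effectively one step turns into the next.
operational_metrics = {
    "click-through rate": rate("clicks", "impressions"),
    "response-to-lead conversion rate": rate("leads", "responses"),
    "lead-to-buyer conversion rate": rate("buyers", "leads"),
}

print("Return metrics (report prominently):", return_metrics)
print("Operational metrics (supporting detail):", operational_metrics)
```

Keeping the two groups in separate structures makes it straightforward to give return metrics the more prominent position in a campaign report.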
■ References
Leeflang, P.S.H., D.R. Wittink, M. Wedel, and P.A. Naert. Building Models for Marketing Decisions. Kluwer, Massachusetts, 2000.
Marketer’s Guide to Media. Mediaweek, New York, 2006.
The advertorial.org website. Montreal, Canada (http://www.advertoroal.org).
Webtrends CMO Web-Smart Report. Webtrends, Portland, OR, 2006.
The Interactive Advertising Bureau website (http://www.iab.net), New York, 2008.
CHAPTER 4
Multichannel Campaign Performance Reporting and Optimization
In this chapter, we focus on the tracking and analysis of marketing returns from multi-channel campaigns. Marketers often use more than one communication channel in a campaign because different communication channels tend to appeal to different segments within a target audience. In the high-tech business-to-business marketplace, for instance, marketers often use direct mail to target business decision-makers (BDM) and online channels to target technical decision-makers (TDM).
■ Multi-channel campaign performance reporting
It is a constant challenge to report campaign performance on multiple communication channels. The challenge is compounded by a lack of clarity about the different roles that different metrics play. Return metrics can be the same across different channels, but operational metrics are often channel-specific. Figure 4-1 illustrates a systematic thought process for identifying appropriate metrics for multi-channel reporting.
Step 1: Identify all marketing communication channels and their associated cost and target volume
Step 2: Identify the overall return or success metrics. Aggregate and track these metrics across channels
Step 3: Select marketing channel specific return or success metrics
Step 4: Identify operational metrics by marketing channel
Step 5: Uncover operational metrics highly associated with channel return
Figure 4-1 Metrics identification process for multi-channel campaign reporting.
The first step is to identify all the marketing communication channels in the campaign and to track their associated cost and target volume. Consider the following example: Company A launched a multi-channel campaign with the objective of generating leads to move its target audience to the consideration stage of the sales cycle. The campaign consisted of four channels: direct mail, e-mail, online banners on external websites, and paid search. The cost and the target volume of the four channels are detailed in Table 4-1.
Table 4-1 Cost and target volume of a multi-channel campaign of company A
Marketing communication channel   Total marketing cost (agency and media costs)   Target volume
Direct mail                       $500,000                                        1,000,000 pieces mailed
E-mail                            $300,000                                        400,000 e-mails delivered
Online banners                    $250,000                                        125,000,000 impressions
Paid search                       $750,000                                        500,000 clicks
Total                             $1,800,000                                      –
The second step is to identify the overall return or success metrics of the campaign, to aggregate these metrics across all marketing communication channels, and to track them. In the example of company A, the overall return or success metric is the number of leads generated by the campaign. However, some channels might have purposes beyond driving the number of leads. For instance, online banners are used to raise awareness as well as to generate leads. We can roll up the number of leads from online banners and other channels to derive the total number of leads for the campaign, but it is important to remember the additional purpose of each channel. At this stage, it is important to calculate the overall returns and the returns by channel. Channels with higher returns should be invested in more heavily. When returns are measured in nonfinancial terms such as number of leads, then the cost per lead is the metric to optimize. Table 4-2 shows the key return or success metrics for the company A campaign.
Table 4-2 The rollup of return metrics
Marketing communication channel   Total marketing cost (agency and media costs)   Leads     Cost per lead
Direct mail                       $500,000                                        5,000     $100
E-mail                            $300,000                                        9,000     $33
Online banner                     $250,000                                        1,250     $200
Paid search                       $750,000                                        10,000    $75
Total                             $1,800,000                                      25,250    $71
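The cost-per-lead column in Table 4-2 follows from dividing each channel's cost by its leads. A minimal sketch of that rollup is shown below; the costs and lead counts are taken from the table, while the data structure and function name are illustrative choices of our own.

```python
# Cost and leads by channel, taken from Table 4-2.
campaign = {
    "Direct mail": {"cost": 500_000, "leads": 5_000},
    "E-mail": {"cost": 300_000, "leads": 9_000},
    "Online banner": {"cost": 250_000, "leads": 1_250},
    "Paid search": {"cost": 750_000, "leads": 10_000},
}


def cost_per_lead(cost, leads):
    """Dollars spent for each lead generated."""
    return cost / leads


by_channel = {name: cost_per_lead(**row) for name, row in campaign.items()}

# Roll the return metric (leads) and the cost up across channels.
total_cost = sum(row["cost"] for row in campaign.values())
total_leads = sum(row["leads"] for row in campaign.values())
overall = cost_per_lead(total_cost, total_leads)

# Channels ordered from cheapest to most expensive cost per lead.
ranked = sorted(by_channel, key=by_channel.get)

for name in ranked:
    print(f"{name}: ${by_channel[name]:,.0f} per lead")
print(f"Overall: ${overall:,.0f} per lead")
```

On these figures, e-mail is the cheapest source of leads and online banners the most expensive, though the banners also serve the awareness purpose noted above, which cost per lead alone does not capture.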
The third step in selecting appropriate metrics is to identify the channel-specific return or success metrics. These channel-specific return or success metrics can be viewed as mini goals or intermediate return or success metrics. In the example of company A, the ultimate return or success metric of the campaign is the number of leads. Within the online banner channel, the number of responses and the number of clicks can be considered as intermediate return or success metrics for this particular channel.
The fourth step is to identify all potential operational metrics by marketing communication channel. Operational metrics are usually channel-specific and may not be rolled up across channels. For example, click-through rate is an operational metric that only applies to online channels and cannot be applied to direct mail. Since there may be hundreds of operational metrics, it is important to take the additional step of identifying those with the highest impact on the returns of the channel.
The fifth and last step in the identification process is to uncover operational metrics highly associated with channel returns. This is where data mining is extremely useful and should be fully leveraged. Appropriate adjustments in the values of operational metrics within each channel can maximize the overall returns of the campaign. For example, in the case of online banners, the number of impressions, click-through rate, and response rate are potential operational metrics. By improving any of these three operational metrics, company A can increase the number of leads generated by its online banners and therefore increase the number of leads generated by the overall campaign. Table 4-3 through Table 4-6 show the common operational metrics for direct mail, e-mail, online banner, and search marketing from the awareness stage to the conversion stage of the sales cycle.
Table 4-3 Common operational metrics for direct mail from the awareness stage to conversion stage
Marketing communication channel   From awareness to interest and relevance   From interest and relevance to conversion
Direct mail                       Response rate (responses/mail quantity)    Lead conversion rate (leads/responses)
Table 4-4 Common operational metrics for e-mail from the awareness stage to conversion stage

| Marketing communication channel | From awareness to interest and relevance | From interest and relevance to conversion |
|---|---|---|
| E-mail | Open rate (e-mails opened/e-mails delivered) | Lead conversion rate (leads/responses) |
| | Click-through rate (clicks on links in e-mails/e-mails delivered) | |
| | Response rate (responses to an offer/e-mails delivered) | |

Table 4-5 Common operational metrics for online banner from the awareness stage to conversion stage

| Marketing communication channel | From awareness to interest and relevance | From interest and relevance to conversion |
|---|---|---|
| Online banner | Click-through rate (clicks on banner/impressions) | Lead conversion rate (leads/responses) |
| | Response rate (responses to an offer/clicks on banner) | |

Table 4-6 Common operational metrics for paid search from the awareness stage to conversion stage

| Marketing communication channel | From awareness to interest and relevance | From interest and relevance to conversion |
|---|---|---|
| Paid search | Click-through rate (clicks on Webpage links/impressions) | Lead conversion rate (leads/responses) |
| | Response rate (responses to an offer/clicks on Webpage links) | |

■ Multi-channel campaign performance optimization

The purpose of campaign performance optimization is to maximize the cost-effectiveness level of a return metric by leveraging what can be learned from the past. Campaign optimization can be viewed as an extension of campaign reporting. Additional steps to optimize future campaigns need to be taken after the five-step campaign performance reporting process introduced in the previous section is completed, as shown in Figure 4-2.

Figure 4-2 Campaign optimization process.
Step 1: Identify metrics for optimization
Step 2: Determine optimization timeframe, frequency, and tool
Step 3: Identify key operational metrics with highest impact on the optimization metrics
Step 4: Identify factors that influence key operational metrics values
Step 5: Apply learning to future campaign planning and execution

The first step is to identify the metrics to optimize. Optimization metrics should be aligned with the overall campaign return or success metrics. The rationale is apparent: performance optimization aims to cost effectively increase the value of the return or success metrics. In the example of company A, the overall return or success metric is the number of leads. Company A can keep spending money to generate more leads but at some point it will no longer be cost-effective to do so given the potential diminishing returns. The solution is to maximize the number of leads with a cost per lead threshold in place. How can company A derive this cost efficiency threshold? One way is to derive the expected average profit per lead and make sure that the cost will never run above the expected profit.

The second step is to determine the optimization time frame and the tool required for optimization. How frequently an optimization strategy is revisited depends on the marketing communication channel utilized. Sufficient time needs to pass before any conclusions about the strategy are drawn. The time frame and frequency of optimization should be close to the time frame required for a marketing communication channel to achieve its full result. For example, results on metrics such as clicks, click-through rate, and responses for banner, search, and e-mail can usually be measured close to real time with appropriate online analytic tools. In this case, campaign performance can be tracked in real time and
optimized accordingly. Marketing dollars can be shifted from underperforming media sites to over-performing ones. In contrast, direct mail requires a much longer time period (one month or more) to generate final response results.

In recent years, there has been significant advancement in analytic and optimization tools, including management and optimization tools for the web, ad serving, search, campaign management, lead and sales tracking, as well as customer relationship management (CRM). Before implementing full-scale deployment of any analytic and optimization tools, it is important to test the tool through a business proof of concept pilot with the tool vendors.

The third step in campaign optimization is to analyze the data so far collected, and to identify key operational metrics based on their impact on the optimization metrics. In the example of company A, it is clear that the company needs to optimize the number of leads acquired cost effectively. To accomplish this, we need to identify the most important contributors to the number of leads. Response to lead conversion and number of responses are examples of these contributors. An increase in response to lead conversion rate at a given cost and a given number of responses will result in an increase in the number of leads. Alternatively, an increase in the number of responses at a given cost and a given response to lead conversion rate can result in an increase in the number of leads.

Discovering these influential factors is crucial for optimizing future campaign performance. There are occasions when it is not obvious which factors are influential. When this is the case, we can use data mining techniques to uncover hidden relationships. Data mining techniques such as Classification and Regression Tree (CART) can be used to analyze the relationship between potential influential factors and the number of leads acquired at a given cost.
Logistic regression is another data mining technique that can be used to build models to target those who are more likely to convert to leads. Chapter 7 discusses various data mining techniques that can be leveraged to uncover hidden relationships.

The fourth step in campaign optimization is to identify attributes that can be manipulated to influence the values of key operational metrics. In the previous example of an online banner, marketing messaging is an attribute that may be changed to drive the response to lead conversion rate and hence the number of leads. Where can we find potential influential attributes? Data pertaining to any of the following areas are candidates:

● Target-audience characteristics such as lifestyle and socioeconomic status
● Stages in the sales cycle such as awareness stage and conversion stage
● Attributes of marketing communication such as creative and messaging
● Marketing and sales operations such as customer services and fulfillment
● Features of marketing campaigns such as rebates and discounts.

In the fifth step, the learning from the current campaign should be applied to future campaigns to optimize marketing planning and execution. Ongoing tests and learning environments can lead to optimal marketing efforts and to sustained high returns on marketing investment.
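
The arithmetic behind steps one and three can be sketched in a few lines of Python. This is a minimal illustration, not the authors' method: it chains the online banner ratios of Table 4-5 into a lead count and applies the cost per lead threshold described above. The function names and all campaign figures (impressions, rates, spend, expected profit per lead) are hypothetical assumptions chosen for the example.

```python
def expected_leads(impressions, click_through_rate, response_rate, lead_conversion_rate):
    """Chain the online banner ratios of Table 4-5 into a lead count."""
    clicks = impressions * click_through_rate    # clicks on banner / impressions
    responses = clicks * response_rate           # responses to an offer / clicks on banner
    return responses * lead_conversion_rate      # leads / responses


def cost_per_lead(total_cost, leads):
    """Average cost of acquiring one lead."""
    if leads <= 0:
        raise ValueError("leads must be positive")
    return total_cost / leads


def is_cost_effective(total_cost, leads, expected_profit_per_lead):
    """True when the cost per lead does not exceed the expected profit per lead."""
    return cost_per_lead(total_cost, leads) <= expected_profit_per_lead


# Hypothetical banner campaign; every number below is an illustrative assumption.
SPEND = 4_000.0
PROFIT_PER_LEAD = 75.0

baseline = expected_leads(1_000_000, 0.002, 0.10, 0.25)
improved = expected_leads(1_000_000, 0.002, 0.10, 0.30)  # higher lead conversion rate

for label, leads in (("baseline", baseline), ("improved", improved)):
    print(label, round(leads, 1), round(cost_per_lead(SPEND, leads), 2),
          is_cost_effective(SPEND, leads, PROFIT_PER_LEAD))
```

With these assumed inputs, raising the lead conversion rate at the same spend and the same number of responses increases the number of leads and brings the cost per lead under the threshold, which is the effect the third step describes.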
Uncovering revenue-driving factors

Revenue is often the return or success metric that most marketing executives choose to focus on. Given the importance of revenue, we will go over some common practices on how to best identify revenue-driving factors. The key to understanding revenue-driving factors is to understand the target audience and where they are in the sales cycle. Customer segmentation is the tool for understanding the target audience. There are numerous ways of segmenting and profiling a customer base. The following are some common practices for uncovering revenue opportunities by segmentation.

● Segmentation of existing customers by value: The value of a customer is defined as the revenue generated by the customer over a period of time. Potential future revenue opportunities can be better identified as a result of this type of value segmentation. Marketing dollars need to be allocated to those customer segments with high growth potential to maximize revenue. In addition, cross-sell and up-sell can be leveraged to increase revenue. In Chapter 7, we will discuss a common cross-sell and up-sell analytic technique called association analysis (market basket analysis).
● Segmentation of the target audience by share of wallet: Share of wallet is defined as the total spending on a brand over the total spending on the category that the brand is under. Consider the supermarket business as an example. Customers who spend most of their grocery dollars with a particular supermarket brand (let us call it supermarket A) are the primary customers of this supermarket brand. In other words, supermarket A owns a large share of wallet of these customers. Those customers who spend most of their grocery dollars at competitors’ stores but also shop at this supermarket brand are its secondary customers with a small share of wallet. Supermarket A can increase its revenue by either increasing the share of wallet of its secondary customers, or increasing the purchase amounts of its primary customers.
● Segmentation of the target audience by likelihood to buy: We need to know where the various audiences are in the sales cycle. Do they barely know the brand and products? Do they know the products so well that they are ready to make purchases? Based on the insight gleaned from segmentation, different types of marketing programs can be created to target different subsegments in the audiences. For example, awareness programs are used to educate those at the awareness stage. Lead generation programs are used to target those who are ready to purchase. Data mining techniques such as logistic regression can be leveraged for targeted marketing. These techniques will be discussed in detail in Chapter 7.
● Segmentation of the target audience by needs: Marketing the right products to the right customers increases returns of marketing programs. Programs targeting specific audiences with specific products are more effective than generic programs.

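
The share of wallet definition in the list above lends itself to a short worked example. The sketch below is only illustrative: the customer identifiers and spending amounts are invented, and the 50% cutoff is an assumption that stands in for "most of their grocery dollars"; the text does not prescribe a numeric threshold.

```python
def share_of_wallet(brand_spend, category_spend):
    """Spending on the brand divided by spending on the whole category."""
    if category_spend <= 0:
        raise ValueError("category spending must be positive")
    return brand_spend / category_spend


def customer_type(share, primary_cutoff=0.5):
    """Label a customer as primary or secondary (the cutoff is an assumption)."""
    return "primary" if share > primary_cutoff else "secondary"


# Hypothetical annual grocery spending: (at supermarket A, across all grocers).
customers = {
    "customer_1": (4_200.0, 5_000.0),
    "customer_2": (600.0, 4_000.0),
}

segments = {
    customer_id: customer_type(share_of_wallet(at_a, total))
    for customer_id, (at_a, total) in customers.items()
}
print(segments)
```

Under these assumptions the first customer gives supermarket A most of the grocery budget and is a primary customer, while the second is a secondary customer whose share of wallet is the growth opportunity.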
In summary, success in multiple-channel marketing campaigns requires consistent focus on the appropriate metrics and analysis of the interrelationships between these metrics. Reporting and optimization, two seemingly tactical areas of marketing, can often drive important marketing investment strategies.
CHAPTER 5
Understanding the Market through Marketing Research
Marketing research is a powerful vehicle for uncovering and assessing market opportunities. In particular, it is an effective tool for addressing the following three sets of questions to ensure effective marketing investment planning.

● Where is the market opportunity? What is the size and growth rate of the opportunity?
● Who is the target audience? What are their profiles and characteristics?
● Why do consumers or businesses choose one product over another? Why do they choose one brand over another?

In this chapter, we give an overview of marketing research and its applications to enhancing marketing returns. We start with a synopsis of how marketing research is applied to understanding the market and then discuss marketing research as a discipline.
■ Market opportunities

Understanding potential market opportunities is the first step in marketing investment planning. Solid knowledge of the market structure and market opportunities minimizes risk and increases returns on marketing investment. A market opportunity can be described by the following parameters.

● Market size
● Market growth
● Market share.

Market size

One way to describe and quantify a market opportunity is through market size information. Market size information can be segmented by attributes such as geography or industry. Syndicated research companies, such as IDC, Gartner, Forrester, Hitwise, Nielsen Media Research, Jupiter Research, and comScore, provide market size information for standard products. For nonstandard products or new products, customized research is required for gathering market size information. Customized research is usually more expensive than syndicated research.

The following is an example of market size syndicated research and data. In the last few years, search marketing has grown rapidly as a marketing and advertising vehicle. Online search marketers are always interested in their keyword rankings (positions) with key search engines. Based on a comScore report (comScore Networks, 2006), the total number of searches in the US grew from 4.95 billion in January 2005 to 5.48 billion in January 2006, a growth rate of 10.7%. Table 5-1 shows that as of January 2006, Google had the largest market share (41.4%), followed by Yahoo (28.7%), MSN (13.7%), Time Warner Network (7.9%), and Ask Jeeves (5.6%). This market share information is important for determining where to invest to maximize exposure to potential customers. The search engines with the largest search market size are usually the most attractive marketing and advertising partners for search marketers.

Table 5-1 Total Internet searches and share of online searches by search engine (Source: comScore 2006)

| | Jan. 2005 | Jan. 2006 |
|---|---|---|
| Total Internet searches (billion) | 4.95 | 5.48 |
| Share of searches by engine (%) | | |
| Google sites | 35.10 | 41.40 |
| Yahoo! sites | 31.80 | 28.70 |
| MSN–Microsoft sites | 16.00 | 13.70 |
| Time Warner network | 9.60 | 7.90 |
| Ask Jeeves | 5.10 | 5.60 |

Market size terminology

We must differentiate between total available market and total addressable market. Syndicated research companies often provide market size information on the total available market. Total addressable market is a subset of the total available market. Due to a variety of factors, companies may only have access to a subset of the total available market. This subset is called the addressable market. For example, a company with no infrastructure in Asia cannot sell into this geographic portion of the market. Therefore, for this particular company, the total addressable market is the total available market minus the Asian market.

Factors that impact market-opportunity dynamics

Many factors can impact market opportunity and its growth. Understanding these factors allows for better marketing planning and more effective buy-in from marketing executives. The most important factors are macroeconomic trends, emerging technologies, and customer needs.

Impact of macroeconomic factors on market opportunities

A large number of elements within the macroeconomy affect market dynamics. We explore the most significant ones: Gross Domestic Product (GDP) growth, geopolitical factors, oil prices, exchange rates, interest rates, unemployment rates, product life cycle, and corporate profits.

● GDP growth: The growth of GDP not only is an indicator of market growth but also affects confidence in the market place and can drive subsequent growth. In other words, GDP growth is both a reflection and a potential driver of future market growth. According to the Bureau of Economic Analysis (2006, http://www.bea.gov/glossary/glossary.cfm), the GDP of a country is the market value of goods and services produced by labor and property in that country, regardless of the nationality of the labor and of those who own the property. The Gross National Product (GNP) of a country is the market value of goods and services produced by labor and property supplied by the residents of that country, regardless of where they are located. GDP replaced GNP as the primary measure of US production in 1991. GDP is a composite measure based on various types of goods and services. Since GDP growth is a composite of growth in the various sectors of the economy, the growth of the larger economic sectors, such as manufacturing, financial services, and government spending, tends to have more influence on the overall GDP growth. Consider the so-called nonresidential equipment and software investment sectors. Figure 5-1 shows that GDP was highly correlated with the nonresidential equipment and software investment sectors from Q1, 2005 to Q1, 2006. It is also noticeable that the nonresidential equipment and software investment sectors tend to have wider swings than GDP. This is likely due to the after-shock effects of GDP data releases, suggesting that a boost in GDP itself tends to boost confidence in the market place and thereby tends to indirectly boost subsequent investment in the two sectors.
● Political uncertainty: Political uncertainty, like economic uncertainty, tends to trigger or hold back business investments and consumer spending. The war in Iraq and the kidnappings of foreigners in the Middle East, for instance, have made investors think twice about their involvement in rebuilding the region. The threat of terrorism (including cyber terrorism) has boosted US government spending on defense and security since 2001, as illustrated in Figure 5-2.

Figure 5-1 GDP versus nonresidential equipment and software investment. Series: GDP; nonresidential equipment and software. Vertical axis: growth (%). Horizontal axis: quarter, 2005:Q1 to 2006:Q1. Source: The Bureau of Economic Analysis, 2007.

Figure 5-2 US GDP versus federal national defense investment. Series: GDP; federal national defense investment. Vertical axis: growth (%). Horizontal axis: year, 2001 to 2003. Source: The Bureau of Economic Analysis, 2007.

● Oil prices: Oil prices usually fluctuate with the political climate in oil-exporting areas such as the Middle East, Latin America, and Africa. The Organization of Petroleum Exporting Countries (OPEC) often adjusts its oil production level based on geopolitical factors. When oil prices go up, costs of production for goods go up and investment is scaled back.
● Exchange rate: The exchange rate has an effect on imports and exports, which in turn affect GDP growth. A weaker currency benefits exports and GDP if all the other factors are kept constant, but it can also increase inflationary pressures because imports become more expensive. A stronger currency inhibits exports.
● Interest rate: Higher interest rates usually have a deterring effect on capital and consumer spending as borrowing costs increase.
● Unemployment rate: The unemployment rate is a reflection of corporate spending and hiring. An increase in unemployment is an indication of weak confidence in the market place and a slowdown in the rate of business expansion.
● Product life cycle: Product life cycle is another factor that influences market opportunities. When a product is approaching the end of its lifetime, the market tends not to invest in this product and as a result its market size shrinks. Customers tend to avoid investing in a product about to become obsolete, and often prefer to wait for the next generation of the product, whose market size may then grow over time.
● Corporate profits: An increase in corporate profits usually has a positive impact on corporate spending in the long run, if not in the short run. Corporations are ready to spend more when executives feel comfortable with the state of business in their firms. Investment banks and research firms regularly survey CEOs, CFOs, and CIOs to gauge their feelings about the economic climate.

All of the factors discussed above have either positive or negative impacts on market growth. Therefore, paying close attention to these factors is extremely important. A number of marketing research companies designate analysts and experts to analyze these factors on a regular basis and compile market size forecasts based on them.
Impact of emerging technologies and customer needs on the market

Technology breakthroughs often have an impact on market growth, although the growth may be initially small as investors wait for full-scale adoption of the new technology. It is important to track emerging technologies or products that may replace existing technologies or products (and eventually eliminate current markets), supplement existing technologies or products (and thereby impact a current market either positively or negatively), or create completely new markets. For example, adoption of radio frequency identification (RFID) in the retail market has driven a demand for this new technology. At one point, Wal-Mart, the largest US retailer, requested some of its suppliers to become RFID-compliant by 2005. This created a sizeable market for RFID products and services.

The creation of the credit card market, which is now a large financial market, is the result of customers’ demand for convenience. Diners Club introduced the first credit card in the 1950s. American Express and Bank of America started issuing their cards in 1958. Over the years, the credit card became indispensable to most consumers and businesses, and as a result, a new and large financial market emerged. The size of credit card receivables in 2001 was over $600 billion in the US.

Market growth trends

Industry or technology analysts often express market growth for the following years (usually a total of five years) with a metric called compound annual growth rate (CAGR). The standard formula for computing CAGR is as follows:

$$\mathrm{CAGR} = \left(\frac{X_e}{X_b}\right)^{\frac{1}{t-1}} - 1 \tag{5.1}$$

where $X_e$ is the market size forecast for time period $t$, $X_b$ the market size forecast for time period 1, and $t$ the number of years in the forecast time period.
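
A minimal Python sketch of equation (5.1) may help make the exponent concrete: with t forecast periods there are only t − 1 growth intervals between period 1 and period t. The function name and the five-year forecast values are hypothetical; the second call reuses the comScore search volumes quoted earlier in this chapter, where two periods span a single year.

```python
def cagr(x_b, x_e, t):
    """Compound annual growth rate per equation (5.1).

    x_b: market size in period 1, x_e: market size in period t,
    t: number of periods in the forecast (so there are t - 1 growth intervals).
    """
    if t < 2:
        raise ValueError("t must cover at least two periods")
    if x_b <= 0:
        raise ValueError("x_b must be positive")
    return (x_e / x_b) ** (1.0 / (t - 1)) - 1.0


# Hypothetical five-year forecast growing from 100 to 146.41.
print(round(cagr(100.0, 146.41, 5), 4))

# Two periods reduce to simple year-over-year growth
# (US searches, Jan. 2005 to Jan. 2006, in billions).
print(round(cagr(4.95, 5.48, 2), 3))
```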
Market share

Market share data indicates how well a company is positioned in a particular market. Those participants with the highest market shares are market leaders, and those with lowest market shares are market laggards. Low market share indicates an opportunity for growth. A firm with a large market share will find it harder to grow further and may seek or create another market. Market share can be expressed in terms of units sold or in dollar amount. The market share of a company during a given time period measured in dollar amount is

$$\frac{\text{Revenues of the company}}{\text{Revenues of the company} + \text{Total revenues of its competitors}} \tag{5.2}$$

Both the denominator and the numerator are in dollar amounts, and therefore market share is a dimensionless quantity. The market share of a company during a given time period measured in number of product units sold is

$$\frac{\text{Units sold by the company}}{\text{Units sold by the company} + \text{Total units sold by its competitors}} \tag{5.3}$$

As before, the market share in this case is a dimensionless ratio. We can also compute the market share of a single product category by using only the revenue or units sold in that category. For example, for a consumer electronics manufacturer, its share in the camera category for a given time period can be computed by either of the two following expressions:

$$\frac{\text{Camera revenues of the company}}{\text{Camera revenues of the company} + \text{Total camera revenues of its competitors}} \tag{5.4}$$

$$\frac{\text{Units of cameras the company sold}}{\text{Units of cameras the company sold} + \text{Total units of cameras sold by its competitors}} \tag{5.5}$$

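
Equations (5.2) through (5.5) share one form, so a single helper covers revenue-based and unit-based shares alike. The sketch below is illustrative only. The revenue example uses Company X's 2006 product A figures from Table 5-6 later in this chapter ($60 million of a $230 million market), with competitor revenue inferred as the difference; the camera unit counts are hypothetical.

```python
def market_share(company_amount, competitors_amount):
    """Company amount over company plus competitors, as in equations (5.2)-(5.5).

    Both arguments must use the same measure (dollars or units).
    """
    total = company_amount + competitors_amount
    if total <= 0:
        raise ValueError("total market must be positive")
    return company_amount / total


# Revenue-based share: 60 out of 230 ($ million), competitors inferred as 230 - 60.
revenue_share = market_share(60.0, 230.0 - 60.0)

# Unit-based share in a single category (hypothetical camera unit counts).
camera_unit_share = market_share(12_000, 48_000)

print(round(100 * revenue_share), round(100 * camera_unit_share))
```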
■ Basis for market segmentation

The ultimate goal of market segmentation is to create homogeneous segments where constituencies within each segment react uniformly to marketing stimuli. Market segmentation enables formulation of optimal marketing targeting strategies for each segment. The bases of segmentation for a particular product are market size, market growth rate, and market share.

The first step in segmentation analysis is to identify the product of interest. In the case of the consumer banking industry, for instance, products and services can be broken down into segments such as checking, savings, credit card, line of credit, home equity, home mortgage, insurance, and brokerage. A bank may examine the market size of each product it offers or plans to offer and choose to focus on those products or services that have the largest market size, the highest growth rates, or the lowest market shares.

We next consider a hypothetical case study on segmentation. For Company W, the total market size of products A, B, C, and D in 2007 was $927m. In this case, market segmentation results in four distinct segments, illustrated in Figure 5-3.

Figure 5-3 Market segmentation by market size and growth.

| Segment | Quadrant (market size, growth) | Product | Market size ($ million) | Annual growth (%) | Priority |
|---|---|---|---|---|---|
| 1 | Low, low | D | 5 | 2 | Low |
| 2 | Low, high | C | 132 | 10 | High |
| 3 | High, low | B | 525 | 5 | High |
| 4 | High, high | A | 265 | 15 | High |

● Segment one: small size and low growth
● Segment two: small size and high growth
● Segment three: large size and low growth
● Segment four: medium size and high growth.

Segments three and four represent the most attractive opportunities, followed by segment two. Segment one represents the least attractive opportunity.

Market segmentation by market size, market growth, and market share: case study one

So far, we have discussed market size and market growth; we now revisit the last hypothetical case study by adding the market-share consideration (Figure 5-4).

Figure 5-4 Market segmentation by market size, growth, and share.

| Segment | Quadrant (market size, growth) | Product | Market size ($ million) | Annual growth (%) | Market share (%) | Incremental opportunity ($ million) | Priority |
|---|---|---|---|---|---|---|---|
| 1 | Low, low | D | 5 | 2 | 50 | 2.5 | Fourth |
| 2 | Low, high | C | 132 | 10 | 30 | 92.4 | First |
| 3 | High, low | B | 525 | 5 | 90 | 52.5 | Third |
| 4 | High, high | A | 265 | 15 | 70 | 79.5 | Second |

● Segment one: small size, low growth, and medium market share
● Segment two: small size, high growth, and low market share
● Segment three: large size, low growth, and medium market share
● Segment four: medium size, high growth, and high market share.

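
The incremental opportunity values in Figure 5-4 are consistent with multiplying each market size by the share the company does not yet hold. That relationship is inferred here from the figure's numbers rather than stated as a formula in the text, so the Python sketch below should be read as one plausible reconstruction; the function name is hypothetical.

```python
# (segment, product, market size in $ million, Company W market share in %)
segments = [
    (1, "D", 5, 50),
    (2, "C", 132, 30),
    (3, "B", 525, 90),
    (4, "A", 265, 70),
]


def incremental_opportunity(market_size, share_pct):
    """Portion of the market not yet captured: size x (100% - share)."""
    return market_size * (100 - share_pct) / 100


# Rank segments from largest to smallest incremental opportunity.
ranked = sorted(
    segments,
    key=lambda s: incremental_opportunity(s[2], s[3]),
    reverse=True,
)

for priority, (segment, product, size, share) in enumerate(ranked, start=1):
    print(priority, segment, product, incremental_opportunity(size, share))
```

Ranking by this quantity reproduces the priority order shown in the figure: segment two first, then segments four, three, and one.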
Market share information provides additional insight on where true market opportunities lie. Although the total market size for segment four is $265m, the incremental market opportunity for Company W is only $79.5m. After consideration of market share data in the four segments, we conclude that segment two is the most attractive segment, followed by segments four, three, and one.

We recommend the following three-step process to incorporate market opportunity information into marketing planning.

● Identification of the market size and its geographic and product breakdown: Table 5-2 is a template for compiling the relevant information. The product or geography with the largest market size is often the main revenue source for a company.

Table 5-2 Template for market size by product and geography

| | Product A ($) | Product B ($) | Product C ($) | Product D ($) | Total ($) |
|---|---|---|---|---|---|
| Region 1 | | | | | |
| Region 2 | | | | | |
| Region 3 | | | | | |
| Region 4 | | | | | |
| All regions | | | | | |

● Identification of high growth market opportunities: Targeting high growth opportunities enables revenue generation and long-term competitiveness. Table 5-3 is a template for documenting market growth information.

Table 5-3 Template for annual market growth rate by product and geography

| | Product A (%) | Product B (%) | Product C (%) | Product D (%) | Total (%) |
|---|---|---|---|---|---|
| Region 1 | | | | | |
| Region 2 | | | | | |
| Region 3 | | | | | |
| Region 4 | | | | | |
| All regions | | | | | |

● Identification of the market share: Market share information provides insight on the actual room for growth. Table 5-4 shows a layout for documenting the market share information.

Table 5-4 Template for market share by product and geography

| | Product A (%) | Product B (%) | Product C (%) | Product D (%) | Total (%) |
|---|---|---|---|---|---|
| Region 1 | | | | | |
| Region 2 | | | | | |
| Region 3 | | | | | |
| Region 4 | | | | | |
| All regions | | | | | |

Using market research and data mining for building a marketing plan

It is common practice for firms to set up their revenue goals using market data as the benchmark. A particular company may set a revenue growth goal of 15% just to outperform the anticipated market growth of 10% and to gain market share. It is also a very common practice for companies to apply an arbitrary percentage (usually a single digit) to their revenue goal to assess their marketing budgets. This is particularly prevalent in the high-tech industry. For instance, a high-tech company may expect to generate $10b in revenue and plans to allocate 5% of the expected revenue, or $500m, as its marketing budget. A more information-driven approach is to apply the marketing spending modeling techniques discussed in Chapter 2 to analyze historical sales and marketing spending data to produce an optimal marketing budget and allocation.

Marketing planning based on market segmentation and overall company goal: case study two

Based on the previous market segmentation case study illustrated in Figure 5-4, the most attractive opportunities are segments two, three, and four. Company X is one of the companies competing in these segments. A 2007 marketing plan for Company X will be created based on the market segmentation information.

The first step in creating the marketing plan is to populate the template in Table 5-5 (template one) with the market data of 2006 and 2007. Then, the incremental market size growth from 2006 to 2007 and the percent contribution to the overall market size growth are populated for each segment. The total market is expected to grow 8.5%, or $72m, from 2006 to 2007. Out of the $72m, $35m (49% of total growth), $25m (35% of total growth), and $12m (16% of total growth) are from products A, B, and C, respectively.

Table 5-5 Template one – identification of market size of 2006 and 2007, and growth from 2006 to 2007

| Segment | Product | 2006 market size (million) | 2007 market size (million) | Growth (%) | Incremental market growth (million) | Share of incremental market growth (%) | Company X incremental revenue 2007 | Company X return on investment 2007 |
|---|---|---|---|---|---|---|---|---|
| 4 | A | 230 | 265 | 15 | 35 | 49 | | |
| 3 | B | 500 | 525 | 5 | 25 | 35 | | |
| 2 | C | 120 | 132 | 10 | 12 | 16 | | |
| 2,3,4 | Total | 850 | 922 | 8.5 | 72 | 100 | | |

The second step is to incorporate the actual revenue and market share of 2006 into template two, as shown in Table 5-6.

Table 5-6 Template two – incorporation of actual company revenue and market share information from 2006

| Segment | Product | 2006 Company X revenue (million) | 2007 Company X incremental revenue goal (million) | Percent revenue increase (%) | 2007 Company X revenue | Company X 2006 market share (%) | Company X 2007 market share goal |
|---|---|---|---|---|---|---|---|
| 4 | A | 60 | | | | 60/230 = 26 | |
| 3 | B | 120 | | | | 120/500 = 24 | |
| 2 | C | 80 | | | | 80/120 = 67 | |
| | Total | 260 | | | | 260/850 = 31 | |

The third step is to incorporate revenue and market share goals into template two in Table 5-6. To determine a realistic revenue growth goal for each of the three segments, we need to evaluate the historical growth rate of each product. Table 5-7 shows the 2007 revenue and market share goals for Company X, based on its historical marketing spending and revenue data. The company is expected to grow faster than the market in the product B and C categories and at the same pace as the market in the product A category. Achieving the revenue and growth goals for products A, B, and C will lift market share from 31% to 31.6%, an increase of 0.6 percentage points based on the rounded 2006 figure (the unrounded shares, 30.6% and 31.6%, differ by about one point). As illustrated in this example, a drastic increase in market share is hard to achieve: an increase of $31.2m in revenue from products A, B, and C yields a market share gain of less than one percentage point.

Table 5-7 Template three – incorporation of company revenue and market share goals by product

| Segment | Product | 2006 Company X revenue (million) | 2007 Company X incremental revenue goal (million) | Percent revenue increase (%) | 2007 Company X revenue goal (million) | Company X 2006 market share (%) | Company X 2007 market share goal (%) |
|---|---|---|---|---|---|---|---|
| 4 | A | 60 | 9 | 15 | 69 | 60/230 = 26 | 26.0 |
| 3 | B | 120 | 12 | 10 | 132 | 120/500 = 24 | 25.1 |
| 2 | C | 80 | 10.2 | 13 | 90.2 | 80/120 = 67 | 68.3 |
| | Total | 260 | 31.2 | 12 | 291.2 | 260/850 = 31 | 31.6 |

The fourth step is to use the results from the third step to populate template three with budget information, as illustrated in Table 5-8. The total budget is $22m. This is exactly where we can tie together market opportunity information, marketing budget, and overall company revenue. Segments four and three have low market shares. There is significant competition in these two segments, and therefore they require significant investment to gain new customers and market share. This competitive situation is reflected in the lower historical revenue to budget ratios. Segment two has a high market share and less competition. Market share gains and losses are highly correlated with competitive pressure, as will be discussed later in this chapter.

Table 5-8 Year 2007 budget allocation based on historical data and modeling

| Segment | Product | 2006 Company X revenue (million) | 2007 Company X incremental revenue goal (million) | Percent revenue increase (%) | 2007 Company X revenue goal (million) | Historical incremental revenue over budget ratio | Company X 2007 budget (million) | Company X 2006 market share (%) | Company X 2007 market share goal (%) |
|---|---|---|---|---|---|---|---|---|---|
| 4 | A | 60 | 9 | 15 | 69 | 1.29 | 7.0 | 26 | 26 |
| 3 | B | 120 | 12 | 10 | 132 | 1.33 | 9.0 | 24 | 25 |
| 2 | C | 80 | 10.2 | 13 | 90.2 | 1.70 | 6.0 | 67 | 68 |
| | Total | 260 | 31.2 | 12 | 291.2 | 1.42 | 22.0 | 31 | 31.6 |
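The arithmetic behind the four planning steps can be reproduced in a few lines. The following is a minimal sketch, not the authors' tooling: the input figures come from Tables 5-5 through 5-8, while the names `PLAN_INPUTS`, `metrics`, and `build_plan` are hypothetical and introduced only for illustration.

```python
# All amounts are in $ million and are taken from Tables 5-5 through 5-8.
# Tuple layout (an assumption of this sketch):
# (2006 market, 2007 market, 2006 revenue, 2007 incremental revenue goal, 2007 budget)
PLAN_INPUTS = {
    "A": (230, 265, 60, 9.0, 7.0),
    "B": (500, 525, 120, 12.0, 9.0),
    "C": (120, 132, 80, 10.2, 6.0),
}


def metrics(m06, m07, rev06, inc_goal, budget, total_growth):
    """Derive the template columns for one product (or for the total)."""
    return {
        "market_growth_pct": 100 * (m07 - m06) / m06,
        "incremental_market": m07 - m06,
        "share_of_growth_pct": 100 * (m07 - m06) / total_growth,
        "revenue_increase_pct": 100 * inc_goal / rev06,
        "revenue_2007": rev06 + inc_goal,
        "share_2006_pct": 100 * rev06 / m06,
        "share_2007_pct": 100 * (rev06 + inc_goal) / m07,
        "revenue_to_budget": inc_goal / budget,
    }


def build_plan(inputs):
    """Compute per-product metrics plus a 'Total' row from the raw inputs."""
    total = tuple(sum(column) for column in zip(*inputs.values()))
    total_growth = total[1] - total[0]
    plan = {name: metrics(*values, total_growth) for name, values in inputs.items()}
    plan["Total"] = metrics(*total, total_growth)
    return plan


plan = build_plan(PLAN_INPUTS)
for name, row in plan.items():
    print(name, {key: round(value, 2) for key, value in row.items()})
```

Because the sketch works with unrounded values, small differences from the rounded table entries are to be expected.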
■ Target-audience segmentation

The target audience of a segment is a group of individuals, households, or businesses that possess similar characteristics and behavior. The following section gives an overview of the common attribute groups used to describe a target audience.

Target-audience attributes

● Demographic or corporate attributes: These attributes describe the general characteristics of an individual, a household, or a company. Age, gender, ethnicity, marital status, education, life stage, personal income, and home ownership are examples of individual demographic attributes. Household income and household size are examples of household demographic attributes. Company size, company annual revenue, industry or Standard Industrial Classification (SIC) code, and company start year are examples of corporate attributes.
● Socioeconomic attributes: These attributes describe the socioeconomic status of a household or an individual. They are usually constructed from zip code level census information by data vendors or marketing research companies (e.g., Personicx, Prizm, Microvision, Cohorts, and IXI). For example, one of the attributes used by Personicx is referred to as 'established elite.' Individuals with this attribute tend to have a higher than average disposable income and a luxurious lifestyle.
● Attitudinal attributes: These attributes describe an individual's hobbies, interests, and social, economic, or political views, such as interest in art, space and science, sports, cooking, tennis, travel, politics, economics, antiques, or fitness.
● Purchase behavior attributes: These attributes describe where an individual, a household, or a business is in a sales cycle. Stages in the sales cycle, such as awareness, interest and relevance, consideration, purchase, and loyalty and referral, are described in detail in Chapter 3.
● Need attributes: These attributes describe a customer's or a prospect's need for acquiring or inquiring about a product. Need for pain relief and need for wireless Internet connection are examples of need attributes.
● Marketing medium preference attributes: These attributes describe an individual's preference on how to be contacted, receive information, or interact with marketers. In-person visits, direct mail, print, TV, telemarketing, billboards, newspaper print ads, and magazine inserts are examples of offline media. E-mail, online banners, search, communities, podcasts, and blogs are examples of online media.
Types of target-audience segmentation

There are multiple ways of segmenting a target audience. Segmentation of the audience needs to be aligned with business objectives. The four most common criteria for segmentation are demographics, needs, product purchased, and value, as illustrated in Figure 5-5.

● Demographics-based segmentation: This is the most common segmentation approach. It gives general descriptions of the various segments in the target audience. This type of segmentation is very useful for providing insight and ideas regarding marketing creative, offers, and messages.
● Need-based segmentation: This type of segmentation classifies the audience by their needs and is useful for constructing relevant product or service offers to potential customers.
● Product purchased or installed based segmentation: This type of segmentation classifies the audience by what they have purchased or deployed at their sites. This information is useful for driving targeted cross-sell and up-sell marketing strategies and tactics.
● Value-based segmentation: This type of segmentation classifies the audience by their value, often derived from their total dollar amount of purchases during a period of time. This is a very practical approach, as the eighty-twenty rule suggests that 80% of a marketer's revenue often comes from the top 20% of customers with the highest values.
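Value-based segmentation and the eighty-twenty rule can be checked directly against internal sales data. The sketch below is illustrative only: the customer names and purchase totals are hypothetical, and `top_share` and `value_tiers` are helper names assumed for this example.

```python
def top_share(purchases, top_fraction=0.2):
    """Share of total revenue contributed by the top `top_fraction` of customers."""
    ranked = sorted(purchases.values(), reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return sum(ranked[:n_top]) / sum(ranked)


def value_tiers(purchases, top_fraction=0.2):
    """Label each customer 'high' or 'low' value by rank of total purchases."""
    ranked = sorted(purchases, key=purchases.get, reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    return {name: ("high" if i < n_top else "low") for i, name in enumerate(ranked)}


# Hypothetical annual purchase totals per customer (in $ thousand).
purchases = {
    "c1": 500, "c2": 300, "c3": 50, "c4": 40, "c5": 30,
    "c6": 30, "c7": 20, "c8": 15, "c9": 10, "c10": 5,
}

share = top_share(purchases)
tiers = value_tiers(purchases)
print(f"Top 20% of customers contribute {share:.0%} of revenue")
```

With real data the share will rarely be exactly 80%; the point of the calculation is to see how concentrated revenue is before choosing tier boundaries.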
Figure 5-5 Common segmentation types. [Figure labels: Demographics; Need; Install base or product purchase; Value; Syndicated research; Customized research.]

Due to its cost effectiveness, syndicated research is a good starting point for acquiring information on audience segmentation. However, syndicated research is sometimes limited in that it cannot provide in-depth value-based segmentation, which is best derived from the company's internal sales data. Another limitation of syndicated research is that a research firm may draw samples from and segment a population that is not fully representative of the desired target audience.

We now consider a segmentation case study in the business-to-business world. Figure 5-6 shows a small business customer value segmentation for Company A. The customer segmentation is derived by a data mining technique called Classification and Regression Tree (CART), which is discussed in detail in Chapter 7. In the case study, the average purchase amount of small business customers is $5000. The CART technique first splits the sample by industry and identifies the professional services industry as an industry with high average purchases of $8400, while the other industries have an average of $2000. Within the professional services industry, companies with between 50 and 500 employees have average purchases of $12,000. This is the subsegment with the highest value in the whole sample. The tree technique also splits the segment of other industries into two branches. As in the professional services industry, the companies with between 50 and 500 employees have higher average purchases ($5200) than the smaller companies with fewer than 50 employees ($1000). Among the companies with between 50 and 500 employees and more than five branch offices, the average purchases are $5900. This case study illustrates that, with the application of an appropriate data mining segmentation technique, valuable customer subsegments can be uncovered.

Figure 5-6 Small business customer average annual purchases of equipment X.

Overall SB base: average purchase = $5000
    Industry: Professional services: average purchase = $8400
        # Employees < 50: average purchase = $6000
        # Employees between 50 and 500: average purchase = $12,000
    Industry: Other: average purchase = $2000
        # Employees < 50: average purchase = $1000
        # Employees between 50 and 500: average purchase = $5200
            # Branch offices < 2: average purchase = $3200
            # Branch offices between 2 and 5: average purchase = $4500
            # Branch offices > 5: average purchase = $5900
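To make the splitting idea concrete, here is a minimal sketch of the single step a regression tree repeats at each node: choosing the threshold on one numeric attribute that minimizes the sum of squared errors of the two resulting groups. It is not a full CART implementation (no recursion, pruning, or categorical splits), the customer records are hypothetical, and the function names `sse` and `best_split` are assumptions made for this example.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, feature, target):
    """Return (threshold, total_sse) for the split `row[feature] < threshold`
    that minimizes the combined squared error of the two groups."""
    candidates = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(candidates, candidates[1:]):
        threshold = (low + high) / 2  # midpoint between adjacent observed values
        left = [row[target] for row in rows if row[feature] < threshold]
        right = [row[target] for row in rows if row[feature] >= threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical small business customers: number of employees and annual purchase ($).
customers = [
    {"employees": 10, "purchase": 900},
    {"employees": 20, "purchase": 1100},
    {"employees": 40, "purchase": 1000},
    {"employees": 60, "purchase": 5000},
    {"employees": 120, "purchase": 5400},
    {"employees": 300, "purchase": 5200},
]

threshold, error = best_split(customers, "employees", "purchase")
print(f"Best split: employees < {threshold}")
```

A tree such as the one in Figure 5-6 results from applying this kind of search over all candidate attributes and then repeating it within each resulting branch.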
■ Understanding route to market and competitive landscape by market segment

Once market opportunities and the target audience are identified, the next step is to assess the ability to compete in each segment by understanding the routes to market and the competitive landscape.

Routes to market

A route to market is an avenue through which customers purchase products. In the case of direct sales, customers purchase directly from marketers. In the case of indirect sales, customers purchase from intermediaries. These intermediaries are called channel partners, retailers, resellers, or distributors. For example, a company that designs and manufactures women's clothing may use several routes to bring its product to customers. These routes include the company's own physical stores, print catalogs, e-catalogs, department stores, and online web sites.
Direct sales

In direct sales, products are sold directly to customers. In many cases, selling directly to customers is not a scalable business model, and the need for leveraging a third-party reseller or distributor emerges. For example, consumer goods manufacturers leverage retail stores such as consumer electronics stores and supermarkets to distribute and sell their products. Leading high-tech companies often leverage their channel partners to a very large degree. In general, direct sales are a more common model in business-to-consumer than in business-to-business.

Indirect sales

In indirect sales, companies leverage intermediaries to sell their products or services. The main advantage of this distribution method is scalability. These intermediaries are called distributors, resellers, partners, wholesalers, retailers, or channel partners. Good channel partners often enhance a company's revenue growth. Companies using channel partners often rely on the partners to contact and interact with end customers. As a result, these companies often do not have visibility into end-customer information. However, channel partners can often provide end user data at an aggregate level. For example, instead of revealing actual end user names, distributors can provide reports on end user sales by vertical industry, company size, and geography.

There are different types of indirect sales models depending on the number of intermediaries involved. A one-tier model is a model where there is only one channel partner between a vendor and an end user customer. A two-tier model is a model where there are two layers of intermediaries between a vendor and an end user customer. In this case, a vendor sells its product to a channel partner, which then sells the product to a reseller. The reseller then sells the product to an end user customer.
Revenue and investment flows

Understanding cost effectiveness by route to market is essential for establishing an optimal channel strategy balance. Figure 5-7 illustrates revenue flows from direct and indirect sales.

Figure 5-7 Direct and indirect revenue and investment flows. [Figure: the company invests in partners (Partner 1 through Partner 4) and in end customers; revenue and profit flow back to the company from partners and from end customers.]
It is important for firms to evaluate revenue and profit streams from intermediaries and end customers, as well as the firms' marketing investments for selling into both constituencies. If the returns on investment are significantly higher from intermediaries than from end customers, then it is necessary to explore the underlying reasons. It is possible that the market climate is such that end customers prefer buying from intermediaries. Objectively assessing returns on investment from both direct and indirect sales enables firms to embark on an optimal strategy of direct and indirect sales. We now consider the market segmentation case study in Figure 5-3. Within each market segment, the contributions of direct sales and channel partner sales are rated as 'fair' or 'poor,' as shown in Table 5-9. The company relies mainly on channel partners or others for selling its products in segments three and four. In contrast, the company sells most of its products directly to its customers in segment two. The degree to which the firm relies on channel partners drives how it divides its marketing spending between direct sales and indirect sales.
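A simple way to compare the two routes is to compute a return on investment for each. The sketch below assumes one common definition, (profit minus investment) divided by investment; the text does not prescribe a formula, and the dollar figures and the names `roi` and `routes` are hypothetical.

```python
def roi(profit, investment):
    """Return on investment, assumed here to be (profit - investment) / investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (profit - investment) / investment


# Hypothetical flows in $ million; not taken from the case study.
routes = {
    "direct (end customers)": {"profit": 3.0, "investment": 2.0},
    "indirect (partners)": {"profit": 6.0, "investment": 2.5},
}

roi_by_route = {name: roi(f["profit"], f["investment"]) for name, f in routes.items()}
best_route = max(roi_by_route, key=roi_by_route.get)

for name, value in roi_by_route.items():
    print(f"{name}: ROI = {value:.0%}")
```

A large gap between the two figures is a prompt to investigate the underlying reasons, not by itself a reason to shift spending.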
Table 5-9 Market segmentation with route to market information overlay

| Market size | Market growth: Low | Market growth: High |
|---|---|---|
| Small | Segment 1. Direct sales: poor. Sales through channels: poor | Segment 2. Direct sales: fair. Sales through channels: poor |
| Large | Segment 3. Direct sales: poor. Sales through channels: fair | Segment 4. Direct sales: poor. Sales through channels: fair |

Competitive landscape

The way the market perceives the strengths and weaknesses of a particular firm affects the purchasing behavior of its customers. For instance, it is well known that a manufacturer with a solid brand inspires more trust in customers, and trust is often a key driver for product selection.
Understanding the competitive landscape in each market segment means understanding a firm's own strengths and weaknesses as well as those of its competitors. There are many attributes that can be used to evaluate the competition. Before analyzing these attributes, however, we must identify the competitors. The most common way of identifying key existing and potential competitors is to consult industry trade publications, industry financial analysts, or research experts. There is often ranking information on companies in each industry, product, or service category. Ranking can be based on market share, growth, or financial position. Salespeople and customers can also help identify competitors. Since it is extremely challenging to examine every potential competitor, evaluation should be limited to the top existing and potential competitors.

Once the key existing and potential competitors have been identified, the next task is to determine which attributes to use to examine the strengths and weaknesses of each competitor. In general, there are four groups of competitive attributes to consider: (1) brand recognition, leadership, vision, and innovation; (2) current product offering; (3) operational efficiency; and (4) financial condition.

● Brand recognition, leadership, vision, and innovation: These four seemingly intangible attributes are sometimes important drivers of customers' purchase decisions. Brand recognition refers to a set of perceptions and feelings evoked in customers or prospects when they are exposed to ideas such as value propositions or images (logos, symbols) about particular companies. Brand recognition is the result of customer experience and interaction with a particular company, or customer exposure to advertising, marketing, and other activities of the company. Leadership is the ability of an individual to influence, motivate, and enable others to contribute toward the effectiveness and success of the organizations of which they are members (House, Hanges, Javidan, and Dorfman, 2004). Good leadership is consistently viewed as a competitive advantage for a company. Vision refers to the long-term objectives of a company. With its vision as a guiding principle, a company may be more likely to evolve in a manner consistent with its long-term objectives. Innovation is change that creates a new dimension of performance (Hesselbein, Johnston, and the Drucker Foundation, 2002) and drives competitiveness.
● Current product offering: The current product offering of a firm has unique features and benefits. How these features and benefits are perceived in the market segment affects customer purchase behavior. For example, a product that is more reliable than its competing products will attract buyers that value reliability. In addition to reliability, attributes such as customer service, quality, relevance, convenience, ease of deployment and installation, scalability, warranties, variety, and pricing are also important. Service, in particular, has become a crucial factor for customers when evaluating products.
● Operational efficiency: Operational efficiency in corporate functions such as manufacturing, management, sales, marketing, fulfillment, inventory, and customer service is also important in shaping market perception of a firm. Companies with frequent delays in product delivery or companies that deliver defective products are likely to be perceived as companies with operational 'weakness.'
● Financial condition: Financial condition is the company's overall financial performance, as reflected by indicators such as stock growth, revenue, profitability, returns on equity, returns on assets, debt and cash positions that affect the ability of the firm to acquire financing when necessary, capitalization, and the P/E ratio. A strong financial condition is considered a competitive advantage.
Competitive analysis methods

There are four analytical formats for analyzing the competitive landscape: tabulation; grid; strengths, weaknesses, opportunities, and threats (SWOT) analysis; and perceptual maps.

● Tabulation: The tabulation format is the easiest approach for compiling competitor information. In the following example, six supermarket chains in the San Francisco Bay Area are evaluated for their competitiveness by brand recognition. The ratings range from one (the weakest rating) to five (the strongest rating) in each attribute category. The six supermarket chains in the analysis are Safeway, Albertsons, Bell, Costco, Whole Foods, and Ranch 99. Safeway and Albertsons are the two mainstream supermarket chains in the Bay Area and the better-known brands of the six. Safeway and Albertsons have considerably more stores than the other four competitors. Costco is well known for its warehouse environment and low prices. Bell is slightly less well known than Safeway, Albertsons, and Costco, while Whole Foods caters to a more upscale market. Ranch 99 mainly caters to the Asian community. Based on the above information, we may give Safeway and Albertsons a rating of five, Costco a rating of four, and Bell and Ranch 99 a rating of three for brand awareness among mainstream grocery shoppers. The advantage of the tabulation approach is that it compares the players' strengths and weaknesses with one another in detail. The disadvantage is that the tabulation approach does not provide a holistic summary of overall competitiveness. Therefore, when the number of attributes is large, it is difficult to derive a clear overall picture with the tabulation method.
● Grid is a common format used by research companies to analyze the competitive landscape. These companies use their proprietary methodologies to examine company competitiveness and present the result in a grid. Very often a competitive grid has two or three key indicators. Each key indicator is usually a composite index based on the values of specific key attributes. Some of these key attributes are similar to what we have discussed in the tabulation example. Gartner Research has developed Magic Quadrant, a graphical presentation of the competitive landscape for each of its key technology groups. Forrester Research has developed a competitive grid called Forrester Wave.
Unlike the tabulation approach, grids don't show detailed information about each player's strengths and weaknesses at the attribute level. Instead, grids provide a holistic, synthesized, and graphic summary view of the competitive landscape.

● SWOT, which stands for strengths, weaknesses, opportunities, and threats, is a very popular format for competitive landscape analysis. A SWOT analysis summarizes a firm's overall competitive strengths and weaknesses, the market opportunity, and the competitor threats. We now reconsider our previous example of market segmentation of Company X in Figure 5-5 with a focus on the competitive landscape in segment four. The market size of this segment is estimated to be $230m in 2006 and $265m in 2007. The growth from 2006 to 2007 is 15%. The following is a four-step process for constructing a SWOT analysis. The first step is to identify Company X's strengths, weaknesses, market opportunities, and competitive threats. The result is illustrated in Table 5-10. The second step is to leverage the strengths to take advantage of the current opportunities or mitigate the competitive threats. This is shown in Table 5-11. The third step, illustrated in Table 5-12, is to prevent weaknesses from sabotaging opportunities or amplifying competitive threats. The fourth step is to maintain areas of strength and strengthen areas of weakness over time. The outcome of a SWOT analysis will not only guide short-term planning, but also point out areas for improvement for long-term success. Like the other competitive analysis formats, SWOT has its advantages and disadvantages, as shown in Table 5-13.

Table 5-10 Identification of the strengths, weaknesses, opportunities, and threats in segment four

Strengths
● Extensive channel partner network for distributing and selling
● Reliable product
● Competitive price

Weaknesses
● Company X's market share is a distant No. 2 behind that of the market leader, TUV (market share: 50%)
● Company X has poor brand name recognition compared to TUV

Opportunity
● High overall market growth at 15%

Threats
● Market leader TUV is actively pursuing Company X's largest customers
● A local vendor in Asia just announced a major price-cutting promotion

Table 5-11 Leveraging strength to take advantage of opportunities or mitigate threats

Leveraging strength to take advantage of opportunities
● Leveraging Company X's extensive channel partner network to capture high market growth (e.g., increasing investment in joint customer seminars with partners)
● Promoting value propositions on product reliability and competitive pricing

Leveraging strength to mitigate threats
● Creating a customer loyalty program to prevent customer attrition due to TUV's threat
● Examining and negotiating the profit-margin structure with partners in Asia to ensure maximum level of price competitiveness

Table 5-12 Preventing weaknesses from sabotaging opportunities or amplifying threats

Preventing weaknesses from sabotaging opportunities
● Poor brand recognition may prevent Company X from taking advantage of this opportunity. Company X needs to invest in its brand awareness programs

Preventing weaknesses from amplifying threats
● Poor brand recognition may prevent Company X from convincing the market that it provides value with a price premium in Asia. Company X needs to promote its value proposition and brand in Asia

Table 5-13 Advantages and disadvantages of SWOT

Advantages of the SWOT method
● Information is easy to acquire from syndicated research companies or internal marketing/sales groups
● Analysis is easy to construct
● Analysis is easy to digest

Disadvantages of the SWOT method
● Analysis may be subjective
● Hard to quantify the interrelationships between the four components, namely, strength, weakness, opportunity, and threat
● Hard to tie the information to customers, their needs, and their future purchase plans
● Public information may lead to minimal competitive advantages
● A perceptual map is a graphical depiction of the market perception of a product. The key difference between a grid and a perceptual map is that the latter is constructed with statistical data mining techniques while the former is usually derived from marketing research data summaries. We now consider the illustrative case study of Company X in Figure 5-5. A survey is conducted on three groups of audiences for product A: prospects, high-value customers (customers who have purchased a high volume of product A), and low-value customers (customers who have purchased a low volume of product A). The three groups are asked to identify whether six specific features are crucial to their purchase decisions. These six features are brand strength, uniqueness of features, pricing, product quality, customer service, and ease of acquiring the product. The three groups are also asked to identify whether they associate any of these six features with the three main vendors, Company X, Competitor Y, and Competitor Z. The results of the survey are compiled and correspondence analysis, which we will discuss in Chapter 7, is conducted to construct the perceptual map shown in Figure 5-8. From the map, we can observe the distances between the three target audiences, the six product features, and the three companies. Close proximity indicates a higher degree of association. On the map, Company X is close to three features: brand strength, product quality, and customer service. Therefore, these three features are Company X's strengths and competitive advantages. In addition, Company X is the vendor that is closest to the high-value customer segment. This means that Company X is viewed by this audience segment more favorably than its two competitors. Like the other competitive analysis formats, the perceptual map has its advantages and disadvantages, as shown in Table 5-14.

Figure 5-8 Perceptual map analysis of product A. [Figure: a two-dimensional map with axes Dimension 1 and Dimension 2. Plotted target audiences: prospects, high-value customers, low-value customers. Plotted company or product attributes: brand strength, uniqueness of features, pricing, product quality, customer service, easy to acquire. Plotted company names: Company X, Competitor Y, Competitor Z.]

Table 5-14 Advantages and disadvantages of a perceptual map

Advantages of perceptual map
● Provides graphic representation of multiple dimensions simultaneously
● Relationships between competitive advantages, target audience, customer needs, and customer future purchase plans are easily quantifiable
● Audience feedback provides objective view of the competitive landscape
● Proprietary analysis may lead to competitive advantage

Disadvantages of perceptual map
● Requires significant investment in time, resources, and expertise in gathering and analyzing relevant data
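Reading a perceptual map amounts to comparing distances between plotted points. The sketch below illustrates only that reading step: the coordinates are hypothetical and chosen for illustration (they are not output from a correspondence analysis of the survey), and `nearest` is an assumed helper name.

```python
import math

# Hypothetical (Dimension 1, Dimension 2) coordinates, for illustration only.
company_x = (1.0, -0.5)
features = {
    "brand strength": (0.6, -0.2),
    "product quality": (1.1, -1.0),
    "customer service": (0.7, -0.9),
    "pricing": (-0.8, 0.6),
    "uniqueness of features": (-0.3, 1.2),
    "easy to acquire": (-1.0, -0.4),
}


def nearest(point, candidates, k=3):
    """Return the k candidate names closest to `point` by Euclidean distance."""
    ranked = sorted(candidates, key=lambda name: math.dist(point, candidates[name]))
    return ranked[:k]


closest_features = nearest(company_x, features)
print("Features most associated with Company X:", closest_features)
```

The same helper could be applied to audience points, for example to find which vendor lies closest to the high-value customer segment.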
■ Overview of marketing research

Marketing research is research that helps advance understanding of the market and the customers, generating information that helps make better marketing investment decisions. The following skills are required for conducting marketing research.

● Economic, business, and statistical knowledge
● Experience in syndicated research and customized research
● Experience in primary and secondary data
● Knowledge of survey sampling, sample size, and questionnaire design
● Knowledge of focus group research
● Knowledge of panel studies
● Knowledge of request for proposal (RFP) and research vendor management
● Knowledge of list rental and list brokerage business
● Ability to communicate and explain complex research concepts to both business and IT audiences
● Ability to provide actionable recommendations to address business issues.
Figure 5-9 shows a step-by-step thought process for marketing research planning and implementation.
Figure 5-9 Effective marketing research thought process.

Identify business objectives
→ Determine final research deliverable requirements
→ Search current available syndicated research and determine if it addresses the need
    Y → Utilize syndicated research
    N → Acquire customized research: solicit research vendor proposals with an RFP → Select appropriate vendor proposal → Actively participate in the research process including sample selection and questionnaire design
→ Translate research results to actionable business recommendations

Throughout the remainder of this chapter, we will introduce the following important market research topics: syndicated research versus customized research, primary data versus secondary data, sample size, questionnaire design, focus groups, and panel studies.
Syndicated research versus customized research

Syndicated research, which can be acquired through subscription, is research that is prepackaged by research companies. This type of research is conducted on the basis of the research firms' assumptions, specifications, and criteria. When searching for market data and intelligence, we should first consider syndicated research since it is one of the most cost-effective sources. Different research firms specialize in different industries and products. Subscriptions are usually on a one-time, quarterly, or annual basis. In addition to selling prepackaged syndicated research, research companies often provide consulting services arranged along the following lines.

● Subscription to an inquiry service grants subscribers direct access to analysts for additional information beyond the standard reports. There is usually a threshold on how much time a subscriber can spend with analysts, either face to face or by phone, over the duration of the subscription.
● Occasionally, analysts may recommend a one-time project to address a subscriber's additional needs. A one-time project may lead to a customized research project, as we discuss in the next section.

Customized research tends to be more expensive than syndicated research because the former is tailored to very specific needs while the latter is intended for a broader audience base. Customized research has very specific objectives and deliverables tailored to a particular marketer's needs and often involves collecting primary survey data. Table 5-15 illustrates the main differences between syndicated research and customized research.
Table 5-15 Syndicated research versus customized research

| | Research specification | Data collectors | Cost |
|---|---|---|---|
| Syndicated research | Third-party research company | Third party | Low |
| Customized research | Marketers themselves | Third party or marketers themselves | High |
The following step-by-step process should be followed for planning and executing customized research:

● Identification of business objectives
● Identification of deliverables that allow the research project to meet the objectives
● Creation of an RFP to solicit research vendors' proposals
● Evaluation and selection of vendor proposals
● Determination of sample size and source
● Designing of the questionnaire(s)
● Collection of data
● Analysis of results to derive learning.
Customized research planning case study

Company ABC is a storage system supplier for Fortune 500 companies in the US. The overall objective of ABC is to understand the future storage spending of its customers, their vendor preferences, their purchase processes, and the appropriate marketing messages. Specifically, Company ABC wants to address the following questions through customized research.

● How much do the Fortune 500 companies plan to spend on storage in 2008?
● Do different industries have different levels of need in storage systems in 2008?
● Which vendors are on the top of mind among the Fortune 500 companies when it comes to storage system purchases?
● What are the Fortune 500 companies' selection criteria for storage vendors?
● What marketing messages will resonate well with these Fortune 500 companies?
The following five deliverables can be used to address the questions above:

● Understanding the 2008 budgets for storage systems among the Fortune 500 companies
● Analyzing the 2008 storage budgets by industry
● Compiling vendor rankings from survey respondents at Fortune 500 companies
● Getting feedback regarding vendor selection criteria
● Getting input regarding drivers and barriers for purchasing storage systems.
After Company ABC identifies its research objectives and deliverables, it creates an RFP to solicit vendor proposals. An RFP is an effective way of collecting vendor proposals for evaluation. An RFP does not need to be overly complex. However, it needs to cover the following key components:

● Project overview
● Objectives
● Deliverables
● Methodology
● Proposal submission
● Project timeline
● General conditions and terms.

The next example illustrates the use of an RFP to solicit vendor proposals.

Project overview

ABC, a company specializing in storage systems, wants to understand the future needs of this market to prioritize marketing investments and resources.
Objectives

The project has the following objectives:

● Understanding the 2008 budgets for storage systems of Fortune 500 companies
● Analyzing the 2008 storage budgets by industry
● Compiling vendor rankings from survey respondents at Fortune 500 companies
● Getting feedback regarding vendor selection criteria
● Getting input regarding drivers and barriers for purchasing storage systems

Deliverables

● Executive summary
● Analysis and recommendations to support the business objectives
● Raw survey data
● An on-site presentation explaining the results and conclusions of the research
Methodology

Blind face-to-face interviews

Proposal submission

Submit proposal to ABC by September 27, 2007. Contact information: Sheila Wu, Research Manager, Tel: (703) 446-5272, e-mail: [email protected]

Project completion timeline

Completion by November 30, 2007.

General conditions and terms

All information provided herein is proprietary to ABC, Inc. This information is furnished specifically and solely to allow the prospective vendor to estimate the cost of executing this project. Any other usage of this information is strictly prohibited without the prior written consent of ABC.

After distributing the RFP, Company ABC waits for the research vendors to submit their responses. ABC should look for the following key components in a vendor proposal:

● Project overview
● Objectives
● Methodology
● Data and analysis
● Survey sample and questionnaire (if collection of primary survey data is required)
● Deliverables
● Project timeline
● Vendor project team (team member qualification and biography)
● Overview of vendor capability (competitive strengths relative to the other vendors)
● Fees
● General conditions and terms (including legal and contractual agreements).
A vendor proposal is, in essence, a research plan. A proposal should correspond to the key components in the RFP with more in-depth information. Table 5-16 illustrates a comparison between an RFP and a vendor proposal in terms of key components.
Table 5-16 Comparison of key components between an RFP and a vendor proposal

| Key component required | RFP | Vendor proposal |
|---|---|---|
| Project overview | Yes | Yes |
| Objectives | Yes | Yes |
| Deliverables | Yes | Yes |
| Methodology | Yes | Yes |
| Data and analysis | | Yes |
| Survey sample and questionnaire* | | Yes |
| Proposal submission | Yes | |
| Project timeline | Yes | Yes |
| Overview of vendor capability | | Yes |
| Vendor project team and member bio | | Yes |
| Professional fee | | Yes |
| General conditions and terms | Yes | Yes |

* If primary survey research is required.
Primary data versus secondary data

Primary data refers to research data directly collected from the target audience. Data collected from the target audiences by others (third parties) is called secondary data. It is often more expensive to acquire primary data than secondary data. However, there are situations where collection of primary data is necessary. For example,

● The business objectives cannot be met by any existing syndicated research.
● The target audience is so specific that no syndicated research can address the specific need.
● Existing syndicated research reports may offer conflicting information that cannot be reconciled. It is very common for different research companies to produce differing forecasts for the same market.
Surveys

In a survey, a questionnaire or a script is used to collect information from a group of people through various communication methods such as direct mail, e-mail, telephone, and face-to-face interviews. A questionnaire consists of a list of questions in either multiple choice or open-ended text format.
Survey communication methods

A variety of survey methods are available. Different audiences may have different preferences about how they are surveyed. It is important to ensure that the respondent composition is representative of the overall target audience.

Direct mail

In a direct mail survey, questionnaires are sent to the target audience by postal mail. Respondents fill out the questionnaire and return it by mail in a business response envelope (BRE) or on a business response card (BRC). The cost of direct mail can sometimes be high and is driven mainly by production and postage. Cost per response is higher if the target audience reached is irrelevant or unresponsive. Direct mail response rate is driven mainly by the relevance of the target audience and address data accuracy. Lower address data quality tends to result in a lower response rate. Response rate may vary by industry, product, and service. Response time of direct mail ranges from a couple of days to weeks, and most responses come in within a month. Response time also depends on the complexity of the questionnaire: the more complex a questionnaire is, the longer its response time will be.
E-mail

In an e-mail survey, electronic questionnaires are sent to the target audience by e-mail. Respondents may respond by completing the questionnaire on the web, by e-mail, or by other methods. E-mail cost tends to be low and usually runs at several cents per e-mail sent. E-mail is one of the least expensive ways of conducting a survey. E-mail response time is usually much shorter than direct mail response time. Most responses come in within days. As is the case with direct mail, e-mail survey response rate depends on factors such as target audience and accuracy of e-mail address data.
Phone surveys

In a phone survey, the target audience is contacted by phone to answer a list of questions from a questionnaire or a script read by a phone survey representative. Phone interview cost is higher than direct mail and much higher than e-mail. However, phone interviews have the advantage of getting respondent data instantly and clarifying any questions or confusion that respondents may have. When interviewer training is required, additional training cost needs to be factored in. Phone interview response time is real time. Once a respondent is reached and agrees to go through a survey, the data is collected instantly. The challenge is to successfully reach the target audience. Phone interview response rate also depends on factors such as responsiveness of the target audience, accuracy of phone numbers, offer, and target audience availability.

Training is essential for phone interviewers. Interviewers need to possess a basic understanding of how to use a script or questionnaire to collect answers from respondents. When the research subject is technical or specialized, additional training needs to be given to interviewers so they can articulate their questions appropriately. In some cases, interviewers need to have certain professional knowledge and experience to effectively carry out the survey. For example, a phone survey on computer server purchases may require interviewers with in-depth technical knowledge of the server business. Interviewers also need to be very perceptive of the respondent's reactions and must be able to make adjustments accordingly.

Nowadays, phone interviews are often conducted in call centers where every interviewer has a cubicle, a phone, and a computer terminal to access information when needed. Usually a supervisor is assigned several interviewers to monitor. Supervisors are equipped with communication gear to give timely coaching or feedback to their staff interviewers. Interviewers' access to timely feedback is one advantage that phone interviews may have over other types of interview methods such as direct mail, e-mail, or face-to-face interviews.

Computer assisted telephone interviewing (CATI) is designed to enable phone interviewers to conduct telephone interviews effectively. CATI enables predictive dialing, questionnaire management, sample and quota management, data access, data entry, and analysis. Predictive dialing is a feature of CATI that automatically dials batches of phone numbers and connects phone interviewers with those they intend to survey. Sample and quota management is a feature of CATI that compares a predetermined quota of respondents with the number of respondents that a phone interviewer actually reaches and surveys.
Prescheduled or intercept face-to-face interview

Face-to-face interviews can be prearranged with potential interviewees and conducted in a predetermined location at a preset date and time. Prearranged interviews allow for careful screening of the target audience and suitable arrangement of the interview start and end time. Planning prior to interviews can be very time-consuming and resource intensive.

On the other hand, face-to-face interviews that are not prearranged are intercept interviews, which do not require a great deal of time for audience screening. The preparation and planning time is minimal, and one can usually find a large sample of potential interviewees in places such as shopping malls. However, the quality of interviewees from intercepts might sometimes be questionable due to lack of prescreening.

As is the case with phone interviews, once a respondent agrees to be interviewed, the response time of a face-to-face interview is real time. Response rate depends on numerous factors such as relevance of target audience and timing of the interview. Training is essential for face-to-face interviewers. The interviewers need to have a basic understanding of how to follow a script or questionnaire to collect answers from the respondents. In-person interviewers have the opportunity to observe the respondents face to face and adjust questions accordingly.
Panel studies

A panel is a group of people, households, or businesses that respond to questionnaires on a periodic basis. The duration of a panel can vary from days to years. Panel surveys are administered through direct mail, e-mail, phone, and face-to-face interviews. Major research companies usually have established panels for ongoing surveys and monitoring. The cost of panel studies depends on the survey mechanism. Usually, phone and face-to-face interviews are more expensive than direct mail and e-mail. The response time also depends on the survey mechanism. The response rates of panel studies are usually higher since panels usually consist of dedicated respondents. The following is a list of examples of existing panels (Blankenship and Breen, 1995).

● Nielsen Media Research offers a national measurement of television viewing, the National Television Index (NTI), using its People Meter to measure the television viewing of various household members.
● Arbitron's Portable People Meter (PPM) measures consumers' exposure to any encoded broadcast signal (e.g., cable TV and radio).
● NPD Group has an online panel of over 3 million registered consumers (www.npd.com, 2007).
● Home Testing Institute, a division of Ipsos, has a panel of households available for monthly mail surveys.
● ACNielsen SCANTRACK collects scanner-based marketing and sales data weekly from a sample of stores.
Panel surveys have several advantages over other alternatives. First, they provide the possibility of conducting longitudinal studies to observe behavioral changes in the same sample over time. Second, panel surveys usually cost less than nonpanel surveys since there is only a one-time setup cost with panels.
Omnibus studies

An omnibus study is an ongoing study in which new questions can be added gradually to a regular panel study. Omnibus studies are cost-effective when there are few questions to be added to the survey, since multiple companies share the up-front survey setup cost. They become less cost-effective when the number of additional questions is large (Blankenship and Breen, 1995).
Focus groups

A focus group is a small discussion group led by an experienced moderator, whose role is to stimulate group interactions. This format has the advantage of generating group insight that is not attainable through separate one-on-one surveys. Focus groups can be used for exploring new product ideas, advertising concepts, and customer attitudes and perception. It is a qualitative rather than a quantitative method given that the sample size is very small (usually between 7 and 12 people in a group). However, insight gathered from a focus group can be very helpful for planning further research and analysis via other mechanisms. The cost of a focus group can be significant. Such cost includes the expense in recruiting the group members, the moderator fee, facility access, and equipment for monitoring and recording.
Sampling methods

There are two types of sampling methods: probability and nonprobability sampling (Green, Tull, and Albaum, 1988). Probability sampling involves applying some sort of random selection. Nonprobability sampling does not involve application of random selection.

There are four types of probability sampling methods.

● Simple random sampling: In simple random sampling, each subject has an equal probability of being selected. The first step in this sampling method is to assign each subject a computer-generated random number. For example, to select a sample of 1000 out of a population of 100,000, one would generate a uniform random number for each of the 100,000 records and select the 1000 records with the highest random numbers.
● Stratified sampling: In stratified sampling, the data is first divided into several mutually exclusive segments, and then a random sample is drawn from each segment.
● Cluster random sampling: In cluster random sampling, the data is first divided into mutually exclusive clusters (segments), and then one cluster is randomly selected. All of the records in the selected cluster will be measured and included in the final sample.
● Multistage sampling: In multistage sampling, more than one of the sampling methods mentioned previously are utilized.

There are four types of nonprobability sampling methods.

● Quota sampling: In quota sampling, a sample is selected based on a predefined quota. For example, given a quota of 50:50 female-to-male ratio and a total of 1000 subjects, 500 females and 500 males will be selected.
● Convenience sampling: In convenience sampling, samples are drawn from data sources that are easy or 'convenient' to acquire. For example, in clinical trials or shopping mall intercept surveys, respondents are acquired based on their availability. Availability does not guarantee that the sample is representative of the population, however.
● Judgment sampling: In judgment sampling, the sampler has a predefined set of characteristics on which the sampling is based. For example, in a mall intercept survey, the samplers may target adults with an age range between 20 and 30.
● Snowball sampling: In snowball sampling, the sampler relies on 'viral marketing' or 'word of mouth' to increase his sample size. In this case, the original sample size may be small but the sample size increases as those sampled refer people they know to the sampling process.
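The two most common probability sampling methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100,000 record IDs, the four industry labels, and the function names are hypothetical and do not come from the text. The simple random sample follows the procedure described above (one uniform random number per record, keep the 1000 highest).

```python
import random
from collections import defaultdict


def simple_random_sample(records, k, seed=0):
    """Assign each record a uniform random number and keep the k highest."""
    rng = random.Random(seed)
    keyed = [(rng.random(), rec) for rec in records]
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in keyed[:k]]


def stratified_sample(records, stratum_of, fraction, seed=0):
    """Split records into mutually exclusive strata, then sample each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 100,000 record IDs, industry assigned by ID.
population = list(range(100_000))
industries = ["finance", "retail", "health", "tech"]


def industry_of(record_id):
    return industries[record_id % 4]


srs = simple_random_sample(population, 1000)
strat = stratified_sample(population, industry_of, fraction=0.01)
```

Stratifying guarantees that each segment is represented in proportion to its size (here 250 records per industry), which a simple random sample only achieves on average.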
Sample size

One frequently asked question in survey research is how big the sample size should be. In general, we assume that the data has a normal distribution and select a sample size that will give the desired precision at a given confidence level, such as 95%. A confidence level is the percentage of repeated samples for which the estimate is expected to fall within the stated error of the true value.
Sample size based on sample mean

In this section, we discuss the derivation of an appropriate sample size that allows for proper estimation of the mean of an attribute in a sample. The first step in the derivation is to determine an acceptable error (the margin of error) for the estimate of the attribute mean, denoted as $E$. For instance, we may accept an error of 0.5 years when estimating the mean age of the persons in a sample. The second step is to assess the standard deviation of the age in the population, denoted as $\sigma$. Let us assume that the standard deviation of the age is 3 years. The third step in the derivation is to identify the Z score (the concept of Z score will be discussed in Chapter 6) for a predetermined confidence level, such as 95%. In a normal distribution, the Z score at a 95% confidence level is 1.96. The fourth step of the derivation is to compute the appropriate sample size. The sample size is obtained based on the following formula:

$$n = \frac{\sigma^2 Z^2}{E^2} = \frac{3^2 (1.96)^2}{0.5^2} \approx 138 \tag{5.6}$$

In this example, the appropriate sample size is 138.
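The four steps above can be reproduced with Python's standard library. This is a minimal sketch: the function names are illustrative, and the Z score is computed from the standard normal distribution rather than looked up, so it is 1.95996 rather than the rounded 1.96.

```python
import math
from statistics import NormalDist


def z_score(confidence):
    """Two-sided Z score for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def sample_size_for_mean(sigma, error, confidence=0.95):
    """n = sigma^2 * Z^2 / E^2, as in equation (5.6), before rounding."""
    z = z_score(confidence)
    return (sigma * z / error) ** 2


n = sample_size_for_mean(sigma=3, error=0.5)
n_conservative = math.ceil(n)
```

The unrounded result is about 138.3. The text reports 138; rounding up to 139 is the more conservative convention, since it keeps the error at or below the target.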
Sample size based on sample proportion

This section discusses the process of deriving the appropriate sample size for estimating the percentage of voters with a particular voting disposition. The first step in the derivation is to estimate the general voting disposition of the population, $p$, such as 45% voting for party X. The second step is to determine an acceptable error $E$, such as 0.5%. The third step is to identify the Z score given a confidence level such as 95%. In a normal distribution, the Z score at a 95% confidence level is 1.96. The fourth step is to compute the appropriate sample size. The sample size can be computed based on the following formula:

$$E = Z \sqrt{\frac{p(1 - p)}{n}} \tag{5.7}$$

Solving this formula for $n$ with $p = 0.45$, $E = 0.005$, and $Z = 1.96$ gives an appropriate sample size of 38,032.
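The arithmetic behind the sample size of 38,032 can be written out by rearranging equation (5.7). The steps below use only the values given in the text ($p = 0.45$, $E = 0.005$, $Z = 1.96$):

$$E = Z\sqrt{\frac{p(1-p)}{n}} \quad\Longrightarrow\quad n = \frac{Z^2\, p(1-p)}{E^2}$$

$$n = \frac{(1.96)^2 (0.45)(0.55)}{(0.005)^2} = \frac{3.8416 \times 0.2475}{0.000025} = \frac{0.950796}{0.000025} \approx 38{,}032$$

Because $E$ enters as a square, halving the acceptable error quadruples the required sample size.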
■ Research report and results presentation

It is important to deliver a final research report or presentation that clearly addresses the initial business objectives. Data and information are important, but actionable recommendations are even more crucial. One common mistake frequently seen in research reporting is the presentation of an abundance of data and charts with no actionable recommendations. The following is a framework for the basic structure of an effective research report or presentation.
Structure of a research report

● Background: This section gives an overview of the project background. This overview should be consistent with the original proposal and the RFP.
● Outline: The outline section consists of the topics the report discusses. The objective of the outline is to give the reader a clear idea of what to expect throughout the report.
● Executive summary: This is one of the most important sections in the whole report. Busy executives often scan only the summary section to determine if the report is worth further reading. It must be factual and must provide the answers needed to address project objectives. The executive summary must also highlight a set of practical and actionable recommendations.
● Research methodology: This section needs to clearly state the research methodology employed. For example, if a survey is included in the study, the survey mechanism, such as direct mail or e-mail, needs to be clearly stated. If questionnaires are involved, they should be included in an appendix.
● Data sources: In this section, the source of the data and the sample size, if applicable, need to be specified. A clear description of data attributes needs to be given, and data collection methods need to be stated as well. Detailed information about the data can be included in an appendix.
● Key findings: Key findings are a compilation of results in more granular detail than what is presented in the executive summary. It is a good idea to break down the findings into different sections and have a summary for each section. This helps the reader avoid getting mired in data and numbers.
● Recommendations: Recommendations must be actionable, practical, fact-based, and analysis-driven.
● Reference and acknowledgments: Acknowledgments need to be given when quoting a data source or a piece of analysis that one does not have ownership of. Written permissions from owners of the data sources may be required.
● Appendix: In this section, we can insert additional information such as questionnaires, anecdotal commentary, and detailed information on raw data.
■ References

Blankenship, A.B., and G.E. Breen. State of the Art Marketing Research. AMA/NTC Business Books, Chicago, Illinois, 1995.
comScore Networks. comScore Media Matrix. comScore Press Release. Reston, Virginia, February 28, 2006.
Green, P., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed. Prentice-Hall, Englewood Cliffs, New Jersey, 1988.
Hesselbein, F., and R. Johnston. The Drucker Foundation. On Leading Change: A Leader to Leader Guide. Jossey-Bass, San Francisco, CA, 2002.
House, R.J., P.J. Hanges, M. Javidan, and P.W. Dorfman. Culture, Leadership, and Organization: The Global Study of 62 Societies. Sage Publications Inc., Thousand Oaks, CA, 2004.
CHAPTER 6
Data and Statistics Overview
This chapter gives an overview of data and basic statistics with an emphasis on the data types and distributions that drive the selection of data mining techniques particularly relevant to quantitative marketing research.
■ Data types

The data we are concerned with results from assigning values (historical or hypothetical) to variables used in statistical analysis. Therefore, when we refer to data types, we also refer to the types of the variables the data originates from. There are two data types: nonmetric data and metric data. Within each data type, there are subtypes.

Under the nonmetric data type, there are three subtypes: binary, nominal (categorical), and ordinal. A binary variable has only two possible values. Whether or not a survey has received a response can be characterized by a binary variable with only two possible values: 'response' and 'no response'. A nominal or categorical variable can have more than two values. For example, a variable 'income group' can have three possible values: 'high income', 'medium income', and 'low income'. The values of a nominal or categorical variable are given for identification purposes rather than for quantification. Group number, a variable used to identify specific groups, is an example of a nominal variable. An ordinal data type differs from binary or nominal types in that it denotes an order or ranking. Ordinal data does not quantify the difference between any two rankings, however. Assume we rank three movies on the basis of their popularity. 'Spiderman II', 'Sweet Home Alabama', and 'Anger Management' have the ranking of first, second, and third, respectively. 'Spiderman II' is more popular than 'Sweet Home Alabama' but the ranking does not provide any information on the difference in popularity between the two movies.

Metric data is also called numeric data and can be either discrete or continuous. A discrete variable, such as the age in whole years of persons in a population, takes on a finite or countable set of values. A continuous variable is a variable for which, within the variable range limits, any value is possible. Time to complete a task is a variable with a continuous data type.
■ Overview of statistical concepts

This section provides an overview of fundamental statistical concepts including a number of basic data distributions.
Population, sample, and the central limit theorem

A population is a set of measurements representing all measurements of interest to a sample collector (Mendenhall and Beaver 1991). The population of females aged between 35 and 40 in New York City consists of all females in that age range in New York City. A sample is a subset of measurements selected from a population of interest. For example, a sample may consist of 500 women aged between 35 and 40, randomly selected from the various boroughs in New York City. In this example, the sample size is 500.

According to the central limit theorem, the distribution of the suitably standardized sum of independent and identically distributed random variables with finite variance tends to a normal distribution as the number of such variables increases indefinitely (Gujarati 1988). The concept of normal distribution will be discussed in the data distribution section in this chapter.
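A small simulation can make the central limit theorem concrete. The sketch below is an illustration only: the choice of a uniform(0, 1) variable, the sample size of 50, the 2000 replicates, and the seed are all assumptions made for the example. It draws repeated samples, averages each one, and compares the spread of the averages with the value the theorem predicts.

```python
import math
import random
import statistics


def sample_means(n, replicates, seed=42):
    """Means of `replicates` samples, each made of n uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.random() for _ in range(n)])
        for _ in range(replicates)
    ]


n = 50
means = sample_means(n=n, replicates=2000)
center = statistics.fmean(means)
spread = statistics.stdev(means)

# A uniform(0, 1) variable has mean 0.5 and variance 1/12, so the sample
# mean should have standard deviation sqrt(1/12) / sqrt(n), about 0.0408.
predicted_spread = math.sqrt(1 / 12) / math.sqrt(n)

# Share of sample means within 1.96 predicted standard deviations of 0.5;
# for a normal distribution this share is about 95%.
within = sum(abs(m - 0.5) <= 1.96 * predicted_spread for m in means) / len(means)
```

Although each individual draw is uniform rather than normal, the averages cluster around 0.5 in an approximately normal shape, which is why the Z score of 1.96 can be used for sample means.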
Random variables

In what follows we are interested in data that can be modeled as random variables. A random variable is a mathematical entity whose value is not known until an experiment is carried out. In this context, carrying out an experiment means observing the occurrence of an event and associating a numerical value to the event. For example, consulting a news report to determine whether the stock market went up from yesterday to today is an experiment and the fact that the market went up or down is the event in question. The amount by which the stock market went up or down is the value associated with the event just described. This value is the realization of a random variable. In this case, the random variable is the stock market change between yesterday and today.

Random variables are called discrete if they take on discrete values, or continuous if they take on continuous values. The Gross Domestic Product (GDP) is an example of a continuous random variable, whose value becomes known when the GDP is reported. The number of customers per hour that visit a store is an example of a discrete random variable. Next we review basic concepts of probability that we need in order to model data as random variables.
Probability, probability mass, probability density, probability distribution, and expectation

Probability is the likelihood of the occurrence of an event. Since random variables are numerical values associated with events, in what follows we will simply refer to probability as the likelihood that a random variable realizes (or takes on) a particular value. If the random variable is discrete, the probability that it will take on a particular value is given by its so-called probability mass. The probability mass is a non-negative number less than or equal to one. The probability mass that a discrete random variable $X$ takes the value $x_j$ is denoted by $p(x_j)$.

To describe continuous random variables we need the concept of probability density. If $X$ is a continuous random variable, the probability that $X$ takes on a value within the interval between $x$ and $x + dx$ is given by $f(x)\,dx$, where $dx$ is a differential and $f(x)$ is the probability density function. Notice the distinction between $X$, the random variable, and $x$, the values $X$ can take.

The probability distribution function (not to be confused with the density function) is the probability that a random variable will take on values less than or equal to a particular value. If the random variable is discrete and can take on $n$ values, its probability distribution function is defined as follows.

$$P(x_i) = \sum_{j=1}^{i} p(x_j) \tag{6.1}$$

If the random variable is continuous, the probability distribution function is defined as follows:

$$F(x) = \int_{x_{\min}}^{x} f(s)\,ds \tag{6.2}$$

where $x_{\min}$ is the minimum value random variable $X$ can take. In the case of a discrete variable $X$ that can take on $n$ values, the expectation is defined as follows.

$$E(X) = \sum_{i=1}^{n} x_i\, p(x_i), \qquad i = 1, 2, 3, \ldots, n \tag{6.3}$$

If random variable $X$ is continuous, its expectation is given by the formula

$$E(X) = \int_{x_{\min}}^{x_{\max}} x f(x)\,dx \tag{6.4}$$

where $x_{\min}$ and $x_{\max}$ are the smallest and largest values the continuous random variable $X$ can take. The expectation is usually denoted by the Greek letter $\mu$.
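For a discrete random variable, the probability mass, the distribution function of equation (6.1), and the expectation of equation (6.3) can be computed directly. The sketch below uses a fair six-sided die as a hypothetical example (it is not taken from the text) and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, each face with mass 1/6.
values = [1, 2, 3, 4, 5, 6]
pmf = {x: Fraction(1, 6) for x in values}


def distribution(pmf, x):
    """P(x): total mass of the values less than or equal to x, as in (6.1)."""
    return sum(p for value, p in pmf.items() if value <= x)


def expectation(pmf):
    """E(X): probability-weighted sum of the possible values, as in (6.3)."""
    return sum(value * p for value, p in pmf.items())


total_mass = sum(pmf.values())
prob_at_most_3 = distribution(pmf, 3)
mu = expectation(pmf)
```

The masses sum to one, the distribution function at 3 is 1/2, and the expectation is 7/2, a value the die itself can never show. The expectation is a long-run average, not a typical outcome.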
Mean, median, mode, and range

There are two types of means: arithmetic and geometric. The properties we discuss next are defined for both continuous and discrete random variables. For simplicity, however, in this section we focus on the discrete case. In the case of a discrete random variable, $X$, the arithmetic mean is given by the average of its possible values.

$$\bar{X}_a = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad i = 1, 2, 3, \ldots, n \tag{6.5}$$

If the values 2, 6, 8, 9, 9, and 11 are instances of the random variable $X$, its mean is

$$\bar{X}_a = \frac{2 + 6 + 8 + 9 + 9 + 11}{6} = 7.5$$

The geometric mean of a discrete random variable is given by the geometric average of its possible values.

$$\bar{X}_g = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}, \qquad i = 1, 2, 3, \ldots, n \tag{6.6}$$

For the same realized values of $X$, the geometric mean is

$$\bar{X}_g = \sqrt[6]{2 \times 6 \times 8 \times 9 \times 9 \times 11} = 6.64$$

The median is the value in the middle position in a sorted array of values. If there are two values in the middle position, the median is the average of these two values. The median in the example we are discussing is 8.5. The mode is the number that appears most frequently in a group of values. In our example, the mode is 9. The range is the difference between the largest and the smallest values in a group of values. In our example, the range is 9, the difference between 2 and 11.
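The figures in this example can be checked with Python's standard `statistics` module. This is a quick verification sketch; `statistics.geometric_mean` is available from Python 3.8 onward, and the variable names are illustrative.

```python
import statistics

data = [2, 6, 8, 9, 9, 11]

arithmetic_mean = statistics.mean(data)           # equation (6.5)
geometric_mean = statistics.geometric_mean(data)  # equation (6.6)
median = statistics.median(data)                  # average of 8 and 9
mode = statistics.mode(data)                      # most frequent value
value_range = max(data) - min(data)               # largest minus smallest
```

The geometric mean (about 6.64) is below the arithmetic mean (7.5), as it is for any set of positive values that are not all equal.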
Variance and standard deviation

The variance of a population of $N$ observations, $\sigma^2$, is the mean of the squares of the deviations of the observations from the population mean $\mu$.

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{6.7}$$

The standard deviation of a population is the positive square root of the population variance.

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2} \tag{6.8}$$

The standard error of the mean of a sample of size $N$ is given by the expression

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{N}} \tag{6.9}$$

The variance of a sample of size $n < N$ is defined in the same way as the variance of the population, with $N$ replaced by $n$. However, especially when the sample size is small, it is preferable to use an alternative expression for the variance of the sample where $n$ is replaced by $n - 1$, as follows,

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \tag{6.10}$$

The reason why this formula is preferred to the expression for the variance is that $s^2$ is an unbiased estimator of the population variance. The corresponding expression for the standard deviation of a sample is (Mendenhall and Beaver 1991)

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{6.11}$$

In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the variance of the sample is 9.9 and the standard deviation of the sample is 3.15.
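The difference between dividing by the number of observations and dividing by one less can be seen on the example values using the standard `statistics` module. This is a verification sketch; the standard error line substitutes the sample standard deviation for the unknown population value, which is an assumption of the example rather than something stated in the text.

```python
import math
import statistics

data = [2, 6, 8, 9, 9, 11]
n = len(data)

population_variance = statistics.pvariance(data)  # divides by n, as in (6.7)
sample_variance = statistics.variance(data)       # divides by n - 1, as in (6.10)
sample_std = statistics.stdev(data)               # equation (6.11)

# Equation (6.9), with the sample standard deviation standing in for sigma.
standard_error = sample_std / math.sqrt(n)
```

The sum of squared deviations from the mean of 7.5 is 49.5, so dividing by 6 gives 8.25 while dividing by 5 gives the 9.9 quoted in the text.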
Percentile, skewness, and kurtosis

As before, we focus on the discrete case for simplicity. The $p$th percentile is a value such that $p\%$ of the observations in a sample have a value less than this value. The skewness gives an indication of the deviation from symmetry of a density function (Rice 1988).

$$\text{Skew} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^3}{s^3} \tag{6.12}$$

The kurtosis characterizes the tails of a distribution (Rice 1988).

$$\text{Kurtosis} = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{s^4} \tag{6.13}$$

In our example of the sample with six realized values, 2, 6, 8, 9, 9, and 11, the skewness of the sample is −0.642 and the kurtosis of the sample is 1.84. We often use the normal density as a reference to characterize the tails' size by defining the excess kurtosis. Since the kurtosis of the normal density function is equal to three, the excess kurtosis is given by

$$\text{Excess kurtosis} = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \sigma^4} - 3 \tag{6.14}$$
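The skewness and kurtosis quoted for the example can be reproduced by coding equations (6.12) and (6.13) directly. The sketch below follows those two formulas as written, averaging over $n$ in the numerator and using the sample standard deviation $s$ (with the $n - 1$ divisor) in the denominator; statistical packages often use slightly different conventions, so their output may not match these figures. The function names are illustrative.

```python
import statistics


def skewness(xs):
    """Equation (6.12): mean cubed deviation divided by s cubed."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation, n - 1 divisor
    return sum((x - mean) ** 3 for x in xs) / len(xs) / s ** 3


def kurtosis(xs):
    """Equation (6.13): mean fourth-power deviation divided by s^4."""
    mean = statistics.fmean(xs)
    s = statistics.stdev(xs)
    return sum((x - mean) ** 4 for x in xs) / len(xs) / s ** 4


data = [2, 6, 8, 9, 9, 11]
skew = skewness(data)
kurt = kurtosis(data)
```

The negative skewness reflects the single low value, 2, which pulls the left tail out further than the right.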
Probability density functions

The probability density function defines the distribution of probability among different realizable values of a random variable. This section gives an overview of probability density functions of eight commonly used data distributions: uniform, binomial, Poisson, exponential, normal, chi-square, Student's t, and F distributions.
Uniform distribution
A random variable with uniform distribution has a constant probability density function. If a and b are the minimum and maximum values the random variable can take, the uniform density function is
\[ f(x) = \frac{1}{b - a} \]
The expectation and variance of x are given by
\[ \mu = \frac{a + b}{2}, \qquad \sigma^2 = \frac{(b - a)^2}{12} \tag{6.15} \]
Normal distribution
The probability density of a normally distributed random variable is
\[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-(x - \mu)^2 / (2\sigma^2)} \tag{6.16} \]
where $-\infty < x < \infty$, $\mu$ is the expectation, and $\sigma$ is the standard deviation. The normal density can be standardized by rescaling the random variable as follows.
\[ Z = \frac{x - \mu}{\sigma} \tag{6.17} \]
The density function of Z is normal with zero expectation and unit variance. The notation Z ~ N(0,1) is used to denote that Z is a random variable drawn from a standardized normal distribution.
Binomial distribution
The probability density function of a binomial random variable is as follows (Mendenhall and Beaver 1991).
\[ f(x) = \frac{N!}{x! \, (N - x)!} \, p^x q^{N - x} \tag{6.18} \]
A binomial event has only two outcomes. For example, an undertaking whose outcome can be described by success or failure can be characterized by a binomial distribution. The integer value x is the number of successes in a total of N trials, where 0 ≤ x ≤ N. The success outcome has a probability p and the failure outcome has a probability of q = 1 − p. The expectation and variance of a binomial random variable are
\[ \mu = Np, \qquad \sigma^2 = Np(1 - p) \]
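To make Eq. (6.18) concrete, the sketch below evaluates the binomial probabilities for an assumed example of N = 10 trials with success probability p = 0.3 (both values are illustrative, not from the text) and confirms numerically that the mean and variance agree with Np and Np(1 − p).

```python
import math

def binomial_pmf(x, n_trials, p):
    """Probability of x successes in n_trials trials, Eq. (6.18)."""
    return math.comb(n_trials, x) * p ** x * (1 - p) ** (n_trials - x)

n_trials, p = 10, 0.3  # illustrative values
probs = [binomial_pmf(x, n_trials, p) for x in range(n_trials + 1)]

total = sum(probs)                                        # should be 1
mean = sum(x * pr for x, pr in enumerate(probs))          # should be N p
variance = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))  # N p (1 - p)
print(round(total, 6), round(mean, 6), round(variance, 6))
```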
Poisson distribution
A Poisson distribution characterizes the number of occurrences of an event in a given period of time. This distribution is appropriate when there is no memory affecting the likelihood of the number of events from period to period. The probability density function of a Poisson distribution is as follows.
\[ f(x) = \frac{\lambda^x e^{-\lambda}}{x!} \tag{6.19} \]
The variable x represents the number of event occurrences during a given period of time, during which on average $\lambda$ events occur. Both the expectation and the variance of x are equal to $\lambda$.
Exponential distribution
Exponential random variables characterize inter-arrival times in Poisson-distributed events. The probability density function of an exponential distribution is as follows.
\[ f(x) = \lambda e^{-\lambda x} \tag{6.20} \]
where $x \ge 0$ and $\lambda > 0$, and the expectation and variance are given by
\[ \mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2} \]
The exponential distribution reflects absence of memory in the inter-arrival times of Poisson-driven events.
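The link between the Poisson and exponential distributions can be illustrated numerically. In the sketch below, the rate of 3 events per period is an assumed value. The code checks that the Poisson mean and variance both equal the rate, and that the probability of seeing no event in a window of length t equals the exponential probability that the waiting time exceeds t.

```python
import math

def poisson_pmf(x, lam):
    """Probability of x events when lam events occur on average, Eq. (6.19)."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def exponential_survival(t, lam):
    """P(inter-arrival time > t), obtained by integrating Eq. (6.20) from t to infinity."""
    return math.exp(-lam * t)

lam = 3.0  # assumed average number of events per period
probs = [poisson_pmf(x, lam) for x in range(60)]  # tail beyond 59 is negligible here
mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

t = 0.5  # half a period
no_event_in_t = poisson_pmf(0, lam * t)       # Poisson count over a window of length t
waiting_longer = exponential_survival(t, lam)  # exponential waiting time exceeds t
print(round(mean, 6), round(variance, 6), round(no_event_in_t, 6), round(waiting_longer, 6))
```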
Chi-square ($\chi^2$) distribution
A chi-square density characterizes the distribution of the sum of the squares of independent standardized normally distributed random variables, $Z_i$.
\[ \chi_k^2 = \sum_{i=1}^{k} Z_i^2 \tag{6.21} \]
Here, k is the number of degrees of freedom as well as the number of independent standardized normal variables. The probability density function of the chi-square distribution is as follows.
\[ f(x) = \frac{x^{\frac{k}{2} - 1} e^{-\frac{x}{2}}}{2^{\frac{k}{2}} \, \Gamma\!\left(\frac{k}{2}\right)} \tag{6.22} \]
The gamma function, $\Gamma$, is defined as
\[ \Gamma(\alpha) = \int_0^{\infty} t^{\alpha - 1} e^{-t} \, dt \tag{6.23} \]
and has the recursive property
\[ \Gamma(\alpha + 1) = \alpha \, \Gamma(\alpha) \tag{6.24} \]
The mean of a chi-squared random variable is equal to k, and its variance is
\[ \sigma^2 = 2k \]
Student's t distribution
A Student's t distribution describes the ratio, t, of a standardized normally distributed random variable, $Z_1$, to the square root of a $\chi^2$ distributed random variable, $Z_2$, divided by its number of degrees of freedom, n (Gujarati, 1988).
\[ t = \frac{Z_1}{\sqrt{Z_2 / n}} = \frac{Z_1 \sqrt{n}}{\sqrt{Z_2}} \tag{6.25} \]
The probability density function of t is
\[ f(x) = \frac{\Gamma\!\left(\frac{n + 1}{2}\right)}{\sqrt{n\pi} \, \Gamma\!\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n}\right)^{-\frac{n + 1}{2}} \tag{6.26} \]
The mean of a t distribution is zero. Its variance, which exists for n > 2, is
\[ \sigma^2 = \frac{n}{n - 2} \]
F distribution
The F distribution describes the ratio of two $\chi^2$ distributed random variables $Z_1$ and $Z_2$, with $k_1$ and $k_2$ degrees of freedom, respectively, each divided by its number of degrees of freedom.
\[ F = \frac{Z_1 / k_1}{Z_2 / k_2} \tag{6.27} \]
The probability density function is as follows.
\[ f(x) = \frac{\sqrt{\dfrac{(k_1 x)^{k_1} \, k_2^{k_2}}{(k_1 x + k_2)^{k_1 + k_2}}}}{x \, B\!\left(\frac{k_1}{2}, \frac{k_2}{2}\right)} \tag{6.28} \]
Here, $B\!\left(\frac{k_1}{2}, \frac{k_2}{2}\right)$ is the beta function with parameters $\frac{k_1}{2}$ and $\frac{k_2}{2}$. The expectation and variance of an F-distributed variable are
\[ \mu = \frac{k_2}{k_2 - 2} \quad \text{for } k_2 > 2, \qquad \sigma^2 = \frac{2 k_2^2 (k_1 + k_2 - 2)}{k_1 (k_2 - 2)^2 (k_2 - 4)} \quad \text{for } k_2 > 4 \tag{6.29} \]
The variance does not exist when $k_2 \le 4$.
Independent and dependent variables
An independent variable is also called a predictive variable. Prediction in this context means estimating the possible value of a dependent variable with a given level of confidence. A dependent variable is also called an outcome variable.
Covariance and correlation coefficient
Covariance measures the level of co-variability between two random variables. If X and Y are random variables, their covariance is defined by the expression
\[ \operatorname{cov}(X, Y) = E\big((x - \mu_x)(y - \mu_y)\big) = E(xy) - \mu_x \mu_y \tag{6.30} \]
where $\mu_x$ and $\mu_y$ are the means of X and Y, respectively. A correlation coefficient between two variables X and Y gives an indication of the level of linear association between the two variables. There are several standard formulations for level of association, of which the Pearson correlation coefficient is the most popular. The Pearson correlation coefficient,
\[ r = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y} \tag{6.31} \]
measures the level of linear association between two random variables. Here, $\sigma_x$ and $\sigma_y$ are the standard deviations of X and Y, respectively. Besides the Pearson correlation coefficient, Kendall's coefficient of concordance and Spearman's rank correlation coefficient are also commonly used measures of association for numeric variables. Pearson's coefficient of mean square contingency and Cramer's contingency coefficient are used to measure association between nominal variables. The Kendall-Stuart $\tau_c$, the Goodman–Kruskal $\gamma$, and Somers' d are used to measure association between ordinal variables (Liebetrau, 1983).
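The covariance and Pearson correlation of Eqs. (6.30) and (6.31) can be sketched in a few lines. The two data series below are hypothetical (for instance, advertising spend and sales in arbitrary units); the same divisor n is used in the covariance and in both standard deviations, so it cancels in the ratio.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient, Eq. (6.31)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n  # Eq. (6.30)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

ad_spend = [1, 2, 3, 4, 5]   # hypothetical values
sales = [2, 4, 5, 4, 5]      # hypothetical values
r = pearson_r(ad_spend, sales)
print(round(r, 4))
```

A value of r close to +1 or −1 indicates a strong linear association, while a value near zero indicates little linear association.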
Kendall's coefficient
Two pairs of observations $(X_i, Y_i)$ and $(X_j, Y_j)$ are said to be concordant if $(X_i - X_j)(Y_i - Y_j) > 0$. They are said to be discordant if $(X_i - X_j)(Y_i - Y_j) < 0$. They are said to be tied if $(X_i - X_j)(Y_i - Y_j) = 0$. Kendall's coefficient of concordance is defined as
\[ \tau = \pi_c - \pi_d \tag{6.32} \]
where $\pi_c$ is the probability of concordance
\[ \pi_c = P[(X_i - X_j)(Y_i - Y_j) > 0] \]
and $\pi_d$ is the probability of discordance
\[ \pi_d = P[(X_i - X_j)(Y_i - Y_j) < 0] \]
Given the probability of ties,
\[ \pi_t = P[(X_i - X_j)(Y_i - Y_j) = 0] \]
the following condition must be satisfied
\[ \pi_c + \pi_d + \pi_t = 1 \]
The following are two alternative estimations of the Kendall coefficient of concordance (Liebetrau, 1983).
\[ \hat{\tau} = \frac{C - D}{\binom{n}{2}} = \frac{2(C - D)}{n(n - 1)} \tag{6.33} \]
where C is the number of concordant pairs, D is the number of discordant pairs, and n is the number of observations, so that $\binom{n}{2}$ is the total number of pairs. An alternative expression (Liebetrau, 1983) is
\[ \hat{\tau}_b = \frac{C - D}{\left\{ \left[ \binom{n}{2} - U \right] \left[ \binom{n}{2} - V \right] \right\}^{1/2}} \tag{6.34} \]
where $U = \sum_i \binom{m_i}{2}$, $V = \sum_j \binom{n_j}{2}$, $m_i$ is the number of observations in the ith set of tied X values, and $n_j$ is the number of observations in the jth set of tied Y values.
Spearman's rank correlation coefficient is similar to Pearson's except that the former is based on ranks rather than on values. Ranks are determined by the relative values of the numbers in a series. In a series of N numbers, the largest number has a rank of one, the second largest number has a rank of two, and the smallest number has a rank of N.
\[ \hat{\rho}_b = \frac{\sum_{i=1}^{n} (R_i - \bar{R})(S_i - \bar{S})}{\left\{ \sum_{i=1}^{n} (R_i - \bar{R})^2 \sum_{i=1}^{n} (S_i - \bar{S})^2 \right\}^{1/2}} \tag{6.35} \]
where $R_i$ is the rank of $X_i$ among the X's, $S_i$ is the rank of $Y_i$ among the Y's, and n is the total number of pairs.
Both Pearson's coefficient of mean square contingency and Cramer's contingency coefficient are based on the following chi-square statistic of a contingency table.
\[ X^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(n_{ij} - n p_{ij})^2}{n p_{ij}} \tag{6.36} \]
Contingency tables display frequency data in a two-way cross tabulation and are used by researchers to examine the independence of two methods of classifying the data. For instance, a group of individuals can be classified by whether they are married and whether they are employed. In this case marriage and employment are the two methods of classifying the individuals. Figure 6-1 shows a typical contingency table. Each cell shows the count $n_{ij}$ with the corresponding probability $p_{ij}$ in parentheses.

                                  Y
X         1           2           ...   J           Totals
1         n11 (p11)   n12 (p12)   ...   n1J (p1J)   n1+ (p1+)
2         n21 (p21)   n22 (p22)   ...   n2J (p2J)   n2+ (p2+)
...
I         nI1 (pI1)   nI2 (pI2)   ...   nIJ (pIJ)   nI+ (pI+)
Totals    n+1 (p+1)   n+2 (p+2)   ...   n+J (p+J)   n++ = n (p++ = 1)

Figure 6-1 Visualization of a two-way contingency table (Liebetrau 1983).

Pearson's coefficient of mean square contingency is a statistic used to measure the deviation of the realized counts from the expected counts for determining the independence of the two classification methods. The formula for Pearson's coefficient is as follows (Liebetrau 1983).
\[ \phi^2 = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{(p_{ij} - p_{i+} p_{+j})^2}{p_{i+} p_{+j}} = \sum_{i=1}^{I} \sum_{j=1}^{J} \frac{p_{ij}^2}{p_{i+} p_{+j}} - 1 \tag{6.37} \]
Cramer's contingency coefficient, given by Eq. (6.38), measures the association between two variables as a percentage of their maximum possible variation.
\[ \nu = \left( \frac{\phi^2}{q - 1} \right)^{1/2} \tag{6.38} \]
where q is min {I, J}. The Kendall-Stuart $\tau_c$, the Goodman–Kruskal $\gamma$, and Somers' d are statistics that are derived from Kendall's $\tau$.
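Pearson's coefficient of mean square contingency, Eq. (6.37), and Cramer's coefficient, Eq. (6.38), can be computed directly from a table of counts. The sketch below is illustrative only: the function names and the 2 × 2 counts (married or not, employed or not) are hypothetical, cell probabilities are estimated as counts divided by n, and every row and column is assumed to have a nonzero total.

```python
import math

def phi_squared(counts):
    """Mean square contingency from a table of counts (second form of Eq. 6.37)."""
    n = sum(sum(row) for row in counts)
    row_p = [sum(row) / n for row in counts]          # p_i+
    col_p = [sum(col) / n for col in zip(*counts)]    # p_+j
    total = 0.0
    for i, row in enumerate(counts):
        for j, count in enumerate(row):
            p_ij = count / n
            total += p_ij ** 2 / (row_p[i] * col_p[j])
    return total - 1.0

def cramers_coefficient(counts):
    """Cramer's contingency coefficient, Eq. (6.38), with q = min(I, J)."""
    q = min(len(counts), len(counts[0]))
    # max(..., 0.0) guards against a tiny negative value caused by rounding
    return math.sqrt(max(phi_squared(counts), 0.0) / (q - 1))

# Hypothetical counts: rows = married / not married, columns = employed / not employed
table = [[40, 10],
         [20, 30]]
print(round(phi_squared(table), 4), round(cramers_coefficient(table), 4))
```

For a table whose rows are proportional to each other (independence) the coefficient is zero, and for a 2 × 2 table with all counts on the diagonal it equals one.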
The Kendall-Stuart $\tau_c$ equals the excess of concordant over discordant pairs times another term representing an adjustment for the size of the table. Goodman and Kruskal's gamma is a symmetric statistic that ranges from +1 to −1, based on the difference between concordant pairs and discordant pairs. Somers' d is Goodman and Kruskal's gamma modified to penalize for pairs tied only on X.
Kendall-Stuart coefficient
\[ \hat{\tau}_c = \frac{2(C - D)}{n^2 (q - 1)/q} = \frac{2q(C - D)}{n^2 (q - 1)} \tag{6.39} \]
Goodman–Kruskal's coefficient
\[ \gamma = \frac{\pi_c - \pi_d}{1 - \pi_t} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d} = \frac{\pi_c - \pi_d}{1 - \sum_{i=1}^{I} p_{i+}^2 - \sum_{j=1}^{J} p_{+j}^2 + \sum_{i=1}^{I} \sum_{j=1}^{J} p_{ij}^2} \tag{6.40} \]
Somers' coefficient
\[ d_{Y \cdot X} = \frac{\pi_c - \pi_d}{\pi_c + \pi_d + \pi_t^{Y}} = \frac{\pi_c - \pi_d}{1 - \pi_t^{X} - \pi_t^{XY}} = \frac{\pi_c - \pi_d}{1 - \sum_{i=1}^{I} p_{i+}^2} \tag{6.41} \]
Here, $\pi_t^{X}$ is the probability that two randomly selected observations are tied only on X, $\pi_t^{Y}$ is the probability that they are tied only on Y, and $\pi_t^{XY}$ is the probability that they are tied on both X and Y.
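The concordant and discordant pair counts C and D that drive Eqs. (6.33), (6.39), and (6.40) can be obtained by brute-force comparison of all pairs of observations. The sketch below uses hypothetical data without ties; the sample estimate of gamma is taken as (C − D)/(C + D), the sample analogue of the second form in Eq. (6.40).

```python
from itertools import combinations

def count_pairs(xs, ys):
    """Return (C, D): the numbers of concordant and discordant pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 is a tie and counts toward neither C nor D
    return concordant, discordant

xs = [1, 2, 3, 4, 5]   # hypothetical observations
ys = [1, 3, 2, 5, 4]
C, D = count_pairs(xs, ys)
n = len(xs)
tau_hat = 2 * (C - D) / (n * (n - 1))   # Eq. (6.33)
gamma_hat = (C - D) / (C + D)           # sample analogue of Eq. (6.40)
print(C, D, tau_hat, gamma_hat)  # 8 2 0.6 0.6
```

With no ties, C + D equals the total number of pairs, so the two estimates coincide; they differ once ties are present.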
Tests of significance
A significance test quantifies the statistical significance of hypotheses. We will follow the paradigm established by Neyman and Pearson to posit significance tests. In establishing a significance test, the probability distributions are grouped into two aggregates, one of which is called the null hypothesis, denoted by H0, and the other of which is called the alternative hypothesis, denoted by HA (Rice 1988). Null hypotheses often specify, or partially specify, the value of a probability distribution (Rice 1988). The rejection area is the set of values of the test statistic for which H0 is rejected, and the acceptance area is the set of remaining values, for which H0 is not rejected. The probabilities of both areas are measured under the probability density curve of the distribution specified by the null hypothesis.
There are two types of significance tests: one-tailed and two-tailed. A one-tailed test is one that specifies the rejection area under only one tail of the probability density curve of the distribution of the test statistic. A two-tailed test is one that specifies the rejection area under the two tails of the probability density curve of the distribution of the test statistic (Mendenhall and Beaver 1991). For instance, a null hypothesis may state that the probability of success in each of one hundred trials is 0.1, so that ten successes are expected. The alternative hypothesis in a one-tailed test may state that the probability of success is less than 0.1. A two-sided alternative hypothesis may state that the probability of success is either less than 0.1 or greater than 0.1.
According to the Neyman-Pearson paradigm, a decision as to whether or not to reject H0 in favor of HA is made on the basis of T(X), where X denotes the sample values and T(X) is a suitable statistic (Rice 1988). This decision is affected by the error tolerance, which is defined by either error of type I or error of type II. A type I error consists in rejecting H0 when it is true. The probability of rejecting H0 when it is true is denoted as $\alpha$, called the significance level. In a one-tailed test, the probability of T(X) exceeding the critical statistic T*(X) is $\alpha$. The confidence level is $(1 - \alpha)$ and the $100(1 - \alpha)$ percent confidence interval corresponds to T(X) ≤ T*(X). In a two-tailed test, the probability of T(X) exceeding the critical T*(X) is $\alpha/2$ and the probability of T(X) falling below −T*(X) is $\alpha/2$. A type II error occurs when we accept H0 when it is false. The probability of accepting H0 when it is false is denoted as $\beta$. The power of a test is $1 - \beta$.
Z Test
In a Z test, the test statistic T(X) is defined as follows,
\[ Z = \frac{X - \mu}{\sigma} \tag{6.42} \]
where $\mu$ is the mean of X, and $\sigma$ is the standard deviation of X, where X is assumed to follow a normal distribution. A Z table shows the critical Z scores at a pre-specified significance level $\alpha$. If we assume a pre-specified $\alpha$ of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus $\alpha/2$. This area, 0.475 (or 47.5%), corresponds to a Z score of 1.96, as shown in Figure 6-2.
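The Z table lookup described above can be reproduced with the standard normal distribution in Python's standard library. This is a minimal sketch; the variable names are chosen for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1
alpha = 0.05

# Two-tailed test: alpha / 2 in each tail
z_two_tailed = standard_normal.inv_cdf(1 - alpha / 2)
# One-tailed test: all of alpha in a single tail
z_one_tailed = standard_normal.inv_cdf(1 - alpha)
# Area between 0 and z = 1.96, the quantity tabulated in a Z table
area_0_to_z = standard_normal.cdf(1.96) - 0.5

print(round(z_two_tailed, 2), round(z_one_tailed, 3), round(area_0_to_z, 3))
# 1.96 1.645 0.475
```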
t Test
In a t test, T(X) is called a t score and is a statistic used to determine whether a null hypothesis can be rejected. When we conduct a t test, we assume that the test statistic follows a Student's t distribution. The t score is given by
Entries give the area under the standard normal density curve between 0 and z.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Figure 6-2 Z table.
\[ t = \frac{X - \bar{X}}{s} \tag{6.43} \]
where $\bar{X}$ is the mean of X and s is the standard deviation of X. A t table shows the critical t score at a prespecified significance level $\alpha$ for each number of degrees of freedom. If we assume a prespecified $\alpha$ of 0.05 and a two-tailed test, then the shaded area under the probability density curve is 0.5 minus $\alpha/2$. To identify the critical t value, we need two pieces of information: the number of degrees of freedom, n − 1, and the prespecified significance level $\alpha$. If we assume that the total number of observations in the sample is 30 and $\alpha$ is 0.05, the number of degrees of freedom of the sample is 29 and $\alpha/2$ is 0.025. With this information, in Figure 6-3 we identify the critical t value as 2.0452.
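The critical t value can also be approximated numerically from the density in Eq. (6.26) without a table. The sketch below is one possible approach, not the method used to build Figure 6-3: it integrates the density with Simpson's rule and finds the critical value by bisection. The function names, the number of integration steps, and the search bound of 50 are all assumptions made for illustration; the bound is too small for very small p with one or two degrees of freedom.

```python
import math

def t_pdf(x, df):
    """Student's t density with df degrees of freedom, Eq. (6.26)."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(x, df, steps=2000):
    """P(T > x) for x >= 0: 0.5 minus the Simpson's-rule integral over [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def t_critical(p, df, upper_bound=50.0):
    """Value t such that P(T > t) = p, found by bisection on [0, upper_bound]."""
    lo, hi = 0.0, upper_bound
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

critical = t_critical(0.025, 29)  # alpha = 0.05, two-tailed, n = 30
print(round(critical, 4))
```

The result should be close to the tabulated value for 29 degrees of freedom in the 0.025 column of the t table.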
Entries give the critical value t(p, df) with upper-tail probability p and df degrees of freedom.

df \ p   0.4      0.25     0.1      0.05     0.025     0.01      0.005     0.0005
1        0.3249   1.0000   3.0777   6.3138   12.7062   31.8205   63.6567   636.6192
2        0.2887   0.8165   1.8856   2.9200   4.3027    6.9646    9.9248    31.5991
3        0.2767   0.7649   1.6377   2.3534   3.1825    4.5407    5.8409    12.9240
4        0.2707   0.7407   1.5332   2.1318   2.7765    3.7470    4.6041    8.6103
5        0.2672   0.7267   1.4759   2.0150   2.5706    3.3649    4.0321    6.8688
6        0.2648   0.7176   1.4398   1.9432   2.4469    3.1427    3.7074    5.9588
7        0.2632   0.7111   1.4149   1.8946   2.3646    2.9980    3.4995    5.4079
8        0.2619   0.7064   1.3968   1.8595   2.3060    2.8965    3.3554    5.0413
9        0.2610   0.7027   1.3830   1.8331   2.2622    2.8214    3.2498    4.7809
10       0.2602   0.6998   1.3722   1.8125   2.2281    2.7638    3.1693    4.5869
11       0.2596   0.6974   1.3634   1.7959   2.2010    2.7181    3.1058    4.4370
12       0.2590   0.6955   1.3562   1.7823   2.1788    2.6810    3.0545    4.3178
13       0.2586   0.6938   1.3502   1.7709   2.1604    2.6503    3.0123    4.2208
14       0.2582   0.6924   1.3450   1.7613   2.1448    2.6245    2.9768    4.1405
15       0.2579   0.6912   1.3406   1.7531   2.1315    2.6025    2.9467    4.0728
16       0.2576   0.6901   1.3368   1.7459   2.1199    2.5835    2.9208    4.0150
17       0.2573   0.6892   1.3334   1.7396   2.1098    2.5669    2.8982    3.9651
18       0.2571   0.6884   1.3304   1.7341   2.1009    2.5524    2.8784    3.9216
19       0.2569   0.6876   1.3277   1.7291   2.0930    2.5395    2.8609    3.8834
20       0.2567   0.6870   1.3253   1.7247   2.0860    2.5280    2.8453    3.8495
21       0.2566   0.6864   1.3232   1.7207   2.0796    2.5177    2.8314    3.8193
22       0.2564   0.6858   1.3212   1.7171   2.0739    2.5083    2.8188    3.7921
23       0.2563   0.6853   1.3195   1.7139   2.0687    2.4999    2.8073    3.7676
24       0.2562   0.6849   1.3178   1.7109   2.0639    2.4922    2.7969    3.7454
25       0.2561   0.6844   1.3163   1.7081   2.0595    2.4851    2.7874    3.7251
26       0.2560   0.6840   1.3150   1.7056   2.0555    2.4786    2.7787    3.7066
27       0.2559   0.6837   1.3137   1.7033   2.0518    2.4727    2.7707    3.6896
28       0.2558   0.6834   1.3125   1.7011   2.0484    2.4671    2.7633    3.6739
29       0.2557   0.6830   1.3114   1.6991   2.0452    2.4620    2.7564    3.6594
30       0.2556   0.6828   1.3104   1.6973   2.0423    2.4573    2.7500    3.6460
inf      0.2533   0.6745   1.2816   1.6449   1.9600    2.3264    2.5758    3.2905

Figure 6-3 t table.
Experimental design
The main objective of experimental design is to ensure the validity of the conclusions from a study or survey. Experimental design is used to avoid study or survey design flaws that may skew the results. An experimental design is a process that seeks to discover the functional forms that relate the independent (predictive) variables and the dependent (outcome) variables in a study (Green, Tull, and Albaum 1988). Depending on the level of information available, an experimental design aims to accomplish any of the following tasks.
● Getting numeric parameter estimates only, if the statistical functional form is already known
● Building a model, if the statistical functional form is unknown
● Identifying relevant variables (independent and dependent), if the statistical functional form is known but the variables are unknown.
Experimental design terminology
The following is a list of frequently used terms in experimental design (Green, Tull, and Albaum 1988).
● Units: Units are individuals, subjects, or objects.
● Treatments: Treatments are the independent (or predictive) variables in an experimental design, calibrated to observe potential causality.
● Control units: These are objects, individuals, or subjects that are not subjected to any treatment. A group that consists of control units is called a control group.
● Test units: These are objects, subjects, or individuals that are given a particular treatment. A group that consists of test units is a treatment or test group.
● Natural experiment: An experiment that requires minimum intervention and no calibration of variables.
● Controlled experiment: An experiment that requires an investigator's intervention and calibration of variables to discover a causality effect. Two kinds of interventions are necessary: random placement of subjects into a control or a treatment group, and calibration of at least one variable assumed to be causal. Experiments that meet both intervention conditions are called true experiments.
● Quasi-experiment: An experiment that contains manipulation of at least one assumed causal variable, but does not have random assignment of subjects into a control or experiment group.
● Block: A block is a group of similar units of which roughly equal numbers of units are assigned to each treatment group.
● Replication: Replication is the creation of repeated measurements in a control or treatment group.
● Completely randomized design: This is a design where test units are assigned experimental treatments on a random basis.
● Full factorial design: This type of design assigns an equal number of observations to all combinations of the treatments involving at least two levels of at least two variables.
● Latin square design: Latin square design is a technique for reducing the number of observations required in a full factorial design.
● Cross-over design: In this design, different treatments are applied to the same test unit in different periods of time.
● Randomized-block design: This design is usually used when a researcher needs to eliminate a possible source of error. In this case, each test unit is regarded as a 'block' and all treatments are applied to each of these blocks.
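The completely randomized design in the list above can be sketched as a random assignment of units to groups. The function name, the unit labels, and the fixed seed below are hypothetical choices made for illustration; assigning units in rotation after shuffling keeps the group sizes as equal as possible.

```python
import random

def completely_randomized_assignment(units, treatments, seed=0):
    """Shuffle the units, then deal them out to the treatments in rotation."""
    rng = random.Random(seed)      # fixed seed so the example is repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)
    assignment = {treatment: [] for treatment in treatments}
    for index, unit in enumerate(shuffled):
        assignment[treatments[index % len(treatments)]].append(unit)
    return assignment

customers = [f"customer_{i}" for i in range(12)]   # hypothetical test units
groups = completely_randomized_assignment(customers, ["control", "offer_a", "offer_b"])
for name, members in groups.items():
    print(name, len(members))
```

Because the placement of each unit is random, differences in outcomes between the groups can more plausibly be attributed to the treatments rather than to how the groups were formed.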
■ References
Green, P.E., D.S. Tull, and G. Albaum. Research for Marketing Decisions, 5th ed. Prentice Hall, Englewood Cliffs, New Jersey, 1988.
Gujarati, D.N. Basic Econometrics, 2nd ed. McGraw-Hill, New York, 1988.
Liebetrau, A.M. Measures of Association. Quantitative Applications in the Social Sciences, Sage Publications, Thousand Oaks, CA, 1983.
Mead, R. The Design of Experiments – Statistical Principles for Practical Application. Cambridge University Press, New York, 1988.
Mendenhall, W., and R. Beaver. Introduction to Probability and Statistics, 8th ed. PWS-Kent Publishing Company, Boston, MA, 1991.
Rice, J.A. Mathematical Statistics and Data Analysis, Statistics/Probability Series. Wadsworth & Brooks/Cole, Belmont, CA, 1988.
CHAPTER 7
Introduction to Data Mining
A wide variety of data mining approaches have been developed to address a broad spectrum of business problems. Techniques such as logistic regressions are used for building targeting models, and approaches such as association analysis are used for building cross-sell or up-sell models. Effective use of data mining to identify potential revenue opportunities along the sales pipeline may result in higher returns on investment and the creation of a competitive advantage. The objective of this chapter is to introduce the fundamentals of the most commonly used data mining techniques. Chapters 8–10 discuss several case studies based on some of these techniques.
■ Data mining overview
We define data mining in terms of
● The use of statistical or other analytical techniques to process and analyze raw data to find meaningful patterns and trends
● The extraction and use of meaningful information and insight to produce actionable business recommendations and decisions
The focal point of effective data mining is to analyze data in order to make actionable business recommendations. Without the latter, data mining is an intellectual exercise with no real-life application. In our experience, insufficient focus on actionable business recommendations is often the main reason that data mining has not been as widely adopted by some organizations as would have been desirable. Data mining can be applied to the solution of a broad range of business problems. The following is a list of standard applications of data mining techniques.
● Development of customer segmentation. Customer segmentation and profiling analysis constitutes the first step toward understanding the target audience. Understanding of the target audience drives effective advertising, offers, and messaging. As we pointed out in Chapter 5, marketing plan objectives determine a variety of segmentation types. For example, if the marketing plan calls for the creation of segments with differentiated needs, then need-based segmentation is required. Common segmentation types are:
– Need-based segmentation
– Demographics-based segmentation
– Value-based segmentation
– Product purchase-based segmentation
– Profitability-based segmentation
● Customer profiling. Profiling analysis creates descriptions of segments by their unique characteristics and attributes. For instance, a segment profile may consist of attributes such as an age range of thirty-five and older, and an annual household income of $75,000 or higher.
● New customer acquisition. New customer acquisition can be costly. Predictive targeting models built with data mining techniques allow us to effectively target prospects with the greatest propensity of converting to customers.
● Minimization of customer attrition or churn. Attrition among existing customers results in immediate revenue loss. Data mining can be used to predict future attrition and mitigate customer defection by understanding the factors responsible for attrition.
● Maximization of conversion. An increase in conversions of responders to leads and leads to buyers can expedite the sales process. High conversion is one of the keys to high investment returns. Data mining can help understand the primary conversion drivers.
● Cross-selling and up-selling of products. Experience shows that it is much more expensive to generate revenues from new customers than from existing customers. The main reason is that building relationships and trust is costly and time-consuming. Data mining can be used to build models for quantifying additional products and services sales to existing customers.
● Personalization of messages and offers. Personalized messages and offers tend to solicit higher response rates from the target audience than generic ones. Data mining techniques such as collaborative filtering can be used to create real-time, personalized offers and messages.
● Inventory optimization. Data mining facilitates the determination of more accurate forecasts of inventory needs, avoiding unnecessary waste due to over- or under-stockpiling of inventory.
● Forecasting marketing program performance. It is a common need of firms to forecast revenues, responses, leads, and web traffic. Data mining techniques such as time series and multivariate regression analyses can be applied to address such forecasting needs.
● Fraud detection. The federal government of the United States was an early adopter of data mining technology. As part of the investigation of the Oklahoma bombing crime in 1995, the FBI used data mining analysis to sift through thousands of reports submitted by agents in the field looking for connections and possible leads (Berry and Linoff 1997). In the credit industry, customer fraud can cause significant financial damage to lenders. Predictive modeling can be applied to address this issue by modeling the probability of fraud at an individual level.
■ An effective step-by-step data mining thought process
Figure 7-1 illustrates a step-by-step thought process for data mining.

Step 1: Identify business objectives and goals
Step 2: Determine key business areas and metrics to focus on
Step 3: Translate business issues to technical problems
Step 4: Select appropriate data mining techniques and software tools
Step 5: Identify data sources
Step 6: Perform analysis
Step 7: Translate analytical results into actionable business recommendations

Figure 7-1 Effective data mining thought process.
Step one: identification of business objectives and goals
The first step of the process is to identify the objectives and the goals of a marketing effort. Objectives are defined at a more abstract level and in a less quantitative manner than goals, which are usually quantifiable. For example, a business objective may be to increase sales in the current fiscal year, and the goal may be to increase sales in the current fiscal year by 15% given the same amount of investment as in the last fiscal year.
Step two: determination of the key focus business areas and metrics
Once the objectives and goals of a marketing effort have been identified, the next step is to determine on which business areas to focus and what metrics to use for measuring returns. For instance, incremental marketing returns may come from the existing customer base, from new customers, from an increase in the efficiency of marketing operations, or from a reduction in the number of fraud cases. Consider an online publisher whose main revenue source is advertising. Advertising revenue is based on cost per thousand impressions (CPM). The cost of having an ad exposed to one million impressions is $1000 at a CPM of $1. Assume the publisher has as an objective to increase traffic (impressions) to his web site and a goal of increasing his advertising revenue by 15% over the next three months. There are several business areas that the publisher can focus on to accomplish his goal. The following are four examples of marketing efforts that the publisher may consider.
● An increase in his investment in search marketing to drive traffic to his site
● Advertisement of his site via online banners on others' sites
● Launching a promotional activity such as a sweepstakes on the radio to drive traffic to his site
● Deployment of an online blog on his site to increase traffic volume.
The choices are numerous and the publisher needs to select a main focus by assessing the advantages and disadvantages of each option. The metrics used to measure the success of a marketing effort need to be consistent with the business goals. In this example, if the goal is to increase advertising revenue by 15% over the next three months then the appropriate return metric is clearly the advertising revenue.
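The CPM arithmetic in the publisher example can be written out as a short calculation. The sketch below assumes, purely for illustration, that the publisher currently serves one million impressions per period and that the CPM stays fixed at $1, so that the 15% revenue goal translates directly into additional impressions. The function names are hypothetical.

```python
def advertising_revenue(impressions, cpm):
    """Revenue from a number of impressions at a given cost per thousand (CPM)."""
    return impressions / 1000 * cpm

def impressions_needed(target_revenue, cpm):
    """Impressions required to reach a revenue target at a given CPM."""
    return target_revenue / cpm * 1000

current_impressions = 1_000_000   # assumed current traffic
cpm = 1.0                         # dollars per thousand impressions

base_revenue = advertising_revenue(current_impressions, cpm)   # $1000, as in the text
target_revenue = base_revenue * 1.15                            # goal: 15% increase
extra_impressions = impressions_needed(target_revenue, cpm) - current_impressions
print(base_revenue, round(target_revenue, 2), round(extra_impressions))
```

Under these assumptions the goal corresponds to roughly 150,000 additional impressions; if the CPM changes, the required traffic changes proportionally.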
Step three: translation of business issues into technical problems
The third step in an effective data mining thought process is the translation of business issues into technical problems. An incorrect translation wastes resources and opportunities. Continuing with the online publisher example: if the focus business area is to advertise on other web sites to promote his own, the publisher needs to determine which of those sites are most appropriate. As an example of a technical answer to the question of where the publisher can place his advertisement, consider the following approach. The publisher can segment the traffic to his web site by referral sources. A referral site or source is the origination site that leads particular visitors to the publisher's site. Some visitors may have arrived at the publisher's site via a Google search or a Yahoo search. In this case, either the Google or the Yahoo search is the referral source. The publisher can then determine which of the referral sources are the best traffic sources based on traffic volume, traffic growth, visitor profile, and cost. He can then emphasize investment in the most effective referral sources and actively look for new referral sources with similar characteristics. All of the above analysis requires a web analytics tool. Therefore, expertise in the selection, deployment, and creation of reports from a web analytics tool needs to be brought into the analysis process.
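The referral-source segmentation described above amounts to grouping visits by source and comparing the sources on volume and cost. The sketch below is only illustrative: the visit log, the source names, and the cost figures are invented, and a real analysis would draw these from a web analytics tool and would also weigh traffic growth and visitor profile.

```python
from collections import Counter

# Hypothetical visit log: one referral source label per visit
visit_log = [
    "google_search", "yahoo_search", "google_search",
    "partner_banner", "google_search", "yahoo_search",
]
# Hypothetical spend per referral source over the same period, in dollars
spend_by_source = {"google_search": 30.0, "yahoo_search": 10.0, "partner_banner": 20.0}

visits_by_source = Counter(visit_log)
cost_per_visit = {
    source: spend_by_source[source] / visits
    for source, visits in visits_by_source.items()
}
# Rank sources from cheapest to most expensive per visit
ranked_sources = sorted(visits_by_source, key=lambda source: cost_per_visit[source])

for source in ranked_sources:
    print(source, visits_by_source[source], cost_per_visit[source])
```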
Step four: selection of appropriate data mining techniques and software tools
Data mining techniques are based on analytic methods or algorithms, such as logistic regressions and decision trees. Data mining software is an application that implements data mining techniques such that the user does not need to write the data mining algorithms he uses. SAS Enterprise Miner, IBM Intelligent Miner, SPSS Clementine, and Knowledge STUDIO are examples of data mining software.
Step five: identification of data sources Once the appropriate data mining technique and software have been established, proper data sources need to be identified to effectively leverage data mining. For example, historical customer purchase data is required for conducting cross-sell or up-sell analysis. The following is a list of common data sources for data mining (Rud 2001).
● Internal sources
– Customer databases
– Transaction databases
– Marketing history databases
– Solicitation mail and phone tapes
– Data warehouse
● External sources
– Third-party list rentals
– Third-party data appends
Additional data sources are:
● Enterprise resource planning (ERP) systems
● Point of sales (POS) systems
● Financial databases
● Customer relationship management (CRM) systems
● Supply chain management (SCM) systems, such as SAP and PeopleSoft
● Marketing research and intelligence databases
● Campaign management systems
● Advertising servers
● E-mail delivery systems
● Web analytic systems
● Web log files
● Call center systems
After the appropriate data sources are identified, it is essential to make sure that the data is cleansed and standardized in preparation for data mining analysis.
Step six: conduction of analysis There are three stages in data mining analysis: model building, model validation, and real-life testing. A comprehensive analysis must include all three stages. Skipping the model validation and real-life testing stages increases the risk of rolling out unstable models. The data needs to be divided into two subsets, one for model building and the other for validation.
Model building A subset of the available data is used for model building. A common practice is to use 50–70% of the data for model building and the remaining data for model validation. In general, several models are built and the best ones are chosen based on statistics measuring model effectiveness. If the R-squared (R2) statistic is used to evaluate the effectiveness of multiple regression models, then the model with the highest R2 is selected.
Model validation After a subset of the data has been used for model building, the remaining data is used for validation. One common mistake is to build models and validate them on the same set of data. This is a serious error that can
artificially inflate the power of the models and make validation results look much better than they actually are. It is very important to conduct out-of-sample (sample referring to the subset of data used for model building) validation. If a model works well with the validation test, it will likely be successful in real-life tests.
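The split between model building and out-of-sample validation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the 70/30 split is one choice within the 50–70% range mentioned earlier, the model is a straight-line fit, and the helper names (`split_data`, `fit_line`, `r_squared`) are hypothetical.

```python
import random

def split_data(rows, build_fraction=0.7, seed=0):
    """Shuffle the rows and split them into model-building and validation subsets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * build_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_line(rows):
    """Least-squares straight line fitted on the model-building subset only."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(rows, intercept, slope):
    """R-squared of a fitted line evaluated on an arbitrary subset."""
    mean_y = sum(y for _, y in rows) / len(rows)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    tss = sum((y - mean_y) ** 2 for _, y in rows)
    return 1 - sse / tss

# Synthetic data (an assumption for illustration): y = 3 + 2x plus noise.
rng = random.Random(42)
data = [(x, 3.0 + 2.0 * x + rng.gauss(0, 1.0)) for x in range(100)]

build, validate = split_data(data)
intercept, slope = fit_line(build)
in_sample_r2 = r_squared(build, intercept, slope)
out_of_sample_r2 = r_squared(validate, intercept, slope)
print(in_sample_r2, out_of_sample_r2)
```

The number to report is `out_of_sample_r2`, because it is computed on records the model never saw during fitting.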
Real-life testing The best way to tell if a model works is to try it out with a small-scale real life test. To test a targeting model, marketing promotions need to be sent to both a control group and a test group. A control group consists of a random selection of all available prospects. A test group consists of the prospects that the model predicts are more likely to respond. A comparison in the response rates between the test group and the control group provides insights into the robustness of a targeting model. If the test group has a higher response rate than that of a control and the result is statistically significant, then the model is robust. If the test fails, close examination of the model needs to be conducted to understand why the model does not work in a real-life situation. Real-life testing is a roll out prerequisite.
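One way to check whether the difference in response rates is statistically significant is a two-proportion z-test. The sketch below is an assumption about how the comparison could be carried out, not a method prescribed by the text; the group sizes and response counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(resp_test, n_test, resp_control, n_control):
    """One-sided z-test: does the test group respond at a higher rate than control?"""
    p_test = resp_test / n_test
    p_control = resp_control / n_control
    pooled = (resp_test + resp_control) / (n_test + n_control)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_test + 1 / n_control))
    z = (p_test - p_control) / std_error
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical campaign: 2000 prospects per group,
# 120 responders in the test group and 80 in the control group.
z, p_value = two_proportion_z_test(120, 2000, 80, 2000)
print(z, p_value)
```

A p-value below the chosen significance level (for example 0.05) would support the claim that the model-selected group responds better than a random selection.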
Step seven: translation of analytical results into actionable business recommendations Inferring actionable business recommendations from model results requires explaining the main conclusions of the analysis in nontechnical terms. Throughout the various case studies from Chapter 8 to Chapter 10, we provide specific examples on how to translate analytical results into actionable business recommendations.
■ Overview of data mining techniques The foundations for the development of different data mining techniques are statistical objectives and available data types. There are two common statistical objectives, analysis of dependence and analysis of interdependence (Dillon and Goldstein 1984):
● Analysis of dependence: This type of analysis is used to explore relationships between dependent variables and independent variables.
● Analysis of interdependence: This method is used to explore relationships among independent variables.
The two common underlying data types of independent and dependent variables are metric and nonmetric.
Basic data exploration A preliminary data exploration is required before building any sophisticated data mining models. Occasionally, a basic data exploration is sufficient to address a business question. For instance, by regarding web traffic as a time series, we may be able to identify visible spikes in traffic pattern that coincide with particular marketing activities. By plotting one customer attribute such as income against customer behavior such as grocery purchases, we might spot a distinct pattern of correlation between income and grocery shopping. Data exploration should always be the first step before building any model. Variables that appear interesting and relevant at the data exploration stage will likely show up as significant contributors in the final model.
Linear regression analysis Regression analysis is a technique for quantifying the dependence between dependent and independent variables. A particular type of regression analysis, linear regression, is most frequently used in data mining. In its more general formulation, linear regression establishes a linear relationship between the dependent variables and the so-called regression parameters, with the independent variables appearing in nonlinear functional forms. A particularly popular form of linear regression occurs when the relationship between dependent and independent variables is itself linear. In this case, linear regression with a single independent variable is called simple linear regression, and regression with several independent variables is called multiple linear regression. The linear technique is widely used to predict a single dependent variable (outcome variable) with one or multiple independent variables (predictive variables). Linear regression problems are addressed with Ordinary Least Squares (OLS) and Maximum Likelihood Estimation (MLE).
Simple linear regression In simple linear regression, there is a single dependent variable and a single independent variable. There is an implied linear relationship between the two variables. Figure 7-2 is a graphical representation of this relationship. The mathematical formula for simple regression is
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \tag{7.1}$$
where $i = 1, 2, 3, 4, \ldots, n$, $X_i$ is the value of independent variable $X$, and $Y_i$ is the value of dependent variable $Y$. $\beta_0$ and $\beta_1$ are regression parameters and $\varepsilon_i$ is the error term. The number of degrees of freedom is $n - 1$.
[Figure 7-2 Illustrating the linear relationship between X and Y. The line crosses the Y axis at (0, β0) and has slope β1 = dy/dx; εi is the vertical deviation of the point (xi, yi) from the line.]
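The estimator of $Y_i$ is denoted as $\hat{Y}_i$ and is called the regression line. If $\hat{\beta}_0$ is the estimator of $\beta_0$ and $\hat{\beta}_1$ is the estimator of $\beta_1$, the regression line is given by the following expression
$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \tag{7.2}$$
The estimator of $\beta_1$ is (Neter, Wasserman, and Kutner 1990)
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{7.3}$$
where $\bar{X}$ and $\bar{Y}$ are the means of X and Y, respectively. The variance of $\hat{\beta}_1$ is
$$\operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{7.4}$$
where $\sigma^2$ is the variance of variable Y conditional on X, that is, the variance of the error term.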
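The estimator of $\beta_0$ can be shown to be (Neter, Wasserman, and Kutner 1990)
$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \tag{7.5}$$
The variance of $\hat{\beta}_0$ is
$$s^2(\hat{\beta}_0) = \frac{\sigma^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2} \tag{7.6}$$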
A significance test using t statistics can be applied to determine whether $\hat{\beta}_0$ and $\hat{\beta}_1$ are statistically significant. The t statistic for $\hat{\beta}_0$ is $t(\hat{\beta}_0)$, expressed as $\hat{\beta}_0 / s(\hat{\beta}_0)$. In the case where a 95% confidence level is used for the test, if $t(\hat{\beta}_0)$ is greater than $t_{0.025,\,n-2}$ or less than $-t_{0.025,\,n-2}$, then $\hat{\beta}_0$ is statistically significant. The 95% confidence interval of $\hat{\beta}_0$ is
$$\left[\hat{\beta}_0 - t_{0.025,\,n-2} \times \sigma \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}},\;\; \hat{\beta}_0 + t_{0.025,\,n-2} \times \sigma \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} (X_i - \bar{X})^2}}\right]$$
The t statistic for $\hat{\beta}_1$ is $t(\hat{\beta}_1)$, expressed as
$$t(\hat{\beta}_1) = \frac{\hat{\beta}_1}{s(\hat{\beta}_1)} \tag{7.7}$$
In the case where a 95% confidence level is used for the test, if $t(\hat{\beta}_1)$ is greater than $t_{0.025,\,n-2}$ or less than $-t_{0.025,\,n-2}$, then $\hat{\beta}_1$ is statistically significant. The 95% confidence interval of $\hat{\beta}_1$ is given by
$$\left[\hat{\beta}_1 - t_{0.025,\,n-2} \times \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}},\;\; \hat{\beta}_1 + t_{0.025,\,n-2} \times \frac{\sigma}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}}\right]$$
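The estimators and t statistics of Equations 7.3 to 7.7 can be computed directly, as in the following sketch. It is an illustration rather than a reference implementation: the five data points are hypothetical, the function name `simple_ols` is invented for this example, and the unknown variance σ² is replaced by the residual mean square SSE/(n − 2), which is an assumption made here because the text does not say how σ² is obtained.

```python
from math import sqrt

def simple_ols(x, y):
    """Simple linear regression estimates with standard errors and t statistics."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx                       # Equation 7.3
    b0 = mean_y - b1 * mean_x            # Equation 7.5
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2 = sse / (n - 2)               # assumption: sigma^2 estimated from residuals
    se_b1 = sqrt(sigma2 / sxx)           # square root of Equation 7.4
    se_b0 = sqrt(sigma2 * sum(xi ** 2 for xi in x) / (n * sxx))  # from Equation 7.6
    return {"b0": b0, "b1": b1, "t_b0": b0 / se_b0, "t_b1": b1 / se_b1}

# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_ols(x, y)
print(result)
```

With n = 5 the critical value is t(0.025, 3) ≈ 3.182. In this example the slope estimate exceeds it by a wide margin while the intercept estimate does not, so only the slope would be declared statistically significant.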
Key assumptions of linear regression Simple linear regression relies on four key assumptions that need to be satisfied for conclusions to apply (Neter, Wasserman, and Kutner 1990).
Assumption 1: The mean of error term $\varepsilon_i$, conditional on $X_i$, is zero. $E(\varepsilon_i \mid X_i) = 0$ where $i = 1, 2, 3, 4, \ldots, n$.
Assumption 2: The covariance between the error terms, $\varepsilon_i$'s, is zero. $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0$ where $i \neq j$, $i = 1, 2, 3, 4, \ldots, n$, and $j = 1, 2, 3, 4, \ldots, n$.
Assumption 3: The variance of $\varepsilon_i$ is constant (a situation referred to as homoscedasticity). $\operatorname{var}(\varepsilon_i) = \sigma_i^2 = \sigma_j^2 = \operatorname{var}(\varepsilon_j)$ where $i \neq j$, $i = 1, 2, 3, 4, \ldots, n$, and $j = 1, 2, 3, 4, \ldots, n$.
Assumption 4: The covariance between $X_i$ and $\varepsilon_i$ is zero, namely, $\operatorname{cov}(X_i, \varepsilon_i) = 0$ where $i = 1, 2, 3, 4, \ldots, n$.
Multiple linear regression In multiple linear regression, there is a single dependent variable and more than one independent variable. We can describe a multiple regression model with $p - 1$ independent variables as follows.
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_{p-1} X_{p-1,i} + \varepsilon_i \tag{7.8}$$
for $i = 1, 2, 3, 4, \ldots, n$, and $p \geq 3$. We can also express a multiple regression equation in matrix form (Dillon and Goldstein 1984).
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \beta_0 + \beta_1 X_{11} + \beta_2 X_{21} + \beta_3 X_{31} + \cdots + \beta_{p-1} X_{p-1,1} + \varepsilon_1 \\ \beta_0 + \beta_1 X_{12} + \beta_2 X_{22} + \beta_3 X_{32} + \cdots + \beta_{p-1} X_{p-1,2} + \varepsilon_2 \\ \vdots \\ \beta_0 + \beta_1 X_{1n} + \beta_2 X_{2n} + \beta_3 X_{3n} + \cdots + \beta_{p-1} X_{p-1,n} + \varepsilon_n \end{pmatrix} \tag{7.9}$$
Equation 7.9 can be expanded to the following.
$$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & X_{11} & X_{21} & X_{31} & \cdots & X_{p-1,1} \\ 1 & X_{12} & X_{22} & X_{32} & \cdots & X_{p-1,2} \\ \vdots & \vdots & \vdots & \vdots & & \vdots \\ 1 & X_{1n} & X_{2n} & X_{3n} & \cdots & X_{p-1,n} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix} \tag{7.10}$$
In matrix form, this can be written as
$$Y = X\beta + \varepsilon \tag{7.11}$$
The estimated parameter matrix is
$$\hat{\beta} = (X'X)^{-1} X'Y \tag{7.12}$$
where X and Y are matrices whose entries are the realized values of the corresponding random variables. The covariance matrix of the estimated matrix is
$$s^2(\hat{\beta}) = \frac{1}{n - p}\,(Y'Y - \hat{\beta}'X'Y)\,(X'X)^{-1} \tag{7.13}$$
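The standard deviation matrix of the estimated matrix is $s(\hat{\beta})$, a diagonal matrix whose entries are the square roots of the diagonal elements of $s^2(\hat{\beta})$. The 95% confidence interval of the estimated matrix is as follows, with $n - p$ degrees of freedom to match the divisor in Equation 7.13:
$$\left[\hat{\beta} - t_{0.025,\,n-p} \times s(\hat{\beta}),\;\; \hat{\beta} + t_{0.025,\,n-p} \times s(\hat{\beta})\right]$$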
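The matrix expressions in Equations 7.12 and 7.13 can be evaluated with a few small helper routines. The sketch below uses only the Python standard library; the helper names (`transpose`, `matmul`, `inverse`, `ols`) and the data are hypothetical, and a production analysis would normally rely on a numerical library instead of hand-written matrix inversion.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def inverse(m):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(m)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def ols(x_rows, y):
    """Return the estimated parameters and their covariance matrix."""
    X = [[1.0] + list(row) for row in x_rows]        # leading column of ones
    Y = [[v] for v in y]
    Xt = transpose(X)
    xtx_inv = inverse(matmul(Xt, X))
    xty = matmul(Xt, Y)
    beta = matmul(xtx_inv, xty)                      # Equation 7.12
    n, p = len(X), len(X[0])                         # p counts the intercept
    yty = sum(v * v for v in y)
    mse = (yty - matmul(transpose(beta), xty)[0][0]) / (n - p)
    cov = [[mse * v for v in row] for row in xtx_inv]  # Equation 7.13
    return [b[0] for b in beta], cov

# Hypothetical data generated as y = 1 + 2*x1 + 3*x2 plus small disturbances.
x_rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [1 + 2 * x1 + 3 * x2 + e for (x1, x2), e in zip(x_rows, noise)]
beta, cov = ols(x_rows, y)
print(beta)
```

The square roots of the diagonal entries of `cov` are the standard deviations used in the confidence interval above.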
Goodness of fit measure $R^2$ and the F statistic The term $R^2$ is also called the multiple coefficient of determination. $R^2$ measures the fraction of the total variance of the sample data explained by the regression model, and is given by the ratio of the variance explained by the multiple regression (Sum of Squares of Regression, or SSR) and the total variance (Total Sum of Squares, or TSS). The values of $R^2$ range from zero to one. The difference between TSS and SSR is called Sum of Squares of Errors, or SSE. The higher $R^2$ is, the higher the fraction of the sample variance explained by the linear regression model. $R^2$ is given by the expression
$$R^2 = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = \frac{SSR}{TSS} = \frac{TSS - SSE}{TSS} = 1 - \frac{SSE}{TSS} \tag{7.14}$$
As the number of independent variables increases in a particular model, the coefficient of determination tends to increase. When an increase in $R^2$ is due to the increase in the number of independent variables rather than the incremental explanatory power of the additional independent variables, the model power is inflated. To avoid adoption of a model with inflated explanatory power, the adjusted $R^2$ can be used instead of $R^2$.
$$\text{Adjusted } R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \tag{7.15}$$
where p is the number of independent variables. The F statistic is another statistical measure of the robustness of multiple regression. The F statistic is obtained by dividing Mean Squared Regression (MSR) by Mean Squared Error (MSE) (Neter, Wasserman, and Kutner 1990). MSR is defined as SSR divided by the degrees of freedom, p. MSE is SSE divided by its degrees of freedom, $n - p - 1$.
$$F = \frac{MSR}{MSE} = \frac{SSR / p}{SSE / (n - p - 1)} \tag{7.16}$$
We reject the null hypothesis that all the regression parameters of the independent variables are zero if the F statistic of a multiple regression model is greater than the critical F value, $F_{1-\alpha,\,p,\,n-p-1}$, where $\alpha$ is the significance level. Additional regression techniques have been developed over the years to facilitate selection of independent variables. These techniques include backward, forward, and stepwise selection methods. Chapter 12 of the book Applied Statistical Methods (Neter, Wasserman, and Kutner 1990) has an in-depth discussion of these methods.
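The goodness-of-fit measures in Equations 7.14 to 7.16 follow directly from the observed and fitted values. The sketch below is illustrative only: the function name `fit_statistics` and the numbers are hypothetical, and SSR is obtained as TSS − SSE, which assumes the fitted values come from a least-squares model that includes an intercept.

```python
def fit_statistics(y, y_hat, p):
    """R-squared, adjusted R-squared, and F; p is the number of independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    tss = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ssr = tss - sse                                   # valid for OLS with an intercept
    r2 = ssr / tss                                    # Equation 7.14
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)     # Equation 7.15
    f_stat = (ssr / p) / (sse / (n - p - 1))          # Equation 7.16
    return r2, adj_r2, f_stat

# Hypothetical observed values and least-squares fitted values (one independent variable).
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.0]
r2, adj_r2, f_stat = fit_statistics(y, y_hat, p=1)
print(r2, adj_r2, f_stat)
```

For these numbers the critical value at the 5% level is F(0.95; 1, 3) ≈ 10.13, so the F statistic comfortably rejects the null hypothesis.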
Cluster analysis Cluster analysis is used to uncover interdependence between members of a sample. In this context, by members we mean objects of study, such as individuals or products, and by sample we mean the collection of such individuals or products used for conducting the study. By member attributes, we mean variables that describe the features and characteristics of members. For instance, age and income are member attributes of individuals. Through cluster analysis, members with similar values in the variables under analysis are grouped into clusters. Each member can only belong to one cluster. Cluster analysis is widely used in customer segmentation. Identification of members with similar characteristics is the key to cluster analysis. The following section provides an overview on how to measure similarity between members of a sample.
Measurement of similarity between sample members Comprehension of the concept of similarity is the key to the understanding of cluster analysis. Various criteria, such as distance and correlation, can be used to measure similarity between sample members. In this chapter, we will focus on distance as a similarity measure of members. The Euclidean distance (Dillon and Goldstein 1984) measures the distance between two sample members, i and j, with p attributes. The shorter the distance is, the more similar the two members are to each other. If we assume that the realized values of the attributes of a sample member i can be represented by vector $X_i = (X_{1i}, X_{2i}, X_{3i}, \ldots, X_{pi})$ and the values of the attributes of member j can be represented by a vector $X_j = (X_{1j}, X_{2j}, X_{3j}, \ldots, X_{pj})$, then the Euclidean distance between the two members is
$$d = \sqrt{\sum_{k=1}^{p} (X_{ki} - X_{kj})^2} \tag{7.17}$$
The Mahalanobis distance (Dillon and Goldstein 1984) is another method of measuring distances between members, and has the advantage over the Euclidean distance in that it takes into consideration correlation between the attributes of the members. The Mahalanobis distance, m, is defined by
$$m = (X_i - X_j)'\,S^{-1}\,(X_i - X_j) \tag{7.18}$$
where S is the covariance matrix of the member attributes. Clustering techniques comprise hierarchical and partitioning methods (Dillon and Goldstein 1984). Hierarchical methods, in their turn, can be classified as either agglomerative or divisive. Agglomerative methods start out by treating each member as a cluster and then grouping members with particular similarities into clusters. Once a member is assigned to a cluster, it cannot be reassigned to another cluster. Divisive methods, commonly known as decision trees, begin by splitting the members into two or more groups. Each group can be further split into two or more subgroups, with the splitting process continuing until a preselected statistic reaches an assumed critical value.
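The two distance measures in Equations 7.17 and 7.18 can be compared on a small example. The sketch below is restricted to two attributes so that the covariance matrix can be inverted by hand; the attribute values and covariance matrices are hypothetical, and the function names are invented for illustration.

```python
from math import sqrt

def euclidean(xi, xj):
    """Equation 7.17 for any number of attributes."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))

def mahalanobis_2d(xi, xj, s):
    """Equation 7.18 for two attributes; s is the 2x2 covariance matrix."""
    (a, b), (c, d) = s
    det = a * d - b * c
    s_inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [xi[0] - xj[0], xi[1] - xj[1]]
    return sum(diff[r] * s_inv[r][k] * diff[k] for r in range(2) for k in range(2))

# Hypothetical members described by (age, income in thousands).
member_i = (35, 60)
member_j = (38, 64)

d = euclidean(member_i, member_j)
m_scaled = mahalanobis_2d(member_i, member_j, [[4, 0], [0, 16]])
m_identity = mahalanobis_2d(member_i, member_j, [[1, 0], [0, 1]])
print(d, m_scaled, m_identity)
```

With an identity covariance matrix the quantity in Equation 7.18 equals the square of the Euclidean distance; with unequal variances, attributes that vary more contribute less to the distance.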
Hierarchical agglomerative methods There are four common approaches under hierarchical agglomerative methods (Dillon and Goldstein 1984).
● Nearest neighbor (single linkage)
● Furthest neighbor (complete linkage)
● Average linkage
● Ward's error sum of squares
The nearest-neighbor approach defines the distance between a member and a cluster of members as the shortest distance between the member and the cluster. The furthest neighbor approach defines the distance between a member and a cluster of members as the longest distance between the member and the cluster. The average linkage approach defines the distance between a member and a cluster of members as the average distance between the member and the cluster. This distance can be any statistical distance measure, such as Euclidean and Mahalanobis. The three approaches share the common rule that members and clusters that are close to one another are grouped into large clusters. The nearest-neighbor approach begins by grouping the two members with the shortest distance into a cluster. The approach next calculates the distance between this cluster and each of the remaining members and continues to group members and clusters that are closest to one another. The process continues until a cluster containing all members is formed. We next discuss an example of the nearest-neighbor method applied to a set of five members (A, B, C, D, and E). Figure 7-3 shows a matrix whose entries represent the distances between any two members.
      A    B    C    D    E
A     0    4   65   12    9
B     4    0   45   34   10
C    65   45    0   17   22
D    12   34   17    0   12
E     9   10   22   12    0
Figure 7-3 Distance matrix illustration.
As we can see in Figure 7-3, the distance between members A and B is the shortest one. Therefore, we start out by grouping these two members into the first cluster. After members A and B are grouped into one cluster, we calculate the distances between this cluster and the remaining members, as shown in Figure 7-4.
      A    B    C    D    E
A     0    0   65   12    9
B     0    0   45   34   10
C    65   45    0   17   22
D    12   34   17    0   12
E     9   10   22   12    0
Figure 7-4 Distances after formation of the first cluster.
The nearest distance between the first cluster (AB) and each of the remaining members and the distances between the remaining members are as follows
(AB) and C: min(65, 45) = 45
(AB) and D: min(12, 34) = 12
(AB) and E: min(9, 10) = 9
C and D: 17
C and E: 22
D and E: 12
From this we infer that A, B, and E now form a cluster since nine is the shortest distance. The distances between this new cluster and the remaining members are
(ABE) and C: min(65, 45, 22) = 22
(ABE) and D: min(12, 34, 12) = 12
C and D: 17
Since 12 is the shortest distance, this means that A, B, E, and D now form a cluster and C remains as a cluster of its own. In conclusion, the nearest-neighbor method has created two final clusters. One cluster consists of members A, B, D, and E, and the other cluster contains member C only.
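The nearest-neighbor walk-through above can be reproduced with a short routine that repeatedly merges the two closest clusters. This is a sketch under stated assumptions: the distances are those of Figure 7-3, merging stops when two clusters remain (as in the example), and the function names are hypothetical. Passing `max` or `mean` instead of `min` gives the furthest neighbor and average linkage rules.

```python
from itertools import combinations
from statistics import mean

NAMES = "ABCDE"
ROWS = {
    "A": [0, 4, 65, 12, 9],
    "B": [4, 0, 45, 34, 10],
    "C": [65, 45, 0, 17, 22],
    "D": [12, 34, 17, 0, 12],
    "E": [9, 10, 22, 12, 0],
}

def member_distance(a, b):
    return ROWS[a][NAMES.index(b)]

def cluster_distance(c1, c2, rule):
    """Apply the linkage rule to all member-to-member distances between two clusters."""
    return rule([member_distance(a, b) for a in c1 for b in c2])

def agglomerate(rule, n_clusters=2):
    """Merge the closest pair of clusters until n_clusters remain."""
    clusters = [frozenset(name) for name in NAMES]
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(pair[0], pair[1], rule))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted("".join(sorted(c)) for c in clusters)

single = agglomerate(min)      # nearest neighbor
complete = agglomerate(max)    # furthest neighbor
average = agglomerate(mean)    # average linkage
print(single, complete, average)
```

The only design choice is the `rule` argument: the merging loop is identical for all three linkage approaches, and only the way a cluster-to-cluster distance is summarized changes.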
We next apply the furthest neighbor approach to the same set of members. The distance between members A and B is the shortest so these two are grouped into the first cluster from the outset. After members A and B are grouped into one cluster, the distances between the cluster and the remaining members are calculated.
(AB) and C: max(65, 45) = 65
(AB) and D: max(12, 34) = 34
(AB) and E: max(9, 10) = 10
C and D: 17
C and E: 22
D and E: 12
From this we infer that A, B, and E now form a cluster since ten is the shortest distance. The distances between the cluster and the remaining members are
(ABE) and C: max(65, 45, 22) = 65
(ABE) and D: max(12, 34, 12) = 34
C and D: 17
This means that C and D form a cluster since seventeen is the shortest distance. In conclusion, the furthest neighbor method has created two final clusters. One cluster consists of members A, B, and E, and the other cluster contains members C and D. We next apply the average linkage approach to the same member sample. Since the distance between members A and B is the shortest, these two members are grouped into the first cluster from the outset. After members A and B are grouped into one cluster, the distances between the cluster and the remaining members are calculated and the results are as follows.
(AB) and C: Average(65, 45) = 55
(AB) and D: Average(12, 34) = 23
(AB) and E: Average(9, 10) = 9.5
C and D: 17
C and E: 22
D and E: 12
From this we conclude that A, B, and E now form a cluster since 9.5 is the shortest distance. The distances between the cluster and the remaining members are
(ABE) and C: Average(65, 45, 22) = 44
(ABE) and D: Average(12, 34, 12) = 19.3
C and D: 17
Members C and D form a cluster since seventeen is the shortest distance. Members A, B, and E form the other cluster. Ward's Error Sum of Squares (Ward's ESS) is a clustering approach that creates clusters by minimizing the sum of the within-cluster variance. The within-cluster variance is defined as (Dillon and Goldstein 1984)
$$ESS = \sum_{j=1}^{k} \left( \sum_{i=1}^{n_j} X_{ij}^2 - \frac{1}{n_j} \left( \sum_{i=1}^{n_j} X_{ij} \right)^2 \right) \tag{7.19}$$
Here, $X_{ij}$ is the attribute value of member i in cluster j, k is the number of clusters, and $n_j$ is the number of members in cluster j. We next discuss an example where we apply the Ward's approach to a sample of four members (A, B, C, and D). From the outset, each member forms its individual cluster. The attribute values of the four members are as follows.
A: 4
B: 10
C: 5
D: 20
We first compute the Ward's ESS for every possible cluster that can be formed with two members in the sample.
ESS of members A and B = $4^2 + 10^2 - \tfrac{1}{2}(4 + 10)^2 = 18$
ESS of members A and C = $4^2 + 5^2 - \tfrac{1}{2}(4 + 5)^2 = 0.5$
ESS of members A and D = $4^2 + 20^2 - \tfrac{1}{2}(4 + 20)^2 = 128$
ESS of members B and C = $10^2 + 5^2 - \tfrac{1}{2}(10 + 5)^2 = 12.5$
ESS of members B and D = $10^2 + 20^2 - \tfrac{1}{2}(10 + 20)^2 = 50$
ESS of members C and D = $5^2 + 20^2 - \tfrac{1}{2}(5 + 20)^2 = 112.5$
Members A and C form a cluster since the Ward's ESS of their cluster is the lowest. Next we compute the Ward's ESS for any possible cluster that consists of the (AC) cluster and each of the remaining members.
ESS of the cluster that consists of members A, C, and B is $4^2 + 10^2 + 5^2 - \tfrac{1}{3}(4 + 10 + 5)^2 = 20.67$
ESS of the cluster that consists of members A, C, and D is $4^2 + 5^2 + 20^2 - \tfrac{1}{3}(4 + 5 + 20)^2 = 160.67$
The Ward's ESS for the cluster of members A, C, and B is smaller than that of the clusters with members A, C, and D, and B and D. Therefore,
members A, B and C form a cluster, and member D remains by itself in one cluster.
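The Ward's ESS figures in this example can be checked with a few lines of code. The sketch below evaluates the single-cluster term of Equation 7.19 for each candidate cluster; the function name `ess` and the dictionary layout are choices made here for illustration.

```python
from itertools import combinations

def ess(values):
    """Error sum of squares of one cluster: sum of squares minus (sum squared) / n."""
    n = len(values)
    return sum(v * v for v in values) - sum(values) ** 2 / n

values = {"A": 4, "B": 10, "C": 5, "D": 20}

# Step 1: ESS of every possible two-member cluster.
pair_ess = {a + b: ess([values[a], values[b]]) for a, b in combinations(values, 2)}
best_pair = min(pair_ess, key=pair_ess.get)

# Step 2: ESS of the winning pair joined with each remaining member.
three_member_ess = {
    best_pair + other: ess([values[m] for m in best_pair + other])
    for other in values if other not in best_pair
}
best_triple = min(three_member_ess, key=three_member_ess.get)
print(pair_ess, best_pair, three_member_ess, best_triple)
```

The first step selects the pair A and C, and the second step adds B to that cluster, in agreement with the hand calculation.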
Hierarchical divisive methods: AID, CHAID, and CART Hierarchical divisive methods start out by splitting a group of members into two or more subgroups and proceed with the same splitting approach based on predetermined statistical criteria. The most common divisive methods are decision tree approaches such as Automatic Interaction Detection (AID), Chi-Square Automatic Interaction Detection (CHAID), and Classification and Regression Tree (CART). A decision tree approach starts out at the root of a tree where all members reside and splits the members into different subgroups (called branches or nodes). A tree is built in such a way that the variance of the dependent variable is maximized between groups and is minimized within groups. For instance, a group of consumers may be split into different age (independent variable) groups to maximize the variance of the household income (dependent variable) between the age groups. Figure 7-5 illustrates what a small decision tree looks like. There are two splits in the tree. The nodes that stop splitting are called terminal nodes. Nodes one, three, four, and five are terminal nodes. AID is a divisive approach that splits a group of members into binary branches. While in this approach the dependent variable needs to be metric, the independent variables can be either nonmetric or metric.
[Figure 7-5 Decision tree output illustration. Split 1 divides the root (Node 0) into Node 1, Node 2, and Node 3; Split 2 divides Node 2 into Node 4 and Node 5.]
Chi-Square Automatic Interaction Detection (CHAID) is more flexible than AID in that CHAID allows a group of members to be split into two or more branches. Given its flexibility, CHAID is more widely used in data mining than AID. The following are the key characteristics of CHAID.
● The dependent variable is usually nonmetric.
● The independent variables are either nonmetric or metric data with no natural zero value and no specific distribution constraints, but their possible number of groups should be no more than 15 (Struhl 1992).
● The chi square statistic is used to determine whether to further split a node.
We next discuss an example where CHAID is used to better understand the member attributes that are highly associated with the responsiveness to a direct mail promotion. The dependent variable, a binary variable with a value of 'Yes' or 'No', denotes the existence or absence of a response to the promotion. Assume there are 10,000 individuals in the marketing sample. Among them, 6000 individuals respond and 4000 do not respond to the promotion. Therefore, the overall response rate to the direct mail promotion is 60%. Assume there are three responder attributes (independent variables) of interest: age, income, and gender. The CHAID approach assesses all the values of age, income, and gender to create splits with significant chi square statistics. Figure 7-6 illustrates the output of CHAID, where there are four terminal nodes (nodes one, three, four, and five) in the tree. Age and gender are the drivers of responsiveness. Female subjects aged between 25 and 45 form the most responsive group, followed by male subjects in the same age group. Subjects aged over 45, regardless of gender, are the least responsive individuals. We next review another example of a CHAID application to marketing. Assume a marketing manager at a gourmet cookware store wishes to find out which types of customers are more likely to purchase a high-end BBQ grill. The store has collected three pieces of demographic information about its customers: household income, marital status of the household head, and the number of children in the household. Assume the store has five years of detailed transaction history, and there are a total of 25,000 customers in the store database. The dependent variable, existence of a past purchase of a high-end grill, is a nominal variable with two possible values: purchase or no purchase. The three independent variables are household income, marital status of the household head, and the number of children in the household.
CHAID is used to split the 25,000 customers into subgroups based on household income, the marital status of the household head, and the number of children.
Node 0 (root): Total: 10,000; Responders: 6000; Non-responders: 4000; Response rate: 60%
Node 3 (Age > 45): Total: 2500; Responders: 1000; Non-responders: 1500; Response rate: 40%
Node 4 (Age [25–45], Gender: Female): Total: 4000; Responders: 3000; Non-responders: 1000; Response rate: 75%
Node 5 (Age [25–45], Gender: Male): Total: 2200; Responders: 1200; Non-responders: 1000; Response rate: 55%
Figure 7-6 CHAID analysis applied to direct mail promotion responses.
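Returning to the direct mail example in Figure 7-6, the chi square statistic that CHAID relies on can be illustrated with the gender split among subjects aged 25 to 45. The sketch below computes the ordinary Pearson chi square statistic for the table of responders and non-responders by gender. It is an illustration of the statistic only; actual CHAID software applies further steps (such as category merging and significance adjustments) that are not shown, and the function name is hypothetical.

```python
def chi_square(table):
    """Pearson chi square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Rows: female (node 4), male (node 5); columns: responders, non-responders.
gender_split = [[3000, 1000],
                [1200, 1000]]
statistic = chi_square(gender_split)
print(statistic)
```

For a 2 x 2 table there is one degree of freedom, and the 5% critical value is 3.841. The statistic here is far above that threshold, which is consistent with gender appearing as a split in the tree.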
Once the marketing manager understands which customer attributes are highly associated with the purchase of a BBQ grill, he can contact a list broker to rent lists of prospects with similar attributes. These prospects will form an ideal target audience for a BBQ grill marketing campaign. Consider yet another example of a CHAID application to marketing. Assume a high tech company specializing in enterprise networking security plans to launch an e-mail promotion. The promotion with a gift card offer will target a group of technology magazine subscribers. The purpose of the campaign is to acquire new customers who may currently be using a competitor ’s product and are considering additional purchases of similar products. Assume that there are over 50,000 subscribers to the technology magazine. The gift card offer has a value of $50 per subscriber so the offer cost will amount to $2.5 million if all the subscribers are targeted. In other words, in this particular situation it would be very costly to target all of the subscribers. The marketing manager at the high tech firm decides to segment the magazine subscriber list based on six attributes.
● The industry that a subscriber works for.
● The size of the company that a subscriber works for: For instance, the company size is 5000 if a subscriber works for a company with 5000 employees.
● Subscriber's office location type such as branch office, headquarters, and single location.
● Subscriber's role in network security purchase decision making such as authorizing and influencing.
● Subscriber's job function such as information system, marketing and accounting.
● Subscriber's job title such as CTO, IT manager, and marketing VP.
The marketing manager then uses the CHAID approach to analyze his internal database and to construct a profile of past security product buyers. The buyer profile indicates that networking security product buyers tend to work for large companies (company size of 1000 or more) in the banking industry. These buyers also tend to be IT managers and IT directors working at branch offices. Based on the above profile, the marketing manager next instructs the magazine publisher to select a targeted list of subscribers that are IT managers or IT directors who work at the branch offices of large banks. Assume the magazine company has a total of 2000 subscribers that meet the selection criteria. The total gift card offer cost is 2000 multiplied by $50, which amounts to $100,000. This cost is within the company's program budget. Classification and Regression Tree (CART) is another decision tree approach. In this approach, both the dependent and the independent variable can be either metric or nonmetric (Struhl 1992). Like CHAID, CART is also widely used in data mining for marketing. Although the original CART algorithm allowed two-node splits only, there are now CART software implementations with revised algorithms that offer the flexibility of creating splits into more than two nodes. The following is a list of key characteristics of CART (Breiman, Friedman, Olshen, and Stone 1998).
● The dependent variable is metric or nonmetric with no specific distribution constraints.
● The independent variables can be either nonmetric or metric with no specific distribution constraints.
● In cases where the dependent variable and independent variable are both metric, the relationship between them can be either linear or nonlinear.
When the dependent variable is metric, accuracy statistic measures such as average least squares can be used to determine whether a node continues to split.
$$\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{7.20}$$
Here, $Y_i$ is the observed value of dependent variable Y for member i, $\hat{Y}_i$ is the predicted value of variable Y, and n is the number of members in the node. A node is split to minimize this particular statistic. When the dependent variable is nonmetric, the misclassification rate (the percentage of cases being misclassified) is used to determine whether a node should split further. The optimal tree is the one that minimizes the overall misclassification rate of all nodes in the tree (Breiman, Friedman, Olshen, and Stone 1998). In the next example, CART is used to better understand the drivers for customer purchases of a product. Assume three independent variables are analyzed: age, income, and gender. The dependent variable is the average annual purchase amount of a customer. Age and income are metric variables and gender is a nonmetric variable. There are a total of 5000 customers and the average purchase amount of these customers is $300. Figure 7-7 shows the final output of CART. There are three terminal nodes (nodes two, three, and four) in the tree, and the 5000 customers are
Node 0 Total: 5000 Average Purchase: $300
Income ⬍$50,000
ⱖ$50,000 Node 4 Total: 1300 Average Purchase: $585
Node 1 Total: 3700 Average Purchase: $200
Age ⬍30 Node 2 Total: 3000 Average Purchase: $100
Figure 7-7 CART output illustration.
ⱖ30 Node 3 Total: 700 Average Purchase: $628
161
162
Data Mining and Market Intelligence
assigned into these three nodes. The customers in node one on average have a purchase amount of $200. The customers in node two on average have a purchase amount of $100 and those in node three on average have a purchase amount of $628 and those in node four on average have a purchase amount of $585. CART has segmented the customers based on their purchase volume.
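The least-squares splitting rule of Eq. 7.20 can be illustrated with a minimal Python sketch. The ages and purchase amounts below are hypothetical illustration data (not the 5000 customers of Figure 7-7), and `sse` and `best_split` are illustrative helper names rather than part of any CART software. The sketch considers two-node splits on a single metric variable only.

```python
def sse(values):
    """Sum of squared deviations from the node mean (n times Eq. 7.20)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the two-node split on x with the least squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        lo, hi = pairs[k - 1][0], pairs[k][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [yy for xx, yy in pairs if xx < threshold]
        right = [yy for xx, yy in pairs if xx >= threshold]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical customers: age and average annual purchase amount in dollars.
ages = [22, 25, 28, 35, 40, 45]
purchases = [100, 110, 90, 600, 650, 630]

threshold, error = best_split(ages, purchases)
print(f"split at age < {threshold}: squared error {error:.1f}")
```

A full tree would apply the same search to every independent variable at every node and stop when no split reduces the error enough.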
Partitioning methods

Partitioning methods assume that the initial number of clusters is predetermined. Unlike the hierarchical methods we discussed earlier, partitioning techniques allow for the reassignment of members from one cluster to another. One of the best-known partitioning techniques is the K-Means clustering method (Dillon and Goldstein 1984), which starts out by grouping the members into K clusters. There are numerous ways of creating these initial K clusters. Members in close proximity are grouped into the same cluster, and are then moved from one cluster to another to minimize the error of partition. If \(X_{i,j,l}\) is the value of the jth attribute of member i in cluster l, \(\bar{X}_{j,l}\) is the mean of the jth attribute in cluster l, p is the number of attributes, and n is the total number of members, the partition error is defined by

\[
E=\sum_{i=1}^{n}\sum_{j=1}^{p}\left(X_{i,j,l}-\bar{X}_{j,l}\right)^2 \tag{7.21}
\]
We next discuss an example of how to create clusters based on the K-means approach. Assume there are six students (A, B, C, D, E, and F) with scores in three subjects: English, math, and music (as indicated in Table 7-1). The value of K is set to three. The Euclidean distances of scores between the students are shown in Figure 7-8.

Table 7-1 K-means clustering example – student score raw data

Student   English score   Math score   Music score
A                    60           90            78
B                   100           85            90
C                    55           70            40
D                    98           65            95
E                    70           80            44
F                    98          100            78
      A    B    C    D    E    F
A     0   42   43   49   37   39
B    42    0   69   21   55   19
C    43   69    0   70   18   65
D    49   21   70    0   60   39
E    37   55   18   60    0   48
F    39   19   65   39   48    0

Figure 7-8 Student score distance matrix.

Three initial clusters are formed based on the distance matrix in Figure 7-8: A, (BDF), and (CE). We next compute the mean scores in English, math, and music by cluster. These mean scores, shown in Table 7-2, are used to derive the error of partition of the clusters. The single-member cluster A contributes nothing to the error, so the error of partition of the initial three clusters, as defined by Eq. 7.21, is

\[
\begin{aligned}
E ={}& (100-98.7)^2+(98-98.7)^2+(98-98.7)^2 \\
&+(85-83.3)^2+(65-83.3)^2+(100-83.3)^2 \\
&+(90-87.7)^2+(95-87.7)^2+(78-87.7)^2 \\
&+(55-62.5)^2+(70-62.5)^2+(70-75)^2+(80-75)^2 \\
&+(40-42)^2+(44-42)^2 = 942.51
\end{aligned}
\]
New clusters are formed if E decreases as a result of moving one student from one cluster to another. The final cluster configuration is the one with the lowest E.
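The partition error of Eq. 7.21 and the reassignment test described above can be reproduced with a short Python sketch. The scores come from Table 7-1 and the initial clusters from the text. The candidate move (student F joining A's cluster) is a hypothetical choice made only to illustrate the comparison, and `partition_error` is an illustrative helper name.

```python
# Scores from Table 7-1: (English, math, music).
scores = {
    "A": (60, 90, 78),
    "B": (100, 85, 90),
    "C": (55, 70, 40),
    "D": (98, 65, 95),
    "E": (70, 80, 44),
    "F": (98, 100, 78),
}


def partition_error(clusters):
    """Eq. 7.21: squared deviations from the cluster mean, summed over members and attributes."""
    total = 0.0
    for members in clusters:
        rows = [scores[m] for m in members]
        for column in zip(*rows):  # one attribute at a time
            mean = sum(column) / len(column)
            total += sum((v - mean) ** 2 for v in column)
    return total


initial = [["A"], ["B", "D", "F"], ["C", "E"]]
e_initial = partition_error(initial)

# Hypothetical candidate reassignment: move student F into A's cluster.
candidate = [["A", "F"], ["B", "D"], ["C", "E"]]
e_candidate = partition_error(candidate)

print(f"E initial   = {e_initial:.2f}")
print(f"E candidate = {e_candidate:.2f}")
print("accept move" if e_candidate < e_initial else "reject move")
```

With unrounded cluster means the initial error is 942.50; the 942.51 in the text results from using means rounded to one decimal place. The candidate move raises the error to 1157.00, so it would be rejected.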
Principal component analysis

Principal component analysis is a data reduction technique that can reduce the number of variables under analysis. The technique creates new variables called principal components that are linear combinations of the original variables. Principal components are uncorrelated with one another. We assume that there are m principal components (PC) derived from p original variables. Each principal component can be expressed as a linear combination of the original variables.
Figure 7-9 Correspondence analysis process (panels: frequency table with rows A–E and columns 1–4, row profiles, column profiles, and the resulting maps of the row and column categories).
Table 7-2 Mean scores by cluster

Students in cluster   Mean English score   Mean math score   Mean music score
A                                     60                90                 78
B, D, and F                         98.7              83.3               87.7
C and E                             62.5                75                 42
\[
\begin{aligned}
PC_1 &= w_{11}X_1+w_{12}X_2+w_{13}X_3+\cdots+w_{1p}X_p \\
PC_2 &= w_{21}X_1+w_{22}X_2+w_{23}X_3+\cdots+w_{2p}X_p \\
&\;\;\vdots \\
PC_m &= w_{m1}X_1+w_{m2}X_2+w_{m3}X_3+\cdots+w_{mp}X_p
\end{aligned}
\tag{7.22}
\]

where \(PC_i\) is the ith principal component, \(w_{ij}\) is the coefficient of the jth original variable in the ith principal component, and \(X_j\) is the jth original variable. It is required that the sum of the squares of the coefficients in each principal component is one (Brooks 2002). For the ith principal component, this translates into the constraint

\[
w_{i1}^2+w_{i2}^2+w_{i3}^2+\cdots+w_{ip}^2=1 \tag{7.23}
\]

It is also required that the coefficient vectors of the principal components must be orthogonal, namely,

\[
w_i'\,w_j=0 \tag{7.24}
\]

where \(w_i=[w_{i1},w_{i2},\ldots,w_{ip}]\), \(w_j=[w_{j1},w_{j2},\ldots,w_{jp}]\), and \(i\neq j\). If \(\Sigma\) is the variance–covariance matrix of the original variables X and \(\lambda_i\) is the ith eigenvalue of \(\Sigma\), the following condition must be satisfied for a nontrivial solution to exist.

\[
\det\left(\Sigma-\lambda_i I\right)=0 \tag{7.25}
\]

The eigenvector corresponding to \(\lambda_i\) is the factor loading vector \(w_i\). Given that \(\Sigma\) is a symmetric matrix, the resulting eigenvectors are orthogonal to one another. The length of the eigenvectors can be scaled to unit length, as given by Eq. 7.23. The fraction of the total variance in the original variables that is explained by the ith principal component is given by

\[
\frac{\lambda_i}{\sum_{k=1}^{p}\lambda_k} \tag{7.26}
\]
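As a small worked illustration of Eqs. 7.23 to 7.26, the Python sketch below extracts the two principal components of a two-variable data set. Using the English and music scores of Table 7-1 as input is an assumption made here for illustration; the text does not apply principal component analysis to those scores. With only two variables, the eigenvalues of the 2 × 2 covariance matrix follow from the quadratic form of Eq. 7.25, so no linear algebra library is needed. The helper names are illustrative.

```python
import math

english = [60, 100, 55, 98, 70, 98]
music = [78, 90, 40, 95, 44, 78]


def covariance(x, y):
    """Sample covariance of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


# Entries of the 2 x 2 variance-covariance matrix Sigma.
s11 = covariance(english, english)
s22 = covariance(music, music)
s12 = covariance(english, music)

# det(Sigma - lambda I) = 0 is a quadratic in lambda (Eq. 7.25).
trace = s11 + s22
det = s11 * s22 - s12 ** 2
disc = math.sqrt(trace ** 2 - 4 * det)
eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]


def unit_eigenvector(lam):
    """Solve (Sigma - lam I) w = 0 and scale w to unit length (Eq. 7.23)."""
    v = (s12, lam - s11)  # valid because s12 is nonzero for these data
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)


w1 = unit_eigenvector(eigenvalues[0])
w2 = unit_eigenvector(eigenvalues[1])

# Fraction of total variance explained by each component (Eq. 7.26).
fractions = [lam / sum(eigenvalues) for lam in eigenvalues]

print("loadings PC1:", w1)
print("loadings PC2:", w2)
print("orthogonality check (Eq. 7.24):", w1[0] * w2[0] + w1[1] * w2[1])
print("variance explained:", fractions)
```

For more than two variables the same steps apply, but the eigenvalues and eigenvectors would normally come from a numerical eigen-decomposition routine.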
Factor analysis

Factor analysis is also a data reduction technique. It uncovers underlying factors that are common to the original variables and fewer in number than the original variables. The original variables are linear combinations of these common factors. Notice the contrast with principal component analysis, where the principal components are linear combinations of the original variables. With \(w_{ij}\) (also called factor loading) the coefficient of the jth common factor in the ith original variable, \(f_j\) the jth common factor, and \(\varepsilon_i\) the error terms, we can express the original variables, \(X_i\), in terms of common factors as follows.

\[
\begin{aligned}
X_1 &= w_{11}f_1+w_{12}f_2+w_{13}f_3+\cdots+w_{1k}f_k+\varepsilon_1 \\
X_2 &= w_{21}f_1+w_{22}f_2+w_{23}f_3+\cdots+w_{2k}f_k+\varepsilon_2 \\
&\;\;\vdots \\
X_m &= w_{m1}f_1+w_{m2}f_2+w_{m3}f_3+\cdots+w_{mk}f_k+\varepsilon_m
\end{aligned}
\tag{7.27}
\]

In matrix notation, Eq. 7.27 can be expressed as

\[
X=wf+\varepsilon \tag{7.28}
\]
Here X is an m × 1 matrix that contains the original variables, w is an m × k matrix that contains all the coefficients of the k common factors in the m original variables, f is a k × 1 matrix that contains the k common factors, and \(\varepsilon\) is an m × 1 matrix of error terms. It is assumed that the error terms are uncorrelated with each other and with the common factors, namely \(E(\varepsilon_i\varepsilon_j)=0\) for \(i\neq j\) and \(E(f\varepsilon')=0\). With \(\Theta\) the covariance matrix of the common factors and \(\Psi\) the covariance matrix of the error terms, the variance of the original variables is given by the expression

\[
\mathrm{var}(X)=w\,E(ff')\,w'+\mathrm{var}(\varepsilon)=w\Theta w'+\Psi \tag{7.29}
\]

Assuming that the common factors have a variance of one and that they are not correlated with each other, then \(\Theta=I\) (identity matrix) and Eq. 7.29 becomes (Dillon and Goldstein 1984)

\[
\mathrm{var}(X)=ww'+\Psi,\qquad \mathrm{tr}\,\mathrm{var}(X)=\sum_{i=1}^{m}\sum_{j=1}^{k}w_{ij}^2+\mathrm{tr}\,\Psi \tag{7.30}
\]

where tr denotes the sum of the diagonal entries, so that \(\mathrm{tr}\,\mathrm{var}(X)\) is the total variance of the m original variables. The fraction of the variance of the original variables explained by the common factors is given by

\[
\frac{\sum_{i=1}^{m}\sum_{j=1}^{k}w_{ij}^2}{\mathrm{tr}\,\mathrm{var}(X)} \tag{7.31}
\]
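The variance decomposition above can be checked numerically with a minimal Python sketch. The loadings and error variances below are hypothetical values chosen so that each of three variables has unit variance; they do not come from the text. The sketch assumes uncorrelated common factors with unit variance and uncorrelated error terms, and it measures the variance explained against the total variance, that is, the sum of the diagonal entries of var(X).

```python
# Hypothetical loadings w for m = 3 original variables and k = 2 common factors.
w = [
    [0.8, 0.3],
    [0.7, 0.4],
    [0.2, 0.9],
]
# Hypothetical error variances (the diagonal of the error covariance matrix).
psi = [0.27, 0.35, 0.15]

m, k = len(w), len(w[0])

# var(X) = w w' + error covariance, assuming the factor covariance is the identity.
var_x = [
    [
        sum(w[i][f] * w[j][f] for f in range(k)) + (psi[i] if i == j else 0.0)
        for j in range(m)
    ]
    for i in range(m)
]

# Variance of each variable explained by the common factors (its communality).
communalities = [sum(value ** 2 for value in row) for row in w]

total_variance = sum(var_x[i][i] for i in range(m))
fraction_explained = sum(communalities) / total_variance

print("communalities:", communalities)
print("total variance:", total_variance)
print(f"fraction explained by common factors: {fraction_explained:.4f}")
```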
Discriminant analysis

Discriminant analysis is a technique that examines the differences between two or more groups of members with respect to multiple member attributes (Klecka 1980). The dependent variable is a variable that indicates the group a member belongs to. An example of a dependent variable is a variable that has three possible values: ‘high-achiever group’, ‘medium-achiever group’, and ‘low-achiever group’. The independent variables are attributes associated with the members. Discriminant analysis can be used to predict which group a particular member with given attributes belongs to.

To study the characteristics of each group using discriminant analysis, we start out by creating discriminant functions. Discriminant functions are linear combinations of the independent variables (member attributes), defined as follows.

\[
\begin{aligned}
D_1 &= b_{11}X_1+b_{12}X_2+b_{13}X_3+\cdots+b_{1p}X_p \\
D_2 &= b_{21}X_1+b_{22}X_2+b_{23}X_3+\cdots+b_{2p}X_p \\
&\;\;\vdots \\
D_m &= b_{m1}X_1+b_{m2}X_2+b_{m3}X_3+\cdots+b_{mp}X_p
\end{aligned}
\tag{7.32}
\]
where \(D_i\) is the ith discriminant function, \(b_{ip}\) is the discriminant coefficient of the pth independent variable in the ith discriminant function, and \(X_p\) is the pth independent variable. A discriminant function is created to maximize the ratio of its between-group variance to its within-group variance. The value of a discriminant function is referred to as a discriminant score. Only those independent variables that are predictive of the dependent variable are included in the discriminant functions. Such variables are called discriminating variables. Statistics such as Wilks’ lambda, the chi-square, and the F statistic are used to determine which independent variables are predictive (Klecka 1980). Wilks’ lambda is also used to assess the statistical significance of discriminant functions. The following is a list of key assumptions of discriminant analysis.

● Assumption 1: The dependent variable (group identity variable) is nonmetric.
● Assumption 2: The discriminating variables have a multivariate normal distribution.
● Assumption 3: The variance–covariance matrix of the discriminating variables is the same across groups.
● Assumption 4: Each member can belong to one and only one group.
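For the simplest case of two groups and two discriminating variables, a single discriminant function can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure of any statistical package. The scores are hypothetical, the coefficients are obtained from the pooled within-group covariance matrix (Fisher's two-group solution, which maximizes the ratio of between-group to within-group variance), and the classification cutoff assumes equally likely groups. The function names are illustrative.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter(rows, mean):
    """2 x 2 within-group sums of squares and cross-products."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


# Hypothetical members with two attributes (for example, two test scores).
high = [(85, 90), (80, 95), (90, 85), (88, 92)]
low = [(60, 55), (65, 50), (55, 62), (58, 57)]

m_high, m_low = mean_vector(high), mean_vector(low)
s_high, s_low = scatter(high, m_high), scatter(low, m_low)

# Pooled within-group covariance (Assumption 3: same covariance across groups).
dof = len(high) + len(low) - 2
pooled = [[(s_high[a][b] + s_low[a][b]) / dof for b in range(2)] for a in range(2)]

# Discriminant coefficients: inverse pooled covariance times the mean difference.
det = pooled[0][0] * pooled[1][1] - pooled[0][1] * pooled[1][0]
inverse = [
    [pooled[1][1] / det, -pooled[0][1] / det],
    [-pooled[1][0] / det, pooled[0][0] / det],
]
diff = [m_high[0] - m_low[0], m_high[1] - m_low[1]]
coef = [
    inverse[0][0] * diff[0] + inverse[0][1] * diff[1],
    inverse[1][0] * diff[0] + inverse[1][1] * diff[1],
]


def score(member):
    """Discriminant score D = b1 * X1 + b2 * X2."""
    return coef[0] * member[0] + coef[1] * member[1]


# Cutoff halfway between the group mean scores (assumes equal group priors).
cutoff = (score(m_high) + score(m_low)) / 2


def classify(member):
    return "high-achiever group" if score(member) > cutoff else "low-achiever group"


print("discriminant coefficients:", coef)
print("new member (82, 88):", classify((82, 88)))
```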
There are similarities between linear regression, CHAID, and discriminant analysis in that all three techniques are used to explore the dependence between dependent and independent variables. However, there are basic differences between these three techniques in terms of assumptions (Dillon and Goldstein 1984). In a linear regression, the dependent variable is assumed to be a normally distributed random variable and the independent variables are assumed to be fixed. In discriminant analysis, it is assumed that the dependent variable is fixed and the discriminating variables are normally distributed. In CHAID, there are no distributional assumptions about the dependent or independent variables. A further difference between CHAID and discriminant analysis is that discriminant analysis constructs discriminant functions as linear combinations of the discriminating variables (independent variables), while CHAID does not assume any such linear relationship. What CHAID and discriminant analysis have in common is that both minimize misclassification by maximizing the ratio of the variance between groups and the variance within groups.
Correspondence analysis

Correspondence analysis, also called dual scaling, is used to analyze the association between two or more categorical variables and to represent this association visually in a low-dimensional diagram, called a perceptual map. This method is particularly useful for analyzing large contingency tables. Correspondence analysis requires the variables under analysis to be categorical.

In his book, Applied Correspondence Analysis, Clausen proposes a step-by-step process for conducting a correspondence analysis, illustrated in Figure 7-9. The steps in the process of a correspondence analysis with two categorical variables are as follows.

● Step one is the creation of a frequency table with the two categorical variables. Assume that the X variable has k possible values and the Y variable has l possible values. In this k × l frequency table, entry \(n_{ij}\) is the number of members whose X variable value equals i and whose Y variable value equals j. The number of members whose X variable equals i is \(n_i\), and the number whose Y variable equals j is \(n_j\); \(n_i\) and \(n_j\) are called the row total and column total, respectively.
● Step two is to set up a row profile table and a column profile table. The frequency table can be transformed into a row profile table by dividing each entry \(n_{ij}\) by the row total \(n_i\). The frequency table can be transformed into a column profile table by dividing each entry \(n_{ij}\) by the column total \(n_j\).
● Step three is to generate two key underlying dimensions for variables X and Y and to plot both variables on a two-dimensional map. The dimensions are selected based on the proportion of the variance of the original variables that these dimensions explain. The higher the proportion is, the more significant the dimension is. A detailed discussion (Clausen 1998) on the mathematical derivation of the dimensions is beyond the scope of this book.
We next discuss a correspondence analysis example. Assume a firm conducts an online survey to measure the satisfaction level of visitors to the firm’s website. The total number of visitors is 4492. The visitors are classified into three types based on their familiarity with the site: first-time visitors, frequent visitors, and infrequent visitors. The visitors are asked which of the following three features of the website is the most important to them: content, navigation, and presentation. Correspondence analysis will be used to analyze the survey response data and SAS will be used as the data-mining tool.

In step one of the analysis, a contingency table (frequency table) is created based on visitor types and visitors’ selection of the most important website feature (content, navigation, or presentation), as illustrated in Table 7-3. Next the row profiles (row percentages) and the column profiles (column percentages) are created. Tables 7-4 and 7-5 show the row profiles and the column profiles, respectively. The column profiles show that infrequent visitors are likely to rate navigation as an important driving factor of their satisfaction with the site: they account for 0.70 of the visitors who chose navigation. Frequent visitors tend to identify content as the driving factor of their satisfaction with the site: 0.89 of them chose content.
Table 7-3 Contingency table of visitor type and importance of the three website features

                      Most important website feature
Visitor type          Content   Navigation   Presentation   Total
New visitor               230          100            100     430
Infrequent visitor        799          450            113    1362
Frequent visitor         2400          100            200    2700
Total                    3429          650            413    4492
Table 7-4 Web visitors row profiles

                      Row profile
Visitor type          Content   Navigation   Presentation   Total
New visitor              0.54         0.23           0.23       1
Infrequent visitor       0.59         0.33           0.08       1
Frequent visitor         0.89         0.04           0.07       1
Table 7-5 Web visitors column profiles

                      Column profile
Visitor type          Content   Navigation   Presentation
New visitor              0.07         0.15           0.25
Infrequent visitor       0.23         0.70           0.27
Frequent visitor         0.70         0.15           0.48
Total                       1            1              1
Next, two key dimensions are created. Figure 7-10 shows the SAS output of the analysis. The first dimension explains 87.89% of the variance of the data, while the second explains 12.11% of the variance of the data. Finally, a two-dimensional correspondence map is created. Table 7-6 and Table 7-7 show the row and column coordinates on the two new dimensions. The first dimension shows a large weight on navigation (0.9256), an indication of high association between this dimension and the navigation feature. Therefore, we may label this dimension as ‘site navigation’. The second dimension has a large weight on site presentation (0.4577), an indication of high association between this dimension and the site presentation feature. Therefore, we may label this dimension as ‘site presentation’. Figure 7-11 shows the SAS output of the correspondence analysis. The corresponding code is shown in Figure 7-12.

Inertia and chi-square decomposition

Dimension   Singular value   Principal inertia   Chi-square   Percent   Cumulative percent
1                  0.39754             0.15804      709.904     87.89                87.89
2                  0.14755             0.02177       97.797     12.11               100.00
Total                                  0.17981      807.701    100.00

Degrees of freedom = 4

Figure 7-10 Correspondence analysis key dimensions and their explanatory power.

Table 7-6 Row coordinates of the two dimensions in correspondence analysis

Visitor type          Dimension 1   Dimension 2
New visitor                0.3900        0.4298
Infrequent visitor         0.5165       -0.1152
Frequent visitor          -0.3277       -0.0103

Table 7-7 Column coordinates of the two dimensions in correspondence analysis

Web site feature      Dimension 1   Dimension 2
Content                   -0.1995       -0.0356
Navigation                 0.9256       -0.1033
Presentation               0.2000        0.4577
Figure 7-11 Correspondence analysis map (horizontal axis: Dimension 1, 87.89%; vertical axis: Dimension 2, 12.11%; plotted points: new, infrequent, and frequent visitors, and the content, navigation, and presentation features).
    *--- Create input data (names are read unquoted as list input) ---;
    data sasuser.corres_ex;
       length visitor $30;
       input visitor $ content navigation preso;
       cards;
    New_Visitors 230 100 100
    Infrequent_Visitors 799 450 113
    Frequent_Visitors 2400 100 200
    ;
    run;

    *--- Perform simple correspondence analysis ---;
    proc corresp all data=sasuser.corres_ex outc=outcorres;
       var content navigation preso;
       id visitor;
    run;

    *--- Plot the simple correspondence analysis results ---;
    %plotit(data=outcorres, datatype=corresp);

Figure 7-12 SAS code for generating correspondence map.
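The quantities behind the SAS output can also be traced by hand. The Python sketch below is offered as an illustration under stated assumptions, not as a replacement for PROC CORRESP. It starts from the counts in Table 7-3 and computes the row and column profiles, the chi-square statistic, the total inertia (chi-square divided by the number of visitors), and the two principal inertias. It uses the standard formulation based on standardized residuals. Because a 3 × 3 table has at most two nonzero dimensions, the two principal inertias are obtained here as the roots of a quadratic instead of through a singular value decomposition. The variable names are illustrative.

```python
import math

counts = [
    [230, 100, 100],   # New visitor: content, navigation, presentation
    [799, 450, 113],   # Infrequent visitor
    [2400, 100, 200],  # Frequent visitor
]

n = sum(sum(row) for row in counts)
row_tot = [sum(row) for row in counts]
col_tot = [sum(counts[i][j] for i in range(3)) for j in range(3)]

# Row and column profiles: each entry divided by its row or column total.
row_profiles = [[counts[i][j] / row_tot[i] for j in range(3)] for i in range(3)]
col_profiles = [[counts[i][j] / col_tot[j] for j in range(3)] for i in range(3)]

# Standardized residuals: (observed - expected) / sqrt(row total * column total).
s = [
    [
        (counts[i][j] - row_tot[i] * col_tot[j] / n) / math.sqrt(row_tot[i] * col_tot[j])
        for j in range(3)
    ]
    for i in range(3)
]

total_inertia = sum(v * v for row in s for v in row)
chi_square = n * total_inertia

# M = S'S has one zero eigenvalue; the other two are the principal inertias.
m = [[sum(s[i][a] * s[i][b] for i in range(3)) for b in range(3)] for a in range(3)]
trace = m[0][0] + m[1][1] + m[2][2]
minors = (
    (m[0][0] * m[1][1] - m[0][1] ** 2)
    + (m[0][0] * m[2][2] - m[0][2] ** 2)
    + (m[1][1] * m[2][2] - m[1][2] ** 2)
)
disc = math.sqrt(trace ** 2 - 4 * minors)
inertias = [(trace + disc) / 2, (trace - disc) / 2]
singular_values = [math.sqrt(v) for v in inertias]
percents = [100 * v / total_inertia for v in inertias]

print(f"chi-square = {chi_square:.3f}, total inertia = {total_inertia:.5f}")
for d in range(2):
    print(f"dimension {d + 1}: singular value {singular_values[d]:.5f}, "
          f"inertia {inertias[d]:.5f}, percent {percents[d]:.2f}")
```

The printed values can be compared with the chi-square, principal inertia, and percent columns of Figure 7-10.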
Analysis of variance

Analysis of Variance (ANOVA) is a statistical technique used to quantify the dependence relationship between dependent and independent variables. The technique is often used in experimental design where we wish to assess the impact of stimuli (treatments or independent variables) on one or more dependent variables. Although widely used for assessing experimental results in the pharmaceutical and social sciences, ANOVA has only recently gained traction in marketing, especially when applied to the understanding of the effect of marketing stimuli on audience responses.

To be consistent with the application of ANOVA to experimental design, we will use the treatment and block concepts introduced in Chapter 6. Treatments are the independent (or predictive) variables in an experimental design, calibrated to observe potential causality. Blocks are groups of similar units of which roughly equal numbers of units are assigned to each treatment group. Units are objects, individuals, or subjects that are either subjected or not subjected to a treatment in an experiment. One-way ANOVA is used to analyze the treatment effects on subjects in an experiment. Two-way ANOVA is used to analyze both the treatment and the block (sometimes referred to as replication) effects on subjects (Snedecor and Cochran 1989).

One-way ANOVA assumes that any changes in subject behavior or characteristics result from the treatment. This relationship between the subject behavior (dependent variable) and the treatment effects is expressed as:

\[
X_{ij}=\mu+\tau_j+\varepsilon_{ij} \tag{7.33}
\]

where \(X_{ij}\) is the value of the dependent variable for subject i under treatment j, \(\mu\) is the mean of the dependent variable across all subjects, \(\tau_j\) is the difference between \(\mu_j\), the mean of the dependent variable for the subjects under treatment j, and \(\mu\), and \(\varepsilon_{ij}\) is the error term of subject i under treatment j. The error term represents the portion of subject behavior change that is due to random effect.

In a one-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects. The F statistic is the ratio of the mean treatment sum of squares and the mean error sum of squares (both defined below) with (a − 1, n − a) degrees of freedom given a treatments and n subjects. The mean treatment sum of squares is defined as

\[
\frac{1}{a-1}\sum_{j=1}^{a}n_j\left(\mu_j-\mu\right)^2 \tag{7.34}
\]

where \(n_j\) is the number of subjects under treatment j, and \(\mu_j\) is the mean of the dependent variable of the subjects under treatment j. The mean error sum of squares is

\[
\frac{1}{n-a}\sum_{j=1}^{a}\sum_{i=1}^{n_j}\left(X_{ij}-\mu_j\right)^2 \tag{7.35}
\]

Hence, the F statistic is

\[
F=\frac{\sum_{j=1}^{a}n_j\left(\mu_j-\mu\right)^2/(a-1)}{\sum_{j=1}^{a}\sum_{i=1}^{n_j}\left(X_{ij}-\mu_j\right)^2/(n-a)} \tag{7.36}
\]
If the F statistic for the treatment effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable across the different treatment groups and the treatment effects are statistically significant.

Two-way ANOVA assumes that changes in subject behavior or characteristics are due to the treatment effects and the block effects. This relationship between the subject behavior, the treatment effects, and the block effects is expressed as

\[
X_{ijk}=\mu+\tau_j+\beta_k+\varepsilon_{ijk} \tag{7.37}
\]

where \(X_{ijk}\) is the value of the dependent variable of subject i in block k under treatment j, \(\mu\) is the mean of the dependent variable across all subjects, \(\tau_j\) is the difference between \(\mu_j\), the mean of the dependent variable for the subjects under treatment j, and \(\mu\), \(\beta_k\) is the difference between \(\mu_k\), the mean of the dependent variable for the subjects in block k, and \(\mu\), and \(\varepsilon_{ijk}\) is the error term of subject i in block k under treatment j.

For a two-way ANOVA analysis, the F statistic is used to assess the statistical significance of the treatment effects and the block effects. The F statistic for assessing the treatment effects is the ratio of the mean treatment sum of squares (Eq. 7.34) and the mean error sum of squares with (a − 1, ab − a − b + 1) degrees of freedom given a treatments and b blocks. The F statistic for assessing the statistical significance of the treatment effects is

\[
F_{a-1,\,ab-a-b+1}=\frac{ab-a-b+1}{a-1}\cdot\frac{\sum_{j=1}^{a}n_j\left(\mu_j-\mu\right)^2}{\sum_{i=1,\,j=1,\,k=1}^{i=n_j,\,j=a,\,k=b}\left(X_{ijk}-\mu\right)^2-\sum_{k=1}^{b}m_k\left(\mu_k-\mu\right)^2-\sum_{j=1}^{a}n_j\left(\mu_j-\mu\right)^2} \tag{7.38}
\]

where \(n_j\) is the number of subjects under treatment j, \(m_k\) is the number of subjects in block k, and the remaining symbols are as defined for Eq. 7.37. If the F statistic for the treatment effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable between treatments and the treatment effects are statistically significant.

The F statistic for assessing the block effects is the ratio of the mean block sum of squares and the mean error sum of squares with (b − 1, ab − a − b + 1) degrees of freedom given a treatments and b blocks (Mead 1989). The mean block sum of squares is

\[
\frac{1}{b-1}\sum_{k=1}^{b}m_k\left(\mu_k-\mu\right)^2 \tag{7.39}
\]

The F statistic for assessing the statistical significance of the block effects is therefore

\[
F_{b-1,\,ab-a-b+1}=\frac{ab-a-b+1}{b-1}\cdot\frac{\sum_{k=1}^{b}m_k\left(\mu_k-\mu\right)^2}{\sum_{i=1,\,j=1,\,k=1}^{i=n_j,\,j=a,\,k=b}\left(X_{ijk}-\mu\right)^2-\sum_{k=1}^{b}m_k\left(\mu_k-\mu\right)^2-\sum_{j=1}^{a}n_j\left(\mu_j-\mu\right)^2} \tag{7.40}
\]

If the F statistic for the block effects is greater than the critical F value at a specified significance level, then there is a difference in the mean of the dependent variable between blocks. In other words, the block effects are statistically significant.
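The F statistics above can be computed directly from their definitions. The following Python sketch is a minimal illustration with hypothetical response data. The function names are illustrative, and the two-way function assumes exactly one observation for each combination of block and treatment, which is the layout that yields ab − a − b + 1 error degrees of freedom.

```python
def one_way_f(groups):
    """F statistic of Eq. 7.36; groups is a list of treatment groups."""
    a = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ms_treat = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means)) / (a - 1)
    ms_error = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g) / (n - a)
    return ms_treat / ms_error


def two_way_f(table):
    """Return (F_treatment, F_block) for table[k][j]: block k, treatment j, one value per cell."""
    b, a = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (a * b)
    treat_means = [sum(table[k][j] for k in range(b)) / b for j in range(a)]
    block_means = [sum(row) / a for row in table]
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_treat = sum(b * (mu - grand) ** 2 for mu in treat_means)  # n_j = b subjects per treatment
    ss_block = sum(a * (mu - grand) ** 2 for mu in block_means)  # m_k = a subjects per block
    ms_error = (ss_total - ss_treat - ss_block) / (a * b - a - b + 1)
    return (ss_treat / (a - 1)) / ms_error, (ss_block / (b - 1)) / ms_error


# Hypothetical responses under three treatments (for example, three offers).
f_one_way = one_way_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])

# Hypothetical randomized block layout: rows are blocks, columns are treatments.
f_treatment, f_block = two_way_f([[1, 2, 6], [2, 4, 6], [3, 3, 9]])

print(f"one-way F = {f_one_way:.2f}")
print(f"two-way F (treatments) = {f_treatment:.2f}, F (blocks) = {f_block:.2f}")
```

Each statistic would then be compared with the critical F value for its degrees of freedom at the chosen significance level.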
Canonical correlation analysis

Canonical correlation analysis is used to analyze the correlation between two sets of metric variables, where each set consists of one or more variables. Simple and multiple linear regression are particular cases of canonical analysis. Simple linear regression has one variable in both sets, while multiple linear regression has one variable in one set of variables, and multiple variables in the other set. Often, one set of variables is interpreted as dependent variables and the other as independent variables, as in the case of linear regression (Dillon and Goldstein 1984).

Given a set of X variables and a set of Y variables, a canonical analysis finds X*, a linear combination of the X variables, and Y*, a linear combination of the Y variables, such that X* and Y* are highly correlated. X* and Y* are called canonical variates. The analysis often results in multiple sets of X* and Y*. The coefficients in the linear combinations are called canonical weights. With this, X* and Y* can be written as:

\[
X^{*}=a_1X_1+a_2X_2+\cdots+a_mX_m \tag{7.41}
\]

\[
Y^{*}=b_1Y_1+b_2Y_2+\cdots+b_pY_p \tag{7.42}
\]

where \(a_1, a_2, \ldots, a_m\) are the canonical weights for the canonical variate X*, and \(b_1, b_2, \ldots, b_p\) are the canonical weights for the canonical variate Y*.
The set of X* and Y* with the highest correlation among all the possible sets of canonical variates is called the first set of canonical variates. Canonical variates are normalized to have unit variance (Dillon and Goldstein 1984).

We next discuss an example of canonical correlation analysis. Assume company ABC conducts an online survey to measure how the demographics of its website visitors correlate with their satisfaction with the firm’s website. The respondents are asked to rate their satisfaction in the following five areas.

● Satisfaction with ABC as a company
● Overall satisfaction with the website
● Satisfaction with the content of the website
● Satisfaction with the ease of navigation of the website
● Satisfaction with the presentation of the website.

The respondents are also asked to answer the following set of questions about their own characteristics (web activities and their demographics).

● Frequency of their visits to the website of company ABC
● The number of personal computers in the respondent’s company
● The number of employees in the respondent’s company
● Their need for technology consulting services.
The canonical correlation analysis technique can be applied to determine how the respondent satisfaction about the website correlates with the respondent characteristics (demographics and web activities). Figure 7-13 shows the SAS output of the canonical correlation coefficients. The four largest canonical correlation coefficients are 0.39, 0.19, 0.07, and 0.007. The canonical analysis creates four satisfaction canonical variates and four characteristics canonical variates. Figure 7-14 shows the eigenvalues associated with each of the four canonical coefficients. Table 7-8 shows how the five original satisfaction variables are correlated with the four satisfaction canonical variates. Table 7-9 shows the Pearson product-moment correlation coefficients between the four original respondent characteristic variables and the four characteristic canonical variates.
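The eigenvalues reported in Figure 7-14 follow from the squared canonical correlations in Figure 7-13 through the relation given in the SAS output header, eigenvalue = CanRsq/(1 − CanRsq). The short Python sketch below redoes this arithmetic as a check; the squared correlations are copied from Figure 7-13 and the variable names are illustrative.

```python
# Squared canonical correlations from Figure 7-13.
squared_correlations = [0.152969, 0.036199, 0.005178, 0.000054]

# Eigenvalue of each canonical dimension: CanRsq / (1 - CanRsq).
eigenvalues = [r2 / (1 - r2) for r2 in squared_correlations]

# Proportion and cumulative proportion of the eigenvalue total.
total = sum(eigenvalues)
proportions = [ev / total for ev in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for i, (ev, p, c) in enumerate(zip(eigenvalues, proportions, cumulative), start=1):
    print(f"{i}: eigenvalue {ev:.4f}, proportion {p:.4f}, cumulative {c:.4f}")
```

The first canonical dimension accounts for about 81% of the eigenvalue total, which is why the first pair of canonical variates dominates the interpretation.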
Multi-dimensional scaling analysis

Multi-dimensional scaling (MDS) is used to construct a low-dimensional (two-dimensional, for example) map that best describes the relative positions and proximity of members in a multi-dimensional space. MDS can be applied to either metric or nonmetric data. MDS applied to metric and nonmetric data are called metric MDS and nonmetric MDS, respectively. MDS is similar to factor analysis and principal component analysis in that all of them are data reduction techniques. Here, we will focus on metric MDS.
The CANCORR Procedure
Canonical correlation analysis

     Canonical     Adjusted canonical   Approximate      Squared canonical
     correlation   correlation          standard error   correlation
1       0.391113             0.389369         0.012468            0.152969
2       0.190260             0.187646         0.014187            0.036199
3       0.071957                    .         0.014644            0.005178
4       0.007380                    .         0.014719            0.000054

Figure 7-13 Top four canonical correlation coefficients.
Test of H0: The canonical correlations in the current row and all that follow are zero Eigenvalues of Inv (E)*H = CanRsq/(1-CanRsq) Likelihood Approximate Eigenvalve Difference Proportion Cumulative ratio F Value Num df Den df Pr > F 1 2 3 4
0.1806 0.0376 0.0052 0.0001
0.1430 0.8083 0.8083 0.81209783 0.0324 0.1681 0.9765 0.95875856 0.0052 0.0233 0.9998 0.99476798 0.0002 1.0000 0.99994554
49.48 16.30 4.03 0.13
20 15281