It is probably not very original, but I will start with this unique take on big data by Dan Ariely, professor of psychology and behavioural economics:
Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it...
The statement has attracted lots of attention and apt comments. One of my favourite ones:
How did the term big data become one of the most overused terms of 2013? Seriously, why so much fuss? Large datasets have been around for a long time. In recent years they have become more ubiquitous and received more visibility than ever before. This trend is driven by several important factors that have been unfolding in tandem.
The first major trend is the increasing digital presence of existing companies and the rising number of companies that operate primarily online. This is linked to accelerating technological progress in capturing and storing data on all customer interactions, and to the decreasing cost of data storage. These trends have been further magnified by rising competition, which has fuelled concerns among marketers that falling behind in data-driven marketing can mean losing the competitive edge.
Many companies have been keen to jump on the technology bandwagon quickly. But things have often not worked out as planned. Gartner research reports that 56% of surveyed companies still struggle to get value out of big data. I would add that a number of companies have not yet mastered their much smaller and more manageable sources of information.
It really feels that many people talk about big data, but not many actually roll up their sleeves and put the data to use. This is not a trivial task. So where to start?
How to Get There? Let’s Take This Step-by-Step
The challenge can be broken down into several areas that need to be addressed: data collection, data accessibility, modelling and model implementation, and people.
It comes as no surprise that, to excel in big data analytics, a company needs to collect all relevant data sources. It does not matter how big the data are. The most important thing is to start.
All numeric and qualitative data should be stored in a centrally managed data warehouse. The information that marketers often have at their fingertips includes sales data, marketing activities and advertising campaigns for their own brands and, ideally, for competitor brands too.
Joel Rubinson provides several recommendations for digital data collection encouraging integration of all data sources. This includes incorporating data streams from web analytics, social media, customer data, business performance and CRM.
Companies should design their data collection and processing infrastructure with regard to potential developments in the future to avoid major infrastructure overhauls.
Depending on data granularity and volume, the information often needs to be pre-processed into a form streamlined for further use. With falling storage costs, provisioning for real-time processing takes precedence over size considerations.
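As a rough illustration of such pre-processing, the sketch below rolls event-level records up to one summary row per day. The log layout, the field names and the `daily_summary` helper are all assumptions made for this example, not a description of any particular system.

```python
from collections import defaultdict

# Hypothetical raw interaction log: (timestamp, customer_id, event, amount)
raw_events = [
    ("2014-05-01T09:15:00", "c1", "purchase", 20.0),
    ("2014-05-01T17:40:00", "c2", "purchase", 35.5),
    ("2014-05-01T18:02:00", "c1", "page_view", 0.0),
    ("2014-05-02T11:30:00", "c1", "purchase", 12.5),
]


def daily_summary(events):
    """Roll event-level records up to one summary row per day."""
    by_day = defaultdict(lambda: {"purchases": 0, "revenue": 0.0, "customers": set()})
    for timestamp, customer_id, event, amount in events:
        day = timestamp[:10]  # ISO timestamps start with YYYY-MM-DD
        row = by_day[day]
        row["customers"].add(customer_id)
        if event == "purchase":
            row["purchases"] += 1
            row["revenue"] += amount
    # Replace the customer sets with counts so the result is easy to store
    return {
        day: {
            "purchases": row["purchases"],
            "revenue": row["revenue"],
            "active_customers": len(row["customers"]),
        }
        for day, row in sorted(by_day.items())
    }


summary = daily_summary(raw_events)
for day, row in summary.items():
    print(day, row)
```

The daily table is far smaller than the raw log, which is usually what reports and models need; the raw events can still be kept for later reprocessing.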
Data collection is critical, but it is only the first step. For data to become useful, they need to be available to all users and stakeholders in a user-friendly way.
A number of years ago, before the big data extravaganza, I worked in an organisation where the core of the employees used data for their research work. The institution collected data from multiple sources, cleaned them and stored them in a data warehouse. Users had access to all data in several ways. Regular standardised reports were coded in a Unix environment and updated every month. Alternatively, the majority of users accessed the data through an Excel add-on, extracting series with intuitive Excel formulas. Those with broader data needs tapped into the coding documentation. I was quite amazed by this ingenious approach. Even though a number of years have passed, I still find it a very user-friendly and efficient way to make data in a centralised system accessible business-wide.
A number of companies for which data collection and integration should be an everyday reality somehow cannot get there. For example, FMCG marketers who buy scantrack sales data often just download what they need ad hoc. What prevents them from setting up internal databases containing all sales series and building customised, regularly updated dashboards for sales and marketing teams? These could be integrated with data on marketing activities (advertising, trade, promotions etc.) and be readily available for quick descriptive analysis or modelling.
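To make the idea concrete, here is a minimal sketch of such an internal database using Python's built-in sqlite3 module. The table names, columns and figures are invented for illustration; a real setup would load the purchased sales series and the marketing calendar instead.

```python
import sqlite3

# In-memory database standing in for an internal sales and marketing store
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (week TEXT, brand TEXT, units INTEGER);
    CREATE TABLE marketing (week TEXT, brand TEXT, ad_spend REAL, on_promotion INTEGER);
""")

# Hypothetical weekly sales series and marketing activity for one brand
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("2014-W01", "BrandA", 1200),
    ("2014-W02", "BrandA", 1500),
    ("2014-W03", "BrandA", 1100),
])
conn.executemany("INSERT INTO marketing VALUES (?, ?, ?, ?)", [
    ("2014-W01", "BrandA", 0.0, 0),
    ("2014-W02", "BrandA", 5000.0, 1),
    ("2014-W03", "BrandA", 0.0, 0),
])

# One query returns sales next to the marketing activity of the same week,
# ready to feed a dashboard or a model
rows = conn.execute("""
    SELECT s.week, s.brand, s.units, m.ad_spend, m.on_promotion
    FROM sales AS s
    LEFT JOIN marketing AS m
        ON m.week = s.week AND m.brand = s.brand
    ORDER BY s.week
""").fetchall()

for row in rows:
    print(row)
```

The LEFT JOIN keeps every sales week even when no marketing record exists for it, so gaps in the marketing data show up as empty values rather than silently dropping sales.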
As a next step, companies can add all digital data on website visits, app usage, and shopper data from loyalty cards or online shopping. Data should be distributed to users through dashboards with download functionality.
Many companies keep their data scattered all over the organisation, with one department unaware of what data other departments collect or have access to.
In online-based companies, the situation is generally better. They have realised from their inception the importance of collecting and analysing information. Here, the challenge comes with an exponentially growing number of customer interactions. Humongous sets of raw data logs require immediate processing for further use. Stakeholders should have a say in this process to make sure that the data format and initial transformations meet business requirements. If companies do not get this right, their valuable data sources go unused.
Data Modelling and Model Implementation
This brings us to building predictive models and data mining. For this stage, data need to be prepared in a format that allows all relevant variables to be included in the model.
The good news is that when we get to modelling, the big data are not that big anymore! Even if the dataset covers all customers, their number stretches to millions at most. Otherwise, sampling procedures can be used to reduce the number of records without a loss of robustness or predictive power. See the recent article by Tim Harford and the ensuing discussion.
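A small simulation can illustrate the sampling point. The sketch below draws a simple random sample from a simulated customer base and compares the sample mean with the full-population mean. The spend distribution, population size and sample size are arbitrary assumptions chosen for the example.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Simulated customer base: one spend figure per customer (assumed log-normal)
population = [random.lognormvariate(3.0, 1.0) for _ in range(200_000)]

# Simple random sample without replacement: 2.5% of all records
sample = random.sample(population, 5_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

# The standard error shows how precise the sample estimate is expected to be
standard_error = statistics.stdev(sample) / len(sample) ** 0.5
relative_gap = abs(sample_mean - pop_mean) / pop_mean

print(f"population mean: {pop_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f} (standard error {standard_error:.2f})")
print(f"relative gap:    {relative_gap:.2%}")
```

Random sampling controls sampling error only. If the full dataset is itself biased, for example because it covers just one channel, a sample of it inherits that bias.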
Predictive models can be used to their full potential only when they are implemented. In other words, the business rules and algorithms should be built into customer databases and incorporated into business decision-making.
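As a sketch of what implementation can mean in practice, the example below turns a fitted model into a scoring function plus a simple business rule applied to customer records. The coefficients, field names and threshold are hypothetical, standing in for whatever a real model and business would supply.

```python
import math

# Hypothetical coefficients, standing in for a logistic model fitted offline
COEFFICIENTS = {
    "intercept": -2.0,
    "purchases_last_90d": 0.4,
    "days_since_last_visit": -0.03,
}
CONTACT_THRESHOLD = 0.5  # assumed business rule: contact likely responders only


def response_probability(customer):
    """Logistic score: estimated probability that the customer responds."""
    z = COEFFICIENTS["intercept"]
    z += COEFFICIENTS["purchases_last_90d"] * customer["purchases_last_90d"]
    z += COEFFICIENTS["days_since_last_visit"] * customer["days_since_last_visit"]
    return 1.0 / (1.0 + math.exp(-z))


def decide(customer):
    """Attach the score and the resulting action to a customer record."""
    score = response_probability(customer)
    action = "send_offer" if score >= CONTACT_THRESHOLD else "no_action"
    return {"customer_id": customer["customer_id"], "score": round(score, 3), "action": action}


customers = [
    {"customer_id": "c1", "purchases_last_90d": 8, "days_since_last_visit": 5},
    {"customer_id": "c2", "purchases_last_90d": 1, "days_since_last_visit": 60},
]

decisions = [decide(c) for c in customers]
for decision in decisions:
    print(decision)
```

Keeping the coefficients and the threshold as plain configuration makes it easy to refresh the model or adjust the rule without touching the scoring code.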
This highlights several implications.
First, models need to be developed with their purpose and objective in mind. Business questions and challenges need to be the driving force behind analytical projects. It must be clear and transparent how the company should act on the project results. Second, companies have to build technological capabilities to implement models.
People: Data Scientists
Building predictive models and applying machine learning algorithms requires qualified specialists. Companies need the right people to bring big data analytics to life. The job of building models and extracting insights from large datasets, often titled Data Scientist and famously described by Hal Varian, Chief Economist at Google, as "the sexiest job of the 21st century", has attracted lots of talent from a variety of backgrounds.
Data Scientists are generally expected to have some training in computer science and a solid background in statistics, econometrics and maths. Ideally, they should also have strong business acumen and the ability to communicate findings to a non-technical audience. There may be accomplished men and women who are equally well trained and experienced in all three areas. However, most people tend to specialise according to their natural talents and interests.
If a company wants to succeed in using big data analytics for the benefit of the organisation, it will need to build teams of people who stand out in their own domain of expertise but whose skills overlap to some extent. This ensures continuity and, most importantly, smooth communication and collaboration. Depending on company size and needs, data analytics teams can have as few as three, or even two, people in small organisations. They do not have to be structured as one team, but several core competencies need to be present. A person with a computer science or data-warehousing background makes sure that the infrastructure is in place and that the right data are captured, stored and processed from their raw form. This person can be part of the IT or Business Operations department.
Another person should have a strong background in statistics and data mining. This specialist is able to turn data into predictive models and insights. (S)he is able to extract data from the database and guide IT-skilled colleagues through data collection, basic data processing and the selection of analytical tools. The size and analytical needs of the organisation determine whether the Data Scientist can be mainly a technically strong specialist or also needs to be a zealous evangelist of data-driven decision-making across the organisation.
Building relationships with other departments is critical. In addition, to fully integrate data analytics into decision-making, it is a must to have a sponsor in the top ranks of the organisation.
Where are We Heading
If (big) data analytics teams want to see their relevance, contribution and credibility grow business-wide, they need to change the way companies operate. This means defining (and, if required, redefining) the way companies approach and solve problems, collect and store data, and apply the findings to optimise business operations.
It is critical to get the entire process right instead of chasing individual parts. I am positive that we’ll soon see more and more marketing managers making decisions with daily updated dashboards tailored to their business needs, regardless of how big the data behind the user-friendly charts and tables are.
Sydney, 07 May 2014,
Elena Yusupova, PhD