A small essay on Big Data


Fact – there is a lot of data and it is growing exponentially. Dealing with the volume and variety of data, and processing it into useful information, is on the minds of most executives. New technologies and data ecosystems have been and continue to be developed. More and more data collection “things” are being introduced into our environment. The rapid popularization of IoT is only amplifying the impact. The definition of “big data” continues to be a moving target as data size and complexity grow and storage and processing capabilities improve. With all of this going on, how does an organization plan and manage its data strategy? How does a company prepare its infrastructure and data management tools so that it can be relevant and capable three or five years into the future? These are not easy questions to answer, but I will share my thoughts on the matter in this post.


Just like any other function, data management can be divided into a ‘process’ and a ‘project’. Process means that the data sources are known and available, the ETL has been completed and the tools/infrastructure are in place – it’s just about execution. One of my clients had an ongoing process of collecting customer data, merging it with various third-party data sets and creating targeted marketing campaigns. The whole process was well established: all data streams were known and set up, data mapping was done and working properly, etc. – you get the picture. This process had been in place for some time and was one of the key revenue generators.


Data projects, on the other hand, are where I want to focus this post. It seems, at least in my experience, that data teams spend a lot more time on data projects than on data processes. I guess that is to be expected – processes tend to be automated and stable, with occasional tuning required to keep things that way. Projects take most of the time and present the greatest challenge. The most common challenges I have encountered when working with clients in this domain are core technology changes, new data sets, and self-service capabilities. It seems that executives responsible for this function have competing priorities in the form of data, tools and outputs.


What data do we have? What data do we need? What data can we get and at what cost? What data do we need now? These are the typical questions I have seen around data sets. What technology platform do we need? How do we roll out the platform? What infrastructure do we need and when? These are some of the questions that have to be answered in parallel with the data set questions. Let’s keep going… What reporting do we need? How will we use the data? What data can we sell and via what vehicle? That’s just a small subset of questions regarding the output. Again, these questions need to be answered in parallel with the data and tool questions. No wonder so many data teams are overwhelmed. Is there a way out of this mess?


In my opinion – yes, there is a way to manage data projects that gets the needed results faster and reduces the stress on the team. Just like with any other project, it is all about prioritization. The goal is to do less in parallel and focus on executing faster sequentially. So, how do you prioritize data projects? This is where a fundamental assumption comes in – business drives data decisions. Agree? If not, I would love to hear your point of view. Here is my logic – in most organizations data will not create new insights that will completely change the business strategy. A retailer that uses big data to better understand its customers will generally make marginal changes to its products and services based on new insights. I have a hard time imagining that a grocery store that collects and analyzes customer shopping behavior will suddenly realize that its marketing strategy and focus on rewarding loyal customers is wrong and needs to completely change. What shopper data will provide is more and more layers of understanding of why customers make certain choices, which will help continually improve existing products and services. So, I say it again: I believe that data supports business decisions and business strategy; it rarely shapes them. Research institutes, for example those working in the field of physics or genetics, are different: there, data and its insights can very much dictate what happens next. In these research environments people generally create hypotheses and then validate them. It is a continuous process of self-learning. I know, I know, self-learning also exists in the business world, but let’s be realistic – on a very different scale. Continuously improving customer targeting for a marketing campaign is not the same as testing quantum field theory.


So, if my argument about business driving data decisions is true, then we can conclude that the key to effectively prioritizing data projects lies with the business. Let’s stay with the retailer example. Let’s assume that the data team at ACME retailer is facing a large backlog of projects that can all be grouped into three categories: data, tools, outputs. Let’s separate these three categories into smaller groups – let’s assume that each has a sub-category of either ‘new’ or ‘existing’. There may be an existing server farm that needs to be upgraded or there may be new data management software that needs to be implemented. The team may have a set of projects where an existing data set needs some “clean-up” to improve data quality or there may be an entire backlog of new data sources that various business units are pushing to bring in. Sound familiar? I bet it does. So, how to prioritize these projects?


The first step is the most difficult and it requires executive leadership, in every sense of the word. This is where my approach differs from most others – I present senior leadership with a binary choice. It is not an easy choice, but it does need to be made. Here it is – what is your goal for the next 12 months – quantity or quality? In other words – do you want to stabilize the organization and prepare it for growth or do you want to grow? I cannot say how many times the answer was “both”. That is an easy answer to give, and it creates a nightmare downstream for the people doing the actual work. I believe that choosing the easy way out by saying “both” is not a sign of a strong leader. Ideally, the executive team needs to see five or ten years down the road and be able to strategically plan when and how to change. Notice that I said “change”, not “grow”. I strongly recommend that all my clients make this difficult, yet critical decision – grow (quantity) or stabilize (quality). If you ask this question and hear “both”, your prioritization will go only so far and you will be stuck in the same vicious cycle of constantly changing priorities and constant pressure to do more, faster. You will probably start looking at various management techniques, such as SAFe, to try and accelerate the value that your team creates, but, while they can help some, the root cause of your problems will still be there. Management techniques cannot compensate for a lack of leadership – they can cover it a bit, like makeup.


Now, let’s assume we received a good answer. It doesn’t matter if the goal is stability (quality) or growth (quantity), as long as it’s one of them. A quick side note – I am a realist and I understand that some parallel activities from the two choices may occur, especially if the interdependency is limited. When I say one vs. the other, I assume an 80/20 rule. So, if the goal is to stabilize and to focus on quality, then our six sub-categories just got cut in half and we focus on existing data, existing tools and existing reports, along with all the improvements that we believe are needed. This, of course, is not the end of the prioritization process. Not all improvements are equal in value. This is where I recommend creating a Value Stream, which will help show the flow from “raw materials” all the way to the customer and post-sale support. This Value Stream will help identify and visually depict where pain points exist and how they impact the overall value creation. If you have many issues at the start of the process, does it make sense to fix problems on the output side? I doubt it. We all know the ancient words of wisdom – “garbage in, garbage out”. Going through this exercise will help bring Customer, Supplier and Employee Experiences into the picture and prioritize based on those. This is not a simple exercise, but a very important one. Everything in the organization is connected and we need to see it.
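
To make the "cut in half" step concrete, here is a minimal Python sketch. The `Project` class, the `filter_backlog` function and the sample ACME backlog are all hypothetical, invented purely for illustration. For simplicity it applies the choice strictly and leaves out the 80/20 allowance mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    category: str  # "data", "tools" or "outputs"
    status: str    # "new" or "existing"


def filter_backlog(backlog, goal):
    """Keep only the projects that match the leadership goal.

    "quality" (stabilize) keeps work on existing assets;
    "quantity" (grow) keeps work on new ones.
    Any other answer, including "both", is rejected.
    """
    if goal == "quality":
        wanted = "existing"
    elif goal == "quantity":
        wanted = "new"
    else:
        raise ValueError(f"goal must be 'quality' or 'quantity', got {goal!r}")
    return [p for p in backlog if p.status == wanted]


# Hypothetical backlog for illustration only.
backlog = [
    Project("Upgrade server farm", "tools", "existing"),
    Project("Implement data management software", "tools", "new"),
    Project("Clean up customer data set", "data", "existing"),
    Project("Bring in new third-party data source", "data", "new"),
    Project("Fix existing sales report", "outputs", "existing"),
    Project("Build new self-service dashboard", "outputs", "new"),
]

focus = filter_backlog(backlog, "quality")
print([p.name for p in focus])
```

Rejecting "both" in code is a design choice that mirrors the argument: the filter only works if leadership has actually made the call.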


KPIs are another element that will help prioritize data projects further. Having the right KPI framework in place is important as it will help separate the noise from signals that require action. Imagine you have two products that are asking for new data. Let’s assume that both believe that the data will help grow sales of their product. Those beliefs are grounded in certain assumptions, such as marketing data, past performance, etc. With proper KPI analysis we can determine which product is likely to benefit from the new data and which one is not. I have had scenarios where a basic KPI analysis showed that investing more into the current product suite would not produce the desired results, but would rather keep the status quo. On the other hand, there may be a product that shows plenty of “growth signals” and an additional data set may be just what it needs to take off.


So, after deciding on “quantity” vs. “quality”, after mapping out and analyzing the Value Stream, and after performing a KPI analysis, the list of potential priorities is usually quite small. The last step is pretty well known – a dependency analysis. Some projects, or at least certain deliverables of these projects, may be prerequisites for other projects. For example, we may want to build out the infrastructure before we bring in a new data set. Some efforts may indeed happen in parallel and that is where a Project Roadmap can be a very helpful tool. It helps visualize what needs to be done when, based on agreed-upon “definitions of done”, taking team capacities into consideration.
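
A dependency analysis like this can be sketched as a topological sort. The Python example below is only an illustration under stated assumptions: the project names and the `execution_order` helper are hypothetical, and it uses the standard-library `graphlib` module (Python 3.9 or later). It yields one valid sequence and does not model team capacity or parallel work.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def execution_order(dependencies):
    """Return projects in an order that respects prerequisites.

    `dependencies` maps each project to the set of projects
    that must be finished before it can start.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical projects for illustration only.
dependencies = {
    "Bring in new data set": {"Build out infrastructure"},
    "Launch new report": {"Bring in new data set"},
    "Build out infrastructure": set(),
}

order = execution_order(dependencies)
print(order)
```

If two projects were listed as prerequisites of each other, `static_order()` would raise `graphlib.CycleError`, which is a useful signal that the backlog contains a circular dependency that needs to be untangled first.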


If all the work I have described above happens, then Agile teams can truly deliver value fast because the business will know what “value” is! The productivity of your team will increase significantly and, before you know it, your backlog will be gone. As you can see, the success of data projects starts in the Board Room.

