Data science is popular. More and more organizations are discovering its benefits and setting up dedicated teams to harness the large amounts of available data and computing power. The resulting solutions automate even highly complex tasks and yield insights that were previously out of reach because of all kinds of limitations. That's the power of data science. But did you know that data science becomes considerably more powerful when it includes geographic data?
But many organizations don't manage to get data science off the ground, or struggle to get it right. In this article we explain the importance of geo data in the data science process and show how everything starts with building a good business case. The field of data science is broad, which makes "data science" a catch-all term: data analytics, machine learning, artificial intelligence (AI) and robotics are often mixed up, which causes a lot of confusion. We define data science as the umbrella term for the field that focuses on extracting value from data by means of algorithms and scientific methods. This often involves recognizing complex patterns in large amounts of data.
Because much of the data we work with contains a location component, it is important to include geographic patterns. Only with the addition of the location component of data do you see the full story. How far along is your organization with geo data? To find out, we previously wrote the article "Data Science and Geographic Data, where does your organization stand?".
Realizing the potential of data science is not easy. This is because data is often scattered throughout the organization in individual silos. This makes it difficult to bring the data together. We also often see that not everyone within the organization is willing to adopt data insights and adjust processes.
Demonstrating a solid business case is therefore essential for a successful data science project. In this article, we take you through the steps required to build a good business case, based on an existing data science case from a water utility.
1. Spot & Name
Building a data science business case starts with finding and formulating an appropriate problem statement and converting that problem into a concrete challenge.
In our example we take the societal problem of water scarcity, which has been widely discussed in the Netherlands for many years. The discussion ranges from scientific debate about the damage caused by water scarcity to calls to use as little drinking water as possible in order to maintain pressure in the pipeline network.
Water companies can reduce wasted water by ensuring that as much treated water as possible actually reaches consumers' taps and is billed. Water that is lost in the drinking water distribution process is called 'Non-Invoiced Water' (NIRG, from the Dutch 'niet in rekening gebracht'), comparable to what is known internationally as non-revenue water. Depending on the region, the NIRG percentage in the Netherlands ranges from 5% to 10%. Could data science help reduce this percentage?
2. Research
Answering the challenge requires further research. What exactly is NIRG? It is the difference between the volume of water treated and the volume billed. Among other things, it serves as a performance indicator for water utilities, and it is usually expressed as a percentage of the total volume of drinking water produced. NIRG has a strong spatial component, because the water is lost somewhere in thousands of miles of distribution pipeline. The challenge is to track where the loss occurs and why. Notably, the question itself already has a location component.
Going a step further, the question is which factors contribute to water loss. These can be divided into two categories: physical losses and apparent losses. Physical losses include leaks in the distribution network, which are sometimes noticed immediately but can also remain invisible for a long time. This category also covers operations that release water, such as flushing the mains.
Apparent losses are situations in which the figures show a loss because of administrative errors, without any water actually being lost. Examples include water meters that pass on inaccurate readings and errors in the billing process that cause part of the water not to be charged correctly. In addition, a consumer may draw water before it is registered by the meter. If this happens intentionally, it is theft of drinking water.
3. Explore & Test
To come up with possible ways to reduce the NIRG percentage, the data science tribe organized a short brainstorming session. The data science tribe is one of Tensing's knowledge groups; it meets every month to catch up on the latest innovations in the field and to experiment with them directly.
The NIRG session produced many ideas for addressing the problem, ranging from a predictive maintenance model that estimates the likelihood of pipe leaks to a qualitative analysis of a specific DMA (district metered area, a hydraulically isolated zone in the pipe network) to get a better idea of how much each of the factors identified in the research contributes. Such an analysis can help determine which follow-up projects are most worth pursuing. By working with domain experts to evaluate the most promising ideas and test them against the available data, a specific solution can be chosen.
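Because NIRG is defined as the difference between the volume treated and the volume billed, expressed as a percentage of total production, it reduces to a one-line formula. The sketch below is illustrative only: the function name `nirg_percentage` is hypothetical, and the example inputs are taken from the annual production figure in the Quantify step (176.8 million cubic meters at a 5% loss, so 167.96 million cubic meters billed).

```python
def nirg_percentage(produced_m3: float, billed_m3: float) -> float:
    """Return non-invoiced water (NIRG) as a percentage of total production.

    NIRG volume = produced - billed; the percentage is taken relative to
    the total volume produced.
    """
    if produced_m3 <= 0:
        raise ValueError("produced volume must be positive")
    if not 0 <= billed_m3 <= produced_m3:
        raise ValueError("billed volume must lie between 0 and the produced volume")
    return (produced_m3 - billed_m3) / produced_m3 * 100


# Annual figures: 176.8 million m3 produced, 167.96 million m3 billed.
print(f"{nirg_percentage(176.8e6, 167.96e6):.1f}%")  # 5.0%
```

Applied per zone of the network instead of company-wide, the same formula starts to answer the spatial question of where the loss occurs.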
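The split between physical and apparent losses can be made explicit as a simple loss breakdown that is reconciled against the measured NIRG volume. This is a minimal sketch: the class `LossBreakdown` and its field names are hypothetical labels for the examples named in the text (leaks, operational use such as flushing, meter inaccuracy, billing errors, unmetered use including theft), and the numbers are made up purely for illustration; they are not results from the utility case.

```python
from dataclasses import dataclass


@dataclass
class LossBreakdown:
    """Estimated annual loss components in cubic meters (hypothetical)."""

    # Physical losses: water that actually leaves the network unbilled.
    leaks: float = 0.0
    flushing: float = 0.0
    # Apparent losses: water that is consumed but not (correctly) billed.
    meter_inaccuracy: float = 0.0
    billing_errors: float = 0.0
    unmetered_use: float = 0.0  # includes intentional theft

    def physical(self) -> float:
        return self.leaks + self.flushing

    def apparent(self) -> float:
        return self.meter_inaccuracy + self.billing_errors + self.unmetered_use

    def unexplained(self, measured_nirg_m3: float) -> float:
        """Part of the measured NIRG volume not covered by the estimates."""
        return measured_nirg_m3 - self.physical() - self.apparent()


# Hypothetical estimates for a measured NIRG of 8.84 million m3 per year.
estimate = LossBreakdown(leaks=5.0e6, flushing=0.5e6,
                         meter_inaccuracy=1.5e6, billing_errors=0.4e6,
                         unmetered_use=0.2e6)
print(estimate.physical(), estimate.apparent(), estimate.unexplained(8.84e6))
```

A large unexplained remainder signals that the current estimates miss a significant loss factor, which is itself a useful input for deciding where to investigate next.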
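A quick quantitative first pass can help decide which DMA deserves a closer qualitative look: compare, per DMA, the metered inflow with the billed consumption and rank the zones by loss percentage. This is a minimal sketch under two assumptions: each DMA has an inflow meter, and customer billing data can be attributed to the DMA it lies in. The class `DmaBalance`, the zone names, and the figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DmaBalance:
    name: str
    inflow_m3: float  # metered volume entering the hydraulically isolated zone
    billed_m3: float  # volume billed to customers inside the zone

    def __post_init__(self) -> None:
        if self.inflow_m3 <= 0:
            raise ValueError(f"{self.name}: inflow must be positive")

    @property
    def loss_pct(self) -> float:
        """Share of the inflow that is not billed, in percent."""
        return (self.inflow_m3 - self.billed_m3) / self.inflow_m3 * 100


def rank_by_loss(dmas: list[DmaBalance]) -> list[DmaBalance]:
    """Sort DMAs from highest to lowest loss percentage."""
    return sorted(dmas, key=lambda d: d.loss_pct, reverse=True)


# Hypothetical monthly figures for three zones.
zones = [
    DmaBalance("DMA-A", inflow_m3=120_000, billed_m3=112_800),
    DmaBalance("DMA-B", inflow_m3=95_000, billed_m3=91_200),
    DmaBalance("DMA-C", inflow_m3=60_000, billed_m3=54_000),
]
for dma in rank_by_loss(zones):
    print(f"{dma.name}: {dma.loss_pct:.1f}%")
```

Ranking by percentage normalizes for zone size; ranking by absolute volume lost may point to a different zone, so it is worth checking both before selecting a DMA for detailed analysis.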
4. Quantify
To complete the story, we need to quantify what the project can achieve. Suppose an average water company delivers about 3.4 billion liters per week. That equals 3.4 million cubic meters per week, or 176.8 million cubic meters per year. At a 5% annual loss, 8.84 million cubic meters are lost, representing roughly €12.3 million in revenue (about €1.39 per cubic meter). If the investigated solution reduces the loss by 0.2 percentage points (from 5% to 4.8%), about 354,000 more cubic meters are billed each year, which amounts to roughly €500,000 in extra revenue.
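The calculation above can be written out step by step so the assumptions are easy to change. This is a sketch, not a costing model: it assumes 52 weeks per year, and the value per cubic meter is not given directly in the text but derived from the two stated figures (€12.3 million for 8.84 million cubic meters).

```python
WEEKS_PER_YEAR = 52

weekly_production_m3 = 3.4e6  # 3.4 billion liters per week
annual_production_m3 = weekly_production_m3 * WEEKS_PER_YEAR  # 176.8 million m3

loss_rate = 0.05  # current NIRG: 5% of production
annual_loss_m3 = annual_production_m3 * loss_rate  # 8.84 million m3
annual_loss_value_eur = 12.3e6  # value of the lost water, as stated in the text

# Implied value per cubic meter (about EUR 1.39), derived from the figures above.
price_per_m3 = annual_loss_value_eur / annual_loss_m3

reduction = 0.002  # 0.2 percentage points (5% -> 4.8%)
saved_m3 = annual_production_m3 * reduction
extra_revenue_eur = saved_m3 * price_per_m3

print(f"Water saved: {saved_m3:,.0f} m3/year")
print(f"Extra revenue: EUR {extra_revenue_eur:,.0f}/year")
```

With these inputs the script reports 353,600 cubic meters and about €492,000 per year, which the text rounds to €500,000. Adjusting `loss_rate` or `reduction` shows how sensitive the business case is to the regional NIRG level (5% to 10%) and to the effectiveness of the solution.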
In this article we used an example to show how to build a business case for a data science project. We went through four steps: spot & name a challenge, research the problem statement, explore & test potential solutions, and finally quantify the value that can be realized by putting (one of) the solutions in place.
In a webinar on the same topic, we go further into this roadmap and the possibilities of Geo AI. Download the recording for more information.